* Adding content

* Update en.json

* Update README.md

* Update TRANSLATIONS.md

* Adding lesson templates

* Fixing code files that contained each other's code

* Update README.md

* Adding lesson 16

* Adding virtual camera

* Adding Wio Terminal camera capture

* Adding wio terminal code

* Adding SBC classification to lesson 16

* Adding challenge, review and assignment

* Adding images and using new Azure icons

* Update README.md

* Update iot-reference-architecture.png

* Adding structure for JulyOT links

* Removing icons

* Sketchnotes!

* Create lesson-1.png

* Starting on lesson 18

* Updated sketch

* Adding virtual distance sensor

* Adding Wio Terminal image classification

* Update README.md

* Adding structure for project 6 and wio terminal distance sensor

* Adding some of the smart timer stuff

* Updating sketchnotes

* Adding virtual device speech to text

* Adding chapter 21
pull/83/head
Jim Bennett 4 years ago committed by GitHub
parent a648c9ec37
commit 00502d7ed3

@ -1,8 +1,8 @@
# A deeper dive into IoT
Add a sketchnote if possible/appropriate
![A sketchnote overview of this lesson](../../../sketchnotes/lesson-2.png)
![Embed a video here if available](video-url)
> Sketchnote by [Nitya Narasimhan](https://github.com/nitya). Click the image for a larger version.
## Pre-lecture quiz

@ -19,7 +19,7 @@ Once you have temperature data, you can use the Jupyter Notebook in this repo to
1. Install some pip packages for Jupyter notebooks, along with libraries needed to manage and plot the data:
```sh
pip install -U pip
pip install --upgrade pip
pip install pandas
pip install matplotlib
pip install jupyter

@ -178,7 +178,7 @@ To use the Azure CLI, first it must be installed on your PC or Mac.
az account set --subscription <SubscriptionId>
```
Replace `<SubscriptionId>` with the Id of hte subscription you want to use. After running this command, re-run the command to list your accounts. You will see the `IsDefault` column will be marked as `True` for the subscription you have just set.
Replace `<SubscriptionId>` with the Id of the subscription you want to use. After running this command, re-run the command to list your accounts. You will see the `IsDefault` column will be marked as `True` for the subscription you have just set.
### Task - create a resource group

@ -4,16 +4,16 @@ from grove.grove_relay import GroveRelay
import json
from azure.iot.device import IoTHubDeviceClient, Message, MethodResponse
connection_string = "<connection_string>"
connection_string = '<connection_string>'
adc = ADC()
relay = GroveRelay(5)
device_client = IoTHubDeviceClient.create_from_connection_string(connection_string)
print("Connecting")
print('Connecting')
device_client.connect()
print("Connected")
print('Connected')
def handle_method_request(request):
print("Direct method received - ", request.name)
@ -32,7 +32,7 @@ while True:
soil_moisture = adc.read(0)
print("Soil moisture:", soil_moisture)
message = Message(json.dumps({ "soil_moisture": soil_moisture }))
message = Message(json.dumps({ 'soil_moisture': soil_moisture }))
device_client.send_message(message)
time.sleep(10)

@ -7,16 +7,16 @@ from counterfit_shims_grove.grove_relay import GroveRelay
import json
from azure.iot.device import IoTHubDeviceClient, Message, MethodResponse
connection_string = "<connection_string>"
connection_string = '<connection_string>'
adc = ADC()
relay = GroveRelay(5)
device_client = IoTHubDeviceClient.create_from_connection_string(connection_string)
print("Connecting")
print('Connecting')
device_client.connect()
print("Connected")
print('Connected')
def handle_method_request(request):
print("Direct method received - ", request.name)
@ -35,7 +35,7 @@ while True:
soil_moisture = adc.read(0)
print("Soil moisture:", soil_moisture)
message = Message(json.dumps({ "soil_moisture": soil_moisture }))
message = Message(json.dumps({ 'soil_moisture': soil_moisture }))
device_client.send_message(message)
time.sleep(10)

@ -43,9 +43,9 @@ The next step is to connect your device to IoT Hub.
```python
device_client = IoTHubDeviceClient.create_from_connection_string(connection_string)
print("Connecting")
print('Connecting')
device_client.connect()
print("Connected")
print('Connected')
```
1. Run this code. You will see your device connect.
@ -66,7 +66,7 @@ Now that your device is connected, you can send telemetry to the IoT Hub instead
1. Add the following code inside the `while True` loop, just before the sleep:
```python
message = Message(json.dumps({ "soil_moisture": soil_moisture }))
message = Message(json.dumps({ 'soil_moisture': soil_moisture }))
device_client.send_message(message)
```

@ -13,9 +13,9 @@ x509 = X509("./soil-moisture-sensor-x509-cert.pem", "./soil-moisture-sensor-x509
device_client = IoTHubDeviceClient.create_from_x509_certificate(x509, host_name, device_id)
print("Connecting")
print('Connecting')
device_client.connect()
print("Connected")
print('Connected')
def handle_method_request(request):
print("Direct method received - ", request.name)
@ -34,7 +34,7 @@ while True:
soil_moisture = adc.read(0)
print("Soil moisture:", soil_moisture)
message = Message(json.dumps({ "soil_moisture": soil_moisture }))
message = Message(json.dumps({ 'soil_moisture': soil_moisture }))
device_client.send_message(message)
time.sleep(10)

@ -16,9 +16,9 @@ x509 = X509("./soil-moisture-sensor-x509-cert.pem", "./soil-moisture-sensor-x509
device_client = IoTHubDeviceClient.create_from_x509_certificate(x509, host_name, device_id)
print("Connecting")
print('Connecting')
device_client.connect()
print("Connected")
print('Connected')
def handle_method_request(request):
print("Direct method received - ", request.name)
@ -37,7 +37,7 @@ while True:
soil_moisture = adc.read(0)
print("Soil moisture:", soil_moisture)
message = Message(json.dumps({ "soil_moisture": soil_moisture }))
message = Message(json.dumps({ 'soil_moisture': soil_moisture }))
device_client.send_message(message)
time.sleep(10)

@ -4,7 +4,7 @@ import pynmea2
import json
from azure.iot.device import IoTHubDeviceClient, Message
connection_string = "<connection_string>"
connection_string = '<connection_string>'
serial = serial.Serial('/dev/ttyAMA0', 9600, timeout=1)
serial.reset_input_buffer()
@ -12,9 +12,9 @@ serial.flush()
device_client = IoTHubDeviceClient.create_from_connection_string(connection_string)
print("Connecting")
print('Connecting')
device_client.connect()
print("Connected")
print('Connected')
def printGPSData(line):
msg = pynmea2.parse(line)

@ -7,15 +7,15 @@ import pynmea2
import json
from azure.iot.device import IoTHubDeviceClient, Message
connection_string = "<connection_string>"
connection_string = '<connection_string>'
serial = counterfit_shims_serial.Serial('/dev/ttyAMA0')
device_client = IoTHubDeviceClient.create_from_connection_string(connection_string)
print("Connecting")
print('Connecting')
device_client.connect()
print("Connected")
print('Connected')
def send_gps_data(line):
msg = pynmea2.parse(line)

@ -4,16 +4,16 @@ from grove.grove_relay import GroveRelay
import json
from azure.iot.device import IoTHubDeviceClient, Message, MethodResponse
connection_string = "<connection_string>"
connection_string = '<connection_string>'
adc = ADC()
relay = GroveRelay(5)
device_client = IoTHubDeviceClient.create_from_connection_string(connection_string)
print("Connecting")
print('Connecting')
device_client.connect()
print("Connected")
print('Connected')
def handle_method_request(request):
print("Direct method received - ", request.name)
@ -32,7 +32,7 @@ while True:
soil_moisture = adc.read(0)
print("Soil moisture:", soil_moisture)
message = Message(json.dumps({ "soil_moisture": soil_moisture }))
message = Message(json.dumps({ 'soil_moisture': soil_moisture }))
device_client.send_message(message)
time.sleep(10)

@ -7,16 +7,16 @@ from counterfit_shims_grove.grove_relay import GroveRelay
import json
from azure.iot.device import IoTHubDeviceClient, Message, MethodResponse
connection_string = "<connection_string>"
connection_string = '<connection_string>'
adc = ADC()
relay = GroveRelay(5)
device_client = IoTHubDeviceClient.create_from_connection_string(connection_string)
print("Connecting")
print('Connecting')
device_client.connect()
print("Connected")
print('Connected')
def handle_method_request(request):
print("Direct method received - ", request.name)
@ -35,7 +35,7 @@ while True:
soil_moisture = adc.read(0)
print("Soil moisture:", soil_moisture)
message = Message(json.dumps({ "soil_moisture": soil_moisture }))
message = Message(json.dumps({ 'soil_moisture': soil_moisture }))
device_client.send_message(message)
time.sleep(10)

@ -23,6 +23,10 @@ In this lesson we'll cover:
* [Using developer devices to simulate multiple IoT devices](#using-developer-devices-to-simulate-multiple-iot-devices)
* [Moving to production](#moving-to-production)
> 🗑 This is the last lesson in this project, so after completing this lesson and the assignment, don't forget to clean up your cloud services. You will need the services to complete the assignment, so make sure to complete that first.
>
> Refer to [the clean up your project guide](../../../clean-up.md) if necessary for instructions on how to do this.
## Architect complex IoT applications
IoT applications are made up of many components. These include a variety of things (the IoT devices themselves), and a variety of internet services.

@ -0,0 +1,5 @@
.pio
.vscode/.browse.c_cpp.db*
.vscode/c_cpp_properties.json
.vscode/launch.json
.vscode/ipch

@ -0,0 +1,7 @@
{
// See http://go.microsoft.com/fwlink/?LinkId=827846
// for the documentation about the extensions.json format
"recommendations": [
"platformio.platformio-ide"
]
}

@ -0,0 +1,39 @@
This directory is intended for project header files.
A header file is a file containing C declarations and macro definitions
to be shared between several project source files. You request the use of a
header file in your project source file (C, C++, etc) located in `src` folder
by including it, with the C preprocessing directive `#include'.
```src/main.c
#include "header.h"
int main (void)
{
...
}
```
Including a header file produces the same results as copying the header file
into each source file that needs it. Such copying would be time-consuming
and error-prone. With a header file, the related declarations appear
in only one place. If they need to be changed, they can be changed in one
place, and programs that include the header file will automatically use the
new version when next recompiled. The header file eliminates the labor of
finding and changing all the copies as well as the risk that a failure to
find one copy will result in inconsistencies within a program.
In C, the usual convention is to give header files names that end with `.h'.
It is most portable to use only letters, digits, dashes, and underscores in
header file names, and at most one dot.
Read more about using header files in official GCC documentation:
* Include Syntax
* Include Operation
* Once-Only Headers
* Computed Includes
https://gcc.gnu.org/onlinedocs/cpp/Header-Files.html

@ -0,0 +1,46 @@
This directory is intended for project specific (private) libraries.
PlatformIO will compile them to static libraries and link into executable file.
The source code of each library should be placed in its own separate directory
("lib/your_library_name/[here are source files]").
For example, see a structure of the following two libraries `Foo` and `Bar`:
|--lib
| |
| |--Bar
| | |--docs
| | |--examples
| | |--src
| | |- Bar.c
| | |- Bar.h
| | |- library.json (optional, custom build options, etc) https://docs.platformio.org/page/librarymanager/config.html
| |
| |--Foo
| | |- Foo.c
| | |- Foo.h
| |
| |- README --> THIS FILE
|
|- platformio.ini
|--src
|- main.c
and the contents of `src/main.c`:
```
#include <Foo.h>
#include <Bar.h>
int main (void)
{
...
}
```
PlatformIO Library Dependency Finder will automatically find dependent
libraries by scanning project source files.
More information about PlatformIO Library Dependency Finder
- https://docs.platformio.org/page/librarymanager/ldf.html

@ -0,0 +1,16 @@
; PlatformIO Project Configuration File
;
; Build options: build flags, source filter
; Upload options: custom upload port, speed and extra flags
; Library options: dependencies, extra library storages
; Advanced options: extra scripting
;
; Please visit documentation for the other options and examples
; https://docs.platformio.org/page/projectconf.html
[env:seeed_wio_terminal]
platform = atmelsam
board = seeed_wio_terminal
framework = arduino
lib_deps =
seeed-studio/Grove Ranging sensor - VL53L0X @ ^1.1.1

@ -0,0 +1,31 @@
#include <Arduino.h>
#include "Seeed_vl53l0x.h"
Seeed_vl53l0x VL53L0X;
void setup()
{
Serial.begin(9600);
while (!Serial)
; // Wait for Serial to be ready
delay(1000);
VL53L0X.VL53L0X_common_init();
VL53L0X.VL53L0X_high_accuracy_ranging_init();
}
void loop()
{
VL53L0X_RangingMeasurementData_t RangingMeasurementData;
memset(&RangingMeasurementData, 0, sizeof(VL53L0X_RangingMeasurementData_t));
VL53L0X.PerformSingleRangingMeasurement(&RangingMeasurementData);
Serial.print("Distance = ");
Serial.print(RangingMeasurementData.RangeMilliMeter);
Serial.println(" mm");
delay(1000);
}

@ -0,0 +1,11 @@
This directory is intended for PlatformIO Unit Testing and project tests.
Unit Testing is a software testing method by which individual units of
source code, sets of one or more MCU program modules together with associated
control data, usage procedures, and operating procedures, are tested to
determine whether they are fit for use. Unit testing finds problems early
in the development cycle.
More information about PlatformIO Unit Testing:
- https://docs.platformio.org/page/plus/unit-testing.html

@ -89,7 +89,7 @@ Program the device.
Distance = 151 mm
```
The rangefinder is on the back of the sensor, so make sure you use hte correct side when measuring distance.
The rangefinder is on the back of the sensor, so make sure you use the correct side when measuring distance.
![The rangefinder on the back of the time of flight sensor pointing at a banana](../../../images/time-of-flight-banana.png)

@ -38,3 +38,62 @@ The Wio Terminal can now be programmed to use the attached time of flight sensor
1. Create a brand new Wio Terminal project using PlatformIO. Call this project `distance-sensor`. Add code in the `setup` function to configure the serial port.
1. Add a library dependency for the Seeed Grove time of flight distance sensor library to the project's `platformio.ini` file:
```ini
lib_deps =
seeed-studio/Grove Ranging sensor - VL53L0X @ ^1.1.1
```
1. In `main.cpp`, add the following below the existing include directives to declare an instance of the `Seeed_vl53l0x` class to interact with the time of flight sensor:
```cpp
#include "Seeed_vl53l0x.h"
Seeed_vl53l0x VL53L0X;
```
1. Add the following to the bottom of the `setup` function to initialize the sensor:
```cpp
VL53L0X.VL53L0X_common_init();
VL53L0X.VL53L0X_high_accuracy_ranging_init();
```
1. In the `loop` function, read a value from the sensor:
```cpp
VL53L0X_RangingMeasurementData_t RangingMeasurementData;
memset(&RangingMeasurementData, 0, sizeof(VL53L0X_RangingMeasurementData_t));
VL53L0X.PerformSingleRangingMeasurement(&RangingMeasurementData);
```
This code initializes a data structure to read data into, then passes it into the `PerformSingleRangingMeasurement` method where it will be populated with the distance measurement.
1. Below this, write out the distance measurement, then delay for 1 second:
```cpp
Serial.print("Distance = ");
Serial.print(RangingMeasurementData.RangeMilliMeter);
Serial.println(" mm");
delay(1000);
```
1. Build, upload and run this code. You will be able to see distance measurements with the serial monitor. Position objects near the sensor and you will see the distance measurement:
```output
Distance = 29 mm
Distance = 28 mm
Distance = 30 mm
Distance = 151 mm
```
The rangefinder is on the back of the sensor, so make sure you use the correct side when measuring distance.
![The rangefinder on the back of the time of flight sensor pointing at a banana](../../../images/time-of-flight-banana.png)
> 💁 You can find this code in the [code-proximity/wio-terminal](code-proximity/wio-terminal) folder.
😀 Your proximity sensor program was a success!

@ -1,5 +1,20 @@
# Consumer IoT - build a smart voice assistant
The food has been grown, driven to a processing plant, sorted for quality, sold in the store and now it's time to cook! One of the core pieces of any kitchen is a timer. Initially these started as simple hourglasses - your food was cooked when all the sand had trickled down into the bottom bulb. They then became clockwork, then electric.
The latest iterations are now part of our smart devices. In kitchens all over the world you'll hear chefs shouting "Hey Siri - set a 10 minute timer", or "Alexa - cancel my bread timer". No longer do you have to walk back to the kitchen to check on a timer - you can do it from your phone, or with a call out across the room.
In these 4 lessons you'll learn how to build a smart timer, using AI to recognize your voice, understand what you are asking for, and reply with information about your timer. You'll also add support for multiple languages.
> 💁 These lessons will use some cloud resources. If you don't complete all the lessons in this project, make sure you [Clean up your project](../clean-up.md).
## Topics
1. [Recognize speech with an IoT device](./lessons/1-speech-recognition/README.md)
1. [Understand language](./lessons/2-language-understanding/README.md)
1. [Provide spoken feedback](./lessons/3-spoken-feedback/README.md)
1. [Support multiple languages](./lessons/4-multiple-language-support/README.md)
## Credits
All the lessons were written with ♥️ by [Jim Bennett](https://GitHub.com/JimBobBennett)

@ -0,0 +1,223 @@
# Recognize speech with an IoT device
Add a sketchnote if possible/appropriate
![Embed a video here if available](video-url)
## Pre-lecture quiz
[Pre-lecture quiz](https://brave-island-0b7c7f50f.azurestaticapps.net/quiz/33)
## Introduction
'Alexa, set a 12 minute timer'
'Alexa, timer status'
'Alexa, set an 8 minute timer called steam broccoli'
Smart devices are becoming more and more pervasive. Not just as smart speakers like HomePods, Echos and Google Homes, but embedded in our phones, watches, and even light fittings and thermostats.
> 💁 I have at least 19 devices in my home that have voice assistants, and that's just the ones I know about!
Voice control increases accessibility by allowing folks with limited movement to interact with devices. Whether it is a permanent disability such as being born without arms, a temporary disability such as a broken arm, or having your hands full of shopping or young children, being able to control our houses with our voices instead of our hands opens up a world of access. Shouting 'Hey Siri, close my garage door' whilst dealing with a baby change and an unruly toddler can be a small but effective improvement on life.
One of the more popular uses for voice assistants is setting timers, especially kitchen timers. Being able to set multiple timers with just your voice is a great help in the kitchen - no need to stop kneading dough or stirring soup, or to clean dumpling filling off your hands, just to use a physical timer.
In this lesson you will learn about building voice recognition into IoT devices. You'll learn about microphones as sensors, how to capture audio from a microphone attached to an IoT device, and how to use AI to convert what is heard into text. Throughout the rest of this project you will build a smart kitchen timer, able to set timers using your voice with multiple languages.
In this lesson we'll cover:
* [Microphones](#microphones)
* [Capture audio from your IoT device](#capture-audio-from-your-iot-device)
* [Speech to text](#speech-to-text)
* [Convert speech to text](#convert-speech-to-text)
## Microphones
Microphones are analog sensors that convert sound waves into electrical signals. Vibrations in air cause components in the microphone to move tiny amounts, and these cause tiny changes in electrical signals. These changes are then amplified to generate an electrical output.
### Microphone types
Microphones come in a variety of types:
* Dynamic - Dynamic microphones have a magnet attached to a moving diaphragm that moves in a coil of wire, creating an electrical current. This is the opposite of most loudspeakers, which use an electrical current to move a magnet in a coil of wire, moving a diaphragm to create sound. This means speakers can be used as dynamic microphones, and dynamic microphones can be used as speakers. In devices such as intercoms where a user is either listening or speaking, but not both, one device can act as both a speaker and a microphone.
Dynamic microphones don't need power to work - the electrical signal is created entirely by the microphone.
![Patti Smith singing into a Shure SM58 (dynamic cardioid type) microphone](../../../images/dynamic-mic.jpg)
***Beni Köhler / [Creative Commons Attribution-Share Alike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/deed.en)***
* Ribbon - Ribbon microphones are similar to dynamic microphones, except they have a metal ribbon instead of a diaphragm. This ribbon moves in a magnetic field generating an electrical current. Like dynamic microphones, ribbon microphones don't need power to work.
![Edmund Lowe, American actor, standing at radio microphone (labeled for (NBC) Blue Network), holding script, 1942](../../../images/ribbon-mic.jpg)
* Condenser - Condenser microphones have a thin metal diaphragm and a fixed metal backplate. Electricity is applied to both of these, and as the diaphragm vibrates the static charge between the plates changes, generating a signal. Condenser microphones need power to work - this is called *phantom power*.
![C451B small-diaphragm condenser microphone by AKG Acoustics](../../../images/condenser-mic.jpg)
***[Harumphy](https://en.wikipedia.org/wiki/User:Harumphy) at [en.wikipedia](https://en.wikipedia.org/) / [Creative Commons Attribution-Share Alike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/deed.en)***
* MEMS - Microelectromechanical systems microphones, or MEMS, are microphones on a chip. They have a pressure-sensitive diaphragm etched onto a silicon chip, and work in a similar way to a condenser microphone. These microphones can be tiny, and integrated into circuitry.
![A MEMS microphone on a circuit board](../../../images/mems-microphone.png)
In the image above, the chip labelled **LEFT** is a MEMS microphone, with a tiny diaphragm less than a millimeter wide.
✅ Do some research: What microphones do you have around you - either in your computer, your phone, your headset or in other devices. What type of microphones are they?
### Digital audio
Audio is an analog signal carrying very fine-grained information. To convert this signal to digital, the audio needs to be sampled many thousands of times a second.
> 🎓 Sampling is converting the audio signal into a digital value that represents the signal at that point in time.
![A line chart showing a signal, with discrete points at fixed intervals](../../../images/sampling.png)
Digital audio is sampled using Pulse Code Modulation, or PCM. PCM involves reading the voltage of the signal, and selecting the closest discrete value to that voltage using a defined size.
> 💁 You can think of PCM as the sensor version of pulse width modulation, or PWM (PWM was covered back in [lesson 3 of the getting started project](../../../1-getting-started/lessons/3-sensors-and-actuators/README.md#pulse-width-modulation)). PCM involves converting an analog signal to digital, PWM involves converting a digital signal to analog.
For example, most streaming music services offer 16-bit or 24-bit audio. This means they convert the voltage into a value that fits into a 16-bit integer, or 24-bit integer. 16-bit audio fits the value into a number ranging from -32,768 to 32,767, 24-bit is in the range -8,388,608 to 8,388,607. The more bits, the closer the sample is to what our ears actually hear.
> 💁 You may have heard of 8-bit audio, often referred to as LoFi. This is audio sampled using only 8 bits, so -128 to 127. The first computer audio was limited to 8 bits due to hardware limitations, so this is often seen in retro gaming.
These samples are taken many thousands of times per second, using well-defined sample rates measured in KHz (thousands of readings per second). Streaming music services use 48KHz for most audio, but some 'lossless' audio uses up to 96KHz or even 192KHz. The higher the sample rate, the closer to the original the audio will be, up to a point. There is debate over whether humans can tell the difference above 48KHz.
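As a rough illustration of how sample rate and bit depth relate (not part of the lesson code - the names here are purely illustrative), the following Python snippet quantizes a pure tone into 16-bit PCM samples:

```python
import math

sample_rate = 48000  # 48KHz - the number of samples taken per second
bit_depth = 16       # each sample is stored as a 16-bit signed integer
max_value = 2 ** (bit_depth - 1) - 1  # 32,767 for 16-bit audio

def sample_tone(frequency, duration_seconds):
    """Sample a pure tone, quantizing each reading to the nearest 16-bit value."""
    samples = []
    for n in range(int(sample_rate * duration_seconds)):
        # the 'analog' signal - a value between -1.0 and 1.0
        voltage = math.sin(2 * math.pi * frequency * n / sample_rate)
        # PCM: pick the closest discrete value that fits in 16 bits
        samples.append(int(voltage * max_value))
    return samples

# one second of a 440Hz tone becomes 48,000 16-bit samples
print(len(sample_tone(440, 1)))
```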
✅ Do some research: If you use a streaming music service, what sample rate and size does it use? If you use CDs, what is the sample rate and size of CD audio?
## Capture audio from your IoT device
Your IoT device can be connected to a microphone to capture audio, ready for conversion to text. It can also be connected to speakers to output audio. In later lessons this will be used to give audio feedback, but it is useful to set up speakers now to test the microphone.
### Task - configure your microphone and speakers
Work through the relevant guide to configure the microphone and speakers for your IoT device:
* [Arduino - Wio Terminal](wio-terminal-microphone.md)
* [Single-board computer - Raspberry Pi](pi-microphone.md)
* [Single-board computer - Virtual device](virtual-device-microphone.md)
### Task - capture audio
Work through the relevant guide to capture audio on your IoT device:
* [Arduino - Wio Terminal](wio-terminal-audio.md)
* [Single-board computer - Raspberry Pi](pi-audio.md)
* [Single-board computer - Virtual device](virtual-device-audio.md)
## Speech to text
Speech to text, or speech recognition, involves using AI to convert words in an audio signal to text.
### Speech recognition models
To convert speech to text, samples from the audio signal are grouped together and fed into a machine learning model based around a recurrent neural network (RNN). This is a type of machine learning model that can use previous data to make a decision about incoming data. For example, the RNN could detect one block of audio samples as the sound 'Hel', and when it receives another that it thinks is the sound 'lo', it can combine this with the previous sound, see that 'Hello' is a valid word and select that as the outcome.
ML models always accept data of the same size every time. The image classifier you built in an earlier lesson resizes images to a fixed size and processes them. It's the same with speech models - they have to process fixed-sized audio chunks. The speech models also need to be able to combine the outputs of multiple predictions to get the answer, to allow them to distinguish between 'Hi' and 'Highway', or 'flock' and 'floccinaucinihilipilification'.
Speech models are also advanced enough to understand context, and can correct the words they detect as more sounds are processed. For example, if you say "I went to the shops to get two bananas and an apple too", you would use three words that sound the same, but are spelled differently - to, two and too. Speech models are able to understand the context and use the appropriate spelling of the word.
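To make the chunking idea concrete, here is a purely illustrative sketch - `model` and its `initial_state`, `predict` and `decode` methods are hypothetical placeholders, not a real library - showing fixed-size chunks being fed to a recurrent model that carries state between them:

```python
CHUNK_SIZE = 4096  # a fixed number of audio samples per prediction

def transcribe(samples, model):
    # the hidden state carries context forward between chunks
    state = model.initial_state()
    sounds = []
    for start in range(0, len(samples), CHUNK_SIZE):
        chunk = samples[start:start + CHUNK_SIZE]
        # pad the final chunk so the model always sees the same sized input
        chunk = chunk + [0] * (CHUNK_SIZE - len(chunk))
        # 'Hel' from one chunk followed by 'lo' from the next becomes 'Hello'
        sound, state = model.predict(chunk, state)
        sounds.append(sound)
    # combine the per-chunk predictions into words using context
    return model.decode(sounds)
```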
> 💁 Some speech services allow customization to make them work better in noisy environments such as factories, or with industry-specific words such as chemical names. These customizations are trained by providing sample audio and a transcription, and work using transfer learning, the same as how you trained an image classifier using only a few images in an earlier lesson.
### Privacy
When using speech to text in a consumer IoT device, privacy is incredibly important. These devices listen to audio continuously, so as a consumer you don't want everything you say being sent to the cloud and converted to text. Not only will this use a lot of Internet bandwidth, it also has massive privacy implications, especially when some smart device makers randomly select audio for [humans to validate against the text generated to help improve their model](https://www.theverge.com/2019/4/10/18305378/amazon-alexa-ai-voice-assistant-annotation-listen-private-recordings).
You only want your smart device to send audio to the cloud for processing when you are using it, not when it hears audio in your home, audio that could include private meetings or intimate interactions. The way most smart devices work is with a *wake word*, a key phrase such as "Alexa", "Hey Siri", or "OK Google" that causes the device to 'wake up' and listen to what you are saying up until it detects a break in your speech, indicating you have finished talking to the device.
> 🎓 Wake word detection is also referred to as *Keyword spotting* or *Keyword recognition*.
These wake words are detected on the device, not in the cloud. These smart devices have small AI models that run on the device and listen for the wake word, and when it is detected, start streaming the audio to the cloud for recognition. These models are very specialized, and just listen for the wake word.
> 💁 Some tech companies are adding more privacy to their devices and doing some of the speech to text conversion on the device. Apple have announced that as part of their 2021 iOS and macOS updates they will support the speech to text conversion on device, and be able to handle many requests without needing to use the cloud. This is thanks to having powerful processors in their devices that can run ML models.
✅ What do you think are the privacy and ethical implications of storing the audio sent to the cloud? Should this audio be stored, and if so, how? Do you think the use of recordings for law enforcement is a good trade-off for the loss of privacy?
Wake word detection usually uses a technique known as TinyML - converting ML models so they are able to run on microcontrollers. These models are small in size, and consume very little power to run.
To avoid the complexity of training and using a wake word model, the smart timer you are building in this lesson will use a button to turn on the speech recognition.
> 💁 If you want to try creating a wake word detection model to run on the Wio Terminal or Raspberry Pi, check out this [Responding to your voice tutorial by Edge Impulse](https://docs.edgeimpulse.com/docs/responding-to-your-voice). If you want to use your computer to do this, you can try the [Get started with Custom Keyword quickstart on the Microsoft docs](https://docs.microsoft.com/azure/cognitive-services/speech-service/keyword-recognition-overview?WT.mc_id=academic-17441-jabenn).
## Convert speech to text
Just like with image classification in the last project, there are pre-built AI services that can take speech as an audio file and convert it to text. One such service is the Speech service, part of Cognitive Services - pre-built AI services you can use in your apps.
### Task - configure a speech AI resource
1. Create a Resource Group for this project called `smart-timer`
1. Use the following command to create a free speech resource:
```sh
az cognitiveservices account create --name smart-timer \
--resource-group smart-timer \
--kind SpeechServices \
--sku F0 \
--yes \
--location <location>
```
Replace `<location>` with the location you used when creating the Resource Group.
1. You will need an API key to access the speech resource from your code. Run the following command to get the key:
```sh
az cognitiveservices account keys list --name smart-timer \
--resource-group smart-timer \
--output table
```
Take a copy of one of the keys.
### Task - convert speech to text
Work through the relevant guide to convert speech to text on your IoT device:
* [Arduino - Wio Terminal](wio-terminal-speech-to-text.md)
* [Single-board computer - Raspberry Pi](pi-speech-to-text.md)
* [Single-board computer - Virtual device](virtual-device-speech-to-text.md)
### Task - send converted speech to an IoT service
To use the results of the speech to text conversion, you need to send them to the cloud. There they will be interpreted, and responses will be sent back to the IoT device as commands.
1. Create a new IoT Hub in the `smart-timer` resource group, and register a new device called `smart-timer`.
1. Connect your IoT device to this IoT Hub using what you have learned in previous lessons, and send the speech as telemetry. Use a JSON document in this format:
```json
{
"speech" : "<converted speech>"
}
```
Where `<converted speech>` is the output from the speech to text call.
1. Verify that messages are being sent by monitoring the Event Hub compatible endpoint using the `az iot hub monitor-events` command.
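As a minimal sketch of step 2, sending the converted speech as telemetry uses the same pattern as the IoT Hub code from earlier lessons (here `text` is assumed to hold the output of the speech to text call, and `<connection_string>` is the connection string for your `smart-timer` device):

```python
import json
from azure.iot.device import IoTHubDeviceClient, Message

connection_string = '<connection_string>'

device_client = IoTHubDeviceClient.create_from_connection_string(connection_string)
device_client.connect()

text = 'Set a two minute timer'  # this would come from the speech to text call
message = Message(json.dumps({ 'speech': text }))
device_client.send_message(message)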
> 💁 You can find this code in the [code-iot-hub/virtual-iot-device](code-iot-hub/virtual-iot-device), [code-iot-hub/pi](code-iot-hub/pi), or [code-iot-hub/wio-terminal](code-iot-hub/wio-terminal) folder.
---
## 🚀 Challenge
Speech recognition has been around for a long time, and is continuously improving. Research the current capabilities and see how these have evolved over time, including how accurate machine transcriptions are compared to human ones.
What do you think the future holds for speech recognition?
## Post-lecture quiz
[Post-lecture quiz](https://brave-island-0b7c7f50f.azurestaticapps.net/quiz/34)
## Review & Self Study
* Read about the different microphone types and how they work on the [What's the difference between dynamic and condenser microphones article on Musician's HQ](https://musicianshq.com/whats-the-difference-between-dynamic-and-condenser-microphones/).
* Read more on the Cognitive Services speech service on the [Speech service documentation on Microsoft Docs](https://docs.microsoft.com/azure/cognitive-services/speech-service/?WT.mc_id=academic-17441-jabenn)
* Read about keyword spotting on the [Keyword recognition documentation on Microsoft Docs](https://docs.microsoft.com/azure/cognitive-services/speech-service/keyword-recognition-overview?WT.mc_id=academic-17441-jabenn)
## Assignment
[](assignment.md)

@ -0,0 +1,9 @@
#
## Instructions
## Rubric
| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | --------- | -------- | ----------------- |
| | | | |

@ -0,0 +1,94 @@
import io
import json
import pyaudio
import requests
import time
import wave
from azure.iot.device import IoTHubDeviceClient, Message
from grove.factory import Factory
button = Factory.getButton('GPIO-HIGH', 5)
connection_string = '<connection_string>'
device_client = IoTHubDeviceClient.create_from_connection_string(connection_string)
print('Connecting')
device_client.connect()
print('Connected')
audio = pyaudio.PyAudio()
microphone_card_number = 1
speaker_card_number = 1
rate = 48000
def capture_audio():
stream = audio.open(format = pyaudio.paInt16,
rate = rate,
channels = 1,
input_device_index = microphone_card_number,
input = True,
frames_per_buffer = 4096)
frames = []
while button.is_pressed():
frames.append(stream.read(4096))
stream.stop_stream()
stream.close()
wav_buffer = io.BytesIO()
with wave.open(wav_buffer, 'wb') as wavefile:
wavefile.setnchannels(1)
wavefile.setsampwidth(audio.get_sample_size(pyaudio.paInt16))
wavefile.setframerate(rate)
wavefile.writeframes(b''.join(frames))
wav_buffer.seek(0)
return wav_buffer
api_key = '<key>'
location = '<location>'
language = '<language>'
def get_access_token():
headers = {
'Ocp-Apim-Subscription-Key': api_key
}
token_endpoint = f'https://{location}.api.cognitive.microsoft.com/sts/v1.0/issuetoken'
response = requests.post(token_endpoint, headers=headers)
return str(response.text)
def convert_speech_to_text(buffer):
url = f'https://{location}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1'
headers = {
'Authorization': 'Bearer ' + get_access_token(),
'Content-Type': f'audio/wav; codecs=audio/pcm; samplerate={rate}',
'Accept': 'application/json;text/xml'
}
params = {
'language': language
}
response = requests.post(url, headers=headers, params=params, data=buffer)
response_json = json.loads(response.text)
if response_json['RecognitionStatus'] == 'Success':
return response_json['DisplayText']
else:
return ''
while True:
while not button.is_pressed():
time.sleep(.1)
buffer = capture_audio()
text = convert_speech_to_text(buffer)
if len(text) > 0:
message = Message(json.dumps({ 'speech': text }))
device_client.send_message(message)

@ -0,0 +1,33 @@
import json
import time
from azure.cognitiveservices.speech import SpeechConfig, SpeechRecognizer
from azure.iot.device import IoTHubDeviceClient, Message
api_key = '<key>'
location = '<location>'
language = '<language>'
connection_string = '<connection_string>'
device_client = IoTHubDeviceClient.create_from_connection_string(connection_string)
print('Connecting')
device_client.connect()
print('Connected')
speech_config = SpeechConfig(subscription=api_key,
region=location,
speech_recognition_language=language)
recognizer = SpeechRecognizer(speech_config=speech_config)
def recognized(args):
if len(args.result.text) > 0:
message = Message(json.dumps({ 'speech': args.result.text }))
device_client.send_message(message)
recognizer.recognized.connect(recognized)
recognizer.start_continuous_recognition()
while True:
time.sleep(1)

@ -0,0 +1,61 @@
import io
import pyaudio
import time
import wave
from grove.factory import Factory
button = Factory.getButton('GPIO-HIGH', 5)
audio = pyaudio.PyAudio()
microphone_card_number = 1
speaker_card_number = 1
rate = 48000
def capture_audio():
stream = audio.open(format = pyaudio.paInt16,
rate = rate,
channels = 1,
input_device_index = microphone_card_number,
input = True,
frames_per_buffer = 4096)
frames = []
while button.is_pressed():
frames.append(stream.read(4096))
stream.stop_stream()
stream.close()
wav_buffer = io.BytesIO()
with wave.open(wav_buffer, 'wb') as wavefile:
wavefile.setnchannels(1)
wavefile.setsampwidth(audio.get_sample_size(pyaudio.paInt16))
wavefile.setframerate(rate)
wavefile.writeframes(b''.join(frames))
wav_buffer.seek(0)
return wav_buffer
def play_audio(buffer):
stream = audio.open(format = pyaudio.paInt16,
rate = rate,
channels = 1,
output_device_index = speaker_card_number,
output = True)
with wave.open(buffer, 'rb') as wf:
data = wf.readframes(4096)
while len(data) > 0:
stream.write(data)
data = wf.readframes(4096)
stream.close()
while True:
while not button.is_pressed():
time.sleep(.1)
buffer = capture_audio()
play_audio(buffer)

@ -0,0 +1,82 @@
import io
import json
import pyaudio
import requests
import time
import wave
from grove.factory import Factory
button = Factory.getButton('GPIO-HIGH', 5)
audio = pyaudio.PyAudio()
microphone_card_number = 1
speaker_card_number = 1
rate = 48000
def capture_audio():
stream = audio.open(format = pyaudio.paInt16,
rate = rate,
channels = 1,
input_device_index = microphone_card_number,
input = True,
frames_per_buffer = 4096)
frames = []
while button.is_pressed():
frames.append(stream.read(4096))
stream.stop_stream()
stream.close()
wav_buffer = io.BytesIO()
with wave.open(wav_buffer, 'wb') as wavefile:
wavefile.setnchannels(1)
wavefile.setsampwidth(audio.get_sample_size(pyaudio.paInt16))
wavefile.setframerate(rate)
wavefile.writeframes(b''.join(frames))
wav_buffer.seek(0)
return wav_buffer
api_key = '<key>'
location = '<location>'
language = '<language>'
def get_access_token():
headers = {
'Ocp-Apim-Subscription-Key': api_key
}
token_endpoint = f'https://{location}.api.cognitive.microsoft.com/sts/v1.0/issuetoken'
response = requests.post(token_endpoint, headers=headers)
return str(response.text)
def convert_speech_to_text(buffer):
url = f'https://{location}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1'
headers = {
'Authorization': 'Bearer ' + get_access_token(),
'Content-Type': f'audio/wav; codecs=audio/pcm; samplerate={rate}',
'Accept': 'application/json;text/xml'
}
params = {
'language': language
}
response = requests.post(url, headers=headers, params=params, data=buffer)
response_json = json.loads(response.text)
if response_json['RecognitionStatus'] == 'Success':
return response_json['DisplayText']
else:
return ''
while True:
while not button.is_pressed():
time.sleep(.1)
buffer = capture_audio()
text = convert_speech_to_text(buffer)
print(text)

@ -0,0 +1,22 @@
import time
from azure.cognitiveservices.speech import SpeechConfig, SpeechRecognizer
api_key = '<key>'
location = '<location>'
language = '<language>'
speech_config = SpeechConfig(subscription=api_key,
region=location,
speech_recognition_language=language)
recognizer = SpeechRecognizer(speech_config=speech_config)
def recognized(args):
print(args.result.text)
recognizer.recognized.connect(recognized)
recognizer.start_continuous_recognition()
while True:
time.sleep(1)

@ -0,0 +1,213 @@
# Capture audio - Raspberry Pi
In this part of the lesson, you will write code to capture audio on your Raspberry Pi. Audio capture will be controlled by a button.
## Hardware
The Raspberry Pi needs a button to control the audio capture.
The button you will use is a Grove button. This is a digital sensor that turns a signal on or off. These buttons can be configured to send a high signal when the button is pressed, and low when it is not, or low when pressed and high when not.
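As a preview of the code you will write later in this lesson, the two configurations map to the `Factory` helper from the `grove.factory` module like this:

```python
from grove.factory import Factory

# a button that sends a high signal when pressed - a Grove button on port D5
button_high = Factory.getButton("GPIO-HIGH", 5)

# a button that sends a low signal when pressed - the button on the
# ReSpeaker 2-Mics Pi HAT, connected to GPIO 17
button_low = Factory.getButton("GPIO-LOW", 17)

# is_pressed returns True while the button is currently pressed
print(button_high.is_pressed(), button_low.is_pressed())
```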
If you are using a ReSpeaker 2-Mics Pi HAT as a microphone, then there is no need to connect a button as this hat has one fitted already. Skip to the next section.
### Connect the button
The button can be connected to the Grove base hat.
#### Task - connect the button
![A grove button](../../../images/grove-button.png)
1. Insert one end of a Grove cable into the socket on the button module. It will only go in one way round.
1. With the Raspberry Pi powered off, connect the other end of the Grove cable to the digital socket marked **D5** on the Grove Base hat attached to the Pi. This socket is the second from the left, on the row of sockets next to the GPIO pins.
![The grove button connected to socket D5](../../../images/pi-button.png)
## Capture audio
You can capture audio from the microphone using Python code.
### Task - capture audio
1. Power up the Pi and wait for it to boot
1. Launch VS Code, either directly on the Pi, or connect via the Remote SSH extension.
1. The PyAudio Pip package has functions to record and play back audio. This package depends on some audio libraries that need to be installed first. Run the following commands in the terminal to install these:
```sh
sudo apt update
sudo apt install libportaudio0 libportaudio2 libportaudiocpp0 portaudio19-dev libasound2-plugins --yes
```
1. Install the PyAudio Pip package.
```sh
pip3 install pyaudio
```
1. Create a new folder called `smart-timer` and add a file called `app.py` to this folder.
1. Add the following imports to the top of this file:
```python
import io
import pyaudio
import time
import wave
from grove.factory import Factory
```
This imports the `pyaudio` module, some standard Python modules to handle wave files, and the `grove.factory` module to import a `Factory` to create a button class.
1. Below this, add code to create a Grove button.
If you are using the ReSpeaker 2-Mics Pi HAT, use the following code:
```python
# The button on the ReSpeaker 2-Mics Pi HAT
button = Factory.getButton("GPIO-LOW", 17)
```
This creates a button on port **D17**, the port that the button on the ReSpeaker 2-Mics Pi HAT is connected to. This button is set to send a low signal when pressed.
If you are not using the ReSpeaker 2-Mics Pi HAT, and are using a Grove button connected to the base hat, use this code.
```python
button = Factory.getButton("GPIO-HIGH", 5)
```
This creates a button on port **D5** that is set to send a high signal when pressed.
1. Below this, create an instance of the PyAudio class to handle audio:
```python
audio = pyaudio.PyAudio()
```
1. Declare the hardware card number for the microphone and speaker. This will be the number of the card you found by running `arecord -l` and `aplay -l` earlier in this lesson.
```python
microphone_card_number = <microphone card number>
speaker_card_number = <speaker card number>
```
Replace `<microphone card number>` with the number of your microphone's card.
Replace `<speaker card number>` with the number of your speaker's card, the same number you set in the `alsa.conf` file.
1. Below this, declare the sample rate to use for the audio capture and playback. You may need to change this depending on the hardware you are using.
```python
rate = 48000 #48KHz
```
If you get sample rate errors when running this code later, change this value to `44100` or `16000`. The higher the value, the better the quality of the sound.
1. Below this, create a new function called `capture_audio`. This will be called to capture audio from the microphone:
```python
def capture_audio():
```
1. Inside this function, add the following to capture the audio:
```python
stream = audio.open(format = pyaudio.paInt16,
rate = rate,
channels = 1,
input_device_index = microphone_card_number,
input = True,
frames_per_buffer = 4096)
frames = []
while button.is_pressed():
frames.append(stream.read(4096))
stream.stop_stream()
stream.close()
```
This code opens an audio input stream using the PyAudio object. This stream will capture audio from the microphone at the sample rate set in the `rate` variable, capturing it in buffers of 4096 samples.
The code then loops whilst the Grove button is pressed, reading these buffers into an array each time.
> 💁 You can read more on the options passed to the `open` method in the [PyAudio documentation](https://people.csail.mit.edu/hubert/pyaudio/docs/).
Once the button is released, the stream is stopped and closed.
1. Add the following to the end of this function:
```python
wav_buffer = io.BytesIO()
with wave.open(wav_buffer, 'wb') as wavefile:
wavefile.setnchannels(1)
wavefile.setsampwidth(audio.get_sample_size(pyaudio.paInt16))
wavefile.setframerate(rate)
wavefile.writeframes(b''.join(frames))
wav_buffer.seek(0)
return wav_buffer
```
This code creates a binary buffer, and writes all the captured audio to it as a [WAV file](https://wikipedia.org/wiki/WAV). This is a standard way to write uncompressed audio to a file. This buffer is then returned.
1. Add the following `play_audio` function to play back the audio buffer:
```python
def play_audio(buffer):
stream = audio.open(format = pyaudio.paInt16,
rate = rate,
channels = 1,
output_device_index = speaker_card_number,
output = True)
with wave.open(buffer, 'rb') as wf:
data = wf.readframes(4096)
while len(data) > 0:
stream.write(data)
data = wf.readframes(4096)
stream.close()
```
This function opens another audio stream, this time for output - to play the audio. It uses the same settings as the input stream. The buffer is then opened as a wave file and written to the output stream in 4096 byte chunks, playing the audio. The stream is then closed.
1. Add the following code below the `capture_audio` function to loop until the button is pressed. Once the button is pressed, the audio is captured, then played.
```python
while True:
while not button.is_pressed():
time.sleep(.1)
buffer = capture_audio()
play_audio(buffer)
```
1. Run the code. Press the button and speak into the microphone. Release the button when you are done, and you will hear the recording.
You may see some ALSA errors when the PyAudio instance is created. This is due to configuration on the Pi for audio devices you don't have. You can ignore these errors.
```output
pi@raspberrypi:~/smart-timer $ python3 app.py
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
```
If you see the following error:
```output
OSError: [Errno -9997] Invalid sample rate
```
then change the `rate` to either 44100 or 16000.
> 💁 You can find this code in the [code-record/pi](code-record/pi) folder.
😀 Your audio recording program was a success!

@ -0,0 +1,143 @@
# Configure your microphone and speakers - Raspberry Pi
In this part of the lesson, you will add a microphone and speakers to your Raspberry Pi.
## Hardware
The Raspberry Pi needs a microphone.
The Pi doesn't have a microphone built in, so you will need to add an external microphone. There are multiple ways to do this:
* USB microphone
* USB headset
* USB all in one speakerphone
* USB audio adapter and microphone with a 3.5mm jack
* [ReSpeaker 2-Mics Pi HAT](https://www.seeedstudio.com/ReSpeaker-2-Mics-Pi-HAT.html)
> 💁 Not all Bluetooth microphones are supported on the Raspberry Pi, so if you have a Bluetooth microphone or headset, you may have issues pairing or capturing audio.
Raspberry Pis come with a 3.5mm headphone jack. You can use this to connect headphones, a headset or a speaker. You can also add speakers using:
* HDMI audio through a monitor or TV
* USB speakers
* USB headset
* USB all in one speakerphone
* [ReSpeaker 2-Mics Pi HAT](https://www.seeedstudio.com/ReSpeaker-2-Mics-Pi-HAT.html) with a speaker attached, either to the 3.5mm jack or to the JST port
## Connect and configure the microphone and speakers
The microphone and speakers need to be connected, and configured.
### Task - connect and configure the microphone
1. Connect the microphone using the appropriate method. For example, connect it via one of the USB ports.
1. If you are using the ReSpeaker 2-Mics Pi HAT, you can remove the Grove base hat, then fit the ReSpeaker hat in its place.
![A raspberry pi with a ReSpeaker hat](../../../images/pi-respeaker-hat.png)
You will need a Grove button later in this lesson, but one is built into this hat, so the Grove base hat is not needed.
Once the hat is fitted, you will need to install some drivers. Refer to the [Seeed getting started instructions](https://wiki.seeedstudio.com/ReSpeaker_2_Mics_Pi_HAT_Raspberry/#getting-started) for driver installation instructions.
> ⚠️ The instructions use `git` to clone a repository. If you don't have `git` installed on your Pi, you can install it by running the following command:
>
> ```sh
> sudo apt install git --yes
> ```
1. Run the following command in your Terminal either on the Pi, or connected using VS Code and a remote SSH session to see information about the connected microphone:
```sh
arecord -l
```
You will see a list of connected microphones. It will be something like the following:
```output
pi@raspberrypi:~ $ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: M0 [eMeet M0], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
```
Assuming you only have one microphone, you should only see one entry. Configuration of mics can be tricky on Linux, so it is easiest to only use one microphone and unplug any others.
Note down the card number, as you will need this later. In the output above the card number is 1.
### Task - connect and configure the speaker
1. Connect the speakers using the appropriate method.
1. Run the following command in your Terminal either on the Pi, or connected using VS Code and a remote SSH session to see information about the connected speakers:
```sh
aplay -l
```
You will see a list of connected speakers. It will be something like the following:
```output
pi@raspberrypi:~ $ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: Headphones [bcm2835 Headphones], device 0: bcm2835 Headphones [bcm2835 Headphones]
Subdevices: 8/8
Subdevice #0: subdevice #0
Subdevice #1: subdevice #1
Subdevice #2: subdevice #2
Subdevice #3: subdevice #3
Subdevice #4: subdevice #4
Subdevice #5: subdevice #5
Subdevice #6: subdevice #6
Subdevice #7: subdevice #7
card 1: M0 [eMeet M0], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
```
You will always see `card 0: Headphones` as this is the built-in headphone jack. If you have added additional speakers, such as a USB speaker, you will see this listed as well.
1. If you are using an additional speaker, and not a speaker or headphones connected to the built-in headphone jack, you need to configure it as the default. To do this run the following command:
```sh
sudo nano /usr/share/alsa/alsa.conf
```
This will open a configuration file in `nano`, a terminal-based text editor. Scroll down using the arrow keys on your keyboard until you find the following line:
```output
defaults.pcm.card 0
```
Change the value from `0` to the card number of the card you want to use from the list that came back from the call to `aplay -l`. For example, in the output above there is a second sound card called `card 1: M0 [eMeet M0], device 0: USB Audio [USB Audio]`, using card 1. To use this, I would update the line to be:
```output
defaults.pcm.card 1
```
Set this value to the appropriate card number. You can navigate to the number using the arrow keys on your keyboard, then delete and type the new number as normal when editing text files.
1. Save the changes and close the file by pressing `Ctrl+x`. Press `y` to save the file, then `return` to select the file name.
### Task - test the microphone and speaker
1. Run the following command to record 5 seconds of audio through the microphone:
```sh
arecord --format=S16_LE --duration=5 --rate=16000 --file-type=wav out.wav
```
Whilst this command is running, make noise into the microphone such as by speaking, singing, beat boxing, playing an instrument or whatever takes your fancy.
1. After 5 seconds, the recording will stop. Run the following command to play back the audio:
```sh
aplay --format=S16_LE --rate=16000 out.wav
```
You will hear the audio being played back through the speakers. Adjust the output volume on your speaker as necessary.
1. If you need to adjust the volume of the built-in microphone port, or adjust the gain of the microphone, you can use the `alsamixer` utility. You can read more on this utility on the [Linux alsamixer man page](https://linux.die.net/man/1/alsamixer)
1. If you get errors playing back the audio, check the card you set as the `defaults.pcm.card` in the `alsa.conf` file.

@ -0,0 +1,106 @@
# Speech to text - Raspberry Pi
In this part of the lesson, you will write code to convert speech in the captured audio to text using the speech service.
## Send the audio to the speech service
The audio can be sent to the speech service using the REST API. To use the speech service, first you need to request an access token, then use that token to access the REST API. These access tokens expire after 10 minutes, so your code should request them on a regular basis to ensure they are always up to date.
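As one possible approach to the refresh requirement - shown here only as a sketch, since the task below simply requests a fresh token on each call - the token could be cached and renewed before the 10 minute expiry:

```python
import time
import requests

api_key = '<key>'
location = '<location>'

access_token = None
token_acquired_at = 0

def get_cached_access_token():
    global access_token, token_acquired_at
    # refresh after 9 minutes to stay safely inside the 10 minute token lifetime
    if access_token is None or time.time() - token_acquired_at > 9 * 60:
        headers = {
            'Ocp-Apim-Subscription-Key': api_key
        }
        token_endpoint = f'https://{location}.api.cognitive.microsoft.com/sts/v1.0/issuetoken'
        access_token = requests.post(token_endpoint, headers=headers).text
        token_acquired_at = time.time()
    return access_token
```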
### Task - get an access token
1. Open the `smart-timer` project on your Pi.
1. Remove the `play_audio` function. This is no longer needed as you don't want a smart timer to repeat back to you what you said.
1. Add the following imports to the top of the `app.py` file:
```python
import requests
import json
```
1. Add the following code above the `while True` loop to declare some settings for the speech service:
```python
api_key = '<key>'
location = '<location>'
language = '<language>'
```
Replace `<key>` with the API key for your speech service. Replace `<location>` with the location you used when you created the speech service resource.
Replace `<language>` with the locale name for the language you will be speaking in, for example `en-GB` for English, or `zh-HK` for Cantonese. You can find a list of the supported languages and their locale names in the [Language and voice support documentation on Microsoft docs](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support?WT.mc_id=academic-17441-jabenn#speech-to-text).
1. Below this, add the following function to get an access token:
```python
def get_access_token():
headers = {
'Ocp-Apim-Subscription-Key': api_key
}
token_endpoint = f'https://{location}.api.cognitive.microsoft.com/sts/v1.0/issuetoken'
response = requests.post(token_endpoint, headers=headers)
return str(response.text)
```
This calls a token issuing endpoint, passing the API key as a header. This call returns an access token that can be used to call the speech services.
1. Below this, declare a function to convert speech in the captured audio to text using the REST API:
```python
def convert_speech_to_text(buffer):
```
1. Inside this function, set up the REST API URL and headers:
```python
url = f'https://{location}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1'
headers = {
'Authorization': 'Bearer ' + get_access_token(),
'Content-Type': f'audio/wav; codecs=audio/pcm; samplerate={rate}',
'Accept': 'application/json;text/xml'
}
params = {
'language': language
}
```
This builds a URL using the location of the speech services resource. It then populates the headers with the access token from the `get_access_token` function, as well as the sample rate used to capture the audio. Finally it defines some parameters to be passed with the URL containing the language in the audio.
1. Below this, add the following code to call the REST API and get back the text:
```python
    response = requests.post(url, headers=headers, params=params, data=buffer)
    response_json = json.loads(response.text)

    if response_json['RecognitionStatus'] == 'Success':
        return response_json['DisplayText']
    else:
        return ''
```
This calls the URL and decodes the JSON in the response. The `RecognitionStatus` value indicates whether the speech was successfully converted to text: if it is `Success`, the text is returned from the function, otherwise an empty string is returned.
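> 💁 As an illustration, a successful response contains JSON similar to the following - other fields are omitted here, and only `RecognitionStatus` and `DisplayText` are used by this code:
>
> ```output
> {
>     "RecognitionStatus": "Success",
>     "DisplayText": "Set a timer for 2 minutes."
> }
> ```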
1. Finally replace the call to `play_audio` in the `while True` loop with a call to the `convert_speech_to_text` function, as well as printing the text to the console:
```python
text = convert_speech_to_text(buffer)
print(text)
```
1. Run the code. Press the button and speak into the microphone. Release the button when you are done, and you will see the audio converted to text in the output.
```output
pi@raspberrypi:~/smart-timer $ python3 app.py
Hello world.
Welcome to IoT for beginners.
```
Try different types of sentences, along with sentences where words sound the same but have different meanings. For example, if you are speaking in English, say 'I want to buy two bananas and an apple too', and notice how it will use the correct to, two and too based on the context of the word, not just its sound.
> 💁 You can find this code in the [code-speech-to-text/pi](code-speech-to-text/pi) folder.
😀 Your speech to text program was a success!

@ -0,0 +1,3 @@
# Capture audio - Virtual IoT device
The Python libraries that you will be using later in this lesson to convert speech to text have built-in audio capture on Windows, macOS and Linux. You don't need to do anything here.

@ -0,0 +1,12 @@
# Configure your microphone and speakers - Virtual IoT Hardware
The virtual IoT hardware will use a microphone and speakers attached to your computer.
If your computer doesn't have a microphone and speakers built in, you will need to attach these using hardware of your choice, such as:
* USB microphone
* USB speakers
* Speakers built into your monitor and connected over HDMI
* Bluetooth headset
Refer to your hardware manufacturer's instructions to install and configure this hardware.

@ -0,0 +1,95 @@
# Speech to text - Virtual IoT device
In this part of the lesson, you will write code to convert speech captured from your microphone to text using the speech service.
## Convert speech to text
On Windows, Linux, and macOS, the speech services Python SDK can be used to listen to your microphone and convert any speech that is detected to text. It will listen continuously, detecting the audio levels and sending the speech for conversion to text when the audio level drops, such as at the end of a block of speech.
### Task - convert speech to text
1. Create a new Python app on your computer in a folder called `smart-timer` with a single file called `app.py` and a Python virtual environment.
1. Install the Pip package for the speech services. Make sure you are installing this from a terminal with the virtual environment activated.
```sh
pip install azure-cognitiveservices-speech
```
> ⚠️ If you see the following error:
>
> ```output
> ERROR: Could not find a version that satisfies the requirement azure-cognitiveservices-speech (from versions: none)
> ERROR: No matching distribution found for azure-cognitiveservices-speech
> ```
>
> You will need to update Pip. Do this with the following command, then try to install the package again
>
> ```sh
> pip install --upgrade pip
> ```
1. Add the following imports to the `app.py` file:
```python
import time
from azure.cognitiveservices.speech import SpeechConfig, SpeechRecognizer
```
This imports some classes used to recognize speech.
1. Add the following code to declare some configuration:
```python
api_key = '<key>'
location = '<location>'
language = '<language>'
speech_config = SpeechConfig(subscription=api_key,
                             region=location,
                             speech_recognition_language=language)
```
Replace `<key>` with the API key for your speech service. Replace `<location>` with the location you used when you created the speech service resource.
Replace `<language>` with the locale name for the language you will be speaking in, for example `en-GB` for English, or `zh-HK` for Cantonese. You can find a list of the supported languages and their locale names in the [Language and voice support documentation on Microsoft docs](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support?WT.mc_id=academic-17441-jabenn#speech-to-text).
These values are used to create a `SpeechConfig` object that configures the connection to the speech service.
1. Add the following code to create a speech recognizer:
```python
recognizer = SpeechRecognizer(speech_config=speech_config)
```
1. The speech recognizer runs on a background thread, listening for audio and converting any speech in it to text. You can get the text using a callback function - a function you define and pass to the recognizer. Every time speech is detected, the callback is called. Add the following code to define a callback that prints the text to the console, and pass this callback to the recognizer:
```python
def recognized(args):
    print(args.result.text)

recognizer.recognized.connect(recognized)
```
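> 💁 The recognizer raises other events as well. For example, the Python SDK also has a `recognizing` event that fires with partial results while you are still speaking. A minimal sketch, assuming the same `recognizer` object, just to see the intermediate text:
>
> ```python
> def recognizing(args):
>     # Partial results - the text recognized so far, updated as you speak
>     print('Recognizing:', args.result.text)
>
> recognizer.recognizing.connect(recognizing)
> ```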
1. The recognizer only starts listening when you explicitly start it. Add the following code to start the recognition. This runs in the background, so your application will also need an infinite loop that sleeps to keep the application running.
```python
recognizer.start_continuous_recognition()

while True:
    time.sleep(1)
```
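> 💁 The infinite loop means the app runs until you stop it. If you want it to exit cleanly on Ctrl+C, one option (a sketch, not required for this lesson) is to catch `KeyboardInterrupt` and stop the recognizer:
>
> ```python
> try:
>     while True:
>         time.sleep(1)
> except KeyboardInterrupt:
>     # Stop the background recognition before exiting
>     recognizer.stop_continuous_recognition()
> ```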
1. Run this app. Speak into your microphone and you will see the audio converted to text in the console.
```output
(.venv) ➜ smart-timer python3 app.py
Hello world.
Welcome to IoT for beginners.
```
Try different types of sentences, along with sentences where words sound the same but have different meanings. For example, if you are speaking in English, say 'I want to buy two bananas and an apple too', and notice how it will use the correct to, two and too based on the context of the word, not just its sound.
> 💁 You can find this code in the [code-speech-to-text/virtual-iot-device](code-speech-to-text/virtual-iot-device) folder.
😀 Your speech to text program was a success!

@ -0,0 +1,3 @@
# Capture audio - Wio Terminal
Coming soon!

@ -0,0 +1,3 @@
# Configure your microphone and speakers - Wio Terminal
Coming soon!

@ -0,0 +1,3 @@
# Speech to text - Wio Terminal
Coming soon!

@ -0,0 +1,33 @@
# Understand language
Add a sketchnote if possible/appropriate
![Embed a video here if available](video-url)
## Pre-lecture quiz
[Pre-lecture quiz](https://brave-island-0b7c7f50f.azurestaticapps.net/quiz/33)
## Introduction
In this lesson you will learn about
In this lesson we'll cover:
* [Thing 1](#thing-1)
## Thing 1
---
## 🚀 Challenge
## Post-lecture quiz
[Post-lecture quiz](https://brave-island-0b7c7f50f.azurestaticapps.net/quiz/34)
## Review & Self Study
## Assignment
[](assignment.md)

@ -0,0 +1,9 @@
#
## Instructions
## Rubric
| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | --------- | -------- | ----------------- |
| | | | |

@ -0,0 +1,33 @@
# Provide spoken feedback
Add a sketchnote if possible/appropriate
![Embed a video here if available](video-url)
## Pre-lecture quiz
[Pre-lecture quiz](https://brave-island-0b7c7f50f.azurestaticapps.net/quiz/33)
## Introduction
In this lesson you will learn about
In this lesson we'll cover:
* [Thing 1](#thing-1)
## Thing 1
---
## 🚀 Challenge
## Post-lecture quiz
[Post-lecture quiz](https://brave-island-0b7c7f50f.azurestaticapps.net/quiz/34)
## Review & Self Study
## Assignment
[](assignment.md)

@ -0,0 +1,9 @@
#
## Instructions
## Rubric
| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | --------- | -------- | ----------------- |
| | | | |

@ -0,0 +1,33 @@
# Support multiple languages
Add a sketchnote if possible/appropriate
![Embed a video here if available](video-url)
## Pre-lecture quiz
[Pre-lecture quiz](https://brave-island-0b7c7f50f.azurestaticapps.net/quiz/33)
## Introduction
In this lesson you will learn about
In this lesson we'll cover:
* [Thing 1](#thing-1)
## Thing 1
---
## 🚀 Challenge
## Post-lecture quiz
[Post-lecture quiz](https://brave-island-0b7c7f50f.azurestaticapps.net/quiz/34)
## Review & Self Study
## Assignment
[](assignment.md)

@ -0,0 +1,9 @@
#
## Instructions
## Rubric
| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | --------- | -------- | ----------------- |
| | | | |

@ -24,7 +24,11 @@ All the device code for Arduino is in C++. To complete all the assignments you w
These are specific to using the Wio terminal Arduino device, and are not relevant to using the Raspberry Pi.
* [ArduCam Mini 2MP Plus - OV2640](https://www.arducam.com/product/arducam-2mp-spi-camera-b0067-arduino/)
* [Grove speaker plus](https://www.seeedstudio.com/Grove-Speaker-Plus-p-4592.html)
* [ReSpeaker 2-Mics Pi HAT](https://www.seeedstudio.com/ReSpeaker-2-Mics-Pi-HAT.html)
* [Breadboard Jumper Wires](https://www.seeedstudio.com/Breadboard-Jumper-Wire-Pack-241mm-200mm-160mm-117m-p-234.html)
* Headphones or other speaker with a 3.5mm jack, or a JST speaker such as:
* [Mono Enclosed Speaker - 2W 6 Ohm](https://www.seeedstudio.com/Mono-Enclosed-Speaker-2W-6-Ohm-p-2832.html)
* [Grove speaker plus](https://www.seeedstudio.com/Grove-Speaker-Plus-p-4592.html)
* *Optional* - microSD Card 16GB or less for testing image capture, along with a connector to use the SD card with your computer if you don't have one built-in. **NOTE** - the Wio Terminal only supports SD cards up to 16GB; it does not support higher capacities.
## Raspberry Pi
@ -45,11 +49,16 @@ These are specific to using the Raspberry Pi, and are not relevant to using the
* [Grove Pi base hat](https://wiki.seeedstudio.com/Grove_Base_Hat_for_Raspberry_Pi)
* [Raspberry Pi Camera module](https://www.raspberrypi.org/products/camera-module-v2/)
* Microphone and speaker:
* Any USB Microphone
* Any USB speaker, or speaker with a 3.5mm cable, or using HDMI audio if your Raspberry Pi is connected to a monitor with speakers
or
Use one of the following (or equivalent):
* Any USB Microphone with any USB speaker, or speaker with a 3.5mm jack cable, or using HDMI audio output if your Raspberry Pi is connected to a monitor or TV with speakers
* Any USB headset with a built in microphone
* [ReSpeaker 2-Mics Pi HAT](https://www.seeedstudio.com/ReSpeaker-2-Mics-Pi-HAT.html) with
* Headphones or other speaker with a 3.5mm jack, or a JST speaker such as:
* [Mono Enclosed Speaker - 2W 6 Ohm](https://www.seeedstudio.com/Mono-Enclosed-Speaker-2W-6-Ohm-p-2832.html)
* [USB Speakerphone](https://www.amazon.com/USB-Speakerphone-Conference-Business-Microphones/dp/B07Q3D7F8S/ref=sr_1_1?dchild=1&keywords=m0&qid=1614647389&sr=8-1)
* [Grove Sunlight sensor](https://www.seeedstudio.com/Grove-Sunlight-Sensor.html)
* [Grove button](https://www.seeedstudio.com/Grove-Button.html)
## Sensors and actuators
@ -60,7 +69,7 @@ Most of the sensors and actuators needed are used by both the Arduino and Raspbe
* [Grove capacitive soil moisture sensor](https://www.seeedstudio.com/Grove-Capacitive-Moisture-Sensor-Corrosion-Resistant.html)
* [Grove relay](https://www.seeedstudio.com/Grove-Relay.html)
* [Grove GPS (Air530)](https://www.seeedstudio.com/Grove-GPS-Air530-p-4584.html)
* [Grove - Time of flight Distance Sensor](https://www.seeedstudio.com/Grove-Time-of-Flight-Distance-Sensor-VL53L0X.html)
* [Grove Time of flight Distance Sensor](https://www.seeedstudio.com/Grove-Time-of-Flight-Distance-Sensor-VL53L0X.html)
## Optional hardware
@ -74,6 +83,8 @@ The lessons on automated watering work using a relay. As an option, you can conn
## Virtual hardware
The virtual hardware route will provide simulators for the sensors and actuators, implemented in Python. Depending on your hardware availability, you can run this on your normal development device, such as a Mac, PC, or run it on a Raspberry Pi and simulate only the hardware you don't have. For example, if you have the camera but not the Grove sensors, you will be able to run the virtual device code on your Pi and simulate the Grove sensors, but use a physical camera.
The virtual hardware route will provide simulators for the sensors and actuators, implemented in Python. Depending on your hardware availability, you can run this on your normal development device, such as a Mac, PC, or run it on a Raspberry Pi and simulate only the hardware you don't have. For example, if you have the Raspberry Pi camera but not the Grove sensors, you will be able to run the virtual device code on your Pi and simulate the Grove sensors, but use a physical camera.
The virtual hardware will use the [CounterFit project](https://github.com/CounterFit-IoT/CounterFit).
To complete these lessons you will need to have a web cam, microphone and audio output such as speakers or headphones. These can be built in or external, and need to be configured to work with your operating system and available for use from all applications.
