You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
IoT-For-Beginners/6-consumer/lessons/1-speech-recognition/virtual-device-speech-to-te...

4.4 KiB

Speech to text - Virtual IoT device

In this part of the lesson, you will write code to convert speech captured from your microphone to text using the speech service.

Convert speech to text

On Windows, Linux, and macOS, the speech services Python SDK can be used to listen to your microphone and convert any speech that is detected to text. It will listen continuously, detecting the audio levels and sending the speech for conversion to text when the audio level drops, such as at the end of a block of speech.

Task - convert speech to text

  1. Create a new Python app on your computer in a folder called smart-timer with a single file called app.py and a Python virtual environment.

  2. Install the Pip package for the speech services. Make sure you are installing this from a terminal with the virtual environment activated.

    pip install azure-cognitiveservices-speech
    

    ⚠️ If you get the following error:

    ERROR: Could not find a version that satisfies the requirement azure-cognitiveservices-speech (from versions: none)
    ERROR: No matching distribution found for azure-cognitiveservices-speech
    

    You will need to update Pip. Do this with the following command, then try to install the package again

    pip install --upgrade pip
    
  3. Add the following imports to the app,py file:

    import requests
    import time
    from azure.cognitiveservices.speech import SpeechConfig, SpeechRecognizer
    

    This imports some classes used to recognize speech.

  4. Add the following code to declare some configuration:

    speech_api_key = '<key>'
    location = '<location>'
    language = '<language>'
    
    recognizer_config = SpeechConfig(subscription=speech_api_key,
                                     region=location,
                                     speech_recognition_language=language)
    

    Replace <key> with the API key for your speech service. Replace <location> with the location you used when you created the speech service resource.

    Replace <language> with the locale name for language you will be speaking in, for example en-GB for English, or zn-HK for Cantonese. You can find a list of the supported languages and their locale names in the Language and voice support documentation on Microsoft docs.

    This configuration is then used to create a SpeechConfig object that will be used to configure the speech services.

  5. Add the following code to create a speech recognizer:

    recognizer = SpeechRecognizer(speech_config=recognizer_config)
    
  6. The speech recognizer runs on a background thread, listening for audio and converting any speech in it to text. You can get the text using a callback function - a function you define and pass to the recognizer. Every time speech is detected, the callback is called. Add the following code to define a callback, and pass this callback to the recognizer, as well as defining a function to process the text, writing it to the consoled:

    def process_text(text):
        print(text)
    
    def recognized(args):
        process_text(args.result.text)
    
    recognizer.recognized.connect(recognized)
    
  7. The recognizer only starts listening when you explicitly start it. Add the following code to start the recognition. This runs in the background, so your application will also need an infinite loop that sleeps to keep the application running.

    recognizer.start_continuous_recognition()
    
    while True:
        time.sleep(1)
    
  8. Run this app. Speak into your microphone and the audio converted to text will be output to the console.

    (.venv) ➜  smart-timer python3 app.py
    Hello world.
    Welcome to IoT for beginners.
    

    Try different types of sentences, along with sentences where words sound the same but have different meanings. For example, if you are speaking in English, say 'I want to buy two bananas and an apple too', and notice how it will use the correct to, two and too based on the context of the word, not just it's sound.

💁 You can find this code in the code-speech-to-text/virtual-iot-device folder.

😀 Your speech to text program was a success!