Translate speech - Virtual IoT Device

In this part of the lesson, you will write code that uses the speech service to translate speech as it is converted to text, then uses the Translator service to translate text before generating a spoken response.

Use the speech service to translate speech

The speech service can not only convert speech into text in the same language, but also translate the output into other languages.

Task - use the speech service to translate speech

Open the smart-timer project in VS Code, and ensure the virtual environment is activated in the terminal.
Add the following import statements below the existing imports:
```
from azure.cognitiveservices import speech
from azure.cognitiveservices.speech.translation import SpeechTranslationConfig, TranslationRecognizer
import requests
```
These imports include classes used for speech translation and the requests library, which will be used later in this lesson to make a call to the Translator service.
Your smart timer will have two languages set: the server language, which was used to train LUIS and is also used to build the messages spoken to the user, and the language spoken by the user. Update the language variable to reflect the language spoken by the user, and add a new variable called server_language for the language used to train LUIS:
```
language = '<user language>'
server_language = '<server language>'
```
Replace <user language> with the locale name for the language you will be speaking, such as fr-FR for French or zh-HK for Cantonese.

Replace <server language> with the locale name for the language used to train LUIS.

You can find a list of supported languages and their locale names in the Language and voice support documentation on Microsoft docs.

💁 If you don't speak multiple languages, you can use a service like Bing Translate or Google Translate to translate from your preferred language into another language. These services can also play audio of the translated text. Note that the speech recognizer may ignore some audio output from your device, so you might need to use an additional device to play the translated text.

For example, if you train LUIS in English but want to use French as the user language, you can translate sentences like "set a 2 minute and 27 second timer" from English into French using Bing Translate, then use the Listen translation button to speak the translation into your microphone.

Replace the recognizer_config and recognizer declarations with the following:

```
translation_config = SpeechTranslationConfig(subscription=speech_api_key,
                                             region=location,
                                             speech_recognition_language=language,
                                             target_languages=(language, server_language))

recognizer = TranslationRecognizer(translation_config=translation_config)
```

This creates a translation configuration to recognize speech in the user language and generate translations in both the user and server languages. It then uses this configuration to create a translation recognizer, which is a speech recognizer that can translate the output of speech recognition into multiple languages.

💁 The original language must be included in the target_languages; otherwise, no translations will be generated.

Update the recognized function by replacing its entire contents with the following:
```
if args.result.reason == speech.ResultReason.TranslatedSpeech:
    language_match = next(l for l in args.result.translations if server_language.lower().startswith(l.lower()))
    text = args.result.translations[language_match]
    if (len(text) > 0):
        print(f'Translated text: {text}')

        message = Message(json.dumps({ 'speech': text }))
        device_client.send_message(message)
```
This code checks whether the recognized event was triggered because speech was translated (this event can also be triggered when speech is recognized but not translated). If the speech was translated, it retrieves the translation from the args.result.translations dictionary that matches the server language.

The args.result.translations dictionary uses the language part of the locale setting as its key, not the full setting. For example, if you request a translation into fr-FR for French, the dictionary will contain an entry for fr, not fr-FR.

The translated text is then sent to the IoT Hub.
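The key matching described above can be tried without the speech service. The following is a minimal sketch, not the lesson's code: find_translation is a hypothetical helper, and the sample dictionary is made-up data standing in for args.result.translations. Unlike the bare next(...) call in the recognized function, which raises StopIteration when nothing matches, this version returns None.

```python
def find_translation(translations, locale):
    """Return the translation whose key is a prefix of the locale, or None.

    `translations` stands in for args.result.translations, which is keyed
    by the language part of the locale (for example 'fr', not 'fr-FR').
    """
    locale = locale.lower()
    for key, text in translations.items():
        if locale.startswith(key.lower()):
            return text
    return None


# Made-up sample data for illustration only.
sample_translations = {
    'fr': 'Régler une minuterie de 2 minutes et 27 secondes.',
    'en': 'Set a timer of 2 minutes and 27 seconds.',
}

print(find_translation(sample_translations, 'en-US'))
print(find_translation(sample_translations, 'de-DE'))
```

Returning None for a missing language lets the caller decide whether to skip the message or log a warning, instead of failing inside the event handler.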
Run this code to test the translations. Ensure your function app is running, and request a timer in the user language, either by speaking that language yourself or using a translation app.
```
(.venv) ➜  smart-timer python app.py
Connecting
Connected
Translated text: Set a timer of 2 minutes and 27 seconds.
```

Translate text using the translator service

The speech service cannot translate text before converting it to speech. Instead, you can use the Translator service to translate the text first. This service provides a REST API for text translation.

Task - use the translator resource to translate text

Add the Translator API key below the speech_api_key:
```
translator_api_key = '<key>'
```
Replace <key> with the API key for your Translator service resource.
Above the say function, define a translate_text function to translate text from the server language to the user language:
```
def translate_text(text):
```
Inside this function, define the URL and headers for the REST API call:
```
url = 'https://api.cognitive.microsofttranslator.com/translate?api-version=3.0'

headers = {
    'Ocp-Apim-Subscription-Key': translator_api_key,
    'Ocp-Apim-Subscription-Region': location,
    'Content-type': 'application/json'
}
```
The URL for this API is not location-specific; instead, the location is passed as a header. The API key is used directly, so unlike the speech service, there is no need to obtain an access token from the token issuer API.
Below this, define the parameters and body for the call:
```
params = {
    'from': server_language,
    'to': language
}

body = [{
    'text' : text
}]
```
The params dictionary defines the query parameters for the API call: from is the source language and to is the target language. The call translates the text from the source language into the target language.

The body contains the text to be translated. It is an array, as multiple blocks of text can be translated in a single call.
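To see what the parameters and body look like on the wire, the sketch below builds the query string and JSON body with the standard library only, without sending anything. build_request is a hypothetical helper, and the language codes en and fr are placeholder values. The requests library performs equivalent encoding itself when given params and json arguments.

```python
import json
from urllib.parse import urlencode

BASE_URL = 'https://api.cognitive.microsofttranslator.com/translate'


def build_request(texts, from_language, to_language):
    """Return the full URL and JSON body for a translate call (nothing is sent)."""
    query = urlencode({
        'api-version': '3.0',
        'from': from_language,
        'to': to_language,
    })
    url = f'{BASE_URL}?{query}'
    # One dictionary per block of text, so several blocks fit in one call.
    body = json.dumps([{'text': t} for t in texts])
    return url, body


# Placeholder inputs for illustration.
url, body = build_request(['Hello', 'Goodbye'], 'en', 'fr')
print(url)
print(body)
```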

Make the REST API call and retrieve the response:

```
response = requests.post(url, headers=headers, params=params, json=body)
```

The response is a JSON array with one item for each block of text passed in the body, so here it contains a single item. Each item has a translations array with one entry per target language.

```
[
    {
        "translations": [
            {
                "text": "Chronométrant votre minuterie de 2 minutes 27 secondes.",
                "to": "fr"
            }
        ]
    }
]
```

Return the text property from the first translation in the first item of the array:
```
return response.json()[0]['translations'][0]['text']
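The indexing in that return statement assumes the response has exactly the shape shown in the sample. As a hedged sketch, a more defensive extraction might look like the following. first_translation is a hypothetical helper, and the assumption that a failed call yields a payload without this shape (for example, a JSON object instead of an array) is mine, not the lesson's.

```python
import json

# The sample response from this lesson, parsed into Python objects.
sample_response = json.loads('''
[
    {
        "translations": [
            {
                "text": "Chronométrant votre minuterie de 2 minutes 27 secondes.",
                "to": "fr"
            }
        ]
    }
]
''')


def first_translation(payload):
    """Return the first translated text, or None if the shape is unexpected."""
    try:
        return payload[0]['translations'][0]['text']
    except (IndexError, KeyError, TypeError):
        # Covers an empty list, a missing key, or a non-list payload.
        return None


print(first_translation(sample_response))
print(first_translation({'error': {'message': 'hypothetical failure'}}))
```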
```
Update the say function to translate the text before generating the SSML:
```
print('Original:', text)
text = translate_text(text)
print('Translated:', text)
```
This code also prints both the original and translated versions of the text to the console.
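For reference, the pieces of translate_text can be assembled into one function. The sketch below is a variation on the lesson's code rather than a copy of it: it takes the key, region, and languages as arguments instead of reading module-level variables, accepts the posting function as a parameter so it can run offline, and adds a timeout and a raise_for_status call as extra precautions. FakeResponse and fake_post are hypothetical stand-ins for requests that echo the input text.

```python
def translate_text(text, post, api_key, region, from_language, to_language):
    """Translate text with the Translator REST API using the given post function."""
    url = 'https://api.cognitive.microsofttranslator.com/translate?api-version=3.0'
    headers = {
        'Ocp-Apim-Subscription-Key': api_key,
        'Ocp-Apim-Subscription-Region': region,
        'Content-type': 'application/json',
    }
    params = {'from': from_language, 'to': to_language}
    body = [{'text': text}]

    response = post(url, headers=headers, params=params, json=body, timeout=10)
    response.raise_for_status()  # Raise on HTTP errors instead of parsing them.
    return response.json()[0]['translations'][0]['text']


class FakeResponse:
    """Hypothetical stand-in for a requests response, for offline runs."""

    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass

    def json(self):
        return self._payload


def fake_post(url, headers=None, params=None, json=None, timeout=None):
    """Pretend to translate by tagging the text with the target language."""
    tagged = f"[{params['to']}] {json[0]['text']}"
    return FakeResponse([{'translations': [{'text': tagged, 'to': params['to']}]}])


result = translate_text('2 minute 27 second timer started.',
                        fake_post, '<key>', '<location>', 'en', 'fr')
print(result)
```

With the real service, passing requests.post as the post argument, along with translator_api_key, location, server_language, and language, would make the same call as the lesson's version.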
Run your code. Ensure your function app is running, and request a timer in the user language, either by speaking that language yourself or using a translation app.
```
(.venv) ➜  smart-timer python app.py
Connecting
Connected
Translated text: Set a timer of 2 minutes and 27 seconds.
Original: 2 minute 27 second timer started.
Translated: 2 minute 27 seconde minute a commencé.
Original: Times up on your 2 minute 27 second timer.
Translated: Chronométrant votre minuterie de 2 minutes 27 secondes.
```
💁 Different languages may express the same idea in slightly different ways, so the translations might not match the examples you provided to LUIS exactly. If this happens, add more examples to LUIS, retrain the model, and re-publish it.

💁 You can find this code in the code/virtual-iot-device folder.

😀 Your multilingual timer program was a success!

Disclaimer:
This document has been translated using the AI translation service Co-op Translator. While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.
