# Lesson 24 (#112)
# Build a universal translator
## Instructions
A universal translator is a device that can translate between multiple languages, allowing people who speak different languages to communicate. Use what you have learned over the past few lessons to build a universal translator using 2 IoT devices.

> If you do not have 2 devices, follow the steps in the previous few lessons to set up a virtual IoT device as one of the IoT devices.

You should configure one device for one language, and one for another. Each device should accept speech, convert it to text, send it to the other device via IoT Hub and a Functions app, then translate it and play the translated speech.

> 💁 Tip: When sending the speech from one device to another, send the language it is in as well, making it easier to translate. You could even have each device register using IoT Hub and a Functions app first, passing the language they support to be stored in Azure Storage. You could then use a Functions app to do the translations, sending the translated text to the IoT device. One possible message payload is sketched below.
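A minimal sketch of such a payload. The helper function and its parameters are illustrative, not part of the lesson code; only the `speech` key matches the messages used elsewhere in this project:

```python
import json

from azure.iot.device import IoTHubDeviceClient, Message

def send_speech(device_client: IoTHubDeviceClient, text: str, language: str) -> None:
    # 'speech' matches the messages used elsewhere in this project;
    # adding 'language' (e.g. 'fr-FR') tells the receiver what to translate from
    payload = {'speech': text, 'language': language}
    device_client.send_message(Message(json.dumps(payload)))
```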
## Rubric
| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | --------- | -------- | ----------------- |
| Create a universal translator | Was able to build a universal translator, converting speech detected by one device into speech played by another device in a different language | Was able to get some components working, such as capturing speech or translating, but was unable to build the end-to-end solution | Was unable to build any parts of a working universal translator |
New file: the completed smart timer app for the Raspberry Pi, from the [code/pi](code/pi) folder:

```python
import io
import json
import pyaudio
import requests
import time
import wave
import threading

from azure.iot.device import IoTHubDeviceClient, Message, MethodResponse

from grove.factory import Factory
button = Factory.getButton('GPIO-HIGH', 5)

audio = pyaudio.PyAudio()
microphone_card_number = 1
speaker_card_number = 1
rate = 16000

def capture_audio():
    # Record from the microphone for as long as the button is held down
    stream = audio.open(format = pyaudio.paInt16,
                        rate = rate,
                        channels = 1,
                        input_device_index = microphone_card_number,
                        input = True,
                        frames_per_buffer = 4096)

    frames = []

    while button.is_pressed():
        frames.append(stream.read(4096))

    stream.stop_stream()
    stream.close()

    # Package the raw frames as a WAV file in an in-memory buffer
    wav_buffer = io.BytesIO()
    with wave.open(wav_buffer, 'wb') as wavefile:
        wavefile.setnchannels(1)
        wavefile.setsampwidth(audio.get_sample_size(pyaudio.paInt16))
        wavefile.setframerate(rate)
        wavefile.writeframes(b''.join(frames))
        wav_buffer.seek(0)

    return wav_buffer

speech_api_key = '<key>'
translator_api_key = '<key>'
location = '<location>'
language = '<language>'
server_language = '<language>'
connection_string = '<connection_string>'

device_client = IoTHubDeviceClient.create_from_connection_string(connection_string)

print('Connecting')
device_client.connect()
print('Connected')

def get_access_token():
    # Exchange the speech API key for a short-lived access token
    headers = {
        'Ocp-Apim-Subscription-Key': speech_api_key
    }

    token_endpoint = f'https://{location}.api.cognitive.microsoft.com/sts/v1.0/issuetoken'
    response = requests.post(token_endpoint, headers=headers)
    return str(response.text)

def convert_speech_to_text(buffer):
    url = f'https://{location}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1'

    headers = {
        'Authorization': 'Bearer ' + get_access_token(),
        'Content-Type': f'audio/wav; codecs=audio/pcm; samplerate={rate}',
        'Accept': 'application/json;text/xml'
    }

    params = {
        'language': language
    }

    response = requests.post(url, headers=headers, params=params, data=buffer)
    response_json = json.loads(response.text)

    if response_json['RecognitionStatus'] == 'Success':
        return response_json['DisplayText']
    else:
        return ''

def translate_text(text, from_language, to_language):
    url = f'https://api.cognitive.microsofttranslator.com/translate?api-version=3.0'

    headers = {
        'Ocp-Apim-Subscription-Key': translator_api_key,
        'Ocp-Apim-Subscription-Region': location,
        'Content-type': 'application/json'
    }

    params = {
        'from': from_language,
        'to': to_language
    }

    body = [{
        'text' : text
    }]

    response = requests.post(url, headers=headers, params=params, json=body)
    return response.json()[0]['translations'][0]['text']

def get_voice():
    # Pick the first voice that matches the user language
    url = f'https://{location}.tts.speech.microsoft.com/cognitiveservices/voices/list'

    headers = {
        'Authorization': 'Bearer ' + get_access_token()
    }

    response = requests.get(url, headers=headers)
    voices_json = json.loads(response.text)

    first_voice = next(x for x in voices_json if x['Locale'].lower() == language.lower())
    return first_voice['ShortName']

voice = get_voice()
print(f'Using voice {voice}')

playback_format = 'riff-48khz-16bit-mono-pcm'

def get_speech(text):
    # Convert text to speech using SSML, returning the audio as a WAV buffer
    url = f'https://{location}.tts.speech.microsoft.com/cognitiveservices/v1'

    headers = {
        'Authorization': 'Bearer ' + get_access_token(),
        'Content-Type': 'application/ssml+xml',
        'X-Microsoft-OutputFormat': playback_format
    }

    ssml = f'<speak version=\'1.0\' xml:lang=\'{language}\'>'
    ssml += f'<voice xml:lang=\'{language}\' name=\'{voice}\'>'
    ssml += text
    ssml += '</voice>'
    ssml += '</speak>'

    response = requests.post(url, headers=headers, data=ssml.encode('utf-8'))
    return io.BytesIO(response.content)

def play_speech(speech):
    with wave.open(speech, 'rb') as wave_file:
        stream = audio.open(format=audio.get_format_from_width(wave_file.getsampwidth()),
                            channels=wave_file.getnchannels(),
                            rate=wave_file.getframerate(),
                            output_device_index=speaker_card_number,
                            output=True)

        data = wave_file.readframes(4096)

        while len(data) > 0:
            stream.write(data)
            data = wave_file.readframes(4096)

        stream.stop_stream()
        stream.close()

def say(text):
    # Translate from the server language to the user language, then speak it
    print('Original:', text)
    text = translate_text(text, server_language, language)
    print('Translated:', text)
    speech = get_speech(text)
    play_speech(speech)

def announce_timer(minutes, seconds):
    announcement = 'Times up on your '
    if minutes > 0:
        announcement += f'{minutes} minute '
    if seconds > 0:
        announcement += f'{seconds} second '
    announcement += 'timer.'
    say(announcement)

def create_timer(total_seconds):
    minutes, seconds = divmod(total_seconds, 60)
    threading.Timer(total_seconds, announce_timer, args=[minutes, seconds]).start()
    announcement = ''
    if minutes > 0:
        announcement += f'{minutes} minute '
    if seconds > 0:
        announcement += f'{seconds} second '
    announcement += 'timer started.'
    say(announcement)

def handle_method_request(request):
    # Direct method from the Functions app to set a timer
    payload = json.loads(request.payload)
    seconds = payload['seconds']
    if seconds > 0:
        create_timer(payload['seconds'])

    method_response = MethodResponse.create_from_method_request(request, 200)
    device_client.send_method_response(method_response)

device_client.on_method_request_received = handle_method_request

while True:
    # Wait for a button press, capture speech, convert it to text,
    # translate it to the server language, then send it to the IoT Hub
    while not button.is_pressed():
        time.sleep(.1)

    buffer = capture_audio()
    text = convert_speech_to_text(buffer)
    if len(text) > 0:
        print('Original:', text)
        text = translate_text(text, language, server_language)
        print('Translated:', text)

        message = Message(json.dumps({ 'speech': text }))
        device_client.send_message(message)
```
New file: the completed smart timer app for the virtual IoT device, from the [code/virtual-iot-device](code/virtual-iot-device) folder:

```python
import json
import requests
import threading
import time
from azure.cognitiveservices import speech
from azure.cognitiveservices.speech import SpeechConfig, SpeechRecognizer, SpeechSynthesizer
from azure.cognitiveservices.speech.translation import SpeechTranslationConfig, TranslationRecognizer
from azure.iot.device import IoTHubDeviceClient, Message, MethodResponse

speech_api_key = '<key>'
translator_api_key = '<key>'
location = '<location>'
language = '<language>'
server_language = '<language>'
connection_string = '<connection_string>'

device_client = IoTHubDeviceClient.create_from_connection_string(connection_string)

print('Connecting')
device_client.connect()
print('Connected')

# Recognize speech in the user language, translating it into both languages
translation_config = SpeechTranslationConfig(subscription=speech_api_key,
                                             region=location,
                                             speech_recognition_language=language,
                                             target_languages=(language, server_language))

recognizer = TranslationRecognizer(translation_config=translation_config)

def recognized(args):
    if args.result.reason == speech.ResultReason.TranslatedSpeech:
        # The translations dictionary is keyed by language (e.g. 'fr'), not locale (e.g. 'fr-FR')
        language_match = next(l for l in args.result.translations if server_language.lower().startswith(l.lower()))
        text = args.result.translations[language_match]

        if (len(text) > 0):
            print(f'Translated text: {text}')

            message = Message(json.dumps({ 'speech': text }))
            device_client.send_message(message)

recognizer.recognized.connect(recognized)

recognizer.start_continuous_recognition()

speech_config = SpeechTranslationConfig(subscription=speech_api_key,
                                        region=location)
speech_config.speech_synthesis_language = language
speech_synthesizer = SpeechSynthesizer(speech_config=speech_config)

# Pick the first voice that matches the user language
voices = speech_synthesizer.get_voices_async().get().voices
first_voice = next(x for x in voices if x.locale.lower() == language.lower())
speech_config.speech_synthesis_voice_name = first_voice.short_name

def translate_text(text):
    # Translate from the server language to the user language using the Translator REST API
    url = f'https://api.cognitive.microsofttranslator.com/translate?api-version=3.0'

    headers = {
        'Ocp-Apim-Subscription-Key': translator_api_key,
        'Ocp-Apim-Subscription-Region': location,
        'Content-type': 'application/json'
    }

    params = {
        'from': server_language,
        'to': language
    }

    body = [{
        'text' : text
    }]

    response = requests.post(url, headers=headers, params=params, json=body)

    return response.json()[0]['translations'][0]['text']

def say(text):
    print('Original:', text)
    text = translate_text(text)
    print('Translated:', text)

    ssml = f'<speak version=\'1.0\' xml:lang=\'{language}\'>'
    ssml += f'<voice xml:lang=\'{language}\' name=\'{first_voice.short_name}\'>'
    ssml += text
    ssml += '</voice>'
    ssml += '</speak>'

    # Pause recognition while speaking so the response isn't picked up by the microphone
    recognizer.stop_continuous_recognition()
    speech_synthesizer.speak_ssml(ssml)
    recognizer.start_continuous_recognition()

def announce_timer(minutes, seconds):
    announcement = 'Times up on your '
    if minutes > 0:
        announcement += f'{minutes} minute '
    if seconds > 0:
        announcement += f'{seconds} second '
    announcement += 'timer.'
    say(announcement)

def create_timer(total_seconds):
    minutes, seconds = divmod(total_seconds, 60)
    threading.Timer(total_seconds, announce_timer, args=[minutes, seconds]).start()
    announcement = ''
    if minutes > 0:
        announcement += f'{minutes} minute '
    if seconds > 0:
        announcement += f'{seconds} second '
    announcement += 'timer started.'
    say(announcement)

def handle_method_request(request):
    if request.name == 'set-timer':
        payload = json.loads(request.payload)
        seconds = payload['seconds']
        if seconds > 0:
            create_timer(payload['seconds'])

    method_response = MethodResponse.create_from_method_request(request, 200)
    device_client.send_method_response(method_response)

device_client.on_method_request_received = handle_method_request

while True:
    time.sleep(1)
```
# Translate speech - Raspberry Pi

In this part of the lesson, you will write code to translate text using the translator service.

## Translate text using the translator service

The speech service REST API doesn't support direct translations; instead, you can use the Translator service to translate both the text generated by the speech-to-text service and the text of the spoken response. This service has a REST API you can use to translate the text.
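Putting the pieces together, the overall flow looks something like this. This is just an orienting sketch using functions you will build, or have already built, in this project, not code to add:

```python
# Sketch only - the real calls are wired up step by step below
def process_command(buffer):
    text = convert_speech_to_text(buffer)                   # recognize in the user language
    text = translate_text(text, language, server_language)  # translate for LUIS
    # ... the translated text is then sent to the IoT Hub ...

def speak_response(text):
    text = translate_text(text, server_language, language)  # translate the response back
    play_speech(get_speech(text))                           # speak in the user language
```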
### Task - use the translator resource to translate text

1. Your smart timer will have 2 languages set - the language of the server that was used to train LUIS, and the language spoken by the user. Update the `language` variable to be the language that will be spoken by the user, and add a new variable called `server_language` for the language used to train LUIS:

    ```python
    language = '<user language>'
    server_language = '<server language>'
    ```

    Replace `<user language>` with the locale name for the language you will be speaking in, for example `fr-FR` for French, or `zh-HK` for Cantonese.

    Replace `<server language>` with the locale name for the language used to train LUIS.

    You can find a list of the supported languages and their locale names in the [Language and voice support documentation on Microsoft docs](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support?WT.mc_id=academic-17441-jabenn#speech-to-text).

    > 💁 If you don't speak multiple languages you can use a service like [Bing Translate](https://www.bing.com/translator) or [Google Translate](https://translate.google.com) to translate from your preferred language to a language of your choice. These services can then play audio of the translated text.
    >
    > For example, if you train LUIS in English, but want to use French as the user language, you can translate sentences like "set a 2 minute and 27 second timer" from English into French using Bing Translate, then use the **Listen translation** button to speak the translation into your microphone.
    >
    > ![The listen translation button on Bing translate](../../../images/bing-translate.png)

1. Add the translator API key below the `speech_api_key`:

    ```python
    translator_api_key = '<key>'
    ```

    Replace `<key>` with the API key for your translator service resource.

1. Above the `say` function, define a `translate_text` function that will translate text from the server language to the user language:

    ```python
    def translate_text(text, from_language, to_language):
    ```

    The from and to languages are passed to this function - your app needs to convert from the user language to the server language when recognizing speech, and from the server language to the user language when providing spoken feedback.
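    For example, once the function is complete in the following steps, the two directions will look like this (the example strings are hypothetical):

    ```python
    command = translate_text('Définir une minuterie de 2 minutes', language, server_language)  # user -> server
    feedback = translate_text('2 minute timer started.', server_language, language)            # server -> user
    ```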
1. Inside this function, define the URL and headers for the REST API call:

    ```python
    url = f'https://api.cognitive.microsofttranslator.com/translate?api-version=3.0'

    headers = {
        'Ocp-Apim-Subscription-Key': translator_api_key,
        'Ocp-Apim-Subscription-Region': location,
        'Content-type': 'application/json'
    }
    ```

    The URL for this API is not location specific; instead, the location is passed in as a header. The API key is used directly, so unlike the speech service there is no need to get an access token from the token issuer API.

1. Below this, define the parameters and body for the call:

    ```python
    params = {
        'from': from_language,
        'to': to_language
    }

    body = [{
        'text' : text
    }]
    ```

    The `params` defines the parameters to pass to the API call, passing the from and to languages. This call will translate text in the `from` language into the `to` language.

    The `body` contains the text to translate. This is an array, as multiple blocks of text can be translated in the same call.
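    For example, a hypothetical body with two blocks of text would send a two-item array, and the response would contain one result per block, in the same order:

    ```python
    body = [{ 'text': 'Set a timer' }, { 'text': 'Cancel the timer' }]
    ```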
1. Make the call to the REST API, and get the response:

    ```python
    response = requests.post(url, headers=headers, params=params, json=body)
    ```

    The response that comes back is a JSON array, with one item that contains the translations. This item has an array of translations for all the items passed in the body.

    ```json
    [
        {
            "translations": [
                {
                    "text": "Set a 2 minute 27 second timer.",
                    "to": "en"
                }
            ]
        }
    ]
    ```

1. Return the `text` property from the first translation from the first item in the array:

    ```python
    return response.json()[0]['translations'][0]['text']
    ```

1. Update the `while True` loop to translate the text from the call to `convert_speech_to_text` from the user language to the server language:

    ```python
    if len(text) > 0:
        print('Original:', text)
        text = translate_text(text, language, server_language)
        print('Translated:', text)

        message = Message(json.dumps({ 'speech': text }))
        device_client.send_message(message)
    ```

    This code also prints the original and translated versions of the text to the console.

1. Update the `say` function to translate the text to say from the server language to the user language:

    ```python
    def say(text):
        print('Original:', text)
        text = translate_text(text, server_language, language)
        print('Translated:', text)
        speech = get_speech(text)
        play_speech(speech)
    ```

    This code also prints the original and translated versions of the text to the console.

1. Run your code. Ensure your function app is running, and request a timer in the user language, either by speaking that language yourself, or by using a translation app.

    ```output
    pi@raspberrypi:~/smart-timer $ python3 app.py
    Connecting
    Connected
    Using voice fr-FR-DeniseNeural
    Original: Définir une minuterie de 2 minutes et 27 secondes.
    Translated: Set a timer of 2 minutes and 27 seconds.
    Original: 2 minute 27 second timer started.
    Translated: 2 minute 27 seconde minute a commencé.
    Original: Times up on your 2 minute 27 second timer.
    Translated: Chronométrant votre minuterie de 2 minutes 27 secondes.
    ```

> 💁 Due to the different ways of saying something in different languages, you may get translations that are slightly different to the examples you gave LUIS. If this is the case, add more examples to LUIS, retrain, then re-publish the model.

> 💁 You can find this code in the [code/pi](code/pi) folder.

😀 Your multi-lingual timer program was a success!
# Translate speech - Virtual IoT Device

In this part of the lesson, you will write code to translate speech when converting it to text using the speech service, then translate the text using the Translator service before generating a spoken response.

## Use the speech service to translate speech

The speech service can take speech and not only convert it to text in the same language, but also translate the output into other languages.

### Task - use the speech service to translate speech

1. Open the `smart-timer` project in VS Code, and ensure the virtual environment is loaded in the terminal.

1. Add the following import statements below the existing imports:

    ```python
    from azure.cognitiveservices import speech
    from azure.cognitiveservices.speech.translation import SpeechTranslationConfig, TranslationRecognizer
    import requests
    ```

    This imports classes used to translate speech, and the `requests` library, which will be used to make a call to the Translator service later in this lesson.

1. Your smart timer will have 2 languages set - the language of the server that was used to train LUIS, and the language spoken by the user. Update the `language` variable to be the language that will be spoken by the user, and add a new variable called `server_language` for the language used to train LUIS:

    ```python
    language = '<user language>'
    server_language = '<server language>'
    ```

    Replace `<user language>` with the locale name for the language you will be speaking in, for example `fr-FR` for French, or `zh-HK` for Cantonese.

    Replace `<server language>` with the locale name for the language used to train LUIS.

    You can find a list of the supported languages and their locale names in the [Language and voice support documentation on Microsoft docs](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support?WT.mc_id=academic-17441-jabenn#speech-to-text).

    > 💁 If you don't speak multiple languages you can use a service like [Bing Translate](https://www.bing.com/translator) or [Google Translate](https://translate.google.com) to translate from your preferred language to a language of your choice. These services can then play audio of the translated text. Be aware that the speech recognizer will ignore some audio output from your device, so you may need to use an additional device to play the translated text.
    >
    > For example, if you train LUIS in English, but want to use French as the user language, you can translate sentences like "set a 2 minute and 27 second timer" from English into French using Bing Translate, then use the **Listen translation** button to speak the translation into your microphone.
    >
    > ![The listen translation button on Bing translate](../../../images/bing-translate.png)

1. Replace the `recognizer_config` and `recognizer` declarations with the following:

    ```python
    translation_config = SpeechTranslationConfig(subscription=speech_api_key,
                                                 region=location,
                                                 speech_recognition_language=language,
                                                 target_languages=(language, server_language))

    recognizer = TranslationRecognizer(translation_config=translation_config)
    ```

    This creates a translation config to recognize speech in the user language, and create translations in the user and server languages. It then uses this config to create a translation recognizer - a speech recognizer that can translate the output of the speech recognition into multiple languages.

    > 💁 The original language needs to be specified in the `target_languages`, otherwise you won't get any translations.
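    For example, with a French-speaking user and LUIS trained in English, the config would look like this (these locales are illustrative, use your own):

    ```python
    translation_config = SpeechTranslationConfig(subscription=speech_api_key,
                                                 region=location,
                                                 speech_recognition_language='fr-FR',
                                                 target_languages=('fr-FR', 'en-US'))
    ```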
1. Update the `recognized` function, replacing the entire contents of the function with the following:

    ```python
    if args.result.reason == speech.ResultReason.TranslatedSpeech:
        language_match = next(l for l in args.result.translations if server_language.lower().startswith(l.lower()))
        text = args.result.translations[language_match]
        if (len(text) > 0):
            print(f'Translated text: {text}')

            message = Message(json.dumps({ 'speech': text }))
            device_client.send_message(message)
    ```

    This code checks to see if the recognized event was fired because speech was translated (this event can fire at other times, such as when the speech is recognized but not translated). If the speech was translated, it finds the translation in the `args.result.translations` dictionary that matches the server language.

    The `args.result.translations` dictionary is keyed off the language part of the locale setting, not the whole setting. For example, if you request a translation into `fr-FR` for French, the dictionary will contain an entry for `fr`, not `fr-FR`.

    The translated text is then sent to the IoT Hub.
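    The snippet below illustrates that key matching with assumed values:

    ```python
    server_language = 'en-US'
    # A hypothetical translations dictionary from a recognition result
    translations = {'fr': 'Définir une minuterie de 2 minutes', 'en': 'Set a 2 minute timer'}
    # 'en-us'.startswith('en') is True, so this picks the 'en' entry
    language_match = next(l for l in translations if server_language.lower().startswith(l.lower()))
    print(translations[language_match])  # Set a 2 minute timer
    ```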
1. Run this code to test the translations. Ensure your function app is running, and request a timer in the user language, either by speaking that language yourself, or by using a translation app.

    ```output
    (.venv) ➜  smart-timer python app.py
    Connecting
    Connected
    Translated text: Set a timer of 2 minutes and 27 seconds.
    ```

## Translate text using the translator service

The speech service doesn't support translating text before converting it to speech; instead, you can use the Translator service to translate the text. This service has a REST API you can use to translate the text.

### Task - use the translator resource to translate text

1. Add the translator API key below the `speech_api_key`:

    ```python
    translator_api_key = '<key>'
    ```

    Replace `<key>` with the API key for your translator service resource.

1. Above the `say` function, define a `translate_text` function that will translate text from the server language to the user language:

    ```python
    def translate_text(text):
    ```

1. Inside this function, define the URL and headers for the REST API call:

    ```python
    url = f'https://api.cognitive.microsofttranslator.com/translate?api-version=3.0'

    headers = {
        'Ocp-Apim-Subscription-Key': translator_api_key,
        'Ocp-Apim-Subscription-Region': location,
        'Content-type': 'application/json'
    }
    ```

    The URL for this API is not location specific; instead, the location is passed in as a header. The API key is used directly, so unlike the speech service there is no need to get an access token from the token issuer API.

1. Below this, define the parameters and body for the call:

    ```python
    params = {
        'from': server_language,
        'to': language
    }

    body = [{
        'text' : text
    }]
    ```

    The `params` defines the parameters to pass to the API call, passing the from and to languages. This call will translate text in the `from` language into the `to` language.

    The `body` contains the text to translate. This is an array, as multiple blocks of text can be translated in the same call.

1. Make the call to the REST API, and get the response:

    ```python
    response = requests.post(url, headers=headers, params=params, json=body)
    ```

    The response that comes back is a JSON array, with one item that contains the translations. This item has an array of translations for all the items passed in the body.

    ```json
    [
        {
            "translations": [
                {
                    "text": "Chronométrant votre minuterie de 2 minutes 27 secondes.",
                    "to": "fr"
                }
            ]
        }
    ]
    ```

1. Return the `text` property from the first translation from the first item in the array:

    ```python
    return response.json()[0]['translations'][0]['text']
    ```

1. Update the `say` function to translate the text to say before the SSML is generated:

    ```python
    print('Original:', text)
    text = translate_text(text)
    print('Translated:', text)
    ```

    This code also prints the original and translated versions of the text to the console.

1. Run your code. Ensure your function app is running, and request a timer in the user language, either by speaking that language yourself, or by using a translation app.

    ```output
    (.venv) ➜  smart-timer python app.py
    Connecting
    Connected
    Translated text: Set a timer of 2 minutes and 27 seconds.
    Original: 2 minute 27 second timer started.
    Translated: 2 minute 27 seconde minute a commencé.
    Original: Times up on your 2 minute 27 second timer.
    Translated: Chronométrant votre minuterie de 2 minutes 27 secondes.
    ```

> 💁 Due to the different ways of saying something in different languages, you may get translations that are slightly different to the examples you gave LUIS. If this is the case, add more examples to LUIS, retrain, then re-publish the model.

> 💁 You can find this code in the [code/virtual-iot-device](code/virtual-iot-device) folder.

😀 Your multi-lingual timer program was a success!
# Translate speech - Wio Terminal
Coming soon!