You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
IoT-For-Beginners/translations/en/6-consumer/lessons/1-speech-recognition/wio-terminal-speech-to-text.md

542 lines
22 KiB

<!--
CO_OP_TRANSLATOR_METADATA:
{
"original_hash": "3f92edf2975175577174910caca4a389",
"translation_date": "2025-08-28T19:29:31+00:00",
"source_file": "6-consumer/lessons/1-speech-recognition/wio-terminal-speech-to-text.md",
"language_code": "en"
}
-->
# Speech to Text - Wio Terminal
In this part of the lesson, you will write code to convert speech from the captured audio into text using the speech service.
## Sending Audio to the Speech Service
Audio can be sent to the speech service using the REST API. To use the speech service, you first need to request an access token, which is then used to access the REST API. These access tokens expire after 10 minutes, so your code should regularly request new tokens to ensure they remain valid.
### Task - Obtain an Access Token
1. Open the `smart-timer` project if it is not already open.
1. Add the following library dependencies to the `platformio.ini` file to enable WiFi access and handle JSON:
```ini
seeed-studio/Seeed Arduino rpcWiFi @ 1.0.5
seeed-studio/Seeed Arduino rpcUnified @ 2.1.3
seeed-studio/Seeed_Arduino_mbedtls @ 3.0.1
seeed-studio/Seeed Arduino RTC @ 2.0.0
bblanchon/ArduinoJson @ 6.17.3
```
1. Add the following code to the `config.h` header file:
```cpp
const char *SSID = "<SSID>";
const char *PASSWORD = "<PASSWORD>";
const char *SPEECH_API_KEY = "<API_KEY>";
const char *SPEECH_LOCATION = "<LOCATION>";
const char *LANGUAGE = "<LANGUAGE>";
const char *TOKEN_URL = "https://%s.api.cognitive.microsoft.com/sts/v1.0/issuetoken";
```
Replace `<SSID>` and `<PASSWORD>` with your WiFi credentials.
Replace `<API_KEY>` with the API key for your speech service resource. Replace `<LOCATION>` with the location you used when creating the speech service resource.
Replace `<LANGUAGE>` with the locale name of the language you will be speaking, for example, `en-GB` for English or `zh-HK` for Cantonese. You can find a list of supported languages and their locale names in the [Language and Voice Support documentation on Microsoft Docs](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support?WT.mc_id=academic-17441-jabenn#speech-to-text).
The `TOKEN_URL` constant is the URL of the token issuer without the location. This will later be combined with the location to form the full URL.
1. Similar to connecting to Custom Vision, you will need to use an HTTPS connection to connect to the token issuing service. Add the following code to the end of `config.h`:
```cpp
const char *TOKEN_CERTIFICATE =
"-----BEGIN CERTIFICATE-----\r\n"
"MIIF8zCCBNugAwIBAgIQAueRcfuAIek/4tmDg0xQwDANBgkqhkiG9w0BAQwFADBh\r\n"
"MQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGlnaUNlcnQgSW5jMRkwFwYDVQQLExB3\r\n"
"d3cuZGlnaWNlcnQuY29tMSAwHgYDVQQDExdEaWdpQ2VydCBHbG9iYWwgUm9vdCBH\r\n"
"MjAeFw0yMDA3MjkxMjMwMDBaFw0yNDA2MjcyMzU5NTlaMFkxCzAJBgNVBAYTAlVT\r\n"
"MR4wHAYDVQQKExVNaWNyb3NvZnQgQ29ycG9yYXRpb24xKjAoBgNVBAMTIU1pY3Jv\r\n"
"c29mdCBBenVyZSBUTFMgSXNzdWluZyBDQSAwNjCCAiIwDQYJKoZIhvcNAQEBBQAD\r\n"
"ggIPADCCAgoCggIBALVGARl56bx3KBUSGuPc4H5uoNFkFH4e7pvTCxRi4j/+z+Xb\r\n"
"wjEz+5CipDOqjx9/jWjskL5dk7PaQkzItidsAAnDCW1leZBOIi68Lff1bjTeZgMY\r\n"
"iwdRd3Y39b/lcGpiuP2d23W95YHkMMT8IlWosYIX0f4kYb62rphyfnAjYb/4Od99\r\n"
"ThnhlAxGtfvSbXcBVIKCYfZgqRvV+5lReUnd1aNjRYVzPOoifgSx2fRyy1+pO1Uz\r\n"
"aMMNnIOE71bVYW0A1hr19w7kOb0KkJXoALTDDj1ukUEDqQuBfBxReL5mXiu1O7WG\r\n"
"0vltg0VZ/SZzctBsdBlx1BkmWYBW261KZgBivrql5ELTKKd8qgtHcLQA5fl6JB0Q\r\n"
"gs5XDaWehN86Gps5JW8ArjGtjcWAIP+X8CQaWfaCnuRm6Bk/03PQWhgdi84qwA0s\r\n"
"sRfFJwHUPTNSnE8EiGVk2frt0u8PG1pwSQsFuNJfcYIHEv1vOzP7uEOuDydsmCjh\r\n"
"lxuoK2n5/2aVR3BMTu+p4+gl8alXoBycyLmj3J/PUgqD8SL5fTCUegGsdia/Sa60\r\n"
"N2oV7vQ17wjMN+LXa2rjj/b4ZlZgXVojDmAjDwIRdDUujQu0RVsJqFLMzSIHpp2C\r\n"
"Zp7mIoLrySay2YYBu7SiNwL95X6He2kS8eefBBHjzwW/9FxGqry57i71c2cDAgMB\r\n"
"AAGjggGtMIIBqTAdBgNVHQ4EFgQU1cFnOsKjnfR3UltZEjgp5lVou6UwHwYDVR0j\r\n"
"BBgwFoAUTiJUIBiV5uNu5g/6+rkS7QYXjzkwDgYDVR0PAQH/BAQDAgGGMB0GA1Ud\r\n"
"JQQWMBQGCCsGAQUFBwMBBggrBgEFBQcDAjASBgNVHRMBAf8ECDAGAQH/AgEAMHYG\r\n"
"CCsGAQUFBwEBBGowaDAkBggrBgEFBQcwAYYYaHR0cDovL29jc3AuZGlnaWNlcnQu\r\n"
"Y29tMEAGCCsGAQUFBzAChjRodHRwOi8vY2FjZXJ0cy5kaWdpY2VydC5jb20vRGln\r\n"
"aUNlcnRHbG9iYWxSb290RzIuY3J0MHsGA1UdHwR0MHIwN6A1oDOGMWh0dHA6Ly9j\r\n"
"cmwzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydEdsb2JhbFJvb3RHMi5jcmwwN6A1oDOG\r\n"
"MWh0dHA6Ly9jcmw0LmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydEdsb2JhbFJvb3RHMi5j\r\n"
"cmwwHQYDVR0gBBYwFDAIBgZngQwBAgEwCAYGZ4EMAQICMBAGCSsGAQQBgjcVAQQD\r\n"
"AgEAMA0GCSqGSIb3DQEBDAUAA4IBAQB2oWc93fB8esci/8esixj++N22meiGDjgF\r\n"
"+rA2LUK5IOQOgcUSTGKSqF9lYfAxPjrqPjDCUPHCURv+26ad5P/BYtXtbmtxJWu+\r\n"
"cS5BhMDPPeG3oPZwXRHBJFAkY4O4AF7RIAAUW6EzDflUoDHKv83zOiPfYGcpHc9s\r\n"
"kxAInCedk7QSgXvMARjjOqdakor21DTmNIUotxo8kHv5hwRlGhBJwps6fEVi1Bt0\r\n"
"trpM/3wYxlr473WSPUFZPgP1j519kLpWOJ8z09wxay+Br29irPcBYv0GMXlHqThy\r\n"
"8y4m/HyTQeI2IMvMrQnwqPpY+rLIXyviI2vLoI+4xKE4Rn38ZZ8m\r\n"
"-----END CERTIFICATE-----\r\n";
```
This is the same certificate you used when connecting to Custom Vision.
1. Add an include directive for the WiFi header file and the config header file at the top of the `main.cpp` file:
```cpp
#include <rpcWiFi.h>
#include "config.h"
```
1. Add code to connect to WiFi in `main.cpp` above the `setup` function:
```cpp
void connectWiFi()
{
while (WiFi.status() != WL_CONNECTED)
{
Serial.println("Connecting to WiFi..");
WiFi.begin(SSID, PASSWORD);
delay(500);
}
Serial.println("Connected!");
}
```
1. Call this function from the `setup` function after the serial connection has been established:
```cpp
connectWiFi();
```
1. Create a new header file in the `src` folder called `speech_to_text.h`. Add the following code to this header file:
```cpp
#pragma once
#include <Arduino.h>
#include <ArduinoJson.h>
#include <HTTPClient.h>
#include <WiFiClientSecure.h>
#include "config.h"
#include "mic.h"
class SpeechToText
{
public:
private:
};
SpeechToText speechToText;
```
This includes necessary header files for an HTTP connection, configuration, and the `mic.h` header file. It also defines a class called `SpeechToText` and declares an instance of this class for later use.
1. Add the following two fields to the `private` section of this class:
```cpp
WiFiClientSecure _token_client;
String _access_token;
```
The `_token_client` is a WiFi Client that uses HTTPS and will be used to get the access token. The token will then be stored in `_access_token`.
1. Add the following method to the `private` section:
```cpp
String getAccessToken()
{
char url[128];
sprintf(url, TOKEN_URL, SPEECH_LOCATION);
HTTPClient httpClient;
httpClient.begin(_token_client, url);
httpClient.addHeader("Ocp-Apim-Subscription-Key", SPEECH_API_KEY);
int httpResultCode = httpClient.POST("{}");
if (httpResultCode != 200)
{
Serial.println("Error getting access token, trying again...");
delay(10000);
return getAccessToken();
}
Serial.println("Got access token.");
String result = httpClient.getString();
httpClient.end();
return result;
}
```
This code constructs the URL for the token issuer API using the location of the speech resource. It then creates an `HTTPClient` to make the web request, setting it up to use the WiFi client configured with the token endpoint's certificate. The API key is set as a header for the call. A POST request is made to retrieve the token, retrying in case of errors. Finally, the access token is returned.
1. Add a method to the `public` section to retrieve the access token. This will be needed in later lessons to convert text to speech:
```cpp
String AccessToken()
{
return _access_token;
}
```
1. Add an `init` method to the `public` section to set up the token client:
```cpp
void init()
{
_token_client.setCACert(TOKEN_CERTIFICATE);
_access_token = getAccessToken();
}
```
This sets the certificate on the WiFi client and retrieves the access token.
1. In `main.cpp`, include this new header file in the include directives:
```cpp
#include "speech_to_text.h"
```
1. Initialize the `SpeechToText` class at the end of the `setup` function, after the `mic.init` call but before writing `Ready` to the serial monitor:
```cpp
speechToText.init();
```
### Task - Read Audio from Flash Memory
1. In an earlier part of this lesson, audio was recorded to flash memory. This audio needs to be sent to the Speech Services REST API, so it must be read from flash memory. It cannot be loaded into an in-memory buffer as it would be too large. The `HTTPClient` class, which makes REST calls, can stream data using an Arduino Stream—a class that loads data in small chunks, sending them one at a time as part of the request. Each time you call `read` on a stream, it returns the next block of data. An Arduino stream can be created to read from flash memory. Create a new file called `flash_stream.h` in the `src` folder, and add the following code:
```cpp
#pragma once
#include <Arduino.h>
#include <HTTPClient.h>
#include <sfud.h>
#include "config.h"
class FlashStream : public Stream
{
public:
virtual size_t write(uint8_t val)
{
}
virtual int available()
{
}
virtual int read()
{
}
virtual int peek()
{
}
private:
};
```
This declares the `FlashStream` class, which derives from the Arduino `Stream` class. This is an abstract class, meaning derived classes must implement certain methods before the class can be instantiated. These methods are defined in this class.
✅ Learn more about Arduino Streams in the [Arduino Stream documentation](https://www.arduino.cc/reference/en/language/functions/communication/stream/)
1. Add the following fields to the `private` section:
```cpp
size_t _pos;
size_t _flash_address;
const sfud_flash *_flash;
byte _buffer[HTTP_TCP_BUFFER_SIZE];
```
This defines a temporary buffer to store data read from flash memory, along with fields to store the current position when reading from the buffer, the current address to read from flash memory, and the flash memory device.
1. Add the following method to the `private` section:
```cpp
void populateBuffer()
{
sfud_read(_flash, _flash_address, HTTP_TCP_BUFFER_SIZE, _buffer);
_flash_address += HTTP_TCP_BUFFER_SIZE;
_pos = 0;
}
```
This code reads from flash memory at the current address and stores the data in a buffer. It then increments the address so the next call reads the next block of memory. The buffer is sized based on the largest chunk the `HTTPClient` will send to the REST API at one time.
> 💁 Flash memory must be erased using the grain size, but reading can be done without this restriction.
1. Add a constructor to the `public` section of this class:
```cpp
FlashStream()
{
_pos = 0;
_flash_address = 0;
_flash = sfud_get_device_table() + 0;
populateBuffer();
}
```
This constructor initializes all fields to start reading from the beginning of the flash memory block and loads the first chunk of data into the buffer.
1. Implement the `write` method. Since this stream will only read data, this method does nothing and returns 0:
```cpp
virtual size_t write(uint8_t val)
{
return 0;
}
```
1. Implement the `peek` method. This returns the data at the current position without advancing the stream. Calling `peek` multiple times will always return the same data as long as no data is read from the stream:
```cpp
virtual int peek()
{
return _buffer[_pos];
}
```
1. Implement the `available` function. This returns how many bytes can be read from the stream, or -1 if the stream is complete. For this class, the maximum available will not exceed the HTTPClient's chunk size. When this stream is used in the HTTP client, it calls this function to determine how much data is available, then requests that amount to send to the REST API. If more than the chunk size is available, the chunk size is returned. If less, the available amount is returned. Once all data has been streamed, -1 is returned:
```cpp
virtual int available()
{
int remaining = BUFFER_SIZE - ((_flash_address - HTTP_TCP_BUFFER_SIZE) + _pos);
int bytes_available = min(HTTP_TCP_BUFFER_SIZE, remaining);
if (bytes_available == 0)
{
bytes_available = -1;
}
return bytes_available;
}
```
1. Implement the `read` method to return the next byte from the buffer, incrementing the position. If the position exceeds the buffer size, the buffer is populated with the next block from flash memory, and the position is reset:
```cpp
virtual int read()
{
int retVal = _buffer[_pos++];
if (_pos == HTTP_TCP_BUFFER_SIZE)
{
populateBuffer();
}
return retVal;
}
```
1. In the `speech_to_text.h` header file, add an include directive for this new header file:
```cpp
#include "flash_stream.h"
```
### Task - Convert Speech to Text
1. Speech can be converted to text by sending the audio to the Speech Service via a REST API. This REST API uses a different certificate than the token issuer, so add the following code to the `config.h` header file to define this certificate:
```cpp
const char *SPEECH_CERTIFICATE =
"-----BEGIN CERTIFICATE-----\r\n"
"MIIF8zCCBNugAwIBAgIQCq+mxcpjxFFB6jvh98dTFzANBgkqhkiG9w0BAQwFADBh\r\n"
"MQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGlnaUNlcnQgSW5jMRkwFwYDVQQLExB3\r\n"
"d3cuZGlnaWNlcnQuY29tMSAwHgYDVQQDExdEaWdpQ2VydCBHbG9iYWwgUm9vdCBH\r\n"
"MjAeFw0yMDA3MjkxMjMwMDBaFw0yNDA2MjcyMzU5NTlaMFkxCzAJBgNVBAYTAlVT\r\n"
"MR4wHAYDVQQKExVNaWNyb3NvZnQgQ29ycG9yYXRpb24xKjAoBgNVBAMTIU1pY3Jv\r\n"
"c29mdCBBenVyZSBUTFMgSXNzdWluZyBDQSAwMTCCAiIwDQYJKoZIhvcNAQEBBQAD\r\n"
"ggIPADCCAgoCggIBAMedcDrkXufP7pxVm1FHLDNA9IjwHaMoaY8arqqZ4Gff4xyr\r\n"
"RygnavXL7g12MPAx8Q6Dd9hfBzrfWxkF0Br2wIvlvkzW01naNVSkHp+OS3hL3W6n\r\n"
"l/jYvZnVeJXjtsKYcXIf/6WtspcF5awlQ9LZJcjwaH7KoZuK+THpXCMtzD8XNVdm\r\n"
"GW/JI0C/7U/E7evXn9XDio8SYkGSM63aLO5BtLCv092+1d4GGBSQYolRq+7Pd1kR\r\n"
"EkWBPm0ywZ2Vb8GIS5DLrjelEkBnKCyy3B0yQud9dpVsiUeE7F5sY8Me96WVxQcb\r\n"
"OyYdEY/j/9UpDlOG+vA+YgOvBhkKEjiqygVpP8EZoMMijephzg43b5Qi9r5UrvYo\r\n"
"o19oR/8pf4HJNDPF0/FJwFVMW8PmCBLGstin3NE1+NeWTkGt0TzpHjgKyfaDP2tO\r\n"
"4bCk1G7pP2kDFT7SYfc8xbgCkFQ2UCEXsaH/f5YmpLn4YPiNFCeeIida7xnfTvc4\r\n"
"7IxyVccHHq1FzGygOqemrxEETKh8hvDR6eBdrBwmCHVgZrnAqnn93JtGyPLi6+cj\r\n"
"WGVGtMZHwzVvX1HvSFG771sskcEjJxiQNQDQRWHEh3NxvNb7kFlAXnVdRkkvhjpR\r\n"
"GchFhTAzqmwltdWhWDEyCMKC2x/mSZvZtlZGY+g37Y72qHzidwtyW7rBetZJAgMB\r\n"
"AAGjggGtMIIBqTAdBgNVHQ4EFgQUDyBd16FXlduSzyvQx8J3BM5ygHYwHwYDVR0j\r\n"
"BBgwFoAUTiJUIBiV5uNu5g/6+rkS7QYXjzkwDgYDVR0PAQH/BAQDAgGGMB0GA1Ud\r\n"
"JQQWMBQGCCsGAQUFBwMBBggrBgEFBQcDAjASBgNVHRMBAf8ECDAGAQH/AgEAMHYG\r\n"
"CCsGAQUFBwEBBGowaDAkBggrBgEFBQcwAYYYaHR0cDovL29jc3AuZGlnaWNlcnQu\r\n"
"Y29tMEAGCCsGAQUFBzAChjRodHRwOi8vY2FjZXJ0cy5kaWdpY2VydC5jb20vRGln\r\n"
"aUNlcnRHbG9iYWxSb290RzIuY3J0MHsGA1UdHwR0MHIwN6A1oDOGMWh0dHA6Ly9j\r\n"
"cmwzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydEdsb2JhbFJvb3RHMi5jcmwwN6A1oDOG\r\n"
"MWh0dHA6Ly9jcmw0LmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydEdsb2JhbFJvb3RHMi5j\r\n"
"cmwwHQYDVR0gBBYwFDAIBgZngQwBAgEwCAYGZ4EMAQICMBAGCSsGAQQBgjcVAQQD\r\n"
"AgEAMA0GCSqGSIb3DQEBDAUAA4IBAQAlFvNh7QgXVLAZSsNR2XRmIn9iS8OHFCBA\r\n"
"WxKJoi8YYQafpMTkMqeuzoL3HWb1pYEipsDkhiMnrpfeYZEA7Lz7yqEEtfgHcEBs\r\n"
"K9KcStQGGZRfmWU07hPXHnFz+5gTXqzCE2PBMlRgVUYJiA25mJPXfB00gDvGhtYa\r\n"
"+mENwM9Bq1B9YYLyLjRtUz8cyGsdyTIG/bBM/Q9jcV8JGqMU/UjAdh1pFyTnnHEl\r\n"
"Y59Npi7F87ZqYYJEHJM2LGD+le8VsHjgeWX2CJQko7klXvcizuZvUEDTjHaQcs2J\r\n"
"+kPgfyMIOY1DMJ21NxOJ2xPRC/wAh/hzSBRVtoAnyuxtkZ4VjIOh\r\n"
"-----END CERTIFICATE-----\r\n";
```
1. Add a constant to this file for the speech URL without the location. This will later be combined with the location and language to form the full URL:
```cpp
const char *SPEECH_URL = "https://%s.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=%s";
```
1. In the `speech_to_text.h` header file, add a field to the `private` section of the `SpeechToText` class for a WiFi Client using the speech certificate:
```cpp
WiFiClientSecure _speech_client;
```
1. In the `init` method, set the certificate on this WiFi Client:
```cpp
_speech_client.setCACert(SPEECH_CERTIFICATE);
```
1. Add the following code to the `public` section of the `SpeechToText` class to define a method for converting speech to text:
```cpp
String convertSpeechToText()
{
}
```
1. Add the following code to this method to create an HTTP client using the WiFi client configured with the speech certificate, and using the speech URL set with the location and language:
```cpp
char url[128];
sprintf(url, SPEECH_URL, SPEECH_LOCATION, LANGUAGE);
HTTPClient httpClient;
httpClient.begin(_speech_client, url);
```
1. Set the necessary headers for the connection:
```cpp
httpClient.addHeader("Authorization", String("Bearer ") + _access_token);
httpClient.addHeader("Content-Type", String("audio/wav; codecs=audio/pcm; samplerate=") + String(RATE));
httpClient.addHeader("Accept", "application/json;text/xml");
```
This sets headers for authorization using the access token, the audio format using the sample rate, and specifies that the client expects the result in JSON format.
1. Add the following code to make the REST API call:
```cpp
Serial.println("Sending speech...");
FlashStream stream;
int httpResponseCode = httpClient.sendRequest("POST", &stream, BUFFER_SIZE);
Serial.println("Speech sent!");
```
This creates a `FlashStream` and uses it to stream data to the REST API.
1. Add the following code to check the response code:
```cpp
String text = "";
if (httpResponseCode == 200)
{
String result = httpClient.getString();
Serial.println(result);
DynamicJsonDocument doc(1024);
deserializeJson(doc, result.c_str());
JsonObject obj = doc.as<JsonObject>();
text = obj["DisplayText"].as<String>();
}
else if (httpResponseCode == 401)
{
Serial.println("Access token expired, trying again with a new token");
_access_token = getAccessToken();
return convertSpeechToText();
}
else
{
Serial.print("Failed to convert text to speech - error ");
Serial.println(httpResponseCode);
}
```
If the response code is 200 (success), the result is retrieved, decoded from JSON, and the `DisplayText` property is assigned to the `text` variable. This property contains the text version of the speech.
If the response code is 401, the access token has expired (tokens last only 10 minutes). A new access token is requested, and the call is retried.
Otherwise, an error is logged to the serial monitor, and the `text` variable is left blank.
1. Add the following code to the end of this method to close the HTTP client and return the text:
```cpp
httpClient.end();
return text;
```
1. In `main.cpp`, call the new `convertSpeechToText` method in the `processAudio` function, then log the speech to the serial monitor:
```cpp
String text = speechToText.convertSpeechToText();
Serial.println(text);
```
1. Build the code, upload it to your Wio Terminal, and test it through the serial monitor. Once you see `Ready` in the serial monitor, press the C button (the one on the left-hand side, closest to the power switch), and speak. Four seconds of audio will be captured and converted to text.
```output
--- Available filters and text transformations: colorize, debug, default, direct, hexlify, log2file, nocontrol, printable, send_on_enter, time
--- More details at http://bit.ly/pio-monitor-filters
--- Miniterm on /dev/cu.usbmodem1101 9600,8,N,1 ---
--- Quit: Ctrl+C | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
Connecting to WiFi..
Connected!
Got access token.
Ready.
Starting recording...
Finished recording
Sending speech...
Speech sent!
{"RecognitionStatus":"Success","DisplayText":"Set a 2 minute and 27 second timer.","Offset":4700000,"Duration":35300000}
Set a 2 minute and 27 second timer.
```
> 💁 You can find this code in the [code-speech-to-text/wio-terminal](../../../../../6-consumer/lessons/1-speech-recognition/code-speech-to-text/wio-terminal) folder.
😀 Congratulations! Your speech-to-text program is working successfully!
---
**Disclaimer**:
This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.