diff --git a/6-consumer/lessons/1-speech-recognition/translation/README.ko.md b/6-consumer/lessons/1-speech-recognition/translation/README.ko.md deleted file mode 100644 index 728c87db..00000000 --- a/6-consumer/lessons/1-speech-recognition/translation/README.ko.md +++ /dev/null @@ -1,223 +0,0 @@

# Recognize speech with an IoT device

![A sketchnote overview of this lesson](../../../sketchnotes/lesson-21.jpg)

> Sketchnote by [Nitya Narasimhan](https://github.com/nitya). Click the image for a larger version.

This video gives an overview of the Azure speech service, a topic that will be covered in this lesson:

[![How to get started using your Cognitive Services Speech resource from the Microsoft Azure YouTube channel](https://img.youtube.com/vi/iW0Fw0l3mrA/0.jpg)](https://www.youtube.com/watch?v=iW0Fw0l3mrA)

> πŸŽ₯ Click the image above to watch a video

## Pre-lecture quiz

[Pre-lecture quiz](https://black-meadow-040d15503.1.azurestaticapps.net/quiz/41)

## Introduction

'Alexa, set a 12 minute timer'

'Alexa, timer status'

'Alexa, set an 8 minute timer called steam broccoli'

Smart devices are becoming more and more pervasive. Not just as smart speakers like HomePods, Echos and Google Homes, but embedded in our phones, watches, and even light fittings and thermostats.

> πŸ’ I have at least 19 devices in my home that have voice assistants, and that's just the ones I know about!

Voice control increases accessibility by allowing folks with limited movement to interact with devices. Whether it is a permanent disability such as being born without arms, a temporary disability such as a broken arm, or having your hands full of shopping or young children, being able to control our houses with our voice instead of our hands opens up a world of access. Shouting 'Hey Siri, close my garage door' whilst dealing with a baby change and an unruly toddler can be a small but effective improvement on life.
One of the more popular uses for voice assistants is setting timers, especially kitchen timers. Being able to set multiple timers with just your voice is a great help in the kitchen - no need to stop kneading dough, stirring soup, or cleaning dumpling filling off your hands to use a physical timer.

In this lesson you will learn about building voice recognition into IoT devices. You'll learn about microphones as sensors, how to capture audio from a microphone attached to an IoT device, and how to use AI to convert what is heard into text. Throughout the rest of this project you will build a smart kitchen timer, able to set timers using your voice with multiple languages.

In this lesson we'll cover:

* [Microphones](#microphones)
* [Capture audio from your IoT device](#capture-audio-from-your-iot-device)
* [Speech to text](#speech-to-text)
* [Convert speech to text](#convert-speech-to-text)

## Microphones

Microphones are analog sensors that convert sound waves into electrical signals. Vibrations in the air cause components in the microphone to move by tiny amounts, and these movements cause small changes in an electrical signal. These changes are then amplified to generate an electrical output.

### Microphone types

Microphones come in a variety of types:

* Dynamic - Dynamic microphones have a magnet attached to a moving diaphragm that moves in a coil of wire, generating an electrical current. This is the opposite of most speakers, which use an electrical current to move a magnet in a coil of wire, moving a diaphragm to generate sound. This means speakers can be used as dynamic microphones, and dynamic microphones can be used as speakers. In devices such as intercoms, where you are either listening or speaking but not both, a single device can act as both a speaker and a microphone.

    Dynamic microphones don't need power to work, the electrical signal is generated entirely by the microphone.

    ![Patti Smith singing into a Shure SM58 (dynamic cardioid type) microphone](../../../../images/dynamic-mic.jpg)

* Ribbon - Ribbon microphones are similar to dynamic microphones, except they have a metal ribbon instead of a diaphragm. This ribbon moves in a magnetic field, generating an electrical current. Like dynamic microphones, ribbon microphones don't need power to work.

    ![Edmund Lowe, American actor, standing at radio microphone (labeled for (NBC) Blue Network), holding script, 1942](../../../../images/ribbon-mic.jpg)

* Condenser - Condenser microphones have a thin metal diaphragm and a fixed metal backplate. Electricity is applied to both of these, and as the diaphragm vibrates, the static charge between the plates changes, generating a signal. Condenser microphones need *phantom power* to work.

    ![C451B small-diaphragm condenser microphone by AKG Acoustics](../../../../images/condenser-mic.jpg)

* MEMS - Micro-electromechanical system microphones, or MEMS, are microphones on a chip. They have a pressure-sensitive diaphragm etched onto a silicon chip, and work similarly to condenser microphones. These microphones can be tiny, and integrated into circuitry.

    ![A MEMS microphone on a circuit board](../../../../images/mems-microphone.png)

    In the image above, the chip labelled **LEFT** is a MEMS microphone, with a tiny diaphragm less than a millimeter wide.

βœ… Do some research: What microphones do you have around you - in your computer, your phone, your headset, or in other devices?

### Digital audio

Audio is an analog signal carrying very fine-grained information. To convert this signal to digital, the audio needs to be sampled many thousands of times a second.
- -> πŸŽ“ μƒ˜ν”Œλ§μ΄λž€ μ˜€λ””μ˜€ μ‹ ν˜Έλ₯Ό ν•΄λ‹Ή μ§€μ μ˜ μ‹ ν˜Έλ₯Ό λ‚˜νƒ€λ‚΄λŠ” 디지컬 κ°’μœΌλ‘œ λ³€ν™˜ν•˜λŠ” 것 μž…λ‹ˆλ‹€. - -![A line chart showing a signal, with discrete points at fixed intervals](../../../../images/sampling.png) - -λ””μ§€ν„Έ μ˜€λ””μ˜€λŠ” νŽ„μŠ€ μ½”λ“œ λ³€μ‘°(Pulse Code Modulation, PCM)λ₯Ό μ‚¬μš©ν•˜μ—¬ μƒ˜ν”Œλ§ λ©λ‹ˆλ‹€. PCM은 μ‹ ν˜Έμ˜ 전압을 읽고 μ •μ˜λœ 크기λ₯Ό μ‚¬μš©ν•˜μ—¬ ν•΄λ‹Ή 전압에 κ°€μž₯ κ°€κΉŒμš΄ 이산 값을 μ„ νƒν•˜λŠ” μž‘μ—…μ„ ν¬ν•¨ν•©λ‹ˆλ‹€. - -> πŸ’ PCM은 νŽ„μŠ€ 폭 λ³€μ‘°μ˜ μ„Όμ„œ 버전 ν˜Ήμ€ PWM(PWM은 [lesson 3 of the getting started project](../../../../1-getting-started/lessons/3-sensors-and-actuators/README.md#pulse-width-modulation)μ—μ„œ 닀룬 적 μžˆμŠ΅λ‹ˆλ‹€.). PCM은 μ•„λ‚ λ‘œκ·Έ μ‹ ν˜Έλ₯Ό λ””μ§€ν„Έ μ‹ ν˜Έλ‘œ λ³€ν™˜ν•˜κ³  PWM은 λ””μ§€ν„Έ μ‹ ν˜Έλ₯Ό μ•„λ‚ λ‘œκ·Έλ‘œ λ³€ν™˜ν•©λ‹ˆλ‹€. - -예λ₯Ό λ“€μ–΄ λŒ€λΆ€λΆ„μ˜ 슀트리밍 μŒμ•… μ„œλΉ„μŠ€λŠ” 16λΉ„νŠΈ ν˜Ήμ€ 24λΉ„νŠΈ μ˜€λ””μ˜€λ₯Ό μ œκ³΅ν•©λ‹ˆλ‹€. 즉, 전압을 16λΉ„νŠΈ μ •μˆ˜ λ˜λŠ” 24λΉ„νŠΈ μ •μˆ˜λ‘œ λ³€ν™˜ν•©λ‹ˆλ‹€. 16λΉ„νŠΈ μ˜€λ””μ˜€λŠ” -32,768μ—μ„œ 32,767 μ‚¬μ΄μ˜ 숫자둜 λ³€ν™˜λ˜κ³ , 24λΉ„νŠΈλŠ” -8,388,608μ—μ„œ 8,388,607 μ‚¬μ΄μ˜ λ²”μœ„μ— μžˆμŠ΅λ‹ˆλ‹€. λΉ„νŠΈ μˆ˜κ°€ λ§Žμ„μˆ˜λ‘ μƒ˜ν”Œλ§ 된 κ²°κ³ΌλŠ” μš°λ¦¬κ°€ μ‹€μ œλ‘œ κ·€λ‘œ λ“£λŠ” 것과 μœ μ‚¬ν•΄μ§‘λ‹ˆλ‹€. - -> πŸ’ μ’…μ’… LoFi라고 ν•˜λŠ” ν•˜λ“œ 8λΉ„νŠΈ μ˜€λ””μ˜€λ₯Ό μ‚¬μš©ν•  λ•Œκ°€ μžˆμŠ΅λ‹ˆλ‹€. 이것은 8λΉ„νŠΈλ§Œ μ‚¬μš©ν•˜λŠ” μ˜€λ””μ˜€ μƒ˜ν”Œλ§μœΌλ‘œ λ²”μœ„λŠ” -128μ—μ„œ 127κΉŒμ§€μž…λ‹ˆλ‹€. 졜초의 컴퓨터 μ˜€λ””μ˜€λŠ” ν•˜λ“œμ›¨μ–΄μ˜ ν•œκ³„λ‘œ 인해 8λΉ„νŠΈλ‘œ μ œν•œλ˜μ—ˆκΈ° λ•Œλ¬Έμ— 이것은 레트둜 κ²Œμž„μ—μ„œ 자주 λ³Ό 수 μžˆμŠ΅λ‹ˆλ‹€. - -μ΄λŸ¬ν•œ μƒ˜ν”Œμ€ KHz(μ΄ˆλ‹Ή 수천 개의 νŒλ…μΉ˜) λ‹¨μœ„λ‘œ 잘 μ •μ˜λœ μƒ˜ν”Œ 속도λ₯Ό μ‚¬μš©ν•˜μ—¬ μ΄ˆλ‹Ή 수천 번 μˆ˜μ§‘λ©λ‹ˆλ‹€. 슀트리밍 μŒμ•… μ„œλΉ„μŠ€λŠ” λŒ€λΆ€λΆ„μ˜ μ˜€λ””μ˜€μ— 48KHzλ₯Ό μ‚¬μš©ν•˜μ§€λ§Œ, 일뢀 `무손싀` μ˜€λ””μ˜€λŠ” μ΅œλŒ€ 96KHz λ˜λŠ” 심지어 192KHzλ₯Ό μ‚¬μš©ν•©λ‹ˆλ‹€. μƒ˜ν”Œλ§ 속도가 λ†’μ„μˆ˜λ‘ μ˜€λ””κ°€ 원본에 κ°€κΉμŠ΅λ‹ˆλ‹€. 인간이 48KHz μ΄μƒμ˜ 차이λ₯Ό ꡬ별할 수 μžˆλŠ”μ§€μ— λŒ€ν•œ λ…Όλž€μ΄ μžˆμŠ΅λ‹ˆλ‹€. 
βœ… Do some research: If you use a streaming music service, what sample rate and size does it use? If you use CDs, what is the sample rate and size of CD audio?

There are a number of different formats for audio data. You've probably heard of MP3 files - audio data that is compressed to make it smaller without losing quality. Uncompressed audio is often stored as a WAV file - a file with 44 bytes of header information followed by the raw audio data. The header contains information such as the sample rate (for example 16000 for 16KHz), the sample size (16 for 16-bit), and the number of channels. After the header, the WAV file contains the raw audio data.

> πŸŽ“ Channels refers to how many different audio streams make up the audio. For example, for stereo audio with left and right, there are 2 channels. For 7.1 surround sound for a home theater system, this is 8.

### Audio data size

Audio data is relatively large. For example, capturing uncompressed 16-bit audio at 16KHz (a good enough rate for use with a speech to text model) takes 32KB of data for each second of audio:

* 16-bit means 2 bytes per sample (1 byte is 8 bits).
* 16KHz is 16,000 samples per second.
* 16,000 x 2 bytes = 32,000 bytes per second.

This might feel like a small amount of data, but if you are using a microcontroller with limited memory, it can feel like a lot more. For example, the Wio Terminal has 192KB of memory, and that needs to store program code and variables. Even if your program code was tiny, you couldn't capture more than 5 seconds of audio.

Microcontrollers can access additional storage, such as SD cards or flash memory.
When building an IoT device that captures audio, you will need to ensure not only that you have this additional storage, but that your code writes the audio captured from the microphone directly to that storage, and when sending it to the cloud, streams from storage to the web request. That way you can avoid running out of memory by trying to hold the entire block of audio data in memory at once.

## Capture audio from your IoT device

Your IoT device can be connected to a microphone to capture audio, ready for conversion to text. It can also be connected to speakers to output audio. In later lessons this will be used to give audio feedback, but it is useful to set up speakers now to test the microphone.

### Task - configure your microphone and speakers

Work through the relevant guide to configure the microphone and speakers for your IoT device:

* [Arduino - Wio Terminal](wio-terminal-microphone.md)
* [Single-board computer - Raspberry Pi](pi-microphone.md)
* [Single-board computer - Virtual device](virtual-device-microphone.md)

### Task - capture audio

Work through the relevant guide to capture audio on your IoT device:

* [Arduino - Wio Terminal](wio-terminal-audio.md)
* [Single-board computer - Raspberry Pi](pi-audio.md)
* [Single-board computer - Virtual device](virtual-device-audio.md)

## Speech to text

Speech to text, or speech recognition, involves using AI to convert words in an audio signal to text.

### Speech recognition models

To convert speech to text, samples from the audio signal are grouped together and fed into a machine learning model based around a recurrent neural network (RNN). This is a type of machine learning model that can use previous data to make a decision about incoming data.
For example, the RNN could detect one block of audio samples as the sound 'Hel', and when it receives another that it thinks is the sound 'lo', it can combine this with the previous sound, find that 'Hello' is a valid word, and select that as the outcome.

ML models always accept data of the same size. The image classifier you built in an earlier lesson resizes images to a fixed size and processes them. The same is true of speech models: they have to process fixed-sized audio chunks. Speech models also need to be able to combine the outputs of multiple predictions to get the answer, allowing them to distinguish between 'Hi' and 'Highway', or 'flock' and 'floccinaucinihilipilification'.

Speech models are also advanced enough to understand context, and can correct the words they detect as more sounds are processed. For example, if you say "I went to the shops to get two bananas and an apple too", you would use three words that sound the same, but are spelled differently - to, two and too. Speech models are able to understand the context and use the appropriate spelling of the word.

> πŸ’ Some speech services allow customization to make them work better in noisy environments such as factories, or with industry-specific words such as chemical names. These customizations are trained by providing sample audio and a transcription, and work using transfer learning, the same as how you trained an image classifier using only a few images in an earlier lesson.

### Privacy

When using speech to text in a consumer IoT device, privacy is incredibly important. These devices listen to audio continuously, so as a consumer you don't want everything you say being sent to the cloud and converted to text. Not only will this use a lot of Internet bandwidth, it also has massive privacy implications, especially when some smart device makers randomly select audio for [humans to validate against the text generated to help improve their model](https://www.theverge.com/2019/4/10/18305378/amazon-alexa-ai-voice-assistant-annotation-listen-private-recordings).

You only want your smart device to send audio to the cloud for processing when you are using it, not when it hears audio in your home, audio that could include private meetings or intimate interactions. The way most smart devices work is with a *wake word*, a key phrase such as "Alexa", "Hey Siri", or "OK Google" that causes the device to 'wake up' and listen to what you are saying up until it detects a break in your speech, indicating you have finished talking to the device.

> πŸŽ“ Wake word detection is also referred to as *keyword spotting* or *keyword recognition*.

These wake words are detected on the device, not in the cloud. These smart devices have small AI models that run on the device and listen for the wake word; when it is detected, they start streaming the audio to the cloud for recognition. These models are very specialized, and just listen for the wake word.

> πŸ’ Some tech companies are adding more privacy to their devices and doing some of the speech to text conversion on the device. Apple have announced that as part of their 2021 iOS and macOS updates they will support speech to text conversion on device, and be able to handle many requests without needing to use the cloud. This is thanks to having powerful processors in their devices that can run ML models.

βœ… What do you think are the privacy and ethical implications of storing the audio sent to the cloud? Should this audio be stored, and if so, how? Do you think the use of recordings for law enforcement is a good trade-off for the loss of privacy?
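The wake word flow described above - listen locally, and only collect audio for the cloud between the wake word and the next break in speech - can be sketched as a small state machine. This is an illustrative sketch only: `detect_wake_word` and `is_speech_break` are hypothetical stand-ins for the on-device keyword model and silence detection, which are far more involved in a real device.

```python
def process_audio(chunks, detect_wake_word, is_speech_break):
    """Collect only the chunks spoken between the wake word and
    the next break in speech - everything else stays on the device."""
    streaming = False
    to_cloud = []
    for chunk in chunks:
        if not streaming:
            # The small on-device model only listens for the wake word -
            # nothing is sent anywhere in this state
            if detect_wake_word(chunk):
                streaming = True
        elif is_speech_break(chunk):
            streaming = False  # the user has finished talking
        else:
            to_cloud.append(chunk)  # stream this chunk for recognition
    return to_cloud
```

For example, with chunks representing `['noise', 'alexa', 'set', 'timer', '<silence>', 'chat']`, only `'set'` and `'timer'` would be collected for the cloud.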
Wake word detection usually uses a technique known as TinyML - converting ML models so they are able to run on microcontrollers. These models are small in size, and consume very little power to run.

To avoid the complexity of training and using a wake word model, the smart timer you are building in this lesson will use a button to turn on the speech recognition.

> πŸ’ If you want to try creating a wake word detection model to run on the Wio Terminal or Raspberry Pi, check out this [responding to your voice tutorial by Edge Impulse](https://docs.edgeimpulse.com/docs/responding-to-your-voice). If you want to use your computer to do this, you can try the [get started with Custom Keyword quickstart on the Microsoft docs](https://docs.microsoft.com/azure/cognitive-services/speech-service/keyword-recognition-overview?WT.mc_id=academic-17441-jabenn).

## Convert speech to text

![Speech services logo](../../../images/azure-speech-logo.png)

Just like with image classification in an earlier project, there are pre-built AI services that can take speech as an audio file and convert it to text. One such service is the Speech Service, part of the Cognitive Services, pre-built AI services you can use in your apps.

### Task - configure a speech AI resource

1. Create a Resource Group for this project called `smart-timer`

1. Use the following command to create a free speech resource:

    ```sh
    az cognitiveservices account create --name smart-timer \
                                        --resource-group smart-timer \
                                        --kind SpeechServices \
                                        --sku F0 \
                                        --yes \
                                        --location <location>
    ```

    Replace `<location>` with the location you used when creating the Resource Group.

1. You will need an API key to access the speech resource from your code. Run the following command to get the key:

    ```sh
    az cognitiveservices account keys list --name smart-timer \
                                           --resource-group smart-timer \
                                           --output table
    ```

    Take a copy of one of the keys.
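The location and key are all your code needs to call the service. As a sketch of how they are used - the REST endpoint below follows the speech service's standard URL scheme, but treat the exact path and headers as an assumption and check the service documentation; the device guides that follow use SDKs or similar requests:

```python
def build_speech_request(location, api_key, language="en-US"):
    """Build the URL and headers for a speech to text REST call.
    The body of the POST would be the captured WAV audio data."""
    url = (f"https://{location}.stt.speech.microsoft.com/speech/recognition/"
           f"conversation/cognitiveservices/v1?language={language}")
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
    }
    return url, headers

# For example, for a resource in westus with one of the keys copied above:
url, headers = build_speech_request("westus", "paste-your-key-here")
```

Note how the location becomes part of the hostname, and the key is never part of the URL - it travels in a header.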
### Task - convert speech to text

Work through the relevant guide to convert speech to text on your IoT device:

* [Arduino - Wio Terminal](wio-terminal-speech-to-text.md)
* [Single-board computer - Raspberry Pi](pi-speech-to-text.md)
* [Single-board computer - Virtual device](virtual-device-speech-to-text.md)

---

## πŸš€ Challenge

Speech recognition has been around for a long time, and is continuously improving. Research the current capabilities and compare how these have evolved over time, including how accurate machine transcriptions are compared to humans.

What do you think the future holds for speech recognition?

## Post-lecture quiz

[Post-lecture quiz](https://black-meadow-040d15503.1.azurestaticapps.net/quiz/42)

## Review & Self Study

* Read about the different microphone types and how they work in the [what's the difference between dynamic and condenser microphones article on Musician's HQ](https://musicianshq.com/whats-the-difference-between-dynamic-and-condenser-microphones/).
* Read more on the Cognitive Services speech service in the [speech service documentation on Microsoft Docs](https://docs.microsoft.com/azure/cognitive-services/speech-service/?WT.mc_id=academic-17441-jabenn)
* Read about keyword spotting in the [keyword recognition documentation on Microsoft Docs](https://docs.microsoft.com/azure/cognitive-services/speech-service/keyword-recognition-overview?WT.mc_id=academic-17441-jabenn)

## Assignment

[](assignment.md)

diff --git a/6-consumer/lessons/1-speech-recognition/translation/pi-microphone.ko.md b/6-consumer/lessons/1-speech-recognition/translation/pi-microphone.ko.md deleted file mode 100644 index f400417f..00000000 --- a/6-consumer/lessons/1-speech-recognition/translation/pi-microphone.ko.md +++ /dev/null @@ -1,140 +0,0 @@

# Configure your microphone and speakers - Raspberry Pi

In this part of the lesson, you will add a microphone and speakers to your Raspberry Pi.
## Hardware

The Raspberry Pi needs a microphone.

The Pi doesn't have a microphone built in, so you will need to add an external one. There are multiple ways to do this:

* USB microphone
* USB headset
* USB all-in-one speakerphone
* USB audio adapter and microphone with a 3.5mm jack
* [ReSpeaker 2-Mics Pi HAT](https://www.seeedstudio.com/ReSpeaker-2-Mics-Pi-HAT.html)

> πŸ’ Bluetooth microphones are not all supported on the Raspberry Pi, so if you have a Bluetooth microphone or headset, you may have issues pairing or capturing audio.

Raspberry Pi devices come with a 3.5mm headphone jack. You can use this to connect headphones or a speaker, and you can also add speakers using:

* HDMI audio through a monitor or TV
* USB speakers
* USB headset
* USB all-in-one speakerphone
* [ReSpeaker 2-Mics Pi HAT](https://www.seeedstudio.com/ReSpeaker-2-Mics-Pi-HAT.html) with a speaker attached, either to the 3.5mm jack or to the JST port

## Connect and configure the microphone and speakers

The microphone and speakers need to be connected and configured.

### Task - connect and configure the microphone

1. Connect the microphone using the appropriate method. For example, connect it via one of the USB ports.

1. If you are using the ReSpeaker 2-Mics Pi HAT, you can remove the Grove base hat, then fit the ReSpeaker hat in its place.

    ![A raspberry pi with a ReSpeaker hat](../../../../images/pi-respeaker-hat.png)

    You will need a Grove button later in this lesson, but one is built into this hat, so the Grove base hat is not needed.

    Once the hat is fitted, you will need to install some drivers. Refer to the [Seeed getting started instructions](https://wiki.seeedstudio.com/ReSpeaker_2_Mics_Pi_HAT_Raspberry/#getting-started) for driver installation instructions.
    > ⚠️ The instructions use `git` to clone a repository. If you don't have `git` installed on your Pi, you can install it by running the following command:
    >
    > ```sh
    > sudo apt install git --yes
    > ```

1. To see information on the connected microphone, run the following command either on the Pi, or from a terminal connected using VS Code and a remote SSH session:

    ```sh
    arecord -l
    ```

    You will see a list of connected microphones like the following:

    ```output
    pi@raspberrypi:~ $ arecord -l
    **** List of CAPTURE Hardware Devices ****
    card 1: M0 [eMeet M0], device 0: USB Audio [USB Audio]
      Subdevices: 1/1
      Subdevice #0: subdevice #0
    ```

    When only one microphone is connected, you will only see one entry. Configuring microphones can be tricky on Linux, so it is recommended to use only one microphone and unplug any others.

    Note down the card number, as you will need it later. In the output above the card number is 1.

### Task - connect and configure the speaker

1. Connect the speaker using the appropriate method.

1. To see information on the connected speakers, run the following command either on the Pi, or from a terminal connected using VS Code and a remote SSH session:

    ```sh
    aplay -l
    ```

    You will see a list of connected speakers like the following:

    ```output
    pi@raspberrypi:~ $ aplay -l
    **** List of PLAYBACK Hardware Devices ****
    card 0: Headphones [bcm2835 Headphones], device 0: bcm2835 Headphones [bcm2835 Headphones]
      Subdevices: 8/8
      Subdevice #0: subdevice #0
      Subdevice #1: subdevice #1
      Subdevice #2: subdevice #2
      Subdevice #3: subdevice #3
      Subdevice #4: subdevice #4
      Subdevice #5: subdevice #5
      Subdevice #6: subdevice #6
      Subdevice #7: subdevice #7
    card 1: M0 [eMeet M0], device 0: USB Audio [USB Audio]
      Subdevices: 1/1
      Subdevice #0: subdevice #0
    ```
USB μŠ€ν”Όμ»€μ™€ 같은 μŠ€ν”Όμ»€λ₯Ό μΆ”κ°€ν•œ κ²½μš°μ—λ„ 이 λͺ©λ‘μ€ ν‘œμ‹œλ©λ‹ˆλ‹€. - -1. λ‚΄μž₯ ν—€λ“œν° μž­μ— μ—°κ²°λœ μŠ€ν”Όμ»€λ‚˜ ν—€λ“œν°μ΄ μ•„λ‹Œ μΆ”κ°€ μŠ€ν”Όμ»€λ₯Ό μ‚¬μš©ν•˜λŠ” 경우 λ‹€μŒ λͺ…λ Ήμ–΄λ₯Ό 톡해 κΈ°λ³Έκ°’μœΌλ‘œ ꡬ성해야 ν•©λ‹ˆλ‹€. - ```sh - sudo nano /usr/share/alsa/alsa.conf - ``` - - μ΄λ ‡κ²Œ ν•˜λ©΄ 단말기 기반 ν…μŠ€νŠΈ νŽΈμ§‘κΈ°μΈ `nano`μ—μ„œ ꡬ성 파일이 μ—΄λ¦½λ‹ˆλ‹€. λ‹€μŒ 쀄을 찾을 λ•ŒκΉŒμ§€ ν‚€λ³΄λ“œμ˜ ν™”μ‚΄ν‘œ ν‚€λ₯Ό μ‚¬μš©ν•˜μ—¬ μ•„λž˜λ‘œ μŠ€ν¬λ‘€ν•©λ‹ˆλ‹€. - - ```output - defaults.pcm.card 0 - ``` - - 호좜 ν›„ λŒμ•„μ˜¨ λͺ©λ‘μ—μ„œ μ‚¬μš©ν•  μΉ΄λ“œμ˜ μΉ΄λ“œ 번호λ₯Ό `0`μ—μ„œ `aplay -l`둜 λ³€κ²½ν•©λ‹ˆλ‹€. 예λ₯Ό λ“€μ–΄, μœ„μ˜ 좜λ ₯μ—λŠ” `card 1: M0 [eMeet M0], μž₯치 0: USB Audio [USB Audio]`λΌλŠ” 두 번째 μ‚¬μš΄λ“œ μΉ΄λ“œκ°€ μžˆμŠ΅λ‹ˆλ‹€. 이λ₯Ό μ‚¬μš©ν•˜κΈ° μœ„ν•΄ λ‹€μŒκ³Ό 같이 νŒŒμΌμ„ μ—…λ°μ΄νŠΈν•©λ‹ˆλ‹€. - - ```output - defaults.pcm.card 1 - ``` - - 이 값을 μ μ ˆν•œ μΉ΄λ“œ 번호둜 μ„€μ •ν•©λ‹ˆλ‹€. ν‚€λ³΄λ“œμ˜ ν™”μ‚΄ν‘œ ν‚€λ₯Ό μ‚¬μš©ν•˜μ—¬ 숫자둜 μ΄λ™ν•œ λ‹€μŒ ν…μŠ€νŠΈ νŒŒμΌμ„ νŽΈμ§‘ν•  λ•Œ 일반적으둜 μƒˆ 숫자λ₯Ό μ‚­μ œν•˜κ³  μž…λ ₯ν•  수 μžˆμŠ΅λ‹ˆλ‹€. - -1. `Ctrl+x`λ₯Ό 눌러 λ³€κ²½ λ‚΄μš©μ„ μ €μž₯ν•˜κ³  νŒŒμΌμ„ λ‹«μŠ΅λ‹ˆλ‹€. `y`λ₯Ό 눌러 νŒŒμΌμ„ μ €μž₯ν•œ λ‹€μŒ `return`을 눌러 파일 이름을 μ„ νƒν•©λ‹ˆλ‹€. - -### μž‘μ—… - λ§ˆμ΄ν¬μ™€ μŠ€ν”Όμ»€λ₯Ό ν…ŒμŠ€νŠΈν•©λ‹ˆλ‹€ - -1. λ‹€μŒ λͺ…령을 μ‹€ν–‰ν•˜μ—¬ 마이크λ₯Ό 톡해 5μ΄ˆκ°„μ˜ μ˜€λ””μ˜€λ₯Ό λ…ΉμŒν•©λ‹ˆλ‹€.: - - ```sh - arecord --format=S16_LE --duration=5 --rate=16000 --file-type=wav out.wav - ``` - - 이 λͺ…령이 μ‹€ν–‰λ˜λŠ” λ™μ•ˆ λ§ν•˜κΈ°, λ…Έλž˜ν•˜κΈ°, λΉ„νŠΈλ°•μŠ€, μ•…κΈ° μ—°μ£Ό λ˜λŠ” ν•˜κ³ μ‹Άμ€ 것을 ν•˜λ©° λ§ˆμ΄ν¬μ— μ†Œλ¦¬λ₯Ό λ‚΄μ‹­μ‹œμ˜€. - -1. 5초 후에 λ…Ήν™”κ°€ μ€‘μ§€λ©λ‹ˆλ‹€. λ‹€μŒ λͺ…령을 μ‹€ν–‰ν•˜μ—¬ μ˜€λ””μ˜€λ₯Ό μž¬μƒν•©λ‹ˆλ‹€. - - ```sh - aplay --format=S16_LE --rate=16000 out.wav - ``` - - μŠ€ν”Όμ»€λ₯Ό 톡해 audio bing이 μž¬μƒλ˜λŠ” μ†Œλ¦¬κ°€ λ“€λ¦½λ‹ˆλ‹€. ν•„μš”μ— 따라 μŠ€ν”Όμ»€μ˜ 좜λ ₯ λ³Όλ₯¨μ„ μ‘°μ •ν•©λ‹ˆλ‹€. - -1. 
λ‚΄μž₯된 마이크 포트의 λ³Όλ₯¨μ„ μ‘°μ ˆν•˜κ±°λ‚˜ 마이크의 κ²ŒμΈμ„ μ‘°μ ˆν•΄μ•Ό ν•  경우 `alsamixer` μœ ν‹Έλ¦¬ν‹°λ₯Ό μ‚¬μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€. 이 μœ ν‹Έλ¦¬ν‹°μ— λŒ€ν•œ μžμ„Έν•œ λ‚΄μš©μ€ [Linux alsamixer man page](https://linux.die.net/man/1/alsamixer) μ—μ„œ 확인할 수 μžˆμŠ΅λ‹ˆλ‹€. - -1. μ˜€λ””μ˜€λ₯Ό μž¬μƒν•  λ•Œ 였λ₯˜κ°€ λ°œμƒν•˜λ©΄ `alsa.conf` νŒŒμΌμ—μ„œ `defaults.pcm.card`둜 μ„€μ •ν•œ μΉ΄λ“œλ₯Ό ν™•μΈν•©λ‹ˆλ‹€.
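To double-check that the recording matches the format you asked `arecord` for, you can read the WAV header back with Python's standard `wave` module (assuming Python 3 is installed on the Pi, and that `out.wav` is the file recorded above):

```python
import os
import wave

def wav_info(path):
    """Read the WAV header and return (channels, bits per sample, sample rate, seconds)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), wav.getsampwidth() * 8, rate, frames / rate

if os.path.exists("out.wav"):
    channels, bits, rate, seconds = wav_info("out.wav")
    # For the arecord command above, expect 16-bit, 16000Hz, roughly 5 seconds
    print(f"{channels} channel(s), {bits}-bit, {rate}Hz, {seconds:.1f}s")
```

If the reported rate or sample size differs from what the speech service expects, playback and recognition problems are likely, so this is a quick first check.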