ASR (Automatic Speech Recognition)

Introduction

ASR, or Automatic Speech Recognition, refers to the problem of getting a program to automatically transcribe spoken language (speech-to-text).

This demo recognizes the text spoken in a given audio file. With PaddleSpeech, this takes a single command or a few lines of Python.

Usage

1. Installation

See https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md for installation instructions.

You can choose one of the three methods described there (easy, medium, or hard) to install paddlespeech.

2. Prepare Input File

The input of this demo should be a WAV file (.wav), and its sample rate must match the sample rate expected by the model.

Here are sample files for this demo that can be downloaded:

wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
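
Since the sample rate of the input has to match the model, it can help to check a file before running recognition. The following is a minimal sketch that reads the rate from the WAV header using only Python's standard `wave` module. The helper names `wav_sample_rate` and `matches_model_rate` are made up for this example and are not part of PaddleSpeech, and the default of 16000 is taken from the `sample_rate` default listed in this document.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path, model_sample_rate=16000):
    """Return True if the file's sample rate equals the rate the model expects."""
    return wav_sample_rate(path) == model_sample_rate


# Example usage once the sample files have been downloaded:
#   if not matches_model_rate("./zh.wav"):
#       print("Sample rate differs from the model; resample the file first.")
```

The `wave` module only reads uncompressed PCM WAV files, so a file in another encoding would need a different reader.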

3. Usage

Command Line (Recommended)
```
paddlespeech asr --input ./zh.wav
```
(It does not matter if the package paddlespeech-ctcdecoders is not found; this package is optional.)

Usage:
```
paddlespeech asr --help
```
Arguments:
- input (required): Audio file to recognize.
- model: Model type of the ASR task. Default: conformer_wenetspeech.
- lang: Model language. Default: zh.
- sample_rate: Sample rate of the model. Default: 16000.
- config: Config of the ASR task. The pretrained model is used when it is None. Default: None.
- ckpt_path: Model checkpoint. The pretrained model is used when it is None. Default: None.
- yes: A flag that takes no value. When set, the program's requests are accepted by default, including the request to convert the audio sample rate. Default: False.
- device: Device on which to run model inference. Default: the default PaddlePaddle device in the current environment.
Output:
```
[2021-12-08 13:12:34,063] [    INFO] [utils.py] [L225] - ASR Result: 我认为跑步最重要的就是给我带来了身体健康
```
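
The arguments listed above can also be passed explicitly instead of relying on their defaults. As a minimal sketch, assuming each argument is given as `--<name>` in the same way `--input` is in the command shown earlier (`paddlespeech asr --help` lists the exact flags), the command could be assembled and launched from Python as follows. The helper names `build_asr_command` and `run_asr` are hypothetical and not part of PaddleSpeech.

```python
import subprocess


def build_asr_command(input_path, model="conformer_wenetspeech", lang="zh",
                      sample_rate=16000, device=None, yes=False):
    """Build the argument list for `paddlespeech asr` with explicit options."""
    cmd = [
        "paddlespeech", "asr",
        "--input", input_path,
        "--model", model,
        "--lang", lang,
        "--sample_rate", str(sample_rate),
    ]
    if device is not None:
        # Omit --device to keep the default device of the environment.
        cmd += ["--device", device]
    if yes:
        # Assumed flag name: accepts the program's requests, such as resampling.
        cmd.append("--yes")
    return cmd


def run_asr(input_path, **options):
    """Run the command and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_asr_command(input_path, **options), check=True)


# Example usage (requires paddlespeech to be installed):
#   run_asr("./zh.wav", lang="zh", sample_rate=16000)
```

Passing the command as a list rather than a single string avoids shell quoting problems when a file path contains spaces.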

Python API

import paddle
from paddlespeech.cli import ASRExecutor

asr_executor = ASRExecutor()
text = asr_executor(
    model='conformer_wenetspeech',
    lang='zh',
    sample_rate=16000,
    config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
    ckpt_path=None,
    audio_file='./zh.wav',
    force_yes=False,
    device=paddle.get_device())
print('ASR Result: \n{}'.format(text))

Output:

ASR Result:
我认为跑步最重要的就是给我带来了身体健康
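
To transcribe several files, the same executor object can be called once per file. The sketch below assumes only what the example above shows: that the executor is callable with an `audio_file` keyword plus the other keyword arguments, and that it returns the recognized text. The helper name `transcribe_files` is made up for this example. Taking the executor as a parameter means the model is constructed once and reused across files.

```python
def transcribe_files(executor, audio_paths, **asr_kwargs):
    """Call `executor` once per path and return a {path: text} dictionary.

    `executor` is any callable that accepts `audio_file=` plus the keyword
    arguments shown in the Python API example (for instance an ASRExecutor).
    """
    results = {}
    for path in audio_paths:
        results[path] = executor(audio_file=path, **asr_kwargs)
    return results


# Example usage (requires paddlespeech to be installed):
#   from paddlespeech.cli import ASRExecutor
#   texts = transcribe_files(ASRExecutor(), ["./zh.wav"],
#                            model="conformer_wenetspeech", lang="zh",
#                            sample_rate=16000)
#   for path, text in texts.items():
#       print(path, text)
```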

4. Pretrained Models

Here is a list of pretrained models released by PaddleSpeech that can be used from both the command line and the Python API:

| Model | Language | Sample Rate |
| :--- | :---: | :---: |
| conformer_wenetspeech | zh | 16000 |
