
(简体中文|English)

Speech Translation

Introduction

Speech translation is the process by which conversational spoken phrases are instantly translated and spoken aloud in a second language.

This demo recognizes the speech in a given audio file and translates it into the target language. It can be run with a single command or a few lines of Python using PaddleSpeech.

Usage

1. Installation

pip install paddlespeech

2. Prepare Input File

The input of this demo should be a WAV file (.wav).

Sample files for this demo can be downloaded with:

wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
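Before running the demo, it can help to confirm that an input file really is a readable WAV file. The following is a minimal sketch using only Python's standard `wave` module. The helper name `check_wav` is hypothetical, and comparing against 16000 Hz is an assumption based on the default of the `sample_rate` argument described in this README; the demo itself may accept other rates.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Inspect the WAV header of `path` and return (ok, message).

    `expected_rate` defaults to 16000 as an assumption taken from the
    demo's documented `sample_rate` default.
    """
    try:
        with wave.open(path, "rb") as wav_file:
            rate = wav_file.getframerate()
            channels = wav_file.getnchannels()
    except (wave.Error, EOFError) as err:
        # Raised when the file is not a valid RIFF/WAVE container.
        return False, f"not a readable WAV file: {err}"
    if rate != expected_rate:
        return False, f"sample rate is {rate} Hz, expected {expected_rate} Hz"
    return True, f"{channels} channel(s) at {rate} Hz"
```

For example, `check_wav('./en.wav')` returns a pair whose first element tells you whether the file passed both checks, and whose second element is a short description.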

3. Usage (Windows is not currently supported)

  • Command Line (Recommended)

    paddlespeech st --input ./en.wav
    

    Usage:

    paddlespeech st --help
    

    Arguments:

    • input (required): Audio file to recognize and translate.
    • model: Model type of the ST task. Default: fat_st_ted.
    • src_lang: Source language. Default: en.
    • tgt_lang: Target language. Default: zh.
    • sample_rate: Sample rate of the model. Default: 16000.
    • config: Config of the ST task. The pretrained model is used when it is None. Default: None.
    • ckpt_path: Model checkpoint. The pretrained model is used when it is None. Default: None.
    • device: Device on which to run model inference. Default: the default PaddlePaddle device in the current environment.
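When the command is launched from a script, the arguments above can be assembled programmatically. The sketch below is illustrative only: `build_st_command` is a hypothetical helper, and it assumes every argument is passed as a `--name value` flag in the same way as the `--input` flag shown in this README. Check `paddlespeech st --help` for the exact flags in your installed version.

```python
import shlex


def build_st_command(input_path, **options):
    """Build the argument list for `paddlespeech st`.

    Options set to None are omitted so that the CLI default applies.
    Assumption: each option name maps directly to a `--name` flag.
    """
    command = ["paddlespeech", "st", "--input", input_path]
    for name, value in options.items():
        if value is not None:
            command += [f"--{name}", str(value)]
    return command


command = build_st_command(
    "./en.wav", src_lang="en", tgt_lang="zh", sample_rate=16000, config=None)
print(shlex.join(command))
# paddlespeech st --input ./en.wav --src_lang en --tgt_lang zh --sample_rate 16000
```

The resulting list can be handed to `subprocess.run(command, check=True)`. Building a list rather than a single string avoids shell quoting problems with file names that contain spaces.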

    Output:

    [2021-12-09 11:13:03,178] [    INFO] [utils.py] [L225] - ST Result: ['我 在 这栋 建筑 的 古老 门上 敲门 。']
    
  • Python API

    import paddle
    from paddlespeech.cli import STExecutor
    
    st_executor = STExecutor()
    text = st_executor(
        model='fat_st_ted',
        src_lang='en',
        tgt_lang='zh',
        sample_rate=16000,
        config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
        ckpt_path=None,
        audio_file='./en.wav',
        device=paddle.get_device())
    print('ST Result: \n{}'.format(text))
    

    Output:

    ST Result:
    ['我 在 这栋 建筑 的 古老 门上 敲门 。'] 
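The sample output is a list containing one string whose tokens are separated by spaces. If you want to display the translation as running text, a small post-processing step may be useful. The sketch below assumes the result keeps the list-of-strings shape shown in the sample output and that the target language (Chinese here) is written without spaces between words; `detokenize` is a hypothetical helper, not part of PaddleSpeech.

```python
def detokenize(results):
    """Remove the whitespace between tokens in each result string.

    Assumption: the target language is written without inter-word
    spaces, as is the case for Chinese.
    """
    return ["".join(result.split()) for result in results]


print(detokenize(['我 在 这栋 建筑 的 古老 门上 敲门 。']))
# ['我在这栋建筑的古老门上敲门。']
```

For a target language that does use spaces between words, this step should be skipped.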
    

4. Pretrained Models

The following pretrained models released by PaddleSpeech can be used from both the command line and the Python API:

| Model | Source Language | Target Language |
| --- | --- | --- |
| fat_st_ted | en | zh |