
(简体中文|English)

Speech Server

Introduction

This demo shows how to start the speech service and how to access it. Both can be done with a single command using paddlespeech_server and paddlespeech_client, or with a few lines of Python code.

Usage

1. Installation

See installation.

It is recommended to use paddlepaddle 2.2.1 or above. You can install paddlespeech in one of three ways: easy, medium, or hard.

2. Prepare Config File

The configuration consists of the service-level configuration file and the model configuration files for the speech tasks that the service provides. They are all under the conf folder.

Note: The engine_backend entry in application.yaml lists all speech tasks included in the started service. If you want the service to provide only certain speech tasks, comment out the ones you do not need. For example, to run only the speech recognition (ASR) service, comment out the speech synthesis (TTS) service:

engine_backend:
    asr: 'conf/asr/asr.yaml'
    #tts: 'conf/tts/tts.yaml'

Note: Each configuration file listed under engine_backend in application.yaml must match the corresponding engine_type. When the engine_backend configuration file is XXX.yaml, engine_type must be set to python; when it is XXX_pd.yaml, engine_type must be set to inference.
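
The matching rule above can be checked mechanically before starting the server. The following is a minimal sketch and not part of PaddleSpeech: it assumes application.yaml has already been parsed into a Python dict (for example with a YAML library) and that engine_type is a mapping from task name to type that sits alongside engine_backend. The helper names expected_engine_type and check_engine_types are hypothetical.

```python
def expected_engine_type(config_path: str) -> str:
    """Return the engine type implied by a backend config file name."""
    # Rule from the note above: XXX_pd.yaml -> inference, XXX.yaml -> python.
    return "inference" if config_path.endswith("_pd.yaml") else "python"


def check_engine_types(app_config: dict) -> list:
    """Return mismatch messages; an empty list means the config is consistent."""
    backends = app_config.get("engine_backend") or {}
    types = app_config.get("engine_type") or {}
    problems = []
    for task, path in backends.items():
        expected = expected_engine_type(path)
        actual = types.get(task)
        if actual != expected:
            problems.append(
                f"{task}: {path} expects engine_type '{expected}', got {actual!r}")
    return problems


# Example: only ASR is enabled (TTS is commented out in the YAML).
# The engine_type layout used here is an assumption for illustration.
example = {
    "engine_backend": {"asr": "conf/asr/asr_pd.yaml"},
    "engine_type": {"asr": "python"},
}
for line in check_engine_types(example):
    print(line)
```

Returning a list of messages rather than raising on the first mismatch lets you report every inconsistent task at once.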

The input of the ASR client demo should be a WAV file (.wav), and its sample rate must match the sample rate the model expects.

Here are sample files for this ASR client demo that can be downloaded:

wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
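
Since the sample rate of the input must match the model, it can help to inspect a WAV file before sending it. The sketch below uses only Python's standard wave module, which reads uncompressed PCM WAV files. The function names are hypothetical, and the default of 16000 Hz is taken from the ASR client's default sample_rate; whether your model uses that rate is an assumption you should verify.

```python
import wave


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path: str, model_rate: int = 16000) -> bool:
    """True if the file's sample rate equals the rate the model expects."""
    return wav_sample_rate(path) == model_rate
```

For example, matches_model_rate("./zh.wav") tells you whether the downloaded sample can be sent with the client's default sample_rate.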

3. Server Usage

  • Command Line (Recommended)

    # start the service
    paddlespeech_server start --config_file ./conf/application.yaml
    

    Usage:

    paddlespeech_server start --help
    

    Arguments:

    • config_file: YAML file of the app. Default: ./conf/application.yaml
    • log_file: log file. Default: ./log/paddlespeech.log

    Output:

    [2022-02-23 11:17:32] [INFO] [server.py:64] Started server process [6384]
    INFO:     Waiting for application startup.
    [2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup.
    INFO:     Application startup complete.
    [2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete.
    INFO:     Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
    [2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
    
    
  • Python API

    from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
    
    server_executor = ServerExecutor()
    server_executor(
        config_file="./conf/application.yaml", 
        log_file="./log/paddlespeech.log")
    

    Output:

    INFO:     Started server process [529]
    [2022-02-23 14:57:56] [INFO] [server.py:64] Started server process [529]
    INFO:     Waiting for application startup.
    [2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup.
    INFO:     Application startup complete.
    [2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete.
    INFO:     Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
    [2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
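
When the server is started from a script, the client should not be called until the "Uvicorn running" line has appeared. One way to approximate this without parsing logs is to poll the TCP port. The following is a minimal sketch using only the standard library; wait_for_port is a hypothetical helper, not a PaddleSpeech API, and an open port only shows that the server accepts connections.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 8090,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Call wait_for_port() after launching the server and proceed with client requests only if it returns True.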
    
    

4. ASR Client Usage

Note: The response time will be slightly longer when using the client for the first time.

  • Command Line (Recommended)

    paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
    

    Usage:

    paddlespeech_client asr --help
    

    Arguments:

    • server_ip: server ip. Default: 127.0.0.1
    • port: server port. Default: 8090
    • input(required): Audio file to be recognized.
    • sample_rate: Audio sampling rate. Default: 16000.
    • lang: Language. Default: "zh_cn".
    • audio_format: Audio format. Default: "wav".

    Output:

    [2022-02-23 18:11:22,819] [    INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
    [2022-02-23 18:11:22,820] [    INFO] - time cost 0.689145 s.
    
    
  • Python API

    from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
    
    asrclient_executor = ASRClientExecutor()
    asrclient_executor(
        input="./zh.wav",
        server_ip="127.0.0.1",
        port=8090,
        sample_rate=16000,
        lang="zh_cn",
        audio_format="wav")
    

    Output:

    {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
    time cost 0.604353 s.
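
The response shown above is a nested dictionary, so the recognized text has to be read from result.transcription. The sketch below assumes you already hold the response as a Python dict with exactly the shape printed above; whether the executor returns this dict or only logs it is not stated here, and extract_transcription is a hypothetical helper.

```python
def extract_transcription(response: dict) -> str:
    """Return the transcription from an ASR response dict, or raise on failure."""
    if not response.get("success") or response.get("code") != 200:
        description = (response.get("message") or {}).get(
            "description", "unknown error")
        raise RuntimeError(f"ASR request failed: {description}")
    return response["result"]["transcription"]


# Sample copied from the output shown above.
sample = {
    "success": True,
    "code": 200,
    "message": {"description": "success"},
    "result": {"transcription": "我认为跑步最重要的就是给我带来了身体健康"},
}
print(extract_transcription(sample))
```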
    

5. TTS Client Usage

Note: The response time will be slightly longer when using the client for the first time.

  • Command Line (Recommended)

    paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
    

    Usage:

    paddlespeech_client tts --help
    

    Arguments:

    • server_ip: server ip. Default: 127.0.0.1
    • port: server port. Default: 8090
    • input(required): Input text to generate.
    • spk_id: Speaker id for multi-speaker text to speech. Default: 0
    • speed: Audio speed, the value should be set between 0 and 3. Default: 1.0
    • volume: Audio volume, the value should be set between 0 and 3. Default: 1.0
    • sample_rate: Sampling rate, choices: [0, 8000, 16000]; 0 means the same as the model. Default: 0
    • output: Output wave filepath. Default: output.wav.

    Output:

    [2022-02-23 15:20:37,875] [    INFO] - {'description': 'success.'}
    [2022-02-23 15:20:37,875] [    INFO] - Save synthesized audio successfully on output.wav.
    [2022-02-23 15:20:37,875] [    INFO] - Audio duration: 3.612500 s.
    [2022-02-23 15:20:37,875] [    INFO] - Response time: 0.348050 s.
    
    
  • Python API

    from paddlespeech.server.bin.paddlespeech_client import TTSClientExecutor
    
    ttsclient_executor = TTSClientExecutor()
    ttsclient_executor(
        input="您好,欢迎使用百度飞桨语音合成服务。",
        server_ip="127.0.0.1",
        port=8090,
        spk_id=0,
        speed=1.0,
        volume=1.0,
        sample_rate=0,
        output="./output.wav")
    

    Output:

    {'description': 'success.'}
    Save synthesized audio successfully on ./output.wav.
    Audio duration: 3.612500 s.
    Response time: 0.388317 s.
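
The TTS arguments listed above have documented ranges, so they can be validated on the client side before a request is sent. This is a minimal sketch, not the server's own validation: validate_tts_params is a hypothetical helper, and treating "between 0 and 3" as excluding 0 and including 3 is an assumption.

```python
ALLOWED_SAMPLE_RATES = (0, 8000, 16000)  # choices from the argument list above


def validate_tts_params(speed: float = 1.0, volume: float = 1.0,
                        sample_rate: int = 0) -> list:
    """Return error messages; an empty list means the values look valid."""
    errors = []
    for name, value in (("speed", speed), ("volume", volume)):
        # Assumption: 0 is excluded and 3 is included.
        if not 0 < value <= 3:
            errors.append(f"{name} must be between 0 and 3, got {value}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        errors.append(
            f"sample_rate must be one of {ALLOWED_SAMPLE_RATES}, got {sample_rate}")
    return errors


print(validate_tts_params(speed=1.0, volume=1.0, sample_rate=0))
print(validate_tts_params(speed=5.0, sample_rate=44100))
```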
    
    

Models supported by the service

ASR model

Run paddlespeech_server stats --task asr to list all models supported by the ASR service. The static models among them can be used for inference with Paddle Inference.

TTS model

Run paddlespeech_server stats --task tts to list all models supported by the TTS service. The static models among them can be used for inference with Paddle Inference.