
Speech Server

Introduction

This demo shows how to start the speech service and how to access it. Both can be done with a single command, using paddlespeech_server and paddlespeech_client, or with a few lines of Python.

Usage

1. Installation

See installation.

You can install paddlespeech in one of three ways: easy, medium, or hard.

2. Prepare Config File

The configuration files cover both the service itself and the models for the speech tasks that the service provides. They are all under the conf folder.

The input to the ASR client demo must be a WAV file (.wav) whose sample rate matches the one the model expects.

Sample files for this ASR client demo can be downloaded with:

wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
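
Because the client input must match the model's sample rate, it can help to check a file before sending it. The following is a minimal sketch using only Python's standard wave module; the helper names wav_sample_rate and matches_model_rate are hypothetical and not part of paddlespeech, and the 16000 Hz default is taken from the model table in this README.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path, model_sample_rate=16000):
    """Return True if the file's sample rate equals the rate the model expects.

    model_sample_rate defaults to 16000, the rate listed for the ASR models
    in this README; pass a different value if your model differs.
    """
    return wav_sample_rate(path) == model_sample_rate


# Example (assumes zh.wav has been downloaded into the current directory):
#   print(wav_sample_rate("./zh.wav"))
#   print(matches_model_rate("./zh.wav"))
```

If the check fails, resample the audio to the model's rate before calling the client.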

3. Server Usage

Command Line (Recommended)

# start the service
paddlespeech_server start --config_file ./conf/application.yaml

Usage:

paddlespeech_server start --help

Arguments:

config_file: YAML file of the app. Default: ./conf/application.yaml
log_file: Log file. Default: ./log/paddlespeech.log

Output:

[2022-02-23 11:17:32] [INFO] [server.py:64] Started server process [6384]
INFO:     Waiting for application startup.
[2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup.
INFO:     Application startup complete.
[2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)

Python API

from paddlespeech.server.bin.paddlespeech_server import ServerExecutor

# Start the server with the app config; logs are written to log_file.
server_executor = ServerExecutor()
server_executor(
    config_file="./conf/application.yaml",
    log_file="./log/paddlespeech.log")

Output:

INFO:     Started server process [529]
[2022-02-23 14:57:56] [INFO] [server.py:64] Started server process [529]
INFO:     Waiting for application startup.
[2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup.
INFO:     Application startup complete.
[2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
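
Once the log reports that Uvicorn is running, a client script may want to confirm that the port accepts connections before sending requests. The sketch below does this with a plain TCP connection from the standard library. The helper name server_is_listening is hypothetical, and the check only shows that something is listening on the port, not that the speech engines have loaded correctly.

```python
import socket


def server_is_listening(host="127.0.0.1", port=8090, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    The defaults mirror the client defaults documented in this README.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example:
#   if not server_is_listening():
#       print("Speech server is not reachable on 127.0.0.1:8090")
```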

4. ASR Client Usage

Command Line (Recommended)

paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav

Usage:

paddlespeech_client asr --help

Arguments:

server_ip: Server IP. Default: 127.0.0.1
port: Server port. Default: 8090
input (required): Audio file to be recognized.
sample_rate: Audio sampling rate. Default: 16000.
lang: Language. Default: "zh_cn".
audio_format: Audio format. Default: "wav".

Output:

[2022-02-23 18:11:22,819] [    INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
[2022-02-23 18:11:22,820] [    INFO] - time cost 0.689145 s.

Python API

from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor

asrclient_executor = ASRClientExecutor()
asrclient_executor(
    input="./zh.wav",
    server_ip="127.0.0.1",
    port=8090,
    sample_rate=16000,
    lang="zh_cn",
    audio_format="wav")

Output:

{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
time cost 0.604353 s.
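
The response printed above is a nested dictionary, so the recognized text sits under result and then transcription. The sketch below pulls it out and raises an error otherwise. It assumes you already hold the response as a Python dict with the shape shown in the output; whether ASRClientExecutor returns that dict directly is not stated here, and the shape of a failed response is also an assumption. The helper name extract_transcription is hypothetical.

```python
def extract_transcription(response):
    """Return the transcription from an ASR response dict.

    Assumes the shape shown in the sample output:
    {'success': ..., 'code': ..., 'message': {'description': ...},
     'result': {'transcription': ...}}
    Raises RuntimeError if the response does not report success.
    """
    if not response.get("success") or response.get("code") != 200:
        # The failure layout is an assumption; fall back to a generic message.
        description = response.get("message", {}).get("description", "unknown error")
        raise RuntimeError(f"ASR request failed: {description}")
    return response["result"]["transcription"]


# Sample response copied from the output shown in this README.
sample_response = {
    "success": True,
    "code": 200,
    "message": {"description": "success"},
    "result": {"transcription": "我认为跑步最重要的就是给我带来了身体健康"},
}
print(extract_transcription(sample_response))
```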

5. TTS Client Usage

Command Line (Recommended)

paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好，欢迎使用百度飞桨语音合成服务。" --output output.wav

Usage:

paddlespeech_client tts --help

Arguments:

server_ip: Server IP. Default: 127.0.0.1
port: Server port. Default: 8090
input (required): Input text to synthesize.
spk_id: Speaker ID for multi-speaker text to speech. Default: 0
speed: Audio speed; the value should be between 0 and 3. Default: 1.0
volume: Audio volume; the value should be between 0 and 3. Default: 1.0
sample_rate: Sampling rate, one of [0, 8000, 16000]; 0 uses the same rate as the model. Default: 0
output: Output WAV file path. Default: output.wav.

Output:

[2022-02-23 15:20:37,875] [    INFO] - {'description': 'success.'}
[2022-02-23 15:20:37,875] [    INFO] - Save synthesized audio successfully on output.wav.
[2022-02-23 15:20:37,875] [    INFO] - Audio duration: 3.612500 s.
[2022-02-23 15:20:37,875] [    INFO] - Response time: 0.348050 s.
[2022-02-23 15:20:37,875] [    INFO] - RTF: 0.096346

Python API

from paddlespeech.server.bin.paddlespeech_client import TTSClientExecutor

ttsclient_executor = TTSClientExecutor()
ttsclient_executor(
    input="您好，欢迎使用百度飞桨语音合成服务。",
    server_ip="127.0.0.1",
    port=8090,
    spk_id=0,
    speed=1.0,
    volume=1.0,
    sample_rate=0,
    output="./output.wav")

Output:

{'description': 'success.'}
Save synthesized audio successfully on ./output.wav.
Audio duration: 3.612500 s.
Response time: 0.388317 s.
RTF: 0.107493
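
The RTF (real-time factor) values in both outputs are consistent with dividing the response time by the duration of the synthesized audio, so a value below 1 means synthesis ran faster than playback. A minimal sketch of that arithmetic follows; the function name real_time_factor is hypothetical, and the formula is inferred from the logged numbers rather than taken from the client's source.

```python
def real_time_factor(response_time_s, audio_duration_s):
    """Return response time divided by audio duration (both in seconds)."""
    if audio_duration_s <= 0:
        raise ValueError("audio_duration_s must be positive")
    return response_time_s / audio_duration_s


# Values copied from the two TTS outputs in this README.
cli_rtf = real_time_factor(0.348050, 3.612500)
api_rtf = real_time_factor(0.388317, 3.612500)
print(f"{cli_rtf:.6f}")
print(f"{api_rtf:.6f}")
```

Both results agree with the logged RTF values to within rounding of the logged times.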

Pretrained Models

ASR model

Here is a list of ASR pretrained models released by PaddleSpeech. Both command line and Python interfaces are available:

Model	Language	Sample Rate
conformer_wenetspeech	zh	16000
transformer_librispeech	en	16000

TTS model

Here is a list of TTS pretrained models released by PaddleSpeech. Both command line and Python interfaces are available:

Acoustic model

Model	Language
speedyspeech_csmsc	zh
fastspeech2_csmsc	zh
fastspeech2_aishell3	zh
fastspeech2_ljspeech	en
fastspeech2_vctk	en

Vocoder

Model	Language
pwgan_csmsc	zh
pwgan_aishell3	zh
pwgan_ljspeech	en
pwgan_vctk	en
mb_melgan_csmsc	zh

Here is a list of TTS pretrained static models released by PaddleSpeech. Both command line and Python interfaces are available:

Acoustic model

Model	Language
speedyspeech_csmsc	zh
fastspeech2_csmsc	zh

Vocoder

Model	Language
pwgan_csmsc	zh
mb_melgan_csmsc	zh
hifigan_csmsc	zh