(简体中文|English)

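Speech Server

Introduction

This demo shows how to start an offline speech service and how to access it. Both can be done with a single paddlespeech_server or paddlespeech_client command, or with a few lines of Python.

For the service interface definition, please refer to: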

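Usage

1. Installation

See the installation documentation.

paddlepaddle 2.3.1 or later is recommended.

You can install PaddleSpeech in one of three ways: easy, medium, or hard.

If you install in easy mode, you need to prepare the yaml file yourself; the yaml files in the conf directory can serve as a reference.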

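2. Prepare the Configuration File

See conf/application.yaml for the configuration file. In it, engine_list specifies the speech engines that the service will include when it starts, each in the format <speech task>_<engine type>.

The speech tasks currently integrated in the service are: asr (speech recognition), tts (text-to-speech), cls (audio classification), vector (speaker verification), and text (text processing).

Two engine types are currently supported: python and inference (Paddle Inference).

Note: If the service starts normally inside a container but the client cannot reach its IP address, try changing the host address in the configuration file to the local IP address.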
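
The engine naming convention described above can be sketched in a few lines of Python. The helper parse_engine_name and the example engine_list entries are illustrative assumptions, not part of PaddleSpeech, and the real conf/application.yaml may accept names beyond the tasks and engine types listed in this README.

```python
# Tasks and engine types as listed in this README; the real server may accept more.
KNOWN_TASKS = {"asr", "tts", "cls", "vector", "text"}
KNOWN_ENGINE_TYPES = {"python", "inference"}


def parse_engine_name(name):
    """Split '<task>_<engine type>' into (task, engine_type). Hypothetical helper."""
    task, sep, engine_type = name.partition("_")
    if not sep or task not in KNOWN_TASKS or engine_type not in KNOWN_ENGINE_TYPES:
        raise ValueError(f"unexpected engine name: {name!r}")
    return task, engine_type


# Example entries, assumed for illustration only.
engine_list = ["asr_python", "tts_inference"]
parsed = [parse_engine_name(name) for name in engine_list]
print(parsed)
```

A check like this can catch a mistyped engine name before the server is started, though the server's own validation remains the authority.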

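The input to the ASR client is a WAV file (.wav), and its sample rate must match the sample rate of the model.

Sample audio files for the ASR client can be downloaded with: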

wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
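
Because the sample rate of the WAV file has to match the model, it can help to check the file header before sending it. The sketch below uses only the Python standard library; the function names are hypothetical, and the 16000 Hz default is an assumption taken from the default sample_rate of the ASR client.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_sample_rate(path, expected_rate=16000):
    """Raise ValueError if the file's sample rate differs from expected_rate."""
    actual = wav_sample_rate(path)
    if actual != expected_rate:
        raise ValueError(
            f"{path}: sample rate is {actual} Hz, expected {expected_rate} Hz")
    return actual
```

For example, check_sample_rate("./zh.wav") returns the rate if it matches and raises otherwise. Note that the wave module reads only uncompressed PCM WAV files.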

3. Server Usage

  • Command Line (Recommended)

    # start the service
    paddlespeech_server start --config_file ./conf/application.yaml
    

    Usage:

    paddlespeech_server start --help
    

    Arguments:

    • config_file: configuration file of the service. Default: ./conf/application.yaml
    • log_file: log file. Default: ./log/paddlespeech.log

    Output:

    [2022-02-23 11:17:32] [INFO] [server.py:64] Started server process [6384]
    INFO:     Waiting for application startup.
    [2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup.
    INFO:     Application startup complete.
    [2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete.
    INFO:     Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
    [2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
    
  • Python API

    from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
    
    server_executor = ServerExecutor()
    server_executor(
        config_file="./conf/application.yaml", 
        log_file="./log/paddlespeech.log")
    

    Output:

    INFO:     Started server process [529]
    [2022-02-23 14:57:56] [INFO] [server.py:64] Started server process [529]
    INFO:     Waiting for application startup.
    [2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup.
    INFO:     Application startup complete.
    [2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete.
    INFO:     Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
    [2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
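
Once the log reports that Uvicorn is running, a quick way to confirm that clients can reach the service is to open a TCP connection to it. This is a minimal sketch using the standard library; server_is_reachable is a hypothetical helper, and the host and port defaults are taken from the client defaults in this README.

```python
import socket


def server_is_reachable(host="127.0.0.1", port=8090, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


print(server_is_reachable())
```

This only shows that something is listening on the port; it does not verify that the speech engines loaded correctly.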
    

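4. ASR Client Usage

Note: The response time will be slightly longer the first time the client is used.

  • Command Line (Recommended)

    If 127.0.0.1 is not accessible, use the actual IP address of the service.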

    paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
    
    

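    Usage:

    paddlespeech_client asr --help
    

    Arguments:

    • server_ip: server IP address. Default: 127.0.0.1.
    • port: server port. Default: 8090.
    • input (required): audio file to be recognized.
    • sample_rate: audio sample rate. Default: 16000.
    • lang: model language. Default: zh_cn.
    • audio_format: audio format. Default: wav.

    Output: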

    [2022-02-23 18:11:22,819] [    INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
    [2022-02-23 18:11:22,820] [    INFO] - time cost 0.689145 s.
    
  • Python API

    from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
    import json
    
    asrclient_executor = ASRClientExecutor()
    res = asrclient_executor(
        input="./zh.wav",
        server_ip="127.0.0.1",
        port=8090,
        sample_rate=16000,
        lang="zh_cn",
        audio_format="wav")
    print(res.json())
    

    Output:

    {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
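
The response is a nested dictionary, so the recognized text has to be pulled out of result. The following sketch shows one way to do that; get_transcription is a hypothetical helper, and the sample dictionary is copied from the output shown above rather than fetched from a live server.

```python
def get_transcription(response):
    """Return the transcription from an ASR response dict, or raise on failure."""
    if not response.get("success"):
        raise RuntimeError(f"ASR request failed: {response.get('message')}")
    return response["result"]["transcription"]


# Sample response copied from the output above.
sample = {
    "success": True,
    "code": 200,
    "message": {"description": "success"},
    "result": {"transcription": "我认为跑步最重要的就是给我带来了身体健康"},
}
print(get_transcription(sample))
```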
    

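5. TTS Client Usage

Note: The response time will be slightly longer the first time the client is used.

  • Command Line (Recommended)

    If 127.0.0.1 is not accessible, use the actual IP address of the service.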

    paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
    

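    Usage:

    paddlespeech_client tts --help
    

    Arguments:

    • server_ip: server IP address. Default: 127.0.0.1.
    • port: server port. Default: 8090.
    • input (required): text to be synthesized.
    • spk_id: speaker id, used for multi-speaker speech synthesis. Default: 0.
    • speed: audio speed; the value should be between 0 and 3. Default: 1.0
    • volume: audio volume; the value should be between 0 and 3. Default: 1.0
    • sample_rate: sample rate, one of [0, 8000, 16000]; the default, 0, uses the same sample rate as the model. Default: 0
    • output: path of the output audio. Default: None, which means the audio is not saved locally.

    Output: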

    [2022-02-23 15:20:37,875] [    INFO] - {'description': 'success.'}
    [2022-02-23 15:20:37,875] [    INFO] - Save synthesized audio successfully on output.wav.
    [2022-02-23 15:20:37,875] [    INFO] - Audio duration: 3.612500 s.
    [2022-02-23 15:20:37,875] [    INFO] - Response time: 0.348050 s.
    
  • Python API

    from paddlespeech.server.bin.paddlespeech_client import TTSClientExecutor
    import json
    
    ttsclient_executor = TTSClientExecutor()
    res = ttsclient_executor(
        input="您好,欢迎使用百度飞桨语音合成服务。",
        server_ip="127.0.0.1",
        port=8090,
        spk_id=0,
        speed=1.0,
        volume=1.0,
        sample_rate=0,
        output="./output.wav")
    
    response_dict = res.json()
    print(response_dict["message"])
    print("Save synthesized audio successfully on %s." % (response_dict['result']['save_path']))
    print("Audio duration: %f s." %(response_dict['result']['duration']))
    

    Output:

    {'description': 'success.'}
    Save synthesized audio successfully on ./output.wav.
    Audio duration: 3.612500 s.
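
The ranges documented for speed, volume, and sample_rate can be checked on the client side before a request is sent. The sketch below is one way to do so; validate_tts_args is a hypothetical helper, and since the README only says "between 0 and 3", treating 0 as excluded and 3 as included is an assumption.

```python
ALLOWED_SAMPLE_RATES = (0, 8000, 16000)  # 0 keeps the model's own sample rate


def validate_tts_args(speed=1.0, volume=1.0, sample_rate=0):
    """Raise ValueError if an argument is outside the documented range."""
    # Assumption: 0 is excluded and 3 is included for speed and volume.
    if not 0 < speed <= 3:
        raise ValueError(f"speed must be in (0, 3], got {speed}")
    if not 0 < volume <= 3:
        raise ValueError(f"volume must be in (0, 3], got {volume}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(
            f"sample_rate must be one of {ALLOWED_SAMPLE_RATES}, got {sample_rate}")
    return {"speed": speed, "volume": volume, "sample_rate": sample_rate}


print(validate_tts_args(speed=1.0, volume=1.0, sample_rate=0))
```

The returned dictionary could be unpacked into the TTSClientExecutor call shown above, alongside input, server_ip, port, spk_id, and output.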
    

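6. CLS Client Usage

Note: The response time will be slightly longer the first time the client is used.

  • Command Line (Recommended)

    If 127.0.0.1 is not accessible, use the actual IP address of the service.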

    paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
    

    Usage:

    paddlespeech_client cls --help
    

    Arguments:

    • server_ip: server IP address. Default: 127.0.0.1.
    • port: server port. Default: 8090.
    • input (required): audio file to be classified.
    • topk: return the top-k classification results.

    Output:

    [2022-03-09 20:44:39,974] [    INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
    [2022-03-09 20:44:39,975] [    INFO] - Response time 0.104360 s.
    
  • Python API

    from paddlespeech.server.bin.paddlespeech_client import CLSClientExecutor
    import json
    
    clsclient_executor = CLSClientExecutor()
    res = clsclient_executor(
        input="./zh.wav",
        server_ip="127.0.0.1",
        port=8090,
        topk=1)
    print(res.json())
    

    Output:

    {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
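
When topk is greater than 1, results holds several candidates. The sketch below picks the most probable one from a response of the shape shown above; top_class is a hypothetical helper, and the sample dictionary is copied from the output above.

```python
def top_class(response):
    """Return (class_name, prob) of the most probable entry in a CLS response."""
    results = response["result"]["results"]
    best = max(results, key=lambda item: item["prob"])
    return best["class_name"], best["prob"]


# Sample response copied from the output above.
sample = {
    "success": True,
    "code": 200,
    "message": {"description": "success"},
    "result": {
        "topk": 1,
        "results": [{"class_name": "Speech", "prob": 0.9027184844017029}],
    },
}
print(top_class(sample))
```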
    

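7. Speaker Verification Client Usage

7.1 Extract the Speaker Embedding

Note: The response time will be slightly longer the first time the client is used.

  • Command Line (Recommended)

    If 127.0.0.1 is not accessible, use the actual IP address of the service.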

    paddlespeech_client vector --task spk  --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav
    

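    Usage:

    paddlespeech_client vector --help
    

    Arguments:

    • server_ip: server IP address. Default: 127.0.0.1.
    • port: server port. Default: 8090.
    • input (required): audio file to be recognized.
    • task: the vector task, either spk or score. Default: spk.
    • enroll: enrollment audio.
    • test: test audio.

    Output: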

    [2022-05-25 12:25:36,165] [    INFO] - vector http client start
    [2022-05-25 12:25:36,165] [    INFO] - the input audio: 85236145389.wav
    [2022-05-25 12:25:36,165] [    INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector
    [2022-05-25 12:25:36,166] [    INFO] - http://127.0.0.1:8790/paddlespeech/vector
    [2022-05-25 12:25:36,324] [    INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [-1.3251205682754517, 7.860682487487793, -4.620625972747803, 0.3000721037387848, 2.2648534774780273, -1.1931440830230713, 3.064713716506958, 7.673594951629639, -6.004472732543945, -12.024259567260742, -1.9496068954467773, 3.126953601837158, 1.6188379526138306, -7.638310432434082, -1.2299772500991821, -12.33833122253418, 2.1373026371002197, -5.395712375640869, 9.717328071594238, 5.675230503082275, 3.7805123329162598, 3.0597171783447266, 3.429692029953003, 8.9760103225708, 13.174124717712402, -0.5313228368759155, 8.942471504211426, 4.465109825134277, -4.426247596740723, -9.726503372192383, 8.399328231811523, 7.223917484283447, -7.435853958129883, 2.9441683292388916, -4.343039512634277, -13.886964797973633, -1.6346734762191772, -10.902740478515625, -5.311244964599609, 3.800722122192383, 3.897603750228882, -2.123077392578125, -2.3521194458007812, 4.151031017303467, -7.404866695404053, 0.13911646604537964, 2.4626107215881348, 4.96645450592041, 0.9897574186325073, 5.483975410461426, -3.3574001789093018, 10.13400650024414, -0.6120170950889587, -10.403095245361328, 4.600754261016846, 16.009349822998047, -7.78369140625, -4.194530487060547, -6.93686056137085, 1.1789555549621582, 11.490800857543945, 4.23802375793457, 9.550930976867676, 8.375045776367188, 7.508914470672607, -0.6570729613304138, -0.3005157709121704, 2.8406054973602295, 3.0828027725219727, 0.7308170199394226, 6.1483540534973145, 0.1376611888408661, -13.424735069274902, -7.746140480041504, -2.322798252105713, -8.305252075195312, 2.98791241645813, -10.99522876739502, 0.15211068093776703, -2.3820347785949707, -1.7984174489974976, 8.49562931060791, -5.852236747741699, -3.755497932434082, 0.6989710927009583, -5.270299434661865, -2.6188621520996094, -1.8828465938568115, -4.6466498374938965, 14.078543663024902, -0.5495333075523376, 10.579157829284668, -3.216050148010254, 
9.349003791809082, -4.381077766418457, -11.675816535949707, -2.863020658493042, 4.5721755027771, 2.246612071990967, -4.574341773986816, 1.8610187768936157, 2.3767874240875244, 5.625787734985352, -9.784077644348145, 0.6496725678443909, -1.457950472831726, 0.4263263940811157, -4.921126365661621, -2.4547839164733887, 3.4869801998138428, -0.4265422224998474, 8.341268539428711, 1.356552004814148, 7.096688270568848, -13.102828979492188, 8.01673412322998, -7.115934371948242, 1.8699780702590942, 0.20872099697589874, 14.699383735656738, -1.0252779722213745, -2.6107232570648193, -2.5082311630249023, 8.427192687988281, 6.913852691650391, -6.29124641418457, 0.6157366037368774, 2.489687919616699, -3.4668266773223877, 9.92176342010498, 11.200815200805664, -0.19664029777050018, 7.491600513458252, -0.6231271624565125, -0.2584814429283142, -9.947997093200684, -0.9611040949821472, 1.1649218797683716, -2.1907122135162354, -1.502848744392395, -0.5192610621452332, 15.165953636169434, 2.4649462699890137, -0.998044490814209, 7.44166374206543, -2.0768048763275146, 3.5896823406219482, -7.305543422698975, -7.562084674835205, 4.32333517074585, 0.08044180274009705, -6.564010143280029, -2.314805269241333, -1.7642345428466797, -2.470881700515747, -7.6756181716918945, -9.548877716064453, -1.017755389213562, 0.1698644608259201, 2.5877134799957275, -1.8752295970916748, -0.36614322662353516, -6.049378395080566, -2.3965611457824707, -5.945338726043701, 0.9424033164978027, -13.155974388122559, -7.45780086517334, 0.14658108353614807, -3.7427968978881836, 5.841492652893066, -1.2872905731201172, 5.569431304931641, 12.570590019226074, 1.0939218997955322, 2.2142086029052734, 1.9181575775146484, 6.991420745849609, -5.888138771057129, 3.1409823894500732, -2.0036280155181885, 2.4434285163879395, 9.973138809204102, 5.036680221557617, 2.005120277404785, 2.861560344696045, 5.860223770141602, 2.917618751525879, -1.63111412525177, 2.0292205810546875, -4.070415019989014, -6.831437110900879]}}
    [2022-05-25 12:25:36,324] [    INFO] - Response time 0.159053 s.
    
  • Python API

    from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
    
    vectorclient_executor = VectorClientExecutor()
    res = vectorclient_executor(
        input="85236145389.wav",
        server_ip="127.0.0.1",
        port=8090,
        task="spk")
    print(res)
    

    Output:

    {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [-1.3251205682754517, 7.860682487487793, -4.620625972747803, 0.3000721037387848, 2.2648534774780273, -1.1931440830230713, 3.064713716506958, 7.673594951629639, -6.004472732543945, -12.024259567260742, -1.9496068954467773, 3.126953601837158, 1.6188379526138306, -7.638310432434082, -1.2299772500991821, -12.33833122253418, 2.1373026371002197, -5.395712375640869, 9.717328071594238, 5.675230503082275, 3.7805123329162598, 3.0597171783447266, 3.429692029953003, 8.9760103225708, 13.174124717712402, -0.5313228368759155, 8.942471504211426, 4.465109825134277, -4.426247596740723, -9.726503372192383, 8.399328231811523, 7.223917484283447, -7.435853958129883, 2.9441683292388916, -4.343039512634277, -13.886964797973633, -1.6346734762191772, -10.902740478515625, -5.311244964599609, 3.800722122192383, 3.897603750228882, -2.123077392578125, -2.3521194458007812, 4.151031017303467, -7.404866695404053, 0.13911646604537964, 2.4626107215881348, 4.96645450592041, 0.9897574186325073, 5.483975410461426, -3.3574001789093018, 10.13400650024414, -0.6120170950889587, -10.403095245361328, 4.600754261016846, 16.009349822998047, -7.78369140625, -4.194530487060547, -6.93686056137085, 1.1789555549621582, 11.490800857543945, 4.23802375793457, 9.550930976867676, 8.375045776367188, 7.508914470672607, -0.6570729613304138, -0.3005157709121704, 2.8406054973602295, 3.0828027725219727, 0.7308170199394226, 6.1483540534973145, 0.1376611888408661, -13.424735069274902, -7.746140480041504, -2.322798252105713, -8.305252075195312, 2.98791241645813, -10.99522876739502, 0.15211068093776703, -2.3820347785949707, -1.7984174489974976, 8.49562931060791, -5.852236747741699, -3.755497932434082, 0.6989710927009583, -5.270299434661865, -2.6188621520996094, -1.8828465938568115, -4.6466498374938965, 14.078543663024902, -0.5495333075523376, 10.579157829284668, -3.216050148010254, 9.349003791809082, -4.381077766418457, 
-11.675816535949707, -2.863020658493042, 4.5721755027771, 2.246612071990967, -4.574341773986816, 1.8610187768936157, 2.3767874240875244, 5.625787734985352, -9.784077644348145, 0.6496725678443909, -1.457950472831726, 0.4263263940811157, -4.921126365661621, -2.4547839164733887, 3.4869801998138428, -0.4265422224998474, 8.341268539428711, 1.356552004814148, 7.096688270568848, -13.102828979492188, 8.01673412322998, -7.115934371948242, 1.8699780702590942, 0.20872099697589874, 14.699383735656738, -1.0252779722213745, -2.6107232570648193, -2.5082311630249023, 8.427192687988281, 6.913852691650391, -6.29124641418457, 0.6157366037368774, 2.489687919616699, -3.4668266773223877, 9.92176342010498, 11.200815200805664, -0.19664029777050018, 7.491600513458252, -0.6231271624565125, -0.2584814429283142, -9.947997093200684, -0.9611040949821472, 1.1649218797683716, -2.1907122135162354, -1.502848744392395, -0.5192610621452332, 15.165953636169434, 2.4649462699890137, -0.998044490814209, 7.44166374206543, -2.0768048763275146, 3.5896823406219482, -7.305543422698975, -7.562084674835205, 4.32333517074585, 0.08044180274009705, -6.564010143280029, -2.314805269241333, -1.7642345428466797, -2.470881700515747, -7.6756181716918945, -9.548877716064453, -1.017755389213562, 0.1698644608259201, 2.5877134799957275, -1.8752295970916748, -0.36614322662353516, -6.049378395080566, -2.3965611457824707, -5.945338726043701, 0.9424033164978027, -13.155974388122559, -7.45780086517334, 0.14658108353614807, -3.7427968978881836, 5.841492652893066, -1.2872905731201172, 5.569431304931641, 12.570590019226074, 1.0939218997955322, 2.2142086029052734, 1.9181575775146484, 6.991420745849609, -5.888138771057129, 3.1409823894500732, -2.0036280155181885, 2.4434285163879395, 9.973138809204102, 5.036680221557617, 2.005120277404785, 2.861560344696045, 5.860223770141602, 2.917618751525879, -1.63111412525177, 2.0292205810546875, -4.070415019989014, -6.831437110900879]}}
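
Embeddings returned in result['vec'] can also be compared on the client side. The sketch below computes cosine similarity in plain Python. Cosine similarity is a common choice for comparing speaker embeddings, but whether the server's score task uses exactly this measure is an assumption, and cosine_similarity is a hypothetical helper.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Return the cosine similarity of two equal-length, non-zero vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)
```

Given two responses res_a and res_b from the spk task, the call would be cosine_similarity(res_a['result']['vec'], res_b['result']['vec']).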
    

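7.2 Score the Speaker Embeddings of Two Audio Files

Note: The response time will be slightly longer the first time the client is used.

  • Command Line (Recommended)

    If 127.0.0.1 is not accessible, use the actual IP address of the service.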

    paddlespeech_client vector --task score  --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav
    

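    Usage:

    paddlespeech_client vector --help
    

    Arguments:

    • server_ip: server IP address. Default: 127.0.0.1.
    • port: server port. Default: 8090.
    • input (required): audio file to be recognized.
    • task: the vector task, either spk or score. Default: spk.
    • enroll: enrollment audio.
    • test: test audio.

    Output: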

    [2022-05-25 12:33:24,527] [    INFO] - vector score http client start
    [2022-05-25 12:33:24,527] [    INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
    [2022-05-25 12:33:24,528] [    INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector/score
    [2022-05-25 12:33:24,695] [    INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
    [2022-05-25 12:33:24,696] [    INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
    [2022-05-25 12:33:24,696] [    INFO] - Response time 0.168271 s.
    
  • Python API

    from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor
    
    vectorclient_executor = VectorClientExecutor()
    res = vectorclient_executor(
        input=None,
        enroll_audio="85236145389.wav",
        test_audio="123456789.wav",
        server_ip="127.0.0.1",
        port=8090,
        task="score")
    print(res)
    

    Output:

    [2022-05-25 12:30:14,143] [    INFO] - vector score http client start
    [2022-05-25 12:30:14,143] [    INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
    [2022-05-25 12:30:14,143] [    INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector/score
    [2022-05-25 12:30:14,363] [    INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
    {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
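
To turn the score into an accept or reject decision, it is usually compared against a threshold. The sketch below shows the idea; is_same_speaker is a hypothetical helper, and the threshold has no default because this README does not state one. The value 0.5 in the example is arbitrary and would need tuning on real data.

```python
def is_same_speaker(response, threshold):
    """Return True if the score in a vector score response reaches threshold."""
    if not response.get("success"):
        raise RuntimeError(f"score request failed: {response.get('message')}")
    return response["result"]["score"] >= threshold


# Sample response copied from the output above.
sample = {
    "success": True,
    "code": 200,
    "message": {"description": "success"},
    "result": {"score": 0.45332613587379456},
}
# 0.5 is an arbitrary threshold chosen only for illustration.
print(is_same_speaker(sample, threshold=0.5))
```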
    

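8. Punctuation Prediction

Note: The response time will be slightly longer the first time the client is used.

  • Command Line (Recommended)

    If 127.0.0.1 is not accessible, use the actual IP address of the service.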

    paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康"
    

    Usage:

    paddlespeech_client text --help
    

    Arguments:

    • server_ip: server IP address. Default: 127.0.0.1.
    • port: server port. Default: 8090.
    • input (required): text to be punctuated.

    Output:

    [2022-05-09 18:19:04,397] [    INFO] - The punc text: 我认为跑步最重要的就是给我带来了身体健康。
    [2022-05-09 18:19:04,397] [    INFO] - Response time 0.092407 s.
    
  • Python API

    from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor
    
    textclient_executor = TextClientExecutor()
    res = textclient_executor(
        input="我认为跑步最重要的就是给我带来了身体健康",
        server_ip="127.0.0.1",
        port=8090,)
    print(res)
    

    Output:

    我认为跑步最重要的就是给我带来了身体健康。
    

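Models Supported by the Service

ASR Models

Run paddlespeech_server stats --task asr to list all models supported by the ASR service; the static models among them can be used for Paddle Inference.

TTS Models

Run paddlespeech_server stats --task tts to list all models supported by the TTS service; the static models among them can be used for Paddle Inference.

CLS Models

Run paddlespeech_server stats --task cls to list all models supported by the CLS service; the static models among them can be used for Paddle Inference.

Vector Models

Run paddlespeech_server stats --task vector to list all models supported by the Vector service.

Text Models

Run paddlespeech_server stats --task text to list all models supported by the Text service.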
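
The five stats commands differ only in the task name, so they can be generated in a loop. The sketch below builds and prints the commands without running them; stats_command is a hypothetical helper, and the task list is the one given in this README.

```python
import shlex

TASKS = ["asr", "tts", "cls", "vector", "text"]


def stats_command(task):
    """Return the argument list for 'paddlespeech_server stats' for one task."""
    return ["paddlespeech_server", "stats", "--task", task]


for task in TASKS:
    # shlex.join renders the list as a copy-pasteable shell command.
    print(shlex.join(stats_command(task)))
```

On a machine with PaddleSpeech installed, each list can be passed to subprocess.run(cmd, check=True) to execute the command.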