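Speech Server
Introduction
This demo shows how to start an offline speech service and how to access it. Both can be done with a single command, using paddlespeech_server and paddlespeech_client, or with a few lines of Python code.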
Usage
1. Installation
See the installation documentation.
paddlepaddle 2.2.2 or later is recommended. You can install PaddleSpeech in one of two ways: medium or hard.
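2. Prepare the config file
For the configuration file, see conf/application.yaml.
In it, engine_list lists the speech engines that the service will include when it starts. Each entry has the format <speech task>_<engine type>.
The speech tasks currently integrated in the service are: asr (speech recognition), tts (speech synthesis), cls (audio classification), vector (speaker recognition), and text (text processing).
Two engine types are currently supported: python and inference (Paddle Inference).
Note: If the service starts normally inside a container but the client cannot reach its IP, try replacing the host address in the configuration file with the local IP address.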
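As a small illustration of the <speech task>_<engine type> naming, the sketch below splits an engine_list entry into its two parts and checks them against the tasks and engine types listed above. The helper name parse_engine_name and the sample entries are hypothetical; they are not part of PaddleSpeech.

```python
from typing import Tuple

# Tasks and engine types as listed in this README.
TASKS = {"asr", "tts", "cls", "vector", "text"}
ENGINE_TYPES = {"python", "inference"}


def parse_engine_name(entry: str) -> Tuple[str, str]:
    """Split an entry such as 'asr_python' into (task, engine_type).

    Hypothetical helper for illustration only.
    """
    task, sep, engine_type = entry.partition("_")
    if not sep or task not in TASKS or engine_type not in ENGINE_TYPES:
        raise ValueError(f"unexpected engine_list entry: {entry!r}")
    return task, engine_type


# Sample entries chosen for illustration.
for entry in ["asr_python", "tts_inference"]:
    print(entry, "->", parse_engine_name(entry))
```

A check like this can catch a mistyped entry before the server is started, though the server performs its own validation.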
The input of the ASR client is a WAV file (.wav), and its sample rate must match the sample rate of the model.
Sample audio files for the ASR client can be downloaded with:
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
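Because the sample rate has to match the model, it can help to inspect a file before sending it. The following is a minimal sketch using Python's standard wave module, which reads PCM WAV headers. The function names are hypothetical, and 16000 is only the client's documented default; the rate actually required depends on the model configured on the server.

```python
import wave


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) stored in the WAV header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path: str, model_sample_rate: int = 16000) -> bool:
    # Assumption: 16000 Hz, the documented default of the ASR client.
    return wav_sample_rate(path) == model_sample_rate
```

For example, matches_model_rate("./zh.wav") returns True only if the header of zh.wav reports 16000 Hz.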
3. Server usage
- Command line (recommended)
# Start the service
paddlespeech_server start --config_file ./conf/application.yaml
Usage:
paddlespeech_server start --help
Arguments:
- config_file: Configuration file of the service. Default: ./conf/application.yaml
- log_file: Log file. Default: ./log/paddlespeech.log
Output:
[2022-02-23 11:17:32] [INFO] [server.py:64] Started server process [6384]
INFO: Waiting for application startup.
[2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete.
[2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
- Python API
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor

# Start the service with the given configuration and log file.
server_executor = ServerExecutor()
server_executor(
    config_file="./conf/application.yaml",
    log_file="./log/paddlespeech.log")
Output:
INFO: Started server process [529]
[2022-02-23 14:57:56] [INFO] [server.py:64] Started server process [529]
INFO: Waiting for application startup.
[2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete.
[2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
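Before running a client, it can be useful to confirm that the server port is reachable from the client machine, for instance when the service runs in a container. The sketch below uses only Python's standard socket module. The function name is_port_open is hypothetical, and the check only confirms that something accepts TCP connections on that port, not that it is the speech service.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

With the default configuration shown above, is_port_open("127.0.0.1", 8090) would be the call to try first; if it returns False, try the actual IP address of the server.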
4. ASR client usage
Note: The response time is slightly longer the first time the client is used.
- Command line (recommended)
If 127.0.0.1 is not accessible, use the actual IP address of the server.
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
Help:
paddlespeech_client asr --help
Arguments:
- server_ip: Server IP address. Default: 127.0.0.1.
- port: Server port. Default: 8090.
- input (required): Audio file to be recognized.
- sample_rate: Audio sample rate. Default: 16000.
- lang: Model language. Default: zh_cn.
- audio_format: Audio format. Default: wav.
Output:
[2022-02-23 18:11:22,819] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
[2022-02-23 18:11:22,820] [ INFO] - time cost 0.689145 s.
- Python API
from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
import json

# Send ./zh.wav to the ASR service and print the JSON response.
asrclient_executor = ASRClientExecutor()
res = asrclient_executor(
    input="./zh.wav",
    server_ip="127.0.0.1",
    port=8090,
    sample_rate=16000,
    lang="zh_cn",
    audio_format="wav")
print(res.json())
Output:
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
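The response is a nested dictionary, so client code usually wants only the transcription. A minimal sketch, assuming the response always has the shape shown in the output above; the helper name get_transcription is hypothetical, and how the service reports errors is not documented here, so the failure branch is a guess.

```python
def get_transcription(response: dict) -> str:
    """Return the transcription, or raise if the request did not succeed."""
    if not response.get("success") or response.get("code") != 200:
        # Assumption: failed requests carry a description under "message".
        description = response.get("message", {}).get("description", "unknown")
        raise RuntimeError(f"ASR request failed: {description}")
    return response["result"]["transcription"]


# Sample response copied from the output above.
sample = {
    "success": True,
    "code": 200,
    "message": {"description": "success"},
    "result": {"transcription": "我认为跑步最重要的就是给我带来了身体健康"},
}
print(get_transcription(sample))
```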
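5. TTS client usage
Note: The response time is slightly longer the first time the client is used.
- Command line (recommended)
If 127.0.0.1 is not accessible, use the actual IP address of the server.
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
Help:
paddlespeech_client tts --help
Arguments:
- server_ip: Server IP address. Default: 127.0.0.1.
- port: Server port. Default: 8090.
- input (required): Text to be synthesized.
- spk_id: Speaker id, used for multi-speaker speech synthesis. Default: 0.
- speed: Audio speed. The value should be between 0 and 3. Default: 1.0.
- volume: Audio volume. The value should be between 0 and 3. Default: 1.0.
- sample_rate: Sample rate, one of [0, 8000, 16000]. Default: 0, which means the same as the model.
- output: Path of the output audio. Default: None, which means the audio is not saved locally.
Output:
[2022-02-23 15:20:37,875] [ INFO] - {'description': 'success.'}
[2022-02-23 15:20:37,875] [ INFO] - Save synthesized audio successfully on output.wav.
[2022-02-23 15:20:37,875] [ INFO] - Audio duration: 3.612500 s.
[2022-02-23 15:20:37,875] [ INFO] - Response time: 0.348050 s.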
- Python API
from paddlespeech.server.bin.paddlespeech_client import TTSClientExecutor
import json

# Synthesize the input text and save the audio to ./output.wav.
ttsclient_executor = TTSClientExecutor()
res = ttsclient_executor(
    input="您好,欢迎使用百度飞桨语音合成服务。",
    server_ip="127.0.0.1",
    port=8090,
    spk_id=0,
    speed=1.0,
    volume=1.0,
    sample_rate=0,
    output="./output.wav")

response_dict = res.json()
print(response_dict["message"])
print("Save synthesized audio successfully on %s." % (response_dict['result']['save_path']))
print("Audio duration: %f s." % (response_dict['result']['duration']))
Output:
{'description': 'success.'}
Save synthesized audio successfully on ./output.wav.
Audio duration: 3.612500 s.
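The ranges documented for speed, volume, and sample_rate can be checked on the client side before a request is sent. The sketch below is one way to do that. The function name check_tts_params is hypothetical, and since the text only says "between 0 and 3", treating 0 as excluded and 3 as included is an assumption; the server may draw the boundaries differently.

```python
ALLOWED_SAMPLE_RATES = (0, 8000, 16000)  # 0 means "same as the model"


def check_tts_params(speed: float = 1.0,
                     volume: float = 1.0,
                     sample_rate: int = 0) -> None:
    """Raise ValueError if a value falls outside the documented range."""
    # Assumption: the interval is (0, 3]; the README does not say
    # whether the endpoints are allowed.
    if not 0 < speed <= 3:
        raise ValueError(f"speed must be between 0 and 3, got {speed}")
    if not 0 < volume <= 3:
        raise ValueError(f"volume must be between 0 and 3, got {volume}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(
            f"sample_rate must be one of {ALLOWED_SAMPLE_RATES}, got {sample_rate}")


check_tts_params(speed=1.0, volume=1.0, sample_rate=0)  # passes silently
```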
6. CLS client usage
Note: The response time is slightly longer the first time the client is used.
- Command line (recommended)
If 127.0.0.1 is not accessible, use the actual IP address of the server.
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
Help:
paddlespeech_client cls --help
Arguments:
- server_ip: Server IP address. Default: 127.0.0.1.
- port: Server port. Default: 8090.
- input (required): Audio file to be classified.
- topk: Number of top-scoring classes to return in the classification result.
Output:
[2022-03-09 20:44:39,974] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
[2022-03-09 20:44:39,975] [ INFO] - Response time 0.104360 s.
- Python API
from paddlespeech.server.bin.paddlespeech_client import CLSClientExecutor
import json

# Classify ./zh.wav and print the JSON response with the top-1 class.
clsclient_executor = CLSClientExecutor()
res = clsclient_executor(
    input="./zh.wav",
    server_ip="127.0.0.1",
    port=8090,
    topk=1)
print(res.json())
Output:
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
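When topk is greater than 1, the results list holds several candidates. The following sketch picks the most probable one, assuming the response keeps the shape shown above. The helper name top_class is hypothetical, and the second entry in the sample is made up purely to illustrate the selection.

```python
from typing import Tuple


def top_class(response: dict) -> Tuple[str, float]:
    """Return (class_name, prob) of the most probable result."""
    results = response["result"]["results"]
    if not results:
        raise ValueError("response contains no classification results")
    best = max(results, key=lambda item: item["prob"])
    return best["class_name"], best["prob"]


# The first entry is copied from the output above; the second is invented.
sample = {
    "success": True,
    "code": 200,
    "message": {"description": "success"},
    "result": {
        "topk": 2,
        "results": [
            {"class_name": "Speech", "prob": 0.9027184844017029},
            {"class_name": "Hypothetical class", "prob": 0.05},
        ],
    },
}
print(top_class(sample))
```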
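7. Speaker verification client usage
7.1 Extract a speaker embedding
Note: The response time is slightly longer the first time the client is used.
- Command line (recommended)
If 127.0.0.1 is not accessible, use the actual IP address of the server.
paddlespeech_client vector --task spk --server_ip 127.0.0.1 --port 8090 --input 85236145389.wav
Help:
paddlespeech_client vector --help
Arguments:
- server_ip: Server IP address. Default: 127.0.0.1.
- port: Server port. Default: 8090.
- input (required): Audio file to be recognized.
- task: Task of the vector client, either spk or score. Default: spk.
- enroll: Enrollment audio.
- test: Test audio.
Output: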
[2022-05-25 12:25:36,165] [ INFO] - vector http client start [2022-05-25 12:25:36,165] [ INFO] - the input audio: 85236145389.wav [2022-05-25 12:25:36,165] [ INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector [2022-05-25 12:25:36,166] [ INFO] - http://127.0.0.1:8790/paddlespeech/vector [2022-05-25 12:25:36,324] [ INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [-1.3251205682754517, 7.860682487487793, -4.620625972747803, 0.3000721037387848, 2.2648534774780273, -1.1931440830230713, 3.064713716506958, 7.673594951629639, -6.004472732543945, -12.024259567260742, -1.9496068954467773, 3.126953601837158, 1.6188379526138306, -7.638310432434082, -1.2299772500991821, -12.33833122253418, 2.1373026371002197, -5.395712375640869, 9.717328071594238, 5.675230503082275, 3.7805123329162598, 3.0597171783447266, 3.429692029953003, 8.9760103225708, 13.174124717712402, -0.5313228368759155, 8.942471504211426, 4.465109825134277, -4.426247596740723, -9.726503372192383, 8.399328231811523, 7.223917484283447, -7.435853958129883, 2.9441683292388916, -4.343039512634277, -13.886964797973633, -1.6346734762191772, -10.902740478515625, -5.311244964599609, 3.800722122192383, 3.897603750228882, -2.123077392578125, -2.3521194458007812, 4.151031017303467, -7.404866695404053, 0.13911646604537964, 2.4626107215881348, 4.96645450592041, 0.9897574186325073, 5.483975410461426, -3.3574001789093018, 10.13400650024414, -0.6120170950889587, -10.403095245361328, 4.600754261016846, 16.009349822998047, -7.78369140625, -4.194530487060547, -6.93686056137085, 1.1789555549621582, 11.490800857543945, 4.23802375793457, 9.550930976867676, 8.375045776367188, 7.508914470672607, -0.6570729613304138, -0.3005157709121704, 2.8406054973602295, 3.0828027725219727, 0.7308170199394226, 6.1483540534973145, 0.1376611888408661, -13.424735069274902, -7.746140480041504, -2.322798252105713, -8.305252075195312, 2.98791241645813, -10.99522876739502, 0.15211068093776703, 
-2.3820347785949707, -1.7984174489974976, 8.49562931060791, -5.852236747741699, -3.755497932434082, 0.6989710927009583, -5.270299434661865, -2.6188621520996094, -1.8828465938568115, -4.6466498374938965, 14.078543663024902, -0.5495333075523376, 10.579157829284668, -3.216050148010254, 9.349003791809082, -4.381077766418457, -11.675816535949707, -2.863020658493042, 4.5721755027771, 2.246612071990967, -4.574341773986816, 1.8610187768936157, 2.3767874240875244, 5.625787734985352, -9.784077644348145, 0.6496725678443909, -1.457950472831726, 0.4263263940811157, -4.921126365661621, -2.4547839164733887, 3.4869801998138428, -0.4265422224998474, 8.341268539428711, 1.356552004814148, 7.096688270568848, -13.102828979492188, 8.01673412322998, -7.115934371948242, 1.8699780702590942, 0.20872099697589874, 14.699383735656738, -1.0252779722213745, -2.6107232570648193, -2.5082311630249023, 8.427192687988281, 6.913852691650391, -6.29124641418457, 0.6157366037368774, 2.489687919616699, -3.4668266773223877, 9.92176342010498, 11.200815200805664, -0.19664029777050018, 7.491600513458252, -0.6231271624565125, -0.2584814429283142, -9.947997093200684, -0.9611040949821472, 1.1649218797683716, -2.1907122135162354, -1.502848744392395, -0.5192610621452332, 15.165953636169434, 2.4649462699890137, -0.998044490814209, 7.44166374206543, -2.0768048763275146, 3.5896823406219482, -7.305543422698975, -7.562084674835205, 4.32333517074585, 0.08044180274009705, -6.564010143280029, -2.314805269241333, -1.7642345428466797, -2.470881700515747, -7.6756181716918945, -9.548877716064453, -1.017755389213562, 0.1698644608259201, 2.5877134799957275, -1.8752295970916748, -0.36614322662353516, -6.049378395080566, -2.3965611457824707, -5.945338726043701, 0.9424033164978027, -13.155974388122559, -7.45780086517334, 0.14658108353614807, -3.7427968978881836, 5.841492652893066, -1.2872905731201172, 5.569431304931641, 12.570590019226074, 1.0939218997955322, 2.2142086029052734, 1.9181575775146484, 6.991420745849609, 
-5.888138771057129, 3.1409823894500732, -2.0036280155181885, 2.4434285163879395, 9.973138809204102, 5.036680221557617, 2.005120277404785, 2.861560344696045, 5.860223770141602, 2.917618751525879, -1.63111412525177, 2.0292205810546875, -4.070415019989014, -6.831437110900879]}} [2022-05-25 12:25:36,324] [ INFO] - Response time 0.159053 s.
- Python API
from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor

# Extract the speaker embedding of the input audio.
vectorclient_executor = VectorClientExecutor()
res = vectorclient_executor(
    input="85236145389.wav",
    server_ip="127.0.0.1",
    port=8090,
    task="spk")
print(res)
Output:
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'vec': [-1.3251205682754517, 7.860682487487793, -4.620625972747803, 0.3000721037387848, 2.2648534774780273, -1.1931440830230713, 3.064713716506958, 7.673594951629639, -6.004472732543945, -12.024259567260742, -1.9496068954467773, 3.126953601837158, 1.6188379526138306, -7.638310432434082, -1.2299772500991821, -12.33833122253418, 2.1373026371002197, -5.395712375640869, 9.717328071594238, 5.675230503082275, 3.7805123329162598, 3.0597171783447266, 3.429692029953003, 8.9760103225708, 13.174124717712402, -0.5313228368759155, 8.942471504211426, 4.465109825134277, -4.426247596740723, -9.726503372192383, 8.399328231811523, 7.223917484283447, -7.435853958129883, 2.9441683292388916, -4.343039512634277, -13.886964797973633, -1.6346734762191772, -10.902740478515625, -5.311244964599609, 3.800722122192383, 3.897603750228882, -2.123077392578125, -2.3521194458007812, 4.151031017303467, -7.404866695404053, 0.13911646604537964, 2.4626107215881348, 4.96645450592041, 0.9897574186325073, 5.483975410461426, -3.3574001789093018, 10.13400650024414, -0.6120170950889587, -10.403095245361328, 4.600754261016846, 16.009349822998047, -7.78369140625, -4.194530487060547, -6.93686056137085, 1.1789555549621582, 11.490800857543945, 4.23802375793457, 9.550930976867676, 8.375045776367188, 7.508914470672607, -0.6570729613304138, -0.3005157709121704, 2.8406054973602295, 3.0828027725219727, 0.7308170199394226, 6.1483540534973145, 0.1376611888408661, -13.424735069274902, -7.746140480041504, -2.322798252105713, -8.305252075195312, 2.98791241645813, -10.99522876739502, 0.15211068093776703, -2.3820347785949707, -1.7984174489974976, 8.49562931060791, -5.852236747741699, -3.755497932434082, 0.6989710927009583, -5.270299434661865, -2.6188621520996094, -1.8828465938568115, -4.6466498374938965, 14.078543663024902, -0.5495333075523376, 10.579157829284668, -3.216050148010254, 9.349003791809082, -4.381077766418457, -11.675816535949707, 
-2.863020658493042, 4.5721755027771, 2.246612071990967, -4.574341773986816, 1.8610187768936157, 2.3767874240875244, 5.625787734985352, -9.784077644348145, 0.6496725678443909, -1.457950472831726, 0.4263263940811157, -4.921126365661621, -2.4547839164733887, 3.4869801998138428, -0.4265422224998474, 8.341268539428711, 1.356552004814148, 7.096688270568848, -13.102828979492188, 8.01673412322998, -7.115934371948242, 1.8699780702590942, 0.20872099697589874, 14.699383735656738, -1.0252779722213745, -2.6107232570648193, -2.5082311630249023, 8.427192687988281, 6.913852691650391, -6.29124641418457, 0.6157366037368774, 2.489687919616699, -3.4668266773223877, 9.92176342010498, 11.200815200805664, -0.19664029777050018, 7.491600513458252, -0.6231271624565125, -0.2584814429283142, -9.947997093200684, -0.9611040949821472, 1.1649218797683716, -2.1907122135162354, -1.502848744392395, -0.5192610621452332, 15.165953636169434, 2.4649462699890137, -0.998044490814209, 7.44166374206543, -2.0768048763275146, 3.5896823406219482, -7.305543422698975, -7.562084674835205, 4.32333517074585, 0.08044180274009705, -6.564010143280029, -2.314805269241333, -1.7642345428466797, -2.470881700515747, -7.6756181716918945, -9.548877716064453, -1.017755389213562, 0.1698644608259201, 2.5877134799957275, -1.8752295970916748, -0.36614322662353516, -6.049378395080566, -2.3965611457824707, -5.945338726043701, 0.9424033164978027, -13.155974388122559, -7.45780086517334, 0.14658108353614807, -3.7427968978881836, 5.841492652893066, -1.2872905731201172, 5.569431304931641, 12.570590019226074, 1.0939218997955322, 2.2142086029052734, 1.9181575775146484, 6.991420745849609, -5.888138771057129, 3.1409823894500732, -2.0036280155181885, 2.4434285163879395, 9.973138809204102, 5.036680221557617, 2.005120277404785, 2.861560344696045, 5.860223770141602, 2.917618751525879, -1.63111412525177, 2.0292205810546875, -4.070415019989014, -6.831437110900879]}}
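The embedding comes back as a plain list of floats under result.vec. As a sketch of how client code might consume it, the helpers below pull the vector out of the response and scale it to unit length, a common preprocessing step before comparing embeddings. Both function names are hypothetical, and the two-element vector is a stand-in for the real, much longer embedding.

```python
import math
from typing import List


def embedding_from_response(response: dict) -> List[float]:
    """Return the embedding stored under result.vec."""
    if not response.get("success"):
        raise RuntimeError("vector request did not succeed")
    return [float(x) for x in response["result"]["vec"]]


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


# Toy response with a two-element vector, for illustration only.
sample = {"success": True, "code": 200,
          "message": {"description": "success"},
          "result": {"vec": [3.0, 4.0]}}
unit = l2_normalize(embedding_from_response(sample))
print(len(unit), unit)
```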
7.2 Score two audio files by speaker similarity
Note: The response time is slightly longer the first time the client is used.
- Command line (recommended)
If 127.0.0.1 is not accessible, use the actual IP address of the server.
paddlespeech_client vector --task score --server_ip 127.0.0.1 --port 8090 --enroll 85236145389.wav --test 123456789.wav
Help:
paddlespeech_client vector --help
Arguments:
- server_ip: Server IP address. Default: 127.0.0.1.
- port: Server port. Default: 8090.
- input (required): Audio file to be recognized.
- task: Task of the vector client, either spk or score. Default: spk.
- enroll: Enrollment audio.
- test: Test audio.
Output:
[2022-05-25 12:33:24,527] [ INFO] - vector score http client start
[2022-05-25 12:33:24,527] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
[2022-05-25 12:33:24,528] [ INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector/score
[2022-05-25 12:33:24,695] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
[2022-05-25 12:33:24,696] [ INFO] - The vector: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
[2022-05-25 12:33:24,696] [ INFO] - Response time 0.168271 s.
- Python API
from paddlespeech.server.bin.paddlespeech_client import VectorClientExecutor

# Score the test audio against the enrollment audio.
vectorclient_executor = VectorClientExecutor()
res = vectorclient_executor(
    input=None,
    enroll_audio="85236145389.wav",
    test_audio="123456789.wav",
    server_ip="127.0.0.1",
    port=8090,
    task="score")
print(res)
Output:
[2022-05-25 12:30:14,143] [ INFO] - vector score http client start
[2022-05-25 12:30:14,143] [ INFO] - enroll audio: 85236145389.wav, test audio: 123456789.wav
[2022-05-25 12:30:14,143] [ INFO] - endpoint: http://127.0.0.1:8790/paddlespeech/vector/score
[2022-05-25 12:30:14,363] [ INFO] - The vector score is: {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'score': 0.45332613587379456}}
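This README does not say how the score is computed or what threshold separates same-speaker from different-speaker pairs. Cosine similarity is a common way to compare two speaker embeddings, so the sketch below assumes it for illustration; it is not confirmed to be what the service does. The function names and the 0.5 threshold are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_speaker(score: float, threshold: float = 0.5) -> bool:
    # Hypothetical threshold; a real one should be tuned on labeled data.
    return score >= threshold


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
print(same_speaker(0.45332613587379456))
```

Under this assumed threshold, the score from the output above would be treated as a different-speaker pair; a different threshold could flip that decision.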
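8. Punctuation prediction
Note: The response time is slightly longer the first time the client is used.
- Command line (recommended)
If 127.0.0.1 is not accessible, use the actual IP address of the server.
paddlespeech_client text --server_ip 127.0.0.1 --port 8090 --input "我认为跑步最重要的就是给我带来了身体健康"
Help:
paddlespeech_client text --help
Arguments:
- server_ip: Server IP address. Default: 127.0.0.1.
- port: Server port. Default: 8090.
- input (required): Text for which punctuation is to be predicted.
Output:
[2022-05-09 18:19:04,397] [ INFO] - The punc text: 我认为跑步最重要的就是给我带来了身体健康。
[2022-05-09 18:19:04,397] [ INFO] - Response time 0.092407 s.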
- Python API
from paddlespeech.server.bin.paddlespeech_client import TextClientExecutor

# Restore punctuation for the input text.
textclient_executor = TextClientExecutor()
res = textclient_executor(
    input="我认为跑步最重要的就是给我带来了身体健康",
    server_ip="127.0.0.1",
    port=8090,)
print(res)
Output:
我认为跑步最重要的就是给我带来了身体健康。
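Models supported by the service
Models supported by ASR
Run paddlespeech_server stats --task asr to list all models supported by the ASR service. The static models among them can be used for Paddle Inference.
Models supported by TTS
Run paddlespeech_server stats --task tts to list all models supported by the TTS service. The static models among them can be used for Paddle Inference.
Models supported by CLS
Run paddlespeech_server stats --task cls to list all models supported by the CLS service. The static models among them can be used for Paddle Inference.
Models supported by Vector
Run paddlespeech_server stats --task vector to list all models supported by the Vector service.
Models supported by Text
Run paddlespeech_server stats --task text to list all models supported by the Text service.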