diff --git a/demos/asr_deployment/README.md b/demos/asr_deployment/README.md
new file mode 100644
index 00000000..9d36f19f
--- /dev/null
+++ b/demos/asr_deployment/README.md
@@ -0,0 +1,100 @@
+([简体中文](./README_cn.md)|English)
+# ASR Deployment by SpeechX
+
+## Introduction
+
+ASR deployment supports U2/U2++/DeepSpeech2 ASR models in C++, which is good practice for industrial deployment.
+
+For more information about SpeechX, see [here](../../speechx/README.md).
+
+## Usage
+### 1. Environment
+
+* python - 3.7
+* docker - `registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7`
+* os - Ubuntu 16.04.7 LTS
+* gcc/g++/gfortran - 8.2.0
+* cmake - 3.16.0
+
+For more information, see [here](../../speechx/README.md).
+
+### 2. Compile SpeechX
+
+Please see [here](../../speechx/README.md).
+
+### 3. Usage
+
+For a U2++ ASR deployment example, see [here](../../speechx/examples/u2pp_ol/wenetspeech/).
+
+First, go to the `speechx/examples/u2pp_ol/wenetspeech` directory.
+
+- Source path.sh
+  ```bash
+  source path.sh
+  ```
+
+- Download the model and prepare the test data and CMVN file
+  ```bash
+  ./run.sh --stage 0 --stop_stage 1
+  ```
+
+- Decode with WAV
+
+  ```bash
+  # FP32
+  ./local/recognizer.sh
+
+  # INT8
+  ./local/recognizer_quant.sh
+  ```
+
+  Output:
+  ```bash
+  I1026 16:13:24.683531 48038 u2_recognizer_main.cc:55] utt: BAC009S0916W0495
+  I1026 16:13:24.683578 48038 u2_recognizer_main.cc:56] wav dur: 4.17119 sec.
+  I1026 16:13:24.683595 48038 u2_recognizer_main.cc:64] wav len (sample): 66739
+  I1026 16:13:25.037652 48038 u2_recognizer_main.cc:87] Pratial result: 3 这令
+  I1026 16:13:25.043697 48038 u2_recognizer_main.cc:87] Pratial result: 4 这令
+  I1026 16:13:25.222124 48038 u2_recognizer_main.cc:87] Pratial result: 5 这令被贷款
+  I1026 16:13:25.228385 48038 u2_recognizer_main.cc:87] Pratial result: 6 这令被贷款
+  I1026 16:13:25.414669 48038 u2_recognizer_main.cc:87] Pratial result: 7 这令被贷款的员工
+  I1026 16:13:25.420714 48038 u2_recognizer_main.cc:87] Pratial result: 8 这令被贷款的员工
+  I1026 16:13:25.608129 48038 u2_recognizer_main.cc:87] Pratial result: 9 这令被贷款的员工们请
+  I1026 16:13:25.801620 48038 u2_recognizer_main.cc:87] Pratial result: 10 这令被贷款的员工们请食难安
+  I1026 16:13:25.804101 48038 feature_cache.h:44] set finished
+  I1026 16:13:25.804128 48038 feature_cache.h:51] compute last feats done.
+  I1026 16:13:25.948771 48038 u2_recognizer_main.cc:87] Pratial result: 11 这令被贷款的员工们请食难安
+  I1026 16:13:26.246963 48038 u2_recognizer_main.cc:113] BAC009S0916W0495 这令被贷款的员工们请食难安
+  ```
+
+## Result
+
+> CER is computed on aishell-test.
+> RTF includes feature extraction and decoding time, which is closer to end-to-end.
+> Machine: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz with `avx512_vnni`
+
+### FP32
+
+```
+Overall -> 5.75 % N=104765 C=99035 S=5587 D=143 I=294
+Mandarin -> 5.75 % N=104762 C=99035 S=5584 D=143 I=294
+English -> 0.00 % N=0 C=0 S=0 D=0 I=0
+Other -> 100.00 % N=3 C=0 S=3 D=0 I=0
+```
+
+```
+RTF is: 0.315337
+```
+
+### INT8
+
+```
+Overall -> 5.83 % N=104765 C=98943 S=5675 D=147 I=286
+Mandarin -> 5.83 % N=104762 C=98943 S=5672 D=147 I=286
+English -> 0.00 % N=0 C=0 S=0 D=0 I=0
+Other -> 100.00 % N=3 C=0 S=3 D=0 I=0
+```
+
+```
+RTF is: 0.269674
+```
diff --git a/demos/asr_deployment/README_cn.md b/demos/asr_deployment/README_cn.md
new file mode 100644
index 00000000..ee4aa848
--- /dev/null
+++ b/demos/asr_deployment/README_cn.md
@@ -0,0 +1,96 @@
+(简体中文|[English](./README.md))
+# 基于 SpeechX 的 ASR 部署
+
+## 简介
+
+支持 U2/U2++/Deepspeech2 模型的 C++ 部署,其在工业实践中经常被用到。
+
+更多 SpeechX 信息可以参看[文档](../../speechx/README.md)。
+
+## 使用
+### 1. 环境
+
+* python - 3.7
+* docker - `registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7`
+* os - Ubuntu 16.04.7 LTS
+* gcc/g++/gfortran - 8.2.0
+* cmake - 3.16.0
+
+更多信息可以参看[文档](../../speechx/README.md)。
+
+### 2. 编译 SpeechX
+
+更多信息可以参看[文档](../../speechx/README.md)。
+
+### 3. 例子
+
+u2++ 识别部署参看[这里](../../speechx/examples/u2pp_ol/wenetspeech/)。
+
+以下操作均在 `speechx/examples/u2pp_ol/wenetspeech` 目录下进行。
+
+- Source path.sh
+  ```bash
+  source path.sh
+  ```
+
+- 下载模型,准备测试数据和 cmvn 文件
+  ```bash
+  ./run.sh --stage 0 --stop_stage 1
+  ```
+
+- 解码
+
+  ```bash
+  # FP32
+  ./local/recognizer.sh
+
+  # INT8
+  ./local/recognizer_quant.sh
+  ```
+
+  输出:
+  ```bash
+  I1026 16:13:24.683531 48038 u2_recognizer_main.cc:55] utt: BAC009S0916W0495
+  I1026 16:13:24.683578 48038 u2_recognizer_main.cc:56] wav dur: 4.17119 sec.
+  I1026 16:13:24.683595 48038 u2_recognizer_main.cc:64] wav len (sample): 66739
+  I1026 16:13:25.037652 48038 u2_recognizer_main.cc:87] Pratial result: 3 这令
+  I1026 16:13:25.043697 48038 u2_recognizer_main.cc:87] Pratial result: 4 这令
+  I1026 16:13:25.222124 48038 u2_recognizer_main.cc:87] Pratial result: 5 这令被贷款
+  I1026 16:13:25.228385 48038 u2_recognizer_main.cc:87] Pratial result: 6 这令被贷款
+  I1026 16:13:25.414669 48038 u2_recognizer_main.cc:87] Pratial result: 7 这令被贷款的员工
+  I1026 16:13:25.420714 48038 u2_recognizer_main.cc:87] Pratial result: 8 这令被贷款的员工
+  I1026 16:13:25.608129 48038 u2_recognizer_main.cc:87] Pratial result: 9 这令被贷款的员工们请
+  I1026 16:13:25.801620 48038 u2_recognizer_main.cc:87] Pratial result: 10 这令被贷款的员工们请食难安
+  I1026 16:13:25.804101 48038 feature_cache.h:44] set finished
+  I1026 16:13:25.804128 48038 feature_cache.h:51] compute last feats done.
+  I1026 16:13:25.948771 48038 u2_recognizer_main.cc:87] Pratial result: 11 这令被贷款的员工们请食难安
+  I1026 16:13:26.246963 48038 u2_recognizer_main.cc:113] BAC009S0916W0495 这令被贷款的员工们请食难安
+  ```
+
+## 结果
+
+> CER 测试集为 aishell-test
+> RTF 计算包含提特征和解码
+> 测试机器: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz avx512_vnni
+
+### FP32
+
+```
+Overall -> 5.75 % N=104765 C=99035 S=5587 D=143 I=294
+Mandarin -> 5.75 % N=104762 C=99035 S=5584 D=143 I=294
+English -> 0.00 % N=0 C=0 S=0 D=0 I=0
+Other -> 100.00 % N=3 C=0 S=3 D=0 I=0
+```
+
+```
+RTF is: 0.315337
+```
+
+### INT8
+
+```
+Overall -> 5.87 % N=104765 C=98909 S=5711 D=145 I=289
+Mandarin -> 5.86 % N=104762 C=98909 S=5708 D=145 I=289
+English -> 0.00 % N=0 C=0 S=0 D=0 I=0
+Other -> 100.00 % N=3 C=0 S=3 D=0 I=0
+```
diff --git a/speechx/examples/codelab/u2/utils b/speechx/examples/codelab/u2/utils
new file mode 120000
index 00000000..23cef961
--- /dev/null
+++ b/speechx/examples/codelab/u2/utils
@@ -0,0 +1 @@
+../../../../utils
\ No newline at end of file
diff --git a/speechx/examples/u2pp_ol/wenetspeech/RESULTS.md b/speechx/examples/u2pp_ol/wenetspeech/RESULTS.md
index 6a8e8c46..5b33f364 100644
--- a/speechx/examples/u2pp_ol/wenetspeech/RESULTS.md
+++ b/speechx/examples/u2pp_ol/wenetspeech/RESULTS.md
@@ -2,9 +2,11 @@
 
 7176 utts, duration 36108.9 sec.
 
-## Attention Rescore
+## U2++ Attention Rescore
 
-### u2++ FP32
+> Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz, supports `avx512_vnni`
+> RTF includes feature extraction and decoding time, which is closer to end-to-end.
+### FP32
 
 #### CER
 
@@ -17,20 +19,29 @@ Other -> 100.00 % N=3 C=0 S=3 D=0 I=0
 
 #### RTF
 
-> RTF with feature and decoder which is more end to end.
-
-* Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz, support `avx512_vnni`
-
 ```
 I1027 10:52:38.662868 51665 u2_recognizer_main.cc:122] total wav duration is: 36108.9 sec
 I1027 10:52:38.662858 51665 u2_recognizer_main.cc:121] total cost:11169.1 sec
 I1027 10:52:38.662876 51665 u2_recognizer_main.cc:123] RTF is: 0.309318
 ```
 
-* Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, not support `avx512_vnni`
+### INT8
+
+> Relative RTF improvement over FP32 is 12.8%, counting feature extraction and decoding time.
+
+#### CER
+
+```
+Overall -> 5.83 % N=104765 C=98943 S=5675 D=147 I=286
+Mandarin -> 5.83 % N=104762 C=98943 S=5672 D=147 I=286
+English -> 0.00 % N=0 C=0 S=0 D=0 I=0
+Other -> 100.00 % N=3 C=0 S=3 D=0 I=0
+```
+
+#### RTF
 
 ```
-I1026 16:13:26.247121 48038 u2_recognizer_main.cc:123] total wav duration is: 36108.9 sec
-I1026 16:13:26.247130 48038 u2_recognizer_main.cc:124] total decode cost:13656.7 sec
-I1026 16:13:26.247138 48038 u2_recognizer_main.cc:125] RTF is: 0.378208
+I1110 09:59:52.551712 37249 u2_recognizer_main.cc:122] total wav duration is: 36108.9 sec
+I1110 09:59:52.551717 37249 u2_recognizer_main.cc:123] total decode cost:9737.63 sec
+I1110 09:59:52.551723 37249 u2_recognizer_main.cc:124] RTF is: 0.269674
 ```
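
Review note: the CER, RTF, and 12.8% figures reported in RESULTS.md can be cross-checked from the logged counts. The following is a minimal sketch, assuming the usual definitions CER = (S + D + I) / N and RTF = total decode cost / total audio duration; the function names `cer_percent` and `rtf` are illustrative and not part of SpeechX. Because the logged costs are rounded, the recomputed RTF may differ from the logged value in the last digit.

```python
def cer_percent(n: int, s: int, d: int, i: int) -> float:
    """Character error rate in percent, assuming CER = (S + D + I) / N."""
    return 100.0 * (s + d + i) / n


def rtf(cost_sec: float, audio_sec: float) -> float:
    """Real-time factor, assuming RTF = processing time / audio duration."""
    return cost_sec / audio_sec


# "Overall" counts copied from the CER blocks in RESULTS.md.
fp32_cer = cer_percent(n=104765, s=5587, d=143, i=294)
int8_cer = cer_percent(n=104765, s=5675, d=147, i=286)

# Totals copied from the u2_recognizer_main.cc log lines in RESULTS.md.
audio_sec = 36108.9
fp32_rtf = rtf(11169.1, audio_sec)
int8_rtf = rtf(9737.63, audio_sec)

# Relative RTF improvement of INT8 over FP32, in percent.
rel_improvement = 100.0 * (fp32_rtf - int8_rtf) / fp32_rtf

print(f"FP32: CER {fp32_cer:.2f} %, RTF {fp32_rtf:.4f}")
print(f"INT8: CER {int8_cer:.2f} %, RTF {int8_rtf:.4f}")
print(f"Relative RTF improvement: {rel_improvement:.1f} %")
```

The recomputed values agree with the reported 5.75 % and 5.83 % CER and with the 12.8 % relative RTF improvement, which suggests the INT8 note compares against the 0.309318 FP32 run in RESULTS.md rather than the 0.315337 figure quoted in the demo README.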