diff --git a/README.md b/README.md index 95429df55..0a8566940 100644 --- a/README.md +++ b/README.md @@ -157,14 +157,15 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV). ### Recent Update +- 🔥 2022.11.07: [U2/U2++ C++ High Performance Streaming Asr Deployment](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/speechx/examples/u2pp_ol/wenetspeech). - 👑 2022.11.01: Add [Adversarial Loss](https://arxiv.org/pdf/1907.04448.pdf) for [Chinese English mixed TTS](./examples/zh_en_tts/tts3). - 🔥 2022.10.26: Add [Prosody Prediction](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/rhy) for TTS. - 🎉 2022.10.21: Add [SSML](https://github.com/PaddlePaddle/PaddleSpeech/discussions/2538) for TTS Chinese Text Frontend. - 👑 2022.10.11: Add [Wav2vec2ASR](./examples/librispeech/asr3), wav2vec2.0 fine-tuning for ASR on LibriSpeech. -- 🔥 2022.09.26: Add Voice Cloning, TTS finetune, and ERNIE-SAT in [PaddleSpeech Web Demo](./demos/speech_web). +- 🔥 2022.09.26: Add Voice Cloning, TTS finetune, and [ERNIE-SAT](https://arxiv.org/abs/2211.03545) in [PaddleSpeech Web Demo](./demos/speech_web). - ⚡ 2022.09.09: Add AISHELL-3 Voice Cloning [example](./examples/aishell3/vc2) with ECAPA-TDNN speaker encoder. - ⚡ 2022.08.25: Release TTS [finetune](./examples/other/tts_finetune/tts3) example. -- 🔥 2022.08.22: Add ERNIE-SAT models: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat)、[ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat)、[ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat). +- 🔥 2022.08.22: Add [ERNIE-SAT](https://arxiv.org/abs/2211.03545) models: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat)、[ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat)、[ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat). - 🔥 2022.08.15: Add [g2pW](https://github.com/GitYCC/g2pW) into TTS Chinese Text Frontend. - 🔥 2022.08.09: Release [Chinese English mixed TTS](./examples/zh_en_tts/tts3). - ⚡ 2022.08.03: Add ONNXRuntime infer for TTS CLI. @@ -579,7 +580,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r - ERNIE-SAT + ERNIE-SAT VCTK / AISHELL-3 / ZH_EN ERNIE-SAT-vctk / ERNIE-SAT-aishell3 / ERNIE-SAT-zh_en @@ -717,9 +718,9 @@ PaddleSpeech supports a series of most popular models. 
They are summarized in [r Keyword Spotting hey-snips - PANN + MDTC - pann-hey-snips + mdtc-hey-snips diff --git a/README_cn.md b/README_cn.md index 7f2dd8203..9f33a4cb8 100644 --- a/README_cn.md +++ b/README_cn.md @@ -168,10 +168,10 @@ - 🔥 2022.10.26: TTS 新增[韵律预测](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/rhy)功能。 - 🎉 2022.10.21: TTS 中文文本前端新增 [SSML](https://github.com/PaddlePaddle/PaddleSpeech/discussions/2538) 功能。 - 👑 2022.10.11: 新增 [Wav2vec2ASR](./examples/librispeech/asr3), 在 LibriSpeech 上针对 ASR 任务对 wav2vec2.0 的 finetuning。 -- 🔥 2022.09.26: 新增 Voice Cloning, TTS finetune 和 ERNIE-SAT 到 [PaddleSpeech 网页应用](./demos/speech_web)。 +- 🔥 2022.09.26: 新增 Voice Cloning, TTS finetune 和 [ERNIE-SAT](https://arxiv.org/abs/2211.03545) 到 [PaddleSpeech 网页应用](./demos/speech_web)。 - ⚡ 2022.09.09: 新增基于 ECAPA-TDNN 声纹模型的 AISHELL-3 Voice Cloning [示例](./examples/aishell3/vc2)。 - ⚡ 2022.08.25: 发布 TTS [finetune](./examples/other/tts_finetune/tts3) 示例。 -- 🔥 2022.08.22: 新增 ERNIE-SAT 模型: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat)、[ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat)、[ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat)。 +- 🔥 2022.08.22: 新增 [ERNIE-SAT](https://arxiv.org/abs/2211.03545) 模型: [ERNIE-SAT-vctk](./examples/vctk/ernie_sat)、[ERNIE-SAT-aishell3](./examples/aishell3/ernie_sat)、[ERNIE-SAT-zh_en](./examples/aishell3_vctk/ernie_sat)。 - 🔥 2022.08.15: 将 [g2pW](https://github.com/GitYCC/g2pW) 引入 TTS 中文文本前端。 - 🔥 2022.08.09: 发布[中英文混合 TTS](./examples/zh_en_tts/tts3)。 - ⚡ 2022.08.03: TTS CLI 新增 ONNXRuntime 推理方式。 @@ -576,7 +576,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声 - ERNIE-SAT + ERNIE-SAT VCTK / AISHELL-3 / ZH_EN ERNIE-SAT-vctk / ERNIE-SAT-aishell3 / ERNIE-SAT-zh_en @@ -697,9 +697,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声 - + -**唤醒** +**语音唤醒** @@ -712,11 +712,11 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声 - + - + diff --git a/demos/asr_deployment/README.md b/demos/asr_deployment/README.md new file mode 100644 index 000000000..9d36f19f2 --- /dev/null +++ b/demos/asr_deployment/README.md @@ -0,0 +1,100 @@ +([简体中文](./README_cn.md)|English) +# ASR Deployment by SpeechX + +## Introduction + +ASR deployment support U2/U2++/Deepspeech2 asr model using c++, which is good practice in industry deployment. + +More info about SpeechX, please see [here](../../speechx/README.md). + +## Usage +### 1. Environment + +* python - 3.7 +* docker - `registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7` +* os - Ubuntu 16.04.7 LTS +* gcc/g++/gfortran - 8.2.0 +* cmake - 3.16.0 + +More info please see [here](../../speechx/README.md). + +### 2. Compile SpeechX + +Please see [here](../../speechx/README.md). + +### 3. Usage + +For u2++ asr deployment example, please to see [here](../../speechx/examples/u2pp_ol/wenetspeech/). + +First go to `speechx/speechx/examples/u2pp_ol/wenetspeech` dir. + +- Source path.sh + ```bash + source path.sh + ``` + +- Download Model, Prepare test data and cmvn + ```bash + run.sh --stage 0 --stop_stage 1 + ``` + +- Decode with WAV + + ```bash + # FP32 + ./local/recognizer.sh + + # INT8 + ./local/recognizer_quant.sh + ``` + + Output: + ```bash + I1026 16:13:24.683531 48038 u2_recognizer_main.cc:55] utt: BAC009S0916W0495 + I1026 16:13:24.683578 48038 u2_recognizer_main.cc:56] wav dur: 4.17119 sec. 
+ I1026 16:13:24.683595 48038 u2_recognizer_main.cc:64] wav len (sample): 66739 + I1026 16:13:25.037652 48038 u2_recognizer_main.cc:87] Pratial result: 3 这令 + I1026 16:13:25.043697 48038 u2_recognizer_main.cc:87] Pratial result: 4 这令 + I1026 16:13:25.222124 48038 u2_recognizer_main.cc:87] Pratial result: 5 这令被贷款 + I1026 16:13:25.228385 48038 u2_recognizer_main.cc:87] Pratial result: 6 这令被贷款 + I1026 16:13:25.414669 48038 u2_recognizer_main.cc:87] Pratial result: 7 这令被贷款的员工 + I1026 16:13:25.420714 48038 u2_recognizer_main.cc:87] Pratial result: 8 这令被贷款的员工 + I1026 16:13:25.608129 48038 u2_recognizer_main.cc:87] Pratial result: 9 这令被贷款的员工们请 + I1026 16:13:25.801620 48038 u2_recognizer_main.cc:87] Pratial result: 10 这令被贷款的员工们请食难安 + I1026 16:13:25.804101 48038 feature_cache.h:44] set finished + I1026 16:13:25.804128 48038 feature_cache.h:51] compute last feats done. + I1026 16:13:25.948771 48038 u2_recognizer_main.cc:87] Pratial result: 11 这令被贷款的员工们请食难安 + I1026 16:13:26.246963 48038 u2_recognizer_main.cc:113] BAC009S0916W0495 这令被贷款的员工们请食难安 + ``` + +## Result + +> CER compute under aishell-test. +> RTF compute with feature and decoder, which is more end to end. +> Machine Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz avx512_vnni + +### FP32 + +``` +Overall -> 5.75 % N=104765 C=99035 S=5587 D=143 I=294 +Mandarin -> 5.75 % N=104762 C=99035 S=5584 D=143 I=294 +English -> 0.00 % N=0 C=0 S=0 D=0 I=0 +Other -> 100.00 % N=3 C=0 S=3 D=0 I=0 +``` + +``` +RTF is: 0.315337 +``` + +### INT8 + +``` +Overall -> 5.83 % N=104765 C=98943 S=5675 D=147 I=286 +Mandarin -> 5.83 % N=104762 C=98943 S=5672 D=147 I=286 +English -> 0.00 % N=0 C=0 S=0 D=0 I=0 +Other -> 100.00 % N=3 C=0 S=3 D=0 I=0 +``` + +``` +RTF is: 0.269674 +``` diff --git a/demos/asr_deployment/README_cn.md b/demos/asr_deployment/README_cn.md new file mode 100644 index 000000000..ee4aa8489 --- /dev/null +++ b/demos/asr_deployment/README_cn.md @@ -0,0 +1,96 @@ +([简体中文](./README_cn.md)|English) +# 基于SpeechX 的 ASR 部署 + +## 简介 + +支持 U2/U2++/Deepspeech2 模型的 C++ 部署,其在工业实践中经常被用到。 + +更多 Speechx 信息可以参看[文档](../../speechx/README.md)。 + +## 使用 +### 1. 环境 + +* python - 3.7 +* docker - `registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7` +* os - Ubuntu 16.04.7 LTS +* gcc/g++/gfortran - 8.2.0 +* cmake - 3.16.0 + +更多信息可以参看[文档](../../speechx/README.md)。 + +### 2. 编译 SpeechX + +更多信息可以参看[文档](../../speechx/README.md)。 + +### 3. 例子 + +u2++ 识别部署参看[这里](../../speechx/examples/u2pp_ol/wenetspeech/)。 + +以下是在 `speechx/speechx/examples/u2pp_ol/wenetspeech`. + +- Source path.sh + ```bash + source path.sh + ``` + +- 下载模型,准备测试数据和cmvn文件 + ```bash + run.sh --stage 0 --stop_stage 1 + ``` + +- 解码 + + ```bash + # FP32 + ./local/recognizer.sh + + # INT8 + ./local/recognizer_quant.sh + ``` + + 输出: + ```bash + I1026 16:13:24.683531 48038 u2_recognizer_main.cc:55] utt: BAC009S0916W0495 + I1026 16:13:24.683578 48038 u2_recognizer_main.cc:56] wav dur: 4.17119 sec. 
+ I1026 16:13:24.683595 48038 u2_recognizer_main.cc:64] wav len (sample): 66739 + I1026 16:13:25.037652 48038 u2_recognizer_main.cc:87] Pratial result: 3 这令 + I1026 16:13:25.043697 48038 u2_recognizer_main.cc:87] Pratial result: 4 这令 + I1026 16:13:25.222124 48038 u2_recognizer_main.cc:87] Pratial result: 5 这令被贷款 + I1026 16:13:25.228385 48038 u2_recognizer_main.cc:87] Pratial result: 6 这令被贷款 + I1026 16:13:25.414669 48038 u2_recognizer_main.cc:87] Pratial result: 7 这令被贷款的员工 + I1026 16:13:25.420714 48038 u2_recognizer_main.cc:87] Pratial result: 8 这令被贷款的员工 + I1026 16:13:25.608129 48038 u2_recognizer_main.cc:87] Pratial result: 9 这令被贷款的员工们请 + I1026 16:13:25.801620 48038 u2_recognizer_main.cc:87] Pratial result: 10 这令被贷款的员工们请食难安 + I1026 16:13:25.804101 48038 feature_cache.h:44] set finished + I1026 16:13:25.804128 48038 feature_cache.h:51] compute last feats done. + I1026 16:13:25.948771 48038 u2_recognizer_main.cc:87] Pratial result: 11 这令被贷款的员工们请食难安 + I1026 16:13:26.246963 48038 u2_recognizer_main.cc:113] BAC009S0916W0495 这令被贷款的员工们请食难安 + ``` + +## 结果 + +> CER 测试集为 aishell-test +> RTF 计算包含提特征和解码 +> 测试机器: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz avx512_vnni + +### FP32 + +``` +Overall -> 5.75 % N=104765 C=99035 S=5587 D=143 I=294 +Mandarin -> 5.75 % N=104762 C=99035 S=5584 D=143 I=294 +English -> 0.00 % N=0 C=0 S=0 D=0 I=0 +Other -> 100.00 % N=3 C=0 S=3 D=0 I=0 +``` + +``` +RTF is: 0.315337 +``` + +### INT8 + +``` +Overall -> 5.87 % N=104765 C=98909 S=5711 D=145 I=289 +Mandarin -> 5.86 % N=104762 C=98909 S=5708 D=145 I=289 +English -> 0.00 % N=0 C=0 S=0 D=0 I=0 +Other -> 100.00 % N=3 C=0 S=3 D=0 I=0 +``` diff --git a/docs/source/cls/custom_dataset.md b/docs/source/cls/custom_dataset.md index e39dcf12d..b7c06cd7a 100644 --- a/docs/source/cls/custom_dataset.md +++ b/docs/source/cls/custom_dataset.md @@ -108,7 +108,7 @@ for epoch in range(1, epochs + 1): optimizer.clear_grad() # Calculate loss - avg_loss = loss.numpy()[0] + avg_loss = float(loss) # Calculate metrics preds = paddle.argmax(logits, axis=1) diff --git a/docs/source/released_model.md b/docs/source/released_model.md index 2f3c9d098..79e8f4f46 100644 --- a/docs/source/released_model.md +++ b/docs/source/released_model.md @@ -22,7 +22,7 @@ Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Model | Pre-Train Method | Pre-Train Data | Finetune Data | Size | Descriptions | CER | WER | Example Link | :-------------:| :------------:| :-----: | -----: | :-----: |:-----:| :-----: | :-----: | :-----: | [Wav2vec2-large-960h-lv60-self Model](https://paddlespeech.bj.bcebos.com/wav2vec/wav2vec2-large-960h-lv60-self.pdparams) | wav2vec2 | Librispeech and LV-60k Dataset (5.3w h) | - | 1.18 GB |Pre-trained Wav2vec2.0 Model | - | - | - | -[Wav2vec2ASR-large-960h-librispeech Model](https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr3/wav2vec2ASR-large-960h-librispeech_ckpt_1.3.0.model.tar.gz) | wav2vec2 | Librispeech and LV-60k Dataset (5.3w h) | Librispeech (960 h) | 1.18 GB |Encoder: Wav2vec2.0, Decoder: CTC, Decoding method: Greedy search | - | 0.0189 | [Wav2vecASR Librispeech ASR3](../../examples/librispeech/asr3) | +[Wav2vec2ASR-large-960h-librispeech Model](https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr3/wav2vec2ASR-large-960h-librispeech_ckpt_1.3.1.model.tar.gz) | wav2vec2 | Librispeech and LV-60k Dataset (5.3w h) | Librispeech (960 h) | 718 MB |Encoder: Wav2vec2.0, Decoder: CTC, Decoding method: Greedy search | - | 0.0189 | [Wav2vecASR Librispeech ASR3](../../examples/librispeech/asr3) | ### Language Model 
based on NGram Language Model | Training Data | Token-based | Size | Descriptions diff --git a/docs/tutorial/cls/cls_tutorial.ipynb b/docs/tutorial/cls/cls_tutorial.ipynb index 56b488adc..3cee64991 100644 --- a/docs/tutorial/cls/cls_tutorial.ipynb +++ b/docs/tutorial/cls/cls_tutorial.ipynb @@ -509,7 +509,7 @@ " optimizer.clear_grad()\n", "\n", " # Calculate loss\n", - " avg_loss += loss.numpy()[0]\n", + " avg_loss += float(loss)\n", "\n", " # Calculate metrics\n", " preds = paddle.argmax(logits, axis=1)\n", diff --git a/examples/aishell3/ernie_sat/README.md b/examples/aishell3/ernie_sat/README.md index 9b7768985..bd5964c3a 100644 --- a/examples/aishell3/ernie_sat/README.md +++ b/examples/aishell3/ernie_sat/README.md @@ -1,5 +1,5 @@ # ERNIE-SAT with AISHELL-3 dataset -ERNIE-SAT speech-text joint pretraining framework, which achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks, It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning. +[ERNIE-SAT](https://arxiv.org/abs/2211.03545) speech-text joint pretraining framework, which achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks, It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning. ## Model Framework In ERNIE-SAT, we propose two innovations: diff --git a/examples/aishell3/tts3/local/export2lite.sh b/examples/aishell3/tts3/local/export2lite.sh new file mode 120000 index 000000000..f7719914a --- /dev/null +++ b/examples/aishell3/tts3/local/export2lite.sh @@ -0,0 +1 @@ +../../../csmsc/tts3/local/export2lite.sh \ No newline at end of file diff --git a/examples/aishell3/tts3/run.sh b/examples/aishell3/tts3/run.sh index f730f3761..90b342125 100755 --- a/examples/aishell3/tts3/run.sh +++ b/examples/aishell3/tts3/run.sh @@ -58,3 +58,13 @@ fi if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then ./local/ort_predict.sh ${train_output_path} fi + +if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then + # This model is not supported, because 3 ops are not supported on 'arm'. These unsupported ops are: 'round, set_value, share_data'. + # This model is not supported, because 4 ops are not supported on 'x86'. These unsupported ops are: 'matmul_v2, round, set_value, share_data'. + # ./local/export2lite.sh ${train_output_path} inference pdlite fastspeech2_aishell3 x86 + # x86 ok, arm Segmentation fault + # ./local/export2lite.sh ${train_output_path} inference pdlite pwgan_aishell3 x86 + # x86 ok, arm ok + ./local/export2lite.sh ${train_output_path} inference pdlite hifigan_aishell3 x86 +fi diff --git a/examples/aishell3_vctk/ernie_sat/README.md b/examples/aishell3_vctk/ernie_sat/README.md index 321957835..fbf9244d1 100644 --- a/examples/aishell3_vctk/ernie_sat/README.md +++ b/examples/aishell3_vctk/ernie_sat/README.md @@ -1,5 +1,5 @@ # ERNIE-SAT with AISHELL-3 and VCTK dataset -ERNIE-SAT speech-text joint pretraining framework, which achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks, It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning. 
+[ERNIE-SAT](https://arxiv.org/abs/2211.03545) speech-text joint pretraining framework, which achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks, It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning. ## Model Framework In ERNIE-SAT, we propose two innovations: diff --git a/examples/csmsc/tts2/local/export2lite.sh b/examples/csmsc/tts2/local/export2lite.sh new file mode 120000 index 000000000..402fd8334 --- /dev/null +++ b/examples/csmsc/tts2/local/export2lite.sh @@ -0,0 +1 @@ +../../tts3/local/export2lite.sh \ No newline at end of file diff --git a/examples/csmsc/tts2/run.sh b/examples/csmsc/tts2/run.sh index 557dd4ff3..75fdb2109 100755 --- a/examples/csmsc/tts2/run.sh +++ b/examples/csmsc/tts2/run.sh @@ -60,3 +60,16 @@ fi if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then ./local/ort_predict.sh ${train_output_path} fi + +# must run after stage 3 (which stage generated static models) +if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then + # This model is not supported, because 3 ops are not supported on 'arm'. These unsupported ops are: 'round, set_value, share_data'. + # This model is not supported, because 4 ops are not supported on 'x86'. These unsupported ops are: 'matmul_v2, round, set_value, share_data'. + ./local/export2lite.sh ${train_output_path} inference pdlite speedyspeech_csmsc x86 + # x86 ok, arm Segmentation fault + # ./local/export2lite.sh ${train_output_path} inference pdlite pwgan_csmsc x86 + # x86 ok, arm Segmentation fault + # ./local/export2lite.sh ${train_output_path} inference pdlite mb_melgan_csmsc x86 + # x86 ok, arm ok + # ./local/export2lite.sh ${train_output_path} inference pdlite hifigan_csmsc x86 +fi diff --git a/examples/csmsc/tts3/local/export2lite.sh b/examples/csmsc/tts3/local/export2lite.sh new file mode 100755 index 000000000..f99905cfe --- /dev/null +++ b/examples/csmsc/tts3/local/export2lite.sh @@ -0,0 +1,18 @@ +train_output_path=$1 +model_dir=$2 +output_dir=$3 +model=$4 +valid_targets=$5 + +model_name=${model%_*} +echo model_name: ${model_name} + + + +mkdir -p ${train_output_path}/${output_dir} + +paddle_lite_opt \ + --model_file ${train_output_path}/${model_dir}/${model}.pdmodel \ + --param_file ${train_output_path}/${model_dir}/${model}.pdiparams \ + --optimize_out ${train_output_path}/${output_dir}/${model}_${valid_targets} \ + --valid_targets ${valid_targets} diff --git a/examples/csmsc/tts3/run.sh b/examples/csmsc/tts3/run.sh index 80acf8200..8d646ecc3 100755 --- a/examples/csmsc/tts3/run.sh +++ b/examples/csmsc/tts3/run.sh @@ -61,3 +61,16 @@ fi if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then ./local/ort_predict.sh ${train_output_path} fi + +# must run after stage 3 (which stage generated static models) +if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then + # This model is not supported, because 3 ops are not supported on 'arm'. These unsupported ops are: 'round, set_value, share_data'. + # This model is not supported, because 4 ops are not supported on 'x86'. These unsupported ops are: 'matmul_v2, round, set_value, share_data'. 
+ ./local/export2lite.sh ${train_output_path} inference pdlite fastspeech2_csmsc x86 + # x86 ok, arm Segmentation fault + # ./local/export2lite.sh ${train_output_path} inference pdlite pwgan_csmsc x86 + # x86 ok, arm Segmentation fault + # ./local/export2lite.sh ${train_output_path} inference pdlite mb_melgan_csmsc x86 + # x86 ok, arm ok + # ./local/export2lite.sh ${train_output_path} inference pdlite hifigan_csmsc x86 +fi diff --git a/examples/csmsc/tts3/run_cnndecoder.sh b/examples/csmsc/tts3/run_cnndecoder.sh index bae833157..645d1af09 100755 --- a/examples/csmsc/tts3/run_cnndecoder.sh +++ b/examples/csmsc/tts3/run_cnndecoder.sh @@ -75,7 +75,6 @@ if [ ${stage} -le 8 ] && [ ${stop_stage} -ge 8 ]; then fi # paddle2onnx streaming - if [ ${stage} -le 9 ] && [ ${stop_stage} -ge 9 ]; then # install paddle2onnx version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}') @@ -97,3 +96,34 @@ if [ ${stage} -le 10 ] && [ ${stop_stage} -ge 10 ]; then ./local/ort_predict_streaming.sh ${train_output_path} fi +# must run after stage 3 (which stage generated static models) +if [ ${stage} -le 11 ] && [ ${stop_stage} -ge 11 ]; then + # This model is not supported, because 3 ops are not supported on 'arm'. These unsupported ops are: 'round, set_value, share_data'. + # This model is not supported, because 4 ops are not supported on 'x86'. These unsupported ops are: 'matmul_v2, round, set_value, share_data'. + ./local/export2lite.sh ${train_output_path} inference pdlite fastspeech2_csmsc x86 + # x86 ok, arm Segmentation fault + # ./local/export2lite.sh ${train_output_path} inference pdlite pwgan_csmsc x86 + # x86 ok, arm Segmentation fault + # ./local/export2lite.sh ${train_output_path} inference pdlite mb_melgan_csmsc x86 + # x86 ok, arm ok + # ./local/export2lite.sh ${train_output_path} inference pdlite hifigan_csmsc x86 +fi + +# must run after stage 5 (which stage generated static models) +if [ ${stage} -le 12 ] && [ ${stop_stage} -ge 12 ]; then + # streaming acoustic model + # This model is not supported, because 3 ops are not supported on 'arm'. These unsupported ops are: 'round, set_value, share_data'. + # This model is not supported, because 4 ops are not supported on 'x86'. These unsupported ops are: 'matmul_v2, round, set_value, share_data'. 
+ # ./local/export2lite.sh ${train_output_path} inference pdlite fastspeech2_csmsc x86 + ./local/export2lite.sh ${train_output_path} inference_streaming pdlite_streaming fastspeech2_csmsc_am_encoder_infer x86 + # x86 ok, arm Segmentation fault + ./local/export2lite.sh ${train_output_path} inference_streaming pdlite_streaming fastspeech2_csmsc_am_decoder x86 + # x86 ok, arm Segmentation fault + ./local/export2lite.sh ${train_output_path} inference_streaming pdlite_streaming fastspeech2_csmsc_am_postnet x86 + # x86 ok, arm Segmentation fault + # ./local/export2lite.sh ${train_output_path} inference_streaming pdlite_streaming pwgan_csmsc x86 + # x86 ok, arm Segmentation fault + # ./local/export2lite.sh ${train_output_path} inference_streaming pdlite_streaming mb_melgan_csmsc x86 + # x86 ok, arm ok + # ./local/export2lite.sh ${train_output_path} inference_streaming pdlite_streaming hifigan_csmsc x86 +fi diff --git a/examples/librispeech/asr3/conf/wav2vec2ASR.yaml b/examples/librispeech/asr3/conf/wav2vec2ASR.yaml index b19881b70..c45bd692a 100644 --- a/examples/librispeech/asr3/conf/wav2vec2ASR.yaml +++ b/examples/librispeech/asr3/conf/wav2vec2ASR.yaml @@ -70,7 +70,6 @@ train_manifest: data/manifest.train dev_manifest: data/manifest.dev test_manifest: data/manifest.test-clean - ########################################### # Dataloader # ########################################### @@ -95,6 +94,12 @@ dist_sampler: True shortest_first: True return_lens_rate: True +############################################ +# Data Augmentation # +############################################ +audio_augment: # for raw audio + sample_rate: 16000 + speeds: [95, 100, 105] ########################################### # Training # @@ -115,6 +120,3 @@ log_interval: 1 checkpoint: kbest_n: 50 latest_n: 5 -augment: True - - diff --git a/examples/ljspeech/tts3/local/export2lite.sh b/examples/ljspeech/tts3/local/export2lite.sh new file mode 120000 index 000000000..f7719914a --- /dev/null +++ b/examples/ljspeech/tts3/local/export2lite.sh @@ -0,0 +1 @@ +../../../csmsc/tts3/local/export2lite.sh \ No newline at end of file diff --git a/examples/ljspeech/tts3/run.sh b/examples/ljspeech/tts3/run.sh index 956185935..7ab591862 100755 --- a/examples/ljspeech/tts3/run.sh +++ b/examples/ljspeech/tts3/run.sh @@ -59,3 +59,14 @@ fi if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then ./local/ort_predict.sh ${train_output_path} fi + +# must run after stage 3 (which stage generated static models) +if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then + # This model is not supported, because 3 ops are not supported on 'arm'. These unsupported ops are: 'round, set_value, share_data'. + # This model is not supported, because 4 ops are not supported on 'x86'. These unsupported ops are: 'matmul_v2, round, set_value, share_data'. + ./local/export2lite.sh ${train_output_path} inference pdlite fastspeech2_ljspeech x86 + # x86 ok, arm Segmentation fault + # ./local/export2lite.sh ${train_output_path} inference pdlite pwgan_ljspeech x86 + # x86 ok, arm ok + # ./local/export2lite.sh ${train_output_path} inference pdlite hifigan_ljspeech x86 +fi \ No newline at end of file diff --git a/examples/other/mfa/README.md b/examples/other/mfa/README.md index c24524ab4..216d1275b 100644 --- a/examples/other/mfa/README.md +++ b/examples/other/mfa/README.md @@ -4,3 +4,6 @@ Run the following script to get started, for more detail, please see `run.sh`. 
```bash ./run.sh ``` +# Rhythm tags for MFA +If you want to get rhythm tags with duration through MFA tool, you may add flag `--rhy-with-duration` in the first two commands in `run.sh` +Note that only CSMSC dataset is supported so far, and we replace `#` with `sp` in rhythm tags for MFA. diff --git a/examples/other/mfa/local/generate_lexicon.py b/examples/other/mfa/local/generate_lexicon.py index e9445665b..3deb24701 100644 --- a/examples/other/mfa/local/generate_lexicon.py +++ b/examples/other/mfa/local/generate_lexicon.py @@ -182,12 +182,17 @@ if __name__ == "__main__": "--with-tone", action="store_true", help="whether to consider tone.") parser.add_argument( "--with-r", action="store_true", help="whether to consider erhua.") + parser.add_argument( + "--rhy-with-duration", + action="store_true", ) args = parser.parse_args() lexicon = generate_lexicon(args.with_tone, args.with_r) symbols = generate_symbols(lexicon) with open(args.output + ".lexicon", 'wt') as f: + if args.rhy_with_duration: + f.write("sp1 sp1\nsp2 sp2\nsp3 sp3\nsp4 sp4\n") for k, v in lexicon.items(): f.write(f"{k} {v}\n") diff --git a/examples/other/mfa/local/reorganize_baker.py b/examples/other/mfa/local/reorganize_baker.py index 153e01d13..0e0035bda 100644 --- a/examples/other/mfa/local/reorganize_baker.py +++ b/examples/other/mfa/local/reorganize_baker.py @@ -23,6 +23,7 @@ for more details. """ import argparse import os +import re import shutil from concurrent.futures import ThreadPoolExecutor from pathlib import Path @@ -32,6 +33,22 @@ import librosa import soundfile as sf from tqdm import tqdm +repalce_dict = { + ";": "", + "。": "", + ":": "", + "—": "", + ")": "", + ",": "", + "“": "", + "(": "", + "、": "", + "…": "", + "!": "", + "?": "", + "”": "" +} + def get_transcripts(path: Union[str, Path]): transcripts = {} @@ -55,9 +72,13 @@ def resample_and_save(source, target, sr=16000): def reorganize_baker(root_dir: Union[str, Path], output_dir: Union[str, Path]=None, - resample_audio=False): + resample_audio=False, + rhy_dur=False): root_dir = Path(root_dir).expanduser() - transcript_path = root_dir / "ProsodyLabeling" / "000001-010000.txt" + if rhy_dur: + transcript_path = root_dir / "ProsodyLabeling" / "000001-010000_rhy.txt" + else: + transcript_path = root_dir / "ProsodyLabeling" / "000001-010000.txt" transcriptions = get_transcripts(transcript_path) wave_dir = root_dir / "Wave" @@ -92,6 +113,46 @@ def reorganize_baker(root_dir: Union[str, Path], print("Done!") +def insert_rhy(sentence_first, sentence_second): + sub = '#' + return_words = [] + sentence_first = sentence_first.translate(str.maketrans(repalce_dict)) + rhy_idx = [substr.start() for substr in re.finditer(sub, sentence_first)] + re_rhy_idx = [] + sentence_first_ = sentence_first.replace("#1", "").replace( + "#2", "").replace("#3", "").replace("#4", "") + sentence_seconds = sentence_second.split(" ") + for i, w in enumerate(rhy_idx): + re_rhy_idx.append(w - i * 2) + i = 0 + # print("re_rhy_idx: ", re_rhy_idx) + for sentence_s in (sentence_seconds): + return_words.append(sentence_s) + if i < len(re_rhy_idx) and len(return_words) - i == re_rhy_idx[i]: + return_words.append("sp" + sentence_first[rhy_idx[i] + 1:rhy_idx[i] + + 2]) + i = i + 1 + return return_words + + +def normalize_rhy(root_dir: Union[str, Path]): + root_dir = Path(root_dir).expanduser() + transcript_path = root_dir / "ProsodyLabeling" / "000001-010000.txt" + target_transcript_path = root_dir / "ProsodyLabeling" / "000001-010000_rhy.txt" + + with open(transcript_path) as f: + lines = 
f.readlines() + + with open(target_transcript_path, 'wt') as f: + for i in range(0, len(lines), 2): + sentence_first = lines[i] #第一行直接保存 + f.write(sentence_first) + transcription = lines[i + 1].strip() + f.write("\t" + " ".join( + insert_rhy(sentence_first.split('\t')[1], transcription)) + + "\n") + + if __name__ == "__main__": parser = argparse.ArgumentParser( description="Reorganize Baker dataset for MFA") @@ -104,6 +165,12 @@ if __name__ == "__main__": "--resample-audio", action="store_true", help="To resample audio files or just copy them") + parser.add_argument( + "--rhy-with-duration", + action="store_true", ) args = parser.parse_args() - reorganize_baker(args.root_dir, args.output_dir, args.resample_audio) + if args.rhy_with_duration: + normalize_rhy(args.root_dir) + reorganize_baker(args.root_dir, args.output_dir, args.resample_audio, + args.rhy_with_duration) diff --git a/examples/other/tn/data/textnorm_test_cases.txt b/examples/other/tn/data/textnorm_test_cases.txt index e9a479b47..17e90d0b6 100644 --- a/examples/other/tn/data/textnorm_test_cases.txt +++ b/examples/other/tn/data/textnorm_test_cases.txt @@ -122,4 +122,6 @@ iPad Pro的秒控键盘这次也推出白色版本。|iPad Pro的秒控键盘这 近期也一反常态地发表看空言论|近期也一反常态地发表看空言论 985|九八五 12~23|十二到二十三 -12-23|十二到二十三 \ No newline at end of file +12-23|十二到二十三 +25cm²|二十五平方厘米 +25m|米 \ No newline at end of file diff --git a/examples/other/tts_finetune/tts3/README.md b/examples/other/tts_finetune/tts3/README.md index fa691764c..8564af5f6 100644 --- a/examples/other/tts_finetune/tts3/README.md +++ b/examples/other/tts_finetune/tts3/README.md @@ -55,7 +55,7 @@ If you want to finetune Chinese pretrained model, you need to prepare Chinese da 000001|ka2 er2 pu3 pei2 wai4 sun1 wan2 hua2 ti1 ``` -Here is an example of the first 200 data of csmsc. +Here is a Chinese data example of the first 200 data of csmsc. ```bash mkdir -p input && cd input @@ -69,7 +69,7 @@ If you want to finetune English pretrained model, you need to prepare English da LJ001-0001|Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition ``` -Here is an example of the first 200 data of ljspeech. +Here is an English data example of the first 200 data of ljspeech. ```bash mkdir -p input && cd input @@ -78,7 +78,7 @@ unzip ljspeech_mini.zip cd ../ ``` -If you want to finetune Chinese-English Mixed pretrained model, you need to prepare Chinese data or English data. Here is an example of the first 12 data of SSB0005 (the speaker of aishell3). +If you want to finetune Chinese-English Mixed pretrained model, you need to prepare Chinese data or English data. Here is a Chinese data example of the first 12 data of SSB0005 (the speaker of aishell3). 
```bash mkdir -p input && cd input diff --git a/examples/other/tts_finetune/tts3/run_mix.sh b/examples/other/tts_finetune/tts3/run_mix.sh old mode 100644 new mode 100755 index 71008ef5b..960278a53 --- a/examples/other/tts_finetune/tts3/run_mix.sh +++ b/examples/other/tts_finetune/tts3/run_mix.sh @@ -108,3 +108,4 @@ if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then --spk_id=$replace_spkid fi + diff --git a/examples/vctk/ernie_sat/README.md b/examples/vctk/ernie_sat/README.md index 94c7ae25d..1808e2074 100644 --- a/examples/vctk/ernie_sat/README.md +++ b/examples/vctk/ernie_sat/README.md @@ -1,5 +1,5 @@ # ERNIE-SAT with VCTK dataset -ERNIE-SAT speech-text joint pretraining framework, which achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks, It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning. +[ERNIE-SAT](https://arxiv.org/abs/2211.03545) speech-text joint pretraining framework, which achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks, It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning. ## Model Framework In ERNIE-SAT, we propose two innovations: diff --git a/examples/vctk/tts3/local/export2lite.sh b/examples/vctk/tts3/local/export2lite.sh new file mode 120000 index 000000000..f7719914a --- /dev/null +++ b/examples/vctk/tts3/local/export2lite.sh @@ -0,0 +1 @@ +../../../csmsc/tts3/local/export2lite.sh \ No newline at end of file diff --git a/examples/vctk/tts3/run.sh b/examples/vctk/tts3/run.sh index b5184aed8..16f1eae18 100755 --- a/examples/vctk/tts3/run.sh +++ b/examples/vctk/tts3/run.sh @@ -58,3 +58,14 @@ fi if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then ./local/ort_predict.sh ${train_output_path} fi + +# must run after stage 3 (which stage generated static models) +if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then + # This model is not supported, because 3 ops are not supported on 'arm'. These unsupported ops are: 'round, set_value, share_data'. + # This model is not supported, because 4 ops are not supported on 'x86'. These unsupported ops are: 'matmul_v2, round, set_value, share_data'. 
+ ./local/export2lite.sh ${train_output_path} inference pdlite fastspeech2_vctk x86 + # x86 ok, arm Segmentation fault + # ./local/export2lite.sh ${train_output_path} inference pdlite pwgan_vctk x86 + # x86 ok, arm ok + # ./local/export2lite.sh ${train_output_path} inference pdlite hifigan_vctk x86 +fi diff --git a/paddlespeech/cls/exps/panns/train.py b/paddlespeech/cls/exps/panns/train.py index fba38a01c..133893081 100644 --- a/paddlespeech/cls/exps/panns/train.py +++ b/paddlespeech/cls/exps/panns/train.py @@ -101,7 +101,7 @@ if __name__ == "__main__": optimizer.clear_grad() # Calculate loss - avg_loss += loss.numpy()[0] + avg_loss += float(loss) # Calculate metrics preds = paddle.argmax(logits, axis=1) diff --git a/paddlespeech/kws/exps/mdtc/train.py b/paddlespeech/kws/exps/mdtc/train.py index 94e45d590..d5bb5e020 100644 --- a/paddlespeech/kws/exps/mdtc/train.py +++ b/paddlespeech/kws/exps/mdtc/train.py @@ -110,7 +110,7 @@ if __name__ == '__main__': optimizer.clear_grad() # Calculate loss - avg_loss += loss.numpy()[0] + avg_loss += float(loss) # Calculate metrics num_corrects += corrects diff --git a/paddlespeech/s2t/exps/wav2vec2/model.py b/paddlespeech/s2t/exps/wav2vec2/model.py index 933e268ed..4f6bc0c5b 100644 --- a/paddlespeech/s2t/exps/wav2vec2/model.py +++ b/paddlespeech/s2t/exps/wav2vec2/model.py @@ -71,7 +71,8 @@ class Wav2Vec2ASRTrainer(Trainer): wavs_lens_rate = wavs_lens / wav.shape[1] target_lens_rate = target_lens / target.shape[1] wav = wav[:, :, 0] - wav = self.speech_augmentation(wav, wavs_lens_rate) + if hasattr(train_conf, 'speech_augment'): + wav = self.speech_augmentation(wav, wavs_lens_rate) loss = self.model(wav, wavs_lens_rate, target, target_lens_rate) # loss div by `batch_size * accum_grad` loss /= train_conf.accum_grad @@ -277,7 +278,9 @@ class Wav2Vec2ASRTrainer(Trainer): logger.info("Setup model!") # setup speech augmentation for wav2vec2 - self.speech_augmentation = TimeDomainSpecAugment() + if hasattr(config, 'audio_augment') and self.train: + self.speech_augmentation = TimeDomainSpecAugment( + **config.audio_augment) if not self.train: return diff --git a/paddlespeech/s2t/models/wav2vec2/processing/speech_augmentation.py b/paddlespeech/s2t/models/wav2vec2/processing/speech_augmentation.py index 78a0782e7..ac9bf45db 100644 --- a/paddlespeech/s2t/models/wav2vec2/processing/speech_augmentation.py +++ b/paddlespeech/s2t/models/wav2vec2/processing/speech_augmentation.py @@ -641,14 +641,11 @@ class DropChunk(nn.Layer): class TimeDomainSpecAugment(nn.Layer): """A time-domain approximation of the SpecAugment algorithm. - This augmentation module implements three augmentations in the time-domain. - 1. Drop chunks of the audio (zero amplitude or white noise) 2. Drop frequency bands (with band-drop filters) 3. Speed peturbation (via resampling to slightly different rate) - Arguments --------- perturb_prob : float from 0 to 1 @@ -677,7 +674,6 @@ class TimeDomainSpecAugment(nn.Layer): drop_chunk_noise_factor : float The noise factor used to scale the white noise inserted, relative to the average amplitude of the utterance. Default 0 (no noise inserted). - Example ------- >>> inputs = paddle.randn([10, 16000]) @@ -718,7 +714,6 @@ class TimeDomainSpecAugment(nn.Layer): def forward(self, waveforms, lengths): """Returns the distorted waveforms. 
- Arguments --------- waveforms : tensor diff --git a/paddlespeech/t2s/frontend/zh_normalization/quantifier.py b/paddlespeech/t2s/frontend/zh_normalization/quantifier.py index 268d7229b..598030e43 100644 --- a/paddlespeech/t2s/frontend/zh_normalization/quantifier.py +++ b/paddlespeech/t2s/frontend/zh_normalization/quantifier.py @@ -18,6 +18,25 @@ from .num import num2str # 温度表达式,温度会影响负号的读法 # -3°C 零下三度 RE_TEMPERATURE = re.compile(r'(-?)(\d+(\.\d+)?)(°C|℃|度|摄氏度)') +measure_dict = { + "cm2": "平方厘米", + "cm²": "平方厘米", + "cm3": "立方厘米", + "cm³": "立方厘米", + "cm": "厘米", + "db": "分贝", + "ds": "毫秒", + "kg": "千克", + "km": "千米", + "m2": "平方米", + "m²": "平方米", + "m³": "立方米", + "m3": "立方米", + "ml": "毫升", + "m": "米", + "mm": "毫米", + "s": "秒" +} def replace_temperature(match) -> str: @@ -35,3 +54,10 @@ def replace_temperature(match) -> str: unit: str = "摄氏度" if unit == "摄氏度" else "度" result = f"{sign}{temperature}{unit}" return result + + +def replace_measure(sentence) -> str: + for q_notation in measure_dict: + if q_notation in sentence: + sentence = sentence.replace(q_notation, measure_dict[q_notation]) + return sentence diff --git a/paddlespeech/t2s/frontend/zh_normalization/text_normlization.py b/paddlespeech/t2s/frontend/zh_normalization/text_normlization.py index bc663c70d..8f8e3b07d 100644 --- a/paddlespeech/t2s/frontend/zh_normalization/text_normlization.py +++ b/paddlespeech/t2s/frontend/zh_normalization/text_normlization.py @@ -46,6 +46,7 @@ from .phonecode import RE_TELEPHONE from .phonecode import replace_mobile from .phonecode import replace_phone from .quantifier import RE_TEMPERATURE +from .quantifier import replace_measure from .quantifier import replace_temperature @@ -91,6 +92,7 @@ class TextNormalizer(): sentence = RE_TIME.sub(replace_time, sentence) sentence = RE_TEMPERATURE.sub(replace_temperature, sentence) + sentence = replace_measure(sentence) sentence = RE_FRAC.sub(replace_frac, sentence) sentence = RE_PERCENTAGE.sub(replace_percentage, sentence) sentence = RE_MOBILE_PHONE.sub(replace_mobile, sentence) diff --git a/speechx/examples/codelab/u2/utils b/speechx/examples/codelab/u2/utils new file mode 120000 index 000000000..23cef9612 --- /dev/null +++ b/speechx/examples/codelab/u2/utils @@ -0,0 +1 @@ +../../../../utils \ No newline at end of file diff --git a/speechx/examples/u2pp_ol/wenetspeech/README.md b/speechx/examples/u2pp_ol/wenetspeech/README.md index 9a8f8af51..b90b8e201 100644 --- a/speechx/examples/u2pp_ol/wenetspeech/README.md +++ b/speechx/examples/u2pp_ol/wenetspeech/README.md @@ -2,10 +2,10 @@ ## Testing with Aishell Test Data -## Download wav and model +### Download wav and model ``` -run.sh --stop_stage 0 +./run.sh --stop_stage 0 ``` ### compute feature @@ -22,7 +22,6 @@ run.sh --stop_stage 0 ### decoding using wav - ``` ./run.sh --stage 3 --stop_stage 3 ``` diff --git a/speechx/examples/u2pp_ol/wenetspeech/RESULTS.md b/speechx/examples/u2pp_ol/wenetspeech/RESULTS.md index 6a8e8c46d..5b33f3641 100644 --- a/speechx/examples/u2pp_ol/wenetspeech/RESULTS.md +++ b/speechx/examples/u2pp_ol/wenetspeech/RESULTS.md @@ -2,9 +2,11 @@ 7176 utts, duration 36108.9 sec. -## Attention Rescore +## U2++ Attention Rescore -### u2++ FP32 +> Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz, support `avx512_vnni` +> RTF with feature and decoder which is more end to end. +### FP32 #### CER @@ -17,20 +19,29 @@ Other -> 100.00 % N=3 C=0 S=3 D=0 I=0 #### RTF -> RTF with feature and decoder which is more end to end. 
- -* Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz, support `avx512_vnni` - ``` I1027 10:52:38.662868 51665 u2_recognizer_main.cc:122] total wav duration is: 36108.9 sec I1027 10:52:38.662858 51665 u2_recognizer_main.cc:121] total cost:11169.1 sec I1027 10:52:38.662876 51665 u2_recognizer_main.cc:123] RTF is: 0.309318 ``` -* Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, not support `avx512_vnni` +### INT8 + +> RTF relative improve 12.8%, which count feature and decoder time. + +#### CER + +``` +Overall -> 5.83 % N=104765 C=98943 S=5675 D=147 I=286 +Mandarin -> 5.83 % N=104762 C=98943 S=5672 D=147 I=286 +English -> 0.00 % N=0 C=0 S=0 D=0 I=0 +Other -> 100.00 % N=3 C=0 S=3 D=0 I=0 +``` + +#### RTF ``` -I1026 16:13:26.247121 48038 u2_recognizer_main.cc:123] total wav duration is: 36108.9 sec -I1026 16:13:26.247130 48038 u2_recognizer_main.cc:124] total decode cost:13656.7 sec -I1026 16:13:26.247138 48038 u2_recognizer_main.cc:125] RTF is: 0.378208 +I1110 09:59:52.551712 37249 u2_recognizer_main.cc:122] total wav duration is: 36108.9 sec +I1110 09:59:52.551717 37249 u2_recognizer_main.cc:123] total decode cost:9737.63 sec +I1110 09:59:52.551723 37249 u2_recognizer_main.cc:124] RTF is: 0.269674 ``` diff --git a/speechx/examples/u2pp_ol/wenetspeech/local/decode.sh b/speechx/examples/u2pp_ol/wenetspeech/local/decode.sh index e9c81009c..059ed1b36 100755 --- a/speechx/examples/u2pp_ol/wenetspeech/local/decode.sh +++ b/speechx/examples/u2pp_ol/wenetspeech/local/decode.sh @@ -9,8 +9,9 @@ nj=20 mkdir -p $exp ckpt_dir=./data/model model_dir=$ckpt_dir/asr1_chunk_conformer_u2pp_wenetspeech_static_1.3.0.model/ +text=$data/test/text -utils/run.pl JOB=1:$nj $data/split${nj}/JOB/decoder.fbank.wolm.log \ +utils/run.pl JOB=1:$nj $data/split${nj}/JOB/decoder.log \ ctc_prefix_beam_search_decoder_main \ --model_path=$model_dir/export.jit \ --vocab_path=$model_dir/unit.txt \ @@ -20,6 +21,6 @@ ctc_prefix_beam_search_decoder_main \ --feature_rspecifier=scp:$data/split${nj}/JOB/fbank.scp \ --result_wspecifier=ark,t:$data/split${nj}/JOB/result_decode.ark -cat $data/split${nj}/*/result_decode.ark > $exp/${label_file} -utils/compute-wer.py --char=1 --v=1 $text $exp/${label_file} > $exp/${wer} -tail -n 7 $exp/${wer} \ No newline at end of file +cat $data/split${nj}/*/result_decode.ark > $exp/aishell.decode.rsl +utils/compute-wer.py --char=1 --v=1 $text $exp/aishell.decode.rsl > $exp/aishell.decode.err +tail -n 7 $exp/aishell.decode.err \ No newline at end of file diff --git a/speechx/examples/u2pp_ol/wenetspeech/local/nnet.sh b/speechx/examples/u2pp_ol/wenetspeech/local/nnet.sh index 5455b5c9b..f947e6b17 100755 --- a/speechx/examples/u2pp_ol/wenetspeech/local/nnet.sh +++ b/speechx/examples/u2pp_ol/wenetspeech/local/nnet.sh @@ -1,18 +1,21 @@ #!/bin/bash -set -x set -e . path.sh +nj=20 data=data exp=exp + mkdir -p $exp ckpt_dir=./data/model model_dir=$ckpt_dir/asr1_chunk_conformer_u2pp_wenetspeech_static_1.3.0.model/ +utils/run.pl JOB=1:$nj $data/split${nj}/JOB/nnet.log \ u2_nnet_main \ --model_path=$model_dir/export.jit \ - --feature_rspecifier=ark,t:$exp/fbank.ark \ + --vocab_path=$model_dir/unit.txt \ + --feature_rspecifier=ark,t:${data}/split${nj}/JOB/fbank.ark \ --nnet_decoder_chunk=16 \ --receptive_field_length=7 \ --subsampling_rate=4 \ @@ -20,4 +23,3 @@ u2_nnet_main \ --nnet_encoder_outs_wspecifier=ark,t:$exp/encoder_outs.ark \ --nnet_prob_wspecifier=ark,t:$exp/logprobs.ark echo "u2 nnet decode." 
- diff --git a/speechx/examples/u2pp_ol/wenetspeech/run.sh b/speechx/examples/u2pp_ol/wenetspeech/run.sh index 2bc855dec..870c5deeb 100755 --- a/speechx/examples/u2pp_ol/wenetspeech/run.sh +++ b/speechx/examples/u2pp_ol/wenetspeech/run.sh @@ -24,8 +24,6 @@ fi ckpt_dir=$data/model -model_dir=$ckpt_dir/asr1_chunk_conformer_u2pp_wenetspeech_static_1.3.0.model/ - if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ];then # download u2pp model diff --git a/speechx/speechx/frontend/audio/data_cache.h b/speechx/speechx/frontend/audio/data_cache.h index f538df1dd..5fe5e4fe0 100644 --- a/speechx/speechx/frontend/audio/data_cache.h +++ b/speechx/speechx/frontend/audio/data_cache.h @@ -32,7 +32,6 @@ class DataCache : public FrontendInterface { // accept waves/feats void Accept(const kaldi::VectorBase& inputs) override { data_ = inputs; - SetDim(data_.Dim()); } bool Read(kaldi::Vector* feats) override { @@ -41,7 +40,6 @@ class DataCache : public FrontendInterface { } (*feats) = data_; data_.Resize(0); - SetDim(data_.Dim()); return true; } diff --git a/speechx/speechx/nnet/decodable.cc b/speechx/speechx/nnet/decodable.cc index 7f6859082..5fe2b9842 100644 --- a/speechx/speechx/nnet/decodable.cc +++ b/speechx/speechx/nnet/decodable.cc @@ -71,6 +71,7 @@ bool Decodable::AdvanceChunk() { VLOG(3) << "decodable exit;"; return false; } + CHECK_GE(frontend_->Dim(), 0); VLOG(1) << "AdvanceChunk feat cost: " << timer.Elapsed() << " sec."; VLOG(2) << "Forward in " << features.Dim() / frontend_->Dim() << " feats."; diff --git a/tests/test_tipc/configs/mdtc/train_infer_python.txt b/tests/test_tipc/configs/mdtc/train_infer_python.txt index 7a5f658ee..6fb8c3484 100644 --- a/tests/test_tipc/configs/mdtc/train_infer_python.txt +++ b/tests/test_tipc/configs/mdtc/train_infer_python.txt @@ -49,9 +49,3 @@ null:null null:null null:null null:null -===========================train_benchmark_params========================== -batch_size:16|30 -fp_items:fp32 -iteration:50 ---profiler-options:"batch_range=[10,35];state=GPU;tracer_option=Default;profile_path=model.profile" -flags:null diff --git a/tests/unit/tts/test_pwg.py b/tests/unit/tts/test_pwg.py index 78cb34f25..10c82c9fd 100644 --- a/tests/unit/tts/test_pwg.py +++ b/tests/unit/tts/test_pwg.py @@ -13,6 +13,7 @@ # limitations under the License. import paddle import torch +from paddle.device.cuda import synchronize from parallel_wavegan.layers import residual_block from parallel_wavegan.layers import upsample from parallel_wavegan.models import parallel_wavegan as pwgan @@ -24,7 +25,6 @@ from paddlespeech.t2s.models.parallel_wavegan import PWGGenerator from paddlespeech.t2s.models.parallel_wavegan import ResidualBlock from paddlespeech.t2s.models.parallel_wavegan import ResidualPWGDiscriminator from paddlespeech.t2s.utils.layer_tools import summary -from paddlespeech.t2s.utils.profile import synchronize paddle.set_device("gpu:0") device = torch.device("cuda:0")
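The patch wires a new `replace_measure()` pass into `TextNormalizer.normalize_sentence()` so that unit notations such as `cm²` are expanded before the numeral rules run (see the new `25cm²|二十五平方厘米` test case). The standalone sketch below is not part of the diff; it trims the added `measure_dict` to a few entries to show why insertion order matters: longer keys such as `cm²` must be tried before the bare `m` key, because the replacement is a plain substring substitution over the whole sentence.

```python
# Trimmed re-implementation of the measure-unit replacement added in
# paddlespeech/t2s/frontend/zh_normalization/quantifier.py; the real patch
# covers more units (km, kg, ml, mm, db, s, ...).
measure_dict = {
    "cm2": "平方厘米",
    "cm²": "平方厘米",
    "cm": "厘米",
    "m²": "平方米",
    "m": "米",
}


def replace_measure(sentence: str) -> str:
    # Plain substring replacement over the whole sentence; dict insertion order
    # guarantees that "cm²" is handled before the single-letter "m" key.
    for q_notation in measure_dict:
        if q_notation in sentence:
            sentence = sentence.replace(q_notation, measure_dict[q_notation])
    return sentence


if __name__ == "__main__":
    # Only the unit is rewritten here; the digits are expanded later by the
    # existing numeral rules in TextNormalizer.normalize_sentence().
    print(replace_measure("25cm²"))  # -> 25平方厘米
```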
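Similarly, the `Wav2Vec2ASRTrainer` change makes raw-audio augmentation config-driven: `TimeDomainSpecAugment` is constructed only when the YAML provides the new `audio_augment` section, and only at training time. A minimal sketch of that mapping, assuming PaddleSpeech is installed and using a toy batch in place of the real dataloader:

```python
# Sketch of how the new `audio_augment` section of wav2vec2ASR.yaml feeds
# TimeDomainSpecAugment (names follow the patch; the toy batch is hypothetical).
import paddle
from paddlespeech.s2t.models.wav2vec2.processing.speech_augmentation import TimeDomainSpecAugment

# Stand-in for the parsed `audio_augment` block of the config.
audio_augment = {"sample_rate": 16000, "speeds": [95, 100, 105]}
augment = TimeDomainSpecAugment(**audio_augment)

# Two fake utterances of one second at 16 kHz; the trainer passes relative
# lengths (wavs_lens / wav.shape[1]) as the second argument.
wavs = paddle.randn([2, 16000])
lens_rate = paddle.to_tensor([1.0, 0.8])
augmented = augment(wavs, lens_rate)
print(augmented.shape)
```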