Merge branch 'PaddlePaddle:develop' into develop

3 years ago · 5c8b75ee18
parent 152ebcbce8 f5380af32e
commit 5c8b75ee18
65 changed files with 2739 additions and 187 deletions
--- a/README.md
+++ b/README.md
@ -157,6 +157,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
  - 🧩  *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

 ### Recent Update
+- 🎉 2022.11.30: Add [TTS Android Demo](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid).
 - 👑 2022.11.18: Add [Whisper CLI and Demos](https://github.com/PaddlePaddle/PaddleSpeech/pull/2640), support multi language recognition and translation.
 - 🔥 2022.11.18: Add [Wav2vec2 CLI and Demos](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_ssl), Support ASR and Feature Extraction.
 - 🎉 2022.11.17: Add [male voice for TTS](https://github.com/PaddlePaddle/PaddleSpeech/pull/2660).
--- a/README_cn.md
+++ b/README_cn.md
@ -164,7 +164,8 @@

  
 ### 近期更新
- 👑 2022.11.18: 新增 [Whisper CLI 和 Demos](https://github.com/PaddlePaddle/PaddleSpeech/pull/2640)，支持多种语言的识别与翻译。
+- 🎉 2022.11.30: 新增 [TTS Android 部署示例](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid)。
+- 👑 2022.11.18: 新增 [Whisper CLI 和 Demos](https://github.com/PaddlePaddle/PaddleSpeech/pull/2640), 支持多种语言的识别与翻译。
 - 🔥 2022.11.18: 新增 [Wav2vec2 CLI 和 Demos](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_ssl), 支持 ASR 和 特征提取.
 - 🎉 2022.11.17: TTS 新增[高质量男性音色](https://github.com/PaddlePaddle/PaddleSpeech/pull/2660)。
 - 🔥 2022.11.07: 新增 [U2/U2++ 高性能流式 ASR C++ 部署](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/speechx/examples/u2pp_ol/wenetspeech)。
--- a/demos/TTSAndroid/.gitignore
+++ b/demos/TTSAndroid/.gitignore
@ -0,0 +1,13 @@
+*.iml
+.gradle
+/local.properties
+/.idea/caches
+/.idea/libraries
+/.idea/modules.xml
+/.idea/workspace.xml
+/.idea/navEditor.xml
+/.idea/assetWizardSettings.xml
+.DS_Store
+/build
+/captures
+.externalNativeBuild
--- a/demos/TTSAndroid/README.md
+++ b/demos/TTSAndroid/README.md
@ -0,0 +1,189 @@
+# 语音合成 Java API Demo 使用指南
+
+在 Android 上实现语音合成功能，此 Demo 有很好的的易用性和开放性，如在 Demo 中跑自己训练好的模型等。
+
+本文主要介绍语音合成 Demo 运行方法。
+
+## 如何运行语音合成 Demo
+
+### 环境准备
+
+1. 在本地环境安装好 Android Studio 工具，详细安装方法请见 [Android Stuido 官网](https://developer.android.com/studio)。
+2. 准备一部 Android 手机，并开启 USB 调试模式。开启方法: `手机设置 -> 查找开发者选项 -> 打开开发者选项和 USB 调试模式`。
+
+**注意**：
+> 如果您的 Android Studio 尚未配置 NDK ，请根据 Android Studio 用户指南中的[安装及配置 NDK 和 CMake ](https://developer.android.com/studio/projects/install-ndk)内容，预先配置好 NDK 。您可以选择最新的 NDK 版本，或者使用 Paddle Lite 预测库版本一样的 NDK。
+
+### 部署步骤
+
+1. 用 Android Studio 打开 TTSAndroid 工程。
+2. 手机连接电脑，打开 USB 调试和文件传输模式，并在 Android Studio 上连接自己的手机设备（手机需要开启允许从 USB 安装软件权限）。
+
+**注意：**
+>1. 如果您在导入项目、编译或者运行过程中遇到 NDK 配置错误的提示，请打开 `File > Project Structure > SDK Location`，修改 `Andriod NDK location` 为您本机配置的 NDK 所在路径。
+>2. 如果您是通过 Andriod Studio 的 SDK Tools 下载的 NDK (见本章节"环境准备")，可以直接点击下拉框选择默认路径。
+>3. 还有一种 NDK 配置方法，你可以在 `TTSAndroid/local.properties` 文件中手动添加 NDK 路径配置 `nkd.dir=/root/android-ndk-r20b`
+>4. 如果以上步骤仍旧无法解决 NDK 配置错误，请尝试根据 Andriod Studio 官方文档中的[更新 Android Gradle 插件](https://developer.android.com/studio/releases/gradle-plugin?hl=zh-cn#updating-plugin)章节，尝试更新 Android Gradle plugin 版本。
+
+3. 点击 Run 按钮，自动编译 APP 并安装到手机。(该过程会自动下载 Paddle Lite 预测库和模型，需要联网)
+   成功后效果如下：
+    - pic 1：APP 安装到手机。
+    - pic 2：APP 打开后的效果，在下拉框中选择待合成的文本。
+    - pic 3：合成后点击按钮播放音频。
+
+<p align="center"><img width="350" height="500"  src="https://user-images.githubusercontent.com/24568452/204450217-d166588a-5341-4565-8662-0f8129284bba.png"/><img width="350" height="500" src="https://user-images.githubusercontent.com/24568452/204450231-d6f3105c-276a-4af5-a3ba-864d9f5ee24e.png"/><img width="350" height="500" src="https://user-images.githubusercontent.com/24568452/204450269-0ddf46ec-eedd-4c90-8a0d-e915622fdf3e.png"/></p>
+
+## 更新预测库
+
+* Paddle Lite
+  项目：[https://github.com/PaddlePaddle/Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)。
+
+
+参考 [Paddle Lite 源码编译文档](https://www.paddlepaddle.org.cn/lite/v2.11/source_compile/compile_env.html)，编译
+Android 预测库。
+
+* 编译最终产物位于 `build.lite.xxx.xxx.xxx` 下的 `inference_lite_lib.xxx.xxx`
+* 替换 java 库
+    * jar 包
+      将生成的 `build.lite.android.xxx.gcc/inference_lite_lib.android.xxx/java/jar/PaddlePredictor.jar`
+      替换 Demo 中的 `TTSAndroid/app/libs/PaddlePredictor.jar`。
+    * Java so
+        * arm64-v8a
+          将生成的 `build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/java/so/libpaddle_lite_jni.so`
+          库替换 Demo 中的 `TTSAndroid/app/src/main/jniLibs/arm64-v8a/libpaddle_lite_jni.so`。
+
+## Demo 内容介绍
+
+先整体介绍下目标检测 Demo 的代码结构，然后介绍 Java 各功能模块的功能。
+
+<p align="center">
+<img width="442" alt="image" src="https://user-images.githubusercontent.com/24568452/204455080-4f96fe55-6058-4235-bb92-cc98cfcc8bb6.png">
+</p>
+
+### 重点关注内容
+
+1. `Predictor.java`： 预测代码。
+
+```bash
+# 位置：
+TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/Predictor.java
+```
+
+2. `fastspeech2_csmsc_arm.nb`  和 `mb_melgan_csmsc_arm.nb`: 模型文件 (opt 工具转化后 Paddle Lite 模型)
+   ，分别来自 [fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip)
+   和 [mb_melgan_csmsc_pdlite_1.3.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_pdlite_1.3.0.zip)。
+
+```bash
+# 位置：
+TTSAndroid/app/src/main/assets/models/cpu/fastspeech2_csmsc_arm.nb
+TTSAndroid/app/src/main/assets/models/cpu/mb_melgan_csmsc_arm.nb
+```
+
+3. `libpaddle_lite_jni.so`、`PaddlePredictor.jar`：Paddle Lite Java 预测库与 jar 包。
+
+```bash
+# 位置
+TTSAndroid/app/src/main/jniLibs/arm64-v8a/libpaddle_lite_jni.so
+TTSAndroid/app/libs/PaddlePredictor.jar
+```
+
+> 如果要替换动态库 so 和 jar 文件，则将新的动态库 so 更新到 `TTSAndroid/app/src/main/jniLibs/arm64-v8a/` 目录下 新的 jar 文件更新到 `TTSAndroid/app/libs/` 目录下
+
+4. `build.gradle` : 定义编译过程的 gradle 脚本。（不用改动，定义了自动下载 Paddle Lite 预测和模型的过程）
+
+```bash
+# 位置
+TTSAndroid/app/build.gradle
+```
+
+如果需要手动更新模型和预测库，则可将 gradle 脚本中的 `download*` 接口注释即可, 将新的预测库替换至相应目录下
+
+### Java 端
+
+* 模型存放，将下载好的模型解压存放在 `app/src/assets/models` 目录下。
+* TTSAndroid Java 包在 `app/src/main/java/com/baidu/paddle/lite/demo/tts` 目录下，实现 APP 界面消息事件。
+* MainActivity 实现 APP 的创建、运行、释放功能，重点关注 `onLoadModel` 和 `onRunModel` 函数，实现 APP 界面值传递和推理处理。
+
+     ```java
+    public boolean onLoadModel() {
+        return predictor.init(MainActivity.this, modelPath, AMmodelName, VOCmodelName, cpuThreadNum,
+                cpuPowerMode);
+    }
+     
+    public boolean onRunModel() {
+        return predictor.isLoaded() && predictor.runModel(phones);
+    }
+     ```
+
+* SettingActivity 实现设置界面各个元素的更新与显示如模型地址、线程数、输入 shape 大小等，如果新增/删除界面的某个元素，均在这个类里面实现：
+    - 参数的默认值可在 `app/src/main/res/values/strings.xml` 查看
+    - 每个元素的 ID 和 value 是对应 `app/src/main/res/xml/settings.xml`
+      和 `app/src/main/res/values/string.xml` 文件中的值
+    - 这部分内容不建议修改，如果有新增属性，可以按照此格式进行添加
+
+* Predictor 使用 Java API 实现语音合成模型的预测功能，重点关注 `init`、和 `runModel` 函数，实现 Paddle Lite 端侧推理功能：
+     ```java
+     // 初始化函数，完成预测器初始化
+     public boolean init(Context appCtx, String modelPath, String AMmodelName, String VOCmodelName, int cpuThreadNum, String cpuPowerMode);
+     // 模型推理函数
+     public boolean runModel(float[] phones);
+     ```
+
+## 代码讲解 （使用 Paddle Lite `Java API` 执行预测）
+
+Android 示例基于 Java API 开发，调用 Paddle Lite `Java API` 包括以下五步。更详细的 `API`
+描述参考：[Paddle Lite Java API ](https://www.paddlepaddle.org.cn/lite/v2.11/api_reference/java_api_doc.html)。
+
+## 如何更新模型和输入
+
+### 更新模型
+
+1. 将优化后的模型存放到目录 `TTSAndroid/app/src/main/assets/models/cpu/`
+   下，可任意换成 [released_model.md](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md)
+   中的 `*_pdlite_*.zip/*_arm.nb`
+   格式的声学模型和声码器，注意更换声学模型需要对应修改 `TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/MainActivity.java`
+   中的 `sentencesToChoose` 数组。
+2. 如果模型名字跟工程中模型名字一模一样，即均是使用`fastspeech2_csmsc_arm.nb` （假设声学模型的 `phone_id_map.txt`
+   也一样）和 `mb_melgan_csmsc_arm.nb`
+   ，则代码不需更新；否则，需要修改  `TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/MainActivity.java`
+   中的 `AMmodelName` 和 `VOCmodelName`：
+
+<p align="center">
+<img src="https://user-images.githubusercontent.com/24568452/204458299-25e305a6-7cbb-4308-86ee-03f146bb938e.png">
+</p>
+
+3. 如果更新模型的输入/输出 Tensor 个数、shape 和 Dtype
+   发生更新，需要更新文件 `TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/Predictor.java`。
+
+### 更新输入
+
+**本 Demo 不包含文本前端模块**，通过下拉框选择预先设置好的文本，在代码中映射成对应的 phone_id，**如需文本前端模块请自行处理**，`phone_id_map.txt`
+请参考 [fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip)。
+
+## 通过 setting 界面更新语音合成的相关参数
+
+### setting 界面参数介绍
+
+可通过 APP 上的 Settings 按钮，实现语音合成 Demo 中参数的更新，目前支持以下参数的更新：
+参数的默认值可在 `app/src/main/res/values/strings.xml` 查看
+
+- CPU setting：
+    - power_mode 默认是 `LITE_POWER_HIGH`
+    - thread_num 默认是 1
+
+### setting 界面参数更新
+
+1. 打开 APP，点击右上角的 `:` 符合，选择 `Settings..` 选项，打开 setting 界面；
+2. 再将 setting 界面的 Enable custom settings 选中☑️，然后更新部分参数；
+3. 假设更新线程数据，将 CPU Thread Num 设置为 4，更新后，返回原界面，APP 将自动重新加载模型，在下拉框中选择文本会进行合成，合成结束后悔打印 4 线程的耗时和结果
+
+## 性能优化方法
+
+如果你觉得当前性能不符合需求，想进一步提升模型性能，可参考[性能优化文档](https://github.com/PaddlePaddle/Paddle-Lite-Demo#%E6%80%A7%E8%83%BD%E4%BC%98%E5%8C%96)完成性能优化。
+
+## Release
+
+[2022-11-29-app-release.apk](https://paddlespeech.bj.bcebos.com/demos/TTSAndroid/2022-11-29-app-release.apk)
+
+## More
+本 Demo 合并自 [yt605155624/TTSAndroid](https://github.com/yt605155624/TTSAndroid)。
--- a/demos/TTSAndroid/app/.gitignore
+++ b/demos/TTSAndroid/app/.gitignore
@ -0,0 +1 @@
+/build
--- a/demos/TTSAndroid/app/build.gradle
+++ b/demos/TTSAndroid/app/build.gradle
@ -0,0 +1,108 @@
+import java.security.MessageDigest
+
+apply plugin: 'com.android.application'
+
+android {
+    compileSdkVersion 28
+    defaultConfig {
+        applicationId "com.baidu.paddle.lite.demo.tts"
+        minSdkVersion 15
+        targetSdkVersion 28
+        versionCode 1
+        versionName "1.0"
+        testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner"
+    }
+    buildTypes {
+        release {
+            minifyEnabled false
+            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
+        }
+    }
+}
+
+dependencies {
+    implementation fileTree(include: ['*.jar'], dir: 'libs')
+    implementation 'com.android.support:appcompat-v7:28.0.0'
+    implementation 'com.android.support.constraint:constraint-layout:1.1.3'
+    implementation 'com.android.support:design:28.0.0'
+    testImplementation 'junit:junit:4.12'
+    androidTestImplementation 'com.android.support.test:runner:1.0.2'
+    androidTestImplementation 'com.android.support.test.espresso:espresso-core:3.0.2'
+    implementation files('libs/PaddlePredictor.jar')
+}
+
+def paddleLiteLibs = 'https://paddlespeech.bj.bcebos.com/demos/TTSAndroid/paddle_lite_libs_68b66fd3.tar.gz'
+task downloadAndExtractPaddleLiteLibs(type: DefaultTask) {
+    doFirst {
+        println "Downloading and extracting Paddle Lite libs"
+    }
+    doLast {
+        // Prepare cache folder for libs
+        if (!file("cache").exists()) {
+            mkdir "cache"
+        }
+        // Generate cache name for libs
+        MessageDigest messageDigest = MessageDigest.getInstance('MD5')
+        messageDigest.update(paddleLiteLibs.bytes)
+        String cacheName = new BigInteger(1, messageDigest.digest()).toString(32)
+        // Download libs
+        if (!file("cache/${cacheName}.tar.gz").exists()) {
+            ant.get(src: paddleLiteLibs, dest: file("cache/${cacheName}.tar.gz"))
+        }
+        // Unpack libs
+        if (!file("cache/${cacheName}").exists()) {
+            copy {
+                from tarTree("cache/${cacheName}.tar.gz")
+                into "cache/${cacheName}"
+            }
+        }
+        // Copy PaddlePredictor.jar
+        if (!file("libs/PaddlePredictor.jar").exists()) {
+            copy {
+                from "cache/${cacheName}/java/PaddlePredictor.jar"
+                into "libs"
+            }
+        }
+        if (!file("src/main/jniLibs/arm64-v8a/libpaddle_lite_jni.so").exists()) {
+            copy {
+                from "cache/${cacheName}/java/libs/arm64-v8a/"
+                into "src/main/jniLibs/arm64-v8a"
+            }
+        }
+    }
+}
+preBuild.dependsOn downloadAndExtractPaddleLiteLibs
+
+def paddleLiteModels = [['src' : 'https://paddlespeech.bj.bcebos.com/demos/TTSAndroid/fs2cnn_mbmelgan_cpu_v1.3.0.tar.gz',
+                         'dest': 'src/main/assets/models'],]
+task downloadAndExtractPaddleLiteModels(type: DefaultTask) {
+    doFirst {
+        println "Downloading and extracting Paddle Lite models"
+    }
+    doLast {
+        // Prepare cache folder for models
+        String cachePath = "cache"
+        if (!file("${cachePath}").exists()) {
+            mkdir "${cachePath}"
+        }
+        paddleLiteModels.eachWithIndex { model, index ->
+            MessageDigest messageDigest = MessageDigest.getInstance('MD5')
+            messageDigest.update(model.src.bytes)
+            String cacheName = new BigInteger(1, messageDigest.digest()).toString(32)
+            // Download the target model if not exists
+            boolean copyFiles = !file("${model.dest}").exists()
+            if (!file("${cachePath}/${cacheName}.tar.gz").exists()) {
+                ant.get(src: model.src, dest: file("${cachePath}/${cacheName}.tar.gz"))
+                copyFiles = true // force to copy files from the latest archive files
+            }
+            // Copy model file
+            if (copyFiles) {
+                copy {
+                    from tarTree("${cachePath}/${cacheName}.tar.gz")
+                    into "${model.dest}"
+                }
+            }
+        }
+    }
+}
+preBuild.dependsOn downloadAndExtractPaddleLiteModels
--- a/demos/TTSAndroid/app/proguard-rules.pro
+++ b/demos/TTSAndroid/app/proguard-rules.pro
@ -0,0 +1,21 @@
+# Add project specific ProGuard rules here.
+# You can control the set of applied configuration files using the
+# proguardFiles setting in build.gradle.
+#
+# For more details, see
+#   http://developer.android.com/guide/developing/tools/proguard.html
+
+# If your project uses WebView with JS, uncomment the following
+# and specify the fully qualified class name to the JavaScript interface
+# class:
+#-keepclassmembers class fqcn.of.javascript.interface.for.webview {
+#   public *;
+#}
+
+# Uncomment this to preserve the line number information for
+# debugging stack traces.
+#-keepattributes SourceFile,LineNumberTable
+
+# If you keep the line number information, uncomment this to
+# hide the original source file name.
+#-renamesourcefileattribute SourceFile
--- a/demos/TTSAndroid/app/src/androidTest/java/com/baidu/paddle/lite/demo/tts/ExampleInstrumentedTest.java
+++ b/demos/TTSAndroid/app/src/androidTest/java/com/baidu/paddle/lite/demo/tts/ExampleInstrumentedTest.java
@ -0,0 +1,26 @@
+package com.baidu.paddle.lite.demo.tts;
+
+import android.content.Context;
+import android.support.test.InstrumentationRegistry;
+import android.support.test.runner.AndroidJUnit4;
+
+import org.junit.Test;
+import org.junit.runner.RunWith;
+
+import static org.junit.Assert.*;
+
+/**
+ * Instrumented test, which will execute on an Android device.
+ *
+ * @see <a href="http://d.android.com/tools/testing">Testing documentation</a>
+ */
+@RunWith(AndroidJUnit4.class)
+public class ExampleInstrumentedTest {
+    @Test
+    public void useAppContext() {
+        // Context of the app under test.
+        Context appContext = InstrumentationRegistry.getTargetContext();
+
+        assertEquals("com.baidu.paddle.lite.demo", appContext.getPackageName());
+    }
+}
--- a/demos/TTSAndroid/app/src/main/AndroidManifest.xml
+++ b/demos/TTSAndroid/app/src/main/AndroidManifest.xml
@ -0,0 +1,27 @@
+<?xml version="1.0" encoding="utf-8"?>
+<manifest xmlns:android="http://schemas.android.com/apk/res/android"
+    package="com.baidu.paddle.lite.demo.tts">
+
+    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
+    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
+
+    <application
+        android:allowBackup="true"
+        android:icon="@drawable/logo"
+        android:label="@string/app_name"
+        android:roundIcon="@drawable/logo"
+        android:supportsRtl="true"
+        android:theme="@style/AppTheme">
+        <activity android:name="com.baidu.paddle.lite.demo.tts.MainActivity">
+            <intent-filter>
+                <action android:name="android.intent.action.MAIN" />
+
+                <category android:name="android.intent.category.LAUNCHER" />
+            </intent-filter>
+        </activity>
+        <activity
+            android:name="com.baidu.paddle.lite.demo.tts.SettingsActivity"
+            android:label="Settings"></activity>
+    </application>
+
+</manifest>
--- a/demos/TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/AppCompatPreferenceActivity.java
+++ b/demos/TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/AppCompatPreferenceActivity.java
@ -0,0 +1,122 @@
+/*
+ * Copyright (C) 2014 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.baidu.paddle.lite.demo.tts;
+
+import android.content.res.Configuration;
+import android.os.Bundle;
+import android.preference.PreferenceActivity;
+import android.support.annotation.LayoutRes;
+import android.support.v7.app.ActionBar;
+import android.support.v7.app.AppCompatDelegate;
+import android.view.MenuInflater;
+import android.view.View;
+import android.view.ViewGroup;
+
+/**
+ * A {@link android.preference.PreferenceActivity} which implements and proxies the necessary calls
+ * to be used with AppCompat.
+ * <p>
+ * This technique can be used with an {@link android.app.Activity} class, not just
+ * {@link android.preference.PreferenceActivity}.
+ */
+public abstract class AppCompatPreferenceActivity extends PreferenceActivity {
+    private AppCompatDelegate mDelegate;
+
+    @Override
+    protected void onCreate(Bundle savedInstanceState) {
+        getDelegate().installViewFactory();
+        getDelegate().onCreate(savedInstanceState);
+        super.onCreate(savedInstanceState);
+    }
+
+    @Override
+    protected void onPostCreate(Bundle savedInstanceState) {
+        super.onPostCreate(savedInstanceState);
+        getDelegate().onPostCreate(savedInstanceState);
+    }
+
+    public ActionBar getSupportActionBar() {
+        return getDelegate().getSupportActionBar();
+    }
+
+
+    @Override
+    public MenuInflater getMenuInflater() {
+        return getDelegate().getMenuInflater();
+    }
+
+    @Override
+    public void setContentView(@LayoutRes int layoutResID) {
+        getDelegate().setContentView(layoutResID);
+    }
+
+    @Override
+    public void setContentView(View view) {
+        getDelegate().setContentView(view);
+    }
+
+    @Override
+    public void setContentView(View view, ViewGroup.LayoutParams params) {
+        getDelegate().setContentView(view, params);
+    }
+
+    @Override
+    public void addContentView(View view, ViewGroup.LayoutParams params) {
+        getDelegate().addContentView(view, params);
+    }
+
+    @Override
+    protected void onPostResume() {
+        super.onPostResume();
+        getDelegate().onPostResume();
+    }
+
+    @Override
+    protected void onTitleChanged(CharSequence title, int color) {
+        super.onTitleChanged(title, color);
+        getDelegate().setTitle(title);
+    }
+
+    @Override
+    public void onConfigurationChanged(Configuration newConfig) {
+        super.onConfigurationChanged(newConfig);
+        getDelegate().onConfigurationChanged(newConfig);
+    }
+
+    @Override
+    protected void onStop() {
+        super.onStop();
+        getDelegate().onStop();
+    }
+
+    @Override
+    protected void onDestroy() {
+        super.onDestroy();
+        getDelegate().onDestroy();
+    }
+
+    public void invalidateOptionsMenu() {
+        getDelegate().invalidateOptionsMenu();
+    }
+
+    private AppCompatDelegate getDelegate() {
+        if (mDelegate == null) {
+            mDelegate = AppCompatDelegate.create(this, null);
+        }
+        return mDelegate;
+    }
+}
--- a/demos/TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/MainActivity.java
+++ b/demos/TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/MainActivity.java
@ -0,0 +1,400 @@
+package com.baidu.paddle.lite.demo.tts;
+
+import android.Manifest;
+import android.app.ProgressDialog;
+import android.content.Intent;
+import android.content.SharedPreferences;
+import android.content.pm.PackageManager;
+import android.media.MediaPlayer;
+import android.os.Bundle;
+import android.os.Environment;
+import android.os.Handler;
+import android.os.HandlerThread;
+import android.os.Message;
+import android.preference.PreferenceManager;
+import android.support.annotation.NonNull;
+import android.support.v4.app.ActivityCompat;
+import android.support.v4.content.ContextCompat;
+import android.support.v7.app.AppCompatActivity;
+import android.text.method.ScrollingMovementMethod;
+import android.util.Log;
+import android.view.Menu;
+import android.view.MenuInflater;
+import android.view.MenuItem;
+import android.view.View;
+import android.widget.AdapterView;
+import android.widget.ArrayAdapter;
+import android.widget.Button;
+import android.widget.Spinner;
+import android.widget.TextView;
+import android.widget.Toast;
+
+import java.io.File;
+import java.io.IOException;
+
+public class MainActivity extends AppCompatActivity implements View.OnClickListener, MediaPlayer.OnPreparedListener, MediaPlayer.OnErrorListener, AdapterView.OnItemSelectedListener {
+    public static final int REQUEST_LOAD_MODEL = 0;
+    public static final int REQUEST_RUN_MODEL = 1;
+    public static final int RESPONSE_LOAD_MODEL_SUCCESSED = 0;
+    public static final int RESPONSE_LOAD_MODEL_FAILED = 1;
+    public static final int RESPONSE_RUN_MODEL_SUCCESSED = 2;
+    public static final int RESPONSE_RUN_MODEL_FAILED = 3;
+    public MediaPlayer mediaPlayer = new MediaPlayer();
+    private static final String TAG = Predictor.class.getSimpleName();
+    protected ProgressDialog pbLoadModel = null;
+    protected ProgressDialog pbRunModel = null;
+    // Receive messages from worker thread
+    protected Handler receiver = null;
+    // Send command to worker thread
+    protected Handler sender = null;
+    // Worker thread to load&run model
+    protected HandlerThread worker = null;
+    // UI components of image classification
+    protected TextView tvInputSetting;
+    protected TextView tvInferenceTime;
+    protected Button btn_play;
+    protected Button btn_pause;
+    protected Button btn_stop;
+    // Model settings of image classification
+    protected String modelPath = "";
+    protected int cpuThreadNum = 1;
+    protected String cpuPowerMode = "";
+    protected Predictor predictor = new Predictor();
+    int sampleRate = 24000;
+    private final String wavName = "tts_output.wav";
+    private final String wavFile = Environment.getExternalStorageDirectory() + File.separator + wavName;
+    private final String AMmodelName = "fastspeech2_csmsc_arm.nb";
+    private final String VOCmodelName = "mb_melgan_csmsc_arm.nb";
+    private float[] phones = {};
+    private final float[][] sentencesToChoose = {
+            // 009901 昨日，这名“伤者”与医生全部被警方依法刑事拘留。
+            {261, 231, 175, 116, 179, 262, 44, 154, 126, 177, 19, 262, 42, 241, 72, 177, 56, 174, 245, 37, 186, 37, 49, 151, 127, 69, 19, 179, 72, 69, 4, 260, 126, 177, 116, 151, 239, 153, 141},
+            // 009902 钱伟长想到上海来办学校是经过深思熟虑的。
+            {174, 83, 213, 39, 20, 260, 89, 40, 30, 177, 22, 71, 9, 153, 8, 37, 17, 260, 251, 260, 99, 179, 177, 116, 151, 125, 70, 233, 177, 51, 176, 108, 177, 184, 153, 242, 40, 45},
+            // 009903 她见我一进门就骂，吃饭时也骂，骂得我抬不起头。
+            {182, 2, 151, 85, 232, 73, 151, 123, 154, 52, 151, 143, 154, 5, 179, 39, 113, 69, 17, 177, 114, 105, 154, 5, 179, 154, 5, 40, 45, 232, 182, 8, 37, 186, 174, 74, 182, 168},
+            // 009904 李述德在离开之前，只说了一句“柱驼杀父亲了”。
+            {153, 74, 177, 186, 40, 42, 261, 10, 153, 73, 152, 7, 262, 113, 174, 83, 179, 262, 115, 177, 230, 153, 45, 73, 151, 242, 180, 262, 186, 182, 231, 177, 2, 69, 186, 174, 124, 153, 45},
+            // 009905 这种车票和保险单捆绑出售属于重复性购买。
+            {262, 44, 262, 163, 39, 41, 173, 99, 71, 42, 37, 28, 260, 84, 40, 14, 179, 152, 220, 37, 21, 39, 183, 177, 170, 179, 177, 185, 240, 39, 162, 69, 186, 260, 128, 70, 170, 154, 9},
+            // 009906 戴佩妮的男友西米露接唱情歌，让她非常开心。
+            {40, 10, 173, 49, 155, 72, 40, 45, 155, 15, 142, 260, 72, 154, 74, 153, 186, 179, 151, 103, 39, 22, 174, 126, 70, 41, 179, 175, 22, 182, 2, 69, 46, 39, 20, 152, 7, 260, 120},
+            // 009907 观大势、谋大局、出大策始终是该院的办院方针。
+            {70, 199, 40, 5, 177, 116, 154, 168, 40, 5, 151, 240, 179, 39, 183, 40, 5, 38, 44, 179, 177, 115, 262, 161, 177, 116, 70, 7, 247, 40, 45, 37, 17, 247, 69, 19, 262, 51},
+            // 009908 他们骑着摩托回家，正好为农忙时的父母帮忙。
+            {182, 2, 154, 55, 174, 73, 262, 45, 154, 157, 182, 230, 71, 212, 151, 77, 180, 262, 59, 71, 29, 214, 155, 162, 154, 20, 177, 114, 40, 45, 69, 186, 154, 185, 37, 19, 154, 20},
+            // 009909 但是因为还没到退休年龄，只能掰着指头捱日子。
+            {40, 17, 177, 116, 120, 214, 71, 8, 154, 47, 40, 30, 182, 214, 260, 140, 155, 83, 153, 126, 180, 262, 115, 155, 57, 37, 7, 262, 45, 262, 115, 182, 171, 8, 175, 116, 261, 112},
+            // 009910 这几天雨水不断，人们恨不得待在家里不出门。
+            {262, 44, 151, 74, 182, 82, 240, 177, 213, 37, 184, 40, 202, 180, 175, 52, 154, 55, 71, 54, 37, 186, 40, 42, 40, 7, 261, 10, 151, 77, 153, 74, 37, 186, 39, 183, 154, 52}
+
+    };
+
+    @Override
+    public void onClick(View v) {
+        switch (v.getId()) {
+            case R.id.btn_play:
+                if (!mediaPlayer.isPlaying()) {
+                    mediaPlayer.start();
+                }
+                break;
+            case R.id.btn_pause:
+                if (mediaPlayer.isPlaying()) {
+                    mediaPlayer.pause();
+                }
+                break;
+            case R.id.btn_stop:
+                if (mediaPlayer.isPlaying()) {
+                    mediaPlayer.reset();
+                    initMediaPlayer();
+                }
+                break;
+            default:
+                break;
+        }
+    }
+
+    private void initMediaPlayer() {
+        try {
+            File file = new File(wavFile);
+            // 指定音频文件的路径
+            mediaPlayer.setDataSource(file.getPath());
+            // 让 MediaPlayer 进入到准备状态
+            mediaPlayer.prepare();
+            // 该方法使得进入应用时就播放音频
+            // mediaPlayer.setOnPreparedListener(this);
+            // prepare async to not block main thread
+            mediaPlayer.prepareAsync();
+        } catch (Exception e) {
+            e.printStackTrace();
+        }
+    }
+
+    @Override
+    public void onPrepared(MediaPlayer player) {
+        player.start();
+    }
+
+    @Override
+    public boolean onError(MediaPlayer mp, int what, int extra) {
+        // The MediaPlayer has moved to the Error state, must be reset!
+        mediaPlayer.reset();
+        initMediaPlayer();
+        return true;
+    }
+
+    @Override
+    protected void onCreate(Bundle savedInstanceState) {
+        requestAllPermissions();
+        super.onCreate(savedInstanceState);
+        setContentView(R.layout.activity_main);
+
+        // 初始化控件
+        Spinner spinner = findViewById(R.id.spinner1);
+        // 建立数据源
+        String[] sentences = getResources().getStringArray(R.array.text);
+        // 建立 Adapter 并且绑定数据源
+        ArrayAdapter<String> adapter = new ArrayAdapter<String>(this, android.R.layout.simple_spinner_dropdown_item, sentences);
+        // 第一个参数表示在哪个 Activity 上显示，第二个参数是系统下拉框的样式，第三个参数是数组。
+        spinner.setAdapter(adapter);//绑定Adapter到控件
+        spinner.setOnItemSelectedListener(this);
+
+        btn_play = findViewById(R.id.btn_play);
+        btn_pause = findViewById(R.id.btn_pause);
+        btn_stop = findViewById(R.id.btn_stop);
+
+        btn_play.setOnClickListener(this);
+        btn_pause.setOnClickListener(this);
+        btn_stop.setOnClickListener(this);
+
+        btn_play.setVisibility(View.INVISIBLE);
+        btn_pause.setVisibility(View.INVISIBLE);
+        btn_stop.setVisibility(View.INVISIBLE);
+
+
+        // Clear all setting items to avoid app crashing due to the incorrect settings
+        SharedPreferences sharedPreferences = PreferenceManager.getDefaultSharedPreferences(this);
+        SharedPreferences.Editor editor = sharedPreferences.edit();
+        editor.clear();
+        editor.commit();
+
+        // Prepare the worker thread for mode loading and inference
+        receiver = new Handler() {
+            @Override
+            public void handleMessage(Message msg) {
+                switch (msg.what) {
+                    case RESPONSE_LOAD_MODEL_SUCCESSED:
+                        pbLoadModel.dismiss();
+                        onLoadModelSuccessed();
+                        break;
+                    case RESPONSE_LOAD_MODEL_FAILED:
+                        pbLoadModel.dismiss();
+                        Toast.makeText(MainActivity.this, "Load model failed!", Toast.LENGTH_SHORT).show();
+                        onLoadModelFailed();
+                        break;
+                    case RESPONSE_RUN_MODEL_SUCCESSED:
+                        pbRunModel.dismiss();
+                        onRunModelSuccessed();
+                        break;
+                    case RESPONSE_RUN_MODEL_FAILED:
+                        pbRunModel.dismiss();
+                        Toast.makeText(MainActivity.this, "Run model failed!", Toast.LENGTH_SHORT).show();
+                        onRunModelFailed();
+                        break;
+                    default:
+                        break;
+                }
+            }
+        };
+
+        worker = new HandlerThread("Predictor Worker");
+        worker.start();
+        sender = new Handler(worker.getLooper()) {
+            public void handleMessage(Message msg) {
+                switch (msg.what) {
+                    case REQUEST_LOAD_MODEL:
+                        // Load model and reload test image
+                        if (onLoadModel()) {
+                            receiver.sendEmptyMessage(RESPONSE_LOAD_MODEL_SUCCESSED);
+                        } else {
+                            receiver.sendEmptyMessage(RESPONSE_LOAD_MODEL_FAILED);
+                        }
+                        break;
+                    case REQUEST_RUN_MODEL:
+                        // Run model if model is loaded
+                        if (onRunModel()) {
+                            receiver.sendEmptyMessage(RESPONSE_RUN_MODEL_SUCCESSED);
+                        } else {
+                            receiver.sendEmptyMessage(RESPONSE_RUN_MODEL_FAILED);
+                        }
+                        break;
+                    default:
+                        break;
+                }
+            }
+        };
+
+        // Setup the UI components
+        tvInputSetting = findViewById(R.id.tv_input_setting);
+        tvInferenceTime = findViewById(R.id.tv_inference_time);
+        tvInputSetting.setMovementMethod(ScrollingMovementMethod.getInstance());
+    }
+
+    @Override
+    protected void onResume() {
+        super.onResume();
+        boolean settingsChanged = false;
+        SharedPreferences sharedPreferences = PreferenceManager.getDefaultSharedPreferences(this);
+        String model_path = sharedPreferences.getString(getString(R.string.MODEL_PATH_KEY),
+                getString(R.string.MODEL_PATH_DEFAULT));
+
+        settingsChanged |= !model_path.equalsIgnoreCase(modelPath);
+
+        int cpu_thread_num = Integer.parseInt(sharedPreferences.getString(getString(R.string.CPU_THREAD_NUM_KEY),
+                getString(R.string.CPU_THREAD_NUM_DEFAULT)));
+        settingsChanged |= cpu_thread_num != cpuThreadNum;
+        String cpu_power_mode =
+                sharedPreferences.getString(getString(R.string.CPU_POWER_MODE_KEY),
+                        getString(R.string.CPU_POWER_MODE_DEFAULT));
+        settingsChanged |= !cpu_power_mode.equalsIgnoreCase(cpuPowerMode);
+
+        if (settingsChanged) {
+            modelPath = model_path;
+            cpuThreadNum = cpu_thread_num;
+            cpuPowerMode = cpu_power_mode;
+            // Update UI
+            tvInputSetting.setText("Model: " + modelPath.substring(modelPath.lastIndexOf("/") + 1) + "\n" + "CPU" +
+                    " Thread Num: " + cpuThreadNum + "\n" + "CPU Power Mode: " + cpuPowerMode + "\n");
+            tvInputSetting.scrollTo(0, 0);
+            // Reload model if configure has been changed
+            loadModel();
+        }
+    }
+
+    public void loadModel() {
+        pbLoadModel = ProgressDialog.show(this, "", "Loading model...", false, false);
+        sender.sendEmptyMessage(REQUEST_LOAD_MODEL);
+    }
+
+    public void runModel() {
+        pbRunModel = ProgressDialog.show(this, "", "Running model...", false, false);
+        sender.sendEmptyMessage(REQUEST_RUN_MODEL);
+    }
+
+    public boolean onLoadModel() {
+        return predictor.init(MainActivity.this, modelPath, AMmodelName, VOCmodelName, cpuThreadNum,
+                cpuPowerMode);
+    }
+
+    public boolean onRunModel() {
+        return predictor.isLoaded() && predictor.runModel(phones);
+    }
+
+    public boolean onLoadModelSuccessed() {
+        // Load test image from path and run model
+//        runModel();
+        return true;
+    }
+
+    public void onLoadModelFailed() {
+    }
+
+    public void onRunModelSuccessed() {
+        // Obtain results and update UI
+        btn_play.setVisibility(View.VISIBLE);
+        btn_pause.setVisibility(View.VISIBLE);
+        btn_stop.setVisibility(View.VISIBLE);
+        tvInferenceTime.setText("Inference done！\nInference time: " + predictor.inferenceTime() + " ms"
+                + "\nRTF: " + predictor.inferenceTime() * sampleRate / (predictor.wav.length * 1000) + "\nAudio saved in " + wavFile);
+        try {
+            Utils.rawToWave(wavFile, predictor.wav, sampleRate);
+        } catch (IOException e) {
+            e.printStackTrace();
+        }
+        if (ContextCompat.checkSelfPermission(MainActivity.this,
+                Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
+            ActivityCompat.requestPermissions(MainActivity.this, new String[]{Manifest.permission.WRITE_EXTERNAL_STORAGE}, 1);
+        } else {
+            // 初始化 MediaPlayer
+            initMediaPlayer();
+        }
+    }
+
+    public void onRunModelFailed() {
+    }
+
+
+    public void onSettingsClicked() {
+        startActivity(new Intent(MainActivity.this, SettingsActivity.class));
+    }
+
+    @Override
+    public boolean onCreateOptionsMenu(Menu menu) {
+        MenuInflater inflater = getMenuInflater();
+        inflater.inflate(R.menu.menu_action_options, menu);
+        return true;
+    }
+
+    @Override
+    public boolean onOptionsItemSelected(MenuItem item) {
+        switch (item.getItemId()) {
+            case android.R.id.home:
+                finish();
+                break;
+            case R.id.settings:
+                onSettingsClicked();
+        }
+        return super.onOptionsItemSelected(item);
+    }
+
+    @Override
+    public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions,
+                                           @NonNull int[] grantResults) {
+
+        super.onRequestPermissionsResult(requestCode, permissions, grantResults);
+        if (grantResults[0] != PackageManager.PERMISSION_GRANTED) {
+            Toast.makeText(this, "Permission Denied", Toast.LENGTH_SHORT).show();
+        }
+    }
+
+
+    @Override
+    protected void onDestroy() {
+        if (predictor != null) {
+            predictor.releaseModel();
+        }
+        worker.quit();
+        super.onDestroy();
+        if (mediaPlayer != null) {
+            mediaPlayer.stop();
+            mediaPlayer.release();
+        }
+    }
+
+    private boolean requestAllPermissions() {
+        if (ContextCompat.checkSelfPermission(this, Manifest.permission.WRITE_EXTERNAL_STORAGE)
+                != PackageManager.PERMISSION_GRANTED || ContextCompat.checkSelfPermission(this,
+                Manifest.permission.CAMERA)
+                != PackageManager.PERMISSION_GRANTED) {
+            ActivityCompat.requestPermissions(this, new String[]{Manifest.permission.WRITE_EXTERNAL_STORAGE},
+                    0);
+            return false;
+        }
+        return true;
+    }
+
+
+    @Override
+    public void onItemSelected(AdapterView<?> parent, View view, int position, long id) {
+        if (position > 0) {
+            phones = sentencesToChoose[position - 1];
+            runModel();
+        }
+
+    }
+
+    @Override
+    public void onNothingSelected(AdapterView<?> parent) {
+
+    }
+}
--- a/demos/TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/Predictor.java
+++ b/demos/TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/Predictor.java
@ -0,0 +1,149 @@
+package com.baidu.paddle.lite.demo.tts;
+
+import android.content.Context;
+import android.util.Log;
+
+import com.baidu.paddle.lite.MobileConfig;
+import com.baidu.paddle.lite.PaddlePredictor;
+import com.baidu.paddle.lite.PowerMode;
+import com.baidu.paddle.lite.Tensor;
+
+import java.io.File;
+import java.util.Date;
+
+
+public class Predictor {
+    private static final String TAG = Predictor.class.getSimpleName();
+    public boolean isLoaded = false;
+    public int cpuThreadNum = 1;
+    public String cpuPowerMode = "LITE_POWER_HIGH";
+    public String modelPath = "";
+    protected PaddlePredictor AMPredictor = null;
+    protected PaddlePredictor VOCPredictor = null;
+    protected float inferenceTime = 0;
+    protected float[] wav;
+
+    public boolean init(Context appCtx, String modelPath, String AMmodelName, String VOCmodelName, int cpuThreadNum, String cpuPowerMode) {
+        // Release model if exists
+        releaseModel();
+
+        AMPredictor = loadModel(appCtx, modelPath, AMmodelName, cpuThreadNum, cpuPowerMode);
+        if (AMPredictor == null) {
+            return false;
+        }
+        VOCPredictor = loadModel(appCtx, modelPath, VOCmodelName, cpuThreadNum, cpuPowerMode);
+        if (VOCPredictor == null) {
+            return false;
+        }
+        isLoaded = true;
+        return true;
+    }
+
+    protected PaddlePredictor loadModel(Context appCtx, String modelPath, String modelName, int cpuThreadNum, String cpuPowerMode) {
+        // Load model
+        if (modelPath.isEmpty()) {
+            return null;
+        }
+        String realPath = modelPath;
+        if (modelPath.charAt(0) != '/') {
+            // Read model files from custom path if the first character of mode path is '/'
+            // otherwise copy model to cache from assets
+            realPath = appCtx.getCacheDir() + "/" + modelPath;
+            // push model to mobile
+            Utils.copyDirectoryFromAssets(appCtx, modelPath, realPath);
+        }
+        if (realPath.isEmpty()) {
+            return null;
+        }
+        MobileConfig config = new MobileConfig();
+        config.setModelFromFile(realPath + File.separator + modelName);
+        Log.e(TAG, "File:" + realPath + File.separator + modelName);
+        config.setThreads(cpuThreadNum);
+        if (cpuPowerMode.equalsIgnoreCase("LITE_POWER_HIGH")) {
+            config.setPowerMode(PowerMode.LITE_POWER_HIGH);
+        } else if (cpuPowerMode.equalsIgnoreCase("LITE_POWER_LOW")) {
+            config.setPowerMode(PowerMode.LITE_POWER_LOW);
+        } else if (cpuPowerMode.equalsIgnoreCase("LITE_POWER_FULL")) {
+            config.setPowerMode(PowerMode.LITE_POWER_FULL);
+        } else if (cpuPowerMode.equalsIgnoreCase("LITE_POWER_NO_BIND")) {
+            config.setPowerMode(PowerMode.LITE_POWER_NO_BIND);
+        } else if (cpuPowerMode.equalsIgnoreCase("LITE_POWER_RAND_HIGH")) {
+            config.setPowerMode(PowerMode.LITE_POWER_RAND_HIGH);
+        } else if (cpuPowerMode.equalsIgnoreCase("LITE_POWER_RAND_LOW")) {
+            config.setPowerMode(PowerMode.LITE_POWER_RAND_LOW);
+        } else {
+            Log.e(TAG, "Unknown cpu power mode!");
+            return null;
+        }
+        return PaddlePredictor.createPaddlePredictor(config);
+    }
+
+    public void releaseModel() {
+        AMPredictor = null;
+        VOCPredictor = null;
+        isLoaded = false;
+        cpuThreadNum = 1;
+        cpuPowerMode = "LITE_POWER_HIGH";
+        modelPath = "";
+    }
+
+    public boolean runModel(float[] phones) {
+        if (!isLoaded()) {
+            return false;
+        }
+        Date start = new Date();
+        Tensor am_output_handle = getAMOutput(phones, AMPredictor);
+        wav = getVOCOutput(am_output_handle, VOCPredictor);
+        Date end = new Date();
+        inferenceTime = (end.getTime() - start.getTime());
+        return true;
+    }
+
+    public Tensor getAMOutput(float[] phones, PaddlePredictor am_predictor) {
+        Tensor phones_handle = am_predictor.getInput(0);
+        long[] dims = {phones.length};
+        phones_handle.resize(dims);
+        phones_handle.setData(phones);
+        am_predictor.run();
+        Tensor am_output_handle = am_predictor.getOutput(0);
+        // [?, 80]
+        // long outputShape[] = am_output_handle.shape();
+        float[] am_output_data = am_output_handle.getFloatData();
+        // [? x 80]
+        // long[] am_output_data_shape = {am_output_data.length};
+        // Log.e(TAG, Arrays.toString(am_output_data));
+        // 打印 mel 数组
+        // for (int i=0;i<outputShape[0];i++) {
+        //      Log.e(TAG, Arrays.toString(Arrays.copyOfRange(am_output_data,i*80,(i+1)*80)));
+        // }
+        // voc_predictor 需要知道输入的 shape，所以不能输出转成 float 之后的一维数组
+        return am_output_handle;
+    }
+
+    public float[] getVOCOutput(Tensor input, PaddlePredictor voc_predictor) {
+        Tensor mel_handle = voc_predictor.getInput(0);
+        // [?, 80]
+        long[] dims = input.shape();
+        mel_handle.resize(dims);
+        float[] am_output_data = input.getFloatData();
+        mel_handle.setData(am_output_data);
+        voc_predictor.run();
+        Tensor voc_output_handle = voc_predictor.getOutput(0);
+        // [? x 300, 1]
+        // long[] outputShape = voc_output_handle.shape();
+        float[] voc_output_data = voc_output_handle.getFloatData();
+        // long[] voc_output_data_shape = {voc_output_data.length};
+        return voc_output_data;
+    }
+
+
+    public boolean isLoaded() {
+        return AMPredictor != null && VOCPredictor != null && isLoaded;
+    }
+
+
+    public float inferenceTime() {
+        return inferenceTime;
+    }
+
+}
--- a/demos/TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/SettingsActivity.java
+++ b/demos/TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/SettingsActivity.java
@ -0,0 +1,111 @@
+package com.baidu.paddle.lite.demo.tts;
+
+import android.content.SharedPreferences;
+import android.os.Bundle;
+import android.preference.CheckBoxPreference;
+import android.preference.EditTextPreference;
+import android.preference.ListPreference;
+import android.support.v7.app.ActionBar;
+
+import java.util.ArrayList;
+import java.util.List;
+
+public class SettingsActivity extends AppCompatPreferenceActivity implements SharedPreferences.OnSharedPreferenceChangeListener {
+    ListPreference lpChoosePreInstalledModel = null;
+    CheckBoxPreference cbEnableCustomSettings = null;
+    EditTextPreference etModelPath = null;
+    ListPreference lpCPUThreadNum = null;
+    ListPreference lpCPUPowerMode = null;
+
+    List<String> preInstalledModelPaths = null;
+    List<String> preInstalledCPUThreadNums = null;
+    List<String> preInstalledCPUPowerModes = null;
+
+
+    @Override
+    public void onCreate(Bundle savedInstanceState) {
+        super.onCreate(savedInstanceState);
+        addPreferencesFromResource(R.xml.settings);
+        ActionBar supportActionBar = getSupportActionBar();
+        if (supportActionBar != null) {
+            supportActionBar.setDisplayHomeAsUpEnabled(true);
+        }
+
+        // Initialized pre-installed models
+        preInstalledModelPaths = new ArrayList<String>();
+        preInstalledCPUThreadNums = new ArrayList<String>();
+        preInstalledCPUPowerModes = new ArrayList<String>();
+        preInstalledModelPaths.add(getString(R.string.MODEL_PATH_DEFAULT));
+        preInstalledCPUThreadNums.add(getString(R.string.CPU_THREAD_NUM_DEFAULT));
+        preInstalledCPUPowerModes.add(getString(R.string.CPU_POWER_MODE_DEFAULT));
+
+
+        // Setup UI components
+        lpChoosePreInstalledModel = (ListPreference) findPreference(getString(R.string.CHOOSE_PRE_INSTALLED_MODEL_KEY));
+        String[] preInstalledModelNames = new String[preInstalledModelPaths.size()];
+        for (int i = 0; i < preInstalledModelPaths.size(); i++) {
+            preInstalledModelNames[i] = preInstalledModelPaths.get(i).substring(preInstalledModelPaths.get(i).lastIndexOf("/") + 1);
+        }
+        lpChoosePreInstalledModel.setEntries(preInstalledModelNames);
+        lpChoosePreInstalledModel.setEntryValues(preInstalledModelPaths.toArray(new String[preInstalledModelPaths.size()]));
+        lpCPUThreadNum = (ListPreference) findPreference(getString(R.string.CPU_THREAD_NUM_KEY));
+        lpCPUPowerMode = (ListPreference) findPreference(getString(R.string.CPU_POWER_MODE_KEY));
+        cbEnableCustomSettings = (CheckBoxPreference) findPreference(getString(R.string.ENABLE_CUSTOM_SETTINGS_KEY));
+        etModelPath = (EditTextPreference) findPreference(getString(R.string.MODEL_PATH_KEY));
+        etModelPath.setTitle("Model Path (SDCard: " + Utils.getSDCardDirectory() + ")");
+    }
+
+    private void reloadPreferenceAndUpdateUI() {
+        SharedPreferences sharedPreferences = getPreferenceScreen().getSharedPreferences();
+        boolean enableCustomSettings = sharedPreferences.getBoolean(getString(R.string.ENABLE_CUSTOM_SETTINGS_KEY), false);
+        String modelPath = sharedPreferences.getString(getString(R.string.CHOOSE_PRE_INSTALLED_MODEL_KEY), getString(R.string.MODEL_PATH_DEFAULT));
+        int modelIdx = lpChoosePreInstalledModel.findIndexOfValue(modelPath);
+        if (modelIdx >= 0 && modelIdx < preInstalledModelPaths.size()) {
+            if (!enableCustomSettings) {
+                SharedPreferences.Editor editor = sharedPreferences.edit();
+                editor.putString(getString(R.string.MODEL_PATH_KEY), preInstalledModelPaths.get(modelIdx));
+                editor.putString(getString(R.string.CPU_THREAD_NUM_KEY), preInstalledCPUThreadNums.get(modelIdx));
+                editor.putString(getString(R.string.CPU_POWER_MODE_KEY), preInstalledCPUPowerModes.get(modelIdx));
+                editor.commit();
+            }
+            lpChoosePreInstalledModel.setSummary(modelPath);
+        }
+        cbEnableCustomSettings.setChecked(enableCustomSettings);
+        etModelPath.setEnabled(enableCustomSettings);
+        lpCPUThreadNum.setEnabled(enableCustomSettings);
+        lpCPUPowerMode.setEnabled(enableCustomSettings);
+        modelPath = sharedPreferences.getString(getString(R.string.MODEL_PATH_KEY), getString(R.string.MODEL_PATH_DEFAULT));
+        String cpuThreadNum = sharedPreferences.getString(getString(R.string.CPU_THREAD_NUM_KEY), getString(R.string.CPU_THREAD_NUM_DEFAULT));
+        String cpuPowerMode = sharedPreferences.getString(getString(R.string.CPU_POWER_MODE_KEY), getString(R.string.CPU_POWER_MODE_DEFAULT));
+
+        etModelPath.setSummary(modelPath);
+        etModelPath.setText(modelPath);
+        lpCPUThreadNum.setValue(cpuThreadNum);
+        lpCPUThreadNum.setSummary(cpuThreadNum);
+        lpCPUPowerMode.setValue(cpuPowerMode);
+        lpCPUPowerMode.setSummary(cpuPowerMode);
+    }
+
+    @Override
+    protected void onResume() {
+        super.onResume();
+        getPreferenceScreen().getSharedPreferences().registerOnSharedPreferenceChangeListener(this);
+        reloadPreferenceAndUpdateUI();
+    }
+
+    @Override
+    protected void onPause() {
+        super.onPause();
+        getPreferenceScreen().getSharedPreferences().unregisterOnSharedPreferenceChangeListener(this);
+    }
+
+    @Override
+    public void onSharedPreferenceChanged(SharedPreferences sharedPreferences, String key) {
+        if (key.equals(getString(R.string.CHOOSE_PRE_INSTALLED_MODEL_KEY))) {
+            SharedPreferences.Editor editor = sharedPreferences.edit();
+            editor.putBoolean(getString(R.string.ENABLE_CUSTOM_SETTINGS_KEY), false);
+            editor.commit();
+        }
+        reloadPreferenceAndUpdateUI();
+    }
+}
--- a/demos/TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/Utils.java
+++ b/demos/TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/Utils.java
@ -0,0 +1,155 @@
+package com.baidu.paddle.lite.demo.tts;
+
+import static java.lang.Math.abs;
+
+import android.content.Context;
+import android.os.Environment;
+
+import java.io.BufferedInputStream;
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+
+public class Utils {
+    public static void copyFileFromAssets(Context appCtx, String srcPath, String dstPath) {
+        if (srcPath.isEmpty() || dstPath.isEmpty()) {
+            return;
+        }
+        InputStream is = null;
+        OutputStream os = null;
+        try {
+            is = new BufferedInputStream(appCtx.getAssets().open(srcPath));
+            os = new BufferedOutputStream(new FileOutputStream(new File(dstPath)));
+            byte[] buffer = new byte[1024];
+            int length = 0;
+            while ((length = is.read(buffer)) != -1) {
+                os.write(buffer, 0, length);
+            }
+        } catch (FileNotFoundException e) {
+            e.printStackTrace();
+        } catch (IOException e) {
+            e.printStackTrace();
+        } finally {
+            try {
+                os.close();
+                is.close();
+            } catch (IOException e) {
+                e.printStackTrace();
+            }
+        }
+    }
+
+    public static void copyDirectoryFromAssets(Context appCtx, String srcDir, String dstDir) {
+        if (srcDir.isEmpty() || dstDir.isEmpty()) {
+            return;
+        }
+        try {
+            if (!new File(dstDir).exists()) {
+                new File(dstDir).mkdirs();
+            }
+            for (String fileName : appCtx.getAssets().list(srcDir)) {
+                String srcSubPath = srcDir + File.separator + fileName;
+                String dstSubPath = dstDir + File.separator + fileName;
+                if (new File(srcSubPath).isDirectory()) {
+                    copyDirectoryFromAssets(appCtx, srcSubPath, dstSubPath);
+                } else {
+                    copyFileFromAssets(appCtx, srcSubPath, dstSubPath);
+                }
+            }
+        } catch (Exception e) {
+            e.printStackTrace();
+        }
+    }
+
+
+    public static String getSDCardDirectory() {
+        return Environment.getExternalStorageDirectory().getAbsolutePath();
+    }
+
+    public static void rawToWave(String file, float[] data, int samplerate) throws IOException {
+        // creating the empty wav file.
+        File waveFile = new File(file);
+        waveFile.createNewFile();
+        //following block is converting raw to wav.
+        DataOutputStream output = null;
+        try {
+            output = new DataOutputStream(new FileOutputStream(waveFile));
+            // WAVE header
+            // chunk id
+            writeString(output, "RIFF");
+            // chunk size
+            writeInt(output, 36 + data.length * 2);
+            // format
+            writeString(output, "WAVE");
+            // subchunk 1 id
+            writeString(output, "fmt ");
+            // subchunk 1 size
+            writeInt(output, 16);
+            // audio format (1 = PCM)
+            writeShort(output, (short) 1);
+            // number of channels
+            writeShort(output, (short) 1);
+            // sample rate
+            writeInt(output, samplerate);
+            // byte rate
+            writeInt(output, samplerate * 2);
+            // block align
+            writeShort(output, (short) 2);
+            // bits per sample
+            writeShort(output, (short) 16);
+            // subchunk 2 id
+            writeString(output, "data");
+            // subchunk 2 size
+            writeInt(output, data.length * 2);
+            short[] short_data = FloatArray2ShortArray(data);
+            for (int i = 0; i < short_data.length; i++) {
+                writeShort(output, short_data[i]);
+            }
+        } finally {
+            if (output != null) {
+                output.close();
+            }
+        }
+    }
+
+    private static void writeInt(final DataOutputStream output, final int value) throws IOException {
+        output.write(value);
+        output.write(value >> 8);
+        output.write(value >> 16);
+        output.write(value >> 24);
+    }
+
+    private static void writeShort(final DataOutputStream output, final short value) throws IOException {
+        output.write(value);
+        output.write(value >> 8);
+    }
+
+    private static void writeString(final DataOutputStream output, final String value) throws IOException {
+        for (int i = 0; i < value.length(); i++) {
+            output.write(value.charAt(i));
+        }
+    }
+
+    public static short[] FloatArray2ShortArray(float[] values) {
+        float mmax = (float) 0.01;
+        short[] ret = new short[values.length];
+
+        for (int i = 0; i < values.length; i++) {
+            if (abs(values[i]) > mmax) {
+                mmax = abs(values[i]);
+            }
+        }
+
+        for (int i = 0; i < values.length; i++) {
+            values[i] = values[i] * (32767 / mmax);
+            ret[i] = (short) (values[i]);
+        }
+        return ret;
+    }
+
+}
--- a/demos/TTSAndroid/app/src/main/res/drawable/button_drawable.xml
+++ b/demos/TTSAndroid/app/src/main/res/drawable/button_drawable.xml
@ -0,0 +1,20 @@
+<?xml version="1.0" encoding="utf-8"?>
+<selector xmlns:android="http://schemas.android.com/apk/res/android">
+    <item android:state_pressed="false"><!--没点击按钮的时候-->
+        <shape android:shape="rectangle"><!--按钮形状-->
+            <solid android:color="#008577" /><!--按钮背景填充色-->
+            <corners android:radius="10dp" />
+            <stroke android:width="1dp" android:color="#009688" /><!--按钮边框-->
+        </shape>
+    </item>
+
+    <item android:state_pressed="true">
+        <shape android:shape="rectangle"><!--按钮形状-->
+            <solid android:color="#C3009688" /><!--按钮背景填充色-->
+            <corners android:radius="10dp" />
+            <stroke android:width="1dp" android:color="#009688" /><!--按钮边框-->
+        </shape>
+    </item>
+
+</selector>
+
--- a/demos/TTSAndroid/app/src/main/res/drawable/logo.jpg
+++ b/demos/TTSAndroid/app/src/main/res/drawable/logo.jpg
--- a/demos/TTSAndroid/app/src/main/res/drawable/paddlespeech_logo.png
+++ b/demos/TTSAndroid/app/src/main/res/drawable/paddlespeech_logo.png
--- a/demos/TTSAndroid/app/src/main/res/layout/activity_main.xml
+++ b/demos/TTSAndroid/app/src/main/res/layout/activity_main.xml
@ -0,0 +1,112 @@
+<?xml version="1.0" encoding="utf-8"?>
+<android.support.constraint.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
+    xmlns:tools="http://schemas.android.com/tools"
+    android:layout_width="match_parent"
+    android:layout_height="match_parent"
+    tools:context=".MainActivity">
+
+    <RelativeLayout
+        android:layout_width="match_parent"
+        android:layout_height="match_parent">
+
+        <ImageView
+            android:id="@+id/logo"
+            android:layout_width="wrap_content"
+            android:layout_height="wrap_content"
+            android:layout_marginTop="20dp"
+            android:src="@drawable/paddlespeech_logo" />
+
+        <LinearLayout
+            android:id="@+id/v_input_info"
+            android:layout_width="fill_parent"
+            android:layout_height="wrap_content"
+            android:layout_below="@+id/logo"
+            android:layout_alignParentTop="true"
+            android:layout_marginTop="120dp"
+            android:orientation="vertical">
+
+            <TextView
+                android:id="@+id/tv_input_setting"
+                android:layout_width="wrap_content"
+                android:layout_height="wrap_content"
+                android:layout_marginLeft="12dp"
+                android:layout_marginTop="10dp"
+                android:layout_marginRight="12dp"
+                android:layout_marginBottom="5dp"
+                android:lineSpacingExtra="4dp"
+                android:maxLines="6"
+                android:scrollbars="vertical"
+                android:singleLine="false"
+                android:text=""
+                android:textColor="#3C3C3C" />
+
+            <Spinner
+                android:id="@+id/spinner1"
+                android:layout_width="wrap_content"
+                android:layout_height="wrap_content"
+                android:dropDownSelector="#63D81B60"
+                android:spinnerMode="dropdown" />
+
+            <TextView
+                android:id="@+id/tv_inference_time"
+                android:layout_width="wrap_content"
+                android:layout_height="wrap_content"
+                android:layout_below="@+id/spinner1"
+                android:layout_centerHorizontal="true"
+                android:layout_centerVertical="true"
+                android:layout_marginLeft="12dp"
+                android:layout_marginTop="50dp"
+                android:layout_marginRight="12dp"
+                android:layout_marginBottom="5dp"
+                android:gravity="start"
+                android:lineSpacingExtra="4dp"
+                android:maxLines="6"
+                android:textColor="#3C3C3C" />
+
+            <LinearLayout
+                android:id="@+id/btns"
+                android:layout_width="match_parent"
+                android:layout_height="match_parent"
+                android:layout_below="@+id/tv_inference_time"
+                android:layout_marginLeft="10dp"
+                android:layout_marginTop="30dp">
+
+                <Button
+                    android:id="@+id/btn_play"
+                    android:layout_width="60dp"
+                    android:layout_height="40dp"
+                    android:background="@drawable/button_drawable"
+                    android:text="Play"
+                    android:textAllCaps="false"
+                    android:textColor="#ffffff" />
+
+                <Button
+                    android:id="@+id/btn_pause"
+                    android:layout_width="60dp"
+                    android:layout_height="40dp"
+                    android:layout_marginLeft="3dp"
+                    android:background="@drawable/button_drawable"
+                    android:text="Pause"
+                    android:textAllCaps="false"
+                    android:textColor="#ffffff" />
+
+                <Button
+                    android:id="@+id/btn_stop"
+                    android:layout_width="60dp"
+                    android:layout_height="40dp"
+                    android:layout_marginLeft="3dp"
+                    android:background="@drawable/button_drawable"
+                    android:text="Stop"
+                    android:textAllCaps="false"
+                    android:textColor="#ffffff" />
+
+            </LinearLayout>
+
+
+        </LinearLayout>
+
+
+    </RelativeLayout>
+
+
+</android.support.constraint.ConstraintLayout>
--- a/demos/TTSAndroid/app/src/main/res/menu/menu_action_options.xml
+++ b/demos/TTSAndroid/app/src/main/res/menu/menu_action_options.xml
@ -0,0 +1,9 @@
+<menu xmlns:android="http://schemas.android.com/apk/res/android"
+    xmlns:app="http://schemas.android.com/apk/res-auto">
+    <group>
+        <item
+            android:id="@+id/settings"
+            android:title="Settings..."
+            app:showAsAction="withText" />
+    </group>
+</menu>
--- a/demos/TTSAndroid/app/src/main/res/values/arrays.xml
+++ b/demos/TTSAndroid/app/src/main/res/values/arrays.xml
@ -0,0 +1,44 @@
+<?xml version="1.0" encoding="utf-8"?>
+<resources>
+    <string-array name="cpu_thread_num_entries">
+        <item>1 threads</item>
+        <item>2 threads</item>
+        <item>4 threads</item>
+        <item>8 threads</item>
+    </string-array>
+    <string-array name="cpu_thread_num_values">
+        <item>1</item>
+        <item>2</item>
+        <item>4</item>
+        <item>8</item>
+    </string-array>
+    <string-array name="cpu_power_mode_entries">
+        <item>HIGH(only big cores)</item>
+        <item>LOW(only LITTLE cores)</item>
+        <item>FULL(all cores)</item>
+        <item>NO_BIND(depends on system)</item>
+        <item>RAND_HIGH</item>
+        <item>RAND_LOW</item>
+    </string-array>
+    <string-array name="cpu_power_mode_values">
+        <item>LITE_POWER_HIGH</item>
+        <item>LITE_POWER_LOW</item>
+        <item>LITE_POWER_FULL</item>
+        <item>LITE_POWER_NO_BIND</item>
+        <item>LITE_POWER_RAND_HIGH</item>
+        <item>LITE_POWER_RAND_LOW</item>
+    </string-array>
+    <string-array name="text">
+        <item>Please select a sentence to be synthesized</item>
+        <item>昨日，这名“伤者”与医生全部被警方依法刑事拘留。</item>
+        <item>钱伟长想到上海来办学校是经过深思熟虑的。</item>
+        <item>她见我一进门就骂，吃饭时也骂，骂得我抬不起头。</item>
+        <item>李述德在离开之前，只说了一句“柱驼杀父亲了”。</item>
+        <item>这种车票和保险单捆绑出售属于重复性购买。</item>
+        <item>戴佩妮的男友西米露接唱情歌，让她非常开心。</item>
+        <item>观大势、谋大局、出大策始终是该院的办院方针。</item>
+        <item>他们骑着摩托回家，正好为农忙时的父母帮忙。</item>
+        <item>但是因为还没到退休年龄，只能掰着指头捱日子。</item>
+        <item>这几天雨水不断，人们恨不得待在家里不出门。</item>
+    </string-array>
+</resources>
--- a/demos/TTSAndroid/app/src/main/res/values/colors.xml
+++ b/demos/TTSAndroid/app/src/main/res/values/colors.xml
@ -0,0 +1,6 @@
+<?xml version="1.0" encoding="utf-8"?>
+<resources>
+    <color name="colorPrimary">#008577</color>
+    <color name="colorPrimaryDark">#00574B</color>
+    <color name="colorAccent">#D81B60</color>
+</resources>
--- a/demos/TTSAndroid/app/src/main/res/values/strings.xml
+++ b/demos/TTSAndroid/app/src/main/res/values/strings.xml
@ -0,0 +1,12 @@
+<resources>
+    <string name="app_name">TTS</string>
+    <string name="CHOOSE_PRE_INSTALLED_MODEL_KEY">CHOOSE_PRE_INSTALLED_MODEL_KEY</string>
+    <string name="ENABLE_CUSTOM_SETTINGS_KEY">ENABLE_CUSTOM_SETTINGS_KEY</string>
+    <string name="MODEL_PATH_KEY">MODEL_PATH_KEY</string>
+    <string name="CPU_THREAD_NUM_KEY">CPU_THREAD_NUM_KEY</string>
+    <string name="CPU_POWER_MODE_KEY">CPU_POWER_MODE_KEY</string>
+    <string name="MODEL_PATH_DEFAULT">models/cpu</string>
+    <string name="CPU_THREAD_NUM_DEFAULT">1</string>
+    <string name="CPU_POWER_MODE_DEFAULT">LITE_POWER_HIGH</string>
+</resources>
+
--- a/demos/TTSAndroid/app/src/main/res/values/styles.xml
+++ b/demos/TTSAndroid/app/src/main/res/values/styles.xml
@ -0,0 +1,16 @@
+<resources>
+
+    <!-- Base application theme. -->
+    <style name="AppTheme" parent="Theme.AppCompat.Light.DarkActionBar">
+        <!-- Customize your theme here. -->
+        <item name="colorPrimary">@color/colorPrimary</item>
+        <item name="colorPrimaryDark">@color/colorPrimaryDark</item>
+        <item name="colorAccent">@color/colorAccent</item>
+        <item name="actionOverflowMenuStyle">@style/OverflowMenuStyle</item>
+    </style>
+
+    <style name="OverflowMenuStyle" parent="Widget.AppCompat.Light.PopupMenu.Overflow">
+        <item name="overlapAnchor">false</item>
+    </style>
+
+</resources>
--- a/demos/TTSAndroid/app/src/main/res/xml/settings.xml
+++ b/demos/TTSAndroid/app/src/main/res/xml/settings.xml
@ -0,0 +1,39 @@
+<?xml version="1.0" encoding="utf-8"?>
+<PreferenceScreen xmlns:android="http://schemas.android.com/apk/res/android">
+    <PreferenceCategory android:title="Model Settings">
+        <ListPreference
+            android:defaultValue="@string/MODEL_PATH_DEFAULT"
+            android:key="@string/CHOOSE_PRE_INSTALLED_MODEL_KEY"
+            android:negativeButtonText="@null"
+            android:positiveButtonText="@null"
+            android:title="Choose pre-installed models" />
+        <CheckBoxPreference
+            android:defaultValue="false"
+            android:key="@string/ENABLE_CUSTOM_SETTINGS_KEY"
+            android:summaryOff="Disable"
+            android:summaryOn="Enable"
+            android:title="Enable custom settings" />
+        <EditTextPreference
+            android:defaultValue="@string/MODEL_PATH_DEFAULT"
+            android:key="@string/MODEL_PATH_KEY"
+            android:title="Model Path" />
+    </PreferenceCategory>
+    <PreferenceCategory android:title="CPU Settings">
+        <ListPreference
+            android:defaultValue="@string/CPU_THREAD_NUM_DEFAULT"
+            android:entries="@array/cpu_thread_num_entries"
+            android:entryValues="@array/cpu_thread_num_values"
+            android:key="@string/CPU_THREAD_NUM_KEY"
+            android:negativeButtonText="@null"
+            android:positiveButtonText="@null"
+            android:title="CPU Thread Num" />
+        <ListPreference
+            android:defaultValue="@string/CPU_POWER_MODE_DEFAULT"
+            android:entries="@array/cpu_power_mode_entries"
+            android:entryValues="@array/cpu_power_mode_values"
+            android:key="@string/CPU_POWER_MODE_KEY"
+            android:negativeButtonText="@null"
+            android:positiveButtonText="@null"
+            android:title="CPU Power Mode" />
+    </PreferenceCategory>
+</PreferenceScreen>
--- a/demos/TTSAndroid/app/src/test/java/com/baidu/paddle/lite/demo/tts/ExampleUnitTest.java
+++ b/demos/TTSAndroid/app/src/test/java/com/baidu/paddle/lite/demo/tts/ExampleUnitTest.java
@ -0,0 +1,17 @@
+package com.baidu.paddle.lite.demo.tts;
+
+import static org.junit.Assert.assertEquals;
+
+import org.junit.Test;
+
+/**
+ * Example local unit test, which will execute on the development machine (host).
+ *
+ * @see <a href="http://d.android.com/tools/testing">Testing documentation</a>
+ */
+public class ExampleUnitTest {
+    @Test
+    public void addition_isCorrect() {
+        assertEquals(4, 2 + 2);
+    }
+}
--- a/demos/TTSAndroid/build.gradle
+++ b/demos/TTSAndroid/build.gradle
@ -0,0 +1,27 @@
+// Top-level build file where you can add configuration options common to all sub-projects/modules.
+
+buildscript {
+    repositories {
+        google()
+        jcenter()
+
+    }
+    dependencies {
+        classpath 'com.android.tools.build:gradle:4.1.0'
+
+        // NOTE: Do not place your application dependencies here; they belong
+        // in the individual module build.gradle files
+    }
+}
+
+allprojects {
+    repositories {
+        google()
+        jcenter()
+
+    }
+}
+
+task clean(type: Delete) {
+    delete rootProject.buildDir
+}
--- a/demos/TTSAndroid/gradle.properties
+++ b/demos/TTSAndroid/gradle.properties
@ -0,0 +1,15 @@
+# Project-wide Gradle settings.
+# IDE (e.g. Android Studio) users:
+# Gradle settings configured through the IDE *will override*
+# any settings specified in this file.
+# For more details on how to configure your build environment visit
+# http://www.gradle.org/docs/current/userguide/build_environment.html
+# Specifies the JVM arguments used for the daemon process.
+# The setting is particularly useful for tweaking memory settings.
+org.gradle.jvmargs=-Xmx1536m
+# When configured, Gradle will run in incubating parallel mode.
+# This option should only be used with decoupled projects. More details, visit
+# http://www.gradle.org/docs/current/userguide/multi_project_builds.html#sec:decoupled_projects
+# org.gradle.parallel=true
+
+
--- a/demos/TTSAndroid/gradle/wrapper/gradle-wrapper.jar
+++ b/demos/TTSAndroid/gradle/wrapper/gradle-wrapper.jar
--- a/demos/TTSAndroid/gradle/wrapper/gradle-wrapper.properties
+++ b/demos/TTSAndroid/gradle/wrapper/gradle-wrapper.properties
@ -0,0 +1,6 @@
+#Wed Jun 16 14:31:28 CST 2021
+distributionBase=GRADLE_USER_HOME
+distributionPath=wrapper/dists
+zipStoreBase=GRADLE_USER_HOME
+zipStorePath=wrapper/dists
+distributionUrl=https\://services.gradle.org/distributions/gradle-7.0-all.zip
--- a/demos/TTSAndroid/gradlew
+++ b/demos/TTSAndroid/gradlew
@ -0,0 +1,172 @@
+#!/usr/bin/env sh
+
+##############################################################################
+##
+##  Gradle start up script for UN*X
+##
+##############################################################################
+
+# Attempt to set APP_HOME
+# Resolve links: $0 may be a link
+PRG="$0"
+# Need this for relative symlinks.
+while [ -h "$PRG" ] ; do
+    ls=`ls -ld "$PRG"`
+    link=`expr "$ls" : '.*-> \(.*\)$'`
+    if expr "$link" : '/.*' > /dev/null; then
+        PRG="$link"
+    else
+        PRG=`dirname "$PRG"`"/$link"
+    fi
+done
+SAVED="`pwd`"
+cd "`dirname \"$PRG\"`/" >/dev/null
+APP_HOME="`pwd -P`"
+cd "$SAVED" >/dev/null
+
+APP_NAME="Gradle"
+APP_BASE_NAME=`basename "$0"`
+
+# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
+DEFAULT_JVM_OPTS=""
+
+# Use the maximum available, or set MAX_FD != -1 to use that value.
+MAX_FD="maximum"
+
+warn () {
+    echo "$*"
+}
+
+die () {
+    echo
+    echo "$*"
+    echo
+    exit 1
+}
+
+# OS specific support (must be 'true' or 'false').
+cygwin=false
+msys=false
+darwin=false
+nonstop=false
+case "`uname`" in
+  CYGWIN* )
+    cygwin=true
+    ;;
+  Darwin* )
+    darwin=true
+    ;;
+  MINGW* )
+    msys=true
+    ;;
+  NONSTOP* )
+    nonstop=true
+    ;;
+esac
+
+CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar
+
+# Determine the Java command to use to start the JVM.
+if [ -n "$JAVA_HOME" ] ; then
+    if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
+        # IBM's JDK on AIX uses strange locations for the executables
+        JAVACMD="$JAVA_HOME/jre/sh/java"
+    else
+        JAVACMD="$JAVA_HOME/bin/java"
+    fi
+    if [ ! -x "$JAVACMD" ] ; then
+        die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME
+
+Please set the JAVA_HOME variable in your environment to match the
+location of your Java installation."
+    fi
+else
+    JAVACMD="java"
+    which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
+
+Please set the JAVA_HOME variable in your environment to match the
+location of your Java installation."
+fi
+
+# Increase the maximum file descriptors if we can.
+if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then
+    MAX_FD_LIMIT=`ulimit -H -n`
+    if [ $? -eq 0 ] ; then
+        if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then
+            MAX_FD="$MAX_FD_LIMIT"
+        fi
+        ulimit -n $MAX_FD
+        if [ $? -ne 0 ] ; then
+            warn "Could not set maximum file descriptor limit: $MAX_FD"
+        fi
+    else
+        warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT"
+    fi
+fi
+
+# For Darwin, add options to specify how the application appears in the dock
+if $darwin; then
+    GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\""
+fi
+
+# For Cygwin, switch paths to Windows format before running java
+if $cygwin ; then
+    APP_HOME=`cygpath --path --mixed "$APP_HOME"`
+    CLASSPATH=`cygpath --path --mixed "$CLASSPATH"`
+    JAVACMD=`cygpath --unix "$JAVACMD"`
+
+    # We build the pattern for arguments to be converted via cygpath
+    ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null`
+    SEP=""
+    for dir in $ROOTDIRSRAW ; do
+        ROOTDIRS="$ROOTDIRS$SEP$dir"
+        SEP="|"
+    done
+    OURCYGPATTERN="(^($ROOTDIRS))"
+    # Add a user-defined pattern to the cygpath arguments
+    if [ "$GRADLE_CYGPATTERN" != "" ] ; then
+        OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)"
+    fi
+    # Now convert the arguments - kludge to limit ourselves to /bin/sh
+    i=0
+    for arg in "$@" ; do
+        CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -`
+        CHECK2=`echo "$arg"|egrep -c "^-"`                                 ### Determine if an option
+
+        if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then                    ### Added a condition
+            eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"`
+        else
+            eval `echo args$i`="\"$arg\""
+        fi
+        i=$((i+1))
+    done
+    case $i in
+        (0) set -- ;;
+        (1) set -- "$args0" ;;
+        (2) set -- "$args0" "$args1" ;;
+        (3) set -- "$args0" "$args1" "$args2" ;;
+        (4) set -- "$args0" "$args1" "$args2" "$args3" ;;
+        (5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;;
+        (6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;;
+        (7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;;
+        (8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;;
+        (9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;;
+    esac
+fi
+
+# Escape application args
+save () {
+    for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
+    echo " "
+}
+APP_ARGS=$(save "$@")
+
+# Collect all arguments for the java command, following the shell quoting and substitution rules
+eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS"
+
+# by default we should be in the correct project dir, but when run from Finder on Mac, the cwd is wrong
+if [ "$(uname)" = "Darwin" ] && [ "$HOME" = "$PWD" ]; then
+  cd "$(dirname "$0")"
+fi
+
+exec "$JAVACMD" "$@"
--- a/demos/TTSAndroid/gradlew.bat
+++ b/demos/TTSAndroid/gradlew.bat
@ -0,0 +1,84 @@
+@if "%DEBUG%" == "" @echo off
+@rem ##########################################################################
+@rem
+@rem  Gradle startup script for Windows
+@rem
+@rem ##########################################################################
+
+@rem Set local scope for the variables with windows NT shell
+if "%OS%"=="Windows_NT" setlocal
+
+set DIRNAME=%~dp0
+if "%DIRNAME%" == "" set DIRNAME=.
+set APP_BASE_NAME=%~n0
+set APP_HOME=%DIRNAME%
+
+@rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
+set DEFAULT_JVM_OPTS=
+
+@rem Find java.exe
+if defined JAVA_HOME goto findJavaFromJavaHome
+
+set JAVA_EXE=java.exe
+%JAVA_EXE% -version >NUL 2>&1
+if "%ERRORLEVEL%" == "0" goto init
+
+echo.
+echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
+echo.
+echo Please set the JAVA_HOME variable in your environment to match the
+echo location of your Java installation.
+
+goto fail
+
+:findJavaFromJavaHome
+set JAVA_HOME=%JAVA_HOME:"=%
+set JAVA_EXE=%JAVA_HOME%/bin/java.exe
+
+if exist "%JAVA_EXE%" goto init
+
+echo.
+echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME%
+echo.
+echo Please set the JAVA_HOME variable in your environment to match the
+echo location of your Java installation.
+
+goto fail
+
+:init
+@rem Get command-line arguments, handling Windows variants
+
+if not "%OS%" == "Windows_NT" goto win9xME_args
+
+:win9xME_args
+@rem Slurp the command line arguments.
+set CMD_LINE_ARGS=
+set _SKIP=2
+
+:win9xME_args_slurp
+if "x%~1" == "x" goto execute
+
+set CMD_LINE_ARGS=%*
+
+:execute
+@rem Setup the command line
+
+set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar
+
+@rem Execute Gradle
+"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %CMD_LINE_ARGS%
+
+:end
+@rem End local scope for the variables with windows NT shell
+if "%ERRORLEVEL%"=="0" goto mainEnd
+
+:fail
+rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of
+rem the _cmd.exe /c_ return code!
+if  not "" == "%GRADLE_EXIT_CONSOLE%" exit 1
+exit /b 1
+
+:mainEnd
+if "%OS%"=="Windows_NT" endlocal
+
+:omega
--- a/demos/TTSAndroid/settings.gradle
+++ b/demos/TTSAndroid/settings.gradle
@ -0,0 +1 @@
+include ':app'
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@ -19,11 +19,12 @@ numpydoc
 onnxruntime==1.10.0
 opencc
 paddlenlp
-paddlepaddle>=2.2.2
+# use paddlepaddle == 2.3.* according to: https://github.com/PaddlePaddle/Paddle/issues/48243
+paddlepaddle>=2.2.2,<2.4.0
 paddlespeech_ctcdecoders
 paddlespeech_feat
 pandas
-pathos == 0.2.8
+pathos==0.2.8
 pattern_singleton
 Pillow>=9.0.0
 praatio==5.0.0
--- a/docs/source/released_model.md
+++ b/docs/source/released_model.md
@ -23,10 +23,17 @@ Model | Pre-Train Method | Pre-Train Data | Finetune Data | Size | Descriptions
 :-------------:| :------------:| :-----: | -----: | :-----: |:-----:| :-----:  | :-----:  | :-----: | 
 [Wav2vec2-large-960h-lv60-self Model](https://paddlespeech.bj.bcebos.com/wav2vec/wav2vec2-large-960h-lv60-self.pdparams) | wav2vec2 | Librispeech and LV-60k Dataset (5.3w h) | - | 1.18 GB |Pre-trained Wav2vec2.0 Model | - | - | - | 
 [Wav2vec2ASR-large-960h-librispeech Model](https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr3/wav2vec2ASR-large-960h-librispeech_ckpt_1.3.1.model.tar.gz) | wav2vec2 | Librispeech and LV-60k Dataset (5.3w h) | Librispeech (960 h) | 718 MB |Encoder: Wav2vec2.0, Decoder: CTC, Decoding method: Greedy search | - | 0.0189 | [Wav2vecASR Librispeech ASR3](../../examples/librispeech/asr3) |
+[Wav2vec2-large-wenetspeech-self Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr3/wav2vec2-large-wenetspeech-self_ckpt_1.3.0.model.tar.gz) | wav2vec2 | Wenetspeech Dataset (1w h) | - | 714 MB |Pre-trained Wav2vec2.0 Model | - | - | - | 
+[Wav2vec2ASR-large-aishell1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr3/wav2vec2ASR-large-aishell1_ckpt_1.3.0.model.tar.gz) | wav2vec2 | Wenetspeech Dataset (1w h) | aishell1 (train set) | 1.17 GB |Encoder: Wav2vec2.0, Decoder: CTC, Decoding method: Greedy search | 0.0453 | - | - |
+
+### Whisper Model
+Demo Link | Training Data | Size | Descriptions | CER | Model 
+:-----------: | :-----:| :-------: | :-----: | :-----: |:---------:|
+[Whisper](../../demos/whisper) | 680kh from internet | large: 5.8G,</br>medium: 2.9G,</br>small: 923M,</br>base: 277M,</br>tiny: 145M | Encoder:Transformer,</br> Decoder:Transformer, </br>Decoding method: </br>Greedy search | 2.7 </br>(large, Librispeech) | [whisper-large](https://paddlespeech.bj.bcebos.com/whisper/whisper_model_20221122/whisper-large-model.tar.gz) </br>[whisper-medium](https://paddlespeech.bj.bcebos.com/whisper/whisper_model_20221122/whisper-medium-model.tar.gz) </br>[whisper-medium-English-only](https://paddlespeech.bj.bcebos.com/whisper/whisper_model_20221122/whisper-medium-en-model.tar.gz) </br>[whisper-small](https://paddlespeech.bj.bcebos.com/whisper/whisper_model_20221122/whisper-small-model.tar.gz) </br>[whisper-small-English-only](https://paddlespeech.bj.bcebos.com/whisper/whisper_model_20221122/whisper-small-en-model.tar.gz) </br>[whisper-base](https://paddlespeech.bj.bcebos.com/whisper/whisper_model_20221122/whisper-base-model.tar.gz) </br>[whisper-base-English-only](https://paddlespeech.bj.bcebos.com/whisper/whisper_model_20221122/whisper-base-en-model.tar.gz) </br>[whisper-tiny](https://paddlespeech.bj.bcebos.com/whisper/whisper_model_20221122/whisper-tiny-model.tar.gz) </br>[whisper-tiny-English-only](https://paddlespeech.bj.bcebos.com/whisper/whisper_model_20221122/whisper-tiny-en-model.tar.gz)

 ### Language Model based on NGram
-Language Model | Training Data | Token-based | Size | Descriptions
-:------------:| :------------:|:------------: | :------------: | :------------:
+|Language Model | Training Data | Token-based | Size | Descriptions|
+| :------------: | :------------: | :------------: | :------------: | :------------: |
 [English LM](https://deepspeech.bj.bcebos.com/en_lm/common_crawl_00.prune01111.trie.klm) |  [CommonCrawl(en.00)](http://web-language-models.s3-website-us-east-1.amazonaws.com/ngrams/en/deduped/en.00.deduped.xz) | Word-based | 8.3 GB | Pruned with 0 1 1 1 1; <br/> About 1.85 billion n-grams; <br/> 'trie'  binary with '-a 22 -q 8 -b 8'
 [Mandarin LM Small](https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm) | Baidu Internal Corpus | Char-based | 2.8 GB | Pruned with 0 1 2 4 4; <br/> About 0.13 billion n-grams; <br/> 'probing' binary with default settings
 [Mandarin LM Large](https://deepspeech.bj.bcebos.com/zh_lm/zhidao_giga.klm) | Baidu Internal Corpus | Char-based | 70.4 GB | No Pruning; <br/> About 3.7 billion n-grams; <br/> 'probing' binary with default settings
--- a/examples/csmsc/tts3_rhy/README.md
+++ b/examples/csmsc/tts3_rhy/README.md
@ -0,0 +1,74 @@
+# This example mainly follows the FastSpeech2 with CSMSC
+This example contains code used to train a rhythm version of [Fastspeech2](https://arxiv.org/abs/2006.04558) model with [Chinese Standard Mandarin Speech Copus](https://www.data-baker.com/open_source.html).
+
+## Dataset
+### Download and Extract
+Download CSMSC from it's [Official Website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
+
+### Get MFA Result and Extract
+We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2.
+You can directly download the rhythm version of MFA result from here [baker_alignment_tone.zip](https://paddlespeech.bj.bcebos.com/Rhy_e2e/baker_alignment_tone.zip), or train your MFA model reference to [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) of our repo.
+Remember in our repo, you should add `--rhy-with-duration` flag to obtain the rhythm information.
+
+## Get Started
+Assume the path to the dataset is `~/datasets/BZNSYP`.
+Assume the path to the MFA result of CSMSC is `./baker_alignment_tone`.
+Run the command below to
+1. **source path**.
+2. preprocess the dataset.
+3. train the model.
+4. synthesize wavs.
+    - synthesize waveform from `metadata.jsonl`.
+    - synthesize waveform from a text file.
+5. inference using the static model.
+```bash
+./run.sh
+```
+You can choose a range of stages you want to run, or set `stage` equal to `stop-stage` to use only one stage, for example, running the following command will only preprocess the dataset.
+```bash
+./run.sh --stage 0 --stop-stage 0
+```
+### Data Preprocessing
+```bash
+./local/preprocess.sh ${conf_path}
+```
+When it is done. A `dump` folder is created in the current directory. The structure of the dump folder is listed below.
+
+```text
+dump
+├── dev
+│   ├── norm
+│   └── raw
+├── phone_id_map.txt
+├── speaker_id_map.txt
+├── test
+│   ├── norm
+│   └── raw
+└── train
+    ├── energy_stats.npy
+    ├── norm
+    ├── pitch_stats.npy
+    ├── raw
+    └── speech_stats.npy
+```
+The dataset is split into 3 parts, namely `train`, `dev`, and` test`, each of which contains a `norm` and `raw` subfolder. The raw folder contains speech、pitch and energy features of each utterance, while the norm folder contains normalized ones. The statistics used to normalize features are computed from the training set, which is located in `dump/train/*_stats.npy`.
+
+Also, there is a `metadata.jsonl` in each subfolder. It is a table-like file that contains phones, text_lengths, speech_lengths, durations, the path of speech features, the path of pitch features, the path of energy features, speaker, and the id of each utterance.
+
+# For more details, You can refer to [FastSpeech2 with CSMSC](../tts3)
+
+## Pretrained Model
+Pretrained FastSpeech2 model for end-to-end rhythm version:
+- [fastspeech2_rhy_csmsc_ckpt_1.3.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_rhy_csmsc_ckpt_1.3.0.zip)
+
+This FastSpeech2 checkpoint contains files listed below.
+```text
+fastspeech2_rhy_csmsc_ckpt_1.3.0
+├── default.yaml             # default config used to train fastspeech2
+├── phone_id_map.txt         # phone vocabulary file when training fastspeech2
+├── snapshot_iter_153000.pdz # model parameters and optimizer states
+├── durations.txt            # the intermediate output of preprocess.sh
+├── energy_stats.npy
+├── pitch_stats.npy
+└── speech_stats.npy         # statistics used to normalize spectrogram when training fastspeech2
+```
--- a/examples/csmsc/tts3_rhy/conf/default.yaml
+++ b/examples/csmsc/tts3_rhy/conf/default.yaml
@ -0,0 +1 @@
+../../tts3/conf/default.yaml
--- a/examples/csmsc/tts3_rhy/local/preprocess.sh
+++ b/examples/csmsc/tts3_rhy/local/preprocess.sh
@ -0,0 +1 @@
+../../tts3/local/preprocess.sh
--- a/examples/csmsc/tts3_rhy/local/synthesize.sh
+++ b/examples/csmsc/tts3_rhy/local/synthesize.sh
@ -0,0 +1 @@
+../../tts3/local/synthesize.sh
--- a/examples/csmsc/tts3_rhy/local/synthesize_e2e.sh
+++ b/examples/csmsc/tts3_rhy/local/synthesize_e2e.sh
@ -0,0 +1,119 @@
+#!/bin/bash
+
+config_path=$1
+train_output_path=$2
+ckpt_name=$3
+
+stage=0
+stop_stage=0
+
+# pwgan
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+    FLAGS_allocator_strategy=naive_best_fit \
+    FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+    python3 ${BIN_DIR}/../synthesize_e2e.py \
+        --am=fastspeech2_csmsc \
+        --am_config=${config_path} \
+        --am_ckpt=${train_output_path}/checkpoints/${ckpt_name} \
+        --am_stat=dump/train/speech_stats.npy \
+        --voc=pwgan_csmsc \
+        --voc_config=pwg_baker_ckpt_0.4/pwg_default.yaml \
+        --voc_ckpt=pwg_baker_ckpt_0.4/pwg_snapshot_iter_400000.pdz \
+        --voc_stat=pwg_baker_ckpt_0.4/pwg_stats.npy \
+        --lang=zh \
+        --text=${BIN_DIR}/../sentences.txt \
+        --output_dir=${train_output_path}/test_e2e \
+        --phones_dict=dump/phone_id_map.txt \
+        --inference_dir=${train_output_path}/inference \
+        --use_rhy=True
+fi
+
+# for more GAN Vocoders
+# multi band melgan
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+    FLAGS_allocator_strategy=naive_best_fit \
+    FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+    python3 ${BIN_DIR}/../synthesize_e2e.py \
+        --am=fastspeech2_csmsc \
+        --am_config=${config_path} \
+        --am_ckpt=${train_output_path}/checkpoints/${ckpt_name} \
+        --am_stat=dump/train/speech_stats.npy \
+        --voc=mb_melgan_csmsc \
+        --voc_config=mb_melgan_csmsc_ckpt_0.1.1/default.yaml \
+        --voc_ckpt=mb_melgan_csmsc_ckpt_0.1.1/snapshot_iter_1000000.pdz\
+        --voc_stat=mb_melgan_csmsc_ckpt_0.1.1/feats_stats.npy \
+        --lang=zh \
+        --text=${BIN_DIR}/../sentences.txt \
+        --output_dir=${train_output_path}/test_e2e \
+        --phones_dict=dump/phone_id_map.txt \
+        --inference_dir=${train_output_path}/inference \
+        --use_rhy=True
+fi
+
+# the pretrained models haven't release now
+# style melgan
+# style melgan's Dygraph to Static Graph is not ready now
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
+    FLAGS_allocator_strategy=naive_best_fit \
+    FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+    python3 ${BIN_DIR}/../synthesize_e2e.py \
+        --am=fastspeech2_csmsc \
+        --am_config=${config_path} \
+        --am_ckpt=${train_output_path}/checkpoints/${ckpt_name} \
+        --am_stat=dump/train/speech_stats.npy \
+        --voc=style_melgan_csmsc \
+        --voc_config=style_melgan_csmsc_ckpt_0.1.1/default.yaml \
+        --voc_ckpt=style_melgan_csmsc_ckpt_0.1.1/snapshot_iter_1500000.pdz \
+        --voc_stat=style_melgan_csmsc_ckpt_0.1.1/feats_stats.npy \
+        --lang=zh \
+        --text=${BIN_DIR}/../sentences.txt \
+        --output_dir=${train_output_path}/test_e2e \
+        --phones_dict=dump/phone_id_map.txt \
+        --use_rhy=True
+        # --inference_dir=${train_output_path}/inference
+fi
+
+# hifigan
+if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
+    echo "in hifigan syn_e2e"
+    FLAGS_allocator_strategy=naive_best_fit \
+    FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+    python3 ${BIN_DIR}/../synthesize_e2e.py \
+        --am=fastspeech2_csmsc \
+        --am_config=${config_path} \
+        --am_ckpt=${train_output_path}/checkpoints/${ckpt_name} \
+        --am_stat=dump/train/speech_stats.npy \
+        --voc=hifigan_csmsc \
+        --voc_config=hifigan_csmsc_ckpt_0.1.1/default.yaml \
+        --voc_ckpt=hifigan_csmsc_ckpt_0.1.1/snapshot_iter_2500000.pdz \
+        --voc_stat=hifigan_csmsc_ckpt_0.1.1/feats_stats.npy \
+        --lang=zh \
+        --text=${BIN_DIR}/../sentences.txt \
+        --output_dir=${train_output_path}/test_e2e \
+        --phones_dict=dump/phone_id_map.txt \
+        --inference_dir=${train_output_path}/inference \
+        --use_rhy=True
+fi
+
+
+# wavernn
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+    echo "in wavernn syn_e2e"
+    FLAGS_allocator_strategy=naive_best_fit \
+    FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+    python3 ${BIN_DIR}/../synthesize_e2e.py \
+        --am=fastspeech2_csmsc \
+        --am_config=${config_path} \
+        --am_ckpt=${train_output_path}/checkpoints/${ckpt_name} \
+        --am_stat=dump/train/speech_stats.npy \
+        --voc=wavernn_csmsc \
+        --voc_config=wavernn_csmsc_ckpt_0.2.0/default.yaml \
+        --voc_ckpt=wavernn_csmsc_ckpt_0.2.0/snapshot_iter_400000.pdz \
+        --voc_stat=wavernn_csmsc_ckpt_0.2.0/feats_stats.npy \
+        --lang=zh \
+        --text=${BIN_DIR}/../sentences.txt \
+        --output_dir=${train_output_path}/test_e2e \
+        --phones_dict=dump/phone_id_map.txt \
+        --inference_dir=${train_output_path}/inference \
+        --use_rhy=True
+fi
--- a/examples/csmsc/tts3_rhy/local/train.sh
+++ b/examples/csmsc/tts3_rhy/local/train.sh
@ -0,0 +1 @@
+../../tts3/local/train.sh
--- a/examples/csmsc/tts3_rhy/path.sh
+++ b/examples/csmsc/tts3_rhy/path.sh
@ -0,0 +1 @@
+../tts3/path.sh
--- a/examples/csmsc/tts3_rhy/run.sh
+++ b/examples/csmsc/tts3_rhy/run.sh
@ -0,0 +1,38 @@
+#!/bin/bash
+
+set -e
+source path.sh
+
+gpus=0,1
+stage=0
+stop_stage=100
+
+conf_path=conf/default.yaml
+train_output_path=exp/default
+ckpt_name=snapshot_iter_153.pdz
+
+# with the following command, you can choose the stage range you want to run
+# such as `./run.sh --stage 0 --stop-stage 0`
+# this can not be mixed use with `$1`, `$2` ...
+source ${MAIN_ROOT}/utils/parse_options.sh || exit 1
+
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+    # prepare data
+    ### please place the mfa result of rhythm here
+    ./local/preprocess.sh ${conf_path} || exit -1
+fi
+
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+    # train model, all `ckpt` under `train_output_path/checkpoints/` dir
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${train_output_path} || exit -1
+fi
+
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
+    # synthesize, vocoder is pwgan by default
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
+fi
+
+if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
+    # synthesize_e2e, vocoder is pwgan by default
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
+fi
--- a/paddlespeech/cli/ssl/infer.py
+++ b/paddlespeech/cli/ssl/infer.py
@ -25,6 +25,7 @@ import librosa
 import numpy as np
 import paddle
 import soundfile
+from paddlenlp.transformers import AutoTokenizer
 from yacs.config import CfgNode

 from ..executor import BaseExecutor
@ -50,7 +51,7 @@ class SSLExecutor(BaseExecutor):
        self.parser.add_argument(
            '--model',
            type=str,
-            default='wav2vec2ASR_librispeech',
+            default=None,
            choices=[
                tag[:tag.index('-')]
                for tag in self.task_resource.pretrained_models.keys()
@ -123,7 +124,7 @@ class SSLExecutor(BaseExecutor):
            help='Increase logger verbosity of current task.')

    def _init_from_path(self,
-                        model_type: str='wav2vec2ASR_librispeech',
+                        model_type: str=None,
                        task: str='asr',
                        lang: str='en',
                        sample_rate: int=16000,
@ -134,6 +135,18 @@ class SSLExecutor(BaseExecutor):
        Init model and other resources from a specific path.
        """
        logger.debug("start to init the model")
+
+        if model_type is None:
+            if lang == 'en':
+                model_type = 'wav2vec2ASR_librispeech'
+            elif lang == 'zh':
+                model_type = 'wav2vec2ASR_aishell1'
+            else:
+                logger.error(
+                    "invalid lang, please input --lang en or --lang zh")
+            logger.debug(
+                "Model type had not been specified, default {} was used.".
+                format(model_type))
        # default max_len: unit:second
        self.max_len = 50
        if hasattr(self, 'model'):
@ -167,9 +180,13 @@ class SSLExecutor(BaseExecutor):
        self.config.merge_from_file(self.cfg_path)
        if task == 'asr':
            with UpdateConfig(self.config):
-                self.text_feature = TextFeaturizer(
-                    unit_type=self.config.unit_type,
-                    vocab=self.config.vocab_filepath)
+                if lang == 'en':
+                    self.text_feature = TextFeaturizer(
+                        unit_type=self.config.unit_type,
+                        vocab=self.config.vocab_filepath)
+                elif lang == 'zh':
+                    self.text_feature = AutoTokenizer.from_pretrained(
+                        self.config.tokenizer)
                self.config.decode.decoding_method = decode_method
            model_name = model_type[:model_type.rindex(
                '_')]  # model_type: {model_name}_{dataset}
@ -253,7 +270,8 @@ class SSLExecutor(BaseExecutor):
                    audio,
                    text_feature=self.text_feature,
                    decoding_method=cfg.decoding_method,
-                    beam_size=cfg.beam_size)
+                    beam_size=cfg.beam_size,
+                    tokenizer=getattr(self.config, 'tokenizer', None))
                self._outputs["result"] = result_transcripts[0][0]
            except Exception as e:
                logger.exception(e)
@ -413,7 +431,7 @@ class SSLExecutor(BaseExecutor):
    @stats_wrapper
    def __call__(self,
                 audio_file: os.PathLike,
-                 model: str='wav2vec2ASR_librispeech',
+                 model: str=None,
                 task: str='asr',
                 lang: str='en',
                 sample_rate: int=16000,
--- a/paddlespeech/resource/pretrained_models.py
+++ b/paddlespeech/resource/pretrained_models.py
@ -70,6 +70,38 @@ ssl_dynamic_pretrained_models = {
            'exp/wav2vec2ASR/checkpoints/avg_1.pdparams',
        },
    },
+    "wav2vec2-zh-16k": {
+        '1.3': {
+            'url':
+            'https://paddlespeech.bj.bcebos.com/s2t/aishell/asr3/wav2vec2-large-wenetspeech-self_ckpt_1.3.0.model.tar.gz',
+            'md5':
+            '00ea4975c05d1bb58181205674052fe1',
+            'cfg_path':
+            'model.yaml',
+            'ckpt_path':
+            'chinese-wav2vec2-large',
+            'model':
+            'chinese-wav2vec2-large.pdparams',
+            'params':
+            'chinese-wav2vec2-large.pdparams',
+        },
+    },
+    "wav2vec2ASR_aishell1-zh-16k": {
+        '1.3': {
+            'url':
+            'https://paddlespeech.bj.bcebos.com/s2t/aishell/asr3/wav2vec2ASR-large-aishell1_ckpt_1.3.0.model.tar.gz',
+            'md5':
+            'ac8fa0a6345e6a7535f6fabb5e59e218',
+            'cfg_path':
+            'model.yaml',
+            'ckpt_path':
+            'exp/wav2vec2ASR/checkpoints/avg_1',
+            'model':
+            'exp/wav2vec2ASR/checkpoints/avg_1.pdparams',
+            'params':
+            'exp/wav2vec2ASR/checkpoints/avg_1.pdparams',
+        },
+    },
 }

 # ---------------------------------
@ -1658,3 +1690,16 @@ g2pw_onnx_models = {
        },
    },
 }
+
+# ---------------------------------
+# ------------- Rhy_frontend ---------------
+# ---------------------------------
+rhy_frontend_models = {
+    'rhy_e2e': {
+        '1.0': {
+            'url':
+            'https://paddlespeech.bj.bcebos.com/Rhy_e2e/rhy_frontend.zip',
+            'md5': '6624a77393de5925d5a84400b363d8ef',
+        },
+    },
+}
--- a/paddlespeech/s2t/models/wav2vec2/modules/modeling_wav2vec2.py
+++ b/paddlespeech/s2t/models/wav2vec2/modules/modeling_wav2vec2.py
@ -1173,10 +1173,6 @@ class Wav2Vec2ConfigPure():
        self.proj_codevector_dim = config.proj_codevector_dim
        self.diversity_loss_weight = config.diversity_loss_weight

-        # ctc loss
-        self.ctc_loss_reduction = config.ctc_loss_reduction
-        self.ctc_zero_infinity = config.ctc_zero_infinity
-
        # adapter
        self.add_adapter = config.add_adapter
        self.adapter_kernel_size = config.adapter_kernel_size
--- a/paddlespeech/s2t/models/wav2vec2/wav2vec2_ASR.py
+++ b/paddlespeech/s2t/models/wav2vec2/wav2vec2_ASR.py
@ -76,28 +76,66 @@ class Wav2vec2ASR(nn.Layer):
               feats: paddle.Tensor,
               text_feature: Dict[str, int],
               decoding_method: str,
-               beam_size: int):
+               beam_size: int,
+               tokenizer: str=None):
        batch_size = feats.shape[0]

        if decoding_method == 'ctc_prefix_beam_search' and batch_size > 1:
-            logger.error(
-                f'decoding mode {decoding_method} must be running with batch_size == 1'
+            raise ValueError(
+                f"decoding mode {decoding_method} must be running with batch_size == 1"
            )
-            logger.error(f"current batch_size is {batch_size}")
-            sys.exit(1)

        if decoding_method == 'ctc_greedy_search':
-            hyps = self.ctc_greedy_search(feats)
-            res = [text_feature.defeaturize(hyp) for hyp in hyps]
-            res_tokenids = [hyp for hyp in hyps]
+            if tokenizer is None:
+                hyps = self.ctc_greedy_search(feats)
+                res = [text_feature.defeaturize(hyp) for hyp in hyps]
+                res_tokenids = [hyp for hyp in hyps]
+            else:
+                hyps = self.ctc_greedy_search(feats)
+                res = []
+                res_tokenids = []
+                for sequence in hyps:
+                    # Decode token terms to words
+                    predicted_tokens = text_feature.convert_ids_to_tokens(
+                        sequence)
+                    tmp_res = []
+                    tmp_res_tokenids = []
+                    for c in predicted_tokens:
+                        if c == "[CLS]":
+                            continue
+                        elif c == "[SEP]" or c == "[PAD]":
+                            break
+                        else:
+                            tmp_res.append(c)
+                            tmp_res_tokenids.append(text_feature.vocab[c])
+                    res.append(''.join(tmp_res))
+                    res_tokenids.append(tmp_res_tokenids)
        # ctc_prefix_beam_search and attention_rescoring only return one
        # result in List[int], change it to List[List[int]] for compatible
        # with other batch decoding mode
        elif decoding_method == 'ctc_prefix_beam_search':
            assert feats.shape[0] == 1
-            hyp = self.ctc_prefix_beam_search(feats, beam_size)
-            res = [text_feature.defeaturize(hyp)]
-            res_tokenids = [hyp]
+            if tokenizer is None:
+                hyp = self.ctc_prefix_beam_search(feats, beam_size)
+                res = [text_feature.defeaturize(hyp)]
+                res_tokenids = [hyp]
+            else:
+                hyp = self.ctc_prefix_beam_search(feats, beam_size)
+                res = []
+                res_tokenids = []
+                predicted_tokens = text_feature.convert_ids_to_tokens(hyp)
+                tmp_res = []
+                tmp_res_tokenids = []
+                for c in predicted_tokens:
+                    if c == "[CLS]":
+                        continue
+                    elif c == "[SEP]" or c == "[PAD]":
+                        break
+                    else:
+                        tmp_res.append(c)
+                        tmp_res_tokenids.append(text_feature.vocab[c])
+                res.append(''.join(tmp_res))
+                res_tokenids.append(tmp_res_tokenids)
        else:
            raise ValueError(
                f"wav2vec2 not support decoding method: {decoding_method}")
--- a/paddlespeech/t2s/exps/lite_predict.py
+++ b/paddlespeech/t2s/exps/lite_predict.py
@ -17,10 +17,10 @@ from pathlib import Path
 import soundfile as sf
 from timer import timer

+from paddlespeech.t2s.exps.lite_syn_utils import get_lite_am_output
+from paddlespeech.t2s.exps.lite_syn_utils import get_lite_predictor
+from paddlespeech.t2s.exps.lite_syn_utils import get_lite_voc_output
 from paddlespeech.t2s.exps.syn_utils import get_frontend
-from paddlespeech.t2s.exps.syn_utils import get_lite_am_output
-from paddlespeech.t2s.exps.syn_utils import get_lite_predictor
-from paddlespeech.t2s.exps.syn_utils import get_lite_voc_output
 from paddlespeech.t2s.exps.syn_utils import get_sentences


--- a/paddlespeech/t2s/exps/lite_predict_streaming.py
+++ b/paddlespeech/t2s/exps/lite_predict_streaming.py
@ -18,13 +18,13 @@ import numpy as np
 import soundfile as sf
 from timer import timer

+from paddlespeech.t2s.exps.lite_syn_utils import get_lite_am_sublayer_output
+from paddlespeech.t2s.exps.lite_syn_utils import get_lite_predictor
+from paddlespeech.t2s.exps.lite_syn_utils import get_lite_streaming_am_output
+from paddlespeech.t2s.exps.lite_syn_utils import get_lite_voc_output
 from paddlespeech.t2s.exps.syn_utils import denorm
 from paddlespeech.t2s.exps.syn_utils import get_chunks
 from paddlespeech.t2s.exps.syn_utils import get_frontend
-from paddlespeech.t2s.exps.syn_utils import get_lite_am_sublayer_output
-from paddlespeech.t2s.exps.syn_utils import get_lite_predictor
-from paddlespeech.t2s.exps.syn_utils import get_lite_streaming_am_output
-from paddlespeech.t2s.exps.syn_utils import get_lite_voc_output
 from paddlespeech.t2s.exps.syn_utils import get_sentences
 from paddlespeech.t2s.exps.syn_utils import run_frontend
 from paddlespeech.t2s.utils import str2bool
--- a/paddlespeech/t2s/exps/lite_syn_utils.py
+++ b/paddlespeech/t2s/exps/lite_syn_utils.py
@ -0,0 +1,111 @@
+import os
+from pathlib import Path
+from typing import Optional
+
+import numpy as np
+from paddlelite.lite import create_paddle_predictor
+from paddlelite.lite import MobileConfig
+
+from .syn_utils import run_frontend
+
+
+# Paddle-Lite
+def get_lite_predictor(model_dir: Optional[os.PathLike]=None,
+                       model_file: Optional[os.PathLike]=None,
+                       cpu_threads: int=1):
+    config = MobileConfig()
+    config.set_model_from_file(str(Path(model_dir) / model_file))
+    predictor = create_paddle_predictor(config)
+    return predictor
+
+
+def get_lite_am_output(
+        input: str,
+        am_predictor,
+        am: str,
+        frontend: object,
+        lang: str='zh',
+        merge_sentences: bool=True,
+        speaker_dict: Optional[os.PathLike]=None,
+        spk_id: int=0, ):
+    am_name = am[:am.rindex('_')]
+    am_dataset = am[am.rindex('_') + 1:]
+    get_spk_id = False
+    get_tone_ids = False
+    if am_name == 'speedyspeech':
+        get_tone_ids = True
+    if am_dataset in {"aishell3", "vctk", "mix"} and speaker_dict:
+        get_spk_id = True
+        spk_id = np.array([spk_id])
+
+    frontend_dict = run_frontend(
+        frontend=frontend,
+        text=input,
+        merge_sentences=merge_sentences,
+        get_tone_ids=get_tone_ids,
+        lang=lang)
+
+    if get_tone_ids:
+        tone_ids = frontend_dict['tone_ids']
+        tones = tone_ids[0].numpy()
+        tones_handle = am_predictor.get_input(1)
+        tones_handle.from_numpy(tones)
+
+    if get_spk_id:
+        spk_id_handle = am_predictor.get_input(1)
+        spk_id_handle.from_numpy(spk_id)
+    phone_ids = frontend_dict['phone_ids']
+    phones = phone_ids[0].numpy()
+    phones_handle = am_predictor.get_input(0)
+    phones_handle.from_numpy(phones)
+    am_predictor.run()
+    am_output_handle = am_predictor.get_output(0)
+    am_output_data = am_output_handle.numpy()
+    return am_output_data
+
+
+def get_lite_voc_output(voc_predictor, input):
+    mel_handle = voc_predictor.get_input(0)
+    mel_handle.from_numpy(input)
+    voc_predictor.run()
+    voc_output_handle = voc_predictor.get_output(0)
+    wav = voc_output_handle.numpy()
+    return wav
+
+
+def get_lite_am_sublayer_output(am_sublayer_predictor, input):
+    input_handle = am_sublayer_predictor.get_input(0)
+    input_handle.from_numpy(input)
+
+    am_sublayer_predictor.run()
+    am_sublayer_handle = am_sublayer_predictor.get_output(0)
+    am_sublayer_output = am_sublayer_handle.numpy()
+    return am_sublayer_output
+
+
+def get_lite_streaming_am_output(input: str,
+                                 am_encoder_infer_predictor,
+                                 am_decoder_predictor,
+                                 am_postnet_predictor,
+                                 frontend,
+                                 lang: str='zh',
+                                 merge_sentences: bool=True):
+    get_tone_ids = False
+    frontend_dict = run_frontend(
+        frontend=frontend,
+        text=input,
+        merge_sentences=merge_sentences,
+        get_tone_ids=get_tone_ids,
+        lang=lang)
+    phone_ids = frontend_dict['phone_ids']
+    phones = phone_ids[0].numpy()
+    am_encoder_infer_output = get_lite_am_sublayer_output(
+        am_encoder_infer_predictor, input=phones)
+    am_decoder_output = get_lite_am_sublayer_output(
+        am_decoder_predictor, input=am_encoder_infer_output)
+    am_postnet_output = get_lite_am_sublayer_output(
+        am_postnet_predictor, input=np.transpose(am_decoder_output, (0, 2, 1)))
+    am_output_data = am_decoder_output + np.transpose(am_postnet_output,
+                                                      (0, 2, 1))
+    normalized_mel = am_output_data[0]
+    return normalized_mel
--- a/paddlespeech/t2s/exps/syn_utils.py
+++ b/paddlespeech/t2s/exps/syn_utils.py
@ -26,8 +26,6 @@ import paddle
 from paddle import inference
 from paddle import jit
 from paddle.static import InputSpec
-from paddlelite.lite import create_paddle_predictor
-from paddlelite.lite import MobileConfig
 from yacs.config import CfgNode

 from paddlespeech.t2s.datasets.data_table import DataTable
@ -163,10 +161,13 @@ def get_test_dataset(test_metadata: List[Dict[str, Any]],
 # frontend
 def get_frontend(lang: str='zh',
                 phones_dict: Optional[os.PathLike]=None,
-                 tones_dict: Optional[os.PathLike]=None):
+                 tones_dict: Optional[os.PathLike]=None,
+                 use_rhy=False):
    if lang == 'zh':
        frontend = Frontend(
-            phone_vocab_path=phones_dict, tone_vocab_path=tones_dict)
+            phone_vocab_path=phones_dict,
+            tone_vocab_path=tones_dict,
+            use_rhy=use_rhy)
    elif lang == 'en':
        frontend = English(phone_vocab_path=phones_dict)
    elif lang == 'mix':
@ -512,105 +513,3 @@ def get_sess(model_path: Optional[os.PathLike],
    sess = ort.InferenceSession(
        model_path, providers=providers, sess_options=sess_options)
    return sess
-
-
-# Paddle-Lite
-def get_lite_predictor(model_dir: Optional[os.PathLike]=None,
-                       model_file: Optional[os.PathLike]=None,
-                       cpu_threads: int=1):
-    config = MobileConfig()
-    config.set_model_from_file(str(Path(model_dir) / model_file))
-    predictor = create_paddle_predictor(config)
-    return predictor
-
-
-def get_lite_am_output(
-        input: str,
-        am_predictor,
-        am: str,
-        frontend: object,
-        lang: str='zh',
-        merge_sentences: bool=True,
-        speaker_dict: Optional[os.PathLike]=None,
-        spk_id: int=0, ):
-    am_name = am[:am.rindex('_')]
-    am_dataset = am[am.rindex('_') + 1:]
-    get_spk_id = False
-    get_tone_ids = False
-    if am_name == 'speedyspeech':
-        get_tone_ids = True
-    if am_dataset in {"aishell3", "vctk", "mix"} and speaker_dict:
-        get_spk_id = True
-        spk_id = np.array([spk_id])
-
-    frontend_dict = run_frontend(
-        frontend=frontend,
-        text=input,
-        merge_sentences=merge_sentences,
-        get_tone_ids=get_tone_ids,
-        lang=lang)
-
-    if get_tone_ids:
-        tone_ids = frontend_dict['tone_ids']
-        tones = tone_ids[0].numpy()
-        tones_handle = am_predictor.get_input(1)
-        tones_handle.from_numpy(tones)
-
-    if get_spk_id:
-        spk_id_handle = am_predictor.get_input(1)
-        spk_id_handle.from_numpy(spk_id)
-    phone_ids = frontend_dict['phone_ids']
-    phones = phone_ids[0].numpy()
-    phones_handle = am_predictor.get_input(0)
-    phones_handle.from_numpy(phones)
-    am_predictor.run()
-    am_output_handle = am_predictor.get_output(0)
-    am_output_data = am_output_handle.numpy()
-    return am_output_data
-
-
-def get_lite_voc_output(voc_predictor, input):
-    mel_handle = voc_predictor.get_input(0)
-    mel_handle.from_numpy(input)
-    voc_predictor.run()
-    voc_output_handle = voc_predictor.get_output(0)
-    wav = voc_output_handle.numpy()
-    return wav
-
-
-def get_lite_am_sublayer_output(am_sublayer_predictor, input):
-    input_handle = am_sublayer_predictor.get_input(0)
-    input_handle.from_numpy(input)
-
-    am_sublayer_predictor.run()
-    am_sublayer_handle = am_sublayer_predictor.get_output(0)
-    am_sublayer_output = am_sublayer_handle.numpy()
-    return am_sublayer_output
-
-
-def get_lite_streaming_am_output(input: str,
-                                 am_encoder_infer_predictor,
-                                 am_decoder_predictor,
-                                 am_postnet_predictor,
-                                 frontend,
-                                 lang: str='zh',
-                                 merge_sentences: bool=True):
-    get_tone_ids = False
-    frontend_dict = run_frontend(
-        frontend=frontend,
-        text=input,
-        merge_sentences=merge_sentences,
-        get_tone_ids=get_tone_ids,
-        lang=lang)
-    phone_ids = frontend_dict['phone_ids']
-    phones = phone_ids[0].numpy()
-    am_encoder_infer_output = get_lite_am_sublayer_output(
-        am_encoder_infer_predictor, input=phones)
-    am_decoder_output = get_lite_am_sublayer_output(
-        am_decoder_predictor, input=am_encoder_infer_output)
-    am_postnet_output = get_lite_am_sublayer_output(
-        am_postnet_predictor, input=np.transpose(am_decoder_output, (0, 2, 1)))
-    am_output_data = am_decoder_output + np.transpose(am_postnet_output,
-                                                      (0, 2, 1))
-    normalized_mel = am_output_data[0]
-    return normalized_mel
--- a/paddlespeech/t2s/exps/synthesize_e2e.py
+++ b/paddlespeech/t2s/exps/synthesize_e2e.py
@ -27,6 +27,7 @@ from paddlespeech.t2s.exps.syn_utils import get_sentences
 from paddlespeech.t2s.exps.syn_utils import get_voc_inference
 from paddlespeech.t2s.exps.syn_utils import run_frontend
 from paddlespeech.t2s.exps.syn_utils import voc_to_static
+from paddlespeech.t2s.utils import str2bool


 def evaluate(args):
@ -49,7 +50,8 @@ def evaluate(args):
    frontend = get_frontend(
        lang=args.lang,
        phones_dict=args.phones_dict,
-        tones_dict=args.tones_dict)
+        tones_dict=args.tones_dict,
+        use_rhy=args.use_rhy)
    print("frontend done!")

    # acoustic model
@ -240,6 +242,11 @@ def parse_args():
        type=str,
        help="text to synthesize, a 'utt_id sentence' pair per line.")
    parser.add_argument("--output_dir", type=str, help="output dir.")
+    parser.add_argument(
+        "--use_rhy",
+        type=str2bool,
+        default=False,
+        help="run rhythm frontend or not")

    args = parser.parse_args()
    return args
--- a/paddlespeech/t2s/frontend/rhy_prediction/init.py
+++ b/paddlespeech/t2s/frontend/rhy_prediction/init.py
@ -0,0 +1,14 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from .rhy_predictor import *
--- a/paddlespeech/t2s/frontend/rhy_prediction/rhy_predictor.py
+++ b/paddlespeech/t2s/frontend/rhy_prediction/rhy_predictor.py
@ -0,0 +1,106 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import re
+
+import paddle
+import yaml
+from paddlenlp.transformers import ErnieTokenizer
+from yacs.config import CfgNode
+
+from paddlespeech.cli.utils import download_and_decompress
+from paddlespeech.resource.pretrained_models import rhy_frontend_models
+from paddlespeech.text.models.ernie_linear import ErnieLinear
+from paddlespeech.utils.env import MODEL_HOME
+
+DefinedClassifier = {
+    'ErnieLinear': ErnieLinear,
+}
+
+model_version = '1.0'
+
+
+class RhyPredictor():
+    def __init__(
+            self,
+            model_dir: os.PathLike=MODEL_HOME, ):
+        uncompress_path = download_and_decompress(
+            rhy_frontend_models['rhy_e2e'][model_version], model_dir)
+        with open(os.path.join(uncompress_path, 'rhy_default.yaml')) as f:
+            config = CfgNode(yaml.safe_load(f))
+        self.punc_list = []
+        with open(os.path.join(uncompress_path, 'rhy_token'), 'r') as f:
+            for line in f:
+                self.punc_list.append(line.strip())
+        self.punc_list = [0] + self.punc_list
+        self.make_rhy_dict()
+        self.model = DefinedClassifier["ErnieLinear"](**config["model"])
+        pretrained_token = config['data_params']['pretrained_token']
+        self.tokenizer = ErnieTokenizer.from_pretrained(pretrained_token)
+        state_dict = paddle.load(
+            os.path.join(uncompress_path, 'snapshot_iter_2600_main_params.pdz'))
+        self.model.set_state_dict(state_dict)
+        self.model.eval()
+
+    def _clean_text(self, text):
+        text = text.lower()
+        text = re.sub('[^A-Za-z0-9\u4e00-\u9fa5]', '', text)
+        text = re.sub(f'[{"".join([p for p in self.punc_list][1:])}]', '', text)
+        return text
+
+    def preprocess(self, text, tokenizer):
+        clean_text = self._clean_text(text)
+        assert len(clean_text) > 0, f'Invalid input string: {text}'
+        tokenized_input = tokenizer(
+            list(clean_text), return_length=True, is_split_into_words=True)
+        _inputs = dict()
+        _inputs['input_ids'] = tokenized_input['input_ids']
+        _inputs['seg_ids'] = tokenized_input['token_type_ids']
+        _inputs['seq_len'] = tokenized_input['seq_len']
+        return _inputs
+
+    def get_prediction(self, raw_text):
+        _inputs = self.preprocess(raw_text, self.tokenizer)
+        seq_len = _inputs['seq_len']
+        input_ids = paddle.to_tensor(_inputs['input_ids']).unsqueeze(0)
+        seg_ids = paddle.to_tensor(_inputs['seg_ids']).unsqueeze(0)
+        logits, _ = self.model(input_ids, seg_ids)
+        preds = paddle.argmax(logits, axis=-1).squeeze(0)
+        tokens = self.tokenizer.convert_ids_to_tokens(
+            _inputs['input_ids'][1:seq_len - 1])
+        labels = preds[1:seq_len - 1].tolist()
+        assert len(tokens) == len(labels)
+        # add 0 for non punc
+        text = ''
+        for t, l in zip(tokens, labels):
+            text += t
+            if l != 0:  # Non punc.
+                text += self.punc_list[l]
+        return text
+
+    def make_rhy_dict(self):
+        self.rhy_dict = {}
+        for i, p in enumerate(self.punc_list[1:]):
+            self.rhy_dict[p] = 'sp' + str(i + 1)
+
+    def pinyin_align(self, pinyins, rhy_pre):
+        final_py = []
+        j = 0
+        for i in range(len(rhy_pre)):
+            if rhy_pre[i] in self.rhy_dict:
+                final_py.append(self.rhy_dict[rhy_pre[i]])
+            else:
+                final_py.append(pinyins[j])
+                j += 1
+        return final_py
--- a/paddlespeech/t2s/frontend/zh_frontend.py
+++ b/paddlespeech/t2s/frontend/zh_frontend.py
@ -30,6 +30,7 @@ from pypinyin_dict.phrase_pinyin_data import large_pinyin

 from paddlespeech.t2s.frontend.g2pw import G2PWOnnxConverter
 from paddlespeech.t2s.frontend.generate_lexicon import generate_lexicon
+from paddlespeech.t2s.frontend.rhy_prediction.rhy_predictor import RhyPredictor
 from paddlespeech.t2s.frontend.tone_sandhi import ToneSandhi
 from paddlespeech.t2s.frontend.zh_normalization.text_normlization import TextNormalizer
 from paddlespeech.t2s.ssml.xml_processor import MixTextProcessor
@ -82,11 +83,13 @@ class Frontend():
    def __init__(self,
                 g2p_model="g2pW",
                 phone_vocab_path=None,
-                 tone_vocab_path=None):
+                 tone_vocab_path=None,
+                 use_rhy=False):
        self.mix_ssml_processor = MixTextProcessor()
        self.tone_modifier = ToneSandhi()
        self.text_normalizer = TextNormalizer()
        self.punc = "：，；。？！“”‘’':,;.?!"
+        self.rhy_phns = ['sp1', 'sp2', 'sp3', 'sp4']
        self.phrases_dict = {
            '开户行': [['ka1i'], ['hu4'], ['hang2']],
            '发卡行': [['fa4'], ['ka3'], ['hang2']],
@ -105,6 +108,10 @@ class Frontend():
            '嘞': [['lei5']],
            '掺和': [['chan1'], ['huo5']]
        }
+        self.use_rhy = use_rhy
+        if use_rhy:
+            self.rhy_predictor = RhyPredictor()
+            print("Rhythm predictor loaded.")
        # g2p_model can be pypinyin and g2pM and g2pW
        self.g2p_model = g2p_model
        if self.g2p_model == "g2pM":
@ -195,9 +202,13 @@ class Frontend():
        segments = sentences
        phones_list = []
        for seg in segments:
+            if self.use_rhy:
+                seg = self.rhy_predictor._clean_text(seg)
            phones = []
            # Replace all English words in the sentence
            seg = re.sub('[a-zA-Z]+', '', seg)
+            if self.use_rhy:
+                seg = self.rhy_predictor.get_prediction(seg)
            seg_cut = psg.lcut(seg)
            initials = []
            finals = []
@ -205,11 +216,18 @@ class Frontend():
            # 为了多音词获得更好的效果，这里采用整句预测
            if self.g2p_model == "g2pW":
                try:
+                    if self.use_rhy:
+                        seg = self.rhy_predictor._clean_text(seg)
                    pinyins = self.g2pW_model(seg)[0]
                except Exception:
                    # g2pW采用模型采用繁体输入，如果有cover不了的简体词，采用g2pM预测
                    print("[%s] not in g2pW dict,use g2pM" % seg)
                    pinyins = self.g2pM_model(seg, tone=True, char_split=False)
+                if self.use_rhy:
+                    rhy_text = self.rhy_predictor.get_prediction(seg)
+                    final_py = self.rhy_predictor.pinyin_align(pinyins,
+                                                               rhy_text)
+                    pinyins = final_py
                pre_word_length = 0
                for word, pos in seg_cut:
                    sub_initials = []
@ -271,7 +289,7 @@ class Frontend():
                    phones.append(c)
                if c and c in self.punc:
                    phones.append('sp')
-                if v and v not in self.punc:
+                if v and v not in self.punc and v not in self.rhy_phns:
                    phones.append(v)
            phones_list.append(phones)
        if merge_sentences:
@ -330,7 +348,7 @@ class Frontend():
                phones.append(c)
            if c and c in self.punc:
                phones.append('sp')
-            if v and v not in self.punc:
+            if v and v not in self.punc and v not in self.rhy_phns:
                phones.append(v)
        phones_list.append(phones)
        if merge_sentences:
@ -504,6 +522,11 @@ class Frontend():
            print("----------------------------")
        return [sum(all_phonemes, [])]

+    def add_sp_if_no(self, phonemes):
+        if not phonemes[-1][-1].startswith('sp'):
+            phonemes[-1].append('sp4')
+        return phonemes
+
    def get_input_ids(self,
                      sentence: str,
                      merge_sentences: bool=True,
@ -519,6 +542,8 @@ class Frontend():
            merge_sentences=merge_sentences,
            print_info=print_info,
            robot=robot)
+        if self.use_rhy:
+            phonemes = self.add_sp_if_no(phonemes)
        result = {}
        phones = []
        tones = []
--- a/setup.py
+++ b/setup.py
@ -47,7 +47,7 @@ base = [
    "onnxruntime==1.10.0",
    "opencc",
    "pandas",
-    "paddlenlp",
+    "paddlenlp>=2.4.3",
    "paddlespeech_feat",
    "Pillow>=9.0.0",
    "praatio==5.0.0",
@ -71,11 +71,10 @@ base = [
    "prettytable",
    "zhon",
    "colorlog",
-    "pathos == 0.2.8",
+    "pathos==0.2.8",
    "braceexpand",
    "pyyaml",
    "pybind11",
-    "paddlelite",
    "paddleslim==2.3.4",
 ]

--- a/speechx/README.md
+++ b/speechx/README.md
@ -22,7 +22,7 @@ We develop under:
 1. First to launch docker container.

 ```
-docker run --privileged  --net=host --ipc=host -it --rm -v $PWD:/workspace --name=dev registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7 /bin/bash
+docker run --privileged  --net=host --ipc=host -it --rm -v /path/to/paddlespeech:/workspace --name=dev registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7 /bin/bash
 ```

 * More `Paddle` docker images you can see [here](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/docker/linux-docker.html).
--- a/speechx/examples/custom_asr/README.md
+++ b/speechx/examples/custom_asr/README.md
@ -1,4 +1,4 @@
-# customized Auto Speech Recognition
+# Customized ASR

 ## introduction
 These scripts are tutorials to show you how build your own decoding graph.
--- a/speechx/examples/ds2_ol/README.md
+++ b/speechx/examples/ds2_ol/README.md
@ -4,3 +4,4 @@

 * `websocket` - Streaming ASR with websocket for deepspeech2_aishell.    
 * `aishell` - Streaming Decoding under aishell dataset, for local WER test.    
+* `onnx` - Example to convert deepspeech2 to onnx format.
--- a/speechx/examples/ds2_ol/aishell/README.md
+++ b/speechx/examples/ds2_ol/aishell/README.md
@ -1,12 +1,57 @@
 # Aishell - Deepspeech2 Streaming

-## How to run
+> We recommend using U2/U2++ model instead of DS2, please see [here](../../u2pp_ol/wenetspeech/).

+A C++ deployment example for using the deepspeech2 model to recognize `wav` and compute `CER`. We using AISHELL-1 as test data.
+
+## Source path.sh
+
+```bash
+. path.sh
 ```
+
+SpeechX bins is under `echo $SPEECHX_BUILD`, more info please see `path.sh`.
+
+## Recognize with linear feature
+
+```bash
 bash run.sh
 ```

-## Results
+`run.sh` has multi stage, for details please see `run.sh`: 
+
+1. donwload dataset, model and lm
+2. convert cmvn format and compute feature
+3. decode w/o lm by feature
+4. decode w/ ngram lm by feature
+5. decode w/ TLG graph by feature
+6. recognize w/ TLG graph by wav input
+
+### Recognize with `.scp` file for wav
+
+This sciprt using `recognizer_main` to recognize wav file.
+
+The input is `scp` file which look like this:
+```text
+# head data/split1/1/aishell_test.scp 
+BAC009S0764W0121        /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0121.wav
+BAC009S0764W0122        /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0122.wav
+...
+BAC009S0764W0125        /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0125.wav
+```
+
+If you want to recognize one wav, you can make `scp` file like this:
+```text
+key  path/to/wav/file
+```
+
+Then specify `--wav_rspecifier=` param for `recognizer_main` bin. For other flags meaning, please see `help`:
+```bash
+recognizer_main --help
+```
+
+For the exmaple to using `recognizer_main` please see `run.sh`.
+

 ### CTC Prefix Beam Search w/o LM

@ -25,7 +70,7 @@ Mandarin -> 7.86 % N=104768 C=96865 S=7573 D=330 I=327
 Other -> 0.00 % N=0 C=0 S=0 D=0 I=0
 ```

-### CTC WFST
+### CTC TLG WFST

 LM: [aishell train](http://paddlespeech.bj.bcebos.com/speechx/examples/ds2_ol/aishell/aishell_graph.zip)
 --acoustic_scale=1.2
@ -43,8 +88,11 @@ Mandarin -> 10.93 % N=104762 C=93410 S=9779 D=1573 I=95
 Other -> 100.00 % N=3 C=0 S=1 D=2 I=0
 ```

-## fbank
-```
+## Recognize with fbank feature
+
+This script is same to `run.sh`, but using fbank feature.
+
+```bash
 bash run_fbank.sh
 ```

@ -66,7 +114,7 @@ Mandarin -> 5.82 % N=104762 C=99386 S=4941 D=435 I=720
 English -> 0.00 % N=0 C=0 S=0 D=0 I=0
 ```

-### CTC WFST
+### CTC TLG WFST

 LM: [aishell train](https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/aishell_graph2.zip)
 ```
@ -75,7 +123,11 @@ Mandarin -> 9.57 % N=104762 C=94817 S=4325 D=5620 I=84
 Other -> 100.00 % N=3 C=0 S=1 D=2 I=0
 ```

-## build TLG graph 
-```
- bash run_build_tlg.sh
+## Build TLG WFST graph 
+
+The script is for building TLG wfst graph, depending on `srilm`, please make sure it is installed.
+For more information please see the script below.
+
+```bash
+ bash ./local/run_build_tlg.sh
 ```
--- a/speechx/examples/ds2_ol/aishell/local/run_build_tlg.sh
+++ b/speechx/examples/ds2_ol/aishell/local/run_build_tlg.sh
@ -22,6 +22,7 @@ mkdir -p $data

 if [ $stage -le -1 ] && [ $stop_stage -ge -1 ]; then
    if [ ! -f $data/speech.ngram.zh.tar.gz ];then
+        # download ngram
        pushd $data
        wget -c http://paddlespeech.bj.bcebos.com/speechx/examples/ngram/zh/speech.ngram.zh.tar.gz
        tar xvzf speech.ngram.zh.tar.gz
@ -29,6 +30,7 @@ if [ $stage -le -1 ] && [ $stop_stage -ge -1 ]; then
    fi

    if [ ! -f $ckpt_dir/data/mean_std.json ]; then
+        # download model
        mkdir -p $ckpt_dir
        pushd $ckpt_dir
        wget -c https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr0/WIP1_asr0_deepspeech2_online_wenetspeech_ckpt_1.0.0a.model.tar.gz
@ -43,6 +45,7 @@ if [ ! -f $unit ]; then
 fi

 if ! which ngram-count; then
+    # need srilm install
    pushd $MAIN_ROOT/tools
    make srilm.done
    popd
@ -71,7 +74,7 @@ lm=data/local/lm
 mkdir -p $lm

 if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
-    # Train lm
+    # Train ngram lm
    cp $text $lm/text
    local/aishell_train_lms.sh
    echo "build LM done."
@ -94,8 +97,8 @@ cmvn=$data/cmvn_fbank.ark
 wfst=$data/lang_test

 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-
    if [ ! -d $data/test ]; then
+        # download test dataset
        pushd $data
        wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/aishell_test.zip
        unzip  aishell_test.zip
@ -107,7 +110,8 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    fi

    ./local/split_data.sh $data $data/$aishell_wav_scp $aishell_wav_scp $nj
-
+    
+    # convert cmvn format
    cmvn-json2kaldi --json_file=$ckpt_dir/data/mean_std.json --cmvn_write_path=$cmvn
 fi

@ -116,7 +120,7 @@ label_file=aishell_result
 export GLOG_logtostderr=1

 if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
-    #  TLG decoder
+    #  recognize w/ TLG graph
    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/check_tlg.log \
    recognizer_main \
        --wav_rspecifier=scp:$data/split${nj}/JOB/${aishell_wav_scp} \
--- a/speechx/examples/ds2_ol/aishell/run.sh
+++ b/speechx/examples/ds2_ol/aishell/run.sh
@ -32,6 +32,7 @@ exp=$PWD/exp
 aishell_wav_scp=aishell_test.scp
 if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ];then
    if [ ! -d $data/test ]; then
+        # donwload dataset
        pushd $data
        wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/aishell_test.zip
        unzip  aishell_test.zip
@ -43,6 +44,7 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ];then
    fi

    if [ ! -f $ckpt_dir/data/mean_std.json ]; then
+        # download model
        mkdir -p $ckpt_dir
        pushd $ckpt_dir
        wget -c https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz
@ -52,6 +54,7 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ];then

    lm=$data/zh_giga.no_cna_cmn.prune01244.klm
    if [ ! -f $lm ]; then
+        # download kenlm bin
        pushd $data
        wget -c https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm
        popd
@ -68,7 +71,7 @@ export GLOG_logtostderr=1

 cmvn=$data/cmvn.ark
 if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
-    # 3. gen linear feat
+    # 3. convert cmvn format and compute linear feat
    cmvn_json2kaldi_main --json_file=$ckpt_dir/data/mean_std.json --cmvn_write_path=$cmvn

    ./local/split_data.sh $data $data/$aishell_wav_scp $aishell_wav_scp $nj
@ -82,14 +85,14 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
 fi

 if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
-    #  recognizer
+    #  decode w/o lm
    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/recog.wolm.log \
    ctc_beam_search_decoder_main \
        --feature_rspecifier=scp:$data/split${nj}/JOB/feat.scp \
        --model_path=$model_dir/avg_1.jit.pdmodel \
        --param_path=$model_dir/avg_1.jit.pdiparams \
        --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
-	--nnet_decoder_chunk=8 \
+	    --nnet_decoder_chunk=8 \
        --dict_file=$vocb_dir/vocab.txt \
        --result_wspecifier=ark,t:$data/split${nj}/JOB/result

@ -101,14 +104,14 @@ if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
 fi

 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-    #  decode with lm
+    # decode w/ ngram lm with feature input
    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/recog.lm.log \
    ctc_beam_search_decoder_main \
        --feature_rspecifier=scp:$data/split${nj}/JOB/feat.scp \
        --model_path=$model_dir/avg_1.jit.pdmodel \
        --param_path=$model_dir/avg_1.jit.pdiparams \
        --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
-	--nnet_decoder_chunk=8 \
+	    --nnet_decoder_chunk=8 \
        --dict_file=$vocb_dir/vocab.txt \
        --lm_path=$lm \
        --result_wspecifier=ark,t:$data/split${nj}/JOB/result_lm
@ -124,6 +127,7 @@ wfst=$data/wfst/
 if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
    mkdir -p $wfst
    if [ ! -f $wfst/aishell_graph.zip ]; then
+        # download TLG graph
        pushd $wfst
        wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/aishell_graph.zip
        unzip aishell_graph.zip
@ -133,7 +137,7 @@ if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
 fi

 if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
-    #  TLG decoder
+    #  decoder w/ TLG graph with feature input
    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/recog.wfst.log \
    ctc_tlg_decoder_main \
        --feature_rspecifier=scp:$data/split${nj}/JOB/feat.scp \
@ -142,7 +146,7 @@ if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
        --word_symbol_table=$wfst/words.txt \
        --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
        --graph_path=$wfst/TLG.fst --max_active=7500 \
-	--nnet_decoder_chunk=8 \
+	    --nnet_decoder_chunk=8 \
        --acoustic_scale=1.2 \
        --result_wspecifier=ark,t:$data/split${nj}/JOB/result_tlg

@ -154,7 +158,7 @@ if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
 fi

 if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
-    #  TLG decoder
+    #  recognize from wav file w/ TLG graph
    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/recognizer.log \
    recognizer_main \
        --wav_rspecifier=scp:$data/split${nj}/JOB/${aishell_wav_scp} \
@ -162,7 +166,7 @@ if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
        --model_path=$model_dir/avg_1.jit.pdmodel \
        --param_path=$model_dir/avg_1.jit.pdiparams \
        --word_symbol_table=$wfst/words.txt \
-	--nnet_decoder_chunk=8 \
+	    --nnet_decoder_chunk=8 \
        --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
        --graph_path=$wfst/TLG.fst --max_active=7500 \
        --acoustic_scale=1.2 \
@ -173,4 +177,4 @@ if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
    echo "recognizer test have finished!!!"
    echo "please checkout in ${exp}/${wer}.recognizer"
    tail -n 7 $exp/${wer}.recognizer
-fi
+fi
--- a/speechx/examples/ds2_ol/aishell/run_fbank.sh
+++ b/speechx/examples/ds2_ol/aishell/run_fbank.sh
@ -68,7 +68,7 @@ export GLOG_logtostderr=1

 cmvn=$data/cmvn_fbank.ark
 if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
-    # 3. gen linear feat
+    # 3. convert cmvn format and compute fbank feat
    cmvn_json2kaldi_main --json_file=$ckpt_dir/data/mean_std.json --cmvn_write_path=$cmvn --binary=false

    ./local/split_data.sh $data $data/$aishell_wav_scp $aishell_wav_scp $nj
@ -82,7 +82,7 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
 fi

 if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
-    #  recognizer
+    #  decode w/ lm by feature
    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/recog.fbank.wolm.log \
    ctc_beam_search_decoder_main \
        --feature_rspecifier=scp:$data/split${nj}/JOB/fbank_feat.scp \
@ -90,7 +90,7 @@ if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
        --param_path=$model_dir/avg_5.jit.pdiparams \
        --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
    	--model_cache_shapes="5-1-2048,5-1-2048" \
-	--nnet_decoder_chunk=8 \
+	    --nnet_decoder_chunk=8 \
        --dict_file=$vocb_dir/vocab.txt \
        --result_wspecifier=ark,t:$data/split${nj}/JOB/result_fbank

@ -100,15 +100,15 @@ if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
 fi

 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
-    #  decode with lm
+    #  decode with ngram lm by feature
    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/recog.fbank.lm.log \
    ctc_beam_search_decoder_main \
        --feature_rspecifier=scp:$data/split${nj}/JOB/fbank_feat.scp \
        --model_path=$model_dir/avg_5.jit.pdmodel \
        --param_path=$model_dir/avg_5.jit.pdiparams \
        --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
-	--model_cache_shapes="5-1-2048,5-1-2048" \
-	--nnet_decoder_chunk=8 \
+	    --model_cache_shapes="5-1-2048,5-1-2048" \
+	    --nnet_decoder_chunk=8 \
        --dict_file=$vocb_dir/vocab.txt \
        --lm_path=$lm \
        --result_wspecifier=ark,t:$data/split${nj}/JOB/fbank_result_lm
@ -131,7 +131,7 @@ if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
 fi

 if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
-    #  TLG decoder
+    #  decode w/ TLG graph by feature
    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/recog.fbank.wfst.log \
    ctc_tlg_decoder_main \
        --feature_rspecifier=scp:$data/split${nj}/JOB/fbank_feat.scp \
@ -139,8 +139,8 @@ if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
        --param_path=$model_dir/avg_5.jit.pdiparams \
        --word_symbol_table=$wfst/words.txt \
        --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
-	--model_cache_shapes="5-1-2048,5-1-2048" \
-	--nnet_decoder_chunk=8 \
+	    --model_cache_shapes="5-1-2048,5-1-2048" \
+	    --nnet_decoder_chunk=8 \
        --graph_path=$wfst/TLG.fst --max_active=7500 \
        --acoustic_scale=1.2 \
        --result_wspecifier=ark,t:$data/split${nj}/JOB/result_tlg
@ -153,6 +153,7 @@ if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
 fi

 if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+    # recgonize w/ TLG graph by wav
    utils/run.pl JOB=1:$nj $data/split${nj}/JOB/fbank_recognizer.log \
    recognizer_main \
        --wav_rspecifier=scp:$data/split${nj}/JOB/${aishell_wav_scp} \
@ -163,7 +164,7 @@ if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
        --word_symbol_table=$wfst/words.txt \
        --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
        --model_cache_shapes="5-1-2048,5-1-2048" \
-	--nnet_decoder_chunk=8 \
+	    --nnet_decoder_chunk=8 \
        --graph_path=$wfst/TLG.fst --max_active=7500 \
        --acoustic_scale=1.2 \
        --result_wspecifier=ark,t:$data/split${nj}/JOB/result_fbank_recognizer
--- a/speechx/examples/ds2_ol/websocket/README.md
+++ b/speechx/examples/ds2_ol/websocket/README.md
@ -0,0 +1,78 @@
+#  Streaming DeepSpeech2 Server with WebSocket
+
+This example is about using `websocket` as streaming deepspeech2 server. For deepspeech2 model training please see [here](../../../../examples/aishell/asr0/).
+
+The websocket protocal is same to [PaddleSpeech Server](../../../../demos/streaming_asr_server/), 
+for detail of implementation please see [here](../../../speechx/protocol/websocket/).
+
+
+## Source path.sh
+
+```bash
+. path.sh
+```
+
+SpeechX bins is under `echo $SPEECHX_BUILD`, more info please see `path.sh`.
+
+
+## Start WebSocket Server
+
+```bash
+bash websoket_server.sh
+```
+
+The output is like below:
+
+```text
+I1130 02:19:32.029882 12856 cmvn_json2kaldi_main.cc:39] cmvn josn path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/model/data/mean_std.json
+I1130 02:19:32.032230 12856 cmvn_json2kaldi_main.cc:73] nframe: 907497
+I1130 02:19:32.032564 12856 cmvn_json2kaldi_main.cc:85] cmvn stats have write into: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/cmvn.ark
+I1130 02:19:32.032579 12856 cmvn_json2kaldi_main.cc:86] Binary: 1
+I1130 02:19:32.798342 12937 feature_pipeline.h:53] cmvn file: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/cmvn.ark
+I1130 02:19:32.798542 12937 feature_pipeline.h:58] dither: 0
+I1130 02:19:32.798583 12937 feature_pipeline.h:60] frame shift ms: 10
+I1130 02:19:32.798588 12937 feature_pipeline.h:62] feature type: linear
+I1130 02:19:32.798596 12937 feature_pipeline.h:80] frame length ms: 20
+I1130 02:19:32.798601 12937 feature_pipeline.h:88] subsampling rate: 4
+I1130 02:19:32.798606 12937 feature_pipeline.h:90] nnet receptive filed length: 7
+I1130 02:19:32.798611 12937 feature_pipeline.h:92] nnet chunk size: 1
+I1130 02:19:32.798615 12937 feature_pipeline.h:94] frontend fill zeros: 0
+I1130 02:19:32.798630 12937 nnet_itf.h:52] subsampling rate: 4
+I1130 02:19:32.798635 12937 nnet_itf.h:54] model path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/model/exp/deepspeech2_online/checkpoints//avg_1.jit.pdmodel
+I1130 02:19:32.798640 12937 nnet_itf.h:57] param path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/model/exp/deepspeech2_online/checkpoints//avg_1.jit.pdiparams
+I1130 02:19:32.798643 12937 nnet_itf.h:59] DS2 param: 
+I1130 02:19:32.798647 12937 nnet_itf.h:61]   cache names: chunk_state_h_box,chunk_state_c_box
+I1130 02:19:32.798652 12937 nnet_itf.h:63]   cache shape: 5-1-1024,5-1-1024
+I1130 02:19:32.798656 12937 nnet_itf.h:65]   input names: audio_chunk,audio_chunk_lens,chunk_state_h_box,chunk_state_c_box
+I1130 02:19:32.798660 12937 nnet_itf.h:67]   output names: softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0
+I1130 02:19:32.798664 12937 ctc_tlg_decoder.h:41] fst path: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/wfst//TLG.fst
+I1130 02:19:32.798669 12937 ctc_tlg_decoder.h:42] fst symbole table: /workspace/zhanghui/PaddleSpeech/speechx/examples/ds2_ol/websocket/data/wfst//words.txt
+I1130 02:19:32.798673 12937 ctc_tlg_decoder.h:47] LatticeFasterDecoder max active: 7500
+I1130 02:19:32.798677 12937 ctc_tlg_decoder.h:49] LatticeFasterDecoder beam: 15
+I1130 02:19:32.798681 12937 ctc_tlg_decoder.h:50] LatticeFasterDecoder lattice_beam: 7.5
+I1130 02:19:32.798708 12937 websocket_server_main.cc:37] Listening at port 8082
+```
+
+## Start WebSocket Client
+
+```bash
+bash websocket_client.sh
+```
+
+This script using AISHELL-1 test data to call websocket server.
+
+The input is specific by `--wav_rspecifier=scp:$data/$aishell_wav_scp`.
+
+The `scp` file which look like this:
+```text
+# head data/split1/1/aishell_test.scp 
+BAC009S0764W0121        /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0121.wav
+BAC009S0764W0122        /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0122.wav
+...
+BAC009S0764W0125        /workspace/PaddleSpeech/speechx/examples/u2pp_ol/wenetspeech/data/test/S0764/BAC009S0764W0125.wav
+```
+
+If you want to recognize one wav, you can make `scp` file like this:
+```text
+key  path/to/wav/file
+```
--- a/speechx/examples/u2pp_ol/wenetspeech/README.md
+++ b/speechx/examples/u2pp_ol/wenetspeech/README.md
@ -6,13 +6,14 @@ This example will demonstrate how to using the u2/u2++ model to recognize `wav`

 ## Testing with Aishell Test Data

-### Source `path.sh` first 
+## Source path.sh

-```bash 
-source path.sh
+```bash
+. path.sh
 ```

-All bins are under `echo $SPEECHX_BUILD` dir.
+SpeechX bins is under `echo $SPEECHX_BUILD`, more info please see `path.sh`.
+

 ### Download dataset and model

--- a/speechx/examples/u2pp_ol/wenetspeech/run.sh
+++ b/speechx/examples/u2pp_ol/wenetspeech/run.sh
@ -83,5 +83,10 @@ fi

 if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    # decode with wav input
-    ./loca/recognizer.sh
+    ./local/recognizer.sh
+fi
+
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+    # decode with wav input with quanted model
+    ./local/recognizer_quant.sh
 fi