diff --git a/README.md b/README.md index e1f57fcaf..49e40624d 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,7 @@ Quick Start | Documents | Models List - | AIStudio Courses + | AIStudio Courses | NAACL2022 Best Demo Award Paper | Gitee @@ -179,7 +179,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision - Scan the QR code below with your Wechat, you can access to official technical exchange group and get the bonus ( more than 20GB learning materials, such as papers, codes and videos ) and the live link of the lessons. Look forward to your participation.
- +
## Installation diff --git a/README_cn.md b/README_cn.md index 1e932201f..bf3ff4dfd 100644 --- a/README_cn.md +++ b/README_cn.md @@ -23,7 +23,7 @@ | 快速开始 | 教程文档 | 模型列表 - | AIStudio 课程 + | AIStudio 课程 | NAACL2022 论文 | Gitee @@ -162,22 +162,7 @@ - 🧩 级联模型应用: 作为传统语音任务的扩展,我们结合了自然语言处理、计算机视觉等任务,实现更接近实际需求的产业级应用。 -### 近期活动 - - ❗️重磅❗️飞桨智慧金融行业系列直播课 -✅ 覆盖智能风控、智能运维、智能营销、智能客服四大金融主流场景 - -📆 9月6日-9月29日每周二、四19:00 -+ 智慧金融行业深入洞察 -+ 8节理论+实践精品直播课 -+ 10+真实产业场景范例教学及实践 -+ 更有免费算力+结业证书等礼品等你来拿 -扫码报名码住直播链接,与行业精英深度交流 - -
- -
- + ### 近期更新 - 👑 2022.10.11: 新增 [Wav2vec2ASR](./examples/librispeech/asr3), 在 LibriSpeech 上针对ASR任务对wav2vec2.0 的fine-tuning. - 🔥 2022.09.26: 新增 Voice Cloning, TTS finetune 和 ERNIE-SAT 到 [PaddleSpeech 网页应用](./demos/speech_web)。 @@ -200,13 +185,13 @@ ### 🔥 加入技术交流群获取入群福利 - - 3 日直播课链接: 深度解读 PP-TTS、PP-ASR、PP-VPR 三项核心语音系统关键技术 + - 3 日直播课链接: 深度解读 【一句话语音合成】【小样本语音合成】【定制化语音识别】语音交互技术 - 20G 学习大礼包:视频课程、前沿论文与学习资料 微信扫描二维码关注公众号,点击“马上报名”填写问卷加入官方交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。
- +
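The demo added below (`demos/streaming_tts_serving_fastdeploy`) exposes a decoupled streaming TTS model through Triton's gRPC interface. As a quick orientation before the per-file diffs, the following is a minimal, illustrative sketch of the request/response flow that the bundled `stream_client.py` implements in full; the tensor names (`INPUT_0`, `OUTPUT_0`, `status`) are taken from the demo's `config.pbtxt`, and `localhost:8001` is the demo's default gRPC port. Treat it as a sketch, not a replacement for the shipped client.

```python
import queue
from functools import partial

import numpy as np
import tritonclient.grpc as grpcclient

responses = queue.Queue()


def callback(q, result, error):
    # Triton calls this once per streamed response, or once with an error.
    q.put(error if error else result)


with grpcclient.InferenceServerClient(url="localhost:8001") as client:
    client.start_stream(callback=partial(callback, responses))

    # A single BYTES tensor carrying the UTF-8 text to synthesize (INPUT_0).
    text = np.array(["欢迎使用飞桨语音合成".encode("utf-8")], dtype=np.object_)
    infer_input = grpcclient.InferInput("INPUT_0", [1], "BYTES")
    infer_input.set_data_from_numpy(text)

    client.async_stream_infer(
        model_name="streaming_tts_serving",
        inputs=[infer_input],
        outputs=[grpcclient.InferRequestedOutput("OUTPUT_0")],
        request_id="0")

    # Audio arrives chunk by chunk; the boolean `status` output is set to 1
    # together with the final chunk.
    while True:
        item = responses.get()
        if isinstance(item, Exception):
            raise item
        wav_chunk = item.as_numpy("OUTPUT_0")
        done = item.as_numpy("status")
        print("chunk shape:", wav_chunk.shape, "last:", bool(done[0]))
        if done[0]:
            break
```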
diff --git a/demos/streaming_tts_serving_fastdeploy/README.md b/demos/streaming_tts_serving_fastdeploy/README.md new file mode 100644 index 000000000..3e983a06d --- /dev/null +++ b/demos/streaming_tts_serving_fastdeploy/README.md @@ -0,0 +1,67 @@ +([简体中文](./README_cn.md)|English) + +# Streaming Speech Synthesis Service + +## Introduction +This demo is an implementation of starting the streaming speech synthesis service and accessing the service. + +`Server` must be started in the docker, while `Client` does not have to be in the docker. + +**The streaming_tts_serving under the path of this article ($PWD) contains the configuration and code of the model, which needs to be mapped to the docker for use.** + +## Usage +### 1. Server +#### 1.1 Docker + +```bash +docker pull registry.baidubce.com/paddlepaddle/fastdeploy_serving_cpu_only:22.09 +docker run -dit --net=host --name fastdeploy --shm-size="1g" -v $PWD:/models registry.baidubce.com/paddlepaddle/fastdeploy_serving_cpu_only:22.09 +docker exec -it -u root fastdeploy bash +``` + +#### 1.2 Installation(inside the docker) +```bash +apt-get install build-essential python3-dev libssl-dev libffi-dev libxml2 libxml2-dev libxslt1-dev zlib1g-dev libsndfile1 language-pack-zh-hans wget zip +pip3 install paddlespeech +export LC_ALL="zh_CN.UTF-8" +export LANG="zh_CN.UTF-8" +export LANGUAGE="zh_CN:zh:en_US:en" +``` + +#### 1.3 Download models(inside the docker) +```bash +cd /models/streaming_tts_serving/1 +wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip +wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_onnx_0.2.0.zip +unzip fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip +unzip mb_melgan_csmsc_onnx_0.2.0.zip +``` +**For the convenience of users, we recommend that you use the command `docker -v` to map $PWD (streaming_tts_service and the configuration and code of the model contained therein) to the docker path `/models`. You can also use other methods, but regardless of which method you use, the final model directory and structure in the docker are shown in the following figure.** + +

+ +

+ +#### 1.4 Start the server(inside the docker) + +```bash +fastdeployserver --model-repository=/models --model-control-mode=explicit --load-model=streaming_tts_serving +``` +Arguments: + - `model-repository`(required): Path of model storage. + - `model-control-mode`(required): The mode of loading the model. At present, you can use 'explicit'. + - `load-model`(required): Name of the model to be loaded. + - `http-port`(optional): Port for http service. Default: `8000`. This is not used in our example. + - `grpc-port`(optional): Port for grpc service. Default: `8001`. + - `metrics-port`(optional): Port for metrics service. Default: `8002`. This is not used in our example. + +### 2. Client +#### 2.1 Installation +```bash +pip3 install tritonclient[all] +``` + +#### 2.2 Send request +```bash +python3 /models/streaming_tts_serving/stream_client.py +``` diff --git a/demos/streaming_tts_serving_fastdeploy/README_cn.md b/demos/streaming_tts_serving_fastdeploy/README_cn.md new file mode 100644 index 000000000..7edd32830 --- /dev/null +++ b/demos/streaming_tts_serving_fastdeploy/README_cn.md @@ -0,0 +1,67 @@ +(简体中文|[English](./README.md)) + +# 流式语音合成服务 + +## 介绍 + +本文介绍了使用FastDeploy搭建流式语音合成服务的方法。 + +`服务端`必须在docker内启动,而`客户端`不是必须在docker容器内. + +**本文所在路径`($PWD)下的streaming_tts_serving里包含模型的配置和代码`(服务端会加载模型和代码以启动服务),需要将其映射到docker中使用。** + +## 使用 +### 1. 服务端 +#### 1.1 Docker +```bash +docker pull registry.baidubce.com/paddlepaddle/fastdeploy_serving_cpu_only:22.09 +docker run -dit --net=host --name fastdeploy --shm-size="1g" -v $PWD:/models registry.baidubce.com/paddlepaddle/fastdeploy_serving_cpu_only:22.09 +docker exec -it -u root fastdeploy bash +``` + +#### 1.2 安装(在docker内) +```bash +apt-get install build-essential python3-dev libssl-dev libffi-dev libxml2 libxml2-dev libxslt1-dev zlib1g-dev libsndfile1 language-pack-zh-hans wget zip +pip3 install paddlespeech +export LC_ALL="zh_CN.UTF-8" +export LANG="zh_CN.UTF-8" +export LANGUAGE="zh_CN:zh:en_US:en" +``` + +#### 1.3 下载模型(在docker内) +```bash +cd /models/streaming_tts_serving/1 +wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip +wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_onnx_0.2.0.zip +unzip fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip +unzip mb_melgan_csmsc_onnx_0.2.0.zip +``` +**为了方便用户使用,我们推荐用户使用1.1中的`docker -v`命令将`$PWD(streaming_tts_serving及里面包含的模型的配置和代码)映射到了docker内的/models路径`,用户也可以使用其他办法,但无论使用哪种方法,最终在docker内的模型目录及结构如下图所示。** + +

+ +

+ +#### 1.4 启动服务端(在docker内) +```bash +fastdeployserver --model-repository=/models --model-control-mode=explicit --load-model=streaming_tts_serving +``` + +参数: + - `model-repository`(required): 整套模型streaming_tts_serving存放的路径. + - `model-control-mode`(required): 模型加载的方式,现阶段, 使用'explicit'即可. + - `load-model`(required): 需要加载的模型的名称. + - `http-port`(optional): HTTP服务的端口号. 默认: `8000`. 本示例中未使用该端口. + - `grpc-port`(optional): GRPC服务的端口号. 默认: `8001`. + - `metrics-port`(optional): 服务端指标的端口号. 默认: `8002`. 本示例中未使用该端口. + +### 2. 客户端 +#### 2.1 安装 +```bash +pip3 install tritonclient[all] +``` + +#### 2.2 发送请求 +```bash +python3 /models/streaming_tts_serving/stream_client.py +``` diff --git a/demos/streaming_tts_serving_fastdeploy/streaming_tts_serving/1/model.py b/demos/streaming_tts_serving_fastdeploy/streaming_tts_serving/1/model.py new file mode 100644 index 000000000..46473fdb2 --- /dev/null +++ b/demos/streaming_tts_serving_fastdeploy/streaming_tts_serving/1/model.py @@ -0,0 +1,289 @@ +import codecs +import json +import math +import sys +import threading +import time + +import numpy as np +import onnxruntime as ort +import triton_python_backend_utils as pb_utils + +from paddlespeech.server.utils.util import denorm +from paddlespeech.server.utils.util import get_chunks +from paddlespeech.t2s.frontend.zh_frontend import Frontend + +voc_block = 36 +voc_pad = 14 +am_block = 72 +am_pad = 12 +voc_upsample = 300 + +# 模型路径 +dir_name = "/models/streaming_tts_serving/1/" +phones_dict = dir_name + "fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0/phone_id_map.txt" +am_stat_path = dir_name + "fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0/speech_stats.npy" + +onnx_am_encoder = dir_name + "fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0/fastspeech2_csmsc_am_encoder_infer.onnx" +onnx_am_decoder = dir_name + "fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0/fastspeech2_csmsc_am_decoder.onnx" +onnx_am_postnet = dir_name + "fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0/fastspeech2_csmsc_am_postnet.onnx" +onnx_voc_melgan = dir_name + "mb_melgan_csmsc_onnx_0.2.0/mb_melgan_csmsc.onnx" + +frontend = Frontend(phone_vocab_path=phones_dict, tone_vocab_path=None) +am_mu, am_std = np.load(am_stat_path) + +# 用CPU推理 +providers = ['CPUExecutionProvider'] + +# 配置ort session +sess_options = ort.SessionOptions() + +# 创建session +am_encoder_infer_sess = ort.InferenceSession( + onnx_am_encoder, providers=providers, sess_options=sess_options) +am_decoder_sess = ort.InferenceSession( + onnx_am_decoder, providers=providers, sess_options=sess_options) +am_postnet_sess = ort.InferenceSession( + onnx_am_postnet, providers=providers, sess_options=sess_options) +voc_melgan_sess = ort.InferenceSession( + onnx_voc_melgan, providers=providers, sess_options=sess_options) + + +def depadding(data, chunk_num, chunk_id, block, pad, upsample): + """ + Streaming inference removes the result of pad inference + """ + front_pad = min(chunk_id * block, pad) + # first chunk + if chunk_id == 0: + data = data[:block * upsample] + # last chunk + elif chunk_id == chunk_num - 1: + data = data[front_pad * upsample:] + # middle chunk + else: + data = data[front_pad * upsample:(front_pad + block) * upsample] + + return data + + +class TritonPythonModel: + """Your Python model must use the same class name. Every Python model + that is created must have "TritonPythonModel" as the class name. + """ + + def initialize(self, args): + """`initialize` is called only once when the model is being loaded. + Implementing `initialize` function is optional. 
This function allows + the model to intialize any state associated with this model. + Parameters + ---------- + args : dict + Both keys and values are strings. The dictionary keys and values are: + * model_config: A JSON string containing the model configuration + * model_instance_kind: A string containing model instance kind + * model_instance_device_id: A string containing model instance device ID + * model_repository: Model repository path + * model_version: Model version + * model_name: Model name + """ + sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach()) + print(sys.getdefaultencoding()) + # You must parse model_config. JSON string is not parsed here + self.model_config = model_config = json.loads(args['model_config']) + print("model_config:", self.model_config) + + using_decoupled = pb_utils.using_decoupled_model_transaction_policy( + model_config) + + if not using_decoupled: + raise pb_utils.TritonModelException( + """the model `{}` can generate any number of responses per request, + enable decoupled transaction policy in model configuration to + serve this model""".format(args['model_name'])) + + self.input_names = [] + for input_config in self.model_config["input"]: + self.input_names.append(input_config["name"]) + print("input:", self.input_names) + + self.output_names = [] + self.output_dtype = [] + for output_config in self.model_config["output"]: + self.output_names.append(output_config["name"]) + dtype = pb_utils.triton_string_to_numpy(output_config["data_type"]) + self.output_dtype.append(dtype) + print("output:", self.output_names) + + # To keep track of response threads so that we can delay + # the finalizing the model until all response threads + # have completed. + self.inflight_thread_count = 0 + self.inflight_thread_count_lck = threading.Lock() + + def execute(self, requests): + """`execute` must be implemented in every Python model. `execute` + function receives a list of pb_utils.InferenceRequest as the only + argument. This function is called when an inference is requested + for this model. Depending on the batching configuration (e.g. Dynamic + Batching) used, `requests` may contain multiple requests. Every + Python model, must create one pb_utils.InferenceResponse for every + pb_utils.InferenceRequest in `requests`. If there is an error, you can + set the error argument when creating a pb_utils.InferenceResponse. + Parameters + ---------- + requests : list + A list of pb_utils.InferenceRequest + Returns + ------- + list + A list of pb_utils.InferenceResponse. The length of this list must + be the same as `requests` + """ + + # This model does not support batching, so 'request_count' should always + # be 1. + if len(requests) != 1: + raise pb_utils.TritonModelException("unsupported batch size " + len( + requests)) + + input_data = [] + for idx in range(len(self.input_names)): + data = pb_utils.get_input_tensor_by_name(requests[0], + self.input_names[idx]) + data = data.as_numpy() + data = data[0].decode('utf-8') + input_data.append(data) + text = input_data[0] + + # Start a separate thread to send the responses for the request. The + # sending back the responses is delegated to this thread. + thread = threading.Thread( + target=self.response_thread, + args=(requests[0].get_response_sender(), text)) + thread.daemon = True + with self.inflight_thread_count_lck: + self.inflight_thread_count += 1 + + thread.start() + # Unlike in non-decoupled model transaction policy, execute function + # here returns no response. 
A return from this function only notifies + # Triton that the model instance is ready to receive another request. As + # we are not waiting for the response thread to complete here, it is + # possible that at any give time the model may be processing multiple + # requests. Depending upon the request workload, this may lead to a lot + # of requests being processed by a single model instance at a time. In + # real-world models, the developer should be mindful of when to return + # from execute and be willing to accept next request. + return None + + def response_thread(self, response_sender, text): + input_ids = frontend.get_input_ids( + text, merge_sentences=False, get_tone_ids=False) + phone_ids = input_ids["phone_ids"] + for i in range(len(phone_ids)): + part_phone_ids = phone_ids[i].numpy() + voc_chunk_id = 0 + + orig_hs = am_encoder_infer_sess.run( + None, input_feed={'text': part_phone_ids}) + orig_hs = orig_hs[0] + + # streaming voc chunk info + mel_len = orig_hs.shape[1] + voc_chunk_num = math.ceil(mel_len / voc_block) + start = 0 + end = min(voc_block + voc_pad, mel_len) + + # streaming am + hss = get_chunks(orig_hs, am_block, am_pad, "am") + am_chunk_num = len(hss) + for i, hs in enumerate(hss): + am_decoder_output = am_decoder_sess.run( + None, input_feed={'xs': hs}) + am_postnet_output = am_postnet_sess.run( + None, + input_feed={ + 'xs': np.transpose(am_decoder_output[0], (0, 2, 1)) + }) + am_output_data = am_decoder_output + np.transpose( + am_postnet_output[0], (0, 2, 1)) + normalized_mel = am_output_data[0][0] + + sub_mel = denorm(normalized_mel, am_mu, am_std) + sub_mel = depadding(sub_mel, am_chunk_num, i, am_block, am_pad, + 1) + + if i == 0: + mel_streaming = sub_mel + else: + mel_streaming = np.concatenate( + (mel_streaming, sub_mel), axis=0) + + # streaming voc + # 当流式AM推理的mel帧数大于流式voc推理的chunk size,开始进行流式voc 推理 + while (mel_streaming.shape[0] >= end and + voc_chunk_id < voc_chunk_num): + voc_chunk = mel_streaming[start:end, :] + + sub_wav = voc_melgan_sess.run( + output_names=None, input_feed={'logmel': voc_chunk}) + sub_wav = depadding(sub_wav[0], voc_chunk_num, voc_chunk_id, + voc_block, voc_pad, voc_upsample) + + output_np = np.array(sub_wav, dtype=self.output_dtype[0]) + out_tensor1 = pb_utils.Tensor(self.output_names[0], + output_np) + + status = 0 if voc_chunk_id != (voc_chunk_num - 1) else 1 + output_status = np.array( + [status], dtype=self.output_dtype[1]) + out_tensor2 = pb_utils.Tensor(self.output_names[1], + output_status) + + inference_response = pb_utils.InferenceResponse( + output_tensors=[out_tensor1, out_tensor2]) + + #yield sub_wav + response_sender.send(inference_response) + + voc_chunk_id += 1 + start = max(0, voc_chunk_id * voc_block - voc_pad) + end = min((voc_chunk_id + 1) * voc_block + voc_pad, mel_len) + + # We must close the response sender to indicate to Triton that we are + # done sending responses for the corresponding request. We can't use the + # response sender after closing it. The response sender is closed by + # setting the TRITONSERVER_RESPONSE_COMPLETE_FINAL. + response_sender.send( + flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL) + + with self.inflight_thread_count_lck: + self.inflight_thread_count -= 1 + + def finalize(self): + """`finalize` is called only once when the model is being unloaded. + Implementing `finalize` function is OPTIONAL. This function allows + the model to perform any necessary clean ups before exit. + Here we will wait for all response threads to complete sending + responses. 
+ """ + print('Finalize invoked') + + inflight_threads = True + cycles = 0 + logging_time_sec = 5 + sleep_time_sec = 0.1 + cycle_to_log = (logging_time_sec / sleep_time_sec) + while inflight_threads: + with self.inflight_thread_count_lck: + inflight_threads = (self.inflight_thread_count != 0) + if (cycles % cycle_to_log == 0): + print( + f"Waiting for {self.inflight_thread_count} response threads to complete..." + ) + if inflight_threads: + time.sleep(sleep_time_sec) + cycles += 1 + + print('Finalize complete...') diff --git a/demos/streaming_tts_serving_fastdeploy/streaming_tts_serving/config.pbtxt b/demos/streaming_tts_serving_fastdeploy/streaming_tts_serving/config.pbtxt new file mode 100644 index 000000000..e63721d1c --- /dev/null +++ b/demos/streaming_tts_serving_fastdeploy/streaming_tts_serving/config.pbtxt @@ -0,0 +1,33 @@ +name: "streaming_tts_serving" +backend: "python" +max_batch_size: 0 +model_transaction_policy { + decoupled: True +} +input [ + { + name: "INPUT_0" + data_type: TYPE_STRING + dims: [ 1 ] + } +] + +output [ + { + name: "OUTPUT_0" + data_type: TYPE_FP32 + dims: [ -1, 1 ] + }, + { + name: "status" + data_type: TYPE_BOOL + dims: [ 1 ] + } +] + +instance_group [ + { + count: 1 + kind: KIND_CPU + } +] diff --git a/demos/streaming_tts_serving_fastdeploy/streaming_tts_serving/stream_client.py b/demos/streaming_tts_serving_fastdeploy/streaming_tts_serving/stream_client.py new file mode 100644 index 000000000..e7f120b7d --- /dev/null +++ b/demos/streaming_tts_serving_fastdeploy/streaming_tts_serving/stream_client.py @@ -0,0 +1,117 @@ +#!/usr/bin/env python +import argparse +import queue +import sys +from functools import partial + +import numpy as np +import tritonclient.grpc as grpcclient +from tritonclient.utils import * + +FLAGS = None + + +class UserData: + def __init__(self): + self._completed_requests = queue.Queue() + + +# Define the callback function. Note the last two parameters should be +# result and error. InferenceServerClient would povide the results of an +# inference as grpcclient.InferResult in result. For successful +# inference, error will be None, otherwise it will be an object of +# tritonclientutils.InferenceServerException holding the error details +def callback(user_data, result, error): + if error: + user_data._completed_requests.put(error) + else: + user_data._completed_requests.put(result) + + +def async_stream_send(triton_client, values, request_id, model_name): + + infer_inputs = [] + outputs = [] + for idx, data in enumerate(values): + data = np.array([data.encode('utf-8')], dtype=np.object_) + infer_input = grpcclient.InferInput('INPUT_0', [len(data)], "BYTES") + infer_input.set_data_from_numpy(data) + infer_inputs.append(infer_input) + + outputs.append(grpcclient.InferRequestedOutput('OUTPUT_0')) + # Issue the asynchronous sequence inference. + triton_client.async_stream_infer( + model_name=model_name, + inputs=infer_inputs, + outputs=outputs, + request_id=request_id) + + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + parser.add_argument( + '-v', + '--verbose', + action="store_true", + required=False, + default=False, + help='Enable verbose output') + parser.add_argument( + '-u', + '--url', + type=str, + required=False, + default='localhost:8001', + help='Inference server URL and it gRPC port. Default is localhost:8001.') + + FLAGS = parser.parse_args() + + # We use custom "sequence" models which take 1 input + # value. The output is the accumulated value of the inputs. See + # src/custom/sequence. 
+ model_name = "streaming_tts_serving" + + values = ["哈哈哈哈"] + + request_id = "0" + + string_result0_list = [] + + user_data = UserData() + + # It is advisable to use client object within with..as clause + # when sending streaming requests. This ensures the client + # is closed when the block inside with exits. + with grpcclient.InferenceServerClient( + url=FLAGS.url, verbose=FLAGS.verbose) as triton_client: + try: + # Establish stream + triton_client.start_stream(callback=partial(callback, user_data)) + # Now send the inference sequences... + async_stream_send(triton_client, values, request_id, model_name) + except InferenceServerException as error: + print(error) + sys.exit(1) + + # Retrieve results... + recv_count = 0 + result_dict = {} + status = True + while True: + data_item = user_data._completed_requests.get() + if type(data_item) == InferenceServerException: + raise data_item + else: + this_id = data_item.get_response().id + if this_id not in result_dict.keys(): + result_dict[this_id] = [] + result_dict[this_id].append((recv_count, data_item)) + sub_wav = data_item.as_numpy('OUTPUT_0') + status = data_item.as_numpy('status') + print('sub_wav = ', sub_wav, "subwav.shape = ", sub_wav.shape) + print('status = ', status) + if status[0] == 1: + break + recv_count += 1 + + print("PASS: stream_client") diff --git a/demos/streaming_tts_serving_fastdeploy/tree.png b/demos/streaming_tts_serving_fastdeploy/tree.png new file mode 100644 index 000000000..b8d61686a Binary files /dev/null and b/demos/streaming_tts_serving_fastdeploy/tree.png differ diff --git a/docs/source/released_model.md b/docs/source/released_model.md index 4e76da033..2f3c9d098 100644 --- a/docs/source/released_model.md +++ b/docs/source/released_model.md @@ -9,7 +9,7 @@ Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | [Ds2 Online Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_fbank161_ckpt_0.2.1.model.tar.gz) | Aishell Dataset | Char-based | 491 MB | 2 Conv + 5 LSTM layers | 0.0666 |-| 151 h | [D2 Online Aishell ASR0](../../examples/aishell/asr0) | onnx/inference/python | [Ds2 Offline Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_offline_aishell_ckpt_1.0.1.model.tar.gz)| Aishell Dataset | Char-based | 1.4 GB | 2 Conv + 5 bidirectional LSTM layers| 0.0554 |-| 151 h | [Ds2 Offline Aishell ASR0](../../examples/aishell/asr0) | inference/python | [Conformer Online Wenetspeech ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_wenetspeech_ckpt_1.0.0a.model.tar.gz) | WenetSpeech Dataset | Char-based | 457 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring| 0.11 (test\_net) 0.1879 (test\_meeting) |-| 10000 h |- | python | -[Conformer U2PP Online Wenetspeech ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.1.4.model.tar.gz) | WenetSpeech Dataset | Char-based | 476 MB | Encoder:Conformer, Decoder:BiTransformer, Decoding method: Attention rescoring| 0.047198 (aishell test\_-1) 0.059212 (aishell test\_16) |-| 10000 h |- | python | +[Conformer U2PP Online Wenetspeech ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.3.0.model.tar.gz) | WenetSpeech Dataset | Char-based | 476 MB | Encoder:Conformer, Decoder:BiTransformer, Decoding method: Attention rescoring| 0.047198 (aishell test\_-1) 0.059212 (aishell test\_16) 
|-| 10000 h |- | python | [Conformer Online Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_chunk_conformer_aishell_ckpt_0.2.0.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring| 0.0544 |-| 151 h | [Conformer Online Aishell ASR1](../../examples/aishell/asr1) | python | [Conformer Offline Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_conformer_aishell_ckpt_1.0.1.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0460 |-| 151 h | [Conformer Offline Aishell ASR1](../../examples/aishell/asr1) | python | [Transformer Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz) | Aishell Dataset | Char-based | 128 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0523 || 151 h | [Transformer Aishell ASR1](../../examples/aishell/asr1) | python | diff --git a/paddlespeech/cli/asr/infer.py b/paddlespeech/cli/asr/infer.py index 437f64631..004143361 100644 --- a/paddlespeech/cli/asr/infer.py +++ b/paddlespeech/cli/asr/infer.py @@ -52,7 +52,7 @@ class ASRExecutor(BaseExecutor): self.parser.add_argument( '--model', type=str, - default='conformer_u2pp_wenetspeech', + default='conformer_u2pp_online_wenetspeech', choices=[ tag[:tag.index('-')] for tag in self.task_resource.pretrained_models.keys() @@ -470,7 +470,7 @@ class ASRExecutor(BaseExecutor): @stats_wrapper def __call__(self, audio_file: os.PathLike, - model: str='conformer_u2pp_wenetspeech', + model: str='conformer_u2pp_online_wenetspeech', lang: str='zh', sample_rate: int=16000, config: os.PathLike=None, diff --git a/paddlespeech/resource/model_alias.py b/paddlespeech/resource/model_alias.py index f5ec655b7..8e9ecc4ba 100644 --- a/paddlespeech/resource/model_alias.py +++ b/paddlespeech/resource/model_alias.py @@ -25,7 +25,6 @@ model_alias = { "deepspeech2online": ["paddlespeech.s2t.models.ds2:DeepSpeech2Model"], "conformer": ["paddlespeech.s2t.models.u2:U2Model"], "conformer_online": ["paddlespeech.s2t.models.u2:U2Model"], - "conformer_u2pp": ["paddlespeech.s2t.models.u2:U2Model"], "conformer_u2pp_online": ["paddlespeech.s2t.models.u2:U2Model"], "transformer": ["paddlespeech.s2t.models.u2:U2Model"], "wenetspeech": ["paddlespeech.s2t.models.u2:U2Model"], diff --git a/paddlespeech/resource/pretrained_models.py b/paddlespeech/resource/pretrained_models.py index efd6bb3f2..df50a6a9d 100644 --- a/paddlespeech/resource/pretrained_models.py +++ b/paddlespeech/resource/pretrained_models.py @@ -68,32 +68,12 @@ asr_dynamic_pretrained_models = { '', }, }, - "conformer_u2pp_wenetspeech-zh-16k": { - '1.1': { - 'url': - 'https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.1.3.model.tar.gz', - 'md5': - '662b347e1d2131b7a4dc5398365e2134', - 'cfg_path': - 'model.yaml', - 'ckpt_path': - 'exp/chunk_conformer_u2pp/checkpoints/avg_10', - 'model': - 'exp/chunk_conformer_u2pp/checkpoints/avg_10.pdparams', - 'params': - 'exp/chunk_conformer_u2pp/checkpoints/avg_10.pdparams', - 'lm_url': - '', - 'lm_md5': - '', - }, - }, "conformer_u2pp_online_wenetspeech-zh-16k": { - '1.1': { + '1.3': { 'url': - 'https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.1.4.model.tar.gz', + 
'https://paddlespeech.bj.bcebos.com/s2t/wenetspeech/asr1/asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.3.0.model.tar.gz', 'md5': - '3100fc1eac5779486cab859366992d0b', + '62d230c1bf27731192aa9d3b8deca300', 'cfg_path': 'model.yaml', 'ckpt_path': diff --git a/paddlespeech/s2t/modules/attention.py b/paddlespeech/s2t/modules/attention.py index 128f87c07..d9568dcc9 100644 --- a/paddlespeech/s2t/modules/attention.py +++ b/paddlespeech/s2t/modules/attention.py @@ -19,7 +19,6 @@ from typing import Tuple import paddle from paddle import nn -from paddle.nn import functional as F from paddle.nn import initializer as I from paddlespeech.s2t.modules.align import Linear @@ -56,16 +55,6 @@ class MultiHeadedAttention(nn.Layer): self.linear_out = Linear(n_feat, n_feat) self.dropout = nn.Dropout(p=dropout_rate) - def _build_once(self, *args, **kwargs): - super()._build_once(*args, **kwargs) - # if self.self_att: - # self.linear_kv = Linear(self.n_feat, self.n_feat*2) - if not self.training: - self.weight = paddle.concat( - [self.linear_k.weight, self.linear_v.weight], axis=-1) - self.bias = paddle.concat([self.linear_k.bias, self.linear_v.bias]) - self._built = True - def forward_qkv(self, query: paddle.Tensor, key: paddle.Tensor, @@ -87,13 +76,8 @@ class MultiHeadedAttention(nn.Layer): n_batch = query.shape[0] q = self.linear_q(query).view(n_batch, -1, self.h, self.d_k) - if self.training: - k = self.linear_k(key).view(n_batch, -1, self.h, self.d_k) - v = self.linear_v(value).view(n_batch, -1, self.h, self.d_k) - else: - k, v = F.linear(key, self.weight, self.bias).view( - n_batch, -1, 2 * self.h, self.d_k).split( - 2, axis=2) + k = self.linear_k(key).view(n_batch, -1, self.h, self.d_k) + v = self.linear_v(value).view(n_batch, -1, self.h, self.d_k) q = q.transpose([0, 2, 1, 3]) # (batch, head, time1, d_k) k = k.transpose([0, 2, 1, 3]) # (batch, head, time2, d_k) diff --git a/paddlespeech/server/bin/paddlespeech_server.py b/paddlespeech/server/bin/paddlespeech_server.py index 10a91d9be..1b1792bd1 100644 --- a/paddlespeech/server/bin/paddlespeech_server.py +++ b/paddlespeech/server/bin/paddlespeech_server.py @@ -113,7 +113,7 @@ class ServerExecutor(BaseExecutor): """ config = get_config(config_file) if self.init(config): - uvicorn.run(app, host=config.host, port=config.port, debug=True) + uvicorn.run(app, host=config.host, port=config.port) @cli_server_register( diff --git a/paddlespeech/t2s/frontend/polyphonic.yaml b/paddlespeech/t2s/frontend/polyphonic.yaml index 51b76f23f..6885035e7 100644 --- a/paddlespeech/t2s/frontend/polyphonic.yaml +++ b/paddlespeech/t2s/frontend/polyphonic.yaml @@ -46,4 +46,5 @@ polyphonic: 幸免于难: ['xing4','mian3','yu2','nan4'] 恶行: ['e4','xing2'] 唉: ['ai4'] - + 扎实: ['zha1','shi2'] + 干将: ['gan4','jiang4'] \ No newline at end of file diff --git a/paddlespeech/t2s/frontend/tone_sandhi.py b/paddlespeech/t2s/frontend/tone_sandhi.py index 10a9540c3..42f7b8b2f 100644 --- a/paddlespeech/t2s/frontend/tone_sandhi.py +++ b/paddlespeech/t2s/frontend/tone_sandhi.py @@ -65,7 +65,7 @@ class ToneSandhi(): '男子', '女子', '分子', '原子', '量子', '莲子', '石子', '瓜子', '电子', '人人', '虎虎', '幺幺', '干嘛', '学子', '哈哈', '数数', '袅袅', '局地', '以下', '娃哈哈', '花花草草', '留得', '耕地', '想想', '熙熙', '攘攘', '卵子', '死死', '冉冉', '恳恳', '佼佼', '吵吵', '打打', - '考考', '整整', '莘莘' + '考考', '整整', '莘莘', '落地', '算子', '家家户户' } self.punc = ":,;。?!“”‘’':,;.?!" 
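The CLI changes above switch the default streaming ASR model to `conformer_u2pp_online_wenetspeech` and point it at the 1.3.0 checkpoint. A minimal usage sketch from Python (assuming a local 16 kHz Mandarin recording named `zh.wav`; the keyword arguments mirror `ASRExecutor.__call__` as shown in the diff):

```python
from paddlespeech.cli.asr.infer import ASRExecutor

asr = ASRExecutor()
text = asr(
    audio_file="zh.wav",                        # assumed local 16 kHz Mandarin recording
    model="conformer_u2pp_online_wenetspeech",  # new default set in infer.py above
    lang="zh",
    sample_rate=16000)
print(text)
```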
diff --git a/paddlespeech/text/exps/ernie_linear/punc_restore.py b/paddlespeech/text/exps/ernie_linear/punc_restore.py index 2cb4d0719..98804606c 100644 --- a/paddlespeech/text/exps/ernie_linear/punc_restore.py +++ b/paddlespeech/text/exps/ernie_linear/punc_restore.py @@ -25,8 +25,6 @@ DefinedClassifier = { 'ErnieLinear': ErnieLinear, } -tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0') - def _clean_text(text, punc_list): text = text.lower() @@ -35,7 +33,7 @@ def _clean_text(text, punc_list): return text -def preprocess(text, punc_list): +def preprocess(text, punc_list, tokenizer): clean_text = _clean_text(text, punc_list) assert len(clean_text) > 0, f'Invalid input string: {text}' tokenized_input = tokenizer( @@ -51,7 +49,8 @@ def test(args): with open(args.config) as f: config = CfgNode(yaml.safe_load(f)) print("========Args========") - print(yaml.safe_dump(vars(args))) + print(yaml.safe_dump(vars(args), allow_unicode=True)) + # print(args) print("========Config========") print(config) @@ -61,10 +60,16 @@ def test(args): punc_list.append(line.strip()) model = DefinedClassifier[config["model_type"]](**config["model"]) + # print(model) + + pretrained_token = config['data_params']['pretrained_token'] + tokenizer = ErnieTokenizer.from_pretrained(pretrained_token) + # tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0') + state_dict = paddle.load(args.checkpoint) model.set_state_dict(state_dict["main_params"]) model.eval() - _inputs = preprocess(args.text, punc_list) + _inputs = preprocess(args.text, punc_list, tokenizer) seq_len = _inputs['seq_len'] input_ids = paddle.to_tensor(_inputs['input_ids']).unsqueeze(0) seg_ids = paddle.to_tensor(_inputs['seg_ids']).unsqueeze(0)
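After this refactor, `preprocess()` in `punc_restore.py` takes the tokenizer as an explicit argument, and the tokenizer is built from `config['data_params']['pretrained_token']` rather than a hard-coded module-level `ErnieTokenizer`. A minimal sketch of the new call pattern (the punctuation list and the `ernie-1.0` token name below are assumed illustrative values, normally read from the punctuation vocab file and the experiment config):

```python
from paddlenlp.transformers import ErnieTokenizer

from paddlespeech.text.exps.ernie_linear.punc_restore import preprocess

# Assumed values for illustration; real runs load these from the config
# ('data_params.pretrained_token') and the punctuation vocab file.
punc_list = [",", "。", "?", "!"]
tokenizer = ErnieTokenizer.from_pretrained("ernie-1.0")

# Sample Mandarin sentence; preprocess returns the tokenized fields used by test().
inputs = preprocess("今天天气真不错", punc_list, tokenizer)
print(inputs["input_ids"], inputs["seg_ids"], inputs["seq_len"])
```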