Merge pull request #1906 from Honei/acs_server

[acs][server]add audio content search server
4 years ago · bde7093578
parent 4b3f6c615e 5793f1bc1a
commit bde7093578
20 changed files with 747 additions and 8 deletions
--- a/demos/audio_content_search/README.md
+++ b/demos/audio_content_search/README.md
@ -0,0 +1,74 @@
 ([简体中文](./README_cn.md)|English)
 # ACS (Audio Content Search)
 ## Introduction
 ACS, or Audio Content Search, refers to the problem of locating the timestamps of keywords in automatically transcribed spoken language (speech-to-text). 
 This demo obtains the timestamps of keywords in the transcript of a given audio file. It can be done with a single command or a few lines of Python using `PaddleSpeech`. 
 The search words in this demo are:
 ```
 我
 康
 ```
 ## Usage
 ### 1. Installation
 See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
 You can choose either the medium or the hard way to install paddlespeech.
 The dependencies are listed in requirements.txt.
 ### 2. Prepare Input File
 The input of this demo should be a WAV file (`.wav`), and its sample rate must match that of the model.
 A sample file for this demo can be downloaded:
 ```bash
 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 ```
 ### 3. Usage
 - Command Line(Recommended)
  ```bash
  # Chinese
  paddlespeech_client acs --server_ip 127.0.0.1 --port 8490 --input ./zh.wav
  ```
  Usage:
  ```bash
  paddlespeech_client acs --help
  ```
  Arguments:
  - `input`(required): Audio file to recognize.
  - `server_ip`: The server IP. Default: `127.0.0.1`.
  - `port`: The server port. Default: `8090`.
  - `lang`: The language type of the model. Default: `zh_cn`.
  - `sample_rate`: Sample rate of the model. Default: `16000`.
  - `audio_format`: The audio format. Default: `wav`.
  Output:
  ```bash
  [2022-05-15 15:00:58,185] [    INFO] - acs http client start
  [2022-05-15 15:00:58,185] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
  [2022-05-15 15:01:03,220] [    INFO] - acs http client finished
  [2022-05-15 15:01:03,221] [    INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
  [2022-05-15 15:01:03,221] [    INFO] - Response time 5.036084 s.
  ```
 - Python API
  ```python
  from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
  acs_executor = ACSClientExecutor()
  res = acs_executor(
      input='./zh.wav',
      server_ip="127.0.0.1",
      port=8490,)
  print(res)
  ```
  Output:
  ```bash
  [2022-05-15 15:08:13,955] [    INFO] - acs http client start
  [2022-05-15 15:08:13,956] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
  [2022-05-15 15:08:19,026] [    INFO] - acs http client finished
  {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
  ```
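
The `res` dictionary printed above can be post-processed directly. The following is a minimal sketch, assuming the result keeps the `transcription`/`acs` shape shown in the sample output; `format_hits` is a hypothetical helper and not part of PaddleSpeech.

```python
def format_hits(res: dict) -> list:
    """Turn an ACS result into one readable line per keyword hit."""
    lines = []
    for hit in res.get("acs", []):
        # 'w' is the keyword, 'bg'/'ed' are the window start/end in seconds
        lines.append(f"{hit['w']}: {hit['bg']:.2f}s - {hit['ed']:.2f}s")
    return lines


# Sample result copied from the output above
sample = {
    "transcription": "我认为跑步最重要的就是给我带来了身体健康",
    "acs": [
        {"w": "我", "bg": 0, "ed": 1.6800000000000002},
        {"w": "我", "bg": 2.1, "ed": 4.28},
        {"w": "康", "bg": 3.2, "ed": 4.92},
    ],
}
for line in format_hits(sample):
    print(line)
```

Each hit is a window around the keyword rather than its exact boundaries, because the server pads both ends by the configured `offset`.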
--- a/demos/audio_content_search/README_cn.md
+++ b/demos/audio_content_search/README_cn.md
@ -0,0 +1,74 @@
 (简体中文|[English](./README.md))
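 # Audio Content Search
 ## Introduction
 Audio Content Search is a technique that uses a computer program to obtain the timestamps of keywords in transcribed speech.
 This demo obtains the timestamps of keywords in the transcript of a given audio file. It can be done with a single `PaddleSpeech` command or a few lines of Python code.
 The search words in the current demo are: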
 ```
 我
 康
 ```
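 ## Usage
 ### 1. Installation
 See the [installation document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md).
 You can choose either the medium or the hard way to install.
 The dependencies are listed in requirements.txt.
 ### 2. Prepare Input
 The input of this demo should be a WAV file (`.wav`), and its sample rate must match that of the model.
 A sample audio file for this demo can be downloaded: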
 ```bash
 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 ```
 ### 3. Usage
 - Command Line (Recommended)
  ```bash
  # Chinese
  paddlespeech_client acs --server_ip 127.0.0.1 --port 8490 --input ./zh.wav
  ```
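  Usage:
  ```bash
  paddlespeech_client acs --help
  ```
  Arguments:
  - `input`(required): Audio file to recognize.
  - `server_ip`: The server IP.
  - `port`: The server port.
  - `lang`: The language of the model. Default: `zh_cn`.
  - `sample_rate`: The audio sample rate. Default: `16000`.
  - `audio_format`: The audio format.
  Output: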
  ```bash
  [2022-05-15 15:00:58,185] [    INFO] - acs http client start
  [2022-05-15 15:00:58,185] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
  [2022-05-15 15:01:03,220] [    INFO] - acs http client finished
  [2022-05-15 15:01:03,221] [    INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
  [2022-05-15 15:01:03,221] [    INFO] - Response time 5.036084 s.
  ```
 - Python API
  ```python
  from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
  acs_executor = ACSClientExecutor()
  res = acs_executor(
      input='./zh.wav',
      server_ip="127.0.0.1",
      port=8490,)
  print(res)
  ```
  Output:
  ```bash
  [2022-05-15 15:08:13,955] [    INFO] - acs http client start
  [2022-05-15 15:08:13,956] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
  [2022-05-15 15:08:19,026] [    INFO] - acs http client finished
  {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
  ```
--- a/demos/audio_content_search/acs_clinet.py
+++ b/demos/audio_content_search/acs_clinet.py
@ -0,0 +1,49 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import argparse
 from paddlespeech.cli.log import logger
 from paddlespeech.server.utils.audio_handler import ASRHttpHandler
 def main(args):
    logger.info("acs http client start")
    audio_format = "wav"
    sample_rate = 16000
    lang = "zh"
    handler = ASRHttpHandler(
        server_ip=args.server_ip, port=args.port, endpoint=args.endpoint)
    res = handler.run(args.wavfile, audio_format, sample_rate, lang)
    # res = res['result']
    logger.info(f"the final result: {res}")
 if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="audio content search client")
    parser.add_argument(
        '--server_ip', type=str, default='127.0.0.1', help='server ip')
    parser.add_argument('--port', type=int, default=8090, help='server port')
    parser.add_argument(
        "--wavfile",
        action="store",
        help="wav file path ",
        default="./16_audio.wav")
    parser.add_argument(
        '--endpoint',
        type=str,
        default='/paddlespeech/asr/search',
        help='server endpoint')
    args = parser.parse_args()
    main(args)
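
The client above delegates the request to `ASRHttpHandler.run`, whose body is not part of this diff. The server side decodes `request_body.audio` with base64, so the request presumably carries the audio as a base64 string. The sketch below builds such a body; only the `audio` field is confirmed by the server code in this PR, the other field names are assumed to mirror the handler arguments, and `build_acs_payload` is a hypothetical helper.

```python
import base64
import json


def build_acs_payload(wav_bytes: bytes,
                      audio_format: str = "wav",
                      sample_rate: int = 16000,
                      lang: str = "zh") -> str:
    """Build a JSON request body for POST /paddlespeech/asr/search.

    Only `audio` is confirmed by the server code; the remaining
    field names are assumptions for illustration.
    """
    payload = {
        "audio": base64.b64encode(wav_bytes).decode("utf8"),
        "audio_format": audio_format,
        "sample_rate": sample_rate,
        "lang": lang,
    }
    return json.dumps(payload)


# Placeholder bytes, not a real WAV file
body = build_acs_payload(b"RIFF....WAVE")
decoded = json.loads(body)
# The server-side base64 decoding recovers the original bytes
print(base64.b64decode(decoded["audio"]) == b"RIFF....WAVE")
```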
--- a/demos/audio_content_search/conf/acs_application.yaml
+++ b/demos/audio_content_search/conf/acs_application.yaml
@ -0,0 +1,34 @@
 #################################################################################
 #                             SERVER SETTING                                    #
 #################################################################################
 host: 0.0.0.0
 port: 8490
 # The task format in the engine_list is: <speech task>_<engine type>
 # task choices = ['acs_python']
 # protocol = ['http'] (only one can be selected).
 # http only supports the offline engine type.
 protocol: 'http'
 engine_list: ['acs_python']
 #################################################################################
 #                                ENGINE CONFIG                                  #
 #################################################################################
 ################################### ACS #########################################
 ################### acs task: engine_type: python ###############################
 acs_python:
    task: acs
    asr_protocol: 'websocket' # 'websocket'
    offset: 1.0 # second
    asr_server_ip: 127.0.0.1
    asr_server_port: 8390
    lang: 'zh'
    word_list: "./conf/words.txt"
    sample_rate: 16000
    device: 'cpu' # set 'gpu:id' or 'cpu'
--- a/demos/audio_content_search/conf/words.txt
+++ b/demos/audio_content_search/conf/words.txt
@ -0,0 +1,2 @@
 我
 康
--- a/demos/audio_content_search/conf/ws_conformer_application.yaml
+++ b/demos/audio_content_search/conf/ws_conformer_application.yaml
@ -0,0 +1,43 @@
 #################################################################################
 #                             SERVER SETTING                                    #
 #################################################################################
 host: 0.0.0.0
 port: 8390
 # The task format in the engine_list is: <speech task>_<engine type>
 # task choices = ['asr_online']
 # protocol = ['websocket'] (only one can be selected).
 # websocket only supports the online engine type.
 protocol: 'websocket'
 engine_list: ['asr_online']
 #################################################################################
 #                                ENGINE CONFIG                                  #
 #################################################################################
 ################################### ASR #########################################
 ################### speech task: asr; engine_type: online #######################
 asr_online:
    model_type: 'conformer_online_multicn'
    am_model: # the pdmodel file of am static model [optional]
    am_params:  # the pdiparams file of am static model [optional]
    lang: 'zh'
    sample_rate: 16000
    cfg_path: 
    decode_method: 'attention_rescoring' 
    force_yes: True
    device: 'cpu' # cpu or gpu:id
    am_predictor_conf:
        device:  # set 'gpu:id' or 'cpu'
        switch_ir_optim: True
        glog_info: False  # True -> print glog
        summary: True  # False -> do not show predictor config
    chunk_buffer_conf:
        window_n: 7     # frame
        shift_n: 4      # frame
        window_ms: 25   # ms
        shift_ms: 10    # ms
        sample_rate: 16000
        sample_width: 2
--- a/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml
+++ b/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml
@ -0,0 +1,46 @@
 # This is the parameter configuration file for PaddleSpeech Serving.
 #################################################################################
 #                             SERVER SETTING                                    #
 #################################################################################
 host: 0.0.0.0
 port: 8390
 # The task format in the engine_list is: <speech task>_<engine type>
 # task choices = ['asr_online']
 # protocol = ['websocket'] (only one can be selected).
 # websocket only supports the online engine type.
 protocol: 'websocket'
 engine_list: ['asr_online']
 #################################################################################
 #                                ENGINE CONFIG                                  #
 #################################################################################
 ################################### ASR #########################################
 ################### speech task: asr; engine_type: online #######################
 asr_online:
    model_type: 'conformer_online_wenetspeech'
    am_model: # the pdmodel file of am static model [optional]
    am_params:  # the pdiparams file of am static model [optional]
    lang: 'zh'
    sample_rate: 16000
    cfg_path: 
    decode_method: "attention_rescoring"
    force_yes: True
    device: 'cpu' # cpu or gpu:id
    am_predictor_conf:
        device:  # set 'gpu:id' or 'cpu'
        switch_ir_optim: True
        glog_info: False  # True -> print glog
        summary: True  # False -> do not show predictor config
    chunk_buffer_conf:
        window_n: 7     # frame
        shift_n: 4      # frame
        window_ms: 25   # ms
        shift_ms: 10    # ms
        sample_rate: 16000
        sample_width: 2
--- a/demos/audio_content_search/run.sh
+++ b/demos/audio_content_search/run.sh
@ -0,0 +1,7 @@
 export CUDA_VISIBLE_DEVICES=0,1,2,3
 # we need the streaming asr server
 nohup python3 streaming_asr_server.py --config_file conf/ws_conformer_application.yaml > streaming_asr.log  2>&1  &
 # start the acs server
 nohup paddlespeech_server start --config_file conf/acs_application.yaml > acs.log 2>&1 &
--- a/paddlespeech/server/bin/paddlespeech_client.py
+++ b/paddlespeech/server/bin/paddlespeech_client.py
@ -754,3 +754,88 @@ class VectorClientExecutor(BaseExecutor):
            logger.info(f"The vector score is: {res}")
        else:
            logger.error(f"Sorry, we have not support such task {task}")
@cli_client_register(
    name='paddlespeech_client.acs', description='visit acs service')
 class ACSClientExecutor(BaseExecutor):
    def __init__(self):
        super(ACSClientExecutor, self).__init__()
        self.parser = argparse.ArgumentParser(
            prog='paddlespeech_client.acs', add_help=True)
        self.parser.add_argument(
            '--server_ip', type=str, default='127.0.0.1', help='server ip')
        self.parser.add_argument(
            '--port', type=int, default=8090, help='server port')
        self.parser.add_argument(
            '--input',
            type=str,
            default=None,
            help='Audio file to be recognized',
            required=True)
        self.parser.add_argument(
            '--sample_rate', type=int, default=16000, help='audio sample rate')
        self.parser.add_argument(
            '--lang', type=str, default="zh_cn", help='language')
        self.parser.add_argument(
            '--audio_format', type=str, default="wav", help='audio format')
    def execute(self, argv: List[str]) -> bool:
        args = self.parser.parse_args(argv)
        input_ = args.input
        server_ip = args.server_ip
        port = args.port
        sample_rate = args.sample_rate
        lang = args.lang
        audio_format = args.audio_format
        try:
            time_start = time.time()
            res = self(
                input=input_,
                server_ip=server_ip,
                port=port,
                sample_rate=sample_rate,
                lang=lang,
                audio_format=audio_format, )
            time_end = time.time()
            logger.info(f"ACS result: {res}")
            logger.info("Response time %f s." % (time_end - time_start))
            return True
        except Exception as e:
            logger.error("Failed to run audio content search.")
            logger.error(e)
            return False
    @stats_wrapper
    def __call__(
            self,
            input: str,
            server_ip: str="127.0.0.1",
            port: int=8090,
            sample_rate: int=16000,
            lang: str="zh_cn",
            audio_format: str="wav", ):
        """Python API to call an executor.
        Args:
            input (str): The input audio file path
            server_ip (str, optional): The ACS server ip. Defaults to "127.0.0.1".
            port (int, optional): The ACS server port. Defaults to 8090.
            sample_rate (int, optional): The audio sample rate. Defaults to 16000.
            lang (str, optional): The audio language type. Defaults to "zh_cn".
            audio_format (str, optional): The audio format information. Defaults to "wav".
        Returns:
            dict: The ACS result, with the keys 'transcription' and 'acs'
        """
        # we use the acs server to get the key word time stamp in audio text content
        logger.info("acs http client start")
        from paddlespeech.server.utils.audio_handler import ASRHttpHandler
        handler = ASRHttpHandler(
            server_ip=server_ip, port=port, endpoint="/paddlespeech/asr/search")
        res = handler.run(input, audio_format, sample_rate, lang)
        res = res['result']
        logger.info("acs http client finished")
        return res
--- a/paddlespeech/server/bin/paddlespeech_server.py
+++ b/paddlespeech/server/bin/paddlespeech_server.py
@ -82,7 +82,7 @@ class ServerExecutor(BaseExecutor):
        else:
            raise Exception("unsupported protocol")
        app.include_router(api_router)
-
+        logger.info("start to init the engine")
        if not init_engine_pool(config):
            return False
--- a/paddlespeech/server/engine/acs/__init__.py
+++ b/paddlespeech/server/engine/acs/__init__.py
--- a/paddlespeech/server/engine/acs/python/__init__.py
+++ b/paddlespeech/server/engine/acs/python/__init__.py
--- a/paddlespeech/server/engine/acs/python/acs_engine.py
+++ b/paddlespeech/server/engine/acs/python/acs_engine.py
@ -0,0 +1,188 @@
 # Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import io
 import json
 import os
 import re
 import paddle
 import soundfile
 import websocket
 from paddlespeech.cli.log import logger
 from paddlespeech.server.engine.base_engine import BaseEngine
 class ACSEngine(BaseEngine):
    def __init__(self):
        """The ACSEngine Engine
        """
        super(ACSEngine, self).__init__()
        logger.info("Create the ACSEngine Instance")
        self.word_list = []
    def init(self, config: dict):
        """Init the ACSEngine Engine
        Args:
            config (dict): The server configuration
        Returns:
            bool: The engine instance flag
        """
        logger.info("Init the acs engine")
        try:
            self.config = config
            if self.config.device:
                self.device = self.config.device
            else:
                self.device = paddle.get_device()
            paddle.set_device(self.device)
            logger.info(f"ACS Engine set the device: {self.device}")
        except BaseException as e:
            logger.error(
                "Set device failed, please check if device is already used and the parameter 'device' in the yaml file"
            )
            # self.device may be unset if the failure happened before assignment
            logger.error(f"Initialize ACS server engine failed: {e}")
            return False
        self.read_search_words()
        # init the asr url
        self.url = "ws://" + self.config.asr_server_ip + ":" + str(
            self.config.asr_server_port) + "/paddlespeech/asr/streaming"
        logger.info("Init the acs engine successfully")
        return True
    def read_search_words(self):
        word_list = self.config.word_list
        if word_list is None:
            logger.error(
                "No word list file in config, please set the word list parameter"
            )
            return
        if not os.path.exists(word_list):
            logger.error("Please input correct word list file")
            return
        # read as utf-8 and skip blank lines, which would match everywhere
        with open(word_list, 'r', encoding='utf-8') as fp:
            self.word_list = [line.strip() for line in fp if line.strip()]
        logger.info(f"word list: {self.word_list}")
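
The word list is a plain text file with one search word per line. Since an empty pattern matches at every position of a string, blank lines are worth dropping before the words are used for searching. The sketch below shows one way to load such a file; `load_word_list` is a hypothetical standalone helper and the temporary file only mirrors `conf/words.txt` for illustration.

```python
import os
import tempfile


def load_word_list(path: str) -> list:
    """Read one search word per line, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as fp:
        return [line.strip() for line in fp if line.strip()]


# Write a throwaway words file that mirrors conf/words.txt plus a blank line
with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("我\n\n康\n")
words = load_word_list(tmp.name)
os.remove(tmp.name)
print(words)
```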
    def get_asr_content(self, audio_data):
        """Get the streaming asr result
        Args:
            audio_data (io.BytesIO): the in-memory audio file, readable by soundfile
        Returns:
            dict: the final message returned by the streaming asr server
        """
        logger.info("send a message to the server")
        if self.url is None:
            logger.error("No asr server, please input valid ip and port")
            return ""
        ws = websocket.WebSocket()
        ws.connect(self.url)
        # with websocket.WebSocket.connect(self.url) as ws:
        audio_info = json.dumps(
            {
                "name": "test.wav",
                "signal": "start",
                "nbest": 1
            },
            sort_keys=True,
            indent=4,
            separators=(',', ': '))
        ws.send(audio_info)
        msg = ws.recv()
        logger.info("client receive msg={}".format(msg))
        # send the total audio data
        samples, sample_rate = soundfile.read(audio_data, dtype='int16')
        ws.send_binary(samples.tobytes())
        msg = ws.recv()
        msg = json.loads(msg)
        logger.info(f"audio result: {msg}")
        # send the end signal to get the final result
        logger.info("send the end signal")
        audio_info = json.dumps(
            {
                "name": "test.wav",
                "signal": "end",
                "nbest": 1
            },
            sort_keys=True,
            indent=4,
            separators=(',', ': '))
        ws.send(audio_info)
        msg = ws.recv()
        msg = json.loads(msg)
        logger.info(f"the final result: {msg}")
        ws.close()
        return msg
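
The exchange above consists of three frames: a JSON `start` signal, the whole audio as one binary frame, and a JSON `end` signal, each followed by a reply from the server. The sketch below only builds the ordered frames without opening a connection; `build_control_message` and `build_session` are hypothetical names, and the `("text" | "binary", payload)` tuples are an illustrative convention rather than anything the websocket library defines.

```python
import json


def build_control_message(signal: str, name: str = "test.wav",
                          nbest: int = 1) -> str:
    """Serialize a start/end control message like the ones sent above."""
    if signal not in ("start", "end"):
        raise ValueError(f"unexpected signal: {signal}")
    return json.dumps(
        {"name": name, "signal": signal, "nbest": nbest}, sort_keys=True)


def build_session(pcm_bytes: bytes) -> list:
    """Return the ordered frames of one session: start, audio, end."""
    return [
        ("text", build_control_message("start")),
        ("binary", pcm_bytes),
        ("text", build_control_message("end")),
    ]


# Four dummy int16 samples stand in for real audio
frames = build_session(b"\x00\x01" * 4)
print([kind for kind, _ in frames])
```

Keeping the frame construction separate from the socket calls makes the message order easy to check without a running ASR server.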
    def get_macthed_word(self, msg):
        """Get the matched info in msg
        Args:
            msg (dict): the asr info, including the asr result and time stamp
        Returns:
            acs_result, asr_result: the acs result and the asr result
        """
        asr_result = msg['result']
        time_stamp = msg['times']
        acs_result = []
        # search for each word in self.word_list
        offset = self.config.offset
        max_ed = time_stamp[-1]['ed']
        for w in self.word_list:
            # search the w in asr_result and the index in asr_result
            # escape w so that it is matched literally, not as a regex
            for m in re.finditer(re.escape(w), asr_result):
                start = max(time_stamp[m.start(0)]['bg'] - offset, 0)
                end = min(time_stamp[m.end(0) - 1]['ed'] + offset, max_ed)
                logger.info(f'start: {start}, end: {end}')
                acs_result.append({'w': w, 'bg': start, 'ed': end})
        return acs_result, asr_result
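
The matching step relies on the character index of each match doubling as an index into the timestamp list, and then pads the window by `offset` seconds while clamping it to the audio bounds. The sketch below replays that logic on made-up data; it assumes one timestamp entry per transcript character with `bg`/`ed` keys (the `w` key in the fake entries is an assumption), and `match_words` is a hypothetical standalone function.

```python
import re


def match_words(asr_result, time_stamp, word_list, offset=1.0):
    """Map each keyword occurrence to a padded [bg, ed] window in seconds."""
    hits = []
    max_ed = time_stamp[-1]["ed"]
    for w in word_list:
        # re.escape treats the keyword as literal text, not as a regex
        for m in re.finditer(re.escape(w), asr_result):
            bg = max(time_stamp[m.start(0)]["bg"] - offset, 0)
            ed = min(time_stamp[m.end(0) - 1]["ed"] + offset, max_ed)
            hits.append({"w": w, "bg": bg, "ed": ed})
    return hits


# Made-up timestamps: one entry per character, 0.5 s each
text = "我很健康"
stamps = [{"w": c, "bg": i * 0.5, "ed": (i + 1) * 0.5}
          for i, c in enumerate(text)]
print(match_words(text, stamps, ["我", "康"]))
# [{'w': '我', 'bg': 0, 'ed': 1.5}, {'w': '康', 'bg': 0.5, 'ed': 2.0}]
```

The first window starts at 0 because the padded start would be negative, and the last window ends at 2.0 because the padded end would exceed the final timestamp.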
    def run(self, audio_data):
        """process the audio data in acs engine
           the engine does not store any per-request data, so all requests share the self.run api
        Args:
            audio_data (bytes): the raw bytes of the audio file
        Returns:
            acs_result, asr_result: the acs result and the asr result
        """
        logger.info("start to process the audio content search")
        msg = self.get_asr_content(io.BytesIO(audio_data))
        acs_result, asr_result = self.get_macthed_word(msg)
        logger.info(f'the asr result {asr_result}')
        logger.info(f'the acs result: {acs_result}')
        return acs_result, asr_result
--- a/paddlespeech/server/engine/engine_factory.py
+++ b/paddlespeech/server/engine/engine_factory.py
@ -52,5 +52,8 @@ class EngineFactory(object):
        elif engine_name.lower() == 'vector' and engine_type.lower() == 'python':
            from paddlespeech.server.engine.vector.python.vector_engine import VectorEngine
            return VectorEngine()
        elif engine_name.lower() == 'acs' and engine_type.lower() == 'python':
            from paddlespeech.server.engine.acs.python.acs_engine import ACSEngine
            return ACSEngine()
        else:
            return None
--- a/paddlespeech/server/engine/engine_pool.py
+++ b/paddlespeech/server/engine/engine_pool.py
@ -34,6 +34,7 @@ def init_engine_pool(config) -> bool:
        engine_type = engine_and_type.split("_")[1]
        ENGINE_POOL[engine] = EngineFactory.get_engine(
            engine_name=engine, engine_type=engine_type)
        if not ENGINE_POOL[engine].init(config=config[engine_and_type]):
            return False
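
Entries in `engine_list` follow the `<speech task>_<engine type>` convention, and the pool splits each entry on the underscore. The sketch below illustrates that split; taking the first part as the engine name is an assumption (only the `[1]` index is visible in this hunk), and `split_engine_entry` is a hypothetical helper.

```python
def split_engine_entry(engine_and_type: str):
    """Split an engine_list entry such as 'acs_python' into (name, type)."""
    parts = engine_and_type.split("_")
    return parts[0], parts[1]


for entry in ["acs_python", "asr_online"]:
    print(entry, "->", split_engine_entry(entry))
```

With this convention, `acs_python` selects the `acs` engine with the `python` type, which is the pair the factory checks for before returning an `ACSEngine`.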
--- a/paddlespeech/server/restful/acs_api.py
+++ b/paddlespeech/server/restful/acs_api.py
@ -0,0 +1,101 @@
 # Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import base64
 from typing import Union
 from fastapi import APIRouter
 from paddlespeech.cli.log import logger
 from paddlespeech.server.engine.engine_pool import get_engine_pool
 from paddlespeech.server.restful.request import ASRRequest
 from paddlespeech.server.restful.response import ACSResponse
 from paddlespeech.server.restful.response import ErrorResponse
 from paddlespeech.server.utils.errors import ErrorCode
 from paddlespeech.server.utils.errors import failed_response
 from paddlespeech.server.utils.exception import ServerBaseException
 router = APIRouter()
@router.get('/paddlespeech/asr/search/help')
 def help():
    """help
    Returns:
        json: the audio content search result
    """
    response = {
        "success": "True",
        "code": 200,
        "message": {
            "global": "success"
        },
        "result": {
            "description": "acs server",
            "input": "base64 string of wavfile",
            "output": {
                "asr_result": "你好",
                "acs_result": [{
                    'w': '你',
                    'bg': 0.0,
                    'ed': 1.2
                }]
            }
        }
    }
    return response
@router.post(
    "/paddlespeech/asr/search",
    response_model=Union[ACSResponse, ErrorResponse])
 def acs(request_body: ASRRequest):
    """acs api 
    Args:
        request_body (ASRRequest): the acs request, we reuse the http ASRRequest
    Returns:
        json: the acs result
    """
    try:
        # 1. get the audio data via base64 decoding
        audio_data = base64.b64decode(request_body.audio)
        # 2. get single engine from engine pool
        engine_pool = get_engine_pool()
        acs_engine = engine_pool['acs']
        # 3. the acs_engine stores no per-request data, so the shared instance can process every request
        acs_result, asr_result = acs_engine.run(audio_data)
        response = {
            "success": True,
            "code": 200,
            "message": {
                "description": "success"
            },
            "result": {
                "transcription": asr_result,
                "acs": acs_result
            }
        }
    except ServerBaseException as e:
        response = failed_response(e.error_code, e.msg)
    except BaseException as e:
        response = failed_response(ErrorCode.SERVER_UNKOWN_ERR)
        logger.error(e)
    return response
--- a/paddlespeech/server/restful/api.py
+++ b/paddlespeech/server/restful/api.py
@ -22,6 +22,7 @@ from paddlespeech.server.restful.cls_api import router as cls_router
 from paddlespeech.server.restful.text_api import router as text_router
 from paddlespeech.server.restful.tts_api import router as tts_router
 from paddlespeech.server.restful.vector_api import router as vec_router
 from paddlespeech.server.restful.acs_api import router as acs_router
 _router = APIRouter()
@ -45,6 +46,8 @@ def setup_router(api_list: List):
            _router.include_router(text_router)
        elif api_name.lower() == 'vector':
            _router.include_router(vec_router)
        elif api_name.lower() == 'acs':
            _router.include_router(acs_router)
        else:
            logger.error(
                f"PaddleSpeech has not support such service: {api_name}")
--- a/paddlespeech/server/restful/response.py
+++ b/paddlespeech/server/restful/response.py
@ -17,7 +17,7 @@ from pydantic import BaseModel
 __all__ = [
    'ASRResponse', 'TTSResponse', 'CLSResponse', 'TextResponse',
-    'VectorResponse', 'VectorScoreResponse'
+    'VectorResponse', 'VectorScoreResponse', 'ACSResponse'
 ]
@ -231,3 +231,32 @@ class ErrorResponse(BaseModel):
    success: bool
    code: int
    message: Message
 #****************************************************************************************/
 #************************************ ACS response **************************************/
 #****************************************************************************************/
 class AcsResult(BaseModel):
    transcription: str
    acs: list
 class ACSResponse(BaseModel):
    """
    response example
    {
        "success": true,
        "code": 200,
        "message": {
            "description": "success" 
        },
        "result": {
            "transcription": "你好，飞桨",
            "acs": [{"w": "你好", "bg": 0.0, "ed": 0.45}]
        }
    }
    """
    success: bool
    code: int
    message: Message
    result: AcsResult
--- a/paddlespeech/server/utils/audio_handler.py
+++ b/paddlespeech/server/utils/audio_handler.py
@ -205,7 +205,7 @@ class ASRWsAudioHandler:
 class ASRHttpHandler:
-    def __init__(self, server_ip=None, port=None):
+    def __init__(self, server_ip=None, port=None, endpoint="/paddlespeech/asr"):
        """The ASR client http request
        Args:
@ -219,7 +219,7 @@ class ASRHttpHandler:
            self.url = None
        else:
            self.url = 'http://' + self.server_ip + ":" + str(
-                self.port) + '/paddlespeech/asr'
+                self.port) + endpoint
        logger.info(f"endpoint: {self.url}")
    def run(self, input, audio_format, sample_rate, lang):
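
With the new `endpoint` parameter the same handler can target either the plain ASR route or the ACS search route. The sketch below reproduces the URL composition as a standalone function; `build_http_url` is a hypothetical name, and returning `None` when the ip or port is missing is an assumption about the condition that is cut off in this hunk.

```python
def build_http_url(server_ip, port, endpoint="/paddlespeech/asr"):
    """Compose the request URL the same way the handler above does."""
    if server_ip is None or port is None:
        # assumed guard: no URL without a server address
        return None
    return "http://" + server_ip + ":" + str(port) + endpoint


print(build_http_url("127.0.0.1", 8490, "/paddlespeech/asr/search"))
print(build_http_url("127.0.0.1", 8090))
```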
--- a/paddlespeech/server/ws/asr_api.py
+++ b/paddlespeech/server/ws/asr_api.py
@ -18,9 +18,9 @@ from fastapi import WebSocket
 from fastapi import WebSocketDisconnect
 from starlette.websockets import WebSocketState as WebSocketState
 from paddlespeech.cli.log import logger
 from paddlespeech.server.engine.asr.online.asr_engine import PaddleASRConnectionHanddler
 from paddlespeech.server.engine.engine_pool import get_engine_pool
 router = APIRouter()
@ -106,5 +106,5 @@ async def websocket_endpoint(websocket: WebSocket):
                # if the engine create the vad instance, this connection will have many period results 
                resp = {'result': asr_results}
                await websocket.send_json(resp)
-    except WebSocketDisconnect:
+    except WebSocketDisconnect as e:
-        pass
+        logger.error(e)