Merge pull request #1906 from Honei/acs_server

[acs][server]add audio content search server
4 years ago · bde7093578
parent 4b3f6c615e 5793f1bc1a
commit bde7093578
20 changed files with 747 additions and 8 deletions
--- a/demos/audio_content_search/README.md
+++ b/demos/audio_content_search/README.md
@ -0,0 +1,74 @@
+([简体中文](./README_cn.md)|English)
+# ACS (Audio Content Search)
+
+## Introduction
+ACS, or Audio Content Search, refers to the problem of obtaining keyword timestamps from automatically transcribed spoken language (speech-to-text).
+
+This demo obtains the timestamps of keywords in the text transcribed from a given audio file. It can be run with a single command or a few lines of Python using `PaddleSpeech`.
+The search words currently used in this demo are:
+```
+我
+康
+```
+## Usage
+### 1. Installation
+See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
+
+You can install PaddleSpeech in either the medium or the hard way.
+
+Additional dependencies are listed in `requirements.txt`.
+### 2. Prepare Input File
+The input of this demo should be a WAV file (`.wav`), and its sample rate must match that of the model.
+
+A sample file for this demo can be downloaded:
+```bash
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+```
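Because a mismatched sample rate is easy to miss, a quick check before sending the file may help. This is a minimal sketch using only the standard `wave` module; `check_wav` is a hypothetical helper, and the expected values (16 kHz, 16-bit, mono) are assumptions taken from the `sample_rate: 16000` and `sample_width: 2` settings in the demo configs.

```python
import wave


def check_wav(path: str, expected_rate: int = 16000) -> None:
    """Raise ValueError if a WAV file does not match the demo's audio settings.

    Assumptions (hypothetical helper): 16 kHz, 16-bit (2-byte) samples, mono.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate != expected_rate:
        raise ValueError(f"{path}: sample rate is {rate} Hz, expected {expected_rate} Hz")
    if width != 2:
        raise ValueError(f"{path}: sample width is {width} bytes, expected 2 (16-bit)")
    if channels != 1:
        raise ValueError(f"{path}: {channels} channels, expected mono")
```

Usage: `check_wav("./zh.wav")` returns silently if the file looks usable. Note that `wave` only reads uncompressed PCM WAV files and raises `wave.Error` for other encodings.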
+
+### 3. Usage
+- Command Line (Recommended)
+  ```bash
+  # Chinese
+  paddlespeech_client acs --server_ip 127.0.0.1 --port 8490 --input ./zh.wav
+  ```
+  
+  Usage:
+  ```bash
+  paddlespeech_client acs --help
+  ```
+  Arguments:
+  - `input` (required): Audio file to search.
+  - `server_ip`: The ACS server IP. Default: `127.0.0.1`.
+  - `port`: The ACS server port. Default: `8090`; the demo server in `conf/acs_application.yaml` listens on `8490`.
+  - `lang`: The language type of the model. Default: `zh_cn`.
+  - `sample_rate`: Sample rate of the audio. Default: `16000`.
+  - `audio_format`: The audio format. Default: `wav`.
+
+  Output:
+  ```bash
+  [2022-05-15 15:00:58,185] [    INFO] - acs http client start
+  [2022-05-15 15:00:58,185] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+  [2022-05-15 15:01:03,220] [    INFO] - acs http client finished
+  [2022-05-15 15:01:03,221] [    INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+  [2022-05-15 15:01:03,221] [    INFO] - Response time 5.036084 s.
+  ```
+
+- Python API
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
+
+  acs_executor = ACSClientExecutor()
+  res = acs_executor(
+      input='./zh.wav',
+      server_ip="127.0.0.1",
+      port=8490,)
+  print(res)
+  ```
+
+  Output:
+  ```bash
+  [2022-05-15 15:08:13,955] [    INFO] - acs http client start
+  [2022-05-15 15:08:13,956] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+  [2022-05-15 15:08:19,026] [    INFO] - acs http client finished
+  {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+  ```
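The `acs` field holds one entry per keyword occurrence, with `bg` and `ed` in seconds. A minimal post-processing sketch, assuming the result dict has exactly the shape printed above; `format_acs_result` is a hypothetical helper, not part of PaddleSpeech.

```python
from typing import Dict, List


def format_acs_result(res: Dict) -> List[str]:
    """Return one 'word: start - end' line per matched keyword, sorted by start time.

    Assumes `res` has a 'transcription' string plus an 'acs' list of
    {'w', 'bg', 'ed'} dicts with times in seconds (hypothetical helper).
    """
    hits = sorted(res["acs"], key=lambda hit: hit["bg"])
    return [f"{hit['w']}: {hit['bg']:.2f}s - {hit['ed']:.2f}s" for hit in hits]


# The result printed by the Python API example above
res = {'transcription': '我认为跑步最重要的就是给我带来了身体健康',
       'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002},
               {'w': '我', 'bg': 2.1, 'ed': 4.28},
               {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
for line in format_acs_result(res):
    print(line)
```

Formatting with `:.2f` also hides floating-point noise such as `1.6800000000000002`.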
--- a/demos/audio_content_search/README_cn.md
+++ b/demos/audio_content_search/README_cn.md
@ -0,0 +1,74 @@
+(Simplified Chinese|[English](./README.md))
+
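+# Audio Content Search
+## Introduction
+Audio content search is a technique that uses a computer program to obtain the timestamps of keywords in transcribed speech.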
+
+This demo obtains the timestamps of keywords in the text of a given audio file. It can be run with a single `PaddleSpeech` command or a few lines of Python.
+
+The search words in the current example are:
+```
+我
+康
+```
+## Usage
+### 1. Installation
+See the [installation guide](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md).
+
+You can install PaddleSpeech in either the medium or the hard way.
+Dependencies are listed in `requirements.txt`.
+
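+### 2. Prepare Input File
+The input of this demo should be a WAV file (`.wav`), and its sample rate must match that of the model.
+
+A sample audio file for this demo can be downloaded: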
+```bash
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+```
+### 3. Usage
+- Command Line (Recommended)
+  ```bash
+  # Chinese
+  paddlespeech_client acs --server_ip 127.0.0.1 --port 8490 --input ./zh.wav
+  ```
+  
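+  Usage:
+  ```bash
+  paddlespeech_client acs --help
+  ```
+  Arguments:
+  - `input` (required): Audio file to search.
+  - `server_ip`: The ACS server IP. Default: `127.0.0.1`.
+  - `port`: The ACS server port. Default: `8090`; the demo server in `conf/acs_application.yaml` listens on `8490`.
+  - `lang`: The language type of the model. Default: `zh_cn`.
+  - `sample_rate`: Sample rate of the audio. Default: `16000`.
+  - `audio_format`: The audio format. Default: `wav`.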
+
+  Output:
+  ```bash
+  [2022-05-15 15:00:58,185] [    INFO] - acs http client start
+  [2022-05-15 15:00:58,185] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+  [2022-05-15 15:01:03,220] [    INFO] - acs http client finished
+  [2022-05-15 15:01:03,221] [    INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+  [2022-05-15 15:01:03,221] [    INFO] - Response time 5.036084 s.
+  ```
+
+- Python API
+  ```python
+  from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
+
+  acs_executor = ACSClientExecutor()
+  res = acs_executor(
+      input='./zh.wav',
+      server_ip="127.0.0.1",
+      port=8490,)
+  print(res)
+  ```
+
+  Output:
+  ```bash
+  [2022-05-15 15:08:13,955] [    INFO] - acs http client start
+  [2022-05-15 15:08:13,956] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
+  [2022-05-15 15:08:19,026] [    INFO] - acs http client finished
+  {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
+  ```
--- a/demos/audio_content_search/acs_clinet.py
+++ b/demos/audio_content_search/acs_clinet.py
@ -0,0 +1,49 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.utils.audio_handler import ASRHttpHandler
+
+
+def main(args):
+    logger.info("acs http client start")
+    audio_format = "wav"
+    sample_rate = 16000
+    lang = "zh"
+    handler = ASRHttpHandler(
+        server_ip=args.server_ip, port=args.port, endpoint=args.endpoint)
+    res = handler.run(args.wavfile, audio_format, sample_rate, lang)
+    logger.info(f"the final result: {res}")
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="audio content search client")
+    parser.add_argument(
+        '--server_ip', type=str, default='127.0.0.1', help='server ip')
+    # the acs server in conf/acs_application.yaml listens on port 8490
+    parser.add_argument('--port', type=int, default=8490, help='server port')
+    parser.add_argument(
+        "--wavfile",
+        action="store",
+        help="wav file path",
+        default="./zh.wav")
+    parser.add_argument(
+        '--endpoint',
+        type=str,
+        default='/paddlespeech/asr/search',
+        help='server endpoint')
+    args = parser.parse_args()
+
+    main(args)
--- a/demos/audio_content_search/conf/acs_application.yaml
+++ b/demos/audio_content_search/conf/acs_application.yaml
@ -0,0 +1,34 @@
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 0.0.0.0
+port: 8490
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['acs_python']
+# protocol = ['http'] (only one can be selected).
+# http only supports the offline engine type.
+protocol: 'http'
+engine_list: ['acs_python']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### ACS #########################################
+################### acs task: engine_type: python ###############################
+acs_python:
+    task: acs
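+    asr_protocol: 'websocket' # only 'websocket' is used to reach the streaming asr server
+    offset: 1.0 # seconds of padding added before and after each matched word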
+    asr_server_ip: 127.0.0.1
+    asr_server_port: 8390
+    lang: 'zh'
+    word_list: "./conf/words.txt" # one search word per line
+    sample_rate: 16000
+    device: 'cpu' # set 'gpu:id' or 'cpu'
+
+
+
+
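The ACS engine does not run ASR itself: it forwards audio to the streaming ASR server configured by `asr_server_ip` and `asr_server_port`. A minimal sketch of how that URL is assembled, mirroring the string concatenation in `ACSEngine.init`; the dict stands in for the parsed config and its values are copied by hand from this file and from `conf/ws_conformer_application.yaml`.

```python
def asr_ws_url(acs_conf: dict) -> str:
    """Build the streaming ASR websocket URL the same way ACSEngine.init does."""
    return ("ws://" + acs_conf["asr_server_ip"] + ":" +
            str(acs_conf["asr_server_port"]) + "/paddlespeech/asr/streaming")


# Values copied by hand from the acs_python section (not parsed from YAML).
acs_python = {"asr_server_ip": "127.0.0.1", "asr_server_port": 8390}
# `port` of the streaming ASR server in conf/ws_conformer_application.yaml
streaming_asr_port = 8390

# The ACS server only works if it points at the port the streaming server binds.
assert acs_python["asr_server_port"] == streaming_asr_port, "ACS points at the wrong ASR port"
print(asr_ws_url(acs_python))  # ws://127.0.0.1:8390/paddlespeech/asr/streaming
```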
--- a/demos/audio_content_search/conf/words.txt
+++ b/demos/audio_content_search/conf/words.txt
@ -0,0 +1,2 @@
+我
+康
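Each line of `words.txt` becomes one search pattern. A stray blank line (for example a trailing newline at the end of the file) is worth filtering out, because an empty pattern matches at every position of the transcription. A minimal sketch of a tolerant reader; `read_words` is a hypothetical helper.

```python
import re
from typing import List


def read_words(path: str) -> List[str]:
    """Read one search word per line, stripping whitespace and skipping blank lines."""
    with open(path, "r", encoding="utf-8") as fp:
        return [line.strip() for line in fp if line.strip()]


# Why blank lines matter: an empty pattern matches at every position,
# including the end of the string.
print(len(list(re.finditer("", "我认为"))))  # 4
```

Opening the file with an explicit `encoding="utf-8"` also keeps the Chinese words readable on platforms whose default encoding is not UTF-8.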
--- a/demos/audio_content_search/conf/ws_conformer_application.yaml
+++ b/demos/audio_content_search/conf/ws_conformer_application.yaml
@ -0,0 +1,43 @@
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 0.0.0.0
+port: 8390
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+    model_type: 'conformer_online_multicn'
+    am_model: # the pdmodel file of am static model [optional]
+    am_params:  # the pdiparams file of am static model [optional]
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path: 
+    decode_method: 'attention_rescoring' 
+    force_yes: True
+    device: 'cpu' # cpu or gpu:id
+    am_predictor_conf:
+        device:  # set 'gpu:id' or 'cpu'
+        switch_ir_optim: True
+        glog_info: False  # True -> print glog
+        summary: True  # False -> do not show predictor config
+
+    chunk_buffer_conf:
+        window_n: 7     # frame
+        shift_n: 4      # frame
+        window_ms: 25   # ms
+        shift_ms: 10    # ms
+        sample_rate: 16000
+        sample_width: 2
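The `chunk_buffer_conf` values describe audio in feature frames. A worked sketch of the byte sizes they imply, under a stated assumption: a window spans `window_n` frames, i.e. `(window_n - 1)` hops of `shift_ms` plus one full `window_ms` frame, and advances by `shift_n` hops. `chunk_sizes` is a hypothetical helper, not PaddleSpeech's buffer class.

```python
def chunk_sizes(window_n: int = 7, shift_n: int = 4, window_ms: int = 25,
                shift_ms: int = 10, sample_rate: int = 16000,
                sample_width: int = 2) -> tuple:
    """Return (window_bytes, shift_bytes) implied by a chunk_buffer_conf.

    Assumption: a window spans `window_n` frames, i.e. (window_n - 1) hops of
    `shift_ms` plus one `window_ms` frame, and advances by `shift_n` hops.
    """
    window_span_ms = (window_n - 1) * shift_ms + window_ms  # 85 ms with defaults
    shift_span_ms = shift_n * shift_ms  # 40 ms with defaults
    # integer arithmetic avoids float rounding when converting ms to samples
    window_bytes = window_span_ms * sample_rate // 1000 * sample_width
    shift_bytes = shift_span_ms * sample_rate // 1000 * sample_width
    return window_bytes, shift_bytes


print(chunk_sizes())  # (2720, 1280)
```

With the values above, each window covers 85 ms (1360 samples of 2 bytes) and the buffer moves forward 40 ms (640 samples) at a time.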
--- a/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml
+++ b/demos/audio_content_search/conf/ws_conformer_wenetspeech_application.yaml
@ -0,0 +1,46 @@
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+#                             SERVER SETTING                                    #
+#################################################################################
+host: 0.0.0.0
+port: 8390
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online']
+# protocol = ['websocket'] (only one can be selected).
+# websocket only supports the online engine type.
+protocol: 'websocket'
+engine_list: ['asr_online']
+
+
+#################################################################################
+#                                ENGINE CONFIG                                  #
+#################################################################################
+
+################################### ASR #########################################
+################### speech task: asr; engine_type: online #######################
+asr_online:
+    model_type: 'conformer_online_wenetspeech'
+    am_model: # the pdmodel file of am static model [optional]
+    am_params:  # the pdiparams file of am static model [optional]
+    lang: 'zh'
+    sample_rate: 16000
+    cfg_path: 
+    decode_method: 'attention_rescoring'
+    force_yes: True
+    device: 'cpu' # cpu or gpu:id
+    am_predictor_conf:
+        device:  # set 'gpu:id' or 'cpu'
+        switch_ir_optim: True
+        glog_info: False  # True -> print glog
+        summary: True  # False -> do not show predictor config
+
+    chunk_buffer_conf:
+        window_n: 7     # frame
+        shift_n: 4      # frame
+        window_ms: 25   # ms
+        shift_ms: 10    # ms
+        sample_rate: 16000
+        sample_width: 2
--- a/demos/audio_content_search/run.sh
+++ b/demos/audio_content_search/run.sh
@ -0,0 +1,7 @@
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+# we need the streaming asr server
+nohup python3 streaming_asr_server.py --config_file conf/ws_conformer_application.yaml > streaming_asr.log  2>&1  &
+
+# start the acs server
+nohup paddlespeech_server start --config_file conf/acs_application.yaml > acs.log 2>&1 &
+
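Both servers are started in the background with `nohup`, and the ACS server can only answer requests once the streaming ASR server is up. A minimal sketch of a readiness check that polls a TCP port before sending requests; `wait_for_port` is a hypothetical helper, and the ports 8390 and 8490 come from the two YAML configs above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # a successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not listening yet, retry shortly
    return False
```

Usage: call `wait_for_port("127.0.0.1", 8390)` for the streaming ASR server, then `wait_for_port("127.0.0.1", 8490)` for the ACS server. An open port only shows the process is listening; it does not guarantee the models have finished loading.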
--- a/paddlespeech/server/bin/paddlespeech_client.py
+++ b/paddlespeech/server/bin/paddlespeech_client.py
@ -754,3 +754,88 @@ class VectorClientExecutor(BaseExecutor):
            logger.info(f"The vector score is: {res}")
        else:
            logger.error(f"Sorry, we have not support such task {task}")
+
+
+@cli_client_register(
+    name='paddlespeech_client.acs', description='visit acs service')
+class ACSClientExecutor(BaseExecutor):
+    def __init__(self):
+        super(ACSClientExecutor, self).__init__()
+        self.parser = argparse.ArgumentParser(
+            prog='paddlespeech_client.acs', add_help=True)
+        self.parser.add_argument(
+            '--server_ip', type=str, default='127.0.0.1', help='server ip')
+        self.parser.add_argument(
+            '--port', type=int, default=8090, help='server port')
+        self.parser.add_argument(
+            '--input',
+            type=str,
+            default=None,
+            help='Audio file to be recognized',
+            required=True)
+        self.parser.add_argument(
+            '--sample_rate', type=int, default=16000, help='audio sample rate')
+        self.parser.add_argument(
+            '--lang', type=str, default="zh_cn", help='language')
+        self.parser.add_argument(
+            '--audio_format', type=str, default="wav", help='audio format')
+
+    def execute(self, argv: List[str]) -> bool:
+        args = self.parser.parse_args(argv)
+        input_ = args.input
+        server_ip = args.server_ip
+        port = args.port
+        sample_rate = args.sample_rate
+        lang = args.lang
+        audio_format = args.audio_format
+
+        try:
+            time_start = time.time()
+            res = self(
+                input=input_,
+                server_ip=server_ip,
+                port=port,
+                sample_rate=sample_rate,
+                lang=lang,
+                audio_format=audio_format, )
+            time_end = time.time()
+            logger.info(f"ACS result: {res}")
+            logger.info("Response time %f s." % (time_end - time_start))
+            return True
+        except Exception as e:
+            logger.error("Failed to run audio content search.")
+            logger.error(e)
+            return False
+
+    @stats_wrapper
+    def __call__(
+            self,
+            input: str,
+            server_ip: str="127.0.0.1",
+            port: int=8090,
+            sample_rate: int=16000,
+            lang: str="zh_cn",
+            audio_format: str="wav", ):
+        """Python API to call an executor.
+
+        Args:
+            input (str): The input audio file path
+            server_ip (str, optional): The ACS server ip. Defaults to "127.0.0.1".
+            port (int, optional): The ACS server port. Defaults to 8090.
+            sample_rate (int, optional): The audio sample rate. Defaults to 16000.
+            lang (str, optional): The audio language type. Defaults to "zh_cn".
+            audio_format (str, optional): The audio format information. Defaults to "wav".
+
+        Returns:
+            dict: The ACS result, with the 'transcription' text and the 'acs' list of matched words and their timestamps
+        """
+        # we use the acs server to get the keyword timestamps in the audio text content
+        logger.info("acs http client start")
+        from paddlespeech.server.utils.audio_handler import ASRHttpHandler
+        handler = ASRHttpHandler(
+            server_ip=server_ip, port=port, endpoint="/paddlespeech/asr/search")
+        res = handler.run(input, audio_format, sample_rate, lang)
+        res = res['result']
+        logger.info("acs http client finished")
+
+        return res
--- a/paddlespeech/server/bin/paddlespeech_server.py
+++ b/paddlespeech/server/bin/paddlespeech_server.py
@ -82,7 +82,7 @@ class ServerExecutor(BaseExecutor):
        else:
            raise Exception("unsupported protocol")
        app.include_router(api_router)
-
+        logger.info("start to init the engine")
        if not init_engine_pool(config):
            return False

--- a/paddlespeech/server/engine/acs/__init__.py
+++ b/paddlespeech/server/engine/acs/__init__.py
--- a/paddlespeech/server/engine/acs/python/__init__.py
+++ b/paddlespeech/server/engine/acs/python/__init__.py
--- a/paddlespeech/server/engine/acs/python/acs_engine.py
+++ b/paddlespeech/server/engine/acs/python/acs_engine.py
@ -0,0 +1,188 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import io
+import json
+import os
+import re
+
+import paddle
+import soundfile
+import websocket
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.engine.base_engine import BaseEngine
+
+
+class ACSEngine(BaseEngine):
+    def __init__(self):
+        """The ACSEngine Engine
+        """
+        super(ACSEngine, self).__init__()
+        logger.info("Create the ACSEngine Instance")
+        self.word_list = []
+
+    def init(self, config: dict):
+        """Init the ACSEngine Engine
+
+        Args:
+            config (dict): The server configuration
+
+        Returns:
+            bool: The engine instance flag
+        """
+        logger.info("Init the acs engine")
+        try:
+            self.config = config
+            if self.config.device:
+                self.device = self.config.device
+            else:
+                self.device = paddle.get_device()
+
+            paddle.set_device(self.device)
+            logger.info(f"ACS Engine set the device: {self.device}")
+
+        except BaseException as e:
+            logger.error(
+                "Set device failed, please check if device is already used and the parameter 'device' in the yaml file"
+            )
+            # self.device may not be set yet if reading the config failed
+            logger.error("Initialize ACS server engine Failed on device: %s." %
+                         (getattr(self, 'device', None)))
+            logger.error(e)
+            return False
+
+        self.read_search_words()
+
+        # init the asr url
+        self.url = "ws://" + self.config.asr_server_ip + ":" + str(
+            self.config.asr_server_port) + "/paddlespeech/asr/streaming"
+
+        logger.info("Init the acs engine successfully")
+        return True
+
+    def read_search_words(self):
+        word_list = self.config.word_list
+        if word_list is None:
+            logger.error(
+                "No word list file in config, please set the word list parameter"
+            )
+            return
+
+        if not os.path.exists(word_list):
+            logger.error("Please input correct word list file")
+            return
+
+        with open(word_list, 'r', encoding='utf-8') as fp:
+            # skip blank lines: an empty pattern would match at every position
+            self.word_list = [line.strip() for line in fp if line.strip()]
+
+        logger.info(f"word list: {self.word_list}")
+
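+    def get_asr_content(self, audio_data):
+        """Get the streaming asr result
+
+        Args:
+            audio_data (io.BytesIO): the wav audio data as a file-like object
+
+        Returns:
+            dict: the final message from the streaming asr server, including
+                the transcription ('result') and its time stamps ('times')
+        """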
+        logger.info("send a message to the server")
+        if self.url is None:
+            logger.error("No asr server, please input valid ip and port")
+            return ""
+        ws = websocket.WebSocket()
+        ws.connect(self.url)
+        # with websocket.WebSocket.connect(self.url) as ws:
+        audio_info = json.dumps(
+            {
+                "name": "test.wav",
+                "signal": "start",
+                "nbest": 1
+            },
+            sort_keys=True,
+            indent=4,
+            separators=(',', ': '))
+        ws.send(audio_info)
+        msg = ws.recv()
+        logger.info("client receive msg={}".format(msg))
+
+        # send the whole audio data in a single binary message
+        samples, sample_rate = soundfile.read(audio_data, dtype='int16')
+        ws.send_binary(samples.tobytes())
+        msg = ws.recv()
+        msg = json.loads(msg)
+        logger.info(f"audio result: {msg}")
+
+        # send the end signal to get the final result
+        logger.info("send the end signal")
+        audio_info = json.dumps(
+            {
+                "name": "test.wav",
+                "signal": "end",
+                "nbest": 1
+            },
+            sort_keys=True,
+            indent=4,
+            separators=(',', ': '))
+        ws.send(audio_info)
+        msg = ws.recv()
+        msg = json.loads(msg)
+
+        logger.info(f"the final result: {msg}")
+        ws.close()
+
+        return msg
+
+    def get_macthed_word(self, msg):
+        """Get the matched info in msg
+
+        Args:
+            msg (dict): the asr info, including the asr result and time stamp
+
+        Returns:
+            acs_result, asr_result: the acs result and the asr result
+        """
+        asr_result = msg['result']
+        time_stamp = msg['times']
+        acs_result = []
+
+        # no time stamps means nothing was recognized, so nothing can match
+        if not time_stamp:
+            return acs_result, asr_result
+
+        # search for each word in self.word_list
+        offset = self.config.offset
+        max_ed = time_stamp[-1]['ed']
+        for w in self.word_list:
+            # find every occurrence of w in asr_result; re.escape treats w
+            # as a literal string rather than a regular expression
+            for m in re.finditer(re.escape(w), asr_result):
+                # pad the match by `offset` seconds, clipped to [0, max_ed]
+                start = max(time_stamp[m.start(0)]['bg'] - offset, 0)
+                end = min(time_stamp[m.end(0) - 1]['ed'] + offset, max_ed)
+                logger.info(f'start: {start}, end: {end}')
+                acs_result.append({'w': w, 'bg': start, 'ed': end})
+
+        return acs_result, asr_result
+
+    def run(self, audio_data):
+        """Process the audio data in the acs engine.
+           The engine keeps no per-request state, so all requests share this run method.
+
+        Args:
+            audio_data (bytes): the wav audio data
+
+        Returns:
+            acs_result, asr_result: the acs result and the asr result
+        """
+        logger.info("start to process the audio content search")
+        msg = self.get_asr_content(io.BytesIO(audio_data))
+
+        acs_result, asr_result = self.get_macthed_word(msg)
+        logger.info(f'the asr result {asr_result}')
+        logger.info(f'the acs result: {acs_result}')
+        return acs_result, asr_result
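The matching step can be exercised without a running ASR server. A self-contained sketch that mirrors the logic of `get_macthed_word`: `times[i]` is assumed to hold the `bg`/`ed` seconds of character `i` (the engine indexes `times` by character position), and each hit is widened by `offset` seconds and clipped to the audio span. `match_keywords` and the sample data in the usage note are hypothetical.

```python
import re
from typing import Dict, List


def match_keywords(transcription: str, times: List[Dict[str, float]],
                   words: List[str], offset: float) -> List[Dict]:
    """Return {'w', 'bg', 'ed'} for every occurrence of each word.

    Assumes one entry in `times` per character of `transcription`.
    """
    if not times:
        return []
    max_ed = times[-1]["ed"]
    hits = []
    for w in words:
        # re.escape: search words are literal text, not regular expressions
        for m in re.finditer(re.escape(w), transcription):
            bg = max(times[m.start()]["bg"] - offset, 0)
            ed = min(times[m.end() - 1]["ed"] + offset, max_ed)
            hits.append({"w": w, "bg": bg, "ed": ed})
    return hits
```

For example, with the hypothetical transcription `"你好我好"`, half-second characters and `offset=0.25`, searching for `"好"` yields two hits, `(0.25, 1.25)` and `(1.25, 2.0)`; the second is clipped at the end of the last character.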
--- a/paddlespeech/server/engine/engine_factory.py
+++ b/paddlespeech/server/engine/engine_factory.py
@ -52,5 +52,8 @@ class EngineFactory(object):
        elif engine_name.lower() == 'vector' and engine_type.lower() == 'python':
            from paddlespeech.server.engine.vector.python.vector_engine import VectorEngine
            return VectorEngine()
+        elif engine_name.lower() == 'acs' and engine_type.lower() == 'python':
+            from paddlespeech.server.engine.acs.python.acs_engine import ACSEngine
+            return ACSEngine()
        else:
            return None
--- a/paddlespeech/server/engine/engine_pool.py
+++ b/paddlespeech/server/engine/engine_pool.py
@ -34,6 +34,7 @@ def init_engine_pool(config) -> bool:
        engine_type = engine_and_type.split("_")[1]
        ENGINE_POOL[engine] = EngineFactory.get_engine(
            engine_name=engine, engine_type=engine_type)
+
        if not ENGINE_POOL[engine].init(config=config[engine_and_type]):
            return False

--- a/paddlespeech/server/restful/acs_api.py
+++ b/paddlespeech/server/restful/acs_api.py
@ -0,0 +1,101 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import base64
+from typing import Union
+
+from fastapi import APIRouter
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.engine.engine_pool import get_engine_pool
+from paddlespeech.server.restful.request import ASRRequest
+from paddlespeech.server.restful.response import ACSResponse
+from paddlespeech.server.restful.response import ErrorResponse
+from paddlespeech.server.utils.errors import ErrorCode
+from paddlespeech.server.utils.errors import failed_response
+from paddlespeech.server.utils.exception import ServerBaseException
+
+router = APIRouter()
+
+
+@router.get('/paddlespeech/asr/search/help')
+def help():
+    """help
+
+    Returns:
+        json: the description of the acs service
+    """
+    """
+    response = {
+        "success": "True",
+        "code": 200,
+        "message": {
+            "global": "success"
+        },
+        "result": {
+            "description": "acs server",
+            "input": "base64 string of wavfile",
+            "output": {
+                "transcription": "你好",
+                "acs": [{
+                    'w': '你',
+                    'bg': 0.0,
+                    'ed': 1.2
+                }]
+            }
+        }
+    }
+    return response
+
+
+@router.post(
+    "/paddlespeech/asr/search",
+    response_model=Union[ACSResponse, ErrorResponse])
+def acs(request_body: ASRRequest):
+    """acs api 
+
+    Args:
+        request_body (ASRRequest): the acs request, we reuse the http ASRRequest
+
+    Returns:
+        json: the acs result
+    """
+    try:
+        # 1. get the audio data via base64 decoding
+        audio_data = base64.b64decode(request_body.audio)
+
+        # 2. get single engine from engine pool
+        engine_pool = get_engine_pool()
+        acs_engine = engine_pool['acs']
+
+        # 3. acs_engine keeps no per-request state, so the shared instance processes the data directly
+        acs_result, asr_result = acs_engine.run(audio_data)
+
+        response = {
+            "success": True,
+            "code": 200,
+            "message": {
+                "description": "success"
+            },
+            "result": {
+                "transcription": asr_result,
+                "acs": acs_result
+            }
+        }
+
+    except ServerBaseException as e:
+        response = failed_response(e.error_code, e.msg)
+    except BaseException as e:
+        response = failed_response(ErrorCode.SERVER_UNKOWN_ERR)
+        logger.error(e)
+
+    return response
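The endpoint expects the WAV file as a base64 string in the `audio` field of the request body. A minimal client sketch using only the standard library; it assumes `ASRRequest` accepts the fields `audio`, `audio_format`, `sample_rate` and `lang` (the endpoint above only reads `audio`), and `build_acs_request` and `post_acs` are hypothetical helpers.

```python
import base64
import json
import urllib.request


def build_acs_request(wav_bytes: bytes, sample_rate: int = 16000,
                      lang: str = "zh_cn", audio_format: str = "wav") -> bytes:
    """Encode WAV bytes as a JSON request body (assumed ASRRequest fields)."""
    body = {
        "audio": base64.b64encode(wav_bytes).decode("utf-8"),
        "audio_format": audio_format,
        "sample_rate": sample_rate,
        "lang": lang,
    }
    return json.dumps(body).encode("utf-8")


def post_acs(wav_path: str,
             url: str = "http://127.0.0.1:8490/paddlespeech/asr/search") -> dict:
    """POST a WAV file to the ACS server and return the decoded JSON response."""
    with open(wav_path, "rb") as fp:
        data = build_acs_request(fp.read())
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: with the demo servers running, `post_acs("./zh.wav")["result"]["acs"]` returns the matched words and timestamps.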
--- a/paddlespeech/server/restful/api.py
+++ b/paddlespeech/server/restful/api.py
@ -22,6 +22,7 @@ from paddlespeech.server.restful.cls_api import router as cls_router
 from paddlespeech.server.restful.text_api import router as text_router
 from paddlespeech.server.restful.tts_api import router as tts_router
 from paddlespeech.server.restful.vector_api import router as vec_router
+from paddlespeech.server.restful.acs_api import router as acs_router
 _router = APIRouter()


@ -45,6 +46,8 @@ def setup_router(api_list: List):
            _router.include_router(text_router)
        elif api_name.lower() == 'vector':
            _router.include_router(vec_router)
+        elif api_name.lower() == 'acs':
+            _router.include_router(acs_router)
        else:
            logger.error(
                f"PaddleSpeech has not support such service: {api_name}")
--- a/paddlespeech/server/restful/response.py
+++ b/paddlespeech/server/restful/response.py
@ -17,7 +17,7 @@ from pydantic import BaseModel

 __all__ = [
    'ASRResponse', 'TTSResponse', 'CLSResponse', 'TextResponse',
-    'VectorResponse', 'VectorScoreResponse'
+    'VectorResponse', 'VectorScoreResponse', 'ACSResponse'
 ]


@ -231,3 +231,32 @@ class ErrorResponse(BaseModel):
    success: bool
    code: int
    message: Message
+
+
+#****************************************************************************************/
+#************************************ ACS response **************************************/
+#****************************************************************************************/
+class AcsResult(BaseModel):
+    transcription: str
+    acs: list
+
+
+class ACSResponse(BaseModel):
+    """
+    response example
+    {
+        "success": true,
+        "code": 200,
+        "message": {
+            "description": "success"
+        },
+        "result": {
+            "transcription": "你好，飞桨",
+            "acs": [{"w": "你好", "bg": 0.0, "ed": 0.45}]
+        }
+    }
+    """
+    success: bool
+    code: int
+    message: Message
+    result: AcsResult
--- a/paddlespeech/server/utils/audio_handler.py
+++ b/paddlespeech/server/utils/audio_handler.py
@ -205,7 +205,7 @@ class ASRWsAudioHandler:


 class ASRHttpHandler:
-    def __init__(self, server_ip=None, port=None):
+    def __init__(self, server_ip=None, port=None, endpoint="/paddlespeech/asr"):
        """The ASR client http request

        Args:
@ -219,7 +219,7 @@ class ASRHttpHandler:
            self.url = None
        else:
            self.url = 'http://' + self.server_ip + ":" + str(
-                self.port) + '/paddlespeech/asr'
+                self.port) + endpoint
        logger.info(f"endpoint: {self.url}")

    def run(self, input, audio_format, sample_rate, lang):
--- a/paddlespeech/server/ws/asr_api.py
+++ b/paddlespeech/server/ws/asr_api.py
@ -18,9 +18,9 @@ from fastapi import WebSocket
 from fastapi import WebSocketDisconnect
 from starlette.websockets import WebSocketState as WebSocketState

+from paddlespeech.cli.log import logger
 from paddlespeech.server.engine.asr.online.asr_engine import PaddleASRConnectionHanddler
 from paddlespeech.server.engine.engine_pool import get_engine_pool
-
 router = APIRouter()


@ -106,5 +106,5 @@ async def websocket_endpoint(websocket: WebSocket):
                # if the engine create the vad instance, this connection will have many period results 
                resp = {'result': asr_results}
                await websocket.send_json(resp)
-    except WebSocketDisconnect:
-        pass
+    except WebSocketDisconnect as e:
+        logger.error(e)
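The ACS engine is a client of this websocket endpoint. In `get_asr_content` it sends a JSON start signal, then the whole audio as one binary message of int16 samples, then a JSON end signal, reading one reply after each; the reply to the end signal carries the `result` and `times` fields used for matching. A minimal sketch of that frame sequence as plain data, so it can be inspected without a server; `acs_ws_frames` is hypothetical, and the original's `indent`/`separators` JSON formatting is omitted since it does not change the content.

```python
import json
from typing import List, Tuple, Union

Frame = Tuple[str, Union[str, bytes]]


def acs_ws_frames(pcm: bytes, name: str = "test.wav") -> List[Frame]:
    """Return the frames the ACS engine sends, in order.

    Each item is ("text", json_string) or ("binary", raw int16 PCM bytes).
    """
    def signal(kind: str) -> str:
        return json.dumps({"name": name, "signal": kind, "nbest": 1}, sort_keys=True)

    return [("text", signal("start")), ("binary", pcm), ("text", signal("end"))]
```

Sending all audio in one binary message keeps the client simple; the server side still splits it into chunks using `chunk_buffer_conf`.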