update the acs engine doc, test=doc

3 years ago · 3535079434
parent d94ab22e92
commit 3535079434
12 changed files with 406 additions and 70 deletions
--- a/demos/audio_content_search/README.md
+++ b/demos/audio_content_search/README.md
@ -0,0 +1,69 @@
 ([简体中文](./README_cn.md)|English)
 # ACS (Audio Content Search)
 ## Introduction
 ACS, or Audio Content Search, refers to the problem of locating the timestamps of keywords in automatically transcribed speech (speech-to-text).
 This demo finds the timestamps of keywords in the transcript of a given audio file. It can be run with a single command or a few lines of Python using `PaddleSpeech`.
 ## Usage
 ### 1. Installation
 See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
 You can install paddlespeech using either the medium or the hard method.
 ### 2. Prepare Input File
 The input of this demo should be a WAV file (`.wav`), and its sample rate must match the sample rate of the model.
 A sample file for this demo can be downloaded:
 ```bash
 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 ```
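Before sending a file, it can help to confirm that its sample rate matches the model. The following is a minimal sketch using only the Python standard library `wave` module. The helper name `check_sample_rate` is hypothetical and not part of PaddleSpeech, and the 16000 Hz default is taken from the documented default of the `sample_rate` argument.

```python
import wave


def check_sample_rate(wav_file, expected_rate=16000):
    """Return True if the WAV input has the expected sample rate.

    Hypothetical helper, not a PaddleSpeech API. `wav_file` may be a path
    or a binary file-like object, as accepted by `wave.open`.
    """
    with wave.open(wav_file, "rb") as reader:
        return reader.getframerate() == expected_rate
```

For example, `check_sample_rate("./zh.wav")` reports whether the downloaded sample can be sent with the default `sample_rate`.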
 ### 3. Usage
 - Command Line(Recommended)
  ```bash
  # Chinese
  paddlespeech_client acs --server_ip 127.0.0.1 --port 8490 --input ./zh.wav
  ```
  Usage:
  ```bash
  paddlespeech_client acs --help
  ```
  Arguments:
  - `input` (required): Audio file to recognize.
  - `server_ip`: The server IP. Default: `127.0.0.1`.
  - `port`: The server port. Default: `8090`.
  - `lang`: The language of the model. Default: `zh_cn`.
  - `sample_rate`: Audio sample rate, which must match the model. Default: `16000`.
  - `audio_format`: The audio format. Default: `wav`.
  Output:
  ```bash
  [2022-05-15 15:00:58,185] [    INFO] - acs http client start
  [2022-05-15 15:00:58,185] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
  [2022-05-15 15:01:03,220] [    INFO] - acs http client finished
  [2022-05-15 15:01:03,221] [    INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
  [2022-05-15 15:01:03,221] [    INFO] - Response time 5.036084 s.
  ```
 - Python API
  ```python
  from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
  acs_executor = ACSClientExecutor()
  res = acs_executor(
      input='./zh.wav',
      server_ip="127.0.0.1",
      port=8490,)
  print(res)
  ```
  Output:
  ```bash
  [2022-05-15 15:08:13,955] [    INFO] - acs http client start
  [2022-05-15 15:08:13,956] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
  [2022-05-15 15:08:19,026] [    INFO] - acs http client finished
  {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
  ```
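The returned dictionary can be post-processed directly. The sketch below formats each keyword hit from the sample result shown above. The helper name `format_hits` is hypothetical, and reading `bg` and `ed` as begin and end times in seconds is an assumption, since the units are not stated here.

```python
def format_hits(result):
    """Turn an ACS result dict into one readable line per keyword hit."""
    lines = []
    for hit in result["acs"]:
        # 'w' is the keyword; 'bg' and 'ed' are assumed to be seconds
        lines.append(f"{hit['w']}: {hit['bg']:.2f}s - {hit['ed']:.2f}s")
    return lines


# The sample result shown in the output above
res = {
    "transcription": "我认为跑步最重要的就是给我带来了身体健康",
    "acs": [
        {"w": "我", "bg": 0, "ed": 1.6800000000000002},
        {"w": "我", "bg": 2.1, "ed": 4.28},
        {"w": "康", "bg": 3.2, "ed": 4.92},
    ],
}
for line in format_hits(res):
    print(line)
```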
--- a/demos/audio_content_search/README_cn.md
+++ b/demos/audio_content_search/README_cn.md
@ -0,0 +1,68 @@
 (简体中文|[English](./README.md))
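 # Audio Content Search
 ## Introduction
 Audio content search is a technique that uses a computer program to obtain the timestamps of keywords in transcribed speech.
 This demo obtains the timestamps of keywords in the transcript of a given audio file. It can be run with a single `PaddleSpeech` command or a few lines of Python.
 ## Usage
 ### 1. Installation
 See the [installation guide](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md).
 You can install using either the medium or the hard method.
 ### 2. Prepare Input File
 The input of this demo should be a WAV file (`.wav`), and its sample rate must match the sample rate of the model.
 A sample audio file for this demo can be downloaded: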
 ```bash
 wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
 ```
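 ### 3. Usage
 - Command Line (Recommended)
  ```bash
  # Chinese
  paddlespeech_client acs --server_ip 127.0.0.1 --port 8490 --input ./zh.wav
  ```
  Usage:
  ```bash
  paddlespeech_client acs --help
  ```
  Arguments:
  - `input` (required): Audio file to recognize.
  - `server_ip`: The server IP. Default: `127.0.0.1`.
  - `port`: The server port. Default: `8090`.
  - `lang`: The language of the model. Default: `zh_cn`.
  - `sample_rate`: Audio sample rate. Default: `16000`.
  - `audio_format`: The audio format. Default: `wav`.
  Output: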
  ```bash
  [2022-05-15 15:00:58,185] [    INFO] - acs http client start
  [2022-05-15 15:00:58,185] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
  [2022-05-15 15:01:03,220] [    INFO] - acs http client finished
  [2022-05-15 15:01:03,221] [    INFO] - ACS result: {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
  [2022-05-15 15:01:03,221] [    INFO] - Response time 5.036084 s.
  ```
 - Python API
  ```python
  from paddlespeech.server.bin.paddlespeech_client import ACSClientExecutor
  acs_executor = ACSClientExecutor()
  res = acs_executor(
      input='./zh.wav',
      server_ip="127.0.0.1",
      port=8490,)
  print(res)
  ```
  Output:
  ```bash
  [2022-05-15 15:08:13,955] [    INFO] - acs http client start
  [2022-05-15 15:08:13,956] [    INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search
  [2022-05-15 15:08:19,026] [    INFO] - acs http client finished
  {'transcription': '我认为跑步最重要的就是给我带来了身体健康', 'acs': [{'w': '我', 'bg': 0, 'ed': 1.6800000000000002}, {'w': '我', 'bg': 2.1, 'ed': 4.28}, {'w': '康', 'bg': 3.2, 'ed': 4.92}]}
  ```
--- a/demos/audio_content_search/conf/acs_application.yaml
+++ b/demos/audio_content_search/conf/acs_application.yaml
@ -27,7 +27,7 @@ acs_python:
    asr_server_ip: 127.0.0.1
    asr_server_port: 8390
    lang: 'zh'
-    word_list: "words.txt"
+    word_list: "./conf/words.txt"
    sample_rate: 16000
    device: 'cpu' # set 'gpu:id' or 'cpu'
--- a/demos/audio_content_search/conf/words.txt
+++ b/demos/audio_content_search/conf/words.txt
--- a/demos/audio_content_search/run.sh
+++ b/demos/audio_content_search/run.sh
@ -0,0 +1,6 @@
 export CUDA_VISIBLE_DEVICES=0,1,2,3
 #nohup python3 streaming_asr_server.py --config_file conf/ws_conformer_application.yaml &> streaming_asr.log &
 # nohup python3 punc_server.py --config_file conf/punc_application.yaml > punc.log 2>&1 &
 paddlespeech_server start --config_file conf/acs_application.yaml 
--- a/demos/streaming_asr_server/conf/ws_conformer_wenetspeech_application.yaml
+++ b/demos/streaming_asr_server/conf/ws_conformer_wenetspeech_application.yaml
@ -4,7 +4,7 @@
 #                             SERVER SETTING                                    #
 #################################################################################
 host: 0.0.0.0
-port: 8390
+port: 8090
 # The task format in the engine_list is: <speech task>_<engine type>
 # task choices = ['asr_online']
--- a/paddlespeech/server/bin/paddlespeech_client.py
+++ b/paddlespeech/server/bin/paddlespeech_client.py
@ -752,3 +752,88 @@ class VectorClientExecutor(BaseExecutor):
            logger.info(f"The vector score is: {res}")
        else:
            logger.error(f"Sorry, we have not support such task {task}")
@cli_client_register(
    name='paddlespeech_client.acs', description='visit acs service')
 class ACSClientExecutor(BaseExecutor):
    def __init__(self):
        super(ACSClientExecutor, self).__init__()
        self.parser = argparse.ArgumentParser(
            prog='paddlespeech_client.acs', add_help=True)
        self.parser.add_argument(
            '--server_ip', type=str, default='127.0.0.1', help='server ip')
        self.parser.add_argument(
            '--port', type=int, default=8090, help='server port')
        self.parser.add_argument(
            '--input',
            type=str,
            default=None,
            help='Audio file to be recognized',
            required=True)
        self.parser.add_argument(
            '--sample_rate', type=int, default=16000, help='audio sample rate')
        self.parser.add_argument(
            '--lang', type=str, default="zh_cn", help='language')
        self.parser.add_argument(
            '--audio_format', type=str, default="wav", help='audio format')
    def execute(self, argv: List[str]) -> bool:
        args = self.parser.parse_args(argv)
        input_ = args.input
        server_ip = args.server_ip
        port = args.port
        sample_rate = args.sample_rate
        lang = args.lang
        audio_format = args.audio_format
        try:
            time_start = time.time()
            res = self(
                input=input_,
                server_ip=server_ip,
                port=port,
                sample_rate=sample_rate,
                lang=lang,
                audio_format=audio_format, )
            time_end = time.time()
            logger.info(f"ACS result: {res}")
            logger.info("Response time %f s." % (time_end - time_start))
            return True
        except Exception as e:
            logger.error("Failed to run audio content search.")
            logger.error(e)
            return False
    @stats_wrapper
    def __call__(
            self,
            input: str,
            server_ip: str="127.0.0.1",
            port: int=8090,
            sample_rate: int=16000,
            lang: str="zh_cn",
            audio_format: str="wav", ):
        """Python API to call an executor.
        Args:
            input (str): The input audio file path
            server_ip (str, optional): The ASR server ip. Defaults to "127.0.0.1".
            port (int, optional): The ASR server port. Defaults to 8090.
            sample_rate (int, optional): The audio sample rate. Defaults to 16000.
            lang (str, optional): The audio language type. Defaults to "zh_cn".
            audio_format (str, optional): The audio format information. Defaults to "wav".
        Returns:
            dict: The ACS result, with the transcription and the keyword time stamps
        """
        # we use the acs server to get the keyword time stamps in the audio transcript
        logger.info("acs http client start")
        from paddlespeech.server.utils.audio_handler import ASRHttpHandler
        handler = ASRHttpHandler(
            server_ip=server_ip, port=port, endpoint="/paddlespeech/asr/search")
        res = handler.run(input, audio_format, sample_rate, lang)
        res = res['result']
        logger.info("acs http client finished")
        return res
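To make the request concrete, the sketch below builds a JSON body of the kind the `/paddlespeech/asr/search` endpoint decodes. Only the base64-encoded `audio` field is confirmed by the server code in this commit. The other field names are assumed to mirror the client arguments, and `build_acs_request` is a hypothetical helper, not a PaddleSpeech function.

```python
import base64
import json


def build_acs_request(audio_bytes, audio_format="wav", sample_rate=16000,
                      lang="zh_cn"):
    """Build a JSON body for POST /paddlespeech/asr/search.

    Hypothetical helper. Field names other than `audio` are assumptions
    that mirror the client arguments.
    """
    payload = {
        # the server recovers the raw bytes with base64.b64decode
        "audio": base64.b64encode(audio_bytes).decode("utf-8"),
        "audio_format": audio_format,
        "sample_rate": sample_rate,
        "lang": lang,
    }
    return json.dumps(payload)
```

The resulting string could then be sent with any HTTP client. `ACSClientExecutor` already wraps this step through `ASRHttpHandler`, so the sketch is only useful for callers that do not go through the PaddleSpeech client.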
--- a/paddlespeech/server/engine/acs/python/acs_engine.py
+++ b/paddlespeech/server/engine/acs/python/acs_engine.py
@ -62,6 +62,7 @@ class ACSEngine(BaseEngine):
        self.read_search_words()
        # init the asr url
        self.url = "ws://" + self.config.asr_server_ip + ":" + str(
            self.config.asr_server_port) + "/paddlespeech/asr/streaming"
@ -81,11 +82,19 @@ class ACSEngine(BaseEngine):
            return
        with open(word_list, 'r') as fp:
-            self.word_list = fp.readlines()
+            self.word_list = [line.strip() for line in fp.readlines()]
        logger.info(f"word list: {self.word_list}")
    def get_asr_content(self, audio_data):
        """Get the streaming asr result
        Args:
            audio_data (io.BytesIO): the audio data wrapped in a file-like object
        Returns:
            dict: the asr message, including the asr result and time stamps
        """
        logger.info("send a message to the server")
        if self.url is None:
            logger.error("No asr server, please input valid ip and port")
@ -134,17 +143,46 @@ class ACSEngine(BaseEngine):
        return msg
    def get_macthed_word(self, msg):
        """Get the matched info in msg
        Args:
            msg (dict): the asr info, including the asr result and time stamp
        Returns:
            acs_result, asr_result: the acs result and the asr result
        """
        asr_result = msg['result']
        time_stamp = msg['times']
        acs_result = []
        # search for each word in self.word_list
        offset = self.config.offset
        max_ed = time_stamp[-1]['ed']
        for w in self.word_list:
            # search the w in asr_result and the index in asr_result
            for m in re.finditer(w, asr_result):
-                start = time_stamp[m.start(0)]['bg']
+                start = max(time_stamp[m.start(0)]['bg'] - offset, 0)
-                end = time_stamp[m.end(0) - 1]['ed']
+                end = min(time_stamp[m.end(0) - 1]['ed'] + offset, max_ed)
                logger.info(f'start: {start}, end: {end}')
                acs_result.append({'w': w, 'bg': start, 'ed': end})
        return acs_result, asr_result
    def run(self, audio_data):
        """process the audio data in acs engine
           the engine does not store any data, so all requests use the self.run api
        Args:
            audio_data (bytes): the raw audio data
        Returns:
            acs_result, asr_result: the acs result and the asr result
        """
        logger.info("start to process the audio content search")
        msg = self.get_asr_content(io.BytesIO(audio_data))
-        self.get_macthed_word(msg)
+        acs_result, asr_result = self.get_macthed_word(msg)
        logger.info(f'the asr result {asr_result}')
        logger.info(f'the acs result: {acs_result}')
        return acs_result, asr_result
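The matching step can be tried in isolation. The sketch below mirrors the padding and clamping logic of `get_macthed_word` as a standalone function. It assumes one time stamp entry per transcript character, with `bg` and `ed` keys. The name `match_keywords`, the use of `re.escape`, and the skipping of empty keywords are choices made for this sketch and are not part of the engine.

```python
import re


def match_keywords(transcript, time_stamps, keywords, offset=0.0):
    """Find each keyword in the transcript and return padded time spans.

    Assumes time_stamps[i] describes transcript[i] and has 'bg'/'ed' keys.
    """
    max_ed = time_stamps[-1]["ed"]
    hits = []
    for word in keywords:
        if not word:
            # a blank line in the word list would otherwise match everywhere
            continue
        # re.escape treats the keyword as literal text; the engine passes
        # the keyword to re.finditer unescaped
        for m in re.finditer(re.escape(word), transcript):
            # pad the span by `offset`, clamped to the audio boundaries
            start = max(time_stamps[m.start()]["bg"] - offset, 0)
            end = min(time_stamps[m.end() - 1]["ed"] + offset, max_ed)
            hits.append({"w": word, "bg": start, "ed": end})
    return hits


# Made-up time stamps for illustration only
stamps = [
    {"bg": 0.0, "ed": 0.5},
    {"bg": 0.5, "ed": 1.0},
    {"bg": 1.0, "ed": 1.5},
    {"bg": 1.5, "ed": 2.0},
]
print(match_keywords("你好飞桨", stamps, ["飞桨"], offset=0.25))
```

With these made-up time stamps, the start of the hit is moved 0.25 earlier, while the end stays at the last time stamp because the padding would run past the end of the audio.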
--- a/paddlespeech/server/restful/acs_api.py
+++ b/paddlespeech/server/restful/acs_api.py
@ -0,0 +1,101 @@
 # Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import base64
 from typing import Union
 from fastapi import APIRouter
 from paddlespeech.cli.log import logger
 from paddlespeech.server.engine.engine_pool import get_engine_pool
 from paddlespeech.server.restful.request import ASRRequest
 from paddlespeech.server.restful.response import ACSResponse
 from paddlespeech.server.restful.response import ErrorResponse
 from paddlespeech.server.utils.errors import ErrorCode
 from paddlespeech.server.utils.errors import failed_response
 from paddlespeech.server.utils.exception import ServerBaseException
 router = APIRouter()
@router.get('/paddlespeech/asr/search/help')
 def help():
    """help
    Returns:
        json: the audio content search result
    """
    response = {
        "success": "True",
        "code": 200,
        "message": {
            "global": "success"
        },
        "result": {
            "description": "acs server",
            "input": "base64 string of wavfile",
            "output": {
                "asr_result": "你好",
                "acs_result": [{
                    'w': '你',
                    'bg': 0.0,
                    'ed': 1.2
                }]
            }
        }
    }
    return response
@router.post(
    "/paddlespeech/asr/search",
    response_model=Union[ACSResponse, ErrorResponse])
 def acs(request_body: ASRRequest):
    """acs api 
    Args:
        request_body (ASRRequest): the acs request, we reuse the http ASRRequest
    Returns:
        json: the acs result
    """
    try:
        # 1. get the audio data via base64 decoding
        audio_data = base64.b64decode(request_body.audio)
        # 2. get single engine from engine pool
        engine_pool = get_engine_pool()
        acs_engine = engine_pool['acs']
        # 3. the acs engine stores no per-request data, so the shared instance processes the request directly
        acs_result, asr_result = acs_engine.run(audio_data)
        response = {
            "success": True,
            "code": 200,
            "message": {
                "description": "success"
            },
            "result": {
                "transcription": asr_result,
                "acs": acs_result
            }
        }
    except ServerBaseException as e:
        response = failed_response(e.error_code, e.msg)
    except BaseException as e:
        response = failed_response(ErrorCode.SERVER_UNKOWN_ERR)
        logger.error(e)
    return response
--- a/paddlespeech/server/restful/response.py
+++ b/paddlespeech/server/restful/response.py
@ -17,7 +17,7 @@ from pydantic import BaseModel
 __all__ = [
    'ASRResponse', 'TTSResponse', 'CLSResponse', 'TextResponse',
-    'VectorResponse', 'VectorScoreResponse'
+    'VectorResponse', 'VectorScoreResponse', 'ACSResponse'
 ]
@ -231,3 +231,32 @@ class ErrorResponse(BaseModel):
    success: bool
    code: int
    message: Message
 #****************************************************************************************/
 #************************************ ACS response **************************************/
 #****************************************************************************************/
 class AcsResult(BaseModel):
    transcription: str
    acs: list
 class ACSResponse(BaseModel):
    """
    response example
    {
        "success": true,
        "code": 200,
        "message": {
            "description": "success"
        },
        "result": {
            "transcription": "你好，飞桨",
            "acs": [{"w": "你好", "bg": 0.0, "ed": 0.45}]
        }
    }
    """
    success: bool
    code: int
    message: Message
    result: AcsResult
--- a/paddlespeech/server/utils/audio_handler.py
+++ b/paddlespeech/server/utils/audio_handler.py
@ -96,7 +96,7 @@ class ASRWsAudioHandler:
        self.punc_server = TextHttpHandler(punc_server_ip, punc_server_port)
        logger.info(f"endpoint: {self.url}")
-    def read_wave(self, wavfile_path):
+    def read_wave(self, wavfile_path: str):
        """read the audio file from specific wavfile path
        Args:
@ -129,7 +129,7 @@ class ASRWsAudioHandler:
            x_chunk = padded_x[start:end]
            yield x_chunk
-    async def run(self, wavfile_path):
+    async def run(self, wavfile_path: str):
        """Send an audio file to online server
        Args:
--- a/paddlespeech/server/ws/asr_api.py
+++ b/paddlespeech/server/ws/asr_api.py
@ -12,24 +12,15 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import json
-import base64
+
 from typing import Union
 from fastapi import APIRouter
 from fastapi import WebSocket
 import soundfile
 import io
 from fastapi import WebSocketDisconnect
 from starlette.websockets import WebSocketState as WebSocketState
 from paddlespeech.cli.log import logger
 from paddlespeech.server.engine.asr.online.asr_engine import PaddleASRConnectionHanddler
 from paddlespeech.server.engine.engine_pool import get_engine_pool
 from paddlespeech.server.restful.response import ASRResponse
 from paddlespeech.server.restful.response import ErrorResponse
 from paddlespeech.server.restful.request import ASRRequest
 from paddlespeech.server.utils.exception import ServerBaseException
 from paddlespeech.server.utils.errors import failed_response
 from paddlespeech.server.utils.errors import ErrorCode
 router = APIRouter()
@ -117,54 +108,3 @@ async def websocket_endpoint(websocket: WebSocket):
                await websocket.send_json(resp)
    except WebSocketDisconnect as e:
        logger.error(e)
 # @router.post(
 #     "/paddlespeech/asr/search/", response_model=Union[ASRResponse, ErrorResponse])
 # def asr(request_body: ASRRequest):
 #     """asr api 
 #     Args:
 #         request_body (ASRRequest): [description]
 #     Returns:
 #         json: [description]
 #     """
 #     try:
 #         audio_data = base64.b64decode(request_body.audio)
 #         # get single engine from engine pool
 #         engine_pool = get_engine_pool()
 #         asr_engine = engine_pool['asr']
 #         samples, sample_rate = soundfile.read(io.BytesIO(audio_data), dtype='int16')
 #         # print(samples.shape)
 #         # print(sample_rate)
 #         connection_handler = PaddleASRConnectionHanddler(asr_engine)
 #         connection_handler.extract_feat(samples)
 #         connection_handler.decode(is_finished=True)
 #         asr_results = connection_handler.rescoring()
 #         asr_results = connection_handler.get_result()
 #         word_time_stamp = connection_handler.get_word_time_stamp()
 #         response = {
 #             "success": True,
 #             "code": 200,
 #             "message": {
 #                 "description": "success"
 #             },
 #             "result": {
 #                 "transcription": asr_results,
 #                 "times": word_time_stamp
 #             }
 #         }
 #     except ServerBaseException as e:
 #         response = failed_response(e.error_code, e.msg)
 #     except BaseException as e:
 #         response = failed_response(ErrorCode.SERVER_UNKOWN_ERR)
 #         print(e)
 #     return response