[wip][vec] add search result description, test=doc #1543

3 years ago · ad7ddf8a60
parent 508f2f5b62
commit ad7ddf8a60
3 changed files with 114 additions and 59 deletions
--- a/demos/audio_searching/README.md
+++ b/demos/audio_searching/README.md
@ -11,7 +11,7 @@ Audio retrieval (speech, music, speaker, etc.) enables querying and finding simi

 In this demo, you will learn how to build an audio retrieval system that retrieves similar sound snippets. Uploaded audio clips are converted into vectors using PaddleSpeech pretrained models (audio classification model, speaker recognition model, etc.) and stored in Milvus. Milvus automatically generates a unique ID for each vector, and the ID and the corresponding audio information (audio ID, speaker ID, etc.) are then stored in MySQL to complete the library construction. During retrieval, users upload test audio to obtain a vector, and a vector similarity search is then run in Milvus. Milvus returns vector IDs as the retrieval result, and the corresponding audio information can be queried in MySQL by ID.

-![Workflow of an audio searching system](./img/audo_searching.png)
+![Workflow of an audio searching system](./img/audio_searching.png)

 Note: this demo uses the [CN-Celeb](http://openslr.org/82/) dataset, with at least 650,000 audio entries and 3,000 speakers, to build the audio vector library, which is then searched using a preset distance calculation. Other datasets, such as Librispeech, VoxCeleb, or UrbanSound, can be used instead as needed.

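The library-construction and retrieval flow described in the introduction can be sketched in plain Python. This is a minimal, self-contained illustration only: an in-memory dict stands in for Milvus, a second dict stands in for the MySQL table that maps IDs to audio information, the embeddings are toy vectors rather than PaddleSpeech model outputs, and the search is a brute-force scan rather than a Milvus index. `InMemoryAudioIndex` and its methods are hypothetical names, not part of the demo's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class InMemoryAudioIndex:
    """Hypothetical stand-in: `vectors` plays Milvus, `audio_info` plays MySQL."""

    vectors: Dict[int, List[float]] = field(default_factory=dict)
    audio_info: Dict[int, str] = field(default_factory=dict)
    next_id: int = 0

    def insert(self, embedding: List[float], audio_path: str) -> int:
        # Assign a unique ID to each stored vector (Milvus does this in the demo)...
        vec_id = self.next_id
        self.next_id += 1
        self.vectors[vec_id] = embedding
        # ...then record the ID -> audio information mapping (MySQL in the demo).
        self.audio_info[vec_id] = audio_path
        return vec_id

    def search(self, query: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        # Brute-force similarity search: rank every stored vector by distance.
        ranked = sorted(
            (squared_l2(query, vec), vec_id) for vec_id, vec in self.vectors.items()
        )
        # The vector search yields IDs; map them back to audio information.
        return [(self.audio_info[vec_id], dist) for dist, vec_id in ranked[:top_k]]


if __name__ == "__main__":
    index = InMemoryAudioIndex()
    # Toy 3-dimensional embeddings; real model embeddings are much larger.
    index.insert([0.0, 0.0, 1.0], "test.wav")
    index.insert([0.1, 0.0, 0.9], "knife_chopping.wav")
    index.insert([1.0, 0.0, 0.0], "other.wav")
    for path, dist in index.search([0.0, 0.0, 1.0], top_k=2):
        print(path, dist)
```

Squared L2 ranks neighbors in the same order as L2 while skipping the square root; a real deployment would let Milvus build an index and run the search instead of scanning every vector.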
@ -31,6 +31,7 @@ Creating milvus-minio    ... done
 Creating milvus-etcd     ... done
 Creating audio-mysql     ... done
 Creating milvus-standalone ... done
+Creating audio-webclient     ... done
 ```

 List all containers with `docker ps` (use, for example, `docker logs audio-mysql` to view a container's logs):
@ -41,7 +42,7 @@ b2bcf279e599  milvusdb/milvus:v2.0.1  "/tini -- milvus run…"  22 hours ago  Up
 d8ef4c84e25c  mysql:5.7 "docker-entrypoint.s…"  22 hours ago  Up 22 hours 0.0.0.0:3306->3306/tcp, 33060/tcp audio-mysql
 8fb501edb4f3  quay.io/coreos/etcd:v3.5.0  "etcd -advertise-cli…"  22 hours ago  Up 22 hours 2379-2380/tcp milvus-etcd
 ffce340b3790  minio/minio:RELEASE.2020-12-03T00-03-10Z  "/usr/bin/docker-ent…"  22 hours ago  Up 22 hours (healthy) 9000/tcp  milvus-minio
-
+15c84a506754  iregistry.baidu-int.com/paddlespeech/audio-search-client:1.0  "/bin/bash -c '/usr/…"  22 hours ago  Up 22 hours (healthy) 0.0.0.0:8068->80/tcp  audio-webclient
 ```

 ### 2. Start API Server
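The configuration step in this section lists the key parameters of `src/config.py`. As a rough sketch, those parameters can be modeled as settings that fall back to the table's defaults. This assumes, for illustration only, that each value may be overridden by an environment variable of the same name; the actual `config.py` may define them differently, and `load_settings` is a hypothetical helper, not part of the demo.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults taken from the parameter table in this README.
DEFAULTS: Dict[str, str] = {
    "MILVUS_HOST": "127.0.0.1",
    "MILVUS_PORT": "19530",
    "VECTOR_DIMENSION": "2048",
    "MYSQL_HOST": "127.0.0.1",
    "MYSQL_PORT": "3306",
    "DEFAULT_TABLE": "audio_table",
}

# Parameters that should be converted to integers.
INT_KEYS = {"MILVUS_PORT", "VECTOR_DIMENSION", "MYSQL_PORT"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Hypothetical helper: use the environment value if set, else the table default."""
    env = os.environ if env is None else env
    settings: Dict[str, object] = {}
    for key, default in DEFAULTS.items():
        raw = env.get(key, default)
        settings[key] = int(raw) if key in INT_KEYS else raw
    return settings


if __name__ == "__main__":
    print(load_settings())
```

Passing an explicit mapping, such as `load_settings({"MYSQL_PORT": "3307"})`, makes it easy to check overrides without touching the real environment.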
@ -49,79 +50,112 @@ Then to start the system server, and it provides HTTP backend services.

 - Install the Python packages

-```bash
-pip install -r requirements.txt
-```
+  ```bash
+  pip install -r requirements.txt
+  ```
 - Set configuration

-```bash
-vim src/config.py
-```
+  ```bash
+  vim src/config.py
+  ```

-Modify the parameters according to your own environment. Here listing some parameters that need to be set, for more information please refer to [config.py](./src/config.py).
+  Modify the parameters to match your environment. Some of the parameters that need to be set are listed below; for more information, refer to [config.py](./src/config.py).

-| **Parameter**    | **Description**                                       | **Default setting** |
-| ---------------- | ----------------------------------------------------- | ------------------- |
-| MILVUS_HOST      | The IP address of Milvus, you can get it by ifconfig. If running everything on one machine, most likely 127.0.0.1 | 127.0.0.1           |
-| MILVUS_PORT      | Port of Milvus.                                       | 19530               |
-| VECTOR_DIMENSION | Dimension of the vectors.                             | 2048                |
-| MYSQL_HOST       | The IP address of Mysql.                              | 127.0.0.1           |
-| MYSQL_PORT       | Port of Milvus.                                       | 3306                |
-| DEFAULT_TABLE    | The milvus and mysql default collection name.         | audio_table          |
+  | **Parameter**    | **Description**                                       | **Default setting** |
+  | ---------------- | ----------------------------------------------------- | ------------------- |
+  | MILVUS_HOST      | IP address of Milvus (find it with `ifconfig`). If everything runs on one machine, it is most likely 127.0.0.1. | 127.0.0.1           |
+  | MILVUS_PORT      | Port of Milvus.                                       | 19530               |
+  | VECTOR_DIMENSION | Dimension of the vectors.                             | 2048                |
+  | MYSQL_HOST       | IP address of MySQL.                                  | 127.0.0.1           |
+  | MYSQL_PORT       | Port of MySQL.                                        | 3306                |
+  | DEFAULT_TABLE    | Default collection name in Milvus and table name in MySQL. | audio_table         |

 - Run the code

-Then start the server with Fastapi.
+  Then start the server, which is built on FastAPI:

-```bash
-python src/main.py
-```
+  ```bash
+  python src/main.py
+  ```

-Then you will see the Application is started:
+  You should then see that the application has started:

-```bash
-INFO:     Started server process [3949]
-2022-03-07 17:39:14,864 ｜ INFO ｜ server.py ｜ serve ｜ 75 ｜ Started server process [3949]
-INFO:     Waiting for application startup.
-2022-03-07 17:39:14,865 ｜ INFO ｜ on.py ｜ startup ｜ 45 ｜ Waiting for application startup.
-INFO:     Application startup complete.
-2022-03-07 17:39:14,866 ｜ INFO ｜ on.py ｜ startup ｜ 59 ｜ Application startup complete.
-INFO:     Uvicorn running on http://127.0.0.1:8002 (Press CTRL+C to quit)
-2022-03-07 17:39:14,867 ｜ INFO ｜ server.py ｜ _log_started_message ｜ 206 ｜ Uvicorn running on http://127.0.0.1:8002 (Press CTRL+C to quit)
-```
+  ```bash
+  INFO:     Started server process [3949]
+  2022-03-07 17:39:14,864 ｜ INFO ｜ server.py ｜ serve ｜ 75 ｜ Started server process [3949]
+  INFO:     Waiting for application startup.
+  2022-03-07 17:39:14,865 ｜ INFO ｜ on.py ｜ startup ｜ 45 ｜ Waiting for application startup.
+  INFO:     Application startup complete.
+  2022-03-07 17:39:14,866 ｜ INFO ｜ on.py ｜ startup ｜ 59 ｜ Application startup complete.
+  INFO:     Uvicorn running on http://127.0.0.1:8002 (Press CTRL+C to quit)
+  2022-03-07 17:39:14,867 ｜ INFO ｜ server.py ｜ _log_started_message ｜ 206 ｜ Uvicorn running on http://127.0.0.1:8002 (Press CTRL+C to quit)
+  ```

 ### 3. Usage
 - Prepare data
  ```bash
  wget -c https://www.openslr.org/resources/82/cn-celeb_v2.tar.gz && tar -xvf cn-celeb_v2.tar.gz 
  ```
-  Note: If you want to build a quick demo, you can use ./src/test_main.py:download_audio_data function, it download 20 audio files , Subsequent results show this collection as an example
+  Note: To build a quick demo, you can use the `download_audio_data` function in `./src/test_main.py`, which downloads 20 audio files. The results below use this collection as an example.
 
- - Run
-  The internal process is downloading data, loading the Paddlespeech model, extracting embedding, storing library, retrieving and deleting library  
-  ```bash
-  python ./src/test_main.py
-  ```
+- Script test (recommended)

-  Output：
-  ```bash
-  Checkpoint path: %your model path%
-  Extracting feature from audio No. 1 , 20 audios in total
-  Extracting feature from audio No. 2 , 20 audios in total
-  ...
-  2022-03-09 17:22:13,870 ｜ INFO ｜ main.py ｜ load_audios ｜ 85 ｜ Successfully loaded data, total count: 20
-  2022-03-09 17:22:13,898 ｜ INFO ｜ main.py ｜ count_audio ｜ 147 ｜ Successfully count the number of data!
-  2022-03-09 17:22:13,918 ｜ INFO ｜ main.py ｜ audio_path ｜ 57 ｜ Successfully load audio: ./example_audio/test.wav
-  ...
-  2022-03-09 17:22:32,580 ｜ INFO ｜ main.py ｜ search_local_audio ｜ 131 ｜ search result http://testserver/data?audio_path=./example_audio/test.wav, distance 0.0
-  2022-03-09 17:22:32,580 ｜ INFO ｜ main.py ｜ search_local_audio ｜ 131 ｜ search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, distance 0.021805256605148315
-  2022-03-09 17:22:32,580 ｜ INFO ｜ main.py ｜ search_local_audio ｜ 131 ｜ search result http://testserver/data?audio_path=./example_audio/knife_cut_into_flesh.wav, distance 0.052762262523174286
-  ...
-  2022-03-09 17:22:32,582 ｜ INFO ｜ main.py ｜ search_local_audio ｜ 135 ｜ Successfully searched similar audio!
-  2022-03-09 17:22:33,658 ｜ INFO ｜ main.py ｜ drop_tables ｜ 159 ｜ Successfully drop tables in Milvus and MySQL!
-  ```
+    The script downloads the data, loads the PaddleSpeech model, extracts embeddings, builds the vector library, searches it, and finally deletes the library:
+    ```bash
+    python ./src/test_main.py
+    ```
+
+    Output:
+    ```bash
+    Checkpoint path: %your model path%
+    Extracting feature from audio No. 1 , 20 audios in total
+    Extracting feature from audio No. 2 , 20 audios in total
+    ...
+    2022-03-09 17:22:13,870 ｜ INFO ｜ main.py ｜ load_audios ｜ 85 ｜ Successfully loaded data, total count: 20
+    2022-03-09 17:22:13,898 ｜ INFO ｜ main.py ｜ count_audio ｜ 147 ｜ Successfully count the number of data!
+    2022-03-09 17:22:13,918 ｜ INFO ｜ main.py ｜ audio_path ｜ 57 ｜ Successfully load audio: ./example_audio/test.wav
+    ...
+    2022-03-09 17:22:32,580 ｜ INFO ｜ main.py ｜ search_local_audio ｜ 131 ｜ search result http://testserver/data?audio_path=./example_audio/test.wav, distance 0.0
+    2022-03-09 17:22:32,580 ｜ INFO ｜ main.py ｜ search_local_audio ｜ 131 ｜ search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, distance 0.021805256605148315
+    2022-03-09 17:22:32,580 ｜ INFO ｜ main.py ｜ search_local_audio ｜ 131 ｜ search result http://testserver/data?audio_path=./example_audio/knife_cut_into_flesh.wav, distance 0.052762262523174286
+    ...
+    2022-03-09 17:22:32,582 ｜ INFO ｜ main.py ｜ search_local_audio ｜ 135 ｜ Successfully searched similar audio!
+    2022-03-09 17:22:33,658 ｜ INFO ｜ main.py ｜ drop_tables ｜ 159 ｜ Successfully drop tables in Milvus and MySQL!
+    ```
+- GUI test (optional)
+  
+    Navigate to 127.0.0.1:8068 in your browser to access the front-end interface.
+    - Insert data
+
+      Download the data and extract it to a directory, for example `/home/speech/data`. Then enter `/home/speech/data` in the address bar of the upload page to upload the data.
+    
+      ![](./img/insert.png)
+
+    - Search for similar audio
+
+      Select the magnifying glass icon in the upper-left corner of the interface, click the "Default Target Audio File" button, and upload the .wav file you want to search with. The search results are then displayed.
+
+      ![](./img/search.png)
+
+### 4. Results
+
+Machine configuration:
+- OS: CentOS release 7.6
+- Kernel: 4.17.11-1.el7.elrepo.x86_64
+- CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
+- Memory: 132 GB
+
+Dataset:
+- CN-Celeb: 650,000 training samples, 10,000 test samples, vector dimension 256, L2 distance
+
+Recall and elapsed-time statistics are shown in the following figure:
+
+  ![](./img/result.png)
+
+
+Compared with other algorithms, the Milvus-based retrieval framework ranks in the middle in terms of speed and performance. At a recall rate of 90%, retrieval takes about 2.9 milliseconds, which is sufficient for most application scenarios.

-### 4.Pretrained Models
+### 5. Pretrained Models

 Here is a list of pretrained models released by PaddleSpeech:

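The Results section of this README reports recall together with retrieval time. The README does not state the exact recall definition used, so the following assumes a common one: recall@k is the fraction of each query's exact (brute-force) top-k neighbors that the approximate search also returns, averaged over all queries. `recall_at_k` is a hypothetical helper for illustration, not part of the demo.

```python
from typing import Sequence


def recall_at_k(
    approx: Sequence[Sequence[int]], exact: Sequence[Sequence[int]], k: int
) -> float:
    """Mean fraction of each query's exact top-k IDs found in its approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not exact or len(approx) != len(exact):
        raise ValueError("need one approximate and one exact result list per query")
    total = 0.0
    for found, truth in zip(approx, exact):
        truth_k = set(truth[:k])
        if not truth_k:
            raise ValueError("each exact result list must be non-empty")
        # Count how many true neighbors the approximate search recovered.
        total += len(truth_k & set(found[:k])) / len(truth_k)
    return total / len(exact)


if __name__ == "__main__":
    exact = [[1, 2, 3], [4, 5, 6]]   # ground-truth neighbor IDs per query
    approx = [[1, 2, 9], [4, 5, 6]]  # IDs returned by an approximate index
    print(recall_at_k(approx, exact, k=3))
```

Comparing against brute-force results keeps the metric independent of the index type, so the same helper could score any index configuration at several speed settings.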
--- a/demos/audio_searching/README_cn.md
+++ b/demos/audio_searching/README_cn.md
@ -12,9 +12,9 @@

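 In this demo, you will learn how to build an audio retrieval system that retrieves similar sound snippets. Uploaded audio clips are converted into vectors using PaddleSpeech pretrained models (audio classification model, speaker recognition model, etc.) and stored in Milvus. Milvus automatically generates a unique ID for each vector, and the ID and the corresponding audio information (audio ID, speaker ID, etc.) are then stored in MySQL, which completes the library construction. At retrieval time, the user uploads test audio to obtain a vector, and a vector similarity search is then run in Milvus. Milvus returns vector IDs as the retrieval result, and the corresponding audio information can be looked up in MySQL by ID.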

-![Audio retrieval chart](./img/audio_searching.png)
+![Audio retrieval flowchart](./img/audio_searching.png)

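-Note: this demo uses the [CN-Celeb](http://openslr.org/82/) dataset, with at least 650,000 audio clips and 3,000 speakers, to build the audio vector library (audio features, or audio speaker features), and then performs audio (or speaker) retrieval using a preset distance calculation. Other datasets can also be used and adjusted as needed, such as Librispeech, VoxCeleb, and UrbanSound.
+Note: this demo uses the [CN-Celeb](http://openslr.org/82/) dataset, with at least 650,000 audio clips and 3,000 speakers, to build the audio vector library (audio features, or audio speaker features), and then performs audio (or speaker) retrieval using a preset distance calculation. Other datasets can also be used and adjusted as needed, such as Librispeech, VoxCeleb, UrbanSound, GloVe, and MNIST.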

 ## Usage
 ### 1. MySQL and Milvus Installation
@ -129,13 +129,34 @@ ffce340b3790  minio/minio:RELEASE.2020-12-03T00-03-10Z  "/usr/bin/docker-ent…"
    Enter 127.0.0.1:8068 in your browser to access the front-end page.
    - Upload audio
    
+      Download the data and extract it to a directory, for example /home/speech/data. Then enter /home/speech/data in the address bar of the upload page to upload the data.
+    
      ![](./img/insert.png)

    - Search for similar audio

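+      Select the magnifying glass in the upper-left corner, click the "Default Target Audio File" button, and upload a test audio file. The search results are then displayed.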
+
      ![](./img/search.png)

-### 4. Pretrained Models
+### 4. Results
+
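+Machine configuration:
+- OS: CentOS release 7.6
+- Kernel: 4.17.11-1.el7.elrepo.x86_64
+- CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
+- Memory: 132 GB
+
+Dataset:
+- CN-Celeb: 650,000 training samples, 10,000 test samples, vector dimension 256, L2 distance
+
+Recall and elapsed-time statistics are shown in the following figure: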
+
+  ![](./img/result.png)
+
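+Compared with other algorithms, the Milvus-based retrieval framework ranks in the middle in terms of speed and performance. At a recall rate of 90%, retrieval takes about 2.9 milliseconds, which is sufficient for most application scenarios.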
+
+### 5. Pretrained Models

 Here is a list of pretrained models provided by PaddleSpeech:

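The GUI upload step expects the extracted data in a directory such as `/home/speech/data`. Before uploading, it can help to check which audio files that directory actually contains. The following is a hypothetical sketch, not part of the demo, that recursively lists `.wav` files; whether the upload page also accepts other formats or nested directories is not stated in this README, so treat the recursive search as an assumption.

```python
from pathlib import Path
from typing import List


def list_wav_files(data_dir: str) -> List[Path]:
    """Hypothetical helper: sorted .wav files under data_dir, searched recursively."""
    root = Path(data_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{data_dir} is not a directory")
    # Match the extension case-insensitively, so both .wav and .WAV are included.
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".wav"
    )
```

For example, `list_wav_files("/home/speech/data")` returns the `.wav` paths found under the example directory, or raises `NotADirectoryError` if the path does not exist.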
--- a/demos/audio_searching/img/result.png
+++ b/demos/audio_searching/img/result.png