diff --git a/README.md b/README.md
index 1144d3ab5..5093dbd67 100644
--- a/README.md
+++ b/README.md
@@ -180,7 +180,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
2021.12.14: We would like to have an online course to introduce the basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
--->
- 👏🏻 2022.03.28: PaddleSpeech Server is available for Audio Classification, Automatic Speech Recognition and Text-to-Speech.
-- 👏🏻 2022.03.28: PaddleSpeech CLI is available for Speaker Verfication.
+- 👏🏻 2022.03.28: PaddleSpeech CLI is available for Speaker Verification.
- 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
- 👏🏻 2021.12.10: PaddleSpeech CLI is available for Audio Classification, Automatic Speech Recognition, Speech Translation (English to Chinese) and Text-to-Speech.
@@ -280,10 +280,14 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
For more information about server command lines, please see: [speech server demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)
+
+
## Model List
PaddleSpeech supports a series of the most popular models. They are summarized in [released models](./docs/source/released_model.md), with available pretrained models attached.
+
+
**Speech-to-Text** contains *Acoustic Model*, *Language Model*, and *Speech Translation*, with the following details:
@@ -357,6 +361,8 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
+
+
**Text-to-Speech** in PaddleSpeech mainly contains three modules: *Text Frontend*, *Acoustic Model* and *Vocoder*. Acoustic Model and Vocoder models are listed as follows:
@@ -457,10 +463,10 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
- GE2E + Tactron2 |
+ GE2E + Tacotron2 |
AISHELL-3 |
- ge2e-tactron2-aishell3
+ ge2e-tacotron2-aishell3
|
@@ -473,6 +479,8 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
+
+
**Audio Classification**
@@ -496,6 +504,8 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
+
+
**Speaker Verification**
@@ -519,6 +529,8 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
+
+
**Punctuation Restoration**
@@ -559,10 +571,18 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht
- [Advanced Usage](./docs/source/tts/advanced_usage.md)
- [Chinese Rule Based Text Frontend](./docs/source/tts/zh_text_frontend.md)
- [Test Audio Samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html)
+ - Speaker Verification
+ - [Audio Searching](./demos/audio_searching/README.md)
+ - [Speaker Verification](./demos/speaker_verification/README.md)
- [Audio Classification](./demos/audio_tagging/README.md)
- - [Speaker Verification](./demos/speaker_verification/README.md)
- [Speech Translation](./demos/speech_translation/README.md)
+ - [Speech Server](./demos/speech_server/README.md)
- [Released Models](./docs/source/released_model.md)
+ - [Speech-to-Text](#SpeechToText)
+ - [Text-to-Speech](#TextToSpeech)
+ - [Audio Classification](#AudioClassification)
+ - [Speaker Verification](#SpeakerVerification)
+ - [Punctuation Restoration](#PunctuationRestoration)
- [Community](#Community)
- [Welcome to contribute](#contribution)
- [License](#License)
diff --git a/README_cn.md b/README_cn.md
index ab4ce6e6b..5dab7fa0c 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -273,6 +273,8 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
## 模型列表
PaddleSpeech 支持很多主流的模型,并提供了预训练模型,详情请见[模型列表](./docs/source/released_model.md)。
+
+
PaddleSpeech 的 **语音转文本** 包含语音识别声学模型、语音识别语言模型和语音翻译, 详情如下:
@@ -347,6 +349,7 @@ PaddleSpeech 的 **语音转文本** 包含语音识别声学模型、语音识
+
PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声学模型和声码器。声学模型和声码器模型如下:
+
+
**声纹识别**
@@ -511,6 +516,8 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
+
+
**标点恢复**
@@ -556,13 +563,18 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
- [进阶用法](./docs/source/tts/advanced_usage.md)
- [中文文本前端](./docs/source/tts/zh_text_frontend.md)
- [测试语音样本](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html)
+ - 声纹识别
+ - [声纹识别](./demos/speaker_verification/README_cn.md)
+ - [音频检索](./demos/audio_searching/README_cn.md)
- [声音分类](./demos/audio_tagging/README_cn.md)
- - [声纹识别](./demos/speaker_verification/README_cn.md)
- [语音翻译](./demos/speech_translation/README_cn.md)
+ - [服务化部署](./demos/speech_server/README_cn.md)
- [模型列表](#模型列表)
- [语音识别](#语音识别模型)
- [语音合成](#语音合成模型)
- [声音分类](#声音分类模型)
+ - [声纹识别](#声纹识别模型)
+ - [标点恢复](#标点恢复模型)
- [技术交流群](#技术交流群)
- [欢迎贡献](#欢迎贡献)
- [License](#License)
diff --git a/dataset/rir_noise/rir_noise.py b/dataset/rir_noise/rir_noise.py
index e7b122890..009175e5b 100644
--- a/dataset/rir_noise/rir_noise.py
+++ b/dataset/rir_noise/rir_noise.py
@@ -34,14 +34,14 @@ from utils.utility import unzip
DATA_HOME = os.path.expanduser('~/.cache/paddle/dataset/speech')
-URL_ROOT = 'http://www.openslr.org/resources/28'
+URL_ROOT = '--no-check-certificate http://www.openslr.org/resources/28'
DATA_URL = URL_ROOT + '/rirs_noises.zip'
MD5_DATA = 'e6f48e257286e05de56413b4779d8ffb'
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--target_dir",
- default=DATA_HOME + "/Aishell",
+ default=DATA_HOME + "/rirs_noise",
type=str,
help="Directory to save the dataset. (default: %(default)s)")
parser.add_argument(
@@ -81,6 +81,10 @@ def create_manifest(data_dir, manifest_path_prefix):
},
ensure_ascii=False))
manifest_path = manifest_path_prefix + '.' + dtype
+
+    if not os.path.exists(os.path.dirname(manifest_path)):
+        os.makedirs(os.path.dirname(manifest_path))
+
with codecs.open(manifest_path, 'w', 'utf-8') as fout:
for line in json_lines:
fout.write(line + '\n')
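The `--no-check-certificate` prefix smuggled into `URL_ROOT` above only works because the repo's `utils.utility.download` helper splices the url string into a `wget` shell command line. A minimal sketch of that assumed behavior (not the helper's exact code; md5 verification omitted):

```python
import os


def download(url: str, md5sum: str, target_dir: str) -> str:
    """Sketch of the assumed utils.utility.download behavior.

    Because `url` is pasted verbatim into a wget command, callers can
    prepend extra flags such as --no-check-certificate to the string.
    """
    os.makedirs(target_dir, exist_ok=True)
    # split("/")[-1] still yields the file name even with flags prepended
    filepath = os.path.join(target_dir, url.split("/")[-1])
    if not os.path.exists(filepath):
        os.system("wget -c " + url + " -P " + target_dir)
    return filepath
```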
diff --git a/dataset/voxceleb/voxceleb1.py b/dataset/voxceleb/voxceleb1.py
index 905862008..95827f708 100644
--- a/dataset/voxceleb/voxceleb1.py
+++ b/dataset/voxceleb/voxceleb1.py
@@ -149,7 +149,7 @@ def prepare_dataset(base_url, data_list, target_dir, manifest_path,
# we will download the voxceleb1 data to ${target_dir}/vox1/dev/ or ${target_dir}/vox1/test directory
if not os.path.exists(os.path.join(target_dir, "wav")):
# download all dataset part
- print("start to download the vox1 dev zip package")
+ print(f"start to download the vox1 zip package to {target_dir}")
for zip_part in data_list.keys():
download_url = " --no-check-certificate " + base_url + "/" + zip_part
download(
diff --git a/dataset/voxceleb/voxceleb2.py b/dataset/voxceleb/voxceleb2.py
index 22a2e2ffe..fe9e8b9c8 100644
--- a/dataset/voxceleb/voxceleb2.py
+++ b/dataset/voxceleb/voxceleb2.py
@@ -22,10 +22,12 @@ import codecs
import glob
import json
import os
+import subprocess
from pathlib import Path
import soundfile
+from utils.utility import check_md5sum
from utils.utility import download
from utils.utility import unzip
@@ -35,12 +37,22 @@ DATA_HOME = os.path.expanduser('.')
BASE_URL = "--no-check-certificate https://www.robots.ox.ac.uk/~vgg/data/voxceleb/data/"
# dev data
-DEV_DATA_URL = BASE_URL + '/vox2_aac.zip'
-DEV_MD5SUM = "bbc063c46078a602ca71605645c2a402"
+DEV_LIST = {
+ "vox2_dev_aac_partaa": "da070494c573e5c0564b1d11c3b20577",
+ "vox2_dev_aac_partab": "17fe6dab2b32b48abaf1676429cdd06f",
+ "vox2_dev_aac_partac": "1de58e086c5edf63625af1cb6d831528",
+ "vox2_dev_aac_partad": "5a043eb03e15c5a918ee6a52aad477f9",
+ "vox2_dev_aac_partae": "cea401b624983e2d0b2a87fb5d59aa60",
+ "vox2_dev_aac_partaf": "fc886d9ba90ab88e7880ee98effd6ae9",
+ "vox2_dev_aac_partag": "d160ecc3f6ee3eed54d55349531cb42e",
+ "vox2_dev_aac_partah": "6b84a81b9af72a9d9eecbb3b1f602e65",
+}
+
+DEV_TARGET_DATA = "vox2_dev_aac_parta* vox2_dev_aac.zip bbc063c46078a602ca71605645c2a402"
# test data
-TEST_DATA_URL = BASE_URL + '/vox2_test_aac.zip'
-TEST_MD5SUM = "0d2b3ea430a821c33263b5ea37ede312"
+TEST_LIST = {"vox2_test_aac.zip": "0d2b3ea430a821c33263b5ea37ede312"}
+TEST_TARGET_DATA = "vox2_test_aac.zip vox2_test_aac.zip 0d2b3ea430a821c33263b5ea37ede312"
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
@@ -68,6 +80,14 @@ args = parser.parse_args()
def create_manifest(data_dir, manifest_path_prefix):
+ """Generate the voxceleb2 dataset manifest file.
+ We will create the ${manifest_path_prefix}.vox2 as the final manifest file
+ The dev and test wav info will be put in one manifest file.
+
+ Args:
+ data_dir (str): voxceleb2 wav directory, which include dev and test subdataset
+ manifest_path_prefix (str): manifest file prefix
+ """
print("Creating manifest %s ..." % manifest_path_prefix)
json_lines = []
data_path = os.path.join(data_dir, "**", "*.wav")
@@ -119,7 +139,19 @@ def create_manifest(data_dir, manifest_path_prefix):
print(f"{total_sec / total_num} sec/utt", file=f)
-def download_dataset(url, md5sum, target_dir, dataset):
+def download_dataset(base_url, data_list, target_data, target_dir, dataset):
+ """Download the voxceleb2 zip package
+
+ Args:
+ base_url (str): the voxceleb2 dataset download baseline url
+ data_list (dict): the dataset part zip package and the md5 value
+ target_data (str): the final dataset zip info
+ target_dir (str): the dataset stored directory
+ dataset (str): the dataset name, dev or test
+
+ Raises:
+ RuntimeError: the md5sum occurs error
+ """
if not os.path.exists(target_dir):
os.makedirs(target_dir)
@@ -129,9 +161,34 @@ def download_dataset(url, md5sum, target_dir, dataset):
# but the test dataset will unzip to aac
# so, we create the ${target_dir}/test and unzip the m4a to the test dir
if not os.path.exists(os.path.join(target_dir, dataset)):
- filepath = download(url, md5sum, target_dir)
+ print(f"start to download the vox2 zip package to {target_dir}")
+ for zip_part in data_list.keys():
+ download_url = " --no-check-certificate " + base_url + "/" + zip_part
+ download(
+ url=download_url,
+ md5sum=data_list[zip_part],
+ target_dir=target_dir)
+
+ # pack the all part to target zip file
+ all_target_part, target_name, target_md5sum = target_data.split()
+ target_name = os.path.join(target_dir, target_name)
+ if not os.path.exists(target_name):
+ pack_part_cmd = "cat {}/{} > {}".format(target_dir, all_target_part,
+ target_name)
+ subprocess.call(pack_part_cmd, shell=True)
+
+ # check the target zip file md5sum
+ if not check_md5sum(target_name, target_md5sum):
+ raise RuntimeError("{} MD5 checkssum failed".format(target_name))
+ else:
+ print("Check {} md5sum successfully".format(target_name))
+
if dataset == "test":
- unzip(filepath, os.path.join(target_dir, "test"))
+            # we need to make the test directory
+            unzip(target_name, os.path.join(target_dir, "test"))
+        else:
+            # unzip the dev zip package, which will create the dev directory
+            unzip(target_name, target_dir)
def main():
@@ -142,14 +199,16 @@ def main():
print("download: {}".format(args.download))
if args.download:
download_dataset(
- url=DEV_DATA_URL,
- md5sum=DEV_MD5SUM,
+ base_url=BASE_URL,
+ data_list=DEV_LIST,
+ target_data=DEV_TARGET_DATA,
target_dir=args.target_dir,
dataset="dev")
download_dataset(
- url=TEST_DATA_URL,
- md5sum=TEST_MD5SUM,
+ base_url=BASE_URL,
+ data_list=TEST_LIST,
+ target_data=TEST_TARGET_DATA,
target_dir=args.target_dir,
dataset="test")
diff --git a/demos/speaker_verification/README.md b/demos/speaker_verification/README.md
index 8739d402d..7d7180ae9 100644
--- a/demos/speaker_verification/README.md
+++ b/demos/speaker_verification/README.md
@@ -30,6 +30,11 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
paddlespeech vector --task spk --input vec.job
echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk
+
+ paddlespeech vector --task score --input "./85236145389.wav ./123456789.wav"
+
+ echo -e "demo4 85236145389.wav 85236145389.wav \n demo5 85236145389.wav 123456789.wav" > vec.job
+ paddlespeech vector --task score --input vec.job
```
Usage:
@@ -38,6 +43,7 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
```
Arguments:
- `input`(required): Audio file to recognize.
+ - `task`: Task of `vector`, either `spk` or `score`. Default: `spk`.
- `model`: Model type of vector task. Default: `ecapatdnn_voxceleb12`.
- `sample_rate`: Sample rate of the model. Default: `16000`.
- `config`: Config of vector task. Use pretrained model when it is None. Default: `None`.
@@ -47,45 +53,45 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
Output:
```bash
- demo [ -5.749211 9.505463 -8.200284 -5.2075014 5.3940268
- -3.04878 1.611095 10.127234 -10.534177 -15.821609
- 1.2032688 -0.35080156 1.2629458 -12.643498 -2.5758228
- -11.343508 2.3385992 -8.719341 14.213509 15.404744
- -0.39327756 6.338786 2.688887 8.7104025 17.469526
- -8.77959 7.0576906 4.648855 -1.3089896 -23.294737
- 8.013747 13.891729 -9.926753 5.655307 -5.9422326
- -22.842539 0.6293588 -18.46266 -10.811862 9.8192625
- 3.0070958 3.8072643 -2.3861165 3.0821571 -14.739942
- 1.7594414 -0.6485091 4.485623 2.0207152 7.264915
- -6.40137 23.63524 2.9711294 -22.708025 9.93719
- 20.354511 -10.324688 -0.700492 -8.783211 -5.27593
- 15.999649 3.3004563 12.747926 15.429879 4.7849145
- 5.6699696 -2.3826702 10.605882 3.9112158 3.1500628
- 15.859915 -2.1832209 -23.908653 -6.4799504 -4.5365124
- -9.224193 14.568347 -10.568833 4.982321 -4.342062
- 0.0914714 12.645902 -5.74285 -3.2141201 -2.7173362
- -6.680575 0.4757669 -5.035051 -6.7964664 16.865469
- -11.54324 7.681869 0.44475392 9.708182 -8.932846
- 0.4123232 -4.361452 1.3948607 9.511665 0.11667654
- 2.9079323 6.049952 9.275183 -18.078873 6.2983274
- -0.7500531 -2.725033 -7.6027865 3.3404543 2.990815
- 4.010979 11.000591 -2.8873312 7.1352735 -16.79663
- 18.495346 -14.293832 7.89578 2.2714825 22.976387
- -4.875734 -3.0836344 -2.9999814 13.751918 6.448228
- -11.924197 2.171869 2.0423572 -6.173772 10.778437
- 25.77281 -4.9495463 14.57806 0.3044315 2.6132357
- -7.591999 -2.076944 9.025118 1.7834753 -3.1799617
- -4.9401326 23.465864 5.1685796 -9.018578 9.037825
- -4.4150195 6.859591 -12.274467 -0.88911164 5.186309
- -3.9988663 -13.638606 -9.925445 -0.06329413 -3.6709652
- -12.397416 -12.719869 -1.395601 2.1150916 5.7381287
- -4.4691963 -3.82819 -0.84233856 -1.1604277 -13.490127
- 8.731719 -20.778936 -11.495662 5.8033476 -4.752041
- 10.833007 -6.717991 4.504732 13.4244375 1.1306485
- 7.3435574 1.400918 14.704036 -9.501399 7.2315617
- -6.417456 1.3333273 11.872697 -0.30664724 8.8845
- 6.5569253 4.7948146 0.03662816 -8.704245 6.224871
- -3.2701402 -11.508579 ]
+ demo [ 1.4217498 5.626253 -5.342073 1.1773866 3.308055
+ 1.756596 5.167894 10.80636 -3.8226728 -5.6141334
+ 2.623845 -0.8072968 1.9635103 -7.3128724 0.01103897
+ -9.723131 0.6619743 -6.976803 10.213478 7.494748
+ 2.9105635 3.8949256 3.7999806 7.1061673 16.905321
+ -7.1493764 8.733103 3.4230042 -4.831653 -11.403367
+ 11.232214 7.1274667 -4.2828417 2.452362 -5.130748
+ -18.177666 -2.6116815 -11.000337 -6.7314315 1.6564683
+ 0.7618269 1.1253023 -2.083836 4.725744 -8.782597
+ -3.539873 3.814236 5.1420674 2.162061 4.096431
+ -6.4162116 12.747448 1.9429878 -15.152943 6.417416
+ 16.097002 -9.716668 -1.9920526 -3.3649497 -1.871939
+ 11.567354 3.69788 11.258265 7.442363 9.183411
+ 4.5281515 -1.2417862 4.3959084 6.6727695 5.8898783
+ 7.627124 -0.66919386 -11.889693 -9.208865 -7.4274073
+ -3.7776625 6.917234 -9.848748 -2.0944717 -5.135116
+ 0.49563864 9.317534 -5.9141874 -1.8098574 -0.11738578
+ -7.169265 -1.0578263 -5.7216787 -5.1173844 16.137651
+ -4.473626 7.6624317 -0.55381083 9.631587 -6.4704556
+ -8.548508 4.3716145 -0.79702514 4.478997 -2.9758704
+ 3.272176 2.8382776 5.134597 -9.190781 -0.5657382
+ -4.8745747 2.3165567 -5.984303 -2.1798875 0.35541576
+ -0.31784213 9.493548 2.1144536 4.358092 -12.089823
+ 8.451689 -7.925461 4.6242585 4.4289427 18.692003
+ -2.6204622 -5.149185 -0.35821092 8.488551 4.981496
+ -9.32683 -2.2544234 6.6417594 1.2119585 10.977129
+ 16.555033 3.3238444 9.551863 -1.6676947 -0.79539716
+ -8.605674 -0.47356385 2.6741948 -5.359179 -2.6673796
+ 0.66607 15.443222 4.740594 -3.4725387 11.592567
+ -2.054497 1.7361217 -8.265324 -9.30447 5.4068313
+ -1.5180256 -7.746615 -6.089606 0.07112726 -0.34904733
+ -8.649895 -9.998958 -2.564841 -0.53999114 2.601808
+ -0.31927416 -1.8815292 -2.07215 -3.4105783 -8.2998085
+ 1.483641 -15.365992 -8.288208 3.8847756 -3.4876456
+ 7.3629923 0.4657332 3.132599 12.438889 -1.8337058
+ 4.532936 2.7264361 10.145339 -6.521951 2.897153
+ -3.3925855 5.079156 7.759716 4.677565 5.8457737
+ 2.402413 7.7071047 3.9711342 -6.390043 6.1268735
+ -3.7760346 -11.118123 ]
```
- Python API
@@ -97,56 +103,113 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
audio_emb = vector_executor(
model='ecapatdnn_voxceleb12',
sample_rate=16000,
- config=None,
+ config=None, # Set `config` and `ckpt_path` to None to use pretrained model.
ckpt_path=None,
audio_file='./85236145389.wav',
- force_yes=False,
device=paddle.get_device())
print('Audio embedding Result: \n{}'.format(audio_emb))
+
+ test_emb = vector_executor(
+ model='ecapatdnn_voxceleb12',
+ sample_rate=16000,
+ config=None, # Set `config` and `ckpt_path` to None to use pretrained model.
+ ckpt_path=None,
+ audio_file='./123456789.wav',
+ device=paddle.get_device())
+ print('Test embedding Result: \n{}'.format(test_emb))
+
+ # score range [0, 1]
+ score = vector_executor.get_embeddings_score(audio_emb, test_emb)
+ print(f"Eembeddings Score: {score}")
```
- Output:
+ Output:
+
```bash
# Vector Result:
- [ -5.749211 9.505463 -8.200284 -5.2075014 5.3940268
- -3.04878 1.611095 10.127234 -10.534177 -15.821609
- 1.2032688 -0.35080156 1.2629458 -12.643498 -2.5758228
- -11.343508 2.3385992 -8.719341 14.213509 15.404744
- -0.39327756 6.338786 2.688887 8.7104025 17.469526
- -8.77959 7.0576906 4.648855 -1.3089896 -23.294737
- 8.013747 13.891729 -9.926753 5.655307 -5.9422326
- -22.842539 0.6293588 -18.46266 -10.811862 9.8192625
- 3.0070958 3.8072643 -2.3861165 3.0821571 -14.739942
- 1.7594414 -0.6485091 4.485623 2.0207152 7.264915
- -6.40137 23.63524 2.9711294 -22.708025 9.93719
- 20.354511 -10.324688 -0.700492 -8.783211 -5.27593
- 15.999649 3.3004563 12.747926 15.429879 4.7849145
- 5.6699696 -2.3826702 10.605882 3.9112158 3.1500628
- 15.859915 -2.1832209 -23.908653 -6.4799504 -4.5365124
- -9.224193 14.568347 -10.568833 4.982321 -4.342062
- 0.0914714 12.645902 -5.74285 -3.2141201 -2.7173362
- -6.680575 0.4757669 -5.035051 -6.7964664 16.865469
- -11.54324 7.681869 0.44475392 9.708182 -8.932846
- 0.4123232 -4.361452 1.3948607 9.511665 0.11667654
- 2.9079323 6.049952 9.275183 -18.078873 6.2983274
- -0.7500531 -2.725033 -7.6027865 3.3404543 2.990815
- 4.010979 11.000591 -2.8873312 7.1352735 -16.79663
- 18.495346 -14.293832 7.89578 2.2714825 22.976387
- -4.875734 -3.0836344 -2.9999814 13.751918 6.448228
- -11.924197 2.171869 2.0423572 -6.173772 10.778437
- 25.77281 -4.9495463 14.57806 0.3044315 2.6132357
- -7.591999 -2.076944 9.025118 1.7834753 -3.1799617
- -4.9401326 23.465864 5.1685796 -9.018578 9.037825
- -4.4150195 6.859591 -12.274467 -0.88911164 5.186309
- -3.9988663 -13.638606 -9.925445 -0.06329413 -3.6709652
- -12.397416 -12.719869 -1.395601 2.1150916 5.7381287
- -4.4691963 -3.82819 -0.84233856 -1.1604277 -13.490127
- 8.731719 -20.778936 -11.495662 5.8033476 -4.752041
- 10.833007 -6.717991 4.504732 13.4244375 1.1306485
- 7.3435574 1.400918 14.704036 -9.501399 7.2315617
- -6.417456 1.3333273 11.872697 -0.30664724 8.8845
- 6.5569253 4.7948146 0.03662816 -8.704245 6.224871
- -3.2701402 -11.508579 ]
+ Audio embedding Result:
+ [ 1.4217498 5.626253 -5.342073 1.1773866 3.308055
+ 1.756596 5.167894 10.80636 -3.8226728 -5.6141334
+ 2.623845 -0.8072968 1.9635103 -7.3128724 0.01103897
+ -9.723131 0.6619743 -6.976803 10.213478 7.494748
+ 2.9105635 3.8949256 3.7999806 7.1061673 16.905321
+ -7.1493764 8.733103 3.4230042 -4.831653 -11.403367
+ 11.232214 7.1274667 -4.2828417 2.452362 -5.130748
+ -18.177666 -2.6116815 -11.000337 -6.7314315 1.6564683
+ 0.7618269 1.1253023 -2.083836 4.725744 -8.782597
+ -3.539873 3.814236 5.1420674 2.162061 4.096431
+ -6.4162116 12.747448 1.9429878 -15.152943 6.417416
+ 16.097002 -9.716668 -1.9920526 -3.3649497 -1.871939
+ 11.567354 3.69788 11.258265 7.442363 9.183411
+ 4.5281515 -1.2417862 4.3959084 6.6727695 5.8898783
+ 7.627124 -0.66919386 -11.889693 -9.208865 -7.4274073
+ -3.7776625 6.917234 -9.848748 -2.0944717 -5.135116
+ 0.49563864 9.317534 -5.9141874 -1.8098574 -0.11738578
+ -7.169265 -1.0578263 -5.7216787 -5.1173844 16.137651
+ -4.473626 7.6624317 -0.55381083 9.631587 -6.4704556
+ -8.548508 4.3716145 -0.79702514 4.478997 -2.9758704
+ 3.272176 2.8382776 5.134597 -9.190781 -0.5657382
+ -4.8745747 2.3165567 -5.984303 -2.1798875 0.35541576
+ -0.31784213 9.493548 2.1144536 4.358092 -12.089823
+ 8.451689 -7.925461 4.6242585 4.4289427 18.692003
+ -2.6204622 -5.149185 -0.35821092 8.488551 4.981496
+ -9.32683 -2.2544234 6.6417594 1.2119585 10.977129
+ 16.555033 3.3238444 9.551863 -1.6676947 -0.79539716
+ -8.605674 -0.47356385 2.6741948 -5.359179 -2.6673796
+ 0.66607 15.443222 4.740594 -3.4725387 11.592567
+ -2.054497 1.7361217 -8.265324 -9.30447 5.4068313
+ -1.5180256 -7.746615 -6.089606 0.07112726 -0.34904733
+ -8.649895 -9.998958 -2.564841 -0.53999114 2.601808
+ -0.31927416 -1.8815292 -2.07215 -3.4105783 -8.2998085
+ 1.483641 -15.365992 -8.288208 3.8847756 -3.4876456
+ 7.3629923 0.4657332 3.132599 12.438889 -1.8337058
+ 4.532936 2.7264361 10.145339 -6.521951 2.897153
+ -3.3925855 5.079156 7.759716 4.677565 5.8457737
+ 2.402413 7.7071047 3.9711342 -6.390043 6.1268735
+ -3.7760346 -11.118123 ]
+ # get the test embedding
+ Test embedding Result:
+ [ -1.902964 2.0690894 -8.034194 3.5472693 0.18089125
+ 6.9085927 1.4097427 -1.9487704 -10.021278 -0.20755845
+ -8.04332 4.344489 2.3200977 -14.306299 5.184692
+ -11.55602 -3.8497238 0.6444722 1.2833948 2.6766639
+ 0.5878921 0.7946299 1.7207596 2.5791872 14.998469
+ -1.3385371 15.031221 -0.8006958 1.99287 -9.52007
+ 2.435466 4.003221 -4.33817 -4.898601 -5.304714
+ -18.033886 10.790787 -12.784645 -5.641755 2.9761686
+ -10.566622 1.4839455 6.152458 -5.7195854 2.8603241
+ 6.112133 8.489869 5.5958056 1.2836679 -1.2293907
+ 0.89927405 7.0288725 -2.854029 -0.9782962 5.8255906
+ 14.905906 -5.025907 0.7866458 -4.2444224 -16.354029
+ 10.521315 0.9604709 -3.3257897 7.144871 -13.592733
+ -8.568869 -1.7953678 0.26313916 10.916714 -6.9374123
+ 1.857403 -6.2746415 2.8154466 -7.2338667 -2.293357
+ -0.05452765 5.4287076 5.0849075 -6.690375 -1.6183422
+ 3.654291 0.94352573 -9.200294 -5.4749465 -3.5235846
+ 1.3420814 4.240421 -2.772944 -2.8451524 16.311104
+ 4.2969875 -1.762936 -12.5758915 8.595198 -0.8835239
+ -1.5708797 1.568961 1.1413603 3.5032008 -0.45251232
+ -6.786333 16.89443 5.3366146 -8.789056 0.6355629
+ 3.2579517 -3.328322 7.5969577 0.66025066 -6.550468
+ -9.148656 2.020372 -0.4615173 1.1965656 -3.8764873
+ 11.6562195 -6.0750933 12.182899 3.2218833 0.81969476
+ 5.570001 -3.8459578 -7.205299 7.9262037 -7.6611166
+ -5.249467 -2.2671914 7.2658715 -13.298164 4.821147
+ -2.7263982 11.691089 -3.8918593 -2.838112 -1.0336838
+ -3.8034165 2.8536487 -5.60398 -1.1972581 1.3455094
+ -3.4903061 2.2408795 5.5010734 -3.970756 11.99696
+ -7.8858757 0.43160373 -5.5059714 4.3426995 16.322706
+ 11.635366 0.72157705 -9.245714 -3.91465 -4.449838
+ -1.5716927 7.713747 -2.2430465 -6.198303 -13.481864
+ 2.8156567 -5.7812386 5.1456156 2.7289324 -14.505571
+ 13.270688 3.448231 -7.0659585 4.5886116 -4.466099
+ -0.296428 -11.463529 -2.6076477 14.110243 -6.9725137
+ -1.9962958 2.7119343 19.391657 0.01961198 14.607133
+ -1.6695905 -4.391516 1.3131028 -6.670972 -5.888604
+ 12.0612335 5.9285784 3.3715196 1.492534 10.723728
+ -0.95514804 -12.085431 ]
+ # get the score between enroll and test
+ Embeddings Score: 0.4292638301849365
```
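`get_embeddings_score` itself is not shown in this patch; judging from the `# score range [0, 1]` comment, it is presumably a cosine similarity between the enroll and test embeddings rescaled into [0, 1]. A minimal NumPy sketch under that assumption (the actual `VectorExecutor` method may differ):

```python
import numpy as np


def get_embeddings_score(emb1: np.ndarray, emb2: np.ndarray) -> float:
    """Hypothetical re-implementation: cosine similarity mapped to [0, 1].

    Illustrates why the demo score above (~0.43) sits inside [0, 1].
    """
    cos = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
    return float((cos + 1.0) / 2.0)
```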
### 4.Pretrained Models
diff --git a/demos/speaker_verification/README_cn.md b/demos/speaker_verification/README_cn.md
index fe8949b3c..db382f298 100644
--- a/demos/speaker_verification/README_cn.md
+++ b/demos/speaker_verification/README_cn.md
@@ -29,6 +29,11 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
paddlespeech vector --task spk --input vec.job
echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk
+
+ paddlespeech vector --task score --input "./85236145389.wav ./123456789.wav"
+
+ echo -e "demo4 85236145389.wav 85236145389.wav \n demo5 85236145389.wav 123456789.wav" > vec.job
+ paddlespeech vector --task score --input vec.job
```
使用方法:
@@ -37,6 +42,7 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
```
参数:
- `input`(必须输入):用于识别的音频文件。
+ - `task`:指定 `vector` 任务,可选 `spk` 或 `score`,默认值:`spk`。
- `model`:声纹任务的模型,默认值:`ecapatdnn_voxceleb12`。
- `sample_rate`:音频采样率,默认值:`16000`。
- `config`:声纹任务的参数文件,若不设置则使用预训练模型中的默认配置,默认值:`None`。
@@ -45,45 +51,45 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
输出:
```bash
- demo [ -5.749211 9.505463 -8.200284 -5.2075014 5.3940268
- -3.04878 1.611095 10.127234 -10.534177 -15.821609
- 1.2032688 -0.35080156 1.2629458 -12.643498 -2.5758228
- -11.343508 2.3385992 -8.719341 14.213509 15.404744
- -0.39327756 6.338786 2.688887 8.7104025 17.469526
- -8.77959 7.0576906 4.648855 -1.3089896 -23.294737
- 8.013747 13.891729 -9.926753 5.655307 -5.9422326
- -22.842539 0.6293588 -18.46266 -10.811862 9.8192625
- 3.0070958 3.8072643 -2.3861165 3.0821571 -14.739942
- 1.7594414 -0.6485091 4.485623 2.0207152 7.264915
- -6.40137 23.63524 2.9711294 -22.708025 9.93719
- 20.354511 -10.324688 -0.700492 -8.783211 -5.27593
- 15.999649 3.3004563 12.747926 15.429879 4.7849145
- 5.6699696 -2.3826702 10.605882 3.9112158 3.1500628
- 15.859915 -2.1832209 -23.908653 -6.4799504 -4.5365124
- -9.224193 14.568347 -10.568833 4.982321 -4.342062
- 0.0914714 12.645902 -5.74285 -3.2141201 -2.7173362
- -6.680575 0.4757669 -5.035051 -6.7964664 16.865469
- -11.54324 7.681869 0.44475392 9.708182 -8.932846
- 0.4123232 -4.361452 1.3948607 9.511665 0.11667654
- 2.9079323 6.049952 9.275183 -18.078873 6.2983274
- -0.7500531 -2.725033 -7.6027865 3.3404543 2.990815
- 4.010979 11.000591 -2.8873312 7.1352735 -16.79663
- 18.495346 -14.293832 7.89578 2.2714825 22.976387
- -4.875734 -3.0836344 -2.9999814 13.751918 6.448228
- -11.924197 2.171869 2.0423572 -6.173772 10.778437
- 25.77281 -4.9495463 14.57806 0.3044315 2.6132357
- -7.591999 -2.076944 9.025118 1.7834753 -3.1799617
- -4.9401326 23.465864 5.1685796 -9.018578 9.037825
- -4.4150195 6.859591 -12.274467 -0.88911164 5.186309
- -3.9988663 -13.638606 -9.925445 -0.06329413 -3.6709652
- -12.397416 -12.719869 -1.395601 2.1150916 5.7381287
- -4.4691963 -3.82819 -0.84233856 -1.1604277 -13.490127
- 8.731719 -20.778936 -11.495662 5.8033476 -4.752041
- 10.833007 -6.717991 4.504732 13.4244375 1.1306485
- 7.3435574 1.400918 14.704036 -9.501399 7.2315617
- -6.417456 1.3333273 11.872697 -0.30664724 8.8845
- 6.5569253 4.7948146 0.03662816 -8.704245 6.224871
- -3.2701402 -11.508579 ]
+ demo [ 1.4217498 5.626253 -5.342073 1.1773866 3.308055
+ 1.756596 5.167894 10.80636 -3.8226728 -5.6141334
+ 2.623845 -0.8072968 1.9635103 -7.3128724 0.01103897
+ -9.723131 0.6619743 -6.976803 10.213478 7.494748
+ 2.9105635 3.8949256 3.7999806 7.1061673 16.905321
+ -7.1493764 8.733103 3.4230042 -4.831653 -11.403367
+ 11.232214 7.1274667 -4.2828417 2.452362 -5.130748
+ -18.177666 -2.6116815 -11.000337 -6.7314315 1.6564683
+ 0.7618269 1.1253023 -2.083836 4.725744 -8.782597
+ -3.539873 3.814236 5.1420674 2.162061 4.096431
+ -6.4162116 12.747448 1.9429878 -15.152943 6.417416
+ 16.097002 -9.716668 -1.9920526 -3.3649497 -1.871939
+ 11.567354 3.69788 11.258265 7.442363 9.183411
+ 4.5281515 -1.2417862 4.3959084 6.6727695 5.8898783
+ 7.627124 -0.66919386 -11.889693 -9.208865 -7.4274073
+ -3.7776625 6.917234 -9.848748 -2.0944717 -5.135116
+ 0.49563864 9.317534 -5.9141874 -1.8098574 -0.11738578
+ -7.169265 -1.0578263 -5.7216787 -5.1173844 16.137651
+ -4.473626 7.6624317 -0.55381083 9.631587 -6.4704556
+ -8.548508 4.3716145 -0.79702514 4.478997 -2.9758704
+ 3.272176 2.8382776 5.134597 -9.190781 -0.5657382
+ -4.8745747 2.3165567 -5.984303 -2.1798875 0.35541576
+ -0.31784213 9.493548 2.1144536 4.358092 -12.089823
+ 8.451689 -7.925461 4.6242585 4.4289427 18.692003
+ -2.6204622 -5.149185 -0.35821092 8.488551 4.981496
+ -9.32683 -2.2544234 6.6417594 1.2119585 10.977129
+ 16.555033 3.3238444 9.551863 -1.6676947 -0.79539716
+ -8.605674 -0.47356385 2.6741948 -5.359179 -2.6673796
+ 0.66607 15.443222 4.740594 -3.4725387 11.592567
+ -2.054497 1.7361217 -8.265324 -9.30447 5.4068313
+ -1.5180256 -7.746615 -6.089606 0.07112726 -0.34904733
+ -8.649895 -9.998958 -2.564841 -0.53999114 2.601808
+ -0.31927416 -1.8815292 -2.07215 -3.4105783 -8.2998085
+ 1.483641 -15.365992 -8.288208 3.8847756 -3.4876456
+ 7.3629923 0.4657332 3.132599 12.438889 -1.8337058
+ 4.532936 2.7264361 10.145339 -6.521951 2.897153
+ -3.3925855 5.079156 7.759716 4.677565 5.8457737
+ 2.402413 7.7071047 3.9711342 -6.390043 6.1268735
+ -3.7760346 -11.118123 ]
```
- Python API
@@ -98,53 +104,109 @@ wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
config=None, # Set `config` and `ckpt_path` to None to use pretrained model.
ckpt_path=None,
audio_file='./85236145389.wav',
- force_yes=False,
device=paddle.get_device())
print('Audio embedding Result: \n{}'.format(audio_emb))
+
+ test_emb = vector_executor(
+ model='ecapatdnn_voxceleb12',
+ sample_rate=16000,
+ config=None, # Set `config` and `ckpt_path` to None to use pretrained model.
+ ckpt_path=None,
+ audio_file='./123456789.wav',
+ device=paddle.get_device())
+ print('Test embedding Result: \n{}'.format(test_emb))
+
+ # score range [0, 1]
+ score = vector_executor.get_embeddings_score(audio_emb, test_emb)
+ print(f"Eembeddings Score: {score}")
```
输出:
```bash
# Vector Result:
- [ -5.749211 9.505463 -8.200284 -5.2075014 5.3940268
- -3.04878 1.611095 10.127234 -10.534177 -15.821609
- 1.2032688 -0.35080156 1.2629458 -12.643498 -2.5758228
- -11.343508 2.3385992 -8.719341 14.213509 15.404744
- -0.39327756 6.338786 2.688887 8.7104025 17.469526
- -8.77959 7.0576906 4.648855 -1.3089896 -23.294737
- 8.013747 13.891729 -9.926753 5.655307 -5.9422326
- -22.842539 0.6293588 -18.46266 -10.811862 9.8192625
- 3.0070958 3.8072643 -2.3861165 3.0821571 -14.739942
- 1.7594414 -0.6485091 4.485623 2.0207152 7.264915
- -6.40137 23.63524 2.9711294 -22.708025 9.93719
- 20.354511 -10.324688 -0.700492 -8.783211 -5.27593
- 15.999649 3.3004563 12.747926 15.429879 4.7849145
- 5.6699696 -2.3826702 10.605882 3.9112158 3.1500628
- 15.859915 -2.1832209 -23.908653 -6.4799504 -4.5365124
- -9.224193 14.568347 -10.568833 4.982321 -4.342062
- 0.0914714 12.645902 -5.74285 -3.2141201 -2.7173362
- -6.680575 0.4757669 -5.035051 -6.7964664 16.865469
- -11.54324 7.681869 0.44475392 9.708182 -8.932846
- 0.4123232 -4.361452 1.3948607 9.511665 0.11667654
- 2.9079323 6.049952 9.275183 -18.078873 6.2983274
- -0.7500531 -2.725033 -7.6027865 3.3404543 2.990815
- 4.010979 11.000591 -2.8873312 7.1352735 -16.79663
- 18.495346 -14.293832 7.89578 2.2714825 22.976387
- -4.875734 -3.0836344 -2.9999814 13.751918 6.448228
- -11.924197 2.171869 2.0423572 -6.173772 10.778437
- 25.77281 -4.9495463 14.57806 0.3044315 2.6132357
- -7.591999 -2.076944 9.025118 1.7834753 -3.1799617
- -4.9401326 23.465864 5.1685796 -9.018578 9.037825
- -4.4150195 6.859591 -12.274467 -0.88911164 5.186309
- -3.9988663 -13.638606 -9.925445 -0.06329413 -3.6709652
- -12.397416 -12.719869 -1.395601 2.1150916 5.7381287
- -4.4691963 -3.82819 -0.84233856 -1.1604277 -13.490127
- 8.731719 -20.778936 -11.495662 5.8033476 -4.752041
- 10.833007 -6.717991 4.504732 13.4244375 1.1306485
- 7.3435574 1.400918 14.704036 -9.501399 7.2315617
- -6.417456 1.3333273 11.872697 -0.30664724 8.8845
- 6.5569253 4.7948146 0.03662816 -8.704245 6.224871
- -3.2701402 -11.508579 ]
+ Audio embedding Result:
+ [ 1.4217498 5.626253 -5.342073 1.1773866 3.308055
+ 1.756596 5.167894 10.80636 -3.8226728 -5.6141334
+ 2.623845 -0.8072968 1.9635103 -7.3128724 0.01103897
+ -9.723131 0.6619743 -6.976803 10.213478 7.494748
+ 2.9105635 3.8949256 3.7999806 7.1061673 16.905321
+ -7.1493764 8.733103 3.4230042 -4.831653 -11.403367
+ 11.232214 7.1274667 -4.2828417 2.452362 -5.130748
+ -18.177666 -2.6116815 -11.000337 -6.7314315 1.6564683
+ 0.7618269 1.1253023 -2.083836 4.725744 -8.782597
+ -3.539873 3.814236 5.1420674 2.162061 4.096431
+ -6.4162116 12.747448 1.9429878 -15.152943 6.417416
+ 16.097002 -9.716668 -1.9920526 -3.3649497 -1.871939
+ 11.567354 3.69788 11.258265 7.442363 9.183411
+ 4.5281515 -1.2417862 4.3959084 6.6727695 5.8898783
+ 7.627124 -0.66919386 -11.889693 -9.208865 -7.4274073
+ -3.7776625 6.917234 -9.848748 -2.0944717 -5.135116
+ 0.49563864 9.317534 -5.9141874 -1.8098574 -0.11738578
+ -7.169265 -1.0578263 -5.7216787 -5.1173844 16.137651
+ -4.473626 7.6624317 -0.55381083 9.631587 -6.4704556
+ -8.548508 4.3716145 -0.79702514 4.478997 -2.9758704
+ 3.272176 2.8382776 5.134597 -9.190781 -0.5657382
+ -4.8745747 2.3165567 -5.984303 -2.1798875 0.35541576
+ -0.31784213 9.493548 2.1144536 4.358092 -12.089823
+ 8.451689 -7.925461 4.6242585 4.4289427 18.692003
+ -2.6204622 -5.149185 -0.35821092 8.488551 4.981496
+ -9.32683 -2.2544234 6.6417594 1.2119585 10.977129
+ 16.555033 3.3238444 9.551863 -1.6676947 -0.79539716
+ -8.605674 -0.47356385 2.6741948 -5.359179 -2.6673796
+ 0.66607 15.443222 4.740594 -3.4725387 11.592567
+ -2.054497 1.7361217 -8.265324 -9.30447 5.4068313
+ -1.5180256 -7.746615 -6.089606 0.07112726 -0.34904733
+ -8.649895 -9.998958 -2.564841 -0.53999114 2.601808
+ -0.31927416 -1.8815292 -2.07215 -3.4105783 -8.2998085
+ 1.483641 -15.365992 -8.288208 3.8847756 -3.4876456
+ 7.3629923 0.4657332 3.132599 12.438889 -1.8337058
+ 4.532936 2.7264361 10.145339 -6.521951 2.897153
+ -3.3925855 5.079156 7.759716 4.677565 5.8457737
+ 2.402413 7.7071047 3.9711342 -6.390043 6.1268735
+ -3.7760346 -11.118123 ]
+ # get the test embedding
+ Test embedding Result:
+ [ -1.902964 2.0690894 -8.034194 3.5472693 0.18089125
+ 6.9085927 1.4097427 -1.9487704 -10.021278 -0.20755845
+ -8.04332 4.344489 2.3200977 -14.306299 5.184692
+ -11.55602 -3.8497238 0.6444722 1.2833948 2.6766639
+ 0.5878921 0.7946299 1.7207596 2.5791872 14.998469
+ -1.3385371 15.031221 -0.8006958 1.99287 -9.52007
+ 2.435466 4.003221 -4.33817 -4.898601 -5.304714
+ -18.033886 10.790787 -12.784645 -5.641755 2.9761686
+ -10.566622 1.4839455 6.152458 -5.7195854 2.8603241
+ 6.112133 8.489869 5.5958056 1.2836679 -1.2293907
+ 0.89927405 7.0288725 -2.854029 -0.9782962 5.8255906
+ 14.905906 -5.025907 0.7866458 -4.2444224 -16.354029
+ 10.521315 0.9604709 -3.3257897 7.144871 -13.592733
+ -8.568869 -1.7953678 0.26313916 10.916714 -6.9374123
+ 1.857403 -6.2746415 2.8154466 -7.2338667 -2.293357
+ -0.05452765 5.4287076 5.0849075 -6.690375 -1.6183422
+ 3.654291 0.94352573 -9.200294 -5.4749465 -3.5235846
+ 1.3420814 4.240421 -2.772944 -2.8451524 16.311104
+ 4.2969875 -1.762936 -12.5758915 8.595198 -0.8835239
+ -1.5708797 1.568961 1.1413603 3.5032008 -0.45251232
+ -6.786333 16.89443 5.3366146 -8.789056 0.6355629
+ 3.2579517 -3.328322 7.5969577 0.66025066 -6.550468
+ -9.148656 2.020372 -0.4615173 1.1965656 -3.8764873
+ 11.6562195 -6.0750933 12.182899 3.2218833 0.81969476
+ 5.570001 -3.8459578 -7.205299 7.9262037 -7.6611166
+ -5.249467 -2.2671914 7.2658715 -13.298164 4.821147
+ -2.7263982 11.691089 -3.8918593 -2.838112 -1.0336838
+ -3.8034165 2.8536487 -5.60398 -1.1972581 1.3455094
+ -3.4903061 2.2408795 5.5010734 -3.970756 11.99696
+ -7.8858757 0.43160373 -5.5059714 4.3426995 16.322706
+ 11.635366 0.72157705 -9.245714 -3.91465 -4.449838
+ -1.5716927 7.713747 -2.2430465 -6.198303 -13.481864
+ 2.8156567 -5.7812386 5.1456156 2.7289324 -14.505571
+ 13.270688 3.448231 -7.0659585 4.5886116 -4.466099
+ -0.296428 -11.463529 -2.6076477 14.110243 -6.9725137
+ -1.9962958 2.7119343 19.391657 0.01961198 14.607133
+ -1.6695905 -4.391516 1.3131028 -6.670972 -5.888604
+ 12.0612335 5.9285784 3.3715196 1.492534 10.723728
+ -0.95514804 -12.085431 ]
+ # get the score between enroll and test
+ Embeddings Score: 0.4292638301849365
```
### 4.预训练模型
diff --git a/demos/speaker_verification/run.sh b/demos/speaker_verification/run.sh
index 856886d33..6140f7f38 100644
--- a/demos/speaker_verification/run.sh
+++ b/demos/speaker_verification/run.sh
@@ -1,6 +1,9 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
+wget -c https://paddlespeech.bj.bcebos.com/vector/audio/123456789.wav
-# asr
-paddlespeech vector --task spk --input ./85236145389.wav
\ No newline at end of file
+# vector
+paddlespeech vector --task spk --input ./85236145389.wav
+
+paddlespeech vector --task score --input "./85236145389.wav ./123456789.wav"
diff --git a/docs/source/released_model.md b/docs/source/released_model.md
index 9a423e03e..1cbe39895 100644
--- a/docs/source/released_model.md
+++ b/docs/source/released_model.md
@@ -6,7 +6,7 @@
### Speech Recognition Model
Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech | Example Link
:-------------:| :------------:| :-----: | -----: | :-----: |:-----:| :-----: | :-----: | :-----:
-[Ds2 Online Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.1.1.model.tar.gz) | Aishell Dataset | Char-based | 345 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.080 |-| 151 h | [D2 Online Aishell ASR0](../../examples/aishell/asr0)
+[Ds2 Online Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz) | Aishell Dataset | Char-based | 345 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.078 |-| 151 h | [Ds2 Online Aishell ASR0](../../examples/aishell/asr0)
[Ds2 Offline Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz)| Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers| 0.064 |-| 151 h | [Ds2 Offline Aishell ASR0](../../examples/aishell/asr0)
[Conformer Online Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_chunk_conformer_aishell_ckpt_0.1.2.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0565 |-| 151 h | [Conformer Online Aishell ASR1](../../examples/aishell/asr1)
[Conformer Offline Aishell ASR1 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_conformer_aishell_ckpt_0.1.2.model.tar.gz) | Aishell Dataset | Char-based | 189 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0483 |-| 151 h | [Conformer Offline Aishell ASR1](../../examples/aishell/asr1)
@@ -37,8 +37,8 @@ Model Type | Dataset| Example Link | Pretrained Models|Static Models|Size (stati
Tacotron2|LJSpeech|[tacotron2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts0)|[tacotron2_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_ljspeech_ckpt_0.2.0.zip)|||
Tacotron2|CSMSC|[tacotron2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts0)|[tacotron2_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_ckpt_0.2.0.zip)|[tacotron2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_static_0.2.0.zip)|103MB|
TransformerTTS| LJSpeech| [transformer-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts1)|[transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/transformer_tts/transformer_tts_ljspeech_ckpt_0.4.zip)|||
-SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts2) |[speedyspeech_nosil_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_nosil_baker_ckpt_0.5.zip)|[speedyspeech_nosil_baker_static_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_nosil_baker_static_0.5.zip)|12MB|
-FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip)|[fastspeech2_nosil_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_static_0.4.zip)|157MB|
+SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts2) |[speedyspeech_nosil_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_nosil_baker_ckpt_0.5.zip)|[speedyspeech_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_static_0.2.0.zip)|12MB|
+FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip)|[fastspeech2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_static_0.2.0.zip)|157MB|
FastSpeech2-Conformer| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_conformer_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_baker_ckpt_0.5.zip)|||
FastSpeech2| AISHELL-3 |[fastspeech2-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/tts3)|[fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip)|||
FastSpeech2| LJSpeech |[fastspeech2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts3)|[fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)|||
@@ -80,7 +80,7 @@ PANN | ESC-50 |[pann-esc50](../../examples/esc50/cls0)|[esc50_cnn6.tar.gz](https
Model Type | Dataset| Example Link | Pretrained Models | Static Models
:-------------:| :------------:| :-----: | :-----: | :-----:
-PANN | VoxCeleb| [voxceleb_ecapatdnn](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/voxceleb/sv0) | [ecapatdnn.tar.gz](https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_1.tar.gz) | -
+ECAPA-TDNN | VoxCeleb| [voxceleb_ecapatdnn](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/voxceleb/sv0) | [ecapatdnn.tar.gz](https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz) | -
## Punctuation Restoration Models
Model Type | Dataset| Example Link | Pretrained Models
diff --git a/examples/aishell/asr0/README.md b/examples/aishell/asr0/README.md
index bb45d8df0..4459b1382 100644
--- a/examples/aishell/asr0/README.md
+++ b/examples/aishell/asr0/README.md
@@ -151,21 +151,14 @@ avg.sh best exp/deepspeech2/checkpoints 1
CUDA_VISIBLE_DEVICES= ./local/test.sh conf/deepspeech2.yaml exp/deepspeech2/checkpoints/avg_1
```
## Pretrained Model
-You can get the pretrained transformer or conformer using the scripts below:
-```bash
-Deepspeech2 offline:
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/ds2.model.tar.gz
-
-Deepspeech2 online:
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/aishell_ds2_online_cer8.00_release.tar.gz
+You can get the pretrained models from [this page](../../../docs/source/released_model.md).
-```
using the `tar` scripts to unpack the model and then you can use the script to test the model.
For example:
```
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/ds2.model.tar.gz
-tar xzvf ds2.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz
+tar xzvf asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz
source path.sh
# If you have process the data and get the manifest file, you can skip the following 2 steps
bash local/data.sh --stage -1 --stop_stage -1
@@ -173,12 +166,7 @@ bash local/data.sh --stage 2 --stop_stage 2
CUDA_VISIBLE_DEVICES= ./local/test.sh conf/deepspeech2.yaml exp/deepspeech2/checkpoints/avg_1
```
-The performance of the released models are shown below:
-
-| Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech |
-| :----------------------------: | :-------------: | :---------: | -----: | :------------------------------------------------- | :---- | :--- | :-------------- |
-| Ds2 Online Aishell ASR0 Model | Aishell Dataset | Char-based | 345 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.080 | - | 151 h |
-| Ds2 Offline Aishell ASR0 Model | Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers | 0.064 | - | 151 h |
+The performance of the released models is shown in [RESULTS.md](./RESULTS.md).
## Stage 4: Static graph model Export
This stage is to transform dygraph to static graph.
```bash
@@ -214,8 +202,8 @@ if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
```
you can train the model by yourself, or you can download the pretrained model by the script below:
```bash
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/ds2.model.tar.gz
-tar xzvf ds2.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz
+tar xzvf asr0_deepspeech2_aishell_ckpt_0.1.1.model.tar.gz
```
You can download the audio demo:
```bash
diff --git a/examples/aishell/asr0/RESULTS.md b/examples/aishell/asr0/RESULTS.md
index 5841a8522..8af3d66d1 100644
--- a/examples/aishell/asr0/RESULTS.md
+++ b/examples/aishell/asr0/RESULTS.md
@@ -4,15 +4,16 @@
| Model | Number of Params | Release | Config | Test set | Valid Loss | CER |
| --- | --- | --- | --- | --- | --- | --- |
-| DeepSpeech2 | 45.18M | 2.2.0 | conf/deepspeech2_online.yaml + spec aug | test | 7.994938373565674 | 0.080 |
+| DeepSpeech2 | 45.18M | r0.2.0 | conf/deepspeech2_online.yaml + spec aug | test | 7.708217620849609| 0.078 |
+| DeepSpeech2 | 45.18M | v2.2.0 | conf/deepspeech2_online.yaml + spec aug | test | 7.994938373565674 | 0.080 |
## Deepspeech2 Non-Streaming
| Model | Number of Params | Release | Config | Test set | Valid Loss | CER |
| --- | --- | --- | --- | --- | --- | --- |
-| DeepSpeech2 | 58.4M | 2.2.0 | conf/deepspeech2.yaml + spec aug | test | 5.738585948944092 | 0.064000 |
-| DeepSpeech2 | 58.4M | 2.1.0 | conf/deepspeech2.yaml + spec aug | test | 7.483316898345947 | 0.077860 |
-| DeepSpeech2 | 58.4M | 2.1.0 | conf/deepspeech2.yaml | test | 7.299022197723389 | 0.078671 |
-| DeepSpeech2 | 58.4M | 2.0.0 | conf/deepspeech2.yaml | test | - | 0.078977 |
+| DeepSpeech2 | 58.4M | v2.2.0 | conf/deepspeech2.yaml + spec aug | test | 5.738585948944092 | 0.064000 |
+| DeepSpeech2 | 58.4M | v2.1.0 | conf/deepspeech2.yaml + spec aug | test | 7.483316898345947 | 0.077860 |
+| DeepSpeech2 | 58.4M | v2.1.0 | conf/deepspeech2.yaml | test | 7.299022197723389 | 0.078671 |
+| DeepSpeech2 | 58.4M | v2.0.0 | conf/deepspeech2.yaml | test | - | 0.078977 |
| --- | --- | --- | --- | --- | --- | --- |
-| DeepSpeech2 | 58.4M | 1.8.5 | - | test | - | 0.080447 |
+| DeepSpeech2 | 58.4M | v1.8.5 | - | test | - | 0.080447 |
diff --git a/examples/aishell/asr1/README.md b/examples/aishell/asr1/README.md
index 5277a31eb..25b28ede8 100644
--- a/examples/aishell/asr1/README.md
+++ b/examples/aishell/asr1/README.md
@@ -143,25 +143,14 @@ avg.sh best exp/conformer/checkpoints 20
CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
```
## Pretrained Model
-You can get the pretrained transformer or conformer using the scripts below:
+You can get the pretrained transformer or conformer from [this page](../../../docs/source/released_model.md).
-```bash
-# Conformer:
-wget https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.release.tar.gz
-
-# Chunk Conformer:
-wget https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.chunk.release.tar.gz
-
-# Transformer:
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/transformer.model.tar.gz
-
-```
using the `tar` scripts to unpack the model and then you can use the script to test the model.
For example:
```
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/transformer.model.tar.gz
-tar xzvf transformer.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz
+tar xzvf asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz
source path.sh
# If you have process the data and get the manifest file, you can skip the following 2 steps
bash local/data.sh --stage -1 --stop_stage -1
@@ -206,7 +195,7 @@ In some situations, you want to use the trained model to do the inference for th
```
you can train the model by yourself using ```bash run.sh --stage 0 --stop_stage 3```, or you can download the pretrained model through the script below:
```bash
-wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/transformer.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/aishell/asr1/asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz
-tar xzvf transformer.model.tar.gz
+tar xzvf asr1_transformer_aishell_ckpt_0.1.1.model.tar.gz
```
You can download the audio demo:
diff --git a/examples/aishell3/vc0/README.md b/examples/aishell3/vc0/README.md
index 664ec1ac3..925663ab1 100644
--- a/examples/aishell3/vc0/README.md
+++ b/examples/aishell3/vc0/README.md
@@ -118,7 +118,7 @@ CUDA_VISIBLE_DEVICES=${gpus} ./local/voice_cloning.sh ${conf_path} ${train_outpu
```
## Pretrained Model
-[tacotron2_aishell3_ckpt_vc0_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_aishell3_ckpt_vc0_0.2.0.zip)
+- [tacotron2_aishell3_ckpt_vc0_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_aishell3_ckpt_vc0_0.2.0.zip)
Model | Step | eval/loss | eval/l1_loss | eval/mse_loss | eval/bce_loss| eval/attn_loss
diff --git a/examples/aishell3/vc1/README.md b/examples/aishell3/vc1/README.md
index 04b83a5ff..8ab0f9c8c 100644
--- a/examples/aishell3/vc1/README.md
+++ b/examples/aishell3/vc1/README.md
@@ -119,7 +119,7 @@ ref_audio
CUDA_VISIBLE_DEVICES=${gpus} ./local/voice_cloning.sh ${conf_path} ${train_output_path} ${ckpt_name} ${ge2e_params_path} ${ref_audio_dir}
```
## Pretrained Model
-[fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip)
+- [fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip)
Model | Step | eval/loss | eval/l1_loss | eval/duration_loss | eval/pitch_loss| eval/energy_loss
:-------------:| :------------:| :-----: | :-----: | :--------: |:--------:|:---------:
diff --git a/examples/aishell3/voc1/README.md b/examples/aishell3/voc1/README.md
index dad464092..eb30e7c40 100644
--- a/examples/aishell3/voc1/README.md
+++ b/examples/aishell3/voc1/README.md
@@ -137,7 +137,8 @@ optional arguments:
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Models
-Pretrained models can be downloaded here [pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip).
+Pretrained models can be downloaded here:
+- [pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip)
Model | Step | eval/generator_loss | eval/log_stft_magnitude_loss:| eval/spectral_convergence_loss
:-------------:| :------------:| :-----: | :-----: | :--------:
diff --git a/examples/aishell3/voc5/README.md b/examples/aishell3/voc5/README.md
index ebe2530be..c957c4a3a 100644
--- a/examples/aishell3/voc5/README.md
+++ b/examples/aishell3/voc5/README.md
@@ -136,7 +136,8 @@ optional arguments:
4. `--output-dir` is the directory to save the synthesized audio files.
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Models
-The pretrained model can be downloaded here [hifigan_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip).
+The pretrained model can be downloaded here:
+- [hifigan_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip)
Model | Step | eval/generator_loss | eval/mel_loss| eval/feature_matching_loss
diff --git a/examples/ami/sd0/conf/ecapa_tdnn.yaml b/examples/ami/sd0/conf/ecapa_tdnn.yaml
new file mode 100755
index 000000000..319e44976
--- /dev/null
+++ b/examples/ami/sd0/conf/ecapa_tdnn.yaml
@@ -0,0 +1,62 @@
+###########################################################
+# AMI DATA PREPARE SETTING #
+###########################################################
+split_type: 'full_corpus_asr'
+skip_TNO: True
+# Options for mic_type: 'Mix-Lapel', 'Mix-Headset', 'Array1', 'Array1-01', 'BeamformIt'
+mic_type: 'Mix-Headset'
+vad_type: 'oracle'
+max_subseg_dur: 3.0
+overlap: 1.5
+# Some more exp folders (for cleaner structure).
+embedding_dir: emb #!ref /emb
+meta_data_dir: metadata #!ref /metadata
+ref_rttm_dir: ref_rttms #!ref /ref_rttms
+sys_rttm_dir: sys_rttms #!ref /sys_rttms
+der_dir: DER #!ref /DER
+
+
+###########################################################
+# FEATURE EXTRACTION SETTING #
+###########################################################
+# currently, we only support fbank
+sr: 16000 # sample rate
+n_mels: 80
+window_size: 400 #25ms, sample rate 16000, 25 * 16000 / 1000 = 400
+hop_size: 160 #10ms, sample rate 16000, 10 * 16000 / 1000 = 160
+#left_frames: 0
+#right_frames: 0
+#deltas: False
+
+
+###########################################################
+# MODEL SETTING #
+###########################################################
+# currently, we only support ecapa-tdnn in the ecapa_tdnn.yaml
+# if we want use another model, please choose another configuration yaml file
+seed: 1234
+emb_dim: 192
+batch_size: 16
+model:
+  input_size: 80
+  channels: [1024, 1024, 1024, 1024, 3072]
+  kernel_sizes: [5, 3, 3, 3, 1]
+  dilations: [1, 2, 3, 4, 1]
+  attention_channels: 128
+  lin_neurons: 192
+# Will automatically download ECAPA-TDNN model (best).
+
+###########################################################
+# SPECTRAL CLUSTERING SETTING #
+###########################################################
+backend: 'SC' # options: 'kmeans' # Note: kmeans goes only with cos affinity
+affinity: 'cos' # options: cos, nn
+max_num_spkrs: 10
+oracle_n_spkrs: True
+
+
+###########################################################
+# DER EVALUATION SETTING #
+###########################################################
+ignore_overlap: True
+forgiveness_collar: 0.25
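The diarization scripts below consume this file through `yacs`; a minimal loading sketch (the relative path is illustrative):

```python
from yacs.config import CfgNode

# Load the AMI diarization settings defined above (path is illustrative).
config = CfgNode(new_allowed=True)
config.merge_from_file("conf/ecapa_tdnn.yaml")

print(config.mic_type)        # Mix-Headset
print(config.model.channels)  # [1024, 1024, 1024, 1024, 3072]
```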
diff --git a/examples/ami/sd0/local/compute_embdding.py b/examples/ami/sd0/local/compute_embdding.py
new file mode 100644
index 000000000..dc824d7ca
--- /dev/null
+++ b/examples/ami/sd0/local/compute_embdding.py
@@ -0,0 +1,231 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import json
+import os
+import pickle
+import sys
+
+import numpy as np
+import paddle
+from paddle.io import BatchSampler
+from paddle.io import DataLoader
+from tqdm.contrib import tqdm
+from yacs.config import CfgNode
+
+from paddlespeech.s2t.utils.log import Log
+from paddlespeech.vector.cluster.diarization import EmbeddingMeta
+from paddlespeech.vector.io.batch import batch_feature_normalize
+from paddlespeech.vector.io.dataset_from_json import JSONDataset
+from paddlespeech.vector.models.ecapa_tdnn import EcapaTdnn
+from paddlespeech.vector.modules.sid_model import SpeakerIdetification
+from paddlespeech.vector.training.seeding import seed_everything
+
+# Logger setup
+logger = Log(__name__).getlog()
+
+
+def prepare_subset_json(full_meta_data, rec_id, out_meta_file):
+ """Prepares metadata for a given recording ID.
+
+ Arguments
+ ---------
+ full_meta_data : json
+ Full meta (json) containing all the recordings
+ rec_id : str
+ The recording ID for which meta (json) has to be prepared
+ out_meta_file : str
+ Path of the output meta (json) file.
+ """
+
+ subset = {}
+ for key in full_meta_data:
+ k = str(key)
+ if k.startswith(rec_id):
+ subset[key] = full_meta_data[key]
+
+ with open(out_meta_file, mode="w") as json_f:
+ json.dump(subset, json_f, indent=2)
+
+
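A hypothetical usage sketch of `prepare_subset_json`: the metadata keys are assumed to start with the recording ID (the prefix match above relies on this), and the field names are illustrative only.

```python
# Hypothetical example: select the subsegments of one recording by key prefix.
full_meta = {
    "ES2011a_0001": {"start": 0.0, "stop": 3.0},  # illustrative fields
    "ES2011b_0001": {"start": 0.0, "stop": 3.0},
}
prepare_subset_json(full_meta, "ES2011a", "ES2011a.subset.json")
# the written JSON now contains only the "ES2011a_0001" entry
```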
+def create_dataloader(json_file, batch_size):
+ """Creates the datasets and their data processing pipelines.
+ This is used for multi-mic processing.
+ """
+
+ # create datasets
+ dataset = JSONDataset(
+ json_file=json_file,
+ feat_type='melspectrogram',
+ n_mels=config.n_mels,
+ window_size=config.window_size,
+ hop_length=config.hop_size)
+
+ # create dataloader
+ batch_sampler = BatchSampler(dataset, batch_size=batch_size, shuffle=True)
+ dataloader = DataLoader(dataset,
+ batch_sampler=batch_sampler,
+ collate_fn=lambda x: batch_feature_normalize(
+ x, mean_norm=True, std_norm=False),
+ return_list=True)
+
+ return dataloader
+
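`batch_feature_normalize` is a PaddleSpeech helper; as a hedged numpy approximation of the `mean_norm=True, std_norm=False` setting used here, each utterance's features are centered by their own mean:

```python
# Hedged numpy sketch of per-utterance mean normalization (not the actual
# paddlespeech implementation): subtract the mean over frames, keep variance.
import numpy as np

def mean_normalize(feat: np.ndarray) -> np.ndarray:
    # feat: (num_frames, n_mels)
    return feat - feat.mean(axis=0, keepdims=True)

normalized = mean_normalize(np.random.rand(100, 80).astype("float32"))
```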
+
+def main(args, config):
+ # set the training device, cpu or gpu
+ paddle.set_device(args.device)
+ # set the random seed
+ seed_everything(config.seed)
+
+ # stage1: build the dnn backbone model network
+ ecapa_tdnn = EcapaTdnn(**config.model)
+
+ # stage2: build the speaker verification eval instance with backbone model
+ model = SpeakerIdetification(backbone=ecapa_tdnn, num_class=1)
+
+ # stage3: load the pre-trained model
+ # we get the last model from the epoch and save_interval
+ args.load_checkpoint = os.path.abspath(
+ os.path.expanduser(args.load_checkpoint))
+
+ # load model checkpoint to sid model
+ state_dict = paddle.load(
+ os.path.join(args.load_checkpoint, 'model.pdparams'))
+ model.set_state_dict(state_dict)
+ logger.info(f'Checkpoint loaded from {args.load_checkpoint}')
+
+ # set the model to eval mode
+ model.eval()
+
+ # load meta data
+ meta_file = os.path.join(
+ args.data_dir,
+ config.meta_data_dir,
+ "ami_" + args.dataset + "." + config.mic_type + ".subsegs.json", )
+ with open(meta_file, "r") as f:
+ full_meta = json.load(f)
+
+ # get all the recording IDs in this dataset.
+ all_keys = full_meta.keys()
+ A = [word.rstrip().split("_")[0] for word in all_keys]
+ all_rec_ids = list(set(A[1:]))
+ all_rec_ids.sort()
+ split = "AMI_" + args.dataset
+ i = 1
+
+ msg = "Extra embdding for " + args.dataset + " set"
+ logger.info(msg)
+
+ if len(all_rec_ids) <= 0:
+ msg = "No recording IDs found! Please check if meta_data json file is properly generated."
+ logger.error(msg)
+ sys.exit()
+
+ # extract embeddings for each recording in the dataset.
+ for rec_id in tqdm(all_rec_ids):
+ # This tag will be displayed in the log.
+ tag = ("[" + str(args.dataset) + ": " + str(i) + "/" +
+ str(len(all_rec_ids)) + "]")
+ i = i + 1
+
+ # log message.
+ msg = "Embdding %s : %s " % (tag, rec_id)
+ logger.debug(msg)
+
+ # embedding directory.
+ if not os.path.exists(
+ os.path.join(args.data_dir, config.embedding_dir, split)):
+ os.makedirs(
+ os.path.join(args.data_dir, config.embedding_dir, split))
+
+ # file to store embeddings.
+ emb_file_name = rec_id + "." + config.mic_type + ".emb_stat.pkl"
+ diary_stat_emb_file = os.path.join(args.data_dir, config.embedding_dir,
+ split, emb_file_name)
+
+ # prepare a metadata (json) for one recording. This is basically a subset of full_meta.
+ # let's keep this meta info in the embedding directory itself.
+ json_file_name = rec_id + "." + config.mic_type + ".json"
+ meta_per_rec_file = os.path.join(args.data_dir, config.embedding_dir,
+ split, json_file_name)
+
+ # write subset (meta for one recording) json metadata.
+ prepare_subset_json(full_meta, rec_id, meta_per_rec_file)
+
+ # prepare data loader.
+ diary_set_loader = create_dataloader(meta_per_rec_file,
+ config.batch_size)
+
+ # extract embeddings (skip if already done).
+ if not os.path.isfile(diary_stat_emb_file):
+ logger.debug("Extracting deep embeddings")
+ embeddings = np.empty(shape=[0, config.emb_dim], dtype=np.float64)
+ segset = []
+
+ for batch_idx, batch in enumerate(tqdm(diary_set_loader)):
+ # extract the audio embeddings for this batch
+ ids, feats, lengths = batch['ids'], batch['feats'], batch[
+ 'lengths']
+ seg = [x for x in ids]
+ segset = segset + seg
+ emb = model.backbone(feats, lengths).squeeze(
+ -1).numpy() # (N, emb_size, 1) -> (N, emb_size)
+ embeddings = np.concatenate((embeddings, emb), axis=0)
+
+ segset = np.array(segset, dtype="|O")
+ stat_obj = EmbeddingMeta(
+ segset=segset,
+ stats=embeddings, )
+ logger.debug("Saving Embeddings...")
+ with open(diary_stat_emb_file, "wb") as output:
+ pickle.dump(stat_obj, output)
+
+ else:
+ logger.debug("Skipping embedding extraction (as already present).")
+
+
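The saved pickle can be read back directly; a small sketch (the file name is illustrative, and the attribute names follow the `EmbeddingMeta` constructor call above):

```python
# Sketch: inspect one per-recording embedding file written by main().
import pickle

with open("ES2011a.Mix-Headset.emb_stat.pkl", "rb") as f:  # illustrative path
    stat_obj = pickle.load(f)
print(len(stat_obj.segset), stat_obj.stats.shape)  # e.g. 512 (512, 192)
```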
+# Begin experiment!
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(__doc__)
+ parser.add_argument(
+ '--device',
+ default="gpu",
+ help="Select which device to perform diarization, defaults to gpu.")
+ parser.add_argument(
+ "--config", default=None, type=str, help="configuration file")
+ parser.add_argument(
+ "--data-dir",
+ default="../save/",
+ type=str,
+ help="processsed data directory")
+ parser.add_argument(
+ "--dataset",
+ choices=['dev', 'eval'],
+ default="dev",
+ type=str,
+ help="Select which dataset to extra embdding, defaults to dev")
+ parser.add_argument(
+ "--load-checkpoint",
+ type=str,
+ default='',
+ help="Directory to load model checkpoint to compute embeddings.")
+ args = parser.parse_args()
+ config = CfgNode(new_allowed=True)
+ if args.config:
+ config.merge_from_file(args.config)
+
+ config.freeze()
+
+ main(args, config)
diff --git a/examples/ami/sd0/local/data.sh b/examples/ami/sd0/local/data.sh
deleted file mode 100755
index 478ec432d..000000000
--- a/examples/ami/sd0/local/data.sh
+++ /dev/null
@@ -1,49 +0,0 @@
-#!/bin/bash
-
-stage=1
-
-TARGET_DIR=${MAIN_ROOT}/dataset/ami
-data_folder=${TARGET_DIR}/amicorpus #e.g., /path/to/amicorpus/
-manual_annot_folder=${TARGET_DIR}/ami_public_manual_1.6.2 #e.g., /path/to/ami_public_manual_1.6.2/
-
-save_folder=${MAIN_ROOT}/examples/ami/sd0/data
-ref_rttm_dir=${save_folder}/ref_rttms
-meta_data_dir=${save_folder}/metadata
-
-set=L
-
-. ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
-set -u
-set -o pipefail
-
-mkdir -p ${save_folder}
-
-if [ ${stage} -le 0 ]; then
- # Download AMI corpus, You need around 10GB of free space to get whole data
- # The signals are too large to package in this way,
- # so you need to use the chooser to indicate which ones you wish to download
- echo "Please follow https://groups.inf.ed.ac.uk/ami/download/ to download the data."
- echo "Annotations: AMI manual annotations v1.6.2 "
- echo "Signals: "
- echo "1) Select one or more AMI meetings: the IDs please follow ./ami_split.py"
- echo "2) Select media streams: Just select Headset mix"
- exit 0;
-fi
-
-if [ ${stage} -le 1 ]; then
- echo "AMI Data preparation"
-
- python local/ami_prepare.py --data_folder ${data_folder} \
- --manual_annot_folder ${manual_annot_folder} \
- --save_folder ${save_folder} --ref_rttm_dir ${ref_rttm_dir} \
- --meta_data_dir ${meta_data_dir}
-
- if [ $? -ne 0 ]; then
- echo "Prepare AMI failed. Please check log message."
- exit 1
- fi
-
-fi
-
-echo "AMI data preparation done."
-exit 0
diff --git a/examples/ami/sd0/local/experiment.py b/examples/ami/sd0/local/experiment.py
new file mode 100755
index 000000000..298228376
--- /dev/null
+++ b/examples/ami/sd0/local/experiment.py
@@ -0,0 +1,428 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import glob
+import json
+import os
+import pickle
+import shutil
+import sys
+
+import numpy as np
+from tqdm.contrib import tqdm
+from yacs.config import CfgNode
+
+from paddlespeech.s2t.utils.log import Log
+from paddlespeech.vector.cluster import diarization as diar
+from utils.DER import DER
+
+# Logger setup
+logger = Log(__name__).getlog()
+
+
+def diarize_dataset(
+ full_meta,
+ split_type,
+ n_lambdas,
+ pval,
+ save_dir,
+ config,
+ n_neighbors=10, ):
+ """This function diarizes all the recordings in a given dataset. It performs
+ computation of embedding and clusters them using spectral clustering (or other backends).
+ The output speaker boundary file is stored in the RTTM format.
+ """
+
+ # prepare `spkr_info` only once when Oracle num of speakers is selected.
+ # spkr_info is essential to obtain the number of speakers from the ground truth.
+ if config.oracle_n_spkrs is True:
+ full_ref_rttm_file = os.path.join(save_dir, config.ref_rttm_dir,
+ "fullref_ami_" + split_type + ".rttm")
+ rttm = diar.read_rttm(full_ref_rttm_file)
+
+ spkr_info = list( # noqa F841
+ filter(lambda x: x.startswith("SPKR-INFO"), rttm))
+
+ # get all the recording IDs in this dataset.
+ all_keys = full_meta.keys()
+ A = [word.rstrip().split("_")[0] for word in all_keys]
+ all_rec_ids = list(set(A[1:]))
+ all_rec_ids.sort()
+ split = "AMI_" + split_type
+ i = 1
+
+ # adding tag for directory path.
+ type_of_num_spkr = "oracle" if config.oracle_n_spkrs else "est"
+ tag = (type_of_num_spkr + "_" + str(config.affinity) + "_" + config.backend)
+
+ # make out rttm dir
+ out_rttm_dir = os.path.join(save_dir, config.sys_rttm_dir, config.mic_type,
+ split, tag)
+ if not os.path.exists(out_rttm_dir):
+ os.makedirs(out_rttm_dir)
+
+ # diarizing different recordings in a dataset.
+ for rec_id in tqdm(all_rec_ids):
+ # this tag will be displayed in the log.
+ tag = ("[" + str(split_type) + ": " + str(i) + "/" +
+ str(len(all_rec_ids)) + "]")
+ i = i + 1
+
+ # log message.
+ msg = "Diarizing %s : %s " % (tag, rec_id)
+ logger.debug(msg)
+
+ # load embeddings.
+ emb_file_name = rec_id + "." + config.mic_type + ".emb_stat.pkl"
+ diary_stat_emb_file = os.path.join(save_dir, config.embedding_dir,
+ split, emb_file_name)
+ if not os.path.isfile(diary_stat_emb_file):
+ msg = "Embdding file %s not found! Please check if embdding file is properly generated." % (
+ diary_stat_emb_file)
+ logger.error(msg)
+ sys.exit()
+ with open(diary_stat_emb_file, "rb") as in_file:
+ diary_obj = pickle.load(in_file)
+
+ out_rttm_file = out_rttm_dir + "/" + rec_id + ".rttm"
+
+ # processing starts from here.
+ if config.oracle_n_spkrs is True:
+ # oracle num of speakers.
+ num_spkrs = diar.get_oracle_num_spkrs(rec_id, spkr_info)
+ else:
+ if config.affinity == "nn":
+ # num of speakers tuned on the dev set (only for nn affinity).
+ num_spkrs = n_lambdas
+ else:
+ # num of speakers will be estimated using the max eigengap for cos-based affinity,
+ # so we pass None here and use it later on.
+ num_spkrs = None
+
+ if config.backend == "kmeans":
+ diar.do_kmeans_clustering(
+ diary_obj,
+ out_rttm_file,
+ rec_id,
+ num_spkrs,
+ pval, )
+
+ if config.backend == "SC":
+ # go for Spectral Clustering (SC).
+ diar.do_spec_clustering(
+ diary_obj,
+ out_rttm_file,
+ rec_id,
+ num_spkrs,
+ pval,
+ config.affinity,
+ n_neighbors, )
+
+ # AHC can be used as well; likewise, other backends can be added here.
+ if config.backend == "AHC":
+ # call AHC
+ threshold = pval # pval for AHC is nothing but threshold.
+ diar.do_AHC(diary_obj, out_rttm_file, rec_id, num_spkrs, threshold)
+
+ # once all RTTM outputs are generated, concatenate the individual RTTM files into a single RTTM file.
+ # this is not strictly needed, but it follows the standard convention.
+ concate_rttm_file = out_rttm_dir + "/sys_output.rttm"
+ logger.debug("Concatenating individual RTTM files...")
+ with open(concate_rttm_file, "w") as cat_file:
+ for f in glob.glob(out_rttm_dir + "/*.rttm"):
+ if f == concate_rttm_file:
+ continue
+ with open(f, "r") as indi_rttm_file:
+ shutil.copyfileobj(indi_rttm_file, cat_file)
+
+ msg = "The system generated RTTM file for %s set : %s" % (
+ split_type, concate_rttm_file, )
+ logger.debug(msg)
+
+ return concate_rttm_file
+
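The RTTM files written above follow the standard NIST layout of ten whitespace-separated fields per `SPEAKER` record; an illustrative line (the values are made up) and how to parse it:

```python
# Illustrative RTTM record: onset and duration are in seconds, and unused
# slots are filled with <NA>.
rttm_line = "SPEAKER ES2011a 0 12.34 3.21 <NA> <NA> spk_0 <NA> <NA>"
_, rec_id, channel, onset, dur, _, _, spkr, _, _ = rttm_line.split()
print(rec_id, float(onset), float(dur), spkr)  # ES2011a 12.34 3.21 spk_0
```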
+
+def dev_pval_tuner(full_meta, save_dir, config):
+ """Tuning p_value for affinity matrix.
+ The p_value used so that only p% of the values in each row is retained.
+ """
+
+ DER_list = []
+ prange = np.arange(0.002, 0.015, 0.001)
+
+ n_lambdas = None # used as a flag later.
+ for p_v in prange:
+ # Process whole dataset for value of p_v.
+ concate_rttm_file = diarize_dataset(full_meta, "dev", n_lambdas, p_v,
+ save_dir, config)
+
+ ref_rttm_file = os.path.join(save_dir, config.ref_rttm_dir,
+ "fullref_ami_dev.rttm")
+ sys_rttm_file = concate_rttm_file
+ [MS, FA, SER, DER_] = DER(
+ ref_rttm_file,
+ sys_rttm_file,
+ config.ignore_overlap,
+ config.forgiveness_collar, )
+
+ DER_list.append(DER_)
+
+ if config.oracle_n_spkrs is True and config.backend == "kmeans":
+ # no need for a p_val search here. Note that SC needs p_val for both oracle and estimated num of speakers,
+ # while the kmeans backend needs p_val only when oracle_n_spkrs=False.
+ break
+
+ # Take the p_val that gave the minimum DER on the dev dataset.
+ tuned_p_val = prange[DER_list.index(min(DER_list))]
+
+ return tuned_p_val
+
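What the tuned p-value controls, as a hedged numpy sketch (the actual pruning lives in `paddlespeech.vector.cluster.diarization`): only the largest p-fraction of affinities in each row is retained before clustering.

```python
# Hedged sketch of p-value pruning: zero all but the top p-fraction per row.
import numpy as np

def prune_rows(affinity: np.ndarray, p_val: float) -> np.ndarray:
    n_keep = max(1, int(affinity.shape[1] * p_val))
    idx = np.argsort(affinity, axis=1)[:, -n_keep:]  # largest entries per row
    pruned = np.zeros_like(affinity)
    np.put_along_axis(pruned, idx,
                      np.take_along_axis(affinity, idx, axis=1), axis=1)
    return pruned
```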
+
+def dev_ahc_threshold_tuner(full_meta, save_dir, config):
+ """Tuning threshold for affinity matrix. This function is called when AHC is used as backend.
+ """
+
+ DER_list = []
+ prange = np.arange(0.0, 1.0, 0.1)
+
+ n_lambdas = None # used as a flag later.
+
+ # Note: for AHC, p_val is the threshold.
+ for p_v in prange:
+ # Process whole dataset for value of p_v.
+ concate_rttm_file = diarize_dataset(full_meta, "dev", n_lambdas, p_v,
+ save_dir, config)
+
+ ref_rttm = os.path.join(save_dir, config.ref_rttm_dir,
+ "fullref_ami_dev.rttm")
+ sys_rttm = concate_rttm_file
+ [MS, FA, SER, DER_] = DER(
+ ref_rttm,
+ sys_rttm,
+ config.ignore_overlap,
+ config.forgiveness_collar, )
+
+ DER_list.append(DER_)
+
+ if config.oracle_n_spkrs is True:
+ break # no need of threshold search.
+
+ # Take the p_val that gave the minimum DER on the dev dataset.
+ tuned_p_val = prange[DER_list.index(min(DER_list))]
+
+ return tuned_p_val
+
+
+def dev_nn_tuner(full_meta, save_dir, config):
+ """Tunes n_neighbors on the dev set, assuming the oracle num of speakers.
+ Used when nn-based affinity is selected.
+ """
+
+ DER_list = []
+ pval = None
+
+ # Now assuming the oracle num of speakers.
+ n_lambdas = 4
+
+ for nn in range(5, 15):
+
+ # Process whole dataset for value of n_lambdas.
+ concate_rttm_file = diarize_dataset(full_meta, "dev", n_lambdas, pval,
+ save_dir, config, nn)
+
+ ref_rttm = os.path.join(save_dir, config.ref_rttm_dir,
+ "fullref_ami_dev.rttm")
+ sys_rttm = concate_rttm_file
+ [MS, FA, SER, DER_] = DER(
+ ref_rttm,
+ sys_rttm,
+ config.ignore_overlap,
+ config.forgiveness_collar, )
+
+ DER_list.append([nn, DER_])
+
+ if config.oracle_n_spkrs is True and config.backend == "kmeans":
+ break
+
+ DER_list.sort(key=lambda x: x[1])
+ tuned_nn = DER_list[0]
+
+ return tuned_nn[0]
+
+
+def dev_tuner(full_meta, save_dir, config):
+ """Tunes n_components on the dev set. Used for the nn-based affinity matrix.
+ Note: this is a very basic tuning for nn-based affinity
+ and remains a work in progress until we find a better way.
+ """
+
+ DER_list = []
+ pval = None
+ for n_lambdas in range(1, config.max_num_spkrs + 1):
+
+ # Process whole dataset for value of n_lambdas.
+ concate_rttm_file = diarize_dataset(full_meta, "dev", n_lambdas, pval,
+ save_dir, config)
+
+ ref_rttm = os.path.join(save_dir, config.ref_rttm_dir,
+ "fullref_ami_dev.rttm")
+ sys_rttm = concate_rttm_file
+ [MS, FA, SER, DER_] = DER(
+ ref_rttm,
+ sys_rttm,
+ config.ignore_overlap,
+ config.forgiveness_collar, )
+
+ DER_list.append(DER_)
+
+ # Take the n_lambdas with the minimum DER.
+ tuned_n_lambdas = DER_list.index(min(DER_list)) + 1
+
+ return tuned_n_lambdas
+
+
+def main(args, config):
+ # AMI Dev Set: Tune hyperparams on dev set.
+ # Read the metadata file for the dev set generated during embedding computation
+ dev_meta_file = os.path.join(
+ args.data_dir,
+ config.meta_data_dir,
+ "ami_dev." + config.mic_type + ".subsegs.json", )
+ with open(dev_meta_file, "r") as f:
+ meta_dev = json.load(f)
+
+ full_meta = meta_dev
+
+ # Processing starts from here
+ # The following lines select options for the different backends and affinity matrices, and find the best hyperparameter values using the dev set.
+ ref_rttm_file = os.path.join(args.data_dir, config.ref_rttm_dir,
+ "fullref_ami_dev.rttm")
+ best_nn = None
+ if config.affinity == "nn":
+ logger.info("Tuning for nn (Multiple iterations over AMI Dev set)")
+ best_nn = dev_nn_tuner(full_meta, args.data_dir, config)
+
+ n_lambdas = None
+ best_pval = None
+
+ if config.affinity == "cos" and (config.backend == "SC" or
+ config.backend == "kmeans"):
+ # oracle num_spkrs or not, doesn't matter for kmeans and SC backends
+ # cos: tune the best pval for SC/kmeans (for an unknown num of spkrs)
+ logger.info(
+ "Tuning for p-value for SC (Multiple iterations over AMI Dev set)")
+ best_pval = dev_pval_tuner(full_meta, args.data_dir, config)
+
+ elif config.backend == "AHC":
+ logger.info("Tuning for threshold-value for AHC")
+ best_threshold = dev_ahc_threshold_tuner(full_meta, args.data_dir,
+ config)
+ best_pval = best_threshold
+ else:
+ # NN for unknown num of speakers (can be used in future)
+ if config.oracle_n_spkrs is False:
+ # nn: tune the number of components (to be updated later)
+ logger.info(
+ "Tuning for number of eigen components for NN (Multiple iterations over AMI Dev set)"
+ )
+ # dev_tuner used for tuning num of components in NN. Can be used in future.
+ n_lambdas = dev_tuner(full_meta, args.data_dir, config)
+
+ # load 'dev' and 'eval' metadata files.
+ full_meta_dev = full_meta # current full_meta is for 'dev'
+ eval_meta_file = os.path.join(
+ args.data_dir,
+ config.meta_data_dir,
+ "ami_eval." + config.mic_type + ".subsegs.json", )
+ with open(eval_meta_file, "r") as f:
+ full_meta_eval = json.load(f)
+
+ # tag appended to the final output DER files; DER is written for individual files.
+ type_of_num_spkr = "oracle" if config.oracle_n_spkrs else "est"
+ tag = (
+ type_of_num_spkr + "_" + str(config.affinity) + "." + config.mic_type)
+
+ # perform final diarization on 'dev' and 'eval' with best hyperparams.
+ final_DERs = {}
+ out_der_dir = os.path.join(args.data_dir, config.der_dir)
+ if not os.path.exists(out_der_dir):
+ os.makedirs(out_der_dir)
+
+ for split_type in ["dev", "eval"]:
+ if split_type == "dev":
+ full_meta = full_meta_dev
+ else:
+ full_meta = full_meta_eval
+
+ # performing diarization.
+ msg = "Diarizing using best hyperparams: " + split_type + " set"
+ logger.info(msg)
+ out_boundaries = diarize_dataset(
+ full_meta,
+ split_type,
+ n_lambdas=n_lambdas,
+ pval=best_pval,
+ n_neighbors=best_nn,
+ save_dir=args.data_dir,
+ config=config)
+
+ # computing DER.
+ msg = "Computing DERs for " + split_type + " set"
+ logger.info(msg)
+ ref_rttm = os.path.join(args.data_dir, config.ref_rttm_dir,
+ "fullref_ami_" + split_type + ".rttm")
+ sys_rttm = out_boundaries
+ [MS, FA, SER, DER_vals] = DER(
+ ref_rttm,
+ sys_rttm,
+ config.ignore_overlap,
+ config.forgiveness_collar,
+ individual_file_scores=True, )
+
+ # writing DER values to a file. Append tag.
+ der_file_name = split_type + "_DER_" + tag
+ out_der_file = os.path.join(out_der_dir, der_file_name)
+ msg = "Writing DER file to: " + out_der_file
+ logger.info(msg)
+ diar.write_ders_file(ref_rttm, DER_vals, out_der_file)
+
+ msg = ("AMI " + split_type + " set DER = %s %%\n" %
+ (str(round(DER_vals[-1], 2))))
+ logger.info(msg)
+ final_DERs[split_type] = round(DER_vals[-1], 2)
+
+ # finally, print the DERs
+ msg = (
+ "Final Diarization Error Rate (%%) on AMI corpus: Dev = %s %% | Eval = %s %%\n"
+ % (str(final_DERs["dev"]), str(final_DERs["eval"])))
+ logger.info(msg)
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(__doc__)
+ parser.add_argument(
+ "--config", default=None, type=str, help="configuration file")
+ parser.add_argument(
+ "--data-dir",
+ default="../data/",
+ type=str,
+ help="processsed data directory")
+ args = parser.parse_args()
+ config = CfgNode(new_allowed=True)
+ if args.config:
+ config.merge_from_file(args.config)
+
+ config.freeze()
+
+ main(args, config)
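
For reference, the diarization error rate computed by the `DER` utility above is conventionally defined over the reference speech time as

$$
\mathrm{DER} = \frac{T_{\text{missed speech}} + T_{\text{false alarm}} + T_{\text{speaker error}}}{T_{\text{total reference speech}}}
$$

where, per the `conf/ecapa_tdnn.yaml` settings in this PR, overlapped regions are ignored and a 0.25 s forgiveness collar is applied around reference boundaries.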
diff --git a/examples/ami/sd0/local/process.sh b/examples/ami/sd0/local/process.sh
new file mode 100755
index 000000000..1dfd11b86
--- /dev/null
+++ b/examples/ami/sd0/local/process.sh
@@ -0,0 +1,49 @@
+#!/bin/bash
+
+stage=0
+set=L
+
+. ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
+set -o pipefail
+
+data_folder=$1
+manual_annot_folder=$2
+save_folder=$3
+pretrained_model_dir=$4
+conf_path=$5
+device=$6
+
+ref_rttm_dir=${save_folder}/ref_rttms
+meta_data_dir=${save_folder}/metadata
+
+if [ ${stage} -le 0 ]; then
+ echo "AMI Data preparation"
+ python local/ami_prepare.py --data_folder ${data_folder} \
+ --manual_annot_folder ${manual_annot_folder} \
+ --save_folder ${save_folder} --ref_rttm_dir ${ref_rttm_dir} \
+ --meta_data_dir ${meta_data_dir}
+
+ if [ $? -ne 0 ]; then
+ echo "Prepare AMI failed. Please check log message."
+ exit 1
+ fi
+ echo "AMI data preparation done."
+fi
+
+if [ ${stage} -le 1 ]; then
+ # extract embeddings for the dev and eval datasets
+ for name in dev eval; do
+ python local/compute_embdding.py --config ${conf_path} \
+ --data-dir ${save_folder} \
+ --device ${device} \
+ --dataset ${name} \
+ --load-checkpoint ${pretrained_model_dir}
+ done
+fi
+
+if [ ${stage} -le 2 ]; then
+ # tune hyperparams on dev set
+ # perform final diarization on 'dev' and 'eval' with best hyperparams
+ python local/experiment.py --config ${conf_path} \
+ --data-dir ${save_folder}
+fi
diff --git a/examples/ami/sd0/run.sh b/examples/ami/sd0/run.sh
index 91d4b706a..9035f5955 100644
--- a/examples/ami/sd0/run.sh
+++ b/examples/ami/sd0/run.sh
@@ -1,14 +1,46 @@
#!/bin/bash
-. path.sh || exit 1;
+. ./path.sh || exit 1;
set -e
-stage=1
+stage=0
+#TARGET_DIR=${MAIN_ROOT}/dataset/ami
+TARGET_DIR=/home/dataset/AMI
+data_folder=${TARGET_DIR}/amicorpus #e.g., /path/to/amicorpus/
+manual_annot_folder=${TARGET_DIR}/ami_public_manual_1.6.2 #e.g., /path/to/ami_public_manual_1.6.2/
+
+save_folder=./save
+pretrained_model_dir=${save_folder}/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_1/model
+conf_path=conf/ecapa_tdnn.yaml
+device=gpu
. ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
-if [ ${stage} -le 1 ]; then
- # prepare data
- bash ./local/data.sh || exit -1
-fi
\ No newline at end of file
+if [ $stage -le 0 ]; then
+ # Prepare data
+ # Download the AMI corpus. You need around 10 GB of free space to get the whole dataset.
+ # The signals are too large to package in this way,
+ # so you need to use the chooser to indicate which ones you wish to download
+ echo "Please follow https://groups.inf.ed.ac.uk/ami/download/ to download the data."
+ echo "Annotations: AMI manual annotations v1.6.2 "
+ echo "Signals: "
+ echo "1) Select one or more AMI meetings: the IDs please follow ./ami_split.py"
+ echo "2) Select media streams: Just select Headset mix"
+fi
+
+if [ $stage -le 1 ]; then
+ # Download the pretrained model
+ wget https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_1.tar.gz
+ mkdir -p ${save_folder} && tar -xvf sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_1.tar.gz -C ${save_folder}
+ rm -rf sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_1.tar.gz
+ echo "download the pretrained ECAPA-TDNN Model to path: "${pretraind_model_dir}
+fi
+
+if [ $stage -le 2 ]; then
+ # Tune hyperparams on dev set and perform final diarization on dev and eval with best hyperparams.
+ echo ${data_folder} ${manual_annot_folder} ${save_folder} ${pretrained_model_dir} ${conf_path}
+ bash ./local/process.sh ${data_folder} ${manual_annot_folder} \
+ ${save_folder} ${pretrained_model_dir} ${conf_path} ${device} || exit 1
+fi
+
diff --git a/examples/csmsc/tts0/README.md b/examples/csmsc/tts0/README.md
index 0129329ae..01376bd61 100644
--- a/examples/csmsc/tts0/README.md
+++ b/examples/csmsc/tts0/README.md
@@ -212,7 +212,8 @@ optional arguments:
Pretrained Tacotron2 model with no silence in the edge of audios:
- [tacotron2_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_ckpt_0.2.0.zip)
-The static model can be downloaded here [tacotron2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_static_0.2.0.zip).
+The static model can be downloaded here:
+- [tacotron2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_static_0.2.0.zip)
Model | Step | eval/loss | eval/l1_loss | eval/mse_loss | eval/bce_loss| eval/attn_loss
diff --git a/examples/csmsc/tts2/README.md b/examples/csmsc/tts2/README.md
index 5f31f7b36..4fbe34cbf 100644
--- a/examples/csmsc/tts2/README.md
+++ b/examples/csmsc/tts2/README.md
@@ -221,9 +221,12 @@ CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path}
```
## Pretrained Model
-Pretrained SpeedySpeech model with no silence in the edge of audios[speedyspeech_nosil_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_nosil_baker_ckpt_0.5.zip).
+Pretrained SpeedySpeech model with no silence at the edges of audios:
+- [speedyspeech_nosil_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_nosil_baker_ckpt_0.5.zip)
-The static model can be downloaded here [speedyspeech_nosil_baker_static_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_nosil_baker_static_0.5.zip).
+The static model can be downloaded here:
+- [speedyspeech_nosil_baker_static_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_nosil_baker_static_0.5.zip)
+- [speedyspeech_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_csmsc_static_0.2.0.zip)
Model | Step | eval/loss | eval/l1_loss | eval/duration_loss | eval/ssim_loss
:-------------:| :------------:| :-----: | :-----: | :--------:|:--------:
diff --git a/examples/csmsc/tts3/README.md b/examples/csmsc/tts3/README.md
index ae8f7af60..bc672f66f 100644
--- a/examples/csmsc/tts3/README.md
+++ b/examples/csmsc/tts3/README.md
@@ -232,6 +232,9 @@ The static model can be downloaded here:
- [fastspeech2_nosil_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_static_0.4.zip)
- [fastspeech2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_static_0.2.0.zip)
+The ONNX model can be downloaded here:
+- [fastspeech2_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_onnx_0.2.0.zip)
+
Model | Step | eval/loss | eval/l1_loss | eval/duration_loss | eval/pitch_loss| eval/energy_loss
:-------------:| :------------:| :-----: | :-----: | :--------: |:--------:|:---------:
default| 2(gpu) x 76000|1.0991|0.59132|0.035815|0.31915|0.15287|
diff --git a/examples/csmsc/tts3/local/ort_predict.sh b/examples/csmsc/tts3/local/ort_predict.sh
new file mode 100755
index 000000000..3154f6e5a
--- /dev/null
+++ b/examples/csmsc/tts3/local/ort_predict.sh
@@ -0,0 +1,31 @@
+train_output_path=$1
+
+stage=0
+stop_stage=0
+
+# only the default fastspeech2 + hifigan/mb_melgan are supported for now!
+
+# synthesize from metadata
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+ python3 ${BIN_DIR}/../ort_predict.py \
+ --inference_dir=${train_output_path}/inference_onnx \
+ --am=fastspeech2_csmsc \
+ --voc=hifigan_csmsc \
+ --test_metadata=dump/test/norm/metadata.jsonl \
+ --output_dir=${train_output_path}/onnx_infer_out \
+ --device=cpu \
+ --cpu_threads=2
+fi
+
+# e2e, synthesize from text
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ python3 ${BIN_DIR}/../ort_predict_e2e.py \
+ --inference_dir=${train_output_path}/inference_onnx \
+ --am=fastspeech2_csmsc \
+ --voc=hifigan_csmsc \
+ --output_dir=${train_output_path}/onnx_infer_out_e2e \
+ --text=${BIN_DIR}/../csmsc_test.txt \
+ --phones_dict=dump/phone_id_map.txt \
+ --device=cpu \
+ --cpu_threads=2
+fi
diff --git a/examples/csmsc/tts3/local/paddle2onnx.sh b/examples/csmsc/tts3/local/paddle2onnx.sh
new file mode 100755
index 000000000..505f3b663
--- /dev/null
+++ b/examples/csmsc/tts3/local/paddle2onnx.sh
@@ -0,0 +1,22 @@
+train_output_path=$1
+model_dir=$2
+output_dir=$3
+model=$4
+
+enable_dev_version=True
+
+model_name=${model%_*}
+echo model_name: ${model_name}
+
+if [ ${model_name} = 'mb_melgan' ] ;then
+ enable_dev_version=False
+fi
+
+mkdir -p ${train_output_path}/${output_dir}
+
+paddle2onnx \
+ --model_dir ${train_output_path}/${model_dir} \
+ --model_filename ${model}.pdmodel \
+ --params_filename ${model}.pdiparams \
+ --save_file ${train_output_path}/${output_dir}/${model}.onnx \
+ --enable_dev_version ${enable_dev_version}
\ No newline at end of file
diff --git a/examples/csmsc/tts3/run.sh b/examples/csmsc/tts3/run.sh
index e1a149b65..b617d5352 100755
--- a/examples/csmsc/tts3/run.sh
+++ b/examples/csmsc/tts3/run.sh
@@ -41,3 +41,25 @@ if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
fi
+# paddle2onnx, please make sure the static models are in ${train_output_path}/inference first
+# we have only tested the following models so far
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+ # install paddle2onnx
+ version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
+ if [[ -z "$version" || ${version} != '0.9.4' ]]; then
+ pip install paddle2onnx==0.9.4
+ fi
+ ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_csmsc
+ ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_csmsc
+ ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx mb_melgan_csmsc
+fi
+
+# inference with onnxruntime, use fastspeech2 + hifigan by default
+if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
+ # install onnxruntime
+ version=$(echo `pip list |grep "onnxruntime"` |awk -F" " '{print $2}')
+ if [[ -z "$version" || ${version} != '1.10.0' ]]; then
+ pip install onnxruntime==1.10.0
+ fi
+ ./local/ort_predict.sh ${train_output_path}
+fi
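
A hedged Python sketch of what stage 6 does under the hood with onnxruntime 1.10; the model path and the int64 phone-ID input are assumptions, not a verified model signature:

```python
# Minimal onnxruntime sketch mirroring --device=cpu --cpu_threads=2 above.
import numpy as np
import onnxruntime as ort

so = ort.SessionOptions()
so.intra_op_num_threads = 2  # mirrors --cpu_threads=2
sess = ort.InferenceSession(
    "exp/default/inference_onnx/fastspeech2_csmsc.onnx",  # assumed path
    sess_options=so,
    providers=["CPUExecutionProvider"])
phone_ids = np.array([1, 2, 3, 4], dtype=np.int64)  # dummy phone IDs
mel = sess.run(None, {sess.get_inputs()[0].name: phone_ids})[0]
print(mel.shape)
```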
diff --git a/examples/csmsc/voc1/README.md b/examples/csmsc/voc1/README.md
index 5527e8088..2d6de168a 100644
--- a/examples/csmsc/voc1/README.md
+++ b/examples/csmsc/voc1/README.md
@@ -127,9 +127,11 @@ optional arguments:
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Models
-The pretrained model can be downloaded here [pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_ckpt_0.4.zip).
+The pretrained model can be downloaded here:
+- [pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_ckpt_0.4.zip)
-The static model can be downloaded here [pwg_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_static_0.4.zip).
+The static model can be downloaded here:
+- [pwg_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_static_0.4.zip)
Model | Step | eval/generator_loss | eval/log_stft_magnitude_loss| eval/spectral_convergence_loss
:-------------:| :------------:| :-----: | :-----: | :--------:
diff --git a/examples/csmsc/voc3/README.md b/examples/csmsc/voc3/README.md
index 22104a8f2..12adaf7f4 100644
--- a/examples/csmsc/voc3/README.md
+++ b/examples/csmsc/voc3/README.md
@@ -152,11 +152,17 @@ TODO:
The hyperparameter of `finetune.yaml` is not good enough, a smaller `learning_rate` should be used (more `milestones` should be set).
## Pretrained Models
-The pretrained model can be downloaded here [mb_melgan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_ckpt_0.1.1.zip).
+The pretrained model can be downloaded here:
+- [mb_melgan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_ckpt_0.1.1.zip)
-The finetuned model can be downloaded here [mb_melgan_baker_finetune_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_baker_finetune_ckpt_0.5.zip).
+The finetuned model can be downloaded here:
+- [mb_melgan_baker_finetune_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_baker_finetune_ckpt_0.5.zip)
-The static model can be downloaded here [mb_melgan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_static_0.1.1.zip)
+The static model can be downloaded here:
+- [mb_melgan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_static_0.1.1.zip)
+
+The ONNX model can be downloaded here:
+- [mb_melgan_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_onnx_0.2.0.zip)
Model | Step | eval/generator_loss | eval/log_stft_magnitude_loss|eval/spectral_convergence_loss |eval/sub_log_stft_magnitude_loss|eval/sub_spectral_convergence_loss
:-------------:| :------------:| :-----: | :-----: | :--------:| :--------:| :--------:
diff --git a/examples/csmsc/voc4/README.md b/examples/csmsc/voc4/README.md
index b5c687391..b7add3e57 100644
--- a/examples/csmsc/voc4/README.md
+++ b/examples/csmsc/voc4/README.md
@@ -112,7 +112,8 @@ optional arguments:
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Models
-The pretrained model can be downloaded here [style_melgan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/style_melgan/style_melgan_csmsc_ckpt_0.1.1.zip).
+The pretrained model can be downloaded here:
+- [style_melgan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/style_melgan/style_melgan_csmsc_ckpt_0.1.1.zip)
The static model of Style MelGAN is not available now.
diff --git a/examples/csmsc/voc5/README.md b/examples/csmsc/voc5/README.md
index 21afe6eef..33e676165 100644
--- a/examples/csmsc/voc5/README.md
+++ b/examples/csmsc/voc5/README.md
@@ -112,9 +112,14 @@ optional arguments:
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Models
-The pretrained model can be downloaded here [hifigan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_ckpt_0.1.1.zip).
+The pretrained model can be downloaded here:
+- [hifigan_csmsc_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_ckpt_0.1.1.zip)
-The static model can be downloaded here [hifigan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_static_0.1.1.zip).
+The static model can be downloaded here:
+- [hifigan_csmsc_static_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_static_0.1.1.zip)
+
+The ONNX model can be downloaded here:
+- [hifigan_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_csmsc_onnx_0.2.0.zip)
Model | Step | eval/generator_loss | eval/mel_loss| eval/feature_matching_loss
:-------------:| :------------:| :-----: | :-----: | :--------:
diff --git a/examples/csmsc/voc6/README.md b/examples/csmsc/voc6/README.md
index 7763b3551..26d4523d9 100644
--- a/examples/csmsc/voc6/README.md
+++ b/examples/csmsc/voc6/README.md
@@ -109,9 +109,11 @@ optional arguments:
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Models
-The pretrained model can be downloaded here [wavernn_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_ckpt_0.2.0.zip).
+The pretrained model can be downloaded here:
+- [wavernn_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_ckpt_0.2.0.zip)
-The static model can be downloaded here [wavernn_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_static_0.2.0.zip).
+The static model can be downloaded here:
+- [wavernn_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_static_0.2.0.zip)
Model | Step | eval/loss
:-------------:|:------------:| :------------:
diff --git a/examples/esc50/README.md b/examples/esc50/README.md
index 911a72ad7..9eab95d26 100644
--- a/examples/esc50/README.md
+++ b/examples/esc50/README.md
@@ -4,7 +4,7 @@
For sound classification tasks, a common traditional machine-learning approach is to first hand-craft a variety of time-domain and frequency-domain audio features and apply feature selection, combination, and transformation, then classify with an SVM or a decision tree. End-to-end deep learning, by contrast, typically uses deep networks such as RNNs and CNNs to perform representation learning and classification directly on the waveform or on time-frequency features.
-At the IEEE ICASSP 2017 conference, Google released the large-scale audio dataset [Audioset](https://research.google.com/audioset/). It contains 632 audio classes and 2,084,320 human-labeled 10-second sound clips (sourced from YouTube videos). The dataset now holds 2.1 million annotated videos, 5800 hours of audio, and 527 labeled sound classes.
+At the IEEE ICASSP 2017 conference, Google released the large-scale audio dataset [Audioset](https://research.google.com/audioset/). It contains 632 audio classes and 2,084,320 human-labeled sound clips of **10 seconds** each (sourced from YouTube videos). The dataset now holds 2.1 million annotated videos, 5800 hours of audio, and 527 labeled sound classes.
`PANNs`([PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition](https://arxiv.org/pdf/1912.10211.pdf)) are sound classification/recognition models trained on the Audioset dataset. After pretraining, they can be used to extract audio embeddings. This example fine-tunes a pretrained `PANNs` model to perform the sound classification task.
@@ -12,14 +12,14 @@
## Model Introduction
PaddleAudio provides pretrained PANNs models CNN14, CNN10 and CNN6 for users to choose from:
-- CNN14: the model mainly contains 12 convolutional layers and 2 fully connected layers, with 79.6M parameters and an embedding dimension of 2048.
-- CNN10: the model mainly contains 8 convolutional layers and 2 fully connected layers, with 4.9M parameters and an embedding dimension of 512.
-- CNN6: the model mainly contains 4 convolutional layers and 2 fully connected layers, with 4.5M parameters and an embedding dimension of 512.
+- CNN14: the model mainly contains 12 convolutional layers and 2 fully connected layers; it has 79.6 M parameters and an embedding dimension of 2048.
+- CNN10: the model mainly contains 8 convolutional layers and 2 fully connected layers; it has 4.9 M parameters and an embedding dimension of 512.
+- CNN6: the model mainly contains 4 convolutional layers and 2 fully connected layers; it has 4.5 M parameters and an embedding dimension of 512.
## Dataset
-[ESC-50: Dataset for Environmental Sound Classification](https://github.com/karolpiczak/ESC-50) is a dataset of 2000 labeled environmental sound samples: single-channel audio files sampled at 44,100 Hz, divided by label into 50 classes with 40 samples each.
+[ESC-50: Dataset for Environmental Sound Classification](https://github.com/karolpiczak/ESC-50) is a dataset of 2000 labeled environmental sound samples of **5 seconds** each: single-channel audio files sampled at 44,100 Hz, divided by label into 50 classes with 40 samples each.
## Model Metrics
@@ -43,13 +43,13 @@ $ CUDA_VISIBLE_DEVICES=0 ./run.sh 1 conf/panns.yaml
```
Training parameters can be configured in the `training` section of `conf/panns.yaml`, where:
-- `epochs`: training epochs, default 50.
+- `epochs`: the number of training epochs; defaults to 50.
- `learning_rate`: the learning rate for fine-tuning; defaults to 5e-5.
-- `batch_size`: batch size, adjust according to GPU memory and lower it if memory runs out, default 16.
+- `batch_size`: the batch size; adjust it to the available GPU memory, and lower it if you run out of memory; defaults to 16.
- `num_workers`: the number of subprocesses the DataLoader uses to fetch data; defaults to 0, meaning data is loaded in the main process.
- `checkpoint_dir`: the directory where model parameters and optimizer state are saved; defaults to `./checkpoint`.
-- `save_freq`: model save frequency during training, default 10.
-- `log_freq`: logging frequency during training, default 10.
+- `save_freq`: how often the model is saved during training; defaults to 10.
+- `log_freq`: how often training progress is logged; defaults to 10.
The example code uses the pretrained `CNN14` model; to switch to another pretrained model, modify the `model` section of `conf/panns.yaml`:
```yaml
@@ -76,7 +76,7 @@ $ CUDA_VISIBLE_DEVICES=0 ./run.sh 2 conf/panns.yaml
Prediction parameters can be configured in the `predicting` section of `conf/panns.yaml`, where:
- `audio_file`: the audio file to run prediction on.
-- `top_k`: show the scores of the top k predicted labels, default 1.
+- `top_k`: show the scores of the top-k predicted labels; defaults to 1.
- `checkpoint`: the model parameter checkpoint file.
The prediction output looks like this:
diff --git a/examples/iwslt2012/punc0/README.md b/examples/iwslt2012/punc0/README.md
index 74d599a21..6caa9710b 100644
--- a/examples/iwslt2012/punc0/README.md
+++ b/examples/iwslt2012/punc0/README.md
@@ -21,7 +21,7 @@
The pretrained model can be downloaded here [ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/text/ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip).
### Test Result
-- Ernie Linear
+- Ernie
| |COMMA | PERIOD | QUESTION | OVERALL|
|:-----:|:-----:|:-----:|:-----:|:-----:|
|Precision |0.510955 |0.526462 |0.820755 |0.619391|
diff --git a/examples/iwslt2012/punc0/RESULTS.md b/examples/iwslt2012/punc0/RESULTS.md
new file mode 100644
index 000000000..2e22713d8
--- /dev/null
+++ b/examples/iwslt2012/punc0/RESULTS.md
@@ -0,0 +1,9 @@
+# iwslt2012
+
+## Ernie
+
+| |COMMA | PERIOD | QUESTION | OVERALL|
+|:-----:|:-----:|:-----:|:-----:|:-----:|
+|Precision |0.510955 |0.526462 |0.820755 |0.619391|
+|Recall |0.517433 |0.564179 |0.861386 |0.647666|
+|F1 |0.514173 |0.544669 |0.840580 |0.633141|
diff --git a/examples/librispeech/asr1/README.md b/examples/librispeech/asr1/README.md
index eb1a44001..ae252a58b 100644
--- a/examples/librispeech/asr1/README.md
+++ b/examples/librispeech/asr1/README.md
@@ -151,44 +151,22 @@ avg.sh best exp/conformer/checkpoints 20
CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
```
## Pretrained Model
-You can get the pretrained transformer or conformer using the scripts below:
-```bash
-# Conformer:
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/conformer.model.tar.gz
-# Transformer:
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/transformer.model.tar.gz
-```
+You can get the pretrained transformer or conformer from the [released models](../../../docs/source/released_model.md).
+
using the `tar` scripts to unpack the model and then you can use the script to test the model.
For example:
```bash
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/conformer.model.tar.gz
-tar xzvf transformer.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
+tar xzvf asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
source path.sh
# If you have process the data and get the manifest file, you can skip the following 2 steps
bash local/data.sh --stage -1 --stop_stage -1
bash local/data.sh --stage 2 --stop_stage 2
CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
```
-The performance of the released models are shown below:
-## Conformer
-train: Epoch 70, 4 V100-32G, best avg: 20
-
-| Model | Params | Config | Augmentation | Test set | Decode method | Loss | WER |
-| --------- | ------- | ------------------- | ------------ | ---------- | ---------------------- | ----------------- | -------- |
-| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | attention | 6.433612394332886 | 0.039771 |
-| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | ctc_greedy_search | 6.433612394332886 | 0.040342 |
-| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | ctc_prefix_beam_search | 6.433612394332886 | 0.040342 |
-| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-clean | attention_rescoring | 6.433612394332886 | 0.033761 |
-## Transformer
-train: Epoch 120, 4 V100-32G, 27 Day, best avg: 10
+The performance of the released models is shown in [RESULTS.md](./RESULTS.md).
-| Model | Params | Config | Augmentation | Test set | Decode method | Loss | WER |
-| ----------- | ------- | --------------------- | ------------ | ---------- | ---------------------- | ----------------- | -------- |
-| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | attention | 6.382194232940674 | 0.049661 |
-| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | ctc_greedy_search | 6.382194232940674 | 0.049566 |
-| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | ctc_prefix_beam_search | 6.382194232940674 | 0.049585 |
-| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | attention_rescoring | 6.382194232940674 | 0.038135 |
## Stage 4: CTC Alignment
If you want to get the alignment between the audio and the text, you can use the ctc alignment. The code of this stage is shown below:
```bash
@@ -227,8 +205,8 @@ In some situations, you want to use the trained model to do the inference for th
```
you can train the model by yourself using ```bash run.sh --stage 0 --stop_stage 3```, or you can download the pretrained model through the script below:
```bash
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/conformer.model.tar.gz
-tar xzvf conformer.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
+tar xzvf asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
```
You can download the audio demo:
```bash
diff --git a/examples/librispeech/asr2/README.md b/examples/librispeech/asr2/README.md
index 7d6fe11df..5bc7185a9 100644
--- a/examples/librispeech/asr2/README.md
+++ b/examples/librispeech/asr2/README.md
@@ -1,4 +1,4 @@
-# Transformer/Conformer ASR with Librispeech Asr2
+# Transformer/Conformer ASR with Librispeech ASR2
This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model with [Librispeech dataset](http://www.openslr.org/resources/12) and use some functions in kaldi.
@@ -213,17 +213,14 @@ avg.sh latest exp/transformer/checkpoints 10
./local/recog.sh --ckpt_prefix exp/transformer/checkpoints/avg_10
```
## Pretrained Model
-You can get the pretrained transformer using the scripts below:
-```bash
-# Transformer:
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr2/transformer.model.tar.gz
-```
+You can get the pretrained models from the [released models](../../../docs/source/released_model.md).
+
using the `tar` scripts to unpack the model and then you can use the script to test the model.
For example:
```bash
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr2/transformer.model.tar.gz
-tar xzvf transformer.model.tar.gz
+wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr2/asr2_transformer_librispeech_ckpt_0.1.1.model.tar.gz
+tar xzvf asr2_transformer_librispeech_ckpt_0.1.1.model.tar.gz
source path.sh
# If you have process the data and get the manifest file, you can skip the following 2 steps
bash local/data.sh --stage -1 --stop_stage -1
@@ -231,26 +228,7 @@ bash local/data.sh --stage 2 --stop_stage 2
CUDA_VISIBLE_DEVICES= ./local/test.sh conf/transformer.yaml exp/ctc/checkpoints/avg_10
```
-The performance of the released models are shown below:
-### Transformer
-| Model | Params | GPUS | Averaged Model | Config | Augmentation | Loss |
-| :---------: | :----: | :--------------------: | :--------------: | :-------------------: | :----------: | :-------------: |
-| transformer | 32.52M | 8 Tesla V100-SXM2-32GB | 10-best val_loss | conf/transformer.yaml | spec_aug | 6.3197922706604 |
-
-#### Attention Rescore
-| Test Set | Decode Method | #Snt | #Wrd | Corr | Sub | Del | Ins | Err | S.Err |
-| ---------- | --------------------- | ---- | ----- | ---- | ---- | ---- | ---- | ---- | ----- |
-| test-clean | attention | 2620 | 52576 | 96.4 | 2.5 | 1.1 | 0.4 | 4.0 | 34.7 |
-| test-clean | ctc_greedy_search | 2620 | 52576 | 95.9 | 3.7 | 0.4 | 0.5 | 4.6 | 48.0 |
-| test-clean | ctc_prefix_beamsearch | 2620 | 52576 | 95.9 | 3.7 | 0.4 | 0.5 | 4.6 | 47.6 |
-| test-clean | attention_rescore | 2620 | 52576 | 96.8 | 2.9 | 0.3 | 0.4 | 3.7 | 38.0 |
-
-#### JoinCTC
-| Test Set | Decode Method | #Snt | #Wrd | Corr | Sub | Del | Ins | Err | S.Err |
-| ---------- | ----------------- | ---- | ----- | ---- | ---- | ---- | ---- | ---- | ----- |
-| test-clean | join_ctc_only_att | 2620 | 52576 | 96.1 | 2.5 | 1.4 | 0.4 | 4.4 | 34.7 |
-| test-clean | join_ctc_w/o_lm | 2620 | 52576 | 97.2 | 2.6 | 0.3 | 0.4 | 3.2 | 34.9 |
-| test-clean | join_ctc_w_lm | 2620 | 52576 | 97.9 | 1.8 | 0.2 | 0.3 | 2.4 | 27.8 |
+The performance of the released models is shown [here](./RESULTS.md).
Compare with [ESPNET](https://github.com/espnet/espnet/blob/master/egs/librispeech/asr1/RESULTS.md#pytorch-large-transformer-with-specaug-4-gpus--transformer-lm-4-gpus) we using 8gpu, but the model size (aheads4-adim256) small than it.
## Stage 5: CTC Alignment
diff --git a/examples/ljspeech/tts1/README.md b/examples/ljspeech/tts1/README.md
index 4f7680e84..7f32522ac 100644
--- a/examples/ljspeech/tts1/README.md
+++ b/examples/ljspeech/tts1/README.md
@@ -171,7 +171,8 @@ optional arguments:
6. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Model
-Pretrained Model can be downloaded here. [transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/transformer_tts/transformer_tts_ljspeech_ckpt_0.4.zip)
+Pretrained Model can be downloaded here:
+- [transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/transformer_tts/transformer_tts_ljspeech_ckpt_0.4.zip)
TransformerTTS checkpoint contains files listed below.
```text
diff --git a/examples/ljspeech/tts3/README.md b/examples/ljspeech/tts3/README.md
index f5e919c0f..e028fa05d 100644
--- a/examples/ljspeech/tts3/README.md
+++ b/examples/ljspeech/tts3/README.md
@@ -214,7 +214,8 @@ optional arguments:
9. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Model
-Pretrained FastSpeech2 model with no silence in the edge of audios. [fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)
+Pretrained FastSpeech2 model with no silence at the edges of audios:
+- [fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)
Model | Step | eval/loss | eval/l1_loss | eval/duration_loss | eval/pitch_loss| eval/energy_loss
:-------------:| :------------:| :-----: | :-----: | :--------: |:--------:|:---------:
diff --git a/examples/ljspeech/voc0/README.md b/examples/ljspeech/voc0/README.md
index 13a50efb5..41b08d57f 100644
--- a/examples/ljspeech/voc0/README.md
+++ b/examples/ljspeech/voc0/README.md
@@ -50,4 +50,5 @@ Synthesize waveform.
6. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Model
-Pretrained Model with residual channel equals 128 can be downloaded here. [waveflow_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/waveflow/waveflow_ljspeech_ckpt_0.3.zip).
+The pretrained model with 128 residual channels can be downloaded here:
+- [waveflow_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/waveflow/waveflow_ljspeech_ckpt_0.3.zip)
diff --git a/examples/ljspeech/voc1/README.md b/examples/ljspeech/voc1/README.md
index 6fcb2a520..4513b2a05 100644
--- a/examples/ljspeech/voc1/README.md
+++ b/examples/ljspeech/voc1/README.md
@@ -127,7 +127,8 @@ optional arguments:
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Model
-Pretrained models can be downloaded here. [pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_ljspeech_ckpt_0.5.zip)
+Pretrained models can be downloaded here:
+- [pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_ljspeech_ckpt_0.5.zip)
Parallel WaveGAN checkpoint contains files listed below.
diff --git a/examples/ljspeech/voc5/README.md b/examples/ljspeech/voc5/README.md
index 9fbb9f746..9b31e2650 100644
--- a/examples/ljspeech/voc5/README.md
+++ b/examples/ljspeech/voc5/README.md
@@ -127,7 +127,8 @@ optional arguments:
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Model
-The pretrained model can be downloaded here [hifigan_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_ckpt_0.2.0.zip).
+The pretrained model can be downloaded here:
+- [hifigan_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_ljspeech_ckpt_0.2.0.zip)
Model | Step | eval/generator_loss | eval/mel_loss| eval/feature_matching_loss
@@ -143,6 +144,5 @@ hifigan_ljspeech_ckpt_0.2.0
└── snapshot_iter_2500000.pdz # generator parameters of hifigan
```
-
## Acknowledgement
We adapted some code from https://github.com/kan-bayashi/ParallelWaveGAN.
diff --git a/examples/vctk/tts3/README.md b/examples/vctk/tts3/README.md
index 157949d1f..f373ca6a3 100644
--- a/examples/vctk/tts3/README.md
+++ b/examples/vctk/tts3/README.md
@@ -217,7 +217,8 @@ optional arguments:
9. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Model
-Pretrained FastSpeech2 model with no silence in the edge of audios. [fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)
+Pretrained FastSpeech2 model with no silence at the edges of audios:
+- [fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)
FastSpeech2 checkpoint contains files listed below.
```text
diff --git a/examples/vctk/voc1/README.md b/examples/vctk/voc1/README.md
index 4714f28dc..1c3016f88 100644
--- a/examples/vctk/voc1/README.md
+++ b/examples/vctk/voc1/README.md
@@ -132,7 +132,8 @@ optional arguments:
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Model
-Pretrained models can be downloaded here [pwg_vctk_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_vctk_ckpt_0.1.1.zip).
+Pretrained models can be downloaded here:
+- [pwg_vctk_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_vctk_ckpt_0.1.1.zip)
Parallel WaveGAN checkpoint contains files listed below.
diff --git a/examples/vctk/voc5/README.md b/examples/vctk/voc5/README.md
index b4be341c0..4eb25c02d 100644
--- a/examples/vctk/voc5/README.md
+++ b/examples/vctk/voc5/README.md
@@ -133,7 +133,8 @@ optional arguments:
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Model
-The pretrained model can be downloaded here [hifigan_vctk_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip).
+The pretrained model can be downloaded here:
+- [hifigan_vctk_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip)
Model | Step | eval/generator_loss | eval/mel_loss| eval/feature_matching_loss
diff --git a/examples/voxceleb/sv0/RESULT.md b/examples/voxceleb/sv0/RESULT.md
index c37bcecef..3a3f67d09 100644
--- a/examples/voxceleb/sv0/RESULT.md
+++ b/examples/voxceleb/sv0/RESULT.md
@@ -4,4 +4,4 @@
| Model | Number of Params | Release | Config | dim | Test set | Cosine | Cosine + S-Norm |
| --- | --- | --- | --- | --- | --- | --- | ---- |
-| ECAPA-TDNN | 85M | 0.1.1 | conf/ecapa_tdnn.yaml |192 | test | 1.15 | 1.06 |
+| ECAPA-TDNN | 85M | 0.2.0 | conf/ecapa_tdnn.yaml |192 | test | 1.02 | 0.95 |
diff --git a/examples/voxceleb/sv0/conf/ecapa_tdnn.yaml b/examples/voxceleb/sv0/conf/ecapa_tdnn.yaml
index e58dca82d..4715c5a3c 100644
--- a/examples/voxceleb/sv0/conf/ecapa_tdnn.yaml
+++ b/examples/voxceleb/sv0/conf/ecapa_tdnn.yaml
@@ -1,14 +1,16 @@
###########################################
# Data #
###########################################
-# we should explicitly specify the wav path of vox2 audio data converted from m4a
-vox2_base_path:
augment: True
-batch_size: 16
+batch_size: 32
num_workers: 2
-num_speakers: 7205 # 1211 vox1, 5994 vox2, 7205 vox1+2, test speakers: 41
+num_speakers: 1211 # 1211 vox1, 5994 vox2, 7205 vox1+2, test speakers: 41
shuffle: True
+skip_prep: False
+split_ratio: 0.9
+chunk_duration: 3.0 # seconds
random_chunk: True
+verification_file: data/vox1/veri_test2.txt
###########################################################
# FEATURE EXTRACTION SETTING #
@@ -26,7 +28,6 @@ hop_size: 160 #10ms, sample rate 16000, 10 * 16000 / 1000 = 160
# if we want use another model, please choose another configuration yaml file
model:
input_size: 80
- # "channels": [512, 512, 512, 512, 1536],
channels: [1024, 1024, 1024, 1024, 3072]
kernel_sizes: [5, 3, 3, 3, 1]
dilations: [1, 2, 3, 4, 1]
@@ -38,8 +39,8 @@ model:
###########################################
seed: 1986 # according from speechbrain configuration
epochs: 10
-save_interval: 1
-log_interval: 1
+save_interval: 10
+log_interval: 10
learning_rate: 1e-8
diff --git a/examples/voxceleb/sv0/conf/ecapa_tdnn_small.yaml b/examples/voxceleb/sv0/conf/ecapa_tdnn_small.yaml
new file mode 100644
index 000000000..5ad5ea285
--- /dev/null
+++ b/examples/voxceleb/sv0/conf/ecapa_tdnn_small.yaml
@@ -0,0 +1,53 @@
+###########################################
+# Data #
+###########################################
+augment: True
+batch_size: 16
+num_workers: 2
+num_speakers: 1211 # 1211 vox1, 5994 vox2, 7205 vox1+2, test speakers: 41
+shuffle: True
+skip_prep: False
+split_ratio: 0.9
+chunk_duration: 3.0 # seconds
+random_chunk: True
+verification_file: data/vox1/veri_test2.txt
+
+###########################################################
+# FEATURE EXTRACTION SETTING #
+###########################################################
+# currently, we only support fbank
+sr: 16000 # sample rate
+n_mels: 80
+window_size: 400 #25ms, sample rate 16000, 25 * 16000 / 1000 = 400
+hop_size: 160 #10ms, sample rate 16000, 10 * 16000 / 1000 = 160
+
+###########################################################
+# MODEL SETTING #
+###########################################################
+# currently, we only support ecapa-tdnn in this configuration
+# if we want to use another model, please choose another configuration yaml file
+model:
+ input_size: 80
+ channels: [512, 512, 512, 512, 1536]
+ kernel_sizes: [5, 3, 3, 3, 1]
+ dilations: [1, 2, 3, 4, 1]
+ attention_channels: 128
+ lin_neurons: 192
+
+###########################################
+# Training #
+###########################################
+seed: 1986 # following the speechbrain configuration
+epochs: 100
+save_interval: 10
+log_interval: 10
+learning_rate: 1e-8
+
+
+###########################################
+# Testing #
+###########################################
+global_embedding_norm: True
+embedding_mean_norm: True
+embedding_std_norm: False
+
diff --git a/examples/voxceleb/sv0/local/data.sh b/examples/voxceleb/sv0/local/data.sh
index a3ff1c486..d6010ec66 100755
--- a/examples/voxceleb/sv0/local/data.sh
+++ b/examples/voxceleb/sv0/local/data.sh
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-stage=1
+stage=0
stop_stage=100
. ${MAIN_ROOT}/utils/parse_options.sh || exit -1;
@@ -30,29 +30,114 @@ dir=$1
conf_path=$2
mkdir -p ${dir}
-if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
- # data prepare for vox1 and vox2, vox2 must be converted from m4a to wav
- # we should use the local/convert.sh convert m4a to wav
- python3 local/data_prepare.py \
- --data-dir ${dir} \
- --config ${conf_path}
-fi
-
+# Generally, `MAIN_ROOT` refers to the root of PaddleSpeech
+# and is defined in path.sh.
+# We will download the voxceleb data and rirs noise to ${MAIN_ROOT}/dataset
TARGET_DIR=${MAIN_ROOT}/dataset
mkdir -p ${TARGET_DIR}
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
- # download data, generate manifests
- python3 ${TARGET_DIR}/voxceleb/voxceleb1.py \
- --manifest_prefix="data/vox1/manifest" \
+ # download data, generate manifests
+  # we will generate the manifest.{dev,test} files from the ${TARGET_DIR}/voxceleb/vox1/{dev,test} directories
+  # and generate the meta info and download the trial file
+  # manifest.dev: 148642
+  # manifest.test: 4847
+  echo "Start to download vox1 dataset and generate the manifest files"
+ python3 ${TARGET_DIR}/voxceleb/voxceleb1.py \
+ --manifest_prefix="${dir}/vox1/manifest" \
--target_dir="${TARGET_DIR}/voxceleb/vox1/"
- if [ $? -ne 0 ]; then
- echo "Prepare voxceleb failed. Terminated."
- exit 1
- fi
+ if [ $? -ne 0 ]; then
+ echo "Prepare voxceleb1 failed. Terminated."
+ exit 1
+ fi
+
+fi
+
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
+  # download voxceleb2 data
+  # we will download the data and unzip the package
+  # and we will store the m4a files in ${TARGET_DIR}/voxceleb/vox2/{dev,test}
+  echo "Start to download vox2 dataset"
+ python3 ${TARGET_DIR}/voxceleb/voxceleb2.py \
+ --download \
+ --target_dir="${TARGET_DIR}/voxceleb/vox2/"
+
+ if [ $? -ne 0 ]; then
+ echo "Download voxceleb2 dataset failed. Terminated."
+ exit 1
+ fi
+
+fi
+
+if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
+  # convert the m4a files to wav
+  # the original m4a files will not be deleted
+  echo "Start to convert the m4a files to wav"
+  bash local/convert.sh ${TARGET_DIR}/voxceleb/vox2/test/ || exit 1;
+
+ if [ $? -ne 0 ]; then
+ echo "Convert voxceleb2 dataset from m4a to wav failed. Terminated."
+ exit 1
+ fi
+  echo "m4a to wav conversion finished"
+fi
+
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+  # generate the vox2 manifest file from the wav files
+  # we will generate ${dir}/vox2/manifest.vox2
+  # because we use the whole vox2 dataset for training, we collect all the vox2 data in one file
+  echo "Start to generate the vox2 manifest files"
+ python3 ${TARGET_DIR}/voxceleb/voxceleb2.py \
+ --generate \
+ --manifest_prefix="${dir}/vox2/manifest" \
+ --target_dir="${TARGET_DIR}/voxceleb/vox2/"
- # for dataset in train dev test; do
- # mv data/manifest.${dataset} data/manifest.${dataset}.raw
- # done
-fi
\ No newline at end of file
+ if [ $? -ne 0 ]; then
+ echo "Prepare voxceleb2 dataset failed. Terminated."
+ exit 1
+ fi
+fi
+
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+ # generate the vox csv file
+  # Currently, our training system uses csv files for datasets
+  echo "Convert the json format to csv format to be compatible with the training process"
+ python3 local/make_vox_csv_dataset_from_json.py\
+ --train "${dir}/vox1/manifest.dev" "${dir}/vox2/manifest.vox2"\
+ --test "${dir}/vox1/manifest.test" \
+ --target_dir "${dir}/vox/" \
+ --config ${conf_path}
+
+ if [ $? -ne 0 ]; then
+ echo "Prepare voxceleb failed. Terminated."
+ exit 1
+ fi
+fi
+
+if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
+ # generate the open rir noise manifest file
+  echo "Generate the open rir noise manifest file"
+ python3 ${TARGET_DIR}/rir_noise/rir_noise.py\
+ --manifest_prefix="${dir}/rir_noise/manifest" \
+ --target_dir="${TARGET_DIR}/rir_noise/"
+
+ if [ $? -ne 0 ]; then
+ echo "Prepare rir_noise failed. Terminated."
+ exit 1
+ fi
+fi
+
+if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then
+  # generate the open rir noise csv file
+  echo "Generate the open rir noise csv file"
+ python3 local/make_rirs_noise_csv_dataset_from_json.py \
+ --noise_dir="${TARGET_DIR}/rir_noise/" \
+ --data_dir="${dir}/rir_noise/" \
+ --config ${conf_path}
+
+ if [ $? -ne 0 ]; then
+ echo "Prepare rir_noise failed. Terminated."
+ exit 1
+ fi
+fi
diff --git a/examples/voxceleb/sv0/local/make_rirs_noise_csv_dataset_from_json.py b/examples/voxceleb/sv0/local/make_rirs_noise_csv_dataset_from_json.py
new file mode 100644
index 000000000..b25a9d49a
--- /dev/null
+++ b/examples/voxceleb/sv0/local/make_rirs_noise_csv_dataset_from_json.py
@@ -0,0 +1,167 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+Convert the PaddleSpeech jsonline format data to csv format data for the voxceleb experiment.
+Currently, the Speaker Identification training process uses the csv format.
+"""
+import argparse
+import csv
+import os
+from typing import List
+
+import tqdm
+from yacs.config import CfgNode
+
+from paddleaudio import load as load_audio
+from paddlespeech.s2t.utils.log import Log
+from paddlespeech.vector.utils.vector_utils import get_chunks
+
+logger = Log(__name__).getlog()
+
+
+def get_chunks_list(wav_file: str,
+ split_chunks: bool,
+ base_path: str,
+ chunk_duration: float=3.0) -> List[List[str]]:
+    """Get the segment info list for a single audio file
+
+    Args:
+        wav_file (str): the wav audio file to collect segment info from
+        split_chunks (bool): audio split flag
+        base_path (str): the audio base path
+        chunk_duration (float): the chunk duration.
+                                if split_chunks is set, the audio is split into multiple chunk segments.
+    """
+ waveform, sr = load_audio(wav_file)
+ audio_id = wav_file.split("/rir_noise/")[-1].split(".")[0]
+ audio_duration = waveform.shape[0] / sr
+
+ ret = []
+    if split_chunks and audio_duration > chunk_duration:  # Split into pieces of chunk_duration seconds.
+ uniq_chunks_list = get_chunks(chunk_duration, audio_id, audio_duration)
+
+ for idx, chunk in enumerate(uniq_chunks_list):
+ s, e = chunk.split("_")[-2:] # Timestamps of start and end
+ start_sample = int(float(s) * sr)
+ end_sample = int(float(e) * sr)
+
+ # currently, all vector csv data format use one representation
+ # id, duration, wav, start, stop, label
+ # in rirs noise, all the label name is 'noise'
+ # the label is string type and we will convert it to integer type in training
+ ret.append([
+ chunk, audio_duration, wav_file, start_sample, end_sample,
+ "noise"
+ ])
+ else: # Keep whole audio.
+ ret.append(
+ [audio_id, audio_duration, wav_file, 0, waveform.shape[0], "noise"])
+ return ret
+
+
+def generate_csv(wav_files,
+ output_file: str,
+ base_path: str,
+ split_chunks: bool=True):
+    """Prepare the csv file according to the wav files
+
+    Args:
+        wav_files (list): all the audio list to prepare the csv file
+        output_file (str): the output csv file
+        base_path (str): the audio base path
+ split_chunks (bool): audio split flag
+ """
+ logger.info(f'Generating csv: {output_file}')
+ header = ["utt_id", "duration", "wav", "start", "stop", "label"]
+ csv_lines = []
+ for item in tqdm.tqdm(wav_files):
+ csv_lines.extend(
+ get_chunks_list(
+ item, base_path=base_path, split_chunks=split_chunks))
+
+ if not os.path.exists(os.path.dirname(output_file)):
+ os.makedirs(os.path.dirname(output_file))
+
+ with open(output_file, mode="w") as csv_f:
+ csv_writer = csv.writer(
+ csv_f, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
+ csv_writer.writerow(header)
+ for line in csv_lines:
+ csv_writer.writerow(line)
+
+
+def prepare_data(args, config):
+ """Convert the jsonline format to csv format
+
+ Args:
+ args (argparse.Namespace): scripts args
+ config (CfgNode): yaml configuration content
+ """
+    # if the external config sets the skip_prep flag, we do nothing
+ if config.skip_prep:
+ return
+
+ base_path = args.noise_dir
+ wav_path = os.path.join(base_path, "RIRS_NOISES")
+ logger.info(f"base path: {base_path}")
+ logger.info(f"wav path: {wav_path}")
+ rir_list = os.path.join(wav_path, "real_rirs_isotropic_noises", "rir_list")
+ rir_files = []
+ with open(rir_list, 'r') as f:
+ for line in f.readlines():
+ rir_file = line.strip().split(' ')[-1]
+ rir_files.append(os.path.join(base_path, rir_file))
+
+ noise_list = os.path.join(wav_path, "pointsource_noises", "noise_list")
+ noise_files = []
+ with open(noise_list, 'r') as f:
+ for line in f.readlines():
+ noise_file = line.strip().split(' ')[-1]
+ noise_files.append(os.path.join(base_path, noise_file))
+
+ csv_path = os.path.join(args.data_dir, 'csv')
+ logger.info(f"csv path: {csv_path}")
+ generate_csv(
+ rir_files, os.path.join(csv_path, 'rir.csv'), base_path=base_path)
+ generate_csv(
+ noise_files, os.path.join(csv_path, 'noise.csv'), base_path=base_path)
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument(
+ "--noise_dir",
+ default=None,
+ required=True,
+        help="The noise dataset directory.")
+ parser.add_argument(
+ "--data_dir",
+ default=None,
+ required=True,
+        help="The target directory that stores the csv files")
+ parser.add_argument(
+ "--config",
+ default=None,
+ required=True,
+ type=str,
+ help="configuration file")
+ args = parser.parse_args()
+
+ # parse the yaml config file
+ config = CfgNode(new_allowed=True)
+ if args.config:
+ config.merge_from_file(args.config)
+
+ # prepare the csv file from jsonlines files
+ prepare_data(args, config)
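
The chunk ids returned by `get_chunks` carry the start and end time in seconds as their last two `_`-separated fields, which is exactly what `chunk.split("_")[-2:]` parses above. A hypothetical reimplementation, only to illustrate the naming contract (the real helper lives in `paddlespeech.vector.utils.vector_utils`):

```python
from typing import List


def get_chunks(chunk_duration: float, audio_id: str,
               audio_duration: float) -> List[str]:
    # split [0, audio_duration) into consecutive chunk_duration windows;
    # each chunk id ends with "<start>_<end>" in seconds
    num_chunks = int(audio_duration / chunk_duration)
    return [
        f"{audio_id}_{i * chunk_duration}_{(i + 1) * chunk_duration}"
        for i in range(num_chunks)
    ]


print(get_chunks(3.0, "noise-001", 10.2))
# ['noise-001_0.0_3.0', 'noise-001_3.0_6.0', 'noise-001_6.0_9.0']
```
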
diff --git a/examples/voxceleb/sv0/local/make_vox_csv_dataset_from_json.py b/examples/voxceleb/sv0/local/make_vox_csv_dataset_from_json.py
new file mode 100644
index 000000000..4e64c3067
--- /dev/null
+++ b/examples/voxceleb/sv0/local/make_vox_csv_dataset_from_json.py
@@ -0,0 +1,251 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+Convert the PaddleSpeech jsonline format data to csv format data for the voxceleb experiment.
+Currently, the Speaker Identification training process uses the csv format.
+"""
+import argparse
+import csv
+import json
+import os
+import random
+
+import tqdm
+from yacs.config import CfgNode
+
+from paddleaudio import load as load_audio
+from paddlespeech.s2t.utils.log import Log
+from paddlespeech.vector.utils.vector_utils import get_chunks
+
+logger = Log(__name__).getlog()
+
+
+def prepare_csv(wav_files, output_file, config, split_chunks=True):
+    """Prepare the csv file according to the wav files
+
+ Args:
+ wav_files (list): all the audio list to prepare the csv file
+ output_file (str): the output csv file
+ config (CfgNode): yaml configuration content
+ split_chunks (bool, optional): audio split flag. Defaults to True.
+ """
+ if not os.path.exists(os.path.dirname(output_file)):
+ os.makedirs(os.path.dirname(output_file))
+ csv_lines = []
+ header = ["utt_id", "duration", "wav", "start", "stop", "label"]
+    # voxceleb meta info for each training utterance segment
+    # we extract a segment from an utterance for training,
+    # and the segment's period lies between the start and stop time points in the original wav file
+    # each field in the meta info means the following:
+    # utt_id: the utterance segment name, which is unique in the training dataset
+    # duration: the total utterance time
+    # wav: the utterance file path, which should be an absolute path
+    # start: start point in the original wav file, in samples
+    # stop: stop point in the original wav file, in samples
+    # label: the utterance segment's label name,
+    #   which is the speaker name in the speaker verification domain
+ for item in tqdm.tqdm(wav_files, total=len(wav_files)):
+ item = json.loads(item.strip())
+ audio_id = item['utt'].replace(".wav",
+ "") # we remove the wav suffix name
+ audio_duration = item['feat_shape'][0]
+ wav_file = item['feat']
+ label = audio_id.split('-')[
+ 0] # speaker name in speaker verification domain
+ waveform, sr = load_audio(wav_file)
+ if split_chunks:
+ uniq_chunks_list = get_chunks(config.chunk_duration, audio_id,
+ audio_duration)
+ for chunk in uniq_chunks_list:
+ s, e = chunk.split("_")[-2:] # Timestamps of start and end
+ start_sample = int(float(s) * sr)
+ end_sample = int(float(e) * sr)
+ # id, duration, wav, start, stop, label
+                # in vector tasks, the label is the speaker id
+ csv_lines.append([
+ chunk, audio_duration, wav_file, start_sample, end_sample,
+ label
+ ])
+ else:
+ csv_lines.append([
+ audio_id, audio_duration, wav_file, 0, waveform.shape[0], label
+ ])
+
+ with open(output_file, mode="w") as csv_f:
+ csv_writer = csv.writer(
+ csv_f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
+ csv_writer.writerow(header)
+ for line in csv_lines:
+ csv_writer.writerow(line)
+
+
+def get_enroll_test_list(dataset_list, verification_file):
+    """Get the enroll and test utterance lists from the voxceleb1 test utterance dataset.
+    Generally, we get the enroll and test utterances from the verification file.
+    The verification file format is as follows:
+    target/nontarget enroll-utt test-utt,
+    where 0 marks a nontarget pair and 1 a target pair, e.g.:
+ 0 a.wav b.wav
+ 1 a.wav a.wav
+
+ Args:
+ dataset_list (list): all the dataset to get the test utterances
+ verification_file (str): voxceleb1 trial file
+ """
+ logger.info(f"verification file: {verification_file}")
+ enroll_audios = set()
+ test_audios = set()
+ with open(verification_file, 'r') as f:
+ for line in f:
+ _, enroll_file, test_file = line.strip().split(' ')
+ enroll_audios.add('-'.join(enroll_file.split('/')))
+ test_audios.add('-'.join(test_file.split('/')))
+
+ enroll_files = []
+ test_files = []
+ for dataset in dataset_list:
+ with open(dataset, 'r') as f:
+ for line in f:
+                # audio_id may appear in both the enroll and test sets
+                # e.g.: 1 a.wav a.wav
+                # here a.wav serves as an enroll and a test file at the same time
+ audio_id = json.loads(line.strip())['utt']
+ if audio_id in enroll_audios:
+ enroll_files.append(line)
+ if audio_id in test_audios:
+ test_files.append(line)
+
+ enroll_files = sorted(enroll_files)
+ test_files = sorted(test_files)
+
+ return enroll_files, test_files
+
+
+def get_train_dev_list(dataset_list, target_dir, split_ratio):
+    """Get the train and dev utterance lists from the whole training utterance dataset.
+    Generally, we use split_ratio as the train dataset ratio,
+    and the remaining utterances (ratio 1 - split_ratio) form the dev dataset
+
+ Args:
+ dataset_list (list): all the dataset to get the all utterances
+ target_dir (str): the target train and dev directory,
+ we will create the csv directory to store the {train,dev}.csv file
+ split_ratio (float): train dataset ratio in all utterance list
+ """
+ logger.info("start to get train and dev utt list")
+ if not os.path.exists(os.path.join(target_dir, "meta")):
+ os.makedirs(os.path.join(target_dir, "meta"))
+
+ audio_files = []
+ speakers = set()
+ for dataset in dataset_list:
+ with open(dataset, 'r') as f:
+ for line in f:
+ # the label is speaker name
+ label_name = json.loads(line.strip())['utt2spk']
+ speakers.add(label_name)
+ audio_files.append(line.strip())
+ speakers = sorted(speakers)
+ logger.info(f"we get {len(speakers)} speakers from all the train dataset")
+
+ with open(os.path.join(target_dir, "meta", "label2id.txt"), 'w') as f:
+ for label_id, label_name in enumerate(speakers):
+ f.write(f'{label_name} {label_id}\n')
+ logger.info(
+ f'we store the speakers to {os.path.join(target_dir, "meta", "label2id.txt")}'
+ )
+
+ # the split_ratio is for train dataset
+ # the remaining is for dev dataset
+ split_idx = int(split_ratio * len(audio_files))
+    # sort first so that shuffling with a fixed seed is reproducible
+    audio_files = sorted(audio_files)
+    random.shuffle(audio_files)
+ train_files, dev_files = audio_files[:split_idx], audio_files[split_idx:]
+ logger.info(
+ f"we get train utterances: {len(train_files)}, dev utterance: {len(dev_files)}"
+ )
+ return train_files, dev_files
+
+
+def prepare_data(args, config):
+ """Convert the jsonline format to csv format
+
+ Args:
+ args (argparse.Namespace): scripts args
+ config (CfgNode): yaml configuration content
+ """
+ # stage0: set the random seed
+ random.seed(config.seed)
+
+    # if the external config sets the skip_prep flag, we do nothing
+ if config.skip_prep:
+ return
+
+ # stage 1: prepare the enroll and test csv file
+ # And we generate the speaker to label file label2id.txt
+    logger.info("start to prepare the enroll and test csv files")
+ enroll_files, test_files = get_enroll_test_list(
+ [args.test], verification_file=config.verification_file)
+ prepare_csv(
+ enroll_files,
+ os.path.join(args.target_dir, "csv", "enroll.csv"),
+ config,
+ split_chunks=False)
+ prepare_csv(
+ test_files,
+ os.path.join(args.target_dir, "csv", "test.csv"),
+ config,
+ split_chunks=False)
+
+ # stage 2: prepare the train and dev csv file
+ # we get the train dataset ratio as config.split_ratio
+ # and the remaining is dev dataset
+    logger.info("start to prepare the train and dev csv files")
+ train_files, dev_files = get_train_dev_list(
+ args.train, target_dir=args.target_dir, split_ratio=config.split_ratio)
+ prepare_csv(train_files,
+ os.path.join(args.target_dir, "csv", "train.csv"), config)
+ prepare_csv(dev_files,
+ os.path.join(args.target_dir, "csv", "dev.csv"), config)
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument(
+ "--train",
+ required=True,
+ nargs='+',
+ help="The jsonline files list for train.")
+ parser.add_argument(
+ "--test", required=True, help="The jsonline file for test")
+ parser.add_argument(
+ "--target_dir",
+ default=None,
+ required=True,
+        help="The target directory that stores the csv files and meta file.")
+ parser.add_argument(
+ "--config",
+ default=None,
+ required=True,
+ type=str,
+ help="configuration file")
+ args = parser.parse_args()
+
+ # parse the yaml config file
+ config = CfgNode(new_allowed=True)
+ if args.config:
+ config.merge_from_file(args.config)
+
+ # prepare the csv file from jsonlines files
+ prepare_data(args, config)
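
Every csv written by `prepare_csv` shares the header `utt_id, duration, wav, start, stop, label`, so downstream code can consume rows like this (file path illustrative):

```python
import csv

with open("data/vox/csv/train.csv") as f:
    for row in csv.DictReader(f):
        # start/stop are sample indices into the original wav file;
        # label is the speaker name, mapped to an integer id via meta/label2id.txt
        start, stop = int(row["start"]), int(row["stop"])
        print(row["utt_id"], row["wav"], stop - start, row["label"])
        break  # just show the first segment
```
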
diff --git a/examples/voxceleb/sv0/run.sh b/examples/voxceleb/sv0/run.sh
index bbc9e3dbb..e1dccf2ae 100755
--- a/examples/voxceleb/sv0/run.sh
+++ b/examples/voxceleb/sv0/run.sh
@@ -18,24 +18,22 @@ set -e
#######################################################################
# stage 0: data prepare, including voxceleb1 download and generate {train,dev,enroll,test}.csv
-# voxceleb2 data is m4a format, so we need user to convert the m4a to wav yourselves as described in Readme.md with the script local/convert.sh
+# voxceleb2 data is in m4a format, so we need to convert the m4a files to wav with the script local/convert.sh
# stage 1: train the speaker identification model
# stage 2: test speaker identification
-# stage 3: extract the training embeding to train the LDA and PLDA
+# stage 3: (todo) extract the training embedding to train the LDA and PLDA
######################################################################
-# we can set the variable PPAUDIO_HOME to specifiy the root directory of the downloaded vox1 and vox2 dataset
-# default the dataset will be stored in the ~/.paddleaudio/
# the vox2 dataset is stored in m4a format, we need to convert the audio from m4a to wav yourself
-# and put all of them to ${PPAUDIO_HOME}/datasets/vox2
-# we will find the wav from ${PPAUDIO_HOME}/datasets/vox1/wav and ${PPAUDIO_HOME}/datasets/vox2/wav
-# export PPAUDIO_HOME=
+# and put all of them in ${MAIN_ROOT}/datasets/vox2
+# we will look for the wavs in ${MAIN_ROOT}/datasets/vox1/{dev,test}/wav and ${MAIN_ROOT}/datasets/vox2/wav
+
stage=0
stop_stage=50
# data directory
# if we set the variable ${dir}, we will store the wav info to this directory
-# otherwise, we will store the wav info to vox1 and vox2 directory respectively
+# otherwise, we will store the wav info in the data/vox1 and data/vox2 directories respectively
# vox2 wav path, we must convert the m4a format to wav format
dir=data/ # data info directory
@@ -64,6 +62,6 @@ if [ $stage -le 2 ] && [ ${stop_stage} -ge 2 ]; then
fi
# if [ $stage -le 3 ]; then
-# # stage 2: extract the training embeding to train the LDA and PLDA
+# # stage 3: extract the training embedding to train the LDA and PLDA
# # todo: extract the training embedding
# fi
diff --git a/paddleaudio/paddleaudio/datasets/voxceleb.py b/paddleaudio/paddleaudio/datasets/voxceleb.py
index 3f72b5f2e..07f44e0c1 100644
--- a/paddleaudio/paddleaudio/datasets/voxceleb.py
+++ b/paddleaudio/paddleaudio/datasets/voxceleb.py
@@ -261,7 +261,7 @@ class VoxCeleb(Dataset):
output_file: str,
split_chunks: bool=True):
print(f'Generating csv: {output_file}')
- header = ["ID", "duration", "wav", "start", "stop", "spk_id"]
+ header = ["id", "duration", "wav", "start", "stop", "spk_id"]
# Note: this may occurs c++ execption, but the program will execute fine
# so we can ignore the execption
with Pool(cpu_count()) as p:
diff --git a/paddleaudio/paddleaudio/metric/__init__.py b/paddleaudio/paddleaudio/metric/__init__.py
index 8e5ca9f75..d2b3a1360 100644
--- a/paddleaudio/paddleaudio/metric/__init__.py
+++ b/paddleaudio/paddleaudio/metric/__init__.py
@@ -14,4 +14,3 @@
from .dtw import dtw_distance
from .eer import compute_eer
from .eer import compute_minDCF
-from .mcd import mcd_distance
diff --git a/paddleaudio/paddleaudio/metric/mcd.py b/paddleaudio/paddleaudio/metric/mcd.py
deleted file mode 100644
index 63a25fc23..000000000
--- a/paddleaudio/paddleaudio/metric/mcd.py
+++ /dev/null
@@ -1,63 +0,0 @@
-# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-from typing import Callable
-
-import mcd.metrics_fast as mt
-import numpy as np
-from mcd import dtw
-
-__all__ = [
- 'mcd_distance',
-]
-
-
-def mcd_distance(xs: np.ndarray,
- ys: np.ndarray,
- cost_fn: Callable=mt.logSpecDbDist) -> float:
- """Mel cepstral distortion (MCD), dtw distance.
-
- Dynamic Time Warping.
- Uses dynamic programming to compute:
-
- Examples:
- .. code-block:: python
-
- wps[i, j] = cost_fn(xs[i], ys[j]) + min(
- wps[i-1, j ], // vertical / insertion / expansion
- wps[i , j-1], // horizontal / deletion / compression
- wps[i-1, j-1]) // diagonal / match
-
- dtw = sqrt(wps[-1, -1])
-
- Cost Function:
- Examples:
- .. code-block:: python
-
- logSpecDbConst = 10.0 / math.log(10.0) * math.sqrt(2.0)
-
- def logSpecDbDist(x, y):
- diff = x - y
- return logSpecDbConst * math.sqrt(np.inner(diff, diff))
-
- Args:
- xs (np.ndarray): ref sequence, [T,D]
- ys (np.ndarray): hyp sequence, [T,D]
- cost_fn (Callable, optional): Cost function. Defaults to mt.logSpecDbDist.
-
- Returns:
- float: dtw distance
- """
-
- min_cost, path = dtw.dtw(xs, ys, cost_fn)
- return min_cost
diff --git a/paddleaudio/paddleaudio/utils/numeric.py b/paddleaudio/paddleaudio/utils/numeric.py
new file mode 100644
index 000000000..126cada50
--- /dev/null
+++ b/paddleaudio/paddleaudio/utils/numeric.py
@@ -0,0 +1,30 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import numpy as np
+
+
+def pcm16to32(audio: np.ndarray) -> np.ndarray:
+ """pcm int16 to float32
+
+ Args:
+ audio (np.ndarray): Waveform with dtype of int16.
+
+ Returns:
+ np.ndarray: Waveform with dtype of float32.
+ """
+ if audio.dtype == np.int16:
+ audio = audio.astype("float32")
+ bits = np.iinfo(np.int16).bits
+ audio = audio / (2**(bits - 1))
+ return audio
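
A quick usage check for `pcm16to32` (the import path is assumed from the file location): int16 full scale maps to roughly ±1.0 in float32.

```python
import numpy as np

from paddleaudio.utils.numeric import pcm16to32  # import path assumed

pcm = np.array([0, 16384, -32768, 32767], dtype=np.int16)
wav = pcm16to32(pcm)
print(wav.dtype)  # float32
print(wav)        # [ 0.         0.5       -1.         0.9999695]
```
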
diff --git a/paddleaudio/setup.py b/paddleaudio/setup.py
index e08b88a3b..c92e5c73f 100644
--- a/paddleaudio/setup.py
+++ b/paddleaudio/setup.py
@@ -19,7 +19,7 @@ from setuptools.command.install import install
from setuptools.command.test import test
# set the version here
-VERSION = '0.2.0'
+VERSION = '0.2.1'
# Inspired by the example at https://pytest.org/latest/goodpractises.html
@@ -83,9 +83,8 @@ setuptools.setup(
python_requires='>=3.6',
install_requires=[
'numpy >= 1.15.0', 'scipy >= 1.0.0', 'resampy >= 0.2.2',
- 'soundfile >= 0.9.0', 'colorlog', 'dtaidistance == 2.3.1', 'mcd >= 0.4',
- 'pathos'
- ],
+ 'soundfile >= 0.9.0', 'colorlog', 'dtaidistance == 2.3.1', 'pathos'
+ ],
extras_require={
'test': [
'nose', 'librosa==0.8.1', 'soundfile==0.10.3.post1',
diff --git a/paddlespeech/cli/asr/infer.py b/paddlespeech/cli/asr/infer.py
index 1fb4be434..b12b9f6fc 100644
--- a/paddlespeech/cli/asr/infer.py
+++ b/paddlespeech/cli/asr/infer.py
@@ -80,9 +80,9 @@ pretrained_models = {
},
"deepspeech2online_aishell-zh-16k": {
'url':
- 'https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.1.1.model.tar.gz',
+ 'https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz',
'md5':
- 'd5e076217cf60486519f72c217d21b9b',
+ '23e16c69730a1cb5d735c98c83c21e16',
'cfg_path':
'model.yaml',
'ckpt_path':
@@ -426,6 +426,11 @@ class ASRExecutor(BaseExecutor):
try:
audio, audio_sample_rate = soundfile.read(
audio_file, dtype="int16", always_2d=True)
+ audio_duration = audio.shape[0] / audio_sample_rate
+ max_duration = 50.0
+ if audio_duration >= max_duration:
+                logger.error("Please input an audio file shorter than 50 seconds.\n")
+ return
except Exception as e:
logger.exception(e)
logger.error(
diff --git a/paddlespeech/cli/vector/infer.py b/paddlespeech/cli/vector/infer.py
index 175a9723e..68e832ac7 100644
--- a/paddlespeech/cli/vector/infer.py
+++ b/paddlespeech/cli/vector/infer.py
@@ -15,6 +15,7 @@ import argparse
import os
import sys
from collections import OrderedDict
+from typing import Dict
from typing import List
from typing import Optional
from typing import Union
@@ -42,9 +43,9 @@ pretrained_models = {
# "paddlespeech vector --task spk --model ecapatdnn_voxceleb12-16k --sr 16000 --input ./input.wav"
"ecapatdnn_voxceleb12-16k": {
'url':
- 'https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_1.tar.gz',
+ 'https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz',
'md5':
- 'a1c0dba7d4de997187786ff517d5b4ec',
+ 'cc33023c54ab346cd318408f43fcaf95',
'cfg_path':
'conf/model.yaml', # the yaml config path
'ckpt_path':
@@ -79,7 +80,7 @@ class VectorExecutor(BaseExecutor):
"--task",
type=str,
default="spk",
- choices=["spk"],
+ choices=["spk", "score"],
help="task type in vector domain")
self.parser.add_argument(
"--input",
@@ -147,13 +148,40 @@ class VectorExecutor(BaseExecutor):
logger.info(f"task source: {task_source}")
# stage 3: process the audio one by one
+        # we act according to the task type
task_result = OrderedDict()
has_exceptions = False
for id_, input_ in task_source.items():
try:
- res = self(input_, model, sample_rate, config, ckpt_path,
- device)
- task_result[id_] = res
+ # extract the speaker audio embedding
+ if parser_args.task == "spk":
+ logger.info("do vector spk task")
+ res = self(input_, model, sample_rate, config, ckpt_path,
+ device)
+ task_result[id_] = res
+ elif parser_args.task == "score":
+ logger.info("do vector score task")
+ logger.info(f"input content {input_}")
+ if len(input_.split()) != 2:
+ logger.error(
+                            f"vector score task input {input_} wav num is not two, "
+                            f"that is {len(input_.split())}")
+ sys.exit(-1)
+
+ # get the enroll and test embedding
+ enroll_audio, test_audio = input_.split()
+ logger.info(
+ f"score task, enroll audio: {enroll_audio}, test audio: {test_audio}"
+ )
+ enroll_embedding = self(enroll_audio, model, sample_rate,
+ config, ckpt_path, device)
+ test_embedding = self(test_audio, model, sample_rate,
+ config, ckpt_path, device)
+
+ # get the score
+ res = self.get_embeddings_score(enroll_embedding,
+ test_embedding)
+ task_result[id_] = res
except Exception as e:
has_exceptions = True
task_result[id_] = f'{e.__class__.__name__}: {e}'
@@ -172,6 +200,49 @@ class VectorExecutor(BaseExecutor):
else:
return True
+ def _get_job_contents(
+ self, job_input: os.PathLike) -> Dict[str, Union[str, os.PathLike]]:
+ """
+ Read a job input file and return its contents in a dictionary.
+ Refactor from the Executor._get_job_contents
+
+ Args:
+ job_input (os.PathLike): The job input file.
+
+ Returns:
+ Dict[str, str]: Contents of job input.
+ """
+ job_contents = OrderedDict()
+ with open(job_input) as f:
+ for line in f:
+ line = line.strip()
+ if not line:
+ continue
+ k = line.split(' ')[0]
+ v = ' '.join(line.split(' ')[1:])
+ job_contents[k] = v
+ return job_contents
+
+ def get_embeddings_score(self, enroll_embedding, test_embedding):
+        """Compute the score between the enroll embedding and the test embedding
+
+ Args:
+ enroll_embedding (numpy.array): shape: (emb_size), enroll audio embedding
+ test_embedding (numpy.array): shape: (emb_size), test audio embedding
+
+ Returns:
+ score: the score between enroll embedding and test embedding
+ """
+ if not hasattr(self, "score_func"):
+ self.score_func = paddle.nn.CosineSimilarity(axis=0)
+            logger.info("create the cosine score function")
+
+ score = self.score_func(
+ paddle.to_tensor(enroll_embedding),
+ paddle.to_tensor(test_embedding))
+
+ return score.item()
+
@stats_wrapper
def __call__(self,
audio_file: os.PathLike,
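
For the new `score` task, each input line must contain exactly two space-separated wav paths (enroll first, then test), and `get_embeddings_score` reduces to a cosine similarity between the two embedding vectors. An equivalent numpy sketch, with random 192-dim vectors standing in for real embeddings (`lin_neurons: 192`):

```python
import numpy as np


def cosine_score(enroll_embedding: np.ndarray,
                 test_embedding: np.ndarray) -> float:
    # the same quantity paddle.nn.CosineSimilarity(axis=0) yields on 1-D tensors
    return float(
        np.dot(enroll_embedding, test_embedding) /
        (np.linalg.norm(enroll_embedding) * np.linalg.norm(test_embedding)))


enroll = np.random.randn(192).astype("float32")
test = np.random.randn(192).astype("float32")
print(cosine_score(enroll, test))  # in [-1, 1]; higher means more similar
```
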
diff --git a/paddlespeech/server/bin/paddlespeech_server.py b/paddlespeech/server/bin/paddlespeech_server.py
index f6a7f4295..474a8b79f 100644
--- a/paddlespeech/server/bin/paddlespeech_server.py
+++ b/paddlespeech/server/bin/paddlespeech_server.py
@@ -23,8 +23,9 @@ from ..util import cli_server_register
from ..util import stats_wrapper
from paddlespeech.cli.log import logger
from paddlespeech.server.engine.engine_pool import init_engine_pool
-from paddlespeech.server.restful.api import setup_router
+from paddlespeech.server.restful.api import setup_router as setup_http_router
from paddlespeech.server.utils.config import get_config
+from paddlespeech.server.ws.api import setup_router as setup_ws_router
__all__ = ['ServerExecutor', 'ServerStatsExecutor']
@@ -63,7 +64,12 @@ class ServerExecutor(BaseExecutor):
"""
# init api
api_list = list(engine.split("_")[0] for engine in config.engine_list)
- api_router = setup_router(api_list)
+ if config.protocol == "websocket":
+ api_router = setup_ws_router(api_list)
+ elif config.protocol == "http":
+ api_router = setup_http_router(api_list)
+ else:
+ raise Exception("unsupported protocol")
app.include_router(api_router)
if not init_engine_pool(config):
diff --git a/paddlespeech/server/conf/tts_online_application.yaml b/paddlespeech/server/conf/tts_online_application.yaml
new file mode 100644
index 000000000..a80b3ecec
--- /dev/null
+++ b/paddlespeech/server/conf/tts_online_application.yaml
@@ -0,0 +1,46 @@
+# This is the parameter configuration file for PaddleSpeech Serving.
+
+#################################################################################
+# SERVER SETTING #
+#################################################################################
+host: 127.0.0.1
+port: 8092
+
+# The task format in the engine_list is: <speech task>_<engine type>
+# task choices = ['asr_online', 'tts_online']
+# protocol = ['websocket', 'http'] (only one can be selected).
+protocol: 'http'
+engine_list: ['tts_online']
+
+
+#################################################################################
+# ENGINE CONFIG #
+#################################################################################
+
+################################### TTS #########################################
+################### speech task: tts; engine_type: online #######################
+tts_online:
+ # am (acoustic model) choices=['fastspeech2_csmsc']
+ am: 'fastspeech2_csmsc'
+ am_config:
+ am_ckpt:
+ am_stat:
+ phones_dict:
+ tones_dict:
+ speaker_dict:
+ spk_id: 0
+
+ # voc (vocoder) choices=['mb_melgan_csmsc']
+ voc: 'mb_melgan_csmsc'
+ voc_config:
+ voc_ckpt:
+ voc_stat:
+
+ # others
+ lang: 'zh'
+ device: # set 'gpu:id' or 'cpu'
+ am_block: 42
+ am_pad: 12
+ voc_block: 14
+ voc_pad: 14
+
diff --git a/paddlespeech/server/engine/asr/online/asr_engine.py b/paddlespeech/server/engine/asr/online/asr_engine.py
index 389175a0a..1f356a3c6 100644
--- a/paddlespeech/server/engine/asr/online/asr_engine.py
+++ b/paddlespeech/server/engine/asr/online/asr_engine.py
@@ -27,6 +27,7 @@ from paddlespeech.s2t.frontend.speech import SpeechSegment
from paddlespeech.s2t.modules.ctc import CTCDecoder
from paddlespeech.s2t.utils.utility import UpdateConfig
from paddlespeech.server.engine.base_engine import BaseEngine
+from paddlespeech.server.utils.audio_process import pcm2float
from paddlespeech.server.utils.paddle_predictor import init_predictor
__all__ = ['ASREngine']
@@ -36,7 +37,7 @@ pretrained_models = {
'url':
'https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.1.1.model.tar.gz',
'md5':
- 'd5e076217cf60486519f72c217d21b9b',
+ '23e16c69730a1cb5d735c98c83c21e16',
'cfg_path':
'model.yaml',
'ckpt_path':
@@ -222,21 +223,6 @@ class ASRServerExecutor(ASRExecutor):
else:
raise Exception("invalid model name")
- def _pcm16to32(self, audio):
- """pcm int16 to float32
-
- Args:
- audio(numpy.array): numpy.int16
-
- Returns:
- audio(numpy.array): numpy.float32
- """
- if audio.dtype == np.int16:
- audio = audio.astype("float32")
- bits = np.iinfo(np.int16).bits
- audio = audio / (2**(bits - 1))
- return audio
-
def extract_feat(self, samples, sample_rate):
"""extract feat
@@ -249,7 +235,7 @@ class ASRServerExecutor(ASRExecutor):
x_chunk_lens (numpy.array): shape[B]
"""
# pcm16 -> pcm 32
- samples = self._pcm16to32(samples)
+ samples = pcm2float(samples)
# read audio
speech_segment = SpeechSegment.from_pcm(
diff --git a/paddlespeech/server/engine/engine_factory.py b/paddlespeech/server/engine/engine_factory.py
index 2a39fb79b..e147a29a6 100644
--- a/paddlespeech/server/engine/engine_factory.py
+++ b/paddlespeech/server/engine/engine_factory.py
@@ -34,6 +34,9 @@ class EngineFactory(object):
elif engine_name == 'tts' and engine_type == 'python':
from paddlespeech.server.engine.tts.python.tts_engine import TTSEngine
return TTSEngine()
+ elif engine_name == 'tts' and engine_type == 'online':
+ from paddlespeech.server.engine.tts.online.tts_engine import TTSEngine
+ return TTSEngine()
elif engine_name == 'cls' and engine_type == 'inference':
from paddlespeech.server.engine.cls.paddleinference.cls_engine import CLSEngine
return CLSEngine()
diff --git a/paddlespeech/server/engine/tts/online/__init__.py b/paddlespeech/server/engine/tts/online/__init__.py
new file mode 100644
index 000000000..97043fd7b
--- /dev/null
+++ b/paddlespeech/server/engine/tts/online/__init__.py
@@ -0,0 +1,13 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
diff --git a/paddlespeech/server/engine/tts/online/tts_engine.py b/paddlespeech/server/engine/tts/online/tts_engine.py
new file mode 100644
index 000000000..25a8bc76f
--- /dev/null
+++ b/paddlespeech/server/engine/tts/online/tts_engine.py
@@ -0,0 +1,220 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import base64
+import time
+
+import numpy as np
+import paddle
+
+from paddlespeech.cli.log import logger
+from paddlespeech.cli.tts.infer import TTSExecutor
+from paddlespeech.server.engine.base_engine import BaseEngine
+from paddlespeech.server.utils.audio_process import float2pcm
+from paddlespeech.server.utils.util import get_chunks
+
+__all__ = ['TTSEngine']
+
+
+class TTSServerExecutor(TTSExecutor):
+ def __init__(self):
+ super().__init__()
+ pass
+
+ @paddle.no_grad()
+ def infer(
+ self,
+ text: str,
+ lang: str='zh',
+ am: str='fastspeech2_csmsc',
+ spk_id: int=0,
+ am_block: int=42,
+ am_pad: int=12,
+ voc_block: int=14,
+ voc_pad: int=14, ):
+ """
+        Streaming model inference; audio chunks are yielded as they are generated.
+ """
+ am_name = am[:am.rindex('_')]
+ am_dataset = am[am.rindex('_') + 1:]
+ get_tone_ids = False
+ merge_sentences = False
+ frontend_st = time.time()
+ if lang == 'zh':
+ input_ids = self.frontend.get_input_ids(
+ text,
+ merge_sentences=merge_sentences,
+ get_tone_ids=get_tone_ids)
+ phone_ids = input_ids["phone_ids"]
+ if get_tone_ids:
+ tone_ids = input_ids["tone_ids"]
+ elif lang == 'en':
+ input_ids = self.frontend.get_input_ids(
+ text, merge_sentences=merge_sentences)
+ phone_ids = input_ids["phone_ids"]
+ else:
+            print("lang should be in {'zh', 'en'}!")
+ self.frontend_time = time.time() - frontend_st
+
+ for i in range(len(phone_ids)):
+ am_st = time.time()
+ part_phone_ids = phone_ids[i]
+ # am
+ if am_name == 'speedyspeech':
+ part_tone_ids = tone_ids[i]
+ mel = self.am_inference(part_phone_ids, part_tone_ids)
+ # fastspeech2
+ else:
+ # multi speaker
+ if am_dataset in {"aishell3", "vctk"}:
+ mel = self.am_inference(
+ part_phone_ids, spk_id=paddle.to_tensor(spk_id))
+ else:
+ mel = self.am_inference(part_phone_ids)
+ am_et = time.time()
+
+ # voc streaming
+ voc_upsample = self.voc_config.n_shift
+ mel_chunks = get_chunks(mel, voc_block, voc_pad, "voc")
+ chunk_num = len(mel_chunks)
+ voc_st = time.time()
+            # chunk_id indexes vocoder chunks (avoids shadowing the sentence index i)
+            for chunk_id, mel_chunk in enumerate(mel_chunks):
+                sub_wav = self.voc_inference(mel_chunk)
+                front_pad = min(chunk_id * voc_block, voc_pad)
+
+                if chunk_id == 0:
+                    sub_wav = sub_wav[:voc_block * voc_upsample]
+                elif chunk_id == chunk_num - 1:
+ sub_wav = sub_wav[front_pad * voc_upsample:]
+ else:
+ sub_wav = sub_wav[front_pad * voc_upsample:(
+ front_pad + voc_block) * voc_upsample]
+
+ yield sub_wav
+
+
+class TTSEngine(BaseEngine):
+ """TTS server engine
+
+ Args:
+ metaclass: Defaults to Singleton.
+ """
+
+ def __init__(self, name=None):
+ """Initialize TTS server engine
+ """
+ super(TTSEngine, self).__init__()
+
+ def init(self, config: dict) -> bool:
+ self.executor = TTSServerExecutor()
+ self.config = config
+ assert "fastspeech2_csmsc" in config.am and (
+ config.voc == "hifigan_csmsc-zh" or config.voc == "mb_melgan_csmsc"
+ ), 'Please check config, am support: fastspeech2, voc support: hifigan_csmsc-zh or mb_melgan_csmsc.'
+ try:
+ if self.config.device:
+ self.device = self.config.device
+ else:
+ self.device = paddle.get_device()
+ paddle.set_device(self.device)
+ except Exception as e:
+ logger.error(
+                "Failed to set the device; please check whether the device is already in use and check the 'device' parameter in the yaml file"
+ )
+ logger.error("Initialize TTS server engine Failed on device: %s." %
+ (self.device))
+ return False
+
+ try:
+ self.executor._init_from_path(
+ am=self.config.am,
+ am_config=self.config.am_config,
+ am_ckpt=self.config.am_ckpt,
+ am_stat=self.config.am_stat,
+ phones_dict=self.config.phones_dict,
+ tones_dict=self.config.tones_dict,
+ speaker_dict=self.config.speaker_dict,
+ voc=self.config.voc,
+ voc_config=self.config.voc_config,
+ voc_ckpt=self.config.voc_ckpt,
+ voc_stat=self.config.voc_stat,
+ lang=self.config.lang)
+ except Exception as e:
+ logger.error("Failed to get model related files.")
+ logger.error("Initialize TTS server engine Failed on device: %s." %
+ (self.device))
+ return False
+
+ self.am_block = self.config.am_block
+ self.am_pad = self.config.am_pad
+ self.voc_block = self.config.voc_block
+ self.voc_pad = self.config.voc_pad
+
+ logger.info("Initialize TTS server engine successfully on device: %s." %
+ (self.device))
+ return True
+
+ def preprocess(self, text_bese64: str=None, text_bytes: bytes=None):
+        # Convert bytes to text
+ if text_bese64:
+ text_bytes = base64.b64decode(text_bese64) # base64 to bytes
+ text = text_bytes.decode('utf-8') # bytes to text
+
+ return text
+
+ def run(self,
+ sentence: str,
+ spk_id: int=0,
+ speed: float=1.0,
+ volume: float=1.0,
+ sample_rate: int=0,
+ save_path: str=None):
+        """ run includes inference and postprocessing.
+
+ Args:
+ sentence (str): text to be synthesized
+ spk_id (int, optional): speaker id for multi-speaker speech synthesis. Defaults to 0.
+ speed (float, optional): speed. Defaults to 1.0.
+ volume (float, optional): volume. Defaults to 1.0.
+ sample_rate (int, optional): target sample rate for synthesized audio,
+ 0 means the same as the model sampling rate. Defaults to 0.
+ save_path (str, optional): The save path of the synthesized audio.
+ None means do not save audio. Defaults to None.
+
+ Returns:
+ wav_base64: The base64 format of the synthesized audio.
+ """
+
+ lang = self.config.lang
+ wav_list = []
+
+ for wav in self.executor.infer(
+ text=sentence,
+ lang=lang,
+ am=self.config.am,
+ spk_id=spk_id,
+ am_block=self.am_block,
+ am_pad=self.am_pad,
+ voc_block=self.voc_block,
+ voc_pad=self.voc_pad):
+ # wav type: float32, convert to pcm (base64)
+ wav = float2pcm(wav) # float32 to int16
+ wav_bytes = wav.tobytes() # to bytes
+ wav_base64 = base64.b64encode(wav_bytes).decode('utf8') # to base64
+ wav_list.append(wav)
+
+ yield wav_base64
+
+ wav_all = np.concatenate(wav_list, axis=0)
+        logger.info("The duration of the audio is: {} s".format(
+ len(wav_all) / self.executor.am_config.fs))
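
The slicing in `infer` above can be sanity-checked with plain index arithmetic: the first chunk keeps its leading `voc_block` frames, middle chunks drop their `front_pad` context, and the last chunk keeps everything after its front pad, so the kept slices tile the waveform exactly once. A pure-arithmetic sketch (assumes `get_chunks` pads each block with up to `voc_pad` frames of context and `n_shift` is the frame-to-sample upsample factor):

```python
import math

voc_block, voc_pad, n_shift = 14, 14, 300  # values as in the engine config
n_frames = 100                             # mel frames for one sentence

chunk_num = math.ceil(n_frames / voc_block)
kept = 0
for i in range(chunk_num):
    # each chunk is decoded with up to voc_pad frames of left/right context,
    # which the slicing above trims away again (front_pad = min(i*voc_block, voc_pad))
    if i == chunk_num - 1:
        kept += (n_frames - i * voc_block) * n_shift  # tail chunk keeps the remainder
    else:
        kept += voc_block * n_shift                   # first/middle chunks keep one block
print(kept == n_frames * n_shift)  # True: the kept slices tile the waveform once
```
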
diff --git a/paddlespeech/server/restful/tts_api.py b/paddlespeech/server/restful/tts_api.py
index 4e9bbe23e..d1268428a 100644
--- a/paddlespeech/server/restful/tts_api.py
+++ b/paddlespeech/server/restful/tts_api.py
@@ -15,6 +15,7 @@ import traceback
from typing import Union
from fastapi import APIRouter
+from fastapi.responses import StreamingResponse
from paddlespeech.cli.log import logger
from paddlespeech.server.engine.engine_pool import get_engine_pool
@@ -125,3 +126,14 @@ def tts(request_body: TTSRequest):
traceback.print_exc()
return response
+
+
+@router.post("/paddlespeech/streaming/tts")
+async def stream_tts(request_body: TTSRequest):
+ text = request_body.text
+
+ engine_pool = get_engine_pool()
+ tts_engine = engine_pool['tts']
+ logger.info("Get tts engine successfully.")
+
+ return StreamingResponse(tts_engine.run(sentence=text))
diff --git a/paddlespeech/server/tests/tts/test_client.py b/paddlespeech/server/tests/tts/offline/http_client.py
similarity index 90%
rename from paddlespeech/server/tests/tts/test_client.py
rename to paddlespeech/server/tests/tts/offline/http_client.py
index e42c9bcfa..1bdee4c18 100644
--- a/paddlespeech/server/tests/tts/test_client.py
+++ b/paddlespeech/server/tests/tts/offline/http_client.py
@@ -33,7 +33,8 @@ def tts_client(args):
text: A sentence to be synthesized
outfile: Synthetic audio file
"""
- url = 'http://127.0.0.1:8090/paddlespeech/tts'
+ url = "http://" + str(args.server) + ":" + str(
+ args.port) + "/paddlespeech/tts"
request = {
"text": args.text,
"spk_id": args.spk_id,
@@ -72,7 +73,7 @@ if __name__ == "__main__":
parser.add_argument(
'--text',
type=str,
- default="你好,欢迎使用语音合成服务",
+ default="您好,欢迎使用语音合成服务。",
help='A sentence to be synthesized')
parser.add_argument('--spk_id', type=int, default=0, help='Speaker id')
parser.add_argument('--speed', type=float, default=1.0, help='Audio speed')
@@ -88,6 +89,9 @@ if __name__ == "__main__":
type=str,
default="./out.wav",
help='Synthesized audio file')
+ parser.add_argument(
+ "--server", type=str, help="server ip", default="127.0.0.1")
+ parser.add_argument("--port", type=int, help="server port", default=8090)
args = parser.parse_args()
st = time.time()
diff --git a/paddlespeech/server/tests/tts/online/http_client.py b/paddlespeech/server/tests/tts/online/http_client.py
new file mode 100644
index 000000000..cbc1f5c02
--- /dev/null
+++ b/paddlespeech/server/tests/tts/online/http_client.py
@@ -0,0 +1,100 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import base64
+import json
+import os
+import time
+
+import requests
+
+from paddlespeech.server.utils.audio_process import pcm2wav
+
+
+def save_audio(buffer, audio_path) -> bool:
+    # rely on the audio_path argument rather than the global args
+    if audio_path.endswith("pcm"):
+        with open(audio_path, "wb") as f:
+            f.write(buffer)
+    elif audio_path.endswith("wav"):
+        with open("./tmp.pcm", "wb") as f:
+            f.write(buffer)
+        pcm2wav("./tmp.pcm", audio_path, channels=1, bits=16, sample_rate=24000)
+        os.remove("./tmp.pcm")
+    else:
+        print("Only pcm and wav audio formats are supported")
+ return False
+
+ return True
+
+
+def test(args):
+ params = {
+ "text": args.text,
+ "spk_id": args.spk_id,
+ "speed": args.speed,
+ "volume": args.volume,
+ "sample_rate": args.sample_rate,
+ "save_path": ''
+ }
+
+ buffer = b''
+ flag = 1
+ url = "http://" + str(args.server) + ":" + str(
+ args.port) + "/paddlespeech/streaming/tts"
+ st = time.time()
+    response = requests.post(url, json.dumps(params), stream=True)
+    for chunk in response.iter_content(chunk_size=1024):
+ chunk = base64.b64decode(chunk) # bytes
+ if flag:
+ first_response = time.time() - st
+            print(f"First-chunk latency: {first_response} s")
+ flag = 0
+ buffer += chunk
+
+ final_response = time.time() - st
+    duration = len(buffer) / 2.0 / 24000  # 16-bit mono pcm at 24 kHz
+
+    print(f"Final-chunk latency: {final_response} s")
+    print(f"Audio duration: {duration} s")
+ print(f"RTF: {final_response / duration}")
+
+ if args.save_path is not None:
+ if save_audio(buffer, args.save_path):
+            print("Audio saved to:", args.save_path)
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ '--text',
+ type=str,
+ default="您好,欢迎使用语音合成服务。",
+ help='A sentence to be synthesized')
+ parser.add_argument('--spk_id', type=int, default=0, help='Speaker id')
+ parser.add_argument('--speed', type=float, default=1.0, help='Audio speed')
+ parser.add_argument(
+ '--volume', type=float, default=1.0, help='Audio volume')
+ parser.add_argument(
+ '--sample_rate',
+ type=int,
+ default=0,
+ help='Sampling rate, the default is the same as the model')
+ parser.add_argument(
+ "--server", type=str, help="server ip", default="127.0.0.1")
+ parser.add_argument("--port", type=int, help="server port", default=8092)
+ parser.add_argument(
+ "--save_path", type=str, help="save audio path", default=None)
+
+ args = parser.parse_args()
+ test(args)
diff --git a/paddlespeech/server/tests/tts/online/http_client_playaudio.py b/paddlespeech/server/tests/tts/online/http_client_playaudio.py
new file mode 100644
index 000000000..1e7e8064e
--- /dev/null
+++ b/paddlespeech/server/tests/tts/online/http_client_playaudio.py
@@ -0,0 +1,112 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import base64
+import json
+import threading
+import time
+
+import pyaudio
+import requests
+
+mutex = threading.Lock()
+buffer = b''
+p = pyaudio.PyAudio()
+stream = p.open(
+ format=p.get_format_from_width(2), channels=1, rate=24000, output=True)
+max_fail = 50
+
+
+def play_audio():
+ global stream
+ global buffer
+ global max_fail
+ while True:
+        if not buffer:
+            max_fail -= 1
+            time.sleep(0.05)
+            if max_fail < 0:
+                break
+            continue  # nothing buffered yet, wait for the next chunk
+        mutex.acquire()
+        stream.write(buffer)
+        buffer = b''
+        mutex.release()
+
+
+def test(args):
+ global mutex
+ global buffer
+ params = {
+ "text": args.text,
+ "spk_id": args.spk_id,
+ "speed": args.speed,
+ "volume": args.volume,
+ "sample_rate": args.sample_rate,
+ "save_path": ''
+ }
+
+ all_bytes = 0.0
+ t = threading.Thread(target=play_audio)
+ flag = 1
+ url = "http://" + str(args.server) + ":" + str(
+ args.port) + "/paddlespeech/streaming/tts"
+ st = time.time()
+    response = requests.post(url, json.dumps(params), stream=True)
+    for chunk in response.iter_content(chunk_size=1024):
+ mutex.acquire()
+ chunk = base64.b64decode(chunk) # bytes
+ buffer += chunk
+ mutex.release()
+ if flag:
+ first_response = time.time() - st
+            print(f"First-chunk latency: {first_response} s")
+ flag = 0
+ t.start()
+ all_bytes += len(chunk)
+
+ final_response = time.time() - st
+ duration = all_bytes / 2 / 24000
+
+    print(f"Final-chunk latency: {final_response} s")
+    print(f"Audio duration: {duration} s")
+ print(f"RTF: {final_response / duration}")
+
+ t.join()
+ stream.stop_stream()
+ stream.close()
+ p.terminate()
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ '--text',
+ type=str,
+ default="您好,欢迎使用语音合成服务。",
+ help='A sentence to be synthesized')
+ parser.add_argument('--spk_id', type=int, default=0, help='Speaker id')
+ parser.add_argument('--speed', type=float, default=1.0, help='Audio speed')
+ parser.add_argument(
+ '--volume', type=float, default=1.0, help='Audio volume')
+ parser.add_argument(
+ '--sample_rate',
+ type=int,
+ default=0,
+ help='Sampling rate, the default is the same as the model')
+ parser.add_argument(
+ "--server", type=str, help="server ip", default="127.0.0.1")
+ parser.add_argument("--port", type=int, help="server port", default=8092)
+
+ args = parser.parse_args()
+ test(args)
diff --git a/paddlespeech/server/tests/tts/online/ws_client.py b/paddlespeech/server/tests/tts/online/ws_client.py
new file mode 100644
index 000000000..eef010cf2
--- /dev/null
+++ b/paddlespeech/server/tests/tts/online/ws_client.py
@@ -0,0 +1,126 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import _thread as thread
+import argparse
+import base64
+import json
+import ssl
+import time
+
+import websocket
+
+flag = 1
+st = 0.0
+all_bytes = b''
+
+
+class WsParam(object):
+    # initialization
+ def __init__(self, text, server="127.0.0.1", port=8090):
+ self.server = server
+ self.port = port
+ self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts"
+ self.text = text
+
+    # generate the url
+ def create_url(self):
+ return self.url
+
+
+def on_message(ws, message):
+ global flag
+ global st
+ global all_bytes
+
+ try:
+ message = json.loads(message)
+ audio = message["audio"]
+ audio = base64.b64decode(audio) # bytes
+ status = message["status"]
+ all_bytes += audio
+
+ if status == 0:
+            print("created successfully.")
+ elif status == 1:
+ if flag:
+                print(f"First-chunk latency: {time.time() - st} s")
+ flag = 0
+ elif status == 2:
+ final_response = time.time() - st
+ duration = len(all_bytes) / 2.0 / 24000
+            print(f"Final-chunk latency: {final_response} s")
+            print(f"Audio duration: {duration} s")
+ print(f"RTF: {final_response / duration}")
+ with open("./out.pcm", "wb") as f:
+ f.write(all_bytes)
+ print("ws is closed")
+ ws.close()
+ else:
+ print("infer error")
+
+ except Exception as e:
+        print("received a message, but hit a parse exception:", e)
+
+
+# handle a websocket error
+def on_error(ws, error):
+ print("### error:", error)
+
+
+# handle the websocket close event
+def on_close(ws):
+ print("### closed ###")
+
+
+# handle the websocket connection being established
+def on_open(ws):
+ def run(*args):
+ global st
+ text_base64 = str(
+ base64.b64encode((wsParam.text).encode('utf-8')), "UTF8")
+ d = {"text": text_base64}
+ d = json.dumps(d)
+ print("Start sending text data")
+ st = time.time()
+ ws.send(d)
+
+ thread.start_new_thread(run, ())
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--text",
+ type=str,
+ help="A sentence to be synthesized",
+ default="您好,欢迎使用语音合成服务。")
+ parser.add_argument(
+ "--server", type=str, help="server ip", default="127.0.0.1")
+ parser.add_argument("--port", type=int, help="server port", default=8092)
+ args = parser.parse_args()
+
+ print("***************************************")
+ print("Server ip: ", args.server)
+ print("Server port: ", args.port)
+ print("Sentence to be synthesized: ", args.text)
+ print("***************************************")
+
+ wsParam = WsParam(text=args.text, server=args.server, port=args.port)
+
+ websocket.enableTrace(False)
+ wsUrl = wsParam.create_url()
+ ws = websocket.WebSocketApp(
+ wsUrl, on_message=on_message, on_error=on_error, on_close=on_close)
+ ws.on_open = on_open
+ ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})
diff --git a/paddlespeech/server/tests/tts/online/ws_client_playaudio.py b/paddlespeech/server/tests/tts/online/ws_client_playaudio.py
new file mode 100644
index 000000000..cdeb362df
--- /dev/null
+++ b/paddlespeech/server/tests/tts/online/ws_client_playaudio.py
@@ -0,0 +1,160 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import _thread as thread
+import argparse
+import base64
+import json
+import ssl
+import threading
+import time
+
+import pyaudio
+import websocket
+
+mutex = threading.Lock()
+buffer = b''
+p = pyaudio.PyAudio()
+stream = p.open(
+ format=p.get_format_from_width(2), channels=1, rate=24000, output=True)
+flag = 1
+st = 0.0
+all_bytes = 0.0
+
+
+class WsParam(object):
+    # initialization
+ def __init__(self, text, server="127.0.0.1", port=8090):
+ self.server = server
+ self.port = port
+ self.url = "ws://" + self.server + ":" + str(self.port) + "/ws/tts"
+ self.text = text
+
+    # generate the websocket url
+ def create_url(self):
+ return self.url
+
+
+def play_audio():
+ global stream
+ global buffer
+ while True:
+ time.sleep(0.05)
+        if not buffer:  # buffer is empty
+ break
+ mutex.acquire()
+ stream.write(buffer)
+ buffer = b''
+ mutex.release()
+
+
+t = threading.Thread(target=play_audio)
+
+
+def on_message(ws, message):
+ global flag
+ global t
+ global buffer
+ global st
+ global all_bytes
+
+ try:
+ message = json.loads(message)
+ audio = message["audio"]
+ audio = base64.b64decode(audio) # bytes
+ status = message["status"]
+ all_bytes += len(audio)
+
+ if status == 0:
+ print("create successfully.")
+ elif status == 1:
+ mutex.acquire()
+ buffer += audio
+ mutex.release()
+ if flag:
+ print(f"首包响应:{time.time() - st} s")
+ flag = 0
+ print("Start playing audio")
+ t.start()
+ elif status == 2:
+ final_response = time.time() - st
+ duration = all_bytes / 2 / 24000
+ print(f"尾包响应:{final_response} s")
+ print(f"音频时长:{duration} s")
+ print(f"RTF: {final_response / duration}")
+ print("ws is closed")
+ ws.close()
+ else:
+ print("infer error")
+
+ except Exception as e:
+ print("receive msg,but parse exception:", e)
+
+
+# handle websocket errors
+def on_error(ws, error):
+ print("### error:", error)
+
+
+# handle websocket close events
+def on_close(ws):
+ print("### closed ###")
+
+
+# handle websocket connection establishment
+def on_open(ws):
+ def run(*args):
+ global st
+ text_base64 = str(
+ base64.b64encode((wsParam.text).encode('utf-8')), "UTF8")
+ d = {"text": text_base64}
+ d = json.dumps(d)
+ print("Start sending text data")
+ st = time.time()
+ ws.send(d)
+
+ thread.start_new_thread(run, ())
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--text",
+ type=str,
+ help="A sentence to be synthesized",
+ default="您好,欢迎使用语音合成服务。")
+ parser.add_argument(
+ "--server", type=str, help="server ip", default="127.0.0.1")
+ parser.add_argument("--port", type=int, help="server port", default=8092)
+ args = parser.parse_args()
+
+ print("***************************************")
+ print("Server ip: ", args.server)
+ print("Server port: ", args.port)
+ print("Sentence to be synthesized: ", args.text)
+ print("***************************************")
+
+ wsParam = WsParam(text=args.text, server=args.server, port=args.port)
+
+ websocket.enableTrace(False)
+ wsUrl = wsParam.create_url()
+ ws = websocket.WebSocketApp(
+ wsUrl, on_message=on_message, on_error=on_error, on_close=on_close)
+ ws.on_open = on_open
+ ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})
+
+ t.join()
+ print("End of playing audio")
+ stream.stop_stream()
+ stream.close()
+ p.terminate()
diff --git a/paddlespeech/server/utils/audio_process.py b/paddlespeech/server/utils/audio_process.py
index 3cbb495a6..e85b9a27e 100644
--- a/paddlespeech/server/utils/audio_process.py
+++ b/paddlespeech/server/utils/audio_process.py
@@ -103,3 +103,40 @@ def change_speed(sample_raw, speed_rate, sample_rate):
sample_rate_in=sample_rate).squeeze(-1).astype(np.float32).copy()
return sample_speed
+
+
+def float2pcm(sig, dtype='int16'):
+ """Convert floating point signal with a range from -1 to 1 to PCM.
+
+ Args:
+ sig (array): Input array, must have floating point type.
+ dtype (str, optional): Desired (integer) data type. Defaults to 'int16'.
+
+ Returns:
+        numpy.ndarray: Integer data, scaled and clipped to the range of the given dtype.
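+
+    Example (a minimal sketch):
+        float2pcm(np.asarray([-1.0, 0.0, 0.5]))
+        # -> array([-32768, 0, 16384], dtype=int16)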
+ """
+ sig = np.asarray(sig)
+ if sig.dtype.kind != 'f':
+ raise TypeError("'sig' must be a float array")
+ dtype = np.dtype(dtype)
+ if dtype.kind not in 'iu':
+ raise TypeError("'dtype' must be an integer type")
+
+ i = np.iinfo(dtype)
+ abs_max = 2**(i.bits - 1)
+ offset = i.min + abs_max
+ return (sig * abs_max + offset).clip(i.min, i.max).astype(dtype)
+
+
+def pcm2float(data):
+ """pcm int16 to float32
+ Args:
+ audio(numpy.array): numpy.int16
+ Returns:
+ audio(numpy.array): numpy.float32
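+
+    Example (a minimal sketch):
+        pcm2float(np.array([-32768, 16384], dtype=np.int16))
+        # -> array([-1.0, 0.5], dtype=float32)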
+ """
+ if data.dtype == np.int16:
+ data = data.astype("float32")
+ bits = np.iinfo(np.int16).bits
+ data = data / (2**(bits - 1))
+ return data
diff --git a/paddlespeech/server/utils/util.py b/paddlespeech/server/utils/util.py
index e9104fa2d..0fe70849d 100644
--- a/paddlespeech/server/utils/util.py
+++ b/paddlespeech/server/utils/util.py
@@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the
import base64
+import math
def wav2base64(wav_file: str):
@@ -31,3 +32,42 @@ def self_check():
""" self check resource
"""
return True
+
+
+def denorm(data, mean, std):
+ """stream am model need to denorm
+ """
+ return data * std + mean
+
+
+def get_chunks(data, block_size, pad_size, step):
+ """Divide data into multiple chunks
+
+ Args:
+        data (tensor): the data to be divided into chunks
+        block_size (int): the number of frames in each chunk
+        pad_size (int): the number of overlapping frames padded on each side of a chunk
+        step (str): "am" or "voc", generate chunks for the acoustic model (am) or the vocoder (voc)
+
+ Returns:
+ list: chunks list
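+
+    Example (a minimal sketch, assuming a paddle tensor input):
+        data = paddle.zeros([10, 80])
+        [c.shape[0] for c in get_chunks(data, block_size=4, pad_size=1, step="voc")]
+        # -> [5, 6, 3], covering rows [0:5], [3:9] and [7:10]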
+ """
+ if step == "am":
+ data_len = data.shape[1]
+ elif step == "voc":
+ data_len = data.shape[0]
+    else:
+        raise ValueError("step must be 'am' or 'voc' to get chunks")
+
+ chunks = []
+ n = math.ceil(data_len / block_size)
+ for i in range(n):
+ start = max(0, i * block_size - pad_size)
+ end = min((i + 1) * block_size + pad_size, data_len)
+ if step == "am":
+ chunks.append(data[:, start:end, :])
+ elif step == "voc":
+ chunks.append(data[start:end, :])
+        else:
+            raise ValueError("step must be 'am' or 'voc' to get chunks")
+ return chunks
diff --git a/paddlespeech/server/ws/api.py b/paddlespeech/server/ws/api.py
index 10664d114..313fd16f5 100644
--- a/paddlespeech/server/ws/api.py
+++ b/paddlespeech/server/ws/api.py
@@ -16,6 +16,7 @@ from typing import List
from fastapi import APIRouter
from paddlespeech.server.ws.asr_socket import router as asr_router
+from paddlespeech.server.ws.tts_socket import router as tts_router
_router = APIRouter()
@@ -31,7 +32,7 @@ def setup_router(api_list: List):
if api_name == 'asr':
_router.include_router(asr_router)
elif api_name == 'tts':
- pass
+ _router.include_router(tts_router)
else:
pass
diff --git a/paddlespeech/server/ws/tts_socket.py b/paddlespeech/server/ws/tts_socket.py
new file mode 100644
index 000000000..11458b3cf
--- /dev/null
+++ b/paddlespeech/server/ws/tts_socket.py
@@ -0,0 +1,62 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import json
+
+from fastapi import APIRouter
+from fastapi import WebSocket
+from fastapi import WebSocketDisconnect
+from starlette.websockets import WebSocketState as WebSocketState
+
+from paddlespeech.cli.log import logger
+from paddlespeech.server.engine.engine_pool import get_engine_pool
+
+router = APIRouter()
+
+
+@router.websocket('/ws/tts')
+async def websocket_endpoint(websocket: WebSocket):
+ await websocket.accept()
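+    # protocol sketch (mirrors the test clients above): the client sends a
+    # json message {"text": <base64-encoded utf-8 text>}; the server replies
+    # with {"status": 1, "audio": <base64 audio chunk>} per chunk and a final
+    # {"status": 2, "audio": ""} once the stream is complete.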
+
+ try:
+        # note: this relies on a private helper (_raise_on_disconnect) adapted from starlette.websockets
+ assert websocket.application_state == WebSocketState.CONNECTED
+ message = await websocket.receive()
+ websocket._raise_on_disconnect(message)
+
+ # get engine
+ engine_pool = get_engine_pool()
+ tts_engine = engine_pool['tts']
+
+        # parse the received message and get the base64-encoded text
+ message = json.loads(message["text"])
+ text_bese64 = message["text"]
+ sentence = tts_engine.preprocess(text_bese64=text_bese64)
+
+ # run
+ wav_generator = tts_engine.run(sentence)
+
+ while True:
+ try:
+ tts_results = next(wav_generator)
+ resp = {"status": 1, "audio": tts_results}
+ await websocket.send_json(resp)
+ logger.info("streaming audio...")
+            except StopIteration:
+ resp = {"status": 2, "audio": ''}
+ await websocket.send_json(resp)
+ logger.info("Complete the transmission of audio streams")
+ break
+
+ except WebSocketDisconnect:
+ pass
diff --git a/paddlespeech/t2s/exps/fastspeech2/preprocess.py b/paddlespeech/t2s/exps/fastspeech2/preprocess.py
index 5bda75451..db1842b2e 100644
--- a/paddlespeech/t2s/exps/fastspeech2/preprocess.py
+++ b/paddlespeech/t2s/exps/fastspeech2/preprocess.py
@@ -86,6 +86,9 @@ def process_sentence(config: Dict[str, Any],
logmel = mel_extractor.get_log_mel_fbank(wav)
# change duration according to mel_length
compare_duration_and_mel_length(sentences, utt_id, logmel)
+ # utt_id may be popped in compare_duration_and_mel_length
+ if utt_id not in sentences:
+ return None
phones = sentences[utt_id][0]
durations = sentences[utt_id][1]
num_frames = logmel.shape[0]
diff --git a/paddlespeech/t2s/exps/inference.py b/paddlespeech/t2s/exps/inference.py
index 1188ddfb1..62602a01f 100644
--- a/paddlespeech/t2s/exps/inference.py
+++ b/paddlespeech/t2s/exps/inference.py
@@ -104,7 +104,7 @@ def get_voc_output(args, voc_predictor, input):
def parse_args():
parser = argparse.ArgumentParser(
- description="Paddle Infernce with speedyspeech & parallel wavegan.")
+ description="Paddle Infernce with acoustic model & vocoder.")
# acoustic model
parser.add_argument(
'--am',
diff --git a/paddlespeech/t2s/exps/ort_predict.py b/paddlespeech/t2s/exps/ort_predict.py
new file mode 100644
index 000000000..e8d4d61c3
--- /dev/null
+++ b/paddlespeech/t2s/exps/ort_predict.py
@@ -0,0 +1,156 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+from pathlib import Path
+
+import jsonlines
+import numpy as np
+import onnxruntime as ort
+import soundfile as sf
+from timer import timer
+
+from paddlespeech.t2s.exps.syn_utils import get_test_dataset
+from paddlespeech.t2s.utils import str2bool
+
+
+def get_sess(args, field='am'):
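+    """Create an onnxruntime InferenceSession for the acoustic model ('am')
+    or the vocoder ('voc'), choosing TensorRT/CUDA/CPU providers per args."""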
+ full_name = ''
+    if field == 'am':
+        full_name = args.am
+    elif field == 'voc':
+ full_name = args.voc
+ model_dir = str(Path(args.inference_dir) / (full_name + ".onnx"))
+ sess_options = ort.SessionOptions()
+ sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
+ sess_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
+
+ if args.device == "gpu":
+ # fastspeech2/mb_melgan can't use trt now!
+ if args.use_trt:
+ providers = ['TensorrtExecutionProvider']
+ else:
+ providers = ['CUDAExecutionProvider']
+ elif args.device == "cpu":
+ providers = ['CPUExecutionProvider']
+ sess_options.intra_op_num_threads = args.cpu_threads
+ sess = ort.InferenceSession(
+ model_dir, providers=providers, sess_options=sess_options)
+ return sess
+
+
+def ort_predict(args):
+ # construct dataset for evaluation
+ with jsonlines.open(args.test_metadata, 'r') as reader:
+ test_metadata = list(reader)
+ am_name = args.am[:args.am.rindex('_')]
+ am_dataset = args.am[args.am.rindex('_') + 1:]
+ test_dataset = get_test_dataset(args, test_metadata, am_name, am_dataset)
+
+ output_dir = Path(args.output_dir)
+ output_dir.mkdir(parents=True, exist_ok=True)
+
+ fs = 24000 if am_dataset != 'ljspeech' else 22050
+
+ # am
+    am_sess = get_sess(args, field='am')
+
+ # vocoder
+    voc_sess = get_sess(args, field='voc')
+
+ # am warmup
+ for T in [27, 38, 54]:
+ data = np.random.randint(1, 266, size=(T, ))
+ am_sess.run(None, {"text": data})
+
+ # voc warmup
+ for T in [227, 308, 544]:
+ data = np.random.rand(T, 80).astype("float32")
+ voc_sess.run(None, {"logmel": data})
+ print("warm up done!")
+
+ N = 0
+ T = 0
+ for example in test_dataset:
+ utt_id = example['utt_id']
+ phone_ids = example["text"]
+ with timer() as t:
+ mel = am_sess.run(output_names=None, input_feed={'text': phone_ids})
+ mel = mel[0]
+ wav = voc_sess.run(output_names=None, input_feed={'logmel': mel})
+
+ N += len(wav[0])
+ T += t.elapse
+ speed = len(wav[0]) / t.elapse
+ rtf = fs / speed
+ sf.write(
+ str(output_dir / (utt_id + ".wav")),
+ np.array(wav)[0],
+ samplerate=fs)
+ print(
+ f"{utt_id}, mel: {mel.shape}, wave: {len(wav[0])}, time: {t.elapse}s, Hz: {speed}, RTF: {rtf}."
+ )
+ print(f"generation speed: {N / T}Hz, RTF: {fs / (N / T) }")
+
+
+def parse_args():
+ parser = argparse.ArgumentParser(description="Infernce with onnxruntime.")
+ # acoustic model
+ parser.add_argument(
+ '--am',
+ type=str,
+ default='fastspeech2_csmsc',
+ choices=[
+ 'fastspeech2_csmsc',
+ ],
+ help='Choose acoustic model type of tts task.')
+
+ # voc
+ parser.add_argument(
+ '--voc',
+ type=str,
+ default='hifigan_csmsc',
+ choices=['hifigan_csmsc', 'mb_melgan_csmsc'],
+ help='Choose vocoder type of tts task.')
+ # other
+ parser.add_argument(
+ "--inference_dir", type=str, help="dir to save inference models")
+ parser.add_argument("--test_metadata", type=str, help="test metadata.")
+ parser.add_argument("--output_dir", type=str, help="output dir")
+
+ # inference
+ parser.add_argument(
+ "--use_trt",
+ type=str2bool,
+ default=False,
+ help="Whether to use inference engin TensorRT.", )
+
+ parser.add_argument(
+ "--device",
+ default="gpu",
+ choices=["gpu", "cpu"],
+ help="Device selected for inference.", )
+ parser.add_argument('--cpu_threads', type=int, default=1)
+
+ args, _ = parser.parse_known_args()
+ return args
+
+
+def main():
+ args = parse_args()
+
+ ort_predict(args)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/paddlespeech/t2s/exps/ort_predict_e2e.py b/paddlespeech/t2s/exps/ort_predict_e2e.py
new file mode 100644
index 000000000..8aa04cbc5
--- /dev/null
+++ b/paddlespeech/t2s/exps/ort_predict_e2e.py
@@ -0,0 +1,183 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+from pathlib import Path
+
+import numpy as np
+import onnxruntime as ort
+import soundfile as sf
+from timer import timer
+
+from paddlespeech.t2s.exps.syn_utils import get_frontend
+from paddlespeech.t2s.exps.syn_utils import get_sentences
+from paddlespeech.t2s.utils import str2bool
+
+
+def get_sess(args, field='am'):
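+    """Create an onnxruntime InferenceSession for the acoustic model ('am')
+    or the vocoder ('voc'), choosing TensorRT/CUDA/CPU providers per args."""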
+ full_name = ''
+    if field == 'am':
+        full_name = args.am
+    elif field == 'voc':
+ full_name = args.voc
+ model_dir = str(Path(args.inference_dir) / (full_name + ".onnx"))
+ sess_options = ort.SessionOptions()
+ sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
+ sess_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
+
+ if args.device == "gpu":
+ # fastspeech2/mb_melgan can't use trt now!
+ if args.use_trt:
+ providers = ['TensorrtExecutionProvider']
+ else:
+ providers = ['CUDAExecutionProvider']
+ elif args.device == "cpu":
+ providers = ['CPUExecutionProvider']
+ sess_options.intra_op_num_threads = args.cpu_threads
+ sess = ort.InferenceSession(
+ model_dir, providers=providers, sess_options=sess_options)
+ return sess
+
+
+def ort_predict(args):
+
+ # frontend
+ frontend = get_frontend(args)
+
+ output_dir = Path(args.output_dir)
+ output_dir.mkdir(parents=True, exist_ok=True)
+ sentences = get_sentences(args)
+
+ am_name = args.am[:args.am.rindex('_')]
+ am_dataset = args.am[args.am.rindex('_') + 1:]
+ fs = 24000 if am_dataset != 'ljspeech' else 22050
+
+ # am
+    am_sess = get_sess(args, field='am')
+
+ # vocoder
+    voc_sess = get_sess(args, field='voc')
+
+ # am warmup
+ for T in [27, 38, 54]:
+ data = np.random.randint(1, 266, size=(T, ))
+ am_sess.run(None, {"text": data})
+
+ # voc warmup
+ for T in [227, 308, 544]:
+ data = np.random.rand(T, 80).astype("float32")
+ voc_sess.run(None, {"logmel": data})
+ print("warm up done!")
+
+ # frontend warmup
+ # Loading model cost 0.5+ seconds
+ if args.lang == 'zh':
+ frontend.get_input_ids("你好,欢迎使用飞桨框架进行深度学习研究!", merge_sentences=True)
+ else:
+ print("lang should in be 'zh' here!")
+
+ N = 0
+ T = 0
+ merge_sentences = True
+ for utt_id, sentence in sentences:
+ with timer() as t:
+ if args.lang == 'zh':
+ input_ids = frontend.get_input_ids(
+ sentence, merge_sentences=merge_sentences)
+
+ phone_ids = input_ids["phone_ids"]
+ else:
+ print("lang should in be 'zh' here!")
+ # merge_sentences=True here, so we only use the first item of phone_ids
+ phone_ids = phone_ids[0].numpy()
+ mel = am_sess.run(output_names=None, input_feed={'text': phone_ids})
+ mel = mel[0]
+ wav = voc_sess.run(output_names=None, input_feed={'logmel': mel})
+
+ N += len(wav[0])
+ T += t.elapse
+ speed = len(wav[0]) / t.elapse
+ rtf = fs / speed
+ sf.write(
+ str(output_dir / (utt_id + ".wav")),
+ np.array(wav)[0],
+ samplerate=fs)
+ print(
+ f"{utt_id}, mel: {mel.shape}, wave: {len(wav[0])}, time: {t.elapse}s, Hz: {speed}, RTF: {rtf}."
+ )
+ print(f"generation speed: {N / T}Hz, RTF: {fs / (N / T) }")
+
+
+def parse_args():
+ parser = argparse.ArgumentParser(description="Infernce with onnxruntime.")
+ # acoustic model
+ parser.add_argument(
+ '--am',
+ type=str,
+ default='fastspeech2_csmsc',
+ choices=[
+ 'fastspeech2_csmsc',
+ ],
+ help='Choose acoustic model type of tts task.')
+ parser.add_argument(
+ "--phones_dict", type=str, default=None, help="phone vocabulary file.")
+ parser.add_argument(
+ "--tones_dict", type=str, default=None, help="tone vocabulary file.")
+
+ # voc
+ parser.add_argument(
+ '--voc',
+ type=str,
+ default='hifigan_csmsc',
+ choices=['hifigan_csmsc', 'mb_melgan_csmsc'],
+ help='Choose vocoder type of tts task.')
+ # other
+ parser.add_argument(
+ "--inference_dir", type=str, help="dir to save inference models")
+ parser.add_argument(
+ "--text",
+ type=str,
+ help="text to synthesize, a 'utt_id sentence' pair per line")
+ parser.add_argument("--output_dir", type=str, help="output dir")
+ parser.add_argument(
+ '--lang',
+ type=str,
+ default='zh',
+ help='Choose model language. zh or en')
+
+ # inference
+ parser.add_argument(
+ "--use_trt",
+ type=str2bool,
+ default=False,
+ help="Whether to use inference engin TensorRT.", )
+
+ parser.add_argument(
+ "--device",
+ default="gpu",
+ choices=["gpu", "cpu"],
+ help="Device selected for inference.", )
+ parser.add_argument('--cpu_threads', type=int, default=1)
+
+ args, _ = parser.parse_known_args()
+ return args
+
+
+def main():
+ args = parse_args()
+
+ ort_predict(args)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/paddlespeech/t2s/exps/speedyspeech/preprocess.py b/paddlespeech/t2s/exps/speedyspeech/preprocess.py
index 3f81c4e14..e833d1394 100644
--- a/paddlespeech/t2s/exps/speedyspeech/preprocess.py
+++ b/paddlespeech/t2s/exps/speedyspeech/preprocess.py
@@ -79,6 +79,9 @@ def process_sentence(config: Dict[str, Any],
logmel = mel_extractor.get_log_mel_fbank(wav)
# change duration according to mel_length
compare_duration_and_mel_length(sentences, utt_id, logmel)
+ # utt_id may be popped in compare_duration_and_mel_length
+ if utt_id not in sentences:
+ return None
labels = sentences[utt_id][0]
# extract phone and duration
phones = []
diff --git a/paddlespeech/t2s/exps/synthesize_streaming.py b/paddlespeech/t2s/exps/synthesize_streaming.py
index f38b2d352..7b9906c10 100644
--- a/paddlespeech/t2s/exps/synthesize_streaming.py
+++ b/paddlespeech/t2s/exps/synthesize_streaming.py
@@ -90,6 +90,7 @@ def evaluate(args):
output_dir = Path(args.output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
merge_sentences = True
+ get_tone_ids = False
N = 0
T = 0
@@ -98,8 +99,6 @@ def evaluate(args):
for utt_id, sentence in sentences:
with timer() as t:
- get_tone_ids = False
-
if args.lang == 'zh':
input_ids = frontend.get_input_ids(
sentence,
diff --git a/paddlespeech/t2s/exps/tacotron2/preprocess.py b/paddlespeech/t2s/exps/tacotron2/preprocess.py
index 7f41089eb..14a0d7eae 100644
--- a/paddlespeech/t2s/exps/tacotron2/preprocess.py
+++ b/paddlespeech/t2s/exps/tacotron2/preprocess.py
@@ -82,6 +82,9 @@ def process_sentence(config: Dict[str, Any],
logmel = mel_extractor.get_log_mel_fbank(wav)
# change duration according to mel_length
compare_duration_and_mel_length(sentences, utt_id, logmel)
+ # utt_id may be popped in compare_duration_and_mel_length
+ if utt_id not in sentences:
+ return None
phones = sentences[utt_id][0]
durations = sentences[utt_id][1]
num_frames = logmel.shape[0]
diff --git a/paddlespeech/t2s/modules/positional_encoding.py b/paddlespeech/t2s/modules/positional_encoding.py
index 7c368c3aa..715c576f5 100644
--- a/paddlespeech/t2s/modules/positional_encoding.py
+++ b/paddlespeech/t2s/modules/positional_encoding.py
@@ -31,8 +31,9 @@ def sinusoid_position_encoding(num_positions: int,
channel = paddle.arange(0, feature_size, 2, dtype=dtype)
index = paddle.arange(start_pos, start_pos + num_positions, 1, dtype=dtype)
- p = (paddle.unsqueeze(index, -1) *
- omega) / (10000.0**(channel / float(feature_size)))
+ denominator = channel / float(feature_size)
+ denominator = paddle.to_tensor([10000.0], dtype='float32')**denominator
+ p = (paddle.unsqueeze(index, -1) * omega) / denominator
encodings = paddle.zeros([num_positions, feature_size], dtype=dtype)
encodings[:, 0::2] = paddle.sin(p)
encodings[:, 1::2] = paddle.cos(p)
diff --git a/paddlespeech/vector/cluster/diarization.py b/paddlespeech/vector/cluster/diarization.py
index 597aa4807..5b2157257 100644
--- a/paddlespeech/vector/cluster/diarization.py
+++ b/paddlespeech/vector/cluster/diarization.py
@@ -746,6 +746,77 @@ def merge_ssegs_same_speaker(lol):
return new_lol
+def write_ders_file(ref_rttm, DER, out_der_file):
+ """Write the final DERs for individual recording.
+
+ Arguments
+ ---------
+ ref_rttm : str
+ Reference RTTM file.
+ DER : array
+ Array containing DER values of each recording.
+ out_der_file : str
+ File to write the DERs.
+ """
+
+ rttm = read_rttm(ref_rttm)
+ spkr_info = list(filter(lambda x: x.startswith("SPKR-INFO"), rttm))
+
+ rec_id_list = []
+ count = 0
+
+ with open(out_der_file, "w") as f:
+ for row in spkr_info:
+ a = row.split(" ")
+ rec_id = a[1]
+ if rec_id not in rec_id_list:
+ r = [rec_id, str(round(DER[count], 2))]
+ rec_id_list.append(rec_id)
+ line_str = " ".join(r)
+ f.write("%s\n" % line_str)
+ count += 1
+ r = ["OVERALL ", str(round(DER[count], 2))]
+ line_str = " ".join(r)
+ f.write("%s\n" % line_str)
+
+
+def get_oracle_num_spkrs(rec_id, spkr_info):
+ """
+ Returns actual number of speakers in a recording from the ground-truth.
+ This can be used when the condition is oracle number of speakers.
+
+ Arguments
+ ---------
+ rec_id : str
+        Recording ID for which the number of speakers has to be obtained.
+ spkr_info : list
+ Header of the RTTM file. Starting with `SPKR-INFO`.
+
+ Example
+ -------
+    >>> from paddlespeech.vector.cluster import diarization as diar
+ >>> spkr_info = ['SPKR-INFO ES2011a 0 unknown ES2011a.A ',
+ ... 'SPKR-INFO ES2011a 0 unknown ES2011a.B ',
+ ... 'SPKR-INFO ES2011a 0 unknown ES2011a.C ',
+ ... 'SPKR-INFO ES2011a 0 unknown ES2011a.D ',
+ ... 'SPKR-INFO ES2011b 0 unknown ES2011b.A ',
+ ... 'SPKR-INFO ES2011b 0 unknown ES2011b.B ',
+ ... 'SPKR-INFO ES2011b 0 unknown ES2011b.C ']
+ >>> diar.get_oracle_num_spkrs('ES2011a', spkr_info)
+ 4
+ >>> diar.get_oracle_num_spkrs('ES2011b', spkr_info)
+ 3
+ """
+
+ num_spkrs = 0
+ for line in spkr_info:
+ if rec_id in line:
+ # Since rec_id is prefix for each speaker
+ num_spkrs += 1
+
+ return num_spkrs
+
+
def distribute_overlap(lol):
"""
Distributes the overlapped speech equally among the adjacent segments
@@ -826,6 +897,29 @@ def distribute_overlap(lol):
return new_lol
+def read_rttm(rttm_file_path):
+ """
+ Reads and returns RTTM in list format.
+
+ Arguments
+ ---------
+ rttm_file_path : str
+ Path to the RTTM file to be read.
+
+ Returns
+ -------
+ rttm : list
+ List containing rows of RTTM file.
+ """
+
+ rttm = []
+ with open(rttm_file_path, "r") as f:
+ for line in f:
+            entry = line.rstrip("\n")
+ rttm.append(entry)
+ return rttm
+
+
def write_rttm(segs_list, out_rttm_file):
"""
Writes the segment list in RTTM format (A standard NIST format).
diff --git a/paddlespeech/vector/exps/ecapa_tdnn/test.py b/paddlespeech/vector/exps/ecapa_tdnn/test.py
index d0de6dc51..70b1521ed 100644
--- a/paddlespeech/vector/exps/ecapa_tdnn/test.py
+++ b/paddlespeech/vector/exps/ecapa_tdnn/test.py
@@ -21,10 +21,11 @@ from paddle.io import DataLoader
from tqdm import tqdm
from yacs.config import CfgNode
-from paddleaudio.datasets import VoxCeleb
from paddleaudio.metric import compute_eer
from paddlespeech.s2t.utils.log import Log
from paddlespeech.vector.io.batch import batch_feature_normalize
+from paddlespeech.vector.io.dataset import CSVDataset
+from paddlespeech.vector.io.embedding_norm import InputNormalization
from paddlespeech.vector.models.ecapa_tdnn import EcapaTdnn
from paddlespeech.vector.modules.sid_model import SpeakerIdetification
from paddlespeech.vector.training.seeding import seed_everything
@@ -32,6 +33,91 @@ from paddlespeech.vector.training.seeding import seed_everything
logger = Log(__name__).getlog()
+def compute_dataset_embedding(data_loader, model, mean_var_norm_emb, config,
+ id2embedding):
+ """compute the dataset embeddings
+
+ Args:
+ data_loader (_type_): _description_
+ model (_type_): _description_
+ mean_var_norm_emb (_type_): _description_
+ config (_type_): _description_
+ """
+ logger.info(
+ f'Computing embeddings on {data_loader.dataset.csv_path} dataset')
+ with paddle.no_grad():
+ for batch_idx, batch in enumerate(tqdm(data_loader)):
+
+            # stage 8-1: extract the audio embedding
+ ids, feats, lengths = batch['ids'], batch['feats'], batch['lengths']
+ embeddings = model.backbone(feats, lengths).squeeze(
+ -1) # (N, emb_size, 1) -> (N, emb_size)
+
+            # Global embedding normalization:
+            # with the global embedding norm, the EER can be
+            # reduced by about 10% relative.
+ if config.global_embedding_norm and mean_var_norm_emb:
+ lengths = paddle.ones([embeddings.shape[0]])
+ embeddings = mean_var_norm_emb(embeddings, lengths)
+
+ # Update embedding dict.
+ id2embedding.update(dict(zip(ids, embeddings)))
+
+
+def compute_verification_scores(id2embedding, train_cohort, config):
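+    """Compute cosine similarity scores for all trial pairs, optionally
+    applying s-norm / z-norm / t-norm score normalization with a train cohort.
+
+    Args:
+        id2embedding (dict): utterance id to embedding map
+        train_cohort (paddle.Tensor): stacked cohort embeddings, or None
+        config (CfgNode): yaml config
+
+    Returns:
+        (list, list): the trial scores and the corresponding 0/1 labels
+    """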
+ labels = []
+ enroll_ids = []
+ test_ids = []
+ logger.info(f"read the trial from {config.verification_file}")
+ cos_sim_func = paddle.nn.CosineSimilarity(axis=-1)
+ scores = []
+ with open(config.verification_file, 'r') as f:
+ for line in f.readlines():
+ label, enroll_id, test_id = line.strip().split(' ')
+ enroll_id = enroll_id.split('.')[0].replace('/', '-')
+ test_id = test_id.split('.')[0].replace('/', '-')
+ labels.append(int(label))
+
+ enroll_emb = id2embedding[enroll_id]
+ test_emb = id2embedding[test_id]
+ score = cos_sim_func(enroll_emb, test_emb).item()
+
+ if "score_norm" in config:
+ # Getting norm stats for enroll impostors
+ enroll_rep = paddle.tile(
+ enroll_emb, repeat_times=[train_cohort.shape[0], 1])
+ score_e_c = cos_sim_func(enroll_rep, train_cohort)
+ if "cohort_size" in config:
+ score_e_c, _ = paddle.topk(
+ score_e_c, k=config.cohort_size, axis=0)
+ mean_e_c = paddle.mean(score_e_c, axis=0)
+ std_e_c = paddle.std(score_e_c, axis=0)
+
+ # Getting norm stats for test impostors
+ test_rep = paddle.tile(
+ test_emb, repeat_times=[train_cohort.shape[0], 1])
+ score_t_c = cos_sim_func(test_rep, train_cohort)
+ if "cohort_size" in config:
+ score_t_c, _ = paddle.topk(
+ score_t_c, k=config.cohort_size, axis=0)
+ mean_t_c = paddle.mean(score_t_c, axis=0)
+ std_t_c = paddle.std(score_t_c, axis=0)
+
+ if config.score_norm == "s-norm":
+ score_e = (score - mean_e_c) / std_e_c
+ score_t = (score - mean_t_c) / std_t_c
+
+ score = 0.5 * (score_e + score_t)
+ elif config.score_norm == "z-norm":
+ score = (score - mean_e_c) / std_e_c
+ elif config.score_norm == "t-norm":
+ score = (score - mean_t_c) / std_t_c
+
+ scores.append(score)
+
+ return scores, labels
+
+
def main(args, config):
# stage0: set the training device, cpu or gpu
paddle.set_device(args.device)
@@ -58,9 +144,8 @@ def main(args, config):
# stage4: construct the enroll and test dataloader
- enroll_dataset = VoxCeleb(
- subset='enroll',
- target_dir=args.data_dir,
+ enroll_dataset = CSVDataset(
+ os.path.join(args.data_dir, "vox/csv/enroll.csv"),
feat_type='melspectrogram',
random_chunk=False,
n_mels=config.n_mels,
@@ -68,16 +153,15 @@ def main(args, config):
hop_length=config.hop_size)
enroll_sampler = BatchSampler(
enroll_dataset, batch_size=config.batch_size,
- shuffle=True) # Shuffle to make embedding normalization more robust.
- enrol_loader = DataLoader(enroll_dataset,
+ shuffle=False) # Shuffle to make embedding normalization more robust.
+ enroll_loader = DataLoader(enroll_dataset,
batch_sampler=enroll_sampler,
collate_fn=lambda x: batch_feature_normalize(
- x, mean_norm=True, std_norm=False),
+ x, mean_norm=True, std_norm=False),
num_workers=config.num_workers,
return_list=True,)
- test_dataset = VoxCeleb(
- subset='test',
- target_dir=args.data_dir,
+ test_dataset = CSVDataset(
+ os.path.join(args.data_dir, "vox/csv/test.csv"),
feat_type='melspectrogram',
random_chunk=False,
n_mels=config.n_mels,
@@ -85,7 +169,7 @@ def main(args, config):
hop_length=config.hop_size)
test_sampler = BatchSampler(
- test_dataset, batch_size=config.batch_size, shuffle=True)
+ test_dataset, batch_size=config.batch_size, shuffle=False)
test_loader = DataLoader(test_dataset,
batch_sampler=test_sampler,
collate_fn=lambda x: batch_feature_normalize(
@@ -97,75 +181,65 @@ def main(args, config):
# stage6: global embedding norm to imporve the performance
logger.info(f"global embedding norm: {config.global_embedding_norm}")
- if config.global_embedding_norm:
- global_embedding_mean = None
- global_embedding_std = None
- mean_norm_flag = config.embedding_mean_norm
- std_norm_flag = config.embedding_std_norm
- batch_count = 0
# stage7: Compute embeddings of audios in enrol and test dataset from model.
+
+    mean_var_norm_emb = None
+    if config.global_embedding_norm:
+        mean_var_norm_emb = InputNormalization(
+            norm_type="global",
+            mean_norm=config.embedding_mean_norm,
+            std_norm=config.embedding_std_norm)
+
+ if "score_norm" in config:
+ logger.info(f"we will do score norm: {config.score_norm}")
+ train_dataset = CSVDataset(
+ os.path.join(args.data_dir, "vox/csv/train.csv"),
+ feat_type='melspectrogram',
+ n_train_snts=config.n_train_snts,
+ random_chunk=False,
+ n_mels=config.n_mels,
+ window_size=config.window_size,
+ hop_length=config.hop_size)
+ train_sampler = BatchSampler(
+ train_dataset, batch_size=config.batch_size, shuffle=False)
+ train_loader = DataLoader(train_dataset,
+ batch_sampler=train_sampler,
+ collate_fn=lambda x: batch_feature_normalize(
+ x, mean_norm=True, std_norm=False),
+ num_workers=config.num_workers,
+ return_list=True,)
+
id2embedding = {}
# Run multi times to make embedding normalization more stable.
- for i in range(2):
- for dl in [enrol_loader, test_loader]:
- logger.info(
- f'Loop {[i+1]}: Computing embeddings on {dl.dataset.subset} dataset'
- )
- with paddle.no_grad():
- for batch_idx, batch in enumerate(tqdm(dl)):
-
- # stage 8-1: extrac the audio embedding
- ids, feats, lengths = batch['ids'], batch['feats'], batch[
- 'lengths']
- embeddings = model.backbone(feats, lengths).squeeze(
- -1).numpy() # (N, emb_size, 1) -> (N, emb_size)
-
- # Global embedding normalization.
- # if we use the global embedding norm
- # eer can reduece about relative 10%
- if config.global_embedding_norm:
- batch_count += 1
- current_mean = embeddings.mean(
- axis=0) if mean_norm_flag else 0
- current_std = embeddings.std(
- axis=0) if std_norm_flag else 1
- # Update global mean and std.
- if global_embedding_mean is None and global_embedding_std is None:
- global_embedding_mean, global_embedding_std = current_mean, current_std
- else:
- weight = 1 / batch_count # Weight decay by batches.
- global_embedding_mean = (
- 1 - weight
- ) * global_embedding_mean + weight * current_mean
- global_embedding_std = (
- 1 - weight
- ) * global_embedding_std + weight * current_std
- # Apply global embedding normalization.
- embeddings = (embeddings - global_embedding_mean
- ) / global_embedding_std
-
- # Update embedding dict.
- id2embedding.update(dict(zip(ids, embeddings)))
+ logger.info("First loop for enroll and test dataset")
+ compute_dataset_embedding(enroll_loader, model, mean_var_norm_emb, config,
+ id2embedding)
+ compute_dataset_embedding(test_loader, model, mean_var_norm_emb, config,
+ id2embedding)
+
+ logger.info("Second loop for enroll and test dataset")
+ compute_dataset_embedding(enroll_loader, model, mean_var_norm_emb, config,
+ id2embedding)
+ compute_dataset_embedding(test_loader, model, mean_var_norm_emb, config,
+ id2embedding)
+    if mean_var_norm_emb:
+        mean_var_norm_emb.save(
+            os.path.join(args.load_checkpoint, "mean_var_norm_emb"))
# stage 8: Compute cosine scores.
- labels = []
- enroll_ids = []
- test_ids = []
- logger.info(f"read the trial from {VoxCeleb.veri_test_file}")
- with open(VoxCeleb.veri_test_file, 'r') as f:
- for line in f.readlines():
- label, enroll_id, test_id = line.strip().split(' ')
- labels.append(int(label))
- enroll_ids.append(enroll_id.split('.')[0].replace('/', '-'))
- test_ids.append(test_id.split('.')[0].replace('/', '-'))
-
- cos_sim_func = paddle.nn.CosineSimilarity(axis=1)
- enrol_embeddings, test_embeddings = map(lambda ids: paddle.to_tensor(
- np.asarray([id2embedding[uttid] for uttid in ids], dtype='float32')),
- [enroll_ids, test_ids
- ]) # (N, emb_size)
- scores = cos_sim_func(enrol_embeddings, test_embeddings)
+ train_cohort = None
+ if "score_norm" in config:
+ train_embeddings = {}
+        # cohort embeddings are not mean/std normalized
+ compute_dataset_embedding(train_loader, model, None, config,
+ train_embeddings)
+ train_cohort = paddle.stack(list(train_embeddings.values()))
+
+ # compute the scores
+ scores, labels = compute_verification_scores(id2embedding, train_cohort,
+ config)
+
+ # compute the EER and threshold
+ scores = paddle.to_tensor(scores)
EER, threshold = compute_eer(np.asarray(labels), scores.numpy())
logger.info(
f'EER of verification test: {EER*100:.4f}%, score threshold: {threshold:.5f}'
diff --git a/paddlespeech/vector/exps/ecapa_tdnn/train.py b/paddlespeech/vector/exps/ecapa_tdnn/train.py
index 257b97abe..b777dae89 100644
--- a/paddlespeech/vector/exps/ecapa_tdnn/train.py
+++ b/paddlespeech/vector/exps/ecapa_tdnn/train.py
@@ -23,13 +23,13 @@ from paddle.io import DistributedBatchSampler
from yacs.config import CfgNode
from paddleaudio.compliance.librosa import melspectrogram
-from paddleaudio.datasets.voxceleb import VoxCeleb
from paddlespeech.s2t.utils.log import Log
from paddlespeech.vector.io.augment import build_augment_pipeline
from paddlespeech.vector.io.augment import waveform_augment
from paddlespeech.vector.io.batch import batch_pad_right
from paddlespeech.vector.io.batch import feature_normalize
from paddlespeech.vector.io.batch import waveform_collate_fn
+from paddlespeech.vector.io.dataset import CSVDataset
from paddlespeech.vector.models.ecapa_tdnn import EcapaTdnn
from paddlespeech.vector.modules.loss import AdditiveAngularMargin
from paddlespeech.vector.modules.loss import LogSoftmaxWrapper
@@ -54,8 +54,12 @@ def main(args, config):
# stage2: data prepare, such vox1 and vox2 data, and augment noise data and pipline
# note: some cmd must do in rank==0, so wo will refactor the data prepare code
- train_dataset = VoxCeleb('train', target_dir=args.data_dir)
- dev_dataset = VoxCeleb('dev', target_dir=args.data_dir)
+ train_dataset = CSVDataset(
+ csv_path=os.path.join(args.data_dir, "vox/csv/train.csv"),
+ label2id_path=os.path.join(args.data_dir, "vox/meta/label2id.txt"))
+ dev_dataset = CSVDataset(
+ csv_path=os.path.join(args.data_dir, "vox/csv/dev.csv"),
+ label2id_path=os.path.join(args.data_dir, "vox/meta/label2id.txt"))
if config.augment:
augment_pipeline = build_augment_pipeline(target_dir=args.data_dir)
@@ -67,7 +71,7 @@ def main(args, config):
# stage4: build the speaker verification train instance with backbone model
model = SpeakerIdetification(
- backbone=ecapa_tdnn, num_class=VoxCeleb.num_speakers)
+ backbone=ecapa_tdnn, num_class=config.num_speakers)
# stage5: build the optimizer, we now only construct the AdamW optimizer
# 140000 is single gpu steps
@@ -193,15 +197,15 @@ def main(args, config):
paddle.optimizer.lr.LRScheduler):
optimizer._learning_rate.step()
optimizer.clear_grad()
- train_run_cost += time.time() - train_start
# stage 9-8: Calculate average loss per batch
- avg_loss += loss.numpy()[0]
+        avg_loss += loss.item()
# stage 9-9: Calculate metrics, which is one-best accuracy
preds = paddle.argmax(logits, axis=1)
num_corrects += (preds == labels).numpy().sum()
num_samples += feats.shape[0]
+ train_run_cost += time.time() - train_start
timer.count() # step plus one in timer
# stage 9-10: print the log information only on 0-rank per log-freq batchs
@@ -220,8 +224,9 @@ def main(args, config):
train_feat_cost / config.log_interval)
print_msg += ' avg_train_cost: {:.5f} sec,'.format(
train_run_cost / config.log_interval)
- print_msg += ' lr={:.4E} step/sec={:.2f} | ETA {}'.format(
- lr, timer.timing, timer.eta)
+
+            print_msg += ' lr={:.4E} step/sec={:.2f} ips={:.5f} | ETA {}'.format(
+                lr, timer.timing, timer.ips, timer.eta)
logger.info(print_msg)
avg_loss = 0
diff --git a/paddlespeech/vector/io/augment.py b/paddlespeech/vector/io/augment.py
index 3baace139..0aa89c6a3 100644
--- a/paddlespeech/vector/io/augment.py
+++ b/paddlespeech/vector/io/augment.py
@@ -14,6 +14,7 @@
# this is modified from SpeechBrain
# https://github.com/speechbrain/speechbrain/blob/085be635c07f16d42cd1295045bc46c407f1e15b/speechbrain/lobes/augment.py
import math
+import os
from typing import List
import numpy as np
@@ -21,8 +22,8 @@ import paddle
import paddle.nn as nn
import paddle.nn.functional as F
-from paddleaudio.datasets.rirs_noises import OpenRIRNoise
from paddlespeech.s2t.utils.log import Log
+from paddlespeech.vector.io.dataset import CSVDataset
from paddlespeech.vector.io.signal_processing import compute_amplitude
from paddlespeech.vector.io.signal_processing import convolve1d
from paddlespeech.vector.io.signal_processing import dB_to_amplitude
@@ -509,7 +510,7 @@ class AddNoise(nn.Layer):
assert w >= 0, f'Target length {target_length} is less than origin length {x.shape[0]}'
return np.pad(x, [0, w], mode=mode, **kwargs)
- ids = [item['id'] for item in batch]
+ ids = [item['utt_id'] for item in batch]
lengths = np.asarray([item['feat'].shape[0] for item in batch])
waveforms = list(
map(lambda x: pad(x, max(max_length, lengths.max().item())),
@@ -589,7 +590,7 @@ class AddReverb(nn.Layer):
assert w >= 0, f'Target length {target_length} is less than origin length {x.shape[0]}'
return np.pad(x, [0, w], mode=mode, **kwargs)
- ids = [item['id'] for item in batch]
+ ids = [item['utt_id'] for item in batch]
lengths = np.asarray([item['feat'].shape[0] for item in batch])
waveforms = list(
map(lambda x: pad(x, lengths.max().item()),
@@ -839,8 +840,10 @@ def build_augment_pipeline(target_dir=None) -> List[paddle.nn.Layer]:
List[paddle.nn.Layer]: all augment process
"""
logger.info("start to build the augment pipeline")
- noise_dataset = OpenRIRNoise('noise', target_dir=target_dir)
- rir_dataset = OpenRIRNoise('rir', target_dir=target_dir)
+ noise_dataset = CSVDataset(csv_path=os.path.join(target_dir,
+ "rir_noise/csv/noise.csv"))
+ rir_dataset = CSVDataset(csv_path=os.path.join(target_dir,
+ "rir_noise/csv/rir.csv"))
wavedrop = TimeDomainSpecAugment(
sample_rate=16000,
diff --git a/paddlespeech/vector/io/batch.py b/paddlespeech/vector/io/batch.py
index 92ca990cf..5049d1946 100644
--- a/paddlespeech/vector/io/batch.py
+++ b/paddlespeech/vector/io/batch.py
@@ -17,6 +17,17 @@ import paddle
def waveform_collate_fn(batch):
+ """Wrap the waveform into a batch form
+
+ Args:
+        batch (list): the waveform list from the dataloader;
+            each item contains several fields:
+ feat: the utterance waveform data
+ label: the utterance label encoding data
+
+ Returns:
+ dict: the batch data to dataloader
+ """
waveforms = np.stack([item['feat'] for item in batch])
labels = np.stack([item['label'] for item in batch])
@@ -27,6 +38,18 @@ def feature_normalize(feats: paddle.Tensor,
mean_norm: bool=True,
std_norm: bool=True,
convert_to_numpy: bool=False):
+ """Do one utterance feature normalization
+
+ Args:
+ feats (paddle.Tensor): the original utterance feat, such as fbank, mfcc
+ mean_norm (bool, optional): mean norm flag. Defaults to True.
+ std_norm (bool, optional): std norm flag. Defaults to True.
+ convert_to_numpy (bool, optional): convert the paddle.tensor to numpy
+ and do feature norm with numpy. Defaults to False.
+
+ Returns:
+ paddle.Tensor : the normalized feats
+ """
# Features normalization if needed
# numpy.mean is a little with paddle.mean about 1e-6
if convert_to_numpy:
@@ -60,7 +83,17 @@ def pad_right_2d(x, target_length, axis=-1, mode='constant', **kwargs):
def batch_feature_normalize(batch, mean_norm: bool=True, std_norm: bool=True):
- ids = [item['id'] for item in batch]
+ """Do batch utterance features normalization
+
+ Args:
+ batch (list): the batch feature from dataloader
+ mean_norm (bool, optional): mean normalization flag. Defaults to True.
+ std_norm (bool, optional): std normalization flag. Defaults to True.
+
+ Returns:
+ dict: the normalized batch features
+ """
+ ids = [item['utt_id'] for item in batch]
lengths = np.asarray([item['feat'].shape[1] for item in batch])
feats = list(
map(lambda x: pad_right_2d(x, lengths.max()),
diff --git a/paddlespeech/vector/io/dataset.py b/paddlespeech/vector/io/dataset.py
new file mode 100644
index 000000000..316c8ac34
--- /dev/null
+++ b/paddlespeech/vector/io/dataset.py
@@ -0,0 +1,192 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import random
+from dataclasses import dataclass
+from dataclasses import fields
+from paddle.io import Dataset
+
+from paddleaudio import load as load_audio
+from paddleaudio.compliance.librosa import melspectrogram
+from paddlespeech.s2t.utils.log import Log
+
+logger = Log(__name__).getlog()
+
+
+
+@dataclass
+class meta_info:
+ """the audio meta info in the vector CSVDataset
+
+ Args:
+ utt_id (str): the utterance segment name
+ duration (float): utterance segment time
+ wav (str): utterance file path
+ start (int): start point in the original wav file
+ stop (int): stop point in the original wav file
+        label (str): the utterance segment's label id
+ """
+ utt_id: str
+ duration: float
+ wav: str
+ start: int
+ stop: int
+ label: str
+
+
+# feature types supported by the csv dataset
+# raw: returns the raw pcm sample points
+# melspectrogram: returns the fbank feature
+feat_funcs = {
+ 'raw': None,
+ 'melspectrogram': melspectrogram,
+}
+
+
+class CSVDataset(Dataset):
+ def __init__(self,
+ csv_path,
+ label2id_path=None,
+ config=None,
+ random_chunk=True,
+ feat_type: str="raw",
+ n_train_snts: int=-1,
+ **kwargs):
+ """Implement the CSV Dataset
+
+ Args:
+ csv_path (str): csv dataset file path
+ label2id_path (str): the utterance label to integer id map file path
+            config (CfgNode): yaml config
+            random_chunk (bool): whether to select a random chunk from the utterance
+            feat_type (str): dataset feature type. if it is raw, the raw pcm data is returned.
+            n_train_snts (int): select the first n_train_snts samples from the dataset.
+                if n_train_snts = -1, the dataset loads all the samples.
+                Default value is -1.
+ kwargs : feature type args
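+
+        Example (a minimal sketch; the paths are hypothetical):
+            dataset = CSVDataset(
+                csv_path="data/vox/csv/train.csv",
+                label2id_path="data/vox/meta/label2id.txt")
+            record = dataset[0]  # dict with utt_id/duration/wav/start/stop/label/feat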
+ """
+ super().__init__()
+ self.csv_path = csv_path
+ self.label2id_path = label2id_path
+ self.config = config
+ self.random_chunk = random_chunk
+ self.feat_type = feat_type
+ self.n_train_snts = n_train_snts
+ self.feat_config = kwargs
+ self.id2label = {}
+ self.label2id = {}
+ self.data = self.load_data_csv()
+ self.load_speaker_to_label()
+
+ def load_data_csv(self):
+ """Load the csv dataset content and store them in the data property
+ the csv dataset's format has six fields,
+ that is audio_id or utt_id, audio duration, segment start point, segment stop point
+ and utterance label.
+ Note in training period, the utterance label must has a map to integer id in label2id_path
+
+ Returns:
+ list: the csv data with meta_info type
+ """
+ data = []
+
+ with open(self.csv_path, 'r') as rf:
+ for line in rf.readlines()[1:]:
+ audio_id, duration, wav, start, stop, spk_id = line.strip(
+ ).split(',')
+ data.append(
+ meta_info(audio_id,
+ float(duration), wav,
+ int(start), int(stop), spk_id))
+ if self.n_train_snts > 0:
+ sample_num = min(self.n_train_snts, len(data))
+ data = data[0:sample_num]
+
+ return data
+
+ def load_speaker_to_label(self):
+ """Load the utterance label map content.
+        In the vector domain, the utterance label is called the speaker label.
+        In speaker verification it is the real speaker label,
+        while in language identification it is the language label.
+ """
+ if not self.label2id_path:
+ logger.warning("No speaker id to label file")
+ return
+
+ with open(self.label2id_path, 'r') as f:
+ for line in f.readlines():
+ label_name, label_id = line.strip().split(' ')
+ self.label2id[label_name] = int(label_id)
+ self.id2label[int(label_id)] = label_name
+
+ def convert_to_record(self, idx: int):
+ """convert the dataset sample to training record the CSV Dataset
+
+ Args:
+ idx (int) : the request index in all the dataset
+ """
+ sample = self.data[idx]
+
+ record = {}
+        # copy every dataclass field of the sample into the record dict
+ for field in fields(sample):
+ record[field.name] = getattr(sample, field.name)
+
+ waveform, sr = load_audio(record['wav'])
+
+        # randomly select a chunk of samples from the audio
+ if self.config and self.config.random_chunk:
+ num_wav_samples = waveform.shape[0]
+ num_chunk_samples = int(self.config.chunk_duration * sr)
+ start = random.randint(0, num_wav_samples - num_chunk_samples - 1)
+ stop = start + num_chunk_samples
+ else:
+ start = record['start']
+ stop = record['stop']
+
+ # we only return the waveform as feat
+ waveform = waveform[start:stop]
+
+        # all available feature types are in feat_funcs
+ assert self.feat_type in feat_funcs.keys(), \
+ f"Unknown feat_type: {self.feat_type}, it must be one in {list(feat_funcs.keys())}"
+ feat_func = feat_funcs[self.feat_type]
+ feat = feat_func(
+ waveform, sr=sr, **self.feat_config) if feat_func else waveform
+
+ record.update({'feat': feat})
+ if self.label2id:
+ record.update({'label': self.label2id[record['label']]})
+
+ return record
+
+ def __getitem__(self, idx):
+ """Return the specific index sample
+
+ Args:
+ idx (int) : the request index in all the dataset
+ """
+ return self.convert_to_record(idx)
+
+ def __len__(self):
+ """Return the dataset length
+
+ Returns:
+ int: the length num of the dataset
+ """
+ return len(self.data)
diff --git a/paddlespeech/vector/io/dataset_from_json.py b/paddlespeech/vector/io/dataset_from_json.py
new file mode 100644
index 000000000..5ffd2c186
--- /dev/null
+++ b/paddlespeech/vector/io/dataset_from_json.py
@@ -0,0 +1,116 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import json
+
+from dataclasses import dataclass
+from dataclasses import fields
+from paddle.io import Dataset
+
+from paddleaudio import load as load_audio
+from paddleaudio.compliance.librosa import melspectrogram
+from paddleaudio.compliance.librosa import mfcc
+
+
+@dataclass
+class meta_info:
+ """the audio meta info in the vector JSONDataset
+ Args:
+ id (str): the segment name
+ duration (float): segment time
+ wav (str): wav file path
+ start (int): start point in the original wav file
+ stop (int): stop point in the original wav file
+        record_id (str): the id of the original recording
+ """
+ id: str
+ duration: float
+ wav: str
+ start: int
+ stop: int
+ record_id: str
+
+
+# json dataset support feature type
+feat_funcs = {
+ 'raw': None,
+ 'melspectrogram': melspectrogram,
+ 'mfcc': mfcc,
+}
+
+
+class JSONDataset(Dataset):
+ """
+ dataset from json file.
+ """
+
+ def __init__(self, json_file: str, feat_type: str='raw', **kwargs):
+ """
+        Args:
+            json_file (:obj:`str`): Data prep JSON file.
+            feat_type (:obj:`str`, `optional`, defaults to `raw`):
+                It identifies the feature type that the user wants to extract from an audio file.
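+
+        Example (a minimal sketch; the json path is hypothetical):
+            dataset = JSONDataset("subsegments.json", feat_type="melspectrogram")
+            feat = dataset[0]['feat']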
+ """
+ if feat_type not in feat_funcs.keys():
+ raise RuntimeError(
+ f"Unknown feat_type: {feat_type}, it must be one in {list(feat_funcs.keys())}"
+ )
+
+ self.json_file = json_file
+ self.feat_type = feat_type
+ self.feat_config = kwargs
+ self._data = self._get_data()
+ super(JSONDataset, self).__init__()
+
+ def _get_data(self):
+ with open(self.json_file, "r") as f:
+ meta_data = json.load(f)
+ data = []
+ for key in meta_data:
+ sub_seg = meta_data[key]["wav"]
+ wav = sub_seg["file"]
+ duration = sub_seg["duration"]
+ start = sub_seg["start"]
+ stop = sub_seg["stop"]
+ rec_id = str(key).rsplit("_", 2)[0]
+ data.append(
+ meta_info(
+ str(key),
+ float(duration), wav, int(start), int(stop), str(rec_id)))
+ return data
+
+ def _convert_to_record(self, idx: int):
+ sample = self._data[idx]
+
+ record = {}
+        # copy every dataclass field of the sample into the record dict
+ for field in fields(sample):
+ record[field.name] = getattr(sample, field.name)
+
+ waveform, sr = load_audio(record['wav'])
+ waveform = waveform[record['start']:record['stop']]
+
+ feat_func = feat_funcs[self.feat_type]
+ feat = feat_func(
+ waveform, sr=sr, **self.feat_config) if feat_func else waveform
+
+ record.update({'feat': feat})
+
+ return record
+
+ def __getitem__(self, idx):
+ return self._convert_to_record(idx)
+
+ def __len__(self):
+ return len(self._data)
diff --git a/paddlespeech/vector/io/embedding_norm.py b/paddlespeech/vector/io/embedding_norm.py
new file mode 100644
index 000000000..619f37101
--- /dev/null
+++ b/paddlespeech/vector/io/embedding_norm.py
@@ -0,0 +1,214 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import Dict
+
+import paddle
+
+
+class InputNormalization:
+ spk_dict_mean: Dict[int, paddle.Tensor]
+ spk_dict_std: Dict[int, paddle.Tensor]
+ spk_dict_count: Dict[int, int]
+
+ def __init__(
+ self,
+ mean_norm=True,
+ std_norm=True,
+ norm_type="global", ):
+ """Do feature or embedding mean and std norm
+
+ Args:
+ mean_norm (bool, optional): mean norm flag. Defaults to True.
+ std_norm (bool, optional): std norm flag. Defaults to True.
+ norm_type (str, optional): norm type. Defaults to "global".
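+
+        Example (a minimal sketch, mirroring the usage in the ecapa-tdnn test script):
+            norm = InputNormalization(norm_type="global", mean_norm=True, std_norm=False)
+            lengths = paddle.ones([embeddings.shape[0]])
+            embeddings = norm(embeddings, lengths)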
+ """
+ super().__init__()
+ self.training = True
+ self.mean_norm = mean_norm
+ self.std_norm = std_norm
+ self.norm_type = norm_type
+ self.glob_mean = paddle.to_tensor([0], dtype="float32")
+ self.glob_std = paddle.to_tensor([0], dtype="float32")
+ self.spk_dict_mean = {}
+ self.spk_dict_std = {}
+ self.spk_dict_count = {}
+ self.weight = 1.0
+ self.count = 0
+ self.eps = 1e-10
+
+ def __call__(self,
+ x,
+ lengths,
+ spk_ids=paddle.to_tensor([], dtype="float32")):
+ """Returns the tensor with the surrounding context.
+ Args:
+ x (paddle.Tensor): A batch of tensors.
+ lengths (paddle.Tensor): A batch of tensors containing the relative length of each
+ sentence (e.g, [0.7, 0.9, 1.0]). It is used to avoid
+ computing stats on zero-padded steps.
+            spk_ids (paddle.Tensor, optional): tensor containing the ids of each speaker (e.g., [0 10 6]).
+ It is used to perform per-speaker normalization when
+ norm_type='speaker'. Defaults to paddle.to_tensor([], dtype="float32").
+ Returns:
+ paddle.Tensor: The normalized feature or embedding
+ """
+ N_batches = x.shape[0]
+ # print(f"x shape: {x.shape[1]}")
+ current_means = []
+ current_stds = []
+
+ for snt_id in range(N_batches):
+
+ # Avoiding padded time steps
+ # actual size is the actual time data length
+ actual_size = paddle.round(lengths[snt_id] *
+ x.shape[1]).astype("int32")
+ # computing actual time data statistics
+ current_mean, current_std = self._compute_current_stats(
+ x[snt_id, 0:actual_size, ...].unsqueeze(0))
+ current_means.append(current_mean)
+ current_stds.append(current_std)
+
+ if self.norm_type == "global":
+ current_mean = paddle.mean(paddle.stack(current_means), axis=0)
+ current_std = paddle.mean(paddle.stack(current_stds), axis=0)
+
+ if self.norm_type == "global":
+
+ if self.training:
+ if self.count == 0:
+ self.glob_mean = current_mean
+ self.glob_std = current_std
+
+ else:
+ self.weight = 1 / (self.count + 1)
+
+ self.glob_mean = (
+ 1 - self.weight
+ ) * self.glob_mean + self.weight * current_mean
+
+ self.glob_std = (
+ 1 - self.weight
+ ) * self.glob_std + self.weight * current_std
+
+ self.glob_mean = self.glob_mean.detach()
+ self.glob_std = self.glob_std.detach()
+
+ self.count = self.count + 1
+ x = (x - self.glob_mean) / (self.glob_std)
+ return x
+
+ def _compute_current_stats(self, x):
+ """Returns the tensor with the surrounding context.
+
+ Args:
+ x (paddle.Tensor): A batch of tensors.
+
+ Returns:
+ (paddle.Tensor, paddle.Tensor): the mean and the std of the data
+ """
+ # Compute current mean
+ if self.mean_norm:
+ current_mean = paddle.mean(x, axis=0).detach()
+ else:
+ current_mean = paddle.to_tensor([0.0], dtype="float32")
+
+ # Compute current std
+ if self.std_norm:
+ current_std = paddle.std(x, axis=0).detach()
+ else:
+ current_std = paddle.to_tensor([1.0], dtype="float32")
+
+ # Improving numerical stability of std
+ current_std = paddle.maximum(current_std,
+ self.eps * paddle.ones_like(current_std))
+
+ return current_mean, current_std
+
+ def _statistics_dict(self):
+ """Fills the dictionary containing the normalization statistics.
+ """
+ state = {}
+ state["count"] = self.count
+ state["glob_mean"] = self.glob_mean
+ state["glob_std"] = self.glob_std
+ state["spk_dict_mean"] = self.spk_dict_mean
+ state["spk_dict_std"] = self.spk_dict_std
+ state["spk_dict_count"] = self.spk_dict_count
+
+ return state
+
+ def _load_statistics_dict(self, state):
+ """Loads the dictionary containing the statistics.
+
+ Arguments
+ ---------
+ state : dict
+ A dictionary containing the normalization statistics.
+ """
+ self.count = state["count"]
+ if isinstance(state["glob_mean"], int):
+ self.glob_mean = state["glob_mean"]
+ self.glob_std = state["glob_std"]
+ else:
+ self.glob_mean = state["glob_mean"]
+ self.glob_std = state["glob_std"]
+
+ # Loading the spk_dict_mean in the right device
+ self.spk_dict_mean = {}
+ for spk in state["spk_dict_mean"]:
+ self.spk_dict_mean[spk] = state["spk_dict_mean"][spk]
+
+ # Loading the spk_dict_std in the right device
+ self.spk_dict_std = {}
+ for spk in state["spk_dict_std"]:
+ self.spk_dict_std[spk] = state["spk_dict_std"][spk]
+
+ self.spk_dict_count = state["spk_dict_count"]
+
+ return state
+
+ def to(self, device):
+ """Puts the needed tensors in the right device.
+ """
+ self = super(InputNormalization, self).to(device)
+ self.glob_mean = self.glob_mean.to(device)
+ self.glob_std = self.glob_std.to(device)
+ for spk in self.spk_dict_mean:
+ self.spk_dict_mean[spk] = self.spk_dict_mean[spk].to(device)
+ self.spk_dict_std[spk] = self.spk_dict_std[spk].to(device)
+ return self
+
+ def save(self, path):
+ """Save statistic dictionary.
+
+ Args:
+ path (str): A path where to save the dictionary.
+ """
+ stats = self._statistics_dict()
+ paddle.save(stats, path)
+
+ def _load(self, path, end_of_epoch=False, device=None):
+ """Load statistic dictionary.
+
+ Arguments
+ ---------
+ path : str
+ The path of the statistic dictionary
+ device : str, None
+ Unused; paddle.load does not take a device argument.
+ """
+ del end_of_epoch # Unused here.
+ stats = paddle.load(path)
+ self._load_statistics_dict(stats)
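A minimal usage sketch for `InputNormalization` above, outside the diff and for illustration only: the import path follows this file's location, while the batch shapes, the 192-dim embedding size, and the loop are assumptions.

```python
import paddle

from paddlespeech.vector.io.embedding_norm import InputNormalization

# keep a running global mean over speaker embeddings (std norm disabled)
norm = InputNormalization(mean_norm=True, std_norm=False, norm_type="global")

for _ in range(10):                  # stand-in for a stream of embedding batches
    emb = paddle.randn([4, 1, 192])  # [batch, 1, embedding_dim], illustrative shape
    lengths = paddle.ones([4])       # relative lengths: nothing is padded
    normed = norm(emb, lengths)      # normalizes and folds this batch into glob_mean

print(norm.count)                    # 10: batches accumulated into the running stats
```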
diff --git a/paddlespeech/vector/models/ecapa_tdnn.py b/paddlespeech/vector/models/ecapa_tdnn.py
index 0e7287cd3..895ff13f4 100644
--- a/paddlespeech/vector/models/ecapa_tdnn.py
+++ b/paddlespeech/vector/models/ecapa_tdnn.py
@@ -79,6 +79,20 @@ class Conv1d(nn.Layer):
bias_attr=bias, )
def forward(self, x):
+ """Do conv1d forward
+
+ Args:
+ x (paddle.Tensor): [N, C, L] input data,
+ N is the batch,
+ C is the data dimension,
+ L is the time
+
+ Raises:
+ ValueError: raised when the padding mode is not supported
+
+ Returns:
+ paddle.Tensor: the output of the 1-d convolution
+ """
if self.padding == "same":
x = self._manage_padding(x, self.kernel_size, self.dilation,
self.stride)
@@ -88,6 +102,20 @@ class Conv1d(nn.Layer):
return self.conv(x)
def _manage_padding(self, x, kernel_size: int, dilation: int, stride: int):
+ """Padding the input data
+
+ Args:
+ x (paddle.Tensor): [N, C, L] input data
+ N is the batch,
+ C is the data dimension,
+ L is the time
+ kernel_size (int): 1-d convolution kernel size
+ dilation (int): 1-d convolution dilation
+ stride (int): 1-d convolution stride
+
+ Returns:
+ paddle.Tensor: the padded input data
+ """
L_in = x.shape[-1] # Detecting input shape
padding = self._get_padding_elem(L_in, stride, kernel_size,
dilation) # Time padding
@@ -101,6 +129,17 @@ class Conv1d(nn.Layer):
stride: int,
kernel_size: int,
dilation: int):
+ """Calculate the padding value in same mode
+
+ Args:
+ L_in (int): the times of the input data,
+ stride (int): 1-d convolution stride
+ kernel_size (int): 1-d convolution kernel size
+ dilation (int): 1-d convolution stride
+
+ Returns:
+ int: return the padding value in same mode
+ """
if stride > 1:
n_steps = math.ceil(((L_in - kernel_size * dilation) / stride) + 1)
L_out = stride * (n_steps - 1) + kernel_size * dilation
@@ -245,6 +284,13 @@ class SEBlock(nn.Layer):
class AttentiveStatisticsPooling(nn.Layer):
def __init__(self, channels, attention_channels=128, global_context=True):
+ """Compute the speaker verification statistics
+ The detail info is section 3.1 in https://arxiv.org/pdf/1709.01507.pdf
+ Args:
+ channels (int): input data channel or data dimension
+ attention_channels (int, optional): attention dimension. Defaults to 128.
+ global_context (bool, optional): If use the global context information. Defaults to True.
+ """
super().__init__()
self.eps = 1e-12
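To make the `_get_padding_elem` docstring above concrete, here is a standalone sketch of the same-mode padding arithmetic. The `stride == 1` branch is reconstructed from the usual SpeechBrain-style helper this module follows, so treat it as illustrative rather than a verbatim copy:

```python
import math

def same_padding(L_in: int, stride: int, kernel_size: int, dilation: int) -> list:
    """Return [left, right] padding so a 1-d conv keeps its time resolution."""
    if stride > 1:
        n_steps = math.ceil(((L_in - kernel_size * dilation) / stride) + 1)
        L_out = stride * (n_steps - 1) + kernel_size * dilation  # length after conv
        padding = [kernel_size // 2, kernel_size // 2]
    else:
        # a valid conv would drop dilation * (kernel_size - 1) steps; add them back
        L_out = (L_in - dilation * (kernel_size - 1) - 1) // stride + 1
        padding = [(L_in - L_out) // 2, (L_in - L_out) // 2]
    return padding

print(same_padding(100, 1, 3, 2))  # [2, 2]: two steps of padding on each side
```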
diff --git a/paddlespeech/vector/utils/time.py b/paddlespeech/vector/utils/time.py
index 8e85b0e12..9dfbbe1f7 100644
--- a/paddlespeech/vector/utils/time.py
+++ b/paddlespeech/vector/utils/time.py
@@ -23,6 +23,7 @@ class Timer(object):
self.last_start_step = 0
self.current_step = 0
self._is_running = True
+ self.cur_ips = 0
def start(self):
self.last_time = time.time()
@@ -43,12 +44,17 @@ class Timer(object):
self.last_start_step = self.current_step
time_used = time.time() - self.last_time
self.last_time = time.time()
+ self.cur_ips = run_steps / time_used
return time_used / run_steps
@property
def is_running(self) -> bool:
return self._is_running
+ @property
+ def ips(self) -> float:
+ return self.cur_ips
+
@property
def eta(self) -> str:
if not self.is_running:
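The new `cur_ips` field turns the existing per-step timing into an instantaneous throughput counter. A standalone sketch of the same bookkeeping, using plain `time` with the training step simulated by `sleep`:

```python
import time

last_time = time.time()
last_start_step, current_step = 0, 0

for _ in range(5):
    time.sleep(0.01)  # stand-in for one training step
    current_step += 1

run_steps = current_step - last_start_step
time_used = time.time() - last_time
cur_ips = run_steps / time_used  # what Timer.ips now reports
print(f"{cur_ips:.1f} steps/s")
```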
diff --git a/paddlespeech/vector/utils/vector_utils.py b/paddlespeech/vector/utils/vector_utils.py
new file mode 100644
index 000000000..46de7ffaa
--- /dev/null
+++ b/paddlespeech/vector/utils/vector_utils.py
@@ -0,0 +1,32 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+def get_chunks(seg_dur, audio_id, audio_duration):
+ """Get all chunk segments from a utterance
+
+ Args:
+ seg_dur (float): segment chunk duration in seconds
+ audio_id (str): utterance name
+ audio_duration (float): utterance duration in seconds
+
+ Returns:
+ List: all the chunk segments
+ """
+ num_chunks = int(audio_duration / seg_dur) # all in seconds
+ chunk_lst = [
+ audio_id + "_" + str(i * seg_dur) + "_" + str(i * seg_dur + seg_dur)
+ for i in range(num_chunks)
+ ]
+ return chunk_lst
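For example, a 10-second utterance cut into 3-second chunks yields three chunk ids and silently drops the 1-second tail:

```python
from paddlespeech.vector.utils.vector_utils import get_chunks

print(get_chunks(3.0, "utt1", 10.0))
# ['utt1_0.0_3.0', 'utt1_3.0_6.0', 'utt1_6.0_9.0']
```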
diff --git a/speechx/examples/aishell/local/compute-wer.py b/speechx/examples/aishell/local/compute-wer.py
new file mode 100755
index 000000000..a3eefc0dc
--- /dev/null
+++ b/speechx/examples/aishell/local/compute-wer.py
@@ -0,0 +1,500 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+
+import codecs
+import sys
+import unicodedata
+
+remove_tag = True
+spacelist= [' ', '\t', '\r', '\n']
+puncts = ['!', ',', '?',
+ '、', '。', '!', ',', ';', '?',
+ ':', '「', '」', '︰', '『', '』', '《', '》']
+
+def characterize(string) :
+ res = []
+ i = 0
+ while i < len(string):
+ char = string[i]
+ if char in puncts:
+ i += 1
+ continue
+ cat1 = unicodedata.category(char)
+ #https://unicodebook.readthedocs.io/unicode.html#unicode-categories
+ if cat1 == 'Zs' or cat1 == 'Cn' or char in spacelist: # space or not assigned
+ i += 1
+ continue
+ if cat1 == 'Lo': # letter-other
+ res.append(char)
+ i += 1
+ else:
+ # some input looks like: "<tag>word"; we want to separate it into two tokens.
+ sep = ' '
+ if char == '<': sep = '>'
+ j = i+1
+ while j < len(string):
+ c = string[j]
+ if ord(c) >= 128 or (c in spacelist) or (c==sep):
+ break
+ j += 1
+ if j < len(string) and string[j] == '>':
+ j += 1
+ res.append(string[i:j])
+ i = j
+ return res
+
+def stripoff_tags(x):
+ if not x: return ''
+ chars = []
+ i = 0; T=len(x)
+ while i < T:
+ if x[i] == '<':
+ while i < T and x[i] != '>':
+ i += 1
+ i += 1
+ else:
+ chars.append(x[i])
+ i += 1
+ return ''.join(chars)
+
+
+def normalize(sentence, ignore_words, cs, split=None):
+ """ sentence, ignore_words are both in unicode
+ """
+ new_sentence = []
+ for token in sentence:
+ x = token
+ if not cs:
+ x = x.upper()
+ if x in ignore_words:
+ continue
+ if remove_tag:
+ x = stripoff_tags(x)
+ if not x:
+ continue
+ if split and x in split:
+ new_sentence += split[x]
+ else:
+ new_sentence.append(x)
+ return new_sentence
+
+class Calculator :
+ def __init__(self) :
+ self.data = {}
+ self.space = []
+ self.cost = {}
+ self.cost['cor'] = 0
+ self.cost['sub'] = 1
+ self.cost['del'] = 1
+ self.cost['ins'] = 1
+ def calculate(self, lab, rec) :
+ # Initialization
+ lab.insert(0, '')
+ rec.insert(0, '')
+ while len(self.space) < len(lab) :
+ self.space.append([])
+ for row in self.space :
+ for element in row :
+ element['dist'] = 0
+ element['error'] = 'non'
+ while len(row) < len(rec) :
+ row.append({'dist' : 0, 'error' : 'non'})
+ for i in range(len(lab)) :
+ self.space[i][0]['dist'] = i
+ self.space[i][0]['error'] = 'del'
+ for j in range(len(rec)) :
+ self.space[0][j]['dist'] = j
+ self.space[0][j]['error'] = 'ins'
+ self.space[0][0]['error'] = 'non'
+ for token in lab :
+ if token not in self.data and len(token) > 0 :
+ self.data[token] = {'all' : 0, 'cor' : 0, 'sub' : 0, 'ins' : 0, 'del' : 0}
+ for token in rec :
+ if token not in self.data and len(token) > 0 :
+ self.data[token] = {'all' : 0, 'cor' : 0, 'sub' : 0, 'ins' : 0, 'del' : 0}
+ # Computing edit distance
+ for i, lab_token in enumerate(lab) :
+ for j, rec_token in enumerate(rec) :
+ if i == 0 or j == 0 :
+ continue
+ min_dist = sys.maxsize
+ min_error = 'none'
+ dist = self.space[i-1][j]['dist'] + self.cost['del']
+ error = 'del'
+ if dist < min_dist :
+ min_dist = dist
+ min_error = error
+ dist = self.space[i][j-1]['dist'] + self.cost['ins']
+ error = 'ins'
+ if dist < min_dist :
+ min_dist = dist
+ min_error = error
+ if lab_token == rec_token :
+ dist = self.space[i-1][j-1]['dist'] + self.cost['cor']
+ error = 'cor'
+ else :
+ dist = self.space[i-1][j-1]['dist'] + self.cost['sub']
+ error = 'sub'
+ if dist < min_dist :
+ min_dist = dist
+ min_error = error
+ self.space[i][j]['dist'] = min_dist
+ self.space[i][j]['error'] = min_error
+ # Tracing back
+ result = {'lab':[], 'rec':[], 'all':0, 'cor':0, 'sub':0, 'ins':0, 'del':0}
+ i = len(lab) - 1
+ j = len(rec) - 1
+ while True :
+ if self.space[i][j]['error'] == 'cor' : # correct
+ if len(lab[i]) > 0 :
+ self.data[lab[i]]['all'] = self.data[lab[i]]['all'] + 1
+ self.data[lab[i]]['cor'] = self.data[lab[i]]['cor'] + 1
+ result['all'] = result['all'] + 1
+ result['cor'] = result['cor'] + 1
+ result['lab'].insert(0, lab[i])
+ result['rec'].insert(0, rec[j])
+ i = i - 1
+ j = j - 1
+ elif self.space[i][j]['error'] == 'sub' : # substitution
+ if len(lab[i]) > 0 :
+ self.data[lab[i]]['all'] = self.data[lab[i]]['all'] + 1
+ self.data[lab[i]]['sub'] = self.data[lab[i]]['sub'] + 1
+ result['all'] = result['all'] + 1
+ result['sub'] = result['sub'] + 1
+ result['lab'].insert(0, lab[i])
+ result['rec'].insert(0, rec[j])
+ i = i - 1
+ j = j - 1
+ elif self.space[i][j]['error'] == 'del' : # deletion
+ if len(lab[i]) > 0 :
+ self.data[lab[i]]['all'] = self.data[lab[i]]['all'] + 1
+ self.data[lab[i]]['del'] = self.data[lab[i]]['del'] + 1
+ result['all'] = result['all'] + 1
+ result['del'] = result['del'] + 1
+ result['lab'].insert(0, lab[i])
+ result['rec'].insert(0, "")
+ i = i - 1
+ elif self.space[i][j]['error'] == 'ins' : # insertion
+ if len(rec[j]) > 0 :
+ self.data[rec[j]]['ins'] = self.data[rec[j]]['ins'] + 1
+ result['ins'] = result['ins'] + 1
+ result['lab'].insert(0, "")
+ result['rec'].insert(0, rec[j])
+ j = j - 1
+ elif self.space[i][j]['error'] == 'non' : # starting point
+ break
+ else : # shouldn't reach here
+ print('this should not happen , i = {i} , j = {j} , error = {error}'.format(i = i, j = j, error = self.space[i][j]['error']))
+ return result
+ def overall(self) :
+ result = {'all':0, 'cor':0, 'sub':0, 'ins':0, 'del':0}
+ for token in self.data :
+ result['all'] = result['all'] + self.data[token]['all']
+ result['cor'] = result['cor'] + self.data[token]['cor']
+ result['sub'] = result['sub'] + self.data[token]['sub']
+ result['ins'] = result['ins'] + self.data[token]['ins']
+ result['del'] = result['del'] + self.data[token]['del']
+ return result
+ def cluster(self, data) :
+ result = {'all':0, 'cor':0, 'sub':0, 'ins':0, 'del':0}
+ for token in data :
+ if token in self.data :
+ result['all'] = result['all'] + self.data[token]['all']
+ result['cor'] = result['cor'] + self.data[token]['cor']
+ result['sub'] = result['sub'] + self.data[token]['sub']
+ result['ins'] = result['ins'] + self.data[token]['ins']
+ result['del'] = result['del'] + self.data[token]['del']
+ return result
+ def keys(self) :
+ return list(self.data.keys())
+
+def width(string):
+ return sum(1 + (unicodedata.east_asian_width(c) in "AFW") for c in string)
+
+def default_cluster(word) :
+ unicode_names = [ unicodedata.name(char) for char in word ]
+ for i in reversed(range(len(unicode_names))) :
+ if unicode_names[i].startswith('DIGIT') : # 1
+ unicode_names[i] = 'Number' # 'DIGIT'
+ elif (unicode_names[i].startswith('CJK UNIFIED IDEOGRAPH') or
+ unicode_names[i].startswith('CJK COMPATIBILITY IDEOGRAPH')) :
+ # 明 / 郎
+ unicode_names[i] = 'Mandarin' # 'CJK IDEOGRAPH'
+ elif (unicode_names[i].startswith('LATIN CAPITAL LETTER') or
+ unicode_names[i].startswith('LATIN SMALL LETTER')) :
+ # A / a
+ unicode_names[i] = 'English' # 'LATIN LETTER'
+ elif unicode_names[i].startswith('HIRAGANA LETTER') : # は こ め
+ unicode_names[i] = 'Japanese' # 'GANA LETTER'
+ elif (unicode_names[i].startswith('AMPERSAND') or
+ unicode_names[i].startswith('APOSTROPHE') or
+ unicode_names[i].startswith('COMMERCIAL AT') or
+ unicode_names[i].startswith('DEGREE CELSIUS') or
+ unicode_names[i].startswith('EQUALS SIGN') or
+ unicode_names[i].startswith('FULL STOP') or
+ unicode_names[i].startswith('HYPHEN-MINUS') or
+ unicode_names[i].startswith('LOW LINE') or
+ unicode_names[i].startswith('NUMBER SIGN') or
+ unicode_names[i].startswith('PLUS SIGN') or
+ unicode_names[i].startswith('SEMICOLON')) :
+ # & / ' / @ / ℃ / = / . / - / _ / # / + / ;
+ del unicode_names[i]
+ else :
+ return 'Other'
+ if len(unicode_names) == 0 :
+ return 'Other'
+ if len(unicode_names) == 1 :
+ return unicode_names[0]
+ for i in range(len(unicode_names)-1) :
+ if unicode_names[i] != unicode_names[i+1] :
+ return 'Other'
+ return unicode_names[0]
+
+def usage() :
+ print("compute-wer.py : compute word error rate (WER) and align recognition results and references.")
+ print(" usage : python compute-wer.py [--cs={0,1}] [--cluster=foo] [--ig=ignore_file] [--char={0,1}] [--v={0,1}] [--padding-symbol={space,underline}] test.ref test.hyp > test.wer")
+
+if __name__ == '__main__':
+ if len(sys.argv) == 1 :
+ usage()
+ sys.exit(0)
+ calculator = Calculator()
+ cluster_file = ''
+ ignore_words = set()
+ tochar = False
+ verbose= 1
+ padding_symbol= ' '
+ case_sensitive = False
+ max_words_per_line = sys.maxsize
+ split = None
+ while len(sys.argv) > 3:
+ a = '--maxw='
+ if sys.argv[1].startswith(a):
+ b = sys.argv[1][len(a):]
+ del sys.argv[1]
+ max_words_per_line = int(b)
+ continue
+ a = '--rt='
+ if sys.argv[1].startswith(a):
+ b = sys.argv[1][len(a):].lower()
+ del sys.argv[1]
+ remove_tag = (b == 'true') or (b != '0')
+ continue
+ a = '--cs='
+ if sys.argv[1].startswith(a):
+ b = sys.argv[1][len(a):].lower()
+ del sys.argv[1]
+ case_sensitive = (b == 'true') or (b != '0')
+ continue
+ a = '--cluster='
+ if sys.argv[1].startswith(a):
+ cluster_file = sys.argv[1][len(a):]
+ del sys.argv[1]
+ continue
+ a = '--splitfile='
+ if sys.argv[1].startswith(a):
+ split_file = sys.argv[1][len(a):]
+ del sys.argv[1]
+ split = dict()
+ with codecs.open(split_file, 'r', 'utf-8') as fh:
+ for line in fh: # line in unicode
+ words = line.strip().split()
+ if len(words) >= 2:
+ split[words[0]] = words[1:]
+ continue
+ a = '--ig='
+ if sys.argv[1].startswith(a):
+ ignore_file = sys.argv[1][len(a):]
+ del sys.argv[1]
+ with codecs.open(ignore_file, 'r', 'utf-8') as fh:
+ for line in fh: # line in unicode
+ line = line.strip()
+ if len(line) > 0:
+ ignore_words.add(line)
+ continue
+ a = '--char='
+ if sys.argv[1].startswith(a):
+ b = sys.argv[1][len(a):].lower()
+ del sys.argv[1]
+ tochar = (b == 'true') or (b != '0')
+ continue
+ a = '--v='
+ if sys.argv[1].startswith(a):
+ b = sys.argv[1][len(a):].lower()
+ del sys.argv[1]
+ verbose=0
+ try:
+ verbose=int(b)
+ except:
+ if b == 'true' or b != '0':
+ verbose = 1
+ continue
+ a = '--padding-symbol='
+ if sys.argv[1].startswith(a):
+ b = sys.argv[1][len(a):].lower()
+ del sys.argv[1]
+ if b == 'space':
+ padding_symbol= ' '
+ elif b == 'underline':
+ padding_symbol= '_'
+ continue
+ if True or sys.argv[1].startswith('-'):
+ #ignore invalid switch
+ del sys.argv[1]
+ continue
+
+ if not case_sensitive:
+ ig=set([w.upper() for w in ignore_words])
+ ignore_words = ig
+
+ default_clusters = {}
+ default_words = {}
+
+ ref_file = sys.argv[1]
+ hyp_file = sys.argv[2]
+ rec_set = {}
+ if split and not case_sensitive:
+ newsplit = dict()
+ for w in split:
+ words = split[w]
+ for i in range(len(words)):
+ words[i] = words[i].upper()
+ newsplit[w.upper()] = words
+ split = newsplit
+
+ with codecs.open(hyp_file, 'r', 'utf-8') as fh:
+ for line in fh:
+ if tochar:
+ array = characterize(line)
+ else:
+ array = line.strip().split()
+ if len(array)==0: continue
+ fid = array[0]
+ rec_set[fid] = normalize(array[1:], ignore_words, case_sensitive, split)
+
+ # compute error rate on the intersection of the reference file and the hyp file
+ for line in open(ref_file, 'r', encoding='utf-8') :
+ if tochar:
+ array = characterize(line)
+ else:
+ array = line.rstrip('\n').split()
+ if len(array)==0: continue
+ fid = array[0]
+ if fid not in rec_set:
+ continue
+ lab = normalize(array[1:], ignore_words, case_sensitive, split)
+ rec = rec_set[fid]
+ if verbose:
+ print('\nutt: %s' % fid)
+
+ for word in rec + lab :
+ if word not in default_words :
+ default_cluster_name = default_cluster(word)
+ if default_cluster_name not in default_clusters :
+ default_clusters[default_cluster_name] = {}
+ if word not in default_clusters[default_cluster_name] :
+ default_clusters[default_cluster_name][word] = 1
+ default_words[word] = default_cluster_name
+
+ result = calculator.calculate(lab, rec)
+ if verbose:
+ if result['all'] != 0 :
+ wer = float(result['ins'] + result['sub'] + result['del']) * 100.0 / result['all']
+ else :
+ wer = 0.0
+ print('WER: %4.2f %%' % wer, end = ' ')
+ print('N=%d C=%d S=%d D=%d I=%d' %
+ (result['all'], result['cor'], result['sub'], result['del'], result['ins']))
+ space = {}
+ space['lab'] = []
+ space['rec'] = []
+ for idx in range(len(result['lab'])) :
+ len_lab = width(result['lab'][idx])
+ len_rec = width(result['rec'][idx])
+ length = max(len_lab, len_rec)
+ space['lab'].append(length-len_lab)
+ space['rec'].append(length-len_rec)
+ upper_lab = len(result['lab'])
+ upper_rec = len(result['rec'])
+ lab1, rec1 = 0, 0
+ while lab1 < upper_lab or rec1 < upper_rec:
+ if verbose > 1:
+ print('lab(%s):' % fid.encode('utf-8'), end = ' ')
+ else:
+ print('lab:', end = ' ')
+ lab2 = min(upper_lab, lab1 + max_words_per_line)
+ for idx in range(lab1, lab2):
+ token = result['lab'][idx]
+ print('{token}'.format(token = token), end = '')
+ for n in range(space['lab'][idx]) :
+ print(padding_symbol, end = '')
+ print(' ',end='')
+ print()
+ if verbose > 1:
+ print('rec(%s):' % fid.encode('utf-8'), end = ' ')
+ else:
+ print('rec:', end = ' ')
+ rec2 = min(upper_rec, rec1 + max_words_per_line)
+ for idx in range(rec1, rec2):
+ token = result['rec'][idx]
+ print('{token}'.format(token = token), end = '')
+ for n in range(space['rec'][idx]) :
+ print(padding_symbol, end = '')
+ print(' ',end='')
+ print('\n', end='\n')
+ lab1 = lab2
+ rec1 = rec2
+
+ if verbose:
+ print('===========================================================================')
+ print()
+
+ result = calculator.overall()
+ if result['all'] != 0 :
+ wer = float(result['ins'] + result['sub'] + result['del']) * 100.0 / result['all']
+ else :
+ wer = 0.0
+ print('Overall -> %4.2f %%' % wer, end = ' ')
+ print('N=%d C=%d S=%d D=%d I=%d' %
+ (result['all'], result['cor'], result['sub'], result['del'], result['ins']))
+ if not verbose:
+ print()
+
+ if verbose:
+ for cluster_id in default_clusters :
+ result = calculator.cluster([ k for k in default_clusters[cluster_id] ])
+ if result['all'] != 0 :
+ wer = float(result['ins'] + result['sub'] + result['del']) * 100.0 / result['all']
+ else :
+ wer = 0.0
+ print('%s -> %4.2f %%' % (cluster_id, wer), end = ' ')
+ print('N=%d C=%d S=%d D=%d I=%d' %
+ (result['all'], result['cor'], result['sub'], result['del'], result['ins']))
+ if len(cluster_file) > 0 : # compute separated WERs for word clusters
+ cluster_id = ''
+ cluster = []
+ for line in open(cluster_file, 'r', encoding='utf-8') :
+ for token in line.rstrip('\n').split() :
+ # end of cluster reached, like </Keyword>
+ if token[0:2] == '</' and token[len(token)-1] == '>' and \
+ token.lstrip('</').rstrip('>') == cluster_id :
+ result = calculator.cluster(cluster)
+ if result['all'] != 0 :
+ wer = float(result['ins'] + result['sub'] + result['del']) * 100.0 / result['all']
+ else :
+ wer = 0.0
+ print('%s -> %4.2f %%' % (cluster_id, wer), end = ' ')
+ print('N=%d C=%d S=%d D=%d I=%d' %
+ (result['all'], result['cor'], result['sub'], result['del'], result['ins']))
+ cluster_id = ''
+ cluster = []
+ # begin of cluster reached, like <Keyword>
+ elif token[0] == '<' and token[len(token)-1] == '>' and \
+ cluster_id == '' :
+ cluster_id = token.lstrip('<').rstrip('>')
+ cluster = []
+ # general terms, like WEATHER / CAR / ...
+ else :
+ cluster.append(token)
+ print()
+ print('===========================================================================')
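The script is normally invoked as shown in `usage()`, but the `Calculator` can also be used directly; a minimal sketch (the sample strings are invented) of how the alignment result maps to WER:

```python
calc = Calculator()
lab = list("今天天气很好")  # reference, characterized
rec = list("今天天很好")    # hypothesis with one character deleted

# note: calculate() prepends a '' sentinel to both lists in place
result = calc.calculate(lab, rec)
wer = 100.0 * (result['ins'] + result['sub'] + result['del']) / result['all']
print('WER: %.2f%% N=%d' % (wer, result['all']))  # WER: 16.67% N=6
```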
diff --git a/speechx/examples/aishell/local/split_data.sh b/speechx/examples/aishell/local/split_data.sh
new file mode 100755
index 000000000..df454d6cf
--- /dev/null
+++ b/speechx/examples/aishell/local/split_data.sh
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+
+data=$1
+feat_scp=$2
+split_feat_name=$3
+numsplit=$4
+
+
+if ! [ "$numsplit" -gt 0 ]; then
+ echo "Invalid num-split argument";
+ exit 1;
+fi
+
+directories=$(for n in `seq $numsplit`; do echo $data/split${numsplit}/$n; done)
+feat_split_scp=$(for n in `seq $numsplit`; do echo $data/split${numsplit}/$n/${split_feat_name}; done)
+echo $feat_split_scp
+# if this mkdir fails due to argument-list being too long, iterate.
+if ! mkdir -p $directories >&/dev/null; then
+ for n in `seq $numsplit`; do
+ mkdir -p $data/split${numsplit}/$n
+ done
+fi
+
+utils/split_scp.pl $feat_scp $feat_split_scp
diff --git a/speechx/examples/aishell/path.sh b/speechx/examples/aishell/path.sh
new file mode 100644
index 000000000..a0e7c9aed
--- /dev/null
+++ b/speechx/examples/aishell/path.sh
@@ -0,0 +1,14 @@
+# This file contains the locations of the binaries required for running the examples.
+
+SPEECHX_ROOT=$PWD/../..
+SPEECHX_EXAMPLES=$SPEECHX_ROOT/build/examples
+
+SPEECHX_TOOLS=$SPEECHX_ROOT/tools
+TOOLS_BIN=$SPEECHX_TOOLS/valgrind/install/bin
+
+[ -d $SPEECHX_EXAMPLES ] || { echo "Error: 'build/examples' directory not found. Please ensure that the project has been built successfully."; }
+
+export LC_ALL=C
+
+SPEECHX_BIN=$SPEECHX_EXAMPLES/decoder:$SPEECHX_EXAMPLES/feat
+export PATH=$PATH:$SPEECHX_BIN:$TOOLS_BIN
diff --git a/speechx/examples/aishell/run.sh b/speechx/examples/aishell/run.sh
new file mode 100755
index 000000000..a21ba086a
--- /dev/null
+++ b/speechx/examples/aishell/run.sh
@@ -0,0 +1,81 @@
+#!/bin/bash
+set +x
+set -e
+
+. path.sh
+
+# 1. compile
+if [ ! -d ${SPEECHX_EXAMPLES} ]; then
+ pushd ${SPEECHX_ROOT}
+ bash build.sh
+ popd
+fi
+
+
+# 2. download model
+if [ ! -d ../paddle_asr_model ]; then
+ wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/paddle_asr_model.tar.gz
+ tar xzfv paddle_asr_model.tar.gz
+ mv ./paddle_asr_model ../
+ # produce wav scp
+ echo "utt1 " $PWD/../paddle_asr_model/BAC009S0764W0290.wav > ../paddle_asr_model/wav.scp
+fi
+
+mkdir -p data
+data=$PWD/data
+aishell_wav_scp=aishell_test.scp
+if [ ! -d $data/test ]; then
+ wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/aishell_test.zip
+ unzip -d $data aishell_test.zip
+ realpath $data/test/*/*.wav > $data/wavlist
+ awk -F '/' '{ print $(NF) }' $data/wavlist | awk -F '.' '{ print $1 }' > $data/utt_id
+ paste $data/utt_id $data/wavlist > $data/$aishell_wav_scp
+fi
+
+model_dir=$PWD/aishell_ds2_online_model
+if [ ! -d $model_dir ]; then
+ mkdir -p $model_dir
+ wget -P $model_dir -c https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz
+ tar xzfv $model_dir/asr0_deepspeech2_online_aishell_ckpt_0.2.0.model.tar.gz -C $model_dir
+fi
+
+# 3. make feature
+aishell_online_model=$model_dir/exp/deepspeech2_online/checkpoints
+lm_model_dir=../paddle_asr_model
+label_file=./aishell_result
+wer=./aishell_wer
+
+nj=40
+export GLOG_logtostderr=1
+
+./local/split_data.sh $data $data/$aishell_wav_scp $aishell_wav_scp $nj
+
+data=$PWD/data
+# gen linear feat
+cmvn=$PWD/cmvn.ark
+cmvn_json2binary_main --json_file=$model_dir/data/mean_std.json --cmvn_write_path=$cmvn
+
+utils/run.pl JOB=1:$nj $data/split${nj}/JOB/feat_log \
+linear_spectrogram_without_db_norm_main \
+ --wav_rspecifier=scp:$data/split${nj}/JOB/${aishell_wav_scp} \
+ --feature_wspecifier=ark,scp:$data/split${nj}/JOB/feat.ark,$data/split${nj}/JOB/feat.scp \
+ --cmvn_file=$cmvn \
+ --streaming_chunk=0.36
+
+text=$data/test/text
+
+# 4. recognizer
+utils/run.pl JOB=1:$nj $data/split${nj}/JOB/log \
+ offline_decoder_sliding_chunk_main \
+ --feature_rspecifier=scp:$data/split${nj}/JOB/feat.scp \
+ --model_path=$aishell_online_model/avg_1.jit.pdmodel \
+ --param_path=$aishell_online_model/avg_1.jit.pdiparams \
+ --model_output_names=softmax_0.tmp_0,tmp_5,concat_0.tmp_0,concat_1.tmp_0 \
+ --dict_file=$lm_model_dir/vocab.txt \
+ --lm_path=$lm_model_dir/avg_1.jit.klm \
+ --result_wspecifier=ark,t:$data/split${nj}/JOB/result
+
+cat $data/split${nj}/*/result > $label_file
+
+local/compute-wer.py --char=1 --v=1 $label_file $text > $wer
+tail $wer
diff --git a/speechx/examples/aishell/utils b/speechx/examples/aishell/utils
new file mode 120000
index 000000000..973afe674
--- /dev/null
+++ b/speechx/examples/aishell/utils
@@ -0,0 +1 @@
+../../../utils
\ No newline at end of file
diff --git a/speechx/examples/decoder/offline_decoder_sliding_chunk_main.cc b/speechx/examples/decoder/offline_decoder_sliding_chunk_main.cc
index 7f6c572ca..be56342fe 100644
--- a/speechx/examples/decoder/offline_decoder_sliding_chunk_main.cc
+++ b/speechx/examples/decoder/offline_decoder_sliding_chunk_main.cc
@@ -22,7 +22,8 @@
#include "nnet/decodable.h"
#include "nnet/paddle_nnet.h"
-DEFINE_string(feature_respecifier, "", "test feature rspecifier");
+DEFINE_string(feature_rspecifier, "", "test feature rspecifier");
+DEFINE_string(result_wspecifier, "", "test result wspecifier");
DEFINE_string(model_path, "avg_1.jit.pdmodel", "paddle nnet model");
DEFINE_string(param_path, "avg_1.jit.pdiparams", "paddle nnet model param");
DEFINE_string(dict_file, "vocab.txt", "vocabulary of lm");
@@ -33,6 +34,12 @@ DEFINE_int32(receptive_field_length,
DEFINE_int32(downsampling_rate,
4,
"two CNN(kernel=5) module downsampling rate.");
+DEFINE_string(model_output_names,
+ "save_infer_model/scale_0.tmp_1,save_infer_model/"
+ "scale_1.tmp_1,save_infer_model/scale_2.tmp_1,save_infer_model/"
+ "scale_3.tmp_1",
+ "model output names");
+DEFINE_string(model_cache_names, "5-1-1024,5-1-1024", "model cache names");
using kaldi::BaseFloat;
using kaldi::Matrix;
@@ -45,7 +52,8 @@ int main(int argc, char* argv[]) {
google::InitGoogleLogging(argv[0]);
kaldi::SequentialBaseFloatMatrixReader feature_reader(
- FLAGS_feature_respecifier);
+ FLAGS_feature_rspecifier);
+ kaldi::TokenWriter result_writer(FLAGS_result_wspecifier);
std::string model_graph = FLAGS_model_path;
std::string model_params = FLAGS_param_path;
std::string dict_file = FLAGS_dict_file;
@@ -66,7 +74,8 @@ int main(int argc, char* argv[]) {
ppspeech::ModelOptions model_opts;
model_opts.model_path = model_graph;
model_opts.params_path = model_params;
- model_opts.cache_shape = "5-1-1024,5-1-1024";
+ model_opts.cache_shape = FLAGS_model_cache_names;
+ model_opts.output_names = FLAGS_model_output_names;
std::shared_ptr<ppspeech::PaddleNnet> nnet(
new ppspeech::PaddleNnet(model_opts));
std::shared_ptr<ppspeech::DataCache> raw_data(new ppspeech::DataCache());
@@ -130,6 +139,7 @@ int main(int argc, char* argv[]) {
std::string result;
result = decoder.GetFinalBestPath();
KALDI_LOG << " the result of " << utt << " is " << result;
+ result_writer.Write(utt, result);
decodable->Reset();
decoder.Reset();
++num_done;
diff --git a/speechx/examples/feat/CMakeLists.txt b/speechx/examples/feat/CMakeLists.txt
index b8f516afb..d6fdb9bc6 100644
--- a/speechx/examples/feat/CMakeLists.txt
+++ b/speechx/examples/feat/CMakeLists.txt
@@ -7,4 +7,12 @@ target_link_libraries(mfcc-test kaldi-mfcc)
add_executable(linear_spectrogram_main ${CMAKE_CURRENT_SOURCE_DIR}/linear_spectrogram_main.cc)
target_include_directories(linear_spectrogram_main PRIVATE ${SPEECHX_ROOT} ${SPEECHX_ROOT}/kaldi)
-target_link_libraries(linear_spectrogram_main frontend kaldi-util kaldi-feat-common gflags glog)
\ No newline at end of file
+target_link_libraries(linear_spectrogram_main frontend kaldi-util kaldi-feat-common gflags glog)
+
+add_executable(linear_spectrogram_without_db_norm_main ${CMAKE_CURRENT_SOURCE_DIR}/linear_spectrogram_without_db_norm_main.cc)
+target_include_directories(linear_spectrogram_without_db_norm_main PRIVATE ${SPEECHX_ROOT} ${SPEECHX_ROOT}/kaldi)
+target_link_libraries(linear_spectrogram_without_db_norm_main frontend kaldi-util kaldi-feat-common gflags glog)
+
+add_executable(cmvn_json2binary_main ${CMAKE_CURRENT_SOURCE_DIR}/cmvn_json2binary_main.cc)
+target_include_directories(cmvn_json2binary_main PRIVATE ${SPEECHX_ROOT} ${SPEECHX_ROOT}/kaldi)
+target_link_libraries(cmvn_json2binary_main utils kaldi-util kaldi-matrix gflags glog)
diff --git a/speechx/examples/feat/cmvn_json2binary_main.cc b/speechx/examples/feat/cmvn_json2binary_main.cc
new file mode 100644
index 000000000..e77f983aa
--- /dev/null
+++ b/speechx/examples/feat/cmvn_json2binary_main.cc
@@ -0,0 +1,58 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "base/flags.h"
+#include "base/log.h"
+#include "kaldi/matrix/kaldi-matrix.h"
+#include "kaldi/util/kaldi-io.h"
+#include "utils/file_utils.h"
+#include "utils/simdjson.h"
+
+DEFINE_string(json_file, "", "cmvn json file");
+DEFINE_string(cmvn_write_path, "./cmvn.ark", "write cmvn");
+DEFINE_bool(binary, true, "write cmvn in binary (true) or text (false)");
+
+using namespace simdjson;
+
+int main(int argc, char* argv[]) {
+ gflags::ParseCommandLineFlags(&argc, &argv, false);
+ google::InitGoogleLogging(argv[0]);
+
+ ondemand::parser parser;
+ padded_string json = padded_string::load(FLAGS_json_file);
+ ondemand::document val = parser.iterate(json);
+ ondemand::object doc = val;
+ kaldi::int32 frame_num = uint64_t(doc["frame_num"]);
+ auto mean_stat = doc["mean_stat"];
+ std::vector<kaldi::BaseFloat> mean_stat_vec;
+ for (double x : mean_stat) {
+ mean_stat_vec.push_back(x);
+ }
+ auto var_stat = doc["var_stat"];
+ std::vector<kaldi::BaseFloat> var_stat_vec;
+ for (double x : var_stat) {
+ var_stat_vec.push_back(x);
+ }
+
+ size_t mean_size = mean_stat_vec.size();
+ kaldi::Matrix<double> cmvn_stats(2, mean_size + 1);
+ for (size_t idx = 0; idx < mean_size; ++idx) {
+ cmvn_stats(0, idx) = mean_stat_vec[idx];
+ cmvn_stats(1, idx) = var_stat_vec[idx];
+ }
+ cmvn_stats(0, mean_size) = frame_num;
+ kaldi::WriteKaldiObject(cmvn_stats, FLAGS_cmvn_write_path, FLAGS_binary);
+ LOG(INFO) << "the json file have write into " << FLAGS_cmvn_write_path;
+ return 0;
+}
\ No newline at end of file
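The matrix layout this tool writes is easiest to see in a short sketch (numpy stands in for the Kaldi matrix; the json path is illustrative):

```python
import json

import numpy as np

with open("data/mean_std.json") as f:  # illustrative path
    stats = json.load(f)

frame_num = stats["frame_num"]
mean_stat = np.asarray(stats["mean_stat"])  # per-dim sum of features
var_stat = np.asarray(stats["var_stat"])    # per-dim sum of squared features

# Kaldi CMVN layout: 2 x (dim + 1); the frame count sits in the last
# column of row 0, and the last column of row 1 stays 0.
cmvn = np.zeros((2, mean_stat.size + 1))
cmvn[0, :-1] = mean_stat
cmvn[1, :-1] = var_stat
cmvn[0, -1] = frame_num
```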
diff --git a/speechx/examples/feat/linear_spectrogram_main.cc b/speechx/examples/feat/linear_spectrogram_main.cc
index 2d75bb5df..2e70386d6 100644
--- a/speechx/examples/feat/linear_spectrogram_main.cc
+++ b/speechx/examples/feat/linear_spectrogram_main.cc
@@ -30,6 +30,7 @@
DEFINE_string(wav_rspecifier, "", "test wav scp path");
DEFINE_string(feature_wspecifier, "", "output feats wspecifier");
DEFINE_string(cmvn_write_path, "./cmvn.ark", "write cmvn");
+DEFINE_double(streaming_chunk, 0.36, "streaming feature chunk size");
std::vector<float> mean_{
@@ -181,6 +182,7 @@ int main(int argc, char* argv[]) {
ppspeech::LinearSpectrogramOptions opt;
opt.frame_opts.frame_length_ms = 20;
opt.frame_opts.frame_shift_ms = 10;
+ opt.streaming_chunk = FLAGS_streaming_chunk;
opt.frame_opts.dither = 0.0;
opt.frame_opts.remove_dc_offset = false;
opt.frame_opts.window_type = "hanning";
@@ -198,7 +200,7 @@ int main(int argc, char* argv[]) {
LOG(INFO) << "feat dim: " << feature_cache.Dim();
int sample_rate = 16000;
- float streaming_chunk = 0.36;
+ float streaming_chunk = FLAGS_streaming_chunk;
int chunk_sample_size = streaming_chunk * sample_rate;
LOG(INFO) << "sr: " << sample_rate;
LOG(INFO) << "chunk size (s): " << streaming_chunk;
@@ -256,6 +258,7 @@ int main(int argc, char* argv[]) {
}
}
feat_writer.Write(utt, features);
+ feature_cache.Reset();
if (num_done % 50 == 0 && num_done != 0)
KALDI_VLOG(2) << "Processed " << num_done << " utterances";
diff --git a/speechx/examples/feat/linear_spectrogram_without_db_norm_main.cc b/speechx/examples/feat/linear_spectrogram_without_db_norm_main.cc
new file mode 100644
index 000000000..5b875a3ee
--- /dev/null
+++ b/speechx/examples/feat/linear_spectrogram_without_db_norm_main.cc
@@ -0,0 +1,139 @@
+// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// todo: refactor, replace with gtest
+
+#include "base/flags.h"
+#include "base/log.h"
+#include "kaldi/feat/wave-reader.h"
+#include "kaldi/util/kaldi-io.h"
+#include "kaldi/util/table-types.h"
+
+#include "frontend/audio/audio_cache.h"
+#include "frontend/audio/data_cache.h"
+#include "frontend/audio/feature_cache.h"
+#include "frontend/audio/frontend_itf.h"
+#include "frontend/audio/linear_spectrogram.h"
+#include "frontend/audio/normalizer.h"
+
+DEFINE_string(wav_rspecifier, "", "test wav scp path");
+DEFINE_string(feature_wspecifier, "", "output feats wspecifier");
+DEFINE_string(cmvn_file, "./cmvn.ark", "read cmvn");
+DEFINE_double(streaming_chunk, 0.36, "streaming feature chunk size");
+
+int main(int argc, char* argv[]) {
+ gflags::ParseCommandLineFlags(&argc, &argv, false);
+ google::InitGoogleLogging(argv[0]);
+
+ kaldi::SequentialTableReader<kaldi::WaveHolder> wav_reader(
+ FLAGS_wav_rspecifier);
+ kaldi::BaseFloatMatrixWriter feat_writer(FLAGS_feature_wspecifier);
+
+ int32 num_done = 0, num_err = 0;
+
+ // feature pipeline: wave cache --> hanning
+ // window -->linear_spectrogram --> global cmvn -> feat cache
+
+ std::unique_ptr<ppspeech::FrontendInterface> data_source(
+ new ppspeech::AudioCache(3600 * 1600, true));
+
+ ppspeech::LinearSpectrogramOptions opt;
+ opt.frame_opts.frame_length_ms = 20;
+ opt.frame_opts.frame_shift_ms = 10;
+ opt.streaming_chunk = FLAGS_streaming_chunk;
+ opt.frame_opts.dither = 0.0;
+ opt.frame_opts.remove_dc_offset = false;
+ opt.frame_opts.window_type = "hanning";
+ opt.frame_opts.preemph_coeff = 0.0;
+ LOG(INFO) << "frame length (ms): " << opt.frame_opts.frame_length_ms;
+ LOG(INFO) << "frame shift (ms): " << opt.frame_opts.frame_shift_ms;
+
+ std::unique_ptr<ppspeech::FrontendInterface> linear_spectrogram(
+ new ppspeech::LinearSpectrogram(opt, std::move(data_source)));
+
+ std::unique_ptr<ppspeech::FrontendInterface> cmvn(
+ new ppspeech::CMVN(FLAGS_cmvn_file, std::move(linear_spectrogram)));
+
+ ppspeech::FeatureCache feature_cache(kint16max, std::move(cmvn));
+ LOG(INFO) << "feat dim: " << feature_cache.Dim();
+
+ int sample_rate = 16000;
+ float streaming_chunk = FLAGS_streaming_chunk;
+ int chunk_sample_size = streaming_chunk * sample_rate;
+ LOG(INFO) << "sr: " << sample_rate;
+ LOG(INFO) << "chunk size (s): " << streaming_chunk;
+ LOG(INFO) << "chunk size (sample): " << chunk_sample_size;
+
+
+ for (; !wav_reader.Done(); wav_reader.Next()) {
+ std::string utt = wav_reader.Key();
+ const kaldi::WaveData& wave_data = wav_reader.Value();
+ LOG(INFO) << "process utt: " << utt;
+
+ int32 this_channel = 0;
+ kaldi::SubVector<kaldi::BaseFloat> waveform(wave_data.Data(),
+ this_channel);
+ int tot_samples = waveform.Dim();
+ LOG(INFO) << "wav len (sample): " << tot_samples;
+
+ int sample_offset = 0;
+ std::vector> feats;
+ int feature_rows = 0;
+ while (sample_offset < tot_samples) {
+ int cur_chunk_size =
+ std::min(chunk_sample_size, tot_samples - sample_offset);
+
+ kaldi::Vector<kaldi::BaseFloat> wav_chunk(cur_chunk_size);
+ for (int i = 0; i < cur_chunk_size; ++i) {
+ wav_chunk(i) = waveform(sample_offset + i);
+ }
+
+ kaldi::Vector<kaldi::BaseFloat> features;
+ feature_cache.Accept(wav_chunk);
+ if (cur_chunk_size < chunk_sample_size) {
+ feature_cache.SetFinished();
+ }
+ feature_cache.Read(&features);
+ if (features.Dim() == 0) break;
+
+ feats.push_back(features);
+ sample_offset += cur_chunk_size;
+ feature_rows += features.Dim() / feature_cache.Dim();
+ }
+
+ int cur_idx = 0;
+ kaldi::Matrix<kaldi::BaseFloat> features(feature_rows,
+ feature_cache.Dim());
+ for (auto feat : feats) {
+ int num_rows = feat.Dim() / feature_cache.Dim();
+ for (int row_idx = 0; row_idx < num_rows; ++row_idx) {
+ for (size_t col_idx = 0; col_idx < feature_cache.Dim();
+ ++col_idx) {
+ features(cur_idx, col_idx) =
+ feat(row_idx * feature_cache.Dim() + col_idx);
+ }
+ ++cur_idx;
+ }
+ }
+ feat_writer.Write(utt, features);
+ feature_cache.Reset();
+
+ if (num_done % 50 == 0 && num_done != 0)
+ KALDI_VLOG(2) << "Processed " << num_done << " utterances";
+ num_done++;
+ }
+ KALDI_LOG << "Done " << num_done << " utterances, " << num_err
+ << " with errors.";
+ return (num_done != 0 ? 0 : 1);
+}
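For reference, applying the global CMVN stats consumed by the pipeline above amounts to the standard sum/sum-of-squares normalization; a numpy sketch (the epsilon floor is an assumption, the exact guard in `ppspeech::CMVN` may differ):

```python
import numpy as np

def apply_global_cmvn(feats, mean_stat, var_stat, frame_num, eps=1e-20):
    """feats: [T, D] features; stats: accumulated sums as stored in cmvn.ark."""
    mean = mean_stat / frame_num
    var = var_stat / frame_num - mean ** 2  # E[x^2] - E[x]^2
    std = np.sqrt(np.maximum(var, eps))     # floor variance for numerical safety
    return (feats - mean) / std
```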
diff --git a/speechx/speechx/frontend/audio/audio_cache.cc b/speechx/speechx/frontend/audio/audio_cache.cc
index c3233e595..50aca4fb0 100644
--- a/speechx/speechx/frontend/audio/audio_cache.cc
+++ b/speechx/speechx/frontend/audio/audio_cache.cc
@@ -21,15 +21,20 @@ using kaldi::BaseFloat;
using kaldi::VectorBase;
using kaldi::Vector;
-AudioCache::AudioCache(int buffer_size)
+AudioCache::AudioCache(int buffer_size, bool convert2PCM32)
: finished_(false),
capacity_(buffer_size),
size_(0),
offset_(0),
- timeout_(1) {
+ timeout_(1),
+ convert2PCM32_(convert2PCM32) {
ring_buffer_.resize(capacity_);
}
+BaseFloat AudioCache::Convert2PCM32(BaseFloat val) {
+ return val * (1. / std::pow(2.0, 15));
+}
+
void AudioCache::Accept(const VectorBase& waves) {
std::unique_lock lock(mutex_);
while (size_ + waves.Dim() > ring_buffer_.size()) {
@@ -38,6 +43,8 @@ void AudioCache::Accept(const VectorBase<BaseFloat>& waves) {
for (size_t idx = 0; idx < waves.Dim(); ++idx) {
int32 buffer_idx = (idx + offset_) % ring_buffer_.size();
ring_buffer_[buffer_idx] = waves(idx);
+ if (convert2PCM32_)
+ ring_buffer_[buffer_idx] = Convert2PCM32(waves(idx));
}
size_ += waves.Dim();
}
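`Convert2PCM32` rescales int16 samples by `1 / 2^15` into `[-1.0, 1.0)`; the same mapping in a short numpy sketch:

```python
import numpy as np

def pcm16_to_float32(samples: np.ndarray) -> np.ndarray:
    # same scaling as AudioCache::Convert2PCM32: val * (1 / 2**15)
    return samples.astype(np.float32) / 2 ** 15

print(pcm16_to_float32(np.array([-32768, 0, 16384], dtype=np.int16)))
# [-1.   0.   0.5]
```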
diff --git a/speechx/speechx/frontend/audio/audio_cache.h b/speechx/speechx/frontend/audio/audio_cache.h
index 17e1a8389..adef12399 100644
--- a/speechx/speechx/frontend/audio/audio_cache.h
+++ b/speechx/speechx/frontend/audio/audio_cache.h
@@ -23,7 +23,8 @@ namespace ppspeech {
// waves cache
class AudioCache : public FrontendInterface {
public:
- explicit AudioCache(int buffer_size = kint16max);
+ explicit AudioCache(int buffer_size = 1000 * kint16max,
+ bool convert2PCM32 = false);
virtual void Accept(const kaldi::VectorBase<kaldi::BaseFloat>& waves);
@@ -46,14 +47,17 @@ class AudioCache : public FrontendInterface {
}
private:
+ kaldi::BaseFloat Convert2PCM32(kaldi::BaseFloat val);
+
std::vector<kaldi::BaseFloat> ring_buffer_;
size_t offset_; // offset in ring_buffer_
size_t size_; // samples in ring_buffer_ now
size_t capacity_; // capacity of ring_buffer_
bool finished_; // reach audio end
- mutable std::mutex mutex_;
+ std::mutex mutex_;
std::condition_variable ready_feed_condition_;
kaldi::int32 timeout_; // millisecond
+ bool convert2PCM32_;
DISALLOW_COPY_AND_ASSIGN(AudioCache);
};
diff --git a/speechx/speechx/frontend/audio/linear_spectrogram.h b/speechx/speechx/frontend/audio/linear_spectrogram.h
index 896c494dd..6b20b8b94 100644
--- a/speechx/speechx/frontend/audio/linear_spectrogram.h
+++ b/speechx/speechx/frontend/audio/linear_spectrogram.h
@@ -46,7 +46,10 @@ class LinearSpectrogram : public FrontendInterface {
virtual size_t Dim() const { return dim_; }
virtual void SetFinished() { base_extractor_->SetFinished(); }
virtual bool IsFinished() const { return base_extractor_->IsFinished(); }
- virtual void Reset() { base_extractor_->Reset(); }
+ virtual void Reset() {
+ base_extractor_->Reset();
+ reminded_wav_.Resize(0);
+ }
private:
bool Compute(const kaldi::Vector& waves,
diff --git a/speechx/speechx/utils/CMakeLists.txt b/speechx/speechx/utils/CMakeLists.txt
index b5e2495e0..08d115281 100644
--- a/speechx/speechx/utils/CMakeLists.txt
+++ b/speechx/speechx/utils/CMakeLists.txt
@@ -1,4 +1,5 @@
add_library(utils
file_utils.cc
+ simdjson.cpp
)
diff --git a/speechx/speechx/utils/file_utils.cc b/speechx/speechx/utils/file_utils.cc
index b8e51760a..8a2762685 100644
--- a/speechx/speechx/utils/file_utils.cc
+++ b/speechx/speechx/utils/file_utils.cc
@@ -31,4 +31,14 @@ bool ReadFileToVector(const std::string& filename,
return true;
}
-}
\ No newline at end of file
+
+std::string ReadFile2String(const std::string& path) {
+ std::ifstream input_file(path);
+ if (!input_file.is_open()) {
+ std::cerr << "please input a valid file" << std::endl;
+ }
+ return std::string((std::istreambuf_iterator<char>(input_file)),
+ std::istreambuf_iterator<char>());
+}
+
+}
diff --git a/speechx/speechx/utils/file_utils.h b/speechx/speechx/utils/file_utils.h
index f82d41a5b..bf88ed793 100644
--- a/speechx/speechx/utils/file_utils.h
+++ b/speechx/speechx/utils/file_utils.h
@@ -18,4 +18,7 @@ namespace ppspeech {
bool ReadFileToVector(const std::string& filename,
std::vector<std::string>* data);
+
+std::string ReadFile2String(const std::string& path);
+
}
diff --git a/speechx/speechx/utils/simdjson.cpp b/speechx/speechx/utils/simdjson.cpp
new file mode 100644
index 000000000..4c5dde164
--- /dev/null
+++ b/speechx/speechx/utils/simdjson.cpp
@@ -0,0 +1,12708 @@
+/* auto-generated on 2022-01-31 11:38:54 -0500. Do not edit! */
+/* begin file src/simdjson.cpp */
+#include "simdjson.h"
+
+SIMDJSON_PUSH_DISABLE_WARNINGS
+SIMDJSON_DISABLE_UNDESIRED_WARNINGS
+
+/* begin file src/to_chars.cpp */
+#include <array>
+#include <cmath>
+#include <cstdint>
+#include <cstring>
+
+namespace simdjson {
+namespace internal {
+/*!
+implements the Grisu2 algorithm for binary to decimal floating-point
+conversion.
+Adapted from JSON for Modern C++
+
+This implementation is a slightly modified version of the reference
+implementation which may be obtained from
+http://florian.loitsch.com/publications (bench.tar.gz).
+The code is distributed under the MIT license, Copyright (c) 2009 Florian
+Loitsch. For a detailed description of the algorithm see: [1] Loitsch, "Printing
+Floating-Point Numbers Quickly and Accurately with Integers", Proceedings of the
+ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation,
+PLDI 2010 [2] Burger, Dybvig, "Printing Floating-Point Numbers Quickly and
+Accurately", Proceedings of the ACM SIGPLAN 1996 Conference on Programming
+Language Design and Implementation, PLDI 1996
+*/
+namespace dtoa_impl {
+
+template <typename Target, typename Source>
+Target reinterpret_bits(const Source source) {
+ static_assert(sizeof(Target) == sizeof(Source), "size mismatch");
+
+ Target target;
+ std::memcpy(&target, &source, sizeof(Source));
+ return target;
+}
+
+struct diyfp // f * 2^e
+{
+ static constexpr int kPrecision = 64; // = q
+
+ std::uint64_t f = 0;
+ int e = 0;
+
+ constexpr diyfp(std::uint64_t f_, int e_) noexcept : f(f_), e(e_) {}
+
+ /*!
+ @brief returns x - y
+ @pre x.e == y.e and x.f >= y.f
+ */
+ static diyfp sub(const diyfp &x, const diyfp &y) noexcept {
+
+ return {x.f - y.f, x.e};
+ }
+
+ /*!
+ @brief returns x * y
+ @note The result is rounded. (Only the upper q bits are returned.)
+ */
+ static diyfp mul(const diyfp &x, const diyfp &y) noexcept {
+ static_assert(kPrecision == 64, "internal error");
+
+ // Computes:
+ // f = round((x.f * y.f) / 2^q)
+ // e = x.e + y.e + q
+
+ // Emulate the 64-bit * 64-bit multiplication:
+ //
+ // p = u * v
+ // = (u_lo + 2^32 u_hi) (v_lo + 2^32 v_hi)
+ // = (u_lo v_lo ) + 2^32 ((u_lo v_hi ) + (u_hi v_lo )) +
+ // 2^64 (u_hi v_hi ) = (p0 ) + 2^32 ((p1 ) + (p2 ))
+ // + 2^64 (p3 ) = (p0_lo + 2^32 p0_hi) + 2^32 ((p1_lo +
+ // 2^32 p1_hi) + (p2_lo + 2^32 p2_hi)) + 2^64 (p3 ) =
+ // (p0_lo ) + 2^32 (p0_hi + p1_lo + p2_lo ) + 2^64 (p1_hi +
+ // p2_hi + p3) = (p0_lo ) + 2^32 (Q ) + 2^64 (H ) = (p0_lo ) +
+ // 2^32 (Q_lo + 2^32 Q_hi ) + 2^64 (H )
+ //
+ // (Since Q might be larger than 2^32 - 1)
+ //
+ // = (p0_lo + 2^32 Q_lo) + 2^64 (Q_hi + H)
+ //
+ // (Q_hi + H does not overflow a 64-bit int)
+ //
+ // = p_lo + 2^64 p_hi
+
+ const std::uint64_t u_lo = x.f & 0xFFFFFFFFu;
+ const std::uint64_t u_hi = x.f >> 32u;
+ const std::uint64_t v_lo = y.f & 0xFFFFFFFFu;
+ const std::uint64_t v_hi = y.f >> 32u;
+
+ const std::uint64_t p0 = u_lo * v_lo;
+ const std::uint64_t p1 = u_lo * v_hi;
+ const std::uint64_t p2 = u_hi * v_lo;
+ const std::uint64_t p3 = u_hi * v_hi;
+
+ const std::uint64_t p0_hi = p0 >> 32u;
+ const std::uint64_t p1_lo = p1 & 0xFFFFFFFFu;
+ const std::uint64_t p1_hi = p1 >> 32u;
+ const std::uint64_t p2_lo = p2 & 0xFFFFFFFFu;
+ const std::uint64_t p2_hi = p2 >> 32u;
+
+ std::uint64_t Q = p0_hi + p1_lo + p2_lo;
+
+ // The full product might now be computed as
+ //
+ // p_hi = p3 + p2_hi + p1_hi + (Q >> 32)
+ // p_lo = p0_lo + (Q << 32)
+ //
+ // But in this particular case here, the full p_lo is not required.
+ // Effectively we only need to add the highest bit in p_lo to p_hi (and
+ // Q_hi + 1 does not overflow).
+
+ Q += std::uint64_t{1} << (64u - 32u - 1u); // round, ties up
+
+ const std::uint64_t h = p3 + p2_hi + p1_hi + (Q >> 32u);
+
+ return {h, x.e + y.e + 64};
+ }
+
+ /*!
+ @brief normalize x such that the significand is >= 2^(q-1)
+ @pre x.f != 0
+ */
+ static diyfp normalize(diyfp x) noexcept {
+
+ while ((x.f >> 63u) == 0) {
+ x.f <<= 1u;
+ x.e--;
+ }
+
+ return x;
+ }
+
+ /*!
+ @brief normalize x such that the result has the exponent E
+ @pre e >= x.e and the upper e - x.e bits of x.f must be zero.
+ */
+ static diyfp normalize_to(const diyfp &x,
+ const int target_exponent) noexcept {
+ const int delta = x.e - target_exponent;
+
+ return {x.f << delta, target_exponent};
+ }
+};
+
+struct boundaries {
+ diyfp w;
+ diyfp minus;
+ diyfp plus;
+};
+
+/*!
+Compute the (normalized) diyfp representing the input number 'value' and its
+boundaries.
+@pre value must be finite and positive
+*/
+template <typename FloatType> boundaries compute_boundaries(FloatType value) {
+
+ // Convert the IEEE representation into a diyfp.
+ //
+ // If v is denormal:
+ // value = 0.F * 2^(1 - bias) = ( F) * 2^(1 - bias - (p-1))
+ // If v is normalized:
+ // value = 1.F * 2^(E - bias) = (2^(p-1) + F) * 2^(E - bias - (p-1))
+
+ static_assert(std::numeric_limits<FloatType>::is_iec559,
+ "internal error: dtoa_short requires an IEEE-754 "
+ "floating-point implementation");
+
+ constexpr int kPrecision =
+ std::numeric_limits<FloatType>::digits; // = p (includes the hidden bit)
+ constexpr int kBias =
+ std::numeric_limits<FloatType>::max_exponent - 1 + (kPrecision - 1);
+ constexpr int kMinExp = 1 - kBias;
+ constexpr std::uint64_t kHiddenBit = std::uint64_t{1}
+ << (kPrecision - 1); // = 2^(p-1)
+
+ using bits_type = typename std::conditional<kPrecision == 24, std::uint32_t, std::uint64_t>::type;
+
+ const std::uint64_t bits = reinterpret_bits<bits_type>(value);
+ const std::uint64_t E = bits >> (kPrecision - 1);
+ const std::uint64_t F = bits & (kHiddenBit - 1);
+
+ const bool is_denormal = E == 0;
+ const diyfp v = is_denormal
+ ? diyfp(F, kMinExp)
+ : diyfp(F + kHiddenBit, static_cast<int>(E) - kBias);
+
+ // Compute the boundaries m- and m+ of the floating-point value
+ // v = f * 2^e.
+ //
+ // Determine v- and v+, the floating-point predecessor and successor if v,
+ // respectively.
+ //
+ // v- = v - 2^e if f != 2^(p-1) or e == e_min (A)
+ // = v - 2^(e-1) if f == 2^(p-1) and e > e_min (B)
+ //
+ // v+ = v + 2^e
+ //
+ // Let m- = (v- + v) / 2 and m+ = (v + v+) / 2. All real numbers _strictly_
+ // between m- and m+ round to v, regardless of how the input rounding
+ // algorithm breaks ties.
+ //
+ // ---+-------------+-------------+-------------+-------------+--- (A)
+ // v- m- v m+ v+
+ //
+ // -----------------+------+------+-------------+-------------+--- (B)
+ // v- m- v m+ v+
+
+ const bool lower_boundary_is_closer = F == 0 && E > 1;
+ const diyfp m_plus = diyfp(2 * v.f + 1, v.e - 1);
+ const diyfp m_minus = lower_boundary_is_closer
+ ? diyfp(4 * v.f - 1, v.e - 2) // (B)
+ : diyfp(2 * v.f - 1, v.e - 1); // (A)
+
+ // Determine the normalized w+ = m+.
+ const diyfp w_plus = diyfp::normalize(m_plus);
+
+ // Determine w- = m- such that e_(w-) = e_(w+).
+ const diyfp w_minus = diyfp::normalize_to(m_minus, w_plus.e);
+
+ return {diyfp::normalize(v), w_minus, w_plus};
+}
+
+// Given normalized diyfp w, Grisu needs to find a (normalized) cached
+// power-of-ten c, such that the exponent of the product c * w = f * 2^e lies
+// within a certain range [alpha, gamma] (Definition 3.2 from [1])
+//
+// alpha <= e = e_c + e_w + q <= gamma
+//
+// or
+//
+// f_c * f_w * 2^alpha <= f_c 2^(e_c) * f_w 2^(e_w) * 2^q
+// <= f_c * f_w * 2^gamma
+//
+// Since c and w are normalized, i.e. 2^(q-1) <= f < 2^q, this implies
+//
+// 2^(q-1) * 2^(q-1) * 2^alpha <= c * w * 2^q < 2^q * 2^q * 2^gamma
+//
+// or
+//
+// 2^(q - 2 + alpha) <= c * w < 2^(q + gamma)
+//
+// The choice of (alpha,gamma) determines the size of the table and the form of
+// the digit generation procedure. Using (alpha,gamma)=(-60,-32) works out well
+// in practice:
+//
+// The idea is to cut the number c * w = f * 2^e into two parts, which can be
+// processed independently: An integral part p1, and a fractional part p2:
+//
+// f * 2^e = ( (f div 2^-e) * 2^-e + (f mod 2^-e) ) * 2^e
+// = (f div 2^-e) + (f mod 2^-e) * 2^e
+// = p1 + p2 * 2^e
+//
+// The conversion of p1 into decimal form requires a series of divisions and
+// modulos by (a power of) 10. These operations are faster for 32-bit than for
+// 64-bit integers, so p1 should ideally fit into a 32-bit integer. This can be
+// achieved by choosing
+//
+// -e >= 32 or e <= -32 := gamma
+//
+// In order to convert the fractional part
+//
+// p2 * 2^e = p2 / 2^-e = d[-1] / 10^1 + d[-2] / 10^2 + ...
+//
+// into decimal form, the fraction is repeatedly multiplied by 10 and the digits
+// d[-i] are extracted in order:
+//
+// (10 * p2) div 2^-e = d[-1]
+// (10 * p2) mod 2^-e = d[-2] / 10^1 + ...
+//
+// The multiplication by 10 must not overflow. It is sufficient to choose
+//
+// 10 * p2 < 16 * p2 = 2^4 * p2 <= 2^64.
+//
+// Since p2 = f mod 2^-e < 2^-e,
+//
+// -e <= 60 or e >= -60 := alpha
+
+constexpr int kAlpha = -60;
+constexpr int kGamma = -32;
+
+struct cached_power // c = f * 2^e ~= 10^k
+{
+ std::uint64_t f;
+ int e;
+ int k;
+};
+
+/*!
+For a normalized diyfp w = f * 2^e, this function returns a (normalized) cached
+power-of-ten c = f_c * 2^e_c, such that the exponent of the product w * c
+satisfies (Definition 3.2 from [1])
+ alpha <= e_c + e + q <= gamma.
+*/
+inline cached_power get_cached_power_for_binary_exponent(int e) {
+ // Now
+ //
+ // alpha <= e_c + e + q <= gamma (1)
+ // ==> f_c * 2^alpha <= c * 2^e * 2^q
+ //
+ // and since the c's are normalized, 2^(q-1) <= f_c,
+ //
+ // ==> 2^(q - 1 + alpha) <= c * 2^(e + q)
+ // ==> 2^(alpha - e - 1) <= c
+ //
+ // If c were an exact power of ten, i.e. c = 10^k, one may determine k as
+ //
+ // k = ceil( log_10( 2^(alpha - e - 1) ) )
+ // = ceil( (alpha - e - 1) * log_10(2) )
+ //
+ // From the paper:
+ // "In theory the result of the procedure could be wrong since c is rounded,
+ // and the computation itself is approximated [...]. In practice, however,
+ // this simple function is sufficient."
+ //
+ // For IEEE double precision floating-point numbers converted into
+ // normalized diyfp's w = f * 2^e, with q = 64,
+ //
+ // e >= -1022 (min IEEE exponent)
+ // -52 (p - 1)
+ // -52 (p - 1, possibly normalize denormal IEEE numbers)
+ // -11 (normalize the diyfp)
+ // = -1137
+ //
+ // and
+ //
+ // e <= +1023 (max IEEE exponent)
+ // -52 (p - 1)
+ // -11 (normalize the diyfp)
+ // = 960
+ //
+ // This binary exponent range [-1137,960] results in a decimal exponent
+ // range [-307,324]. One does not need to store a cached power for each
+ // k in this range. For each such k it suffices to find a cached power
+ // such that the exponent of the product lies in [alpha,gamma].
+ // This implies that the difference of the decimal exponents of adjacent
+ // table entries must be less than or equal to
+ //
+ // floor( (gamma - alpha) * log_10(2) ) = 8.
+ //
+ // (A smaller distance gamma-alpha would require a larger table.)
+
+ // NB:
+ // Actually this function returns c, such that -60 <= e_c + e + 64 <= -34.
+
+ constexpr int kCachedPowersMinDecExp = -300;
+ constexpr int kCachedPowersDecStep = 8;
+
+ static constexpr std::array<cached_power, 79> kCachedPowers = {{
+ {0xAB70FE17C79AC6CA, -1060, -300}, {0xFF77B1FCBEBCDC4F, -1034, -292},
+ {0xBE5691EF416BD60C, -1007, -284}, {0x8DD01FAD907FFC3C, -980, -276},
+ {0xD3515C2831559A83, -954, -268}, {0x9D71AC8FADA6C9B5, -927, -260},
+ {0xEA9C227723EE8BCB, -901, -252}, {0xAECC49914078536D, -874, -244},
+ {0x823C12795DB6CE57, -847, -236}, {0xC21094364DFB5637, -821, -228},
+ {0x9096EA6F3848984F, -794, -220}, {0xD77485CB25823AC7, -768, -212},
+ {0xA086CFCD97BF97F4, -741, -204}, {0xEF340A98172AACE5, -715, -196},
+ {0xB23867FB2A35B28E, -688, -188}, {0x84C8D4DFD2C63F3B, -661, -180},
+ {0xC5DD44271AD3CDBA, -635, -172}, {0x936B9FCEBB25C996, -608, -164},
+ {0xDBAC6C247D62A584, -582, -156}, {0xA3AB66580D5FDAF6, -555, -148},
+ {0xF3E2F893DEC3F126, -529, -140}, {0xB5B5ADA8AAFF80B8, -502, -132},
+ {0x87625F056C7C4A8B, -475, -124}, {0xC9BCFF6034C13053, -449, -116},
+ {0x964E858C91BA2655, -422, -108}, {0xDFF9772470297EBD, -396, -100},
+ {0xA6DFBD9FB8E5B88F, -369, -92}, {0xF8A95FCF88747D94, -343, -84},
+ {0xB94470938FA89BCF, -316, -76}, {0x8A08F0F8BF0F156B, -289, -68},
+ {0xCDB02555653131B6, -263, -60}, {0x993FE2C6D07B7FAC, -236, -52},
+ {0xE45C10C42A2B3B06, -210, -44}, {0xAA242499697392D3, -183, -36},
+ {0xFD87B5F28300CA0E, -157, -28}, {0xBCE5086492111AEB, -130, -20},
+ {0x8CBCCC096F5088CC, -103, -12}, {0xD1B71758E219652C, -77, -4},
+ {0x9C40000000000000, -50, 4}, {0xE8D4A51000000000, -24, 12},
+ {0xAD78EBC5AC620000, 3, 20}, {0x813F3978F8940984, 30, 28},
+ {0xC097CE7BC90715B3, 56, 36}, {0x8F7E32CE7BEA5C70, 83, 44},
+ {0xD5D238A4ABE98068, 109, 52}, {0x9F4F2726179A2245, 136, 60},
+ {0xED63A231D4C4FB27, 162, 68}, {0xB0DE65388CC8ADA8, 189, 76},
+ {0x83C7088E1AAB65DB, 216, 84}, {0xC45D1DF942711D9A, 242, 92},
+ {0x924D692CA61BE758, 269, 100}, {0xDA01EE641A708DEA, 295, 108},
+ {0xA26DA3999AEF774A, 322, 116}, {0xF209787BB47D6B85, 348, 124},
+ {0xB454E4A179DD1877, 375, 132}, {0x865B86925B9BC5C2, 402, 140},
+ {0xC83553C5C8965D3D, 428, 148}, {0x952AB45CFA97A0B3, 455, 156},
+ {0xDE469FBD99A05FE3, 481, 164}, {0xA59BC234DB398C25, 508, 172},
+ {0xF6C69A72A3989F5C, 534, 180}, {0xB7DCBF5354E9BECE, 561, 188},
+ {0x88FCF317F22241E2, 588, 196}, {0xCC20CE9BD35C78A5, 614, 204},
+ {0x98165AF37B2153DF, 641, 212}, {0xE2A0B5DC971F303A, 667, 220},
+ {0xA8D9D1535CE3B396, 694, 228}, {0xFB9B7CD9A4A7443C, 720, 236},
+ {0xBB764C4CA7A44410, 747, 244}, {0x8BAB8EEFB6409C1A, 774, 252},
+ {0xD01FEF10A657842C, 800, 260}, {0x9B10A4E5E9913129, 827, 268},
+ {0xE7109BFBA19C0C9D, 853, 276}, {0xAC2820D9623BF429, 880, 284},
+ {0x80444B5E7AA7CF85, 907, 292}, {0xBF21E44003ACDD2D, 933, 300},
+ {0x8E679C2F5E44FF8F, 960, 308}, {0xD433179D9C8CB841, 986, 316},
+ {0x9E19DB92B4E31BA9, 1013, 324},
+ }};
+
+ // This computation gives exactly the same results for k as
+ // k = ceil((kAlpha - e - 1) * 0.30102999566398114)
+ // for |e| <= 1500, but doesn't require floating-point operations.
+ // NB: log_10(2) ~= 78913 / 2^18
+ const int f = kAlpha - e - 1;
+ const int k = (f * 78913) / (1 << 18) + static_cast<int>(f > 0);
+
+ const int index = (-kCachedPowersMinDecExp + k + (kCachedPowersDecStep - 1)) /
+ kCachedPowersDecStep;
+
+ const cached_power cached = kCachedPowers[static_cast<std::size_t>(index)];
+
+ return cached;
+}
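+
+// Sanity check (an illustrative example, not part of the upstream source):
+// for a binary exponent e = -63 (w ~= 1.0 normalized as a diyfp with
+// f = 2^63), we get kAlpha - e - 1 = -60 + 63 - 1 = 2, hence
+// k = (2 * 78913) / 2^18 + 1 = 1 and index = (300 + 1 + 7) / 8 = 38, which
+// selects {0x9C40000000000000, -50, 4}, i.e. c = 10^4. Indeed
+// e_c + e + 64 = -50 - 63 + 64 = -49 lies in [kAlpha, kGamma].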
+
+/*!
+For n != 0, returns k, such that pow10 := 10^(k-1) <= n < 10^k.
+For n == 0, returns 1 and sets pow10 := 1.
+*/
+inline int find_largest_pow10(const std::uint32_t n, std::uint32_t &pow10) {
+ // LCOV_EXCL_START
+ if (n >= 1000000000) {
+ pow10 = 1000000000;
+ return 10;
+ }
+ // LCOV_EXCL_STOP
+ else if (n >= 100000000) {
+ pow10 = 100000000;
+ return 9;
+ } else if (n >= 10000000) {
+ pow10 = 10000000;
+ return 8;
+ } else if (n >= 1000000) {
+ pow10 = 1000000;
+ return 7;
+ } else if (n >= 100000) {
+ pow10 = 100000;
+ return 6;
+ } else if (n >= 10000) {
+ pow10 = 10000;
+ return 5;
+ } else if (n >= 1000) {
+ pow10 = 1000;
+ return 4;
+ } else if (n >= 100) {
+ pow10 = 100;
+ return 3;
+ } else if (n >= 10) {
+ pow10 = 10;
+ return 2;
+ } else {
+ pow10 = 1;
+ return 1;
+ }
+}
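+
+// For example (illustrative), find_largest_pow10(345, pow10) returns 3 and
+// sets pow10 = 100, since 10^2 <= 345 < 10^3.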
+
+inline void grisu2_round(char *buf, int len, std::uint64_t dist,
+ std::uint64_t delta, std::uint64_t rest,
+ std::uint64_t ten_k) {
+
+ // <--------------------------- delta ---->
+ // <---- dist --------->
+ // --------------[------------------+-------------------]--------------
+ // M- w M+
+ //
+ // ten_k
+ // <------>
+ // <---- rest ---->
+ // --------------[------------------+----+--------------]--------------
+ // w V
+ // = buf * 10^k
+ //
+ // ten_k represents a unit-in-the-last-place in the decimal representation
+ // stored in buf.
+ // Decrement buf by ten_k while this takes buf closer to w.
+
+ // The tests are written in this order to avoid overflow in unsigned
+ // integer arithmetic.
+
+ while (rest < dist && delta - rest >= ten_k &&
+ (rest + ten_k < dist || dist - rest > rest + ten_k - dist)) {
+ buf[len - 1]--;
+ rest += ten_k;
+ }
+}
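+
+// Illustrative example (not from the upstream source): with ten_k = 10,
+// dist = 35, delta = 60 and rest = 5, the loop decrements the last digit
+// three times (rest: 5 -> 15 -> 25 -> 35) and stops once rest == dist,
+// i.e. once V coincides with w.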
+
+/*!
+Generates V = buffer * 10^decimal_exponent, such that M- <= V <= M+.
+M- and M+ must be normalized and share the same exponent -60 <= e <= -32.
+*/
+inline void grisu2_digit_gen(char *buffer, int &length, int &decimal_exponent,
+ diyfp M_minus, diyfp w, diyfp M_plus) {
+ static_assert(kAlpha >= -60, "internal error");
+ static_assert(kGamma <= -32, "internal error");
+
+ // Generates the digits (and the exponent) of a decimal floating-point
+ // number V = buffer * 10^decimal_exponent in the range [M-, M+]. The diyfp's
+ // w, M- and M+ share the same exponent e, which satisfies alpha <= e <=
+ // gamma.
+ //
+ // <--------------------------- delta ---->
+ // <---- dist --------->
+ // --------------[------------------+-------------------]--------------
+ // M- w M+
+ //
+ // Grisu2 generates the digits of M+ from left to right and stops as soon as
+ // V is in [M-,M+].
+
+ // significand of (M+ - M-), implicit exponent is e
+ std::uint64_t delta = diyfp::sub(M_plus, M_minus).f;
+ // significand of (M+ - w), implicit exponent is e
+ std::uint64_t dist = diyfp::sub(M_plus, w).f;
+
+ // Split M+ = f * 2^e into two parts p1 and p2 (note: e < 0):
+ //
+ // M+ = f * 2^e
+ // = ((f div 2^-e) * 2^-e + (f mod 2^-e)) * 2^e
+ // = ((p1 ) * 2^-e + (p2 )) * 2^e
+ // = p1 + p2 * 2^e
+
+ const diyfp one(std::uint64_t{1} << -M_plus.e, M_plus.e);
+
+ auto p1 = static_cast<std::uint32_t>(
+ M_plus.f >>
+ -one.e); // p1 = f div 2^-e (Since -e >= 32, p1 fits into a 32-bit int.)
+ std::uint64_t p2 = M_plus.f & (one.f - 1); // p2 = f mod 2^-e
+
+ // 1)
+ //
+ // Generate the digits of the integral part p1 = d[n-1]...d[1]d[0]
+
+ std::uint32_t pow10;
+ const int k = find_largest_pow10(p1, pow10);
+
+ // 10^(k-1) <= p1 < 10^k, pow10 = 10^(k-1)
+ //
+ // p1 = (p1 div 10^(k-1)) * 10^(k-1) + (p1 mod 10^(k-1))
+ // = (d[k-1] ) * 10^(k-1) + (p1 mod 10^(k-1))
+ //
+ // M+ = p1 + p2 * 2^e
+ // = d[k-1] * 10^(k-1) + (p1 mod 10^(k-1)) + p2 * 2^e
+ // = d[k-1] * 10^(k-1) + ((p1 mod 10^(k-1)) * 2^-e + p2) * 2^e
+ // = d[k-1] * 10^(k-1) + ( rest) * 2^e
+ //
+ // Now generate the digits d[n] of p1 from left to right (n = k-1,...,0)
+ //
+ // p1 = d[k-1]...d[n] * 10^n + d[n-1]...d[0]
+ //
+ // but stop as soon as
+ //
+ // rest * 2^e = (d[n-1]...d[0] * 2^-e + p2) * 2^e <= delta * 2^e
+
+ int n = k;
+ while (n > 0) {
+ // Invariants:
+ // M+ = buffer * 10^n + (p1 + p2 * 2^e) (buffer = 0 for n = k)
+ // pow10 = 10^(n-1) <= p1 < 10^n
+ //
+ const std::uint32_t d = p1 / pow10; // d = p1 div 10^(n-1)
+ const std::uint32_t r = p1 % pow10; // r = p1 mod 10^(n-1)
+ //
+ // M+ = buffer * 10^n + (d * 10^(n-1) + r) + p2 * 2^e
+ // = (buffer * 10 + d) * 10^(n-1) + (r + p2 * 2^e)
+ //
+ buffer[length++] = static_cast<char>('0' + d); // buffer := buffer * 10 + d
+ //
+ // M+ = buffer * 10^(n-1) + (r + p2 * 2^e)
+ //
+ p1 = r;
+ n--;
+ //
+ // M+ = buffer * 10^n + (p1 + p2 * 2^e)
+ // pow10 = 10^n
+ //
+
+ // Now check if enough digits have been generated.
+ // Compute
+ //
+ // p1 + p2 * 2^e = (p1 * 2^-e + p2) * 2^e = rest * 2^e
+ //
+ // Note:
+ // Since rest and delta share the same exponent e, it suffices to
+ // compare the significands.
+ const std::uint64_t rest = (std::uint64_t{p1} << -one.e) + p2;
+ if (rest <= delta) {
+ // V = buffer * 10^n, with M- <= V <= M+.
+
+ decimal_exponent += n;
+
+ // We may now just stop. But instead look if the buffer could be
+ // decremented to bring V closer to w.
+ //
+ // pow10 = 10^n is now 1 ulp in the decimal representation V.
+ // The rounding procedure works with diyfp's with an implicit
+ // exponent of e.
+ //
+ // 10^n = (10^n * 2^-e) * 2^e = ulp * 2^e
+ //
+ const std::uint64_t ten_n = std::uint64_t{pow10} << -one.e;
+ grisu2_round(buffer, length, dist, delta, rest, ten_n);
+
+ return;
+ }
+
+ pow10 /= 10;
+ //
+ // pow10 = 10^(n-1) <= p1 < 10^n
+ // Invariants restored.
+ }
+
+ // 2)
+ //
+ // The digits of the integral part have been generated:
+ //
+ // M+ = d[k-1]...d[1]d[0] + p2 * 2^e
+ // = buffer + p2 * 2^e
+ //
+ // Now generate the digits of the fractional part p2 * 2^e.
+ //
+ // Note:
+ // No decimal point is generated: the exponent is adjusted instead.
+ //
+ // p2 actually represents the fraction
+ //
+ // p2 * 2^e
+ // = p2 / 2^-e
+ // = d[-1] / 10^1 + d[-2] / 10^2 + ...
+ //
+ // Now generate the digits d[-m] of p2 from left to right (m = 1,2,...)
+ //
+ // p2 * 2^e = d[-1]d[-2]...d[-m] * 10^-m
+ // + 10^-m * (d[-m-1] / 10^1 + d[-m-2] / 10^2 + ...)
+ //
+ // using
+ //
+ // 10^m * p2 = ((10^m * p2) div 2^-e) * 2^-e + ((10^m * p2) mod 2^-e)
+ // = ( d) * 2^-e + ( r)
+ //
+ // or
+ // 10^m * p2 * 2^e = d + r * 2^e
+ //
+ // i.e.
+ //
+ // M+ = buffer + p2 * 2^e
+ // = buffer + 10^-m * (d + r * 2^e)
+ // = (buffer * 10^m + d) * 10^-m + 10^-m * r * 2^e
+ //
+ // and stop as soon as 10^-m * r * 2^e <= delta * 2^e
+
+ int m = 0;
+ for (;;) {
+ // Invariant:
+ // M+ = buffer * 10^-m
+ // + 10^-m * (d[-m-1] / 10 + d[-m-2] / 10^2 + ...) * 2^e
+ // = buffer * 10^-m + 10^-m * p2 * 2^e
+ // = buffer * 10^-m + 10^-m * (1/10 * (10 * p2)) * 2^e
+ // = buffer * 10^-m
+ // + 10^-m * (1/10 * ((10*p2 div 2^-e) * 2^-e + (10*p2 mod 2^-e))) * 2^e
+ //
+ p2 *= 10;
+ const std::uint64_t d = p2 >> -one.e; // d = (10 * p2) div 2^-e
+ const std::uint64_t r = p2 & (one.f - 1); // r = (10 * p2) mod 2^-e
+ //
+ // M+ = buffer * 10^-m + 10^-m * (1/10 * (d * 2^-e + r) * 2^e
+ // = buffer * 10^-m + 10^-m * (1/10 * (d + r * 2^e))
+ // = (buffer * 10 + d) * 10^(-m-1) + 10^(-m-1) * r * 2^e
+ //
+ buffer[length++] = static_cast<char>('0' + d); // buffer := buffer * 10 + d
+ //
+ // M+ = buffer * 10^(-m-1) + 10^(-m-1) * r * 2^e
+ //
+ p2 = r;
+ m++;
+ //
+ // M+ = buffer * 10^-m + 10^-m * p2 * 2^e
+ // Invariant restored.
+
+ // Check if enough digits have been generated.
+ //
+ // 10^-m * p2 * 2^e <= delta * 2^e
+ // p2 * 2^e <= 10^m * delta * 2^e
+ // p2 <= 10^m * delta
+ delta *= 10;
+ dist *= 10;
+ if (p2 <= delta) {
+ break;
+ }
+ }
+
+ // V = buffer * 10^-m, with M- <= V <= M+.
+
+ decimal_exponent -= m;
+
+ // 1 ulp in the decimal representation is now 10^-m.
+ // Since delta and dist are now scaled by 10^m, we need to do the
+ // same with ulp in order to keep the units in sync.
+ //
+ // 10^m * 10^-m = 1 = 2^-e * 2^e = ten_m * 2^e
+ //
+ const std::uint64_t ten_m = one.f;
+ grisu2_round(buffer, length, dist, delta, p2, ten_m);
+
+ // By construction this algorithm generates the shortest possible decimal
+ // number (Loitsch, Theorem 6.2) which rounds back to w.
+ // For an input number of precision p, at least
+ //
+ // N = 1 + ceil(p * log_10(2))
+ //
+ // decimal digits are sufficient to identify all binary floating-point
+ // numbers (Matula, "In-and-Out conversions").
+ // This implies that the algorithm does not produce more than N decimal
+ // digits.
+ //
+ // N = 17 for p = 53 (IEEE double precision)
+ // N = 9 for p = 24 (IEEE single precision)
+}
+
+/*!
+v = buf * 10^decimal_exponent
+len is the length of the buffer (number of decimal digits)
+The buffer must be large enough, i.e. >= max_digits10.
+*/
+inline void grisu2(char *buf, int &len, int &decimal_exponent, diyfp m_minus,
+ diyfp v, diyfp m_plus) {
+
+ // --------(-----------------------+-----------------------)-------- (A)
+ // m- v m+
+ //
+ // --------------------(-----------+-----------------------)-------- (B)
+ // m- v m+
+ //
+ // First scale v (and m- and m+) such that the exponent is in the range
+ // [alpha, gamma].
+
+ const cached_power cached = get_cached_power_for_binary_exponent(m_plus.e);
+
+ const diyfp c_minus_k(cached.f, cached.e); // = c ~= 10^-k
+
+ // The exponent of the products is = v.e + c_minus_k.e + q and is in the range
+ // [alpha,gamma]
+ const diyfp w = diyfp::mul(v, c_minus_k);
+ const diyfp w_minus = diyfp::mul(m_minus, c_minus_k);
+ const diyfp w_plus = diyfp::mul(m_plus, c_minus_k);
+
+ // ----(---+---)---------------(---+---)---------------(---+---)----
+ // w- w w+
+ // = c*m- = c*v = c*m+
+ //
+ // diyfp::mul rounds its result and c_minus_k is approximated too. w, w- and
+ // w+ are now off by a small amount.
+ // In fact:
+ //
+ // w - v * 10^k < 1 ulp
+ //
+ // To account for this inaccuracy, add resp. subtract 1 ulp.
+ //
+ // --------+---[---------------(---+---)---------------]---+--------
+ // w- M- w M+ w+
+ //
+ // Now any number in [M-, M+] (bounds included) will round to w when input,
+ // regardless of how the input rounding algorithm breaks ties.
+ //
+ // And digit_gen generates the shortest possible such number in [M-, M+].
+ // Note that this does not mean that Grisu2 always generates the shortest
+ // possible number in the interval (m-, m+).
+ const diyfp M_minus(w_minus.f + 1, w_minus.e);
+ const diyfp M_plus(w_plus.f - 1, w_plus.e);
+
+ decimal_exponent = -cached.k; // = -(-k) = k
+
+ grisu2_digit_gen(buf, len, decimal_exponent, M_minus, w, M_plus);
+}
+
+/*!
+v = buf * 10^decimal_exponent
+len is the length of the buffer (number of decimal digits)
+The buffer must be large enough, i.e. >= max_digits10.
+*/
+template <typename FloatType>
+void grisu2(char *buf, int &len, int &decimal_exponent, FloatType value) {
+ static_assert(diyfp::kPrecision >= std::numeric_limits<FloatType>::digits + 3,
+ "internal error: not enough precision");
+
+ // If the neighbors (and boundaries) of 'value' are always computed for
+ // double-precision numbers, all float's can be recovered using strtod (and
+ // strtof). However, the resulting decimal representations are not exactly
+ // "short".
+ //
+ // The documentation for 'std::to_chars'
+ // (https://en.cppreference.com/w/cpp/utility/to_chars) says "value is
+ // converted to a string as if by std::sprintf in the default ("C") locale"
+ // and since sprintf promotes float's to double's, I think this is exactly
+ // what 'std::to_chars' does. On the other hand, the documentation for
+ // 'std::to_chars' requires that "parsing the representation using the
+ // corresponding std::from_chars function recovers value exactly". That
+ // indicates that single precision floating-point numbers should be recovered
+ // using 'std::strtof'.
+ //
+ // NB: If the neighbors are computed for single-precision numbers, there is a
+ // single float
+ // (7.0385307e-26f) which can't be recovered using strtod. The resulting
+ // double precision value is off by 1 ulp.
+#if 0
+ const boundaries w = compute_boundaries(static_cast<double>(value));
+#else
+ const boundaries w = compute_boundaries(value);
+#endif
+
+ grisu2(buf, len, decimal_exponent, w.minus, w.w, w.plus);
+}
+
+/*!
+@brief appends a decimal representation of e to buf
+@return a pointer to the element following the exponent.
+@pre -1000 < e < 1000
+*/
+inline char *append_exponent(char *buf, int e) {
+
+ if (e < 0) {
+ e = -e;
+ *buf++ = '-';
+ } else {
+ *buf++ = '+';
+ }
+
+ auto k = static_cast<std::uint32_t>(e);
+ if (k < 10) {
+ // Always print at least two digits in the exponent.
+ // This is for compatibility with printf("%g").
+ *buf++ = '0';
+ *buf++ = static_cast<char>('0' + k);
+ } else if (k < 100) {
+ *buf++ = static_cast<char>('0' + k / 10);
+ k %= 10;
+ *buf++ = static_cast<char>('0' + k);
+ } else {
+ *buf++ = static_cast<char>('0' + k / 100);
+ k %= 100;
+ *buf++ = static_cast<char>('0' + k / 10);
+ k %= 10;
+ *buf++ = static_cast<char>('0' + k);
+ }
+
+ return buf;
+}
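+
+// For example (illustrative), append_exponent(buf, 5) writes "+05" and
+// append_exponent(buf, -123) writes "-123"; the two-digit minimum only pads
+// single-digit exponents.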
+
+/*!
+@brief prettify v = buf * 10^decimal_exponent
+If v is in the range [10^min_exp, 10^max_exp) it will be printed in fixed-point
+notation. Otherwise it will be printed in exponential notation.
+@pre min_exp < 0
+@pre max_exp > 0
+*/
+inline char *format_buffer(char *buf, int len, int decimal_exponent,
+ int min_exp, int max_exp) {
+
+ const int k = len;
+ const int n = len + decimal_exponent;
+
+ // v = buf * 10^(n-k)
+ // k is the length of the buffer (number of decimal digits)
+ // n is the position of the decimal point relative to the start of the buffer.
+
+ if (k <= n && n <= max_exp) {
+ // digits[000]
+ // len <= max_exp + 2
+
+ std::memset(buf + k, '0', static_cast<size_t>(n) - static_cast<size_t>(k));
+ // Make it look like a floating-point number (#362, #378)
+ // buf[n + 0] = '.';
+ // buf[n + 1] = '0';
+ return buf + (static_cast<size_t>(n));
+ }
+
+ if (0 < n && n <= max_exp) {
+ // dig.its
+ // len <= max_digits10 + 1
+ std::memmove(buf + (static_cast<size_t>(n) + 1), buf + n,
+ static_cast<size_t>(k) - static_cast<size_t>(n));
+ buf[n] = '.';
+ return buf + (static_cast<size_t>(k) + 1U);
+ }
+
+ if (min_exp < n && n <= 0) {
+ // 0.[000]digits
+ // len <= 2 + (-min_exp - 1) + max_digits10
+
+ std::memmove(buf + (2 + static_cast<size_t>(-n)), buf,
+ static_cast<size_t>(k));
+ buf[0] = '0';
+ buf[1] = '.';
+ std::memset(buf + 2, '0', static_cast<size_t>(-n));
+ return buf + (2U + static_cast<size_t>(-n) + static_cast<size_t>(k));
+ }
+
+ if (k == 1) {
+ // dE+123
+ // len <= 1 + 5
+
+ buf += 1;
+ } else {
+ // d.igitsE+123
+ // len <= max_digits10 + 1 + 5
+
+ std::memmove(buf + 2, buf + 1, static_cast<size_t>(k) - 1);
+ buf[1] = '.';
+ buf += 1 + static_cast<size_t>(k);
+ }
+
+ *buf++ = 'e';
+ return append_exponent(buf, n - 1);
+}
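+
+// Illustrative examples (not from the upstream source), for buf = "1234",
+// len = 4, min_exp = -4 and max_exp = 15:
+//
+// decimal_exponent = 1 -> "12340" (digits[000] branch, n = 5)
+// decimal_exponent = -2 -> "12.34" (dig.its branch, n = 2)
+// decimal_exponent = -6 -> "0.001234" (0.[000]digits branch, n = -2)
+// decimal_exponent = -10 -> "1.234e-07" (d.igitsE+123 branch, n = -6)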
+
+} // namespace dtoa_impl
+
+/*!
+The format of the resulting decimal representation is similar to printf's %g
+format. Returns an iterator pointing past-the-end of the decimal representation.
+@note The input number must be finite, i.e. NaN's and Inf's are not supported.
+@note The buffer must be large enough.
+@note The result is NOT null-terminated.
+*/
+char *to_chars(char *first, const char *last, double value) {
+ static_cast<void>(last); // maybe unused - fix warning
+ bool negative = std::signbit(value);
+ if (negative) {
+ value = -value;
+ *first++ = '-';
+ }
+
+ if (value == 0) // +-0
+ {
+ *first++ = '0';
+ // Make it look like a floating-point number (#362, #378)
+ if(negative) {
+ *first++ = '.';
+ *first++ = '0';
+ }
+ return first;
+ }
+ // Compute v = buffer * 10^decimal_exponent.
+ // The decimal digits are stored in the buffer, which needs to be interpreted
+ // as an unsigned decimal integer.
+ // len is the length of the buffer, i.e. the number of decimal digits.
+ int len = 0;
+ int decimal_exponent = 0;
+ dtoa_impl::grisu2(first, len, decimal_exponent, value);
+ // Format the buffer like printf("%.*g", prec, value)
+ constexpr int kMinExp = -4;
+ constexpr int kMaxExp = std::numeric_limits<double>::digits10;
+
+ return dtoa_impl::format_buffer(first, len, decimal_exponent, kMinExp,
+ kMaxExp);
+}
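+
+// Minimal usage sketch (illustrative; note that to_chars is an internal
+// helper and the result is not null-terminated):
+//
+// char buf[64];
+// char *end = simdjson::internal::to_chars(buf, buf + sizeof(buf), 0.1);
+// std::string s(buf, end); // "0.1" -- the shortest round-trippable form
+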
+} // namespace internal
+} // namespace simdjson
+/* end file src/to_chars.cpp */
+/* begin file src/from_chars.cpp */
+#include <cstring>
+namespace simdjson {
+namespace internal {
+
+/**
+ * The code in the internal::from_chars function is meant to handle the floating-point number parsing
+ * when we have more than 19 digits in the decimal mantissa. This should only be seen
+ * in adversarial scenarios: we do not expect production systems to even produce
+ * such floating-point numbers.
+ *
+ * The parser is based on work by Nigel Tao (at https://github.com/google/wuffs/)
+ * who credits Ken Thompson for the design (via a reference to the Go source
+ * code). See
+ * https://github.com/google/wuffs/blob/aa46859ea40c72516deffa1b146121952d6dfd3b/internal/cgen/base/floatconv-submodule-data.c
+ * https://github.com/google/wuffs/blob/46cd8105f47ca07ae2ba8e6a7818ef9c0df6c152/internal/cgen/base/floatconv-submodule-code.c
+ * It is probably not very fast but it is a fallback that should almost never be
+ * called in real life. Google Wuffs is published under APL 2.0.
+ **/
+
+namespace {
+constexpr uint32_t max_digits = 768;
+constexpr int32_t decimal_point_range = 2047;
+} // namespace
+
+struct adjusted_mantissa {
+ uint64_t mantissa;
+ int power2;
+ adjusted_mantissa() : mantissa(0), power2(0) {}
+};
+
+struct decimal {
+ uint32_t num_digits;
+ int32_t decimal_point;
+ bool negative;
+ bool truncated;
+ uint8_t digits[max_digits];
+};
+
+template <typename T> struct binary_format {
+ static constexpr int mantissa_explicit_bits();
+ static constexpr int minimum_exponent();
+ static constexpr int infinite_power();
+ static constexpr int sign_index();
+};
+
+template <> constexpr int binary_format<double>::mantissa_explicit_bits() {
+ return 52;
+}
+
+template <> constexpr int binary_format<double>::minimum_exponent() {
+ return -1023;
+}
+template <> constexpr int binary_format<double>::infinite_power() {
+ return 0x7FF;
+}
+
+template <> constexpr int binary_format<double>::sign_index() { return 63; }
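+
+// For IEEE-754 binary64 these constants describe the bit layout: 52 explicit
+// mantissa bits, an 11-bit biased exponent (0x7FF signals infinity/NaN), and
+// the sign in bit 63.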
+
+bool is_integer(char c) noexcept { return (c >= '0' && c <= '9'); }
+
+// This should always succeed since it follows a call to parse_number.
+decimal parse_decimal(const char *&p) noexcept {
+ decimal answer;
+ answer.num_digits = 0;
+ answer.decimal_point = 0;
+ answer.truncated = false;
+ answer.negative = (*p == '-');
+ if ((*p == '-') || (*p == '+')) {
+ ++p;
+ }
+
+ while (*p == '0') {
+ ++p;
+ }
+ while (is_integer(*p)) {
+ if (answer.num_digits < max_digits) {
+ answer.digits[answer.num_digits] = uint8_t(*p - '0');
+ }
+ answer.num_digits++;
+ ++p;
+ }
+ if (*p == '.') {
+ ++p;
+ const char *first_after_period = p;
+ // if we have not yet encountered a zero, we have to skip it as well
+ if (answer.num_digits == 0) {
+ // skip zeros
+ while (*p == '0') {
+ ++p;
+ }
+ }
+ while (is_integer(*p)) {
+ if (answer.num_digits < max_digits) {
+ answer.digits[answer.num_digits] = uint8_t(*p - '0');
+ }
+ answer.num_digits++;
+ ++p;
+ }
+ answer.decimal_point = int32_t(first_after_period - p);
+ }
+ if(answer.num_digits > 0) {
+ const char *preverse = p - 1;
+ int32_t trailing_zeros = 0;
+ while ((*preverse == '0') || (*preverse == '.')) {
+ if(*preverse == '0') { trailing_zeros++; };
+ --preverse;
+ }
+ answer.decimal_point += int32_t(answer.num_digits);
+ answer.num_digits -= uint32_t(trailing_zeros);
+ }
+ if(answer.num_digits > max_digits ) {
+ answer.num_digits = max_digits;
+ answer.truncated = true;
+ }
+ if (('e' == *p) || ('E' == *p)) {
+ ++p;
+ bool neg_exp = false;
+ if ('-' == *p) {
+ neg_exp = true;
+ ++p;
+ } else if ('+' == *p) {
+ ++p;
+ }
+ int32_t exp_number = 0; // exponential part
+ while (is_integer(*p)) {
+ uint8_t digit = uint8_t(*p - '0');
+ if (exp_number < 0x10000) {
+ exp_number = 10 * exp_number + digit;
+ }
+ ++p;
+ }
+ answer.decimal_point += (neg_exp ? -exp_number : exp_number);
+ }
+ return answer;
+}
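+
+// Example (illustrative): parsing "12.34e-1" yields digits = {1,2,3,4},
+// num_digits = 4 and decimal_point = 1, i.e. the value 0.1234 * 10^1 = 1.234.
+// The digits array thus represents a fraction in [0.1, 1) scaled by
+// 10^decimal_point.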
+
+// This should always succeed since it follows a call to parse_number.
+// Will not read at or beyond the "end" pointer.
+decimal parse_decimal(const char *&p, const char * end) noexcept {
+ decimal answer;
+ answer.num_digits = 0;
+ answer.decimal_point = 0;
+ answer.truncated = false;
+ if(p == end) { return answer; } // should never happen
+ answer.negative = (*p == '-');
+ if ((*p == '-') || (*p == '+')) {
+ ++p;
+ }
+
+ while ((p != end) && (*p == '0')) {
+ ++p;
+ }
+ while ((p != end) && is_integer(*p)) {
+ if (answer.num_digits < max_digits) {
+ answer.digits[answer.num_digits] = uint8_t(*p - '0');
+ }
+ answer.num_digits++;
+ ++p;
+ }
+ if ((p != end) && (*p == '.')) {
+ ++p;
+ if(p == end) { return answer; } // should never happen
+ const char *first_after_period = p;
+ // if we have not yet encountered a zero, we have to skip it as well
+ if (answer.num_digits == 0) {
+ // skip zeros
+ while (*p == '0') {
+ ++p;
+ }
+ }
+ while ((p != end) && is_integer(*p)) {
+ if (answer.num_digits < max_digits) {
+ answer.digits[answer.num_digits] = uint8_t(*p - '0');
+ }
+ answer.num_digits++;
+ ++p;
+ }
+ answer.decimal_point = int32_t(first_after_period - p);
+ }
+ if(answer.num_digits > 0) {
+ const char *preverse = p - 1;
+ int32_t trailing_zeros = 0;
+ while ((*preverse == '0') || (*preverse == '.')) {
+ if(*preverse == '0') { trailing_zeros++; };
+ --preverse;
+ }
+ answer.decimal_point += int32_t(answer.num_digits);
+ answer.num_digits -= uint32_t(trailing_zeros);
+ }
+ if(answer.num_digits > max_digits ) {
+ answer.num_digits = max_digits;
+ answer.truncated = true;
+ }
+ if ((p != end) && (('e' == *p) || ('E' == *p))) {
+ ++p;
+ if(p == end) { return answer; } // should never happen
+ bool neg_exp = false;
+ if ('-' == *p) {
+ neg_exp = true;
+ ++p;
+ } else if ('+' == *p) {
+ ++p;
+ }
+ int32_t exp_number = 0; // exponential part
+ while ((p != end) && is_integer(*p)) {
+ uint8_t digit = uint8_t(*p - '0');
+ if (exp_number < 0x10000) {
+ exp_number = 10 * exp_number + digit;
+ }
+ ++p;
+ }
+ answer.decimal_point += (neg_exp ? -exp_number : exp_number);
+ }
+ return answer;
+}
+
+namespace {
+
+// remove all final zeroes
+inline void trim(decimal &h) {
+ while ((h.num_digits > 0) && (h.digits[h.num_digits - 1] == 0)) {
+ h.num_digits--;
+ }
+}
+
+uint32_t number_of_digits_decimal_left_shift(decimal &h, uint32_t shift) {
+ shift &= 63;
+ const static uint16_t number_of_digits_decimal_left_shift_table[65] = {
+ 0x0000, 0x0800, 0x0801, 0x0803, 0x1006, 0x1009, 0x100D, 0x1812, 0x1817,
+ 0x181D, 0x2024, 0x202B, 0x2033, 0x203C, 0x2846, 0x2850, 0x285B, 0x3067,
+ 0x3073, 0x3080, 0x388E, 0x389C, 0x38AB, 0x38BB, 0x40CC, 0x40DD, 0x40EF,
+ 0x4902, 0x4915, 0x4929, 0x513E, 0x5153, 0x5169, 0x5180, 0x5998, 0x59B0,
+ 0x59C9, 0x61E3, 0x61FD, 0x6218, 0x6A34, 0x6A50, 0x6A6D, 0x6A8B, 0x72AA,
+ 0x72C9, 0x72E9, 0x7B0A, 0x7B2B, 0x7B4D, 0x8370, 0x8393, 0x83B7, 0x83DC,
+ 0x8C02, 0x8C28, 0x8C4F, 0x9477, 0x949F, 0x94C8, 0x9CF2, 0x051C, 0x051C,
+ 0x051C, 0x051C,
+ };
+ uint32_t x_a = number_of_digits_decimal_left_shift_table[shift];
+ uint32_t x_b = number_of_digits_decimal_left_shift_table[shift + 1];
+ uint32_t num_new_digits = x_a >> 11;
+ uint32_t pow5_a = 0x7FF & x_a;
+ uint32_t pow5_b = 0x7FF & x_b;
+ const static uint8_t
+ number_of_digits_decimal_left_shift_table_powers_of_5[0x051C] = {
+ 5, 2, 5, 1, 2, 5, 6, 2, 5, 3, 1, 2, 5, 1, 5, 6, 2, 5, 7, 8, 1, 2, 5,
+ 3, 9, 0, 6, 2, 5, 1, 9, 5, 3, 1, 2, 5, 9, 7, 6, 5, 6, 2, 5, 4, 8, 8,
+ 2, 8, 1, 2, 5, 2, 4, 4, 1, 4, 0, 6, 2, 5, 1, 2, 2, 0, 7, 0, 3, 1, 2,
+ 5, 6, 1, 0, 3, 5, 1, 5, 6, 2, 5, 3, 0, 5, 1, 7, 5, 7, 8, 1, 2, 5, 1,
+ 5, 2, 5, 8, 7, 8, 9, 0, 6, 2, 5, 7, 6, 2, 9, 3, 9, 4, 5, 3, 1, 2, 5,
+ 3, 8, 1, 4, 6, 9, 7, 2, 6, 5, 6, 2, 5, 1, 9, 0, 7, 3, 4, 8, 6, 3, 2,
+ 8, 1, 2, 5, 9, 5, 3, 6, 7, 4, 3, 1, 6, 4, 0, 6, 2, 5, 4, 7, 6, 8, 3,
+ 7, 1, 5, 8, 2, 0, 3, 1, 2, 5, 2, 3, 8, 4, 1, 8, 5, 7, 9, 1, 0, 1, 5,
+ 6, 2, 5, 1, 1, 9, 2, 0, 9, 2, 8, 9, 5, 5, 0, 7, 8, 1, 2, 5, 5, 9, 6,
+ 0, 4, 6, 4, 4, 7, 7, 5, 3, 9, 0, 6, 2, 5, 2, 9, 8, 0, 2, 3, 2, 2, 3,
+ 8, 7, 6, 9, 5, 3, 1, 2, 5, 1, 4, 9, 0, 1, 1, 6, 1, 1, 9, 3, 8, 4, 7,
+ 6, 5, 6, 2, 5, 7, 4, 5, 0, 5, 8, 0, 5, 9, 6, 9, 2, 3, 8, 2, 8, 1, 2,
+ 5, 3, 7, 2, 5, 2, 9, 0, 2, 9, 8, 4, 6, 1, 9, 1, 4, 0, 6, 2, 5, 1, 8,
+ 6, 2, 6, 4, 5, 1, 4, 9, 2, 3, 0, 9, 5, 7, 0, 3, 1, 2, 5, 9, 3, 1, 3,
+ 2, 2, 5, 7, 4, 6, 1, 5, 4, 7, 8, 5, 1, 5, 6, 2, 5, 4, 6, 5, 6, 6, 1,
+ 2, 8, 7, 3, 0, 7, 7, 3, 9, 2, 5, 7, 8, 1, 2, 5, 2, 3, 2, 8, 3, 0, 6,
+ 4, 3, 6, 5, 3, 8, 6, 9, 6, 2, 8, 9, 0, 6, 2, 5, 1, 1, 6, 4, 1, 5, 3,
+ 2, 1, 8, 2, 6, 9, 3, 4, 8, 1, 4, 4, 5, 3, 1, 2, 5, 5, 8, 2, 0, 7, 6,
+ 6, 0, 9, 1, 3, 4, 6, 7, 4, 0, 7, 2, 2, 6, 5, 6, 2, 5, 2, 9, 1, 0, 3,
+ 8, 3, 0, 4, 5, 6, 7, 3, 3, 7, 0, 3, 6, 1, 3, 2, 8, 1, 2, 5, 1, 4, 5,
+ 5, 1, 9, 1, 5, 2, 2, 8, 3, 6, 6, 8, 5, 1, 8, 0, 6, 6, 4, 0, 6, 2, 5,
+ 7, 2, 7, 5, 9, 5, 7, 6, 1, 4, 1, 8, 3, 4, 2, 5, 9, 0, 3, 3, 2, 0, 3,
+ 1, 2, 5, 3, 6, 3, 7, 9, 7, 8, 8, 0, 7, 0, 9, 1, 7, 1, 2, 9, 5, 1, 6,
+ 6, 0, 1, 5, 6, 2, 5, 1, 8, 1, 8, 9, 8, 9, 4, 0, 3, 5, 4, 5, 8, 5, 6,
+ 4, 7, 5, 8, 3, 0, 0, 7, 8, 1, 2, 5, 9, 0, 9, 4, 9, 4, 7, 0, 1, 7, 7,
+ 2, 9, 2, 8, 2, 3, 7, 9, 1, 5, 0, 3, 9, 0, 6, 2, 5, 4, 5, 4, 7, 4, 7,
+ 3, 5, 0, 8, 8, 6, 4, 6, 4, 1, 1, 8, 9, 5, 7, 5, 1, 9, 5, 3, 1, 2, 5,
+ 2, 2, 7, 3, 7, 3, 6, 7, 5, 4, 4, 3, 2, 3, 2, 0, 5, 9, 4, 7, 8, 7, 5,
+ 9, 7, 6, 5, 6, 2, 5, 1, 1, 3, 6, 8, 6, 8, 3, 7, 7, 2, 1, 6, 1, 6, 0,
+ 2, 9, 7, 3, 9, 3, 7, 9, 8, 8, 2, 8, 1, 2, 5, 5, 6, 8, 4, 3, 4, 1, 8,
+ 8, 6, 0, 8, 0, 8, 0, 1, 4, 8, 6, 9, 6, 8, 9, 9, 4, 1, 4, 0, 6, 2, 5,
+ 2, 8, 4, 2, 1, 7, 0, 9, 4, 3, 0, 4, 0, 4, 0, 0, 7, 4, 3, 4, 8, 4, 4,
+ 9, 7, 0, 7, 0, 3, 1, 2, 5, 1, 4, 2, 1, 0, 8, 5, 4, 7, 1, 5, 2, 0, 2,
+ 0, 0, 3, 7, 1, 7, 4, 2, 2, 4, 8, 5, 3, 5, 1, 5, 6, 2, 5, 7, 1, 0, 5,
+ 4, 2, 7, 3, 5, 7, 6, 0, 1, 0, 0, 1, 8, 5, 8, 7, 1, 1, 2, 4, 2, 6, 7,
+ 5, 7, 8, 1, 2, 5, 3, 5, 5, 2, 7, 1, 3, 6, 7, 8, 8, 0, 0, 5, 0, 0, 9,
+ 2, 9, 3, 5, 5, 6, 2, 1, 3, 3, 7, 8, 9, 0, 6, 2, 5, 1, 7, 7, 6, 3, 5,
+ 6, 8, 3, 9, 4, 0, 0, 2, 5, 0, 4, 6, 4, 6, 7, 7, 8, 1, 0, 6, 6, 8, 9,
+ 4, 5, 3, 1, 2, 5, 8, 8, 8, 1, 7, 8, 4, 1, 9, 7, 0, 0, 1, 2, 5, 2, 3,
+ 2, 3, 3, 8, 9, 0, 5, 3, 3, 4, 4, 7, 2, 6, 5, 6, 2, 5, 4, 4, 4, 0, 8,
+ 9, 2, 0, 9, 8, 5, 0, 0, 6, 2, 6, 1, 6, 1, 6, 9, 4, 5, 2, 6, 6, 7, 2,
+ 3, 6, 3, 2, 8, 1, 2, 5, 2, 2, 2, 0, 4, 4, 6, 0, 4, 9, 2, 5, 0, 3, 1,
+ 3, 0, 8, 0, 8, 4, 7, 2, 6, 3, 3, 3, 6, 1, 8, 1, 6, 4, 0, 6, 2, 5, 1,
+ 1, 1, 0, 2, 2, 3, 0, 2, 4, 6, 2, 5, 1, 5, 6, 5, 4, 0, 4, 2, 3, 6, 3,
+ 1, 6, 6, 8, 0, 9, 0, 8, 2, 0, 3, 1, 2, 5, 5, 5, 5, 1, 1, 1, 5, 1, 2,
+ 3, 1, 2, 5, 7, 8, 2, 7, 0, 2, 1, 1, 8, 1, 5, 8, 3, 4, 0, 4, 5, 4, 1,
+ 0, 1, 5, 6, 2, 5, 2, 7, 7, 5, 5, 5, 7, 5, 6, 1, 5, 6, 2, 8, 9, 1, 3,
+ 5, 1, 0, 5, 9, 0, 7, 9, 1, 7, 0, 2, 2, 7, 0, 5, 0, 7, 8, 1, 2, 5, 1,
+ 3, 8, 7, 7, 7, 8, 7, 8, 0, 7, 8, 1, 4, 4, 5, 6, 7, 5, 5, 2, 9, 5, 3,
+ 9, 5, 8, 5, 1, 1, 3, 5, 2, 5, 3, 9, 0, 6, 2, 5, 6, 9, 3, 8, 8, 9, 3,
+ 9, 0, 3, 9, 0, 7, 2, 2, 8, 3, 7, 7, 6, 4, 7, 6, 9, 7, 9, 2, 5, 5, 6,
+ 7, 6, 2, 6, 9, 5, 3, 1, 2, 5, 3, 4, 6, 9, 4, 4, 6, 9, 5, 1, 9, 5, 3,
+ 6, 1, 4, 1, 8, 8, 8, 2, 3, 8, 4, 8, 9, 6, 2, 7, 8, 3, 8, 1, 3, 4, 7,
+ 6, 5, 6, 2, 5, 1, 7, 3, 4, 7, 2, 3, 4, 7, 5, 9, 7, 6, 8, 0, 7, 0, 9,
+ 4, 4, 1, 1, 9, 2, 4, 4, 8, 1, 3, 9, 1, 9, 0, 6, 7, 3, 8, 2, 8, 1, 2,
+ 5, 8, 6, 7, 3, 6, 1, 7, 3, 7, 9, 8, 8, 4, 0, 3, 5, 4, 7, 2, 0, 5, 9,
+ 6, 2, 2, 4, 0, 6, 9, 5, 9, 5, 3, 3, 6, 9, 1, 4, 0, 6, 2, 5,
+ };
+ const uint8_t *pow5 =
+ &number_of_digits_decimal_left_shift_table_powers_of_5[pow5_a];
+ uint32_t i = 0;
+ uint32_t n = pow5_b - pow5_a;
+ for (; i < n; i++) {
+ if (i >= h.num_digits) {
+ return num_new_digits - 1;
+ } else if (h.digits[i] == pow5[i]) {
+ continue;
+ } else if (h.digits[i] < pow5[i]) {
+ return num_new_digits - 1;
+ } else {
+ return num_new_digits;
+ }
+ }
+ return num_new_digits;
+}
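+
+// Example (illustrative): for shift = 1 the relevant power is 5^1 = 5, so
+// multiplying by 2 adds a decimal digit exactly when the digit string
+// compares >= "5": 0.6 * 2 = 1.2 gains a digit (returns 1), while
+// 0.4 * 2 = 0.8 does not (returns 0).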
+
+} // end of anonymous namespace
+
+uint64_t round(decimal &h) {
+ if ((h.num_digits == 0) || (h.decimal_point < 0)) {
+ return 0;
+ } else if (h.decimal_point > 18) {
+ return UINT64_MAX;
+ }
+ // at this point, we know that h.decimal_point >= 0
+ uint32_t dp = uint32_t(h.decimal_point);
+ uint64_t n = 0;
+ for (uint32_t i = 0; i < dp; i++) {
+ n = (10 * n) + ((i < h.num_digits) ? h.digits[i] : 0);
+ }
+ bool round_up = false;
+ if (dp < h.num_digits) {
+ round_up = h.digits[dp] >= 5; // normally, we round up
+ // but we may need to round to even!
+ if ((h.digits[dp] == 5) && (dp + 1 == h.num_digits)) {
+ round_up = h.truncated || ((dp > 0) && (1 & h.digits[dp - 1]));
+ }
+ }
+ if (round_up) {
+ n++;
+ }
+ return n;
+}
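+
+// Example (illustrative): for digits = {1,2,5} with decimal_point = 2 (the
+// value 12.5), the integer part is n = 12 and digits[2] == 5 is an exact
+// tie, so round-to-even applies: digits[1] == 2 is even, and round(h)
+// returns 12.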
+
+// computes h * 2^shift
+void decimal_left_shift(decimal &h, uint32_t shift) {
+ if (h.num_digits == 0) {
+ return;
+ }
+ uint32_t num_new_digits = number_of_digits_decimal_left_shift(h, shift);
+ int32_t read_index = int32_t(h.num_digits - 1);
+ uint32_t write_index = h.num_digits - 1 + num_new_digits;
+ uint64_t n = 0;
+
+ while (read_index >= 0) {
+ n += uint64_t(h.digits[read_index]) << shift;
+ uint64_t quotient = n / 10;
+ uint64_t remainder = n - (10 * quotient);
+ if (write_index < max_digits) {
+ h.digits[write_index] = uint8_t(remainder);
+ } else if (remainder > 0) {
+ h.truncated = true;
+ }
+ n = quotient;
+ write_index--;
+ read_index--;
+ }
+ while (n > 0) {
+ uint64_t quotient = n / 10;
+ uint64_t remainder = n - (10 * quotient);
+ if (write_index < max_digits) {
+ h.digits[write_index] = uint8_t(remainder);
+ } else if (remainder > 0) {
+ h.truncated = true;
+ }
+ n = quotient;
+ write_index--;
+ }
+ h.num_digits += num_new_digits;
+ if (h.num_digits > max_digits) {
+ h.num_digits = max_digits;
+ }
+ h.decimal_point += int32_t(num_new_digits);
+ trim(h);
+}
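+
+// Example (illustrative): for h = {digits = {5}, decimal_point = 0}, i.e.
+// the value 0.5, decimal_left_shift(h, 1) yields digits = {1} and
+// decimal_point = 1, i.e. the value 1.0 = 0.5 * 2^1.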
+
+// computes h * 2^-shift
+void decimal_right_shift(decimal &h, uint32_t shift) {
+ uint32_t read_index = 0;
+ uint32_t write_index = 0;
+
+ uint64_t n = 0;
+
+ while ((n >> shift) == 0) {
+ if (read_index < h.num_digits) {
+ n = (10 * n) + h.digits[read_index++];
+ } else if (n == 0) {
+ return;
+ } else {
+ while ((n >> shift) == 0) {
+ n = 10 * n;
+ read_index++;
+ }
+ break;
+ }
+ }
+ h.decimal_point -= int32_t(read_index - 1);
+ if (h.decimal_point < -decimal_point_range) { // it is zero
+ h.num_digits = 0;
+ h.decimal_point = 0;
+ h.negative = false;
+ h.truncated = false;
+ return;
+ }
+ uint64_t mask = (uint64_t(1) << shift) - 1;
+ while (read_index < h.num_digits) {
+ uint8_t new_digit = uint8_t(n >> shift);
+ n = (10 * (n & mask)) + h.digits[read_index++];
+ h.digits[write_index++] = new_digit;
+ }
+ while (n > 0) {
+ uint8_t new_digit = uint8_t(n >> shift);
+ n = 10 * (n & mask);
+ if (write_index < max_digits) {
+ h.digits[write_index++] = new_digit;
+ } else if (new_digit > 0) {
+ h.truncated = true;
+ }
+ }
+ h.num_digits = write_index;
+ trim(h);
+}
+
+template <typename binary> adjusted_mantissa compute_float(decimal &d) {
+ adjusted_mantissa answer;
+ if (d.num_digits == 0) {
+ // should be zero
+ answer.power2 = 0;
+ answer.mantissa = 0;
+ return answer;
+ }
+ // At this point, going further, we can assume that d.num_digits > 0.
+ // We want to guard against excessive decimal point values because
+ // they can result in long running times. Indeed, we do
+ // shifts by at most 60 bits. We have that log(10**400)/log(2**60) ~= 22
+ // which is fine, but log(10**299995)/log(2**60) ~= 16609 which is not
+ // fine (runs for a long time).
+ //
+ if(d.decimal_point < -324) {
+ // We have something smaller than 1e-324 which is always zero
+ // in binary64 and binary32.
+ // It should be zero.
+ answer.power2 = 0;
+ answer.mantissa = 0;
+ return answer;
+ } else if(d.decimal_point >= 310) {
+ // We have something at least as large as 0.1e310 which is
+ // always infinite.
+ answer.power2 = binary::infinite_power();
+ answer.mantissa = 0;
+ return answer;
+ }
+
+ static const uint32_t max_shift = 60;
+ static const uint32_t num_powers = 19;
+ static const uint8_t powers[19] = {
+ 0, 3, 6, 9, 13, 16, 19, 23, 26, 29, //
+ 33, 36, 39, 43, 46, 49, 53, 56, 59, //
+ };
+ int32_t exp2 = 0;
+ while (d.decimal_point > 0) {
+ uint32_t n = uint32_t(d.decimal_point);
+ uint32_t shift = (n < num_powers) ? powers[n] : max_shift;
+ decimal_right_shift(d, shift);
+ if (d.decimal_point < -decimal_point_range) {
+ // should be zero
+ answer.power2 = 0;
+ answer.mantissa = 0;
+ return answer;
+ }
+ exp2 += int32_t(shift);
+ }
+ // We shift left toward [1/2 ... 1].
+ while (d.decimal_point <= 0) {
+ uint32_t shift;
+ if (d.decimal_point == 0) {
+ if (d.digits[0] >= 5) {
+ break;
+ }
+ shift = (d.digits[0] < 2) ? 2 : 1;
+ } else {
+ uint32_t n = uint32_t(-d.decimal_point);
+ shift = (n < num_powers) ? powers[n] : max_shift;
+ }
+ decimal_left_shift(d, shift);
+ if (d.decimal_point > decimal_point_range) {
+ // we want to get infinity:
+ answer.power2 = binary::infinite_power();
+ answer.mantissa = 0;
+ return answer;
+ }
+ exp2 -= int32_t(shift);
+ }
+ // We are now in the range [1/2 ... 1] but the binary format uses [1 ... 2].
+ exp2--;
+ constexpr int32_t minimum_exponent = binary::minimum_exponent();
+ while ((minimum_exponent + 1) > exp2) {
+ uint32_t n = uint32_t((minimum_exponent + 1) - exp2);
+ if (n > max_shift) {
+ n = max_shift;
+ }
+ decimal_right_shift(d, n);
+ exp2 += int32_t(n);
+ }
+ if ((exp2 - minimum_exponent) >= binary::infinite_power()) {
+ answer.power2 = binary::infinite_power();
+ answer.mantissa = 0;
+ return answer;
+ }
+
+ const int mantissa_size_in_bits = binary::mantissa_explicit_bits() + 1;
+ decimal_left_shift(d, mantissa_size_in_bits);
+
+ uint64_t mantissa = round(d);
+ // It is possible that we have an overflow, in which case we need
+ // to shift back.
+ if (mantissa >= (uint64_t(1) << mantissa_size_in_bits)) {
+ decimal_right_shift(d, 1);
+ exp2 += 1;
+ mantissa = round(d);
+ if ((exp2 - minimum_exponent) >= binary::infinite_power()) {
+ answer.power2 = binary::infinite_power();
+ answer.mantissa = 0;
+ return answer;
+ }
+ }
+ answer.power2 = exp2 - binary::minimum_exponent();
+ if (mantissa < (uint64_t(1) << binary::mantissa_explicit_bits())) {
+ answer.power2--;
+ }
+ answer.mantissa =
+ mantissa & ((uint64_t(1) << binary::mantissa_explicit_bits()) - 1);
+ return answer;
+}
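+
+// In summary (an informal sketch): compute_float scales the decimal by
+// powers of two until it lies in [1/2, 1), accumulating the binary exponent
+// in exp2, then shifts 53 more bits into the integer part and rounds to
+// obtain the mantissa. Because every step is exact decimal arithmetic, the
+// result stays correctly rounded even when the mantissa has far more than
+// 19 digits.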
+
+template <typename binary>
+adjusted_mantissa parse_long_mantissa(const char *first) {
+ decimal d = parse_decimal(first);
+ return compute_float<binary>(d);
+}
+
+template <typename binary>
+adjusted_mantissa parse_long_mantissa(const char *first, const char *end) {
+ decimal d = parse_decimal(first, end);
+ return compute_float<binary>(d);
+}
+
+double from_chars(const char *first) noexcept {
+ bool negative = first[0] == '-';
+ if (negative) {
+ first++;
+ }
+ adjusted_mantissa am = parse_long_mantissa<binary_format<double>>(first);
+ uint64_t word = am.mantissa;
+ word |= uint64_t(am.power2)
+ << binary_format<double>::mantissa_explicit_bits();
+ word = negative ? word | (uint64_t(1) << binary_format<double>::sign_index())
+ : word;
+ double value;
+ std::memcpy(&value, &word, sizeof(double));
+ return value;
+}
+
+
+double from_chars(const char *first, const char *end) noexcept {
+ bool negative = first[0] == '-';
+ if (negative) {
+ first++;
+ }
+ adjusted_mantissa am = parse_long_mantissa<binary_format<double>>(first, end);
+ uint64_t word = am.mantissa;
+ word |= uint64_t(am.power2)
+ << binary_format<double>::mantissa_explicit_bits();
+ word = negative ? word | (uint64_t(1) << binary_format<double>::sign_index())
+ : word;
+ double value;
+ std::memcpy(&value, &word, sizeof(double));
+ return value;
+}
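+
+// Minimal usage sketch (illustrative; as noted above, callers are expected
+// to have validated the number format beforehand):
+//
+// const char *s = "3.14159265358979323846264338327950288"; // > 19 digits
+// double x = simdjson::internal::from_chars(s); // correctly rounded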
+
+} // internal
+} // simdjson
+/* end file src/from_chars.cpp */
+/* begin file src/internal/error_tables.cpp */
+
+namespace simdjson {
+namespace internal {
+
+ SIMDJSON_DLLIMPORTEXPORT const error_code_info error_codes[] {
+ { SUCCESS, "No error" },
+ { CAPACITY, "This parser can't support a document that big" },
+ { MEMALLOC, "Error allocating memory, we're most likely out of memory" },
+ { TAPE_ERROR, "The JSON document has an improper structure: missing or superfluous commas, braces, missing keys, etc." },
+ { DEPTH_ERROR, "The JSON document was too deep (too many nested objects and arrays)" },
+ { STRING_ERROR, "Problem while parsing a string" },
+ { T_ATOM_ERROR, "Problem while parsing an atom starting with the letter 't'" },
+ { F_ATOM_ERROR, "Problem while parsing an atom starting with the letter 'f'" },
+ { N_ATOM_ERROR, "Problem while parsing an atom starting with the letter 'n'" },
+ { NUMBER_ERROR, "Problem while parsing a number" },
+ { UTF8_ERROR, "The input is not valid UTF-8" },
+ { UNINITIALIZED, "Uninitialized" },
+ { EMPTY, "Empty: no JSON found" },
+ { UNESCAPED_CHARS, "Within strings, some characters must be escaped, we found unescaped characters" },
+ { UNCLOSED_STRING, "A string is opened, but never closed." },
+ { UNSUPPORTED_ARCHITECTURE, "simdjson does not have an implementation supported by this CPU architecture (perhaps it's a non-SIMD CPU?)." },
+ { INCORRECT_TYPE, "The JSON element does not have the requested type." },
+ { NUMBER_OUT_OF_RANGE, "The JSON number is too large or too small to fit within the requested type." },
+ { INDEX_OUT_OF_BOUNDS, "Attempted to access an element of a JSON array that is beyond its length." },
+ { NO_SUCH_FIELD, "The JSON field referenced does not exist in this object." },
+ { IO_ERROR, "Error reading the file." },
+ { INVALID_JSON_POINTER, "Invalid JSON pointer syntax." },
+ { INVALID_URI_FRAGMENT, "Invalid URI fragment syntax." },
+ { UNEXPECTED_ERROR, "Unexpected error, consider reporting this problem as you may have found a bug in simdjson" },
+ { PARSER_IN_USE, "Cannot parse a new document while a document is still in use." },
+ { OUT_OF_ORDER_ITERATION, "Objects and arrays can only be iterated when they are first encountered." },
+ { INSUFFICIENT_PADDING, "simdjson requires the input JSON string to have at least SIMDJSON_PADDING extra bytes allocated, beyond the string's length. Consider using the simdjson::padded_string class if needed." },
+ { INCOMPLETE_ARRAY_OR_OBJECT, "JSON document ended early in the middle of an object or array." },
+ { SCALAR_DOCUMENT_AS_VALUE, "A JSON document made of a scalar (number, Boolean, null or string) is treated as a value. Use get_bool(), get_double(), etc. on the document instead. "},
+ { OUT_OF_BOUNDS, "Attempted to access location outside of document."}
+ }; // error_messages[]
+
+} // namespace internal
+} // namespace simdjson
+/* end file src/internal/error_tables.cpp */
+/* begin file src/internal/jsoncharutils_tables.cpp */
+
+namespace simdjson {
+namespace internal {
+
+// structural chars here are
+// they are { 0x7b } 0x7d : 0x3a [ 0x5b ] 0x5d , 0x2c (and NULL)
+// we are also interested in the four whitespace characters
+// space 0x20, linefeed 0x0a, horizontal tab 0x09 and carriage return 0x0d
+
+SIMDJSON_DLLIMPORTEXPORT const bool structural_or_whitespace_negated[256] = {
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
+
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,
+
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
+
+SIMDJSON_DLLIMPORTEXPORT const bool structural_or_whitespace[256] = {
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
+
+SIMDJSON_DLLIMPORTEXPORT const uint32_t digit_to_val32[886] = {
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0x0, 0x1, 0x2, 0x3, 0x4, 0x5,
+ 0x6, 0x7, 0x8, 0x9, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xa,
+ 0xb, 0xc, 0xd, 0xe, 0xf, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xa, 0xb, 0xc, 0xd, 0xe,
+ 0xf, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0x0, 0x10, 0x20, 0x30, 0x40, 0x50,
+ 0x60, 0x70, 0x80, 0x90, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xa0,
+ 0xb0, 0xc0, 0xd0, 0xe0, 0xf0, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xa0, 0xb0, 0xc0, 0xd0, 0xe0,
+ 0xf0, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0x0, 0x100, 0x200, 0x300, 0x400, 0x500,
+ 0x600, 0x700, 0x800, 0x900, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xa00,
+ 0xb00, 0xc00, 0xd00, 0xe00, 0xf00, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xa00, 0xb00, 0xc00, 0xd00, 0xe00,
+ 0xf00, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0x0, 0x1000, 0x2000, 0x3000, 0x4000, 0x5000,
+ 0x6000, 0x7000, 0x8000, 0x9000, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xa000,
+ 0xb000, 0xc000, 0xd000, 0xe000, 0xf000, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xa000, 0xb000, 0xc000, 0xd000, 0xe000,
+ 0xf000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
+ 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF};
+
+} // namespace internal
+} // namespace simdjson
+/* end file src/internal/jsoncharutils_tables.cpp */
+/* begin file src/internal/numberparsing_tables.cpp */
+
+namespace simdjson {
+namespace internal {
+
+// Precomputed powers of ten from 10^0 to 10^22. These
+// can be represented exactly using the double type.
+SIMDJSON_DLLIMPORTEXPORT const double power_of_ten[] = {
+ 1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
+ 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};
+
+/**
+ * When mapping numbers from decimal to binary,
+ * we go from w * 10^q to m * 2^p but we have
+ * 10^q = 5^q * 2^q, so effectively
+ * we are trying to match
+ * w * 2^q * 5^q to m * 2^p. Thus the powers of two
+ * are not a concern since they can be represented
+ * exactly using the binary notation, only the powers of five
+ * affect the binary significand.
+ */
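+
+// For example, 3 * 10^2 = 3 * 5^2 * 2^2 = 75 * 2^2: the 5^2 = 25 factor is
+// what the table below supplies (truncated to 128 bits), while the 2^2
+// factor merely adjusts the binary exponent.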
+
+
+// The truncated powers of five from 5^-342 all the way to 5^308.
+// The mantissa is truncated to 128 bits, and
+// never rounded up. Uses about 10KB.
+SIMDJSON_DLLIMPORTEXPORT const uint64_t power_of_five_128[]= {
+ 0xeef453d6923bd65a,0x113faa2906a13b3f,
+ 0x9558b4661b6565f8,0x4ac7ca59a424c507,
+ 0xbaaee17fa23ebf76,0x5d79bcf00d2df649,
+ 0xe95a99df8ace6f53,0xf4d82c2c107973dc,
+ 0x91d8a02bb6c10594,0x79071b9b8a4be869,
+ 0xb64ec836a47146f9,0x9748e2826cdee284,
+ 0xe3e27a444d8d98b7,0xfd1b1b2308169b25,
+ 0x8e6d8c6ab0787f72,0xfe30f0f5e50e20f7,
+ 0xb208ef855c969f4f,0xbdbd2d335e51a935,
+ 0xde8b2b66b3bc4723,0xad2c788035e61382,
+ 0x8b16fb203055ac76,0x4c3bcb5021afcc31,
+ 0xaddcb9e83c6b1793,0xdf4abe242a1bbf3d,
+ 0xd953e8624b85dd78,0xd71d6dad34a2af0d,
+ 0x87d4713d6f33aa6b,0x8672648c40e5ad68,
+ 0xa9c98d8ccb009506,0x680efdaf511f18c2,
+ 0xd43bf0effdc0ba48,0x212bd1b2566def2,
+ 0x84a57695fe98746d,0x14bb630f7604b57,
+ 0xa5ced43b7e3e9188,0x419ea3bd35385e2d,
+ 0xcf42894a5dce35ea,0x52064cac828675b9,
+ 0x818995ce7aa0e1b2,0x7343efebd1940993,
+ 0xa1ebfb4219491a1f,0x1014ebe6c5f90bf8,
+ 0xca66fa129f9b60a6,0xd41a26e077774ef6,
+ 0xfd00b897478238d0,0x8920b098955522b4,
+ 0x9e20735e8cb16382,0x55b46e5f5d5535b0,
+ 0xc5a890362fddbc62,0xeb2189f734aa831d,
+ 0xf712b443bbd52b7b,0xa5e9ec7501d523e4,
+ 0x9a6bb0aa55653b2d,0x47b233c92125366e,
+ 0xc1069cd4eabe89f8,0x999ec0bb696e840a,
+ 0xf148440a256e2c76,0xc00670ea43ca250d,
+ 0x96cd2a865764dbca,0x380406926a5e5728,
+ 0xbc807527ed3e12bc,0xc605083704f5ecf2,
+ 0xeba09271e88d976b,0xf7864a44c633682e,
+ 0x93445b8731587ea3,0x7ab3ee6afbe0211d,
+ 0xb8157268fdae9e4c,0x5960ea05bad82964,
+ 0xe61acf033d1a45df,0x6fb92487298e33bd,
+ 0x8fd0c16206306bab,0xa5d3b6d479f8e056,
+ 0xb3c4f1ba87bc8696,0x8f48a4899877186c,
+ 0xe0b62e2929aba83c,0x331acdabfe94de87,
+ 0x8c71dcd9ba0b4925,0x9ff0c08b7f1d0b14,
+ 0xaf8e5410288e1b6f,0x7ecf0ae5ee44dd9,
+ 0xdb71e91432b1a24a,0xc9e82cd9f69d6150,
+ 0x892731ac9faf056e,0xbe311c083a225cd2,
+ 0xab70fe17c79ac6ca,0x6dbd630a48aaf406,
+ 0xd64d3d9db981787d,0x92cbbccdad5b108,
+ 0x85f0468293f0eb4e,0x25bbf56008c58ea5,
+ 0xa76c582338ed2621,0xaf2af2b80af6f24e,
+ 0xd1476e2c07286faa,0x1af5af660db4aee1,
+ 0x82cca4db847945ca,0x50d98d9fc890ed4d,
+ 0xa37fce126597973c,0xe50ff107bab528a0,
+ 0xcc5fc196fefd7d0c,0x1e53ed49a96272c8,
+ 0xff77b1fcbebcdc4f,0x25e8e89c13bb0f7a,
+ 0x9faacf3df73609b1,0x77b191618c54e9ac,
+ 0xc795830d75038c1d,0xd59df5b9ef6a2417,
+ 0xf97ae3d0d2446f25,0x4b0573286b44ad1d,
+ 0x9becce62836ac577,0x4ee367f9430aec32,
+ 0xc2e801fb244576d5,0x229c41f793cda73f,
+ 0xf3a20279ed56d48a,0x6b43527578c1110f,
+ 0x9845418c345644d6,0x830a13896b78aaa9,
+ 0xbe5691ef416bd60c,0x23cc986bc656d553,
+ 0xedec366b11c6cb8f,0x2cbfbe86b7ec8aa8,
+ 0x94b3a202eb1c3f39,0x7bf7d71432f3d6a9,
+ 0xb9e08a83a5e34f07,0xdaf5ccd93fb0cc53,
+ 0xe858ad248f5c22c9,0xd1b3400f8f9cff68,
+ 0x91376c36d99995be,0x23100809b9c21fa1,
+ 0xb58547448ffffb2d,0xabd40a0c2832a78a,
+ 0xe2e69915b3fff9f9,0x16c90c8f323f516c,
+ 0x8dd01fad907ffc3b,0xae3da7d97f6792e3,
+ 0xb1442798f49ffb4a,0x99cd11cfdf41779c,
+ 0xdd95317f31c7fa1d,0x40405643d711d583,
+ 0x8a7d3eef7f1cfc52,0x482835ea666b2572,
+ 0xad1c8eab5ee43b66,0xda3243650005eecf,
+ 0xd863b256369d4a40,0x90bed43e40076a82,
+ 0x873e4f75e2224e68,0x5a7744a6e804a291,
+ 0xa90de3535aaae202,0x711515d0a205cb36,
+ 0xd3515c2831559a83,0xd5a5b44ca873e03,
+ 0x8412d9991ed58091,0xe858790afe9486c2,
+ 0xa5178fff668ae0b6,0x626e974dbe39a872,
+ 0xce5d73ff402d98e3,0xfb0a3d212dc8128f,
+ 0x80fa687f881c7f8e,0x7ce66634bc9d0b99,
+ 0xa139029f6a239f72,0x1c1fffc1ebc44e80,
+ 0xc987434744ac874e,0xa327ffb266b56220,
+ 0xfbe9141915d7a922,0x4bf1ff9f0062baa8,
+ 0x9d71ac8fada6c9b5,0x6f773fc3603db4a9,
+ 0xc4ce17b399107c22,0xcb550fb4384d21d3,
+ 0xf6019da07f549b2b,0x7e2a53a146606a48,
+ 0x99c102844f94e0fb,0x2eda7444cbfc426d,
+ 0xc0314325637a1939,0xfa911155fefb5308,
+ 0xf03d93eebc589f88,0x793555ab7eba27ca,
+ 0x96267c7535b763b5,0x4bc1558b2f3458de,
+ 0xbbb01b9283253ca2,0x9eb1aaedfb016f16,
+ 0xea9c227723ee8bcb,0x465e15a979c1cadc,
+ 0x92a1958a7675175f,0xbfacd89ec191ec9,
+ 0xb749faed14125d36,0xcef980ec671f667b,
+ 0xe51c79a85916f484,0x82b7e12780e7401a,
+ 0x8f31cc0937ae58d2,0xd1b2ecb8b0908810,
+ 0xb2fe3f0b8599ef07,0x861fa7e6dcb4aa15,
+ 0xdfbdcece67006ac9,0x67a791e093e1d49a,
+ 0x8bd6a141006042bd,0xe0c8bb2c5c6d24e0,
+ 0xaecc49914078536d,0x58fae9f773886e18,
+ 0xda7f5bf590966848,0xaf39a475506a899e,
+ 0x888f99797a5e012d,0x6d8406c952429603,
+ 0xaab37fd7d8f58178,0xc8e5087ba6d33b83,
+ 0xd5605fcdcf32e1d6,0xfb1e4a9a90880a64,
+ 0x855c3be0a17fcd26,0x5cf2eea09a55067f,
+ 0xa6b34ad8c9dfc06f,0xf42faa48c0ea481e,
+ 0xd0601d8efc57b08b,0xf13b94daf124da26,
+ 0x823c12795db6ce57,0x76c53d08d6b70858,
+ 0xa2cb1717b52481ed,0x54768c4b0c64ca6e,
+ 0xcb7ddcdda26da268,0xa9942f5dcf7dfd09,
+ 0xfe5d54150b090b02,0xd3f93b35435d7c4c,
+ 0x9efa548d26e5a6e1,0xc47bc5014a1a6daf,
+ 0xc6b8e9b0709f109a,0x359ab6419ca1091b,
+ 0xf867241c8cc6d4c0,0xc30163d203c94b62,
+ 0x9b407691d7fc44f8,0x79e0de63425dcf1d,
+ 0xc21094364dfb5636,0x985915fc12f542e4,
+ 0xf294b943e17a2bc4,0x3e6f5b7b17b2939d,
+ 0x979cf3ca6cec5b5a,0xa705992ceecf9c42,
+ 0xbd8430bd08277231,0x50c6ff782a838353,
+ 0xece53cec4a314ebd,0xa4f8bf5635246428,
+ 0x940f4613ae5ed136,0x871b7795e136be99,
+ 0xb913179899f68584,0x28e2557b59846e3f,
+ 0xe757dd7ec07426e5,0x331aeada2fe589cf,
+ 0x9096ea6f3848984f,0x3ff0d2c85def7621,
+ 0xb4bca50b065abe63,0xfed077a756b53a9,
+ 0xe1ebce4dc7f16dfb,0xd3e8495912c62894,
+ 0x8d3360f09cf6e4bd,0x64712dd7abbbd95c,
+ 0xb080392cc4349dec,0xbd8d794d96aacfb3,
+ 0xdca04777f541c567,0xecf0d7a0fc5583a0,
+ 0x89e42caaf9491b60,0xf41686c49db57244,
+ 0xac5d37d5b79b6239,0x311c2875c522ced5,
+ 0xd77485cb25823ac7,0x7d633293366b828b,
+ 0x86a8d39ef77164bc,0xae5dff9c02033197,
+ 0xa8530886b54dbdeb,0xd9f57f830283fdfc,
+ 0xd267caa862a12d66,0xd072df63c324fd7b,
+ 0x8380dea93da4bc60,0x4247cb9e59f71e6d,
+ 0xa46116538d0deb78,0x52d9be85f074e608,
+ 0xcd795be870516656,0x67902e276c921f8b,
+ 0x806bd9714632dff6,0xba1cd8a3db53b6,
+ 0xa086cfcd97bf97f3,0x80e8a40eccd228a4,
+ 0xc8a883c0fdaf7df0,0x6122cd128006b2cd,
+ 0xfad2a4b13d1b5d6c,0x796b805720085f81,
+ 0x9cc3a6eec6311a63,0xcbe3303674053bb0,
+ 0xc3f490aa77bd60fc,0xbedbfc4411068a9c,
+ 0xf4f1b4d515acb93b,0xee92fb5515482d44,
+ 0x991711052d8bf3c5,0x751bdd152d4d1c4a,
+ 0xbf5cd54678eef0b6,0xd262d45a78a0635d,
+ 0xef340a98172aace4,0x86fb897116c87c34,
+ 0x9580869f0e7aac0e,0xd45d35e6ae3d4da0,
+ 0xbae0a846d2195712,0x8974836059cca109,
+ 0xe998d258869facd7,0x2bd1a438703fc94b,
+ 0x91ff83775423cc06,0x7b6306a34627ddcf,
+ 0xb67f6455292cbf08,0x1a3bc84c17b1d542,
+ 0xe41f3d6a7377eeca,0x20caba5f1d9e4a93,
+ 0x8e938662882af53e,0x547eb47b7282ee9c,
+ 0xb23867fb2a35b28d,0xe99e619a4f23aa43,
+ 0xdec681f9f4c31f31,0x6405fa00e2ec94d4,
+ 0x8b3c113c38f9f37e,0xde83bc408dd3dd04,
+ 0xae0b158b4738705e,0x9624ab50b148d445,
+ 0xd98ddaee19068c76,0x3badd624dd9b0957,
+ 0x87f8a8d4cfa417c9,0xe54ca5d70a80e5d6,
+ 0xa9f6d30a038d1dbc,0x5e9fcf4ccd211f4c,
+ 0xd47487cc8470652b,0x7647c3200069671f,
+ 0x84c8d4dfd2c63f3b,0x29ecd9f40041e073,
+ 0xa5fb0a17c777cf09,0xf468107100525890,
+ 0xcf79cc9db955c2cc,0x7182148d4066eeb4,
+ 0x81ac1fe293d599bf,0xc6f14cd848405530,
+ 0xa21727db38cb002f,0xb8ada00e5a506a7c,
+ 0xca9cf1d206fdc03b,0xa6d90811f0e4851c,
+ 0xfd442e4688bd304a,0x908f4a166d1da663,
+ 0x9e4a9cec15763e2e,0x9a598e4e043287fe,
+ 0xc5dd44271ad3cdba,0x40eff1e1853f29fd,
+ 0xf7549530e188c128,0xd12bee59e68ef47c,
+ 0x9a94dd3e8cf578b9,0x82bb74f8301958ce,
+ 0xc13a148e3032d6e7,0xe36a52363c1faf01,
+ 0xf18899b1bc3f8ca1,0xdc44e6c3cb279ac1,
+ 0x96f5600f15a7b7e5,0x29ab103a5ef8c0b9,
+ 0xbcb2b812db11a5de,0x7415d448f6b6f0e7,
+ 0xebdf661791d60f56,0x111b495b3464ad21,
+ 0x936b9fcebb25c995,0xcab10dd900beec34,
+ 0xb84687c269ef3bfb,0x3d5d514f40eea742,
+ 0xe65829b3046b0afa,0xcb4a5a3112a5112,
+ 0x8ff71a0fe2c2e6dc,0x47f0e785eaba72ab,
+ 0xb3f4e093db73a093,0x59ed216765690f56,
+ 0xe0f218b8d25088b8,0x306869c13ec3532c,
+ 0x8c974f7383725573,0x1e414218c73a13fb,
+ 0xafbd2350644eeacf,0xe5d1929ef90898fa,
+ 0xdbac6c247d62a583,0xdf45f746b74abf39,
+ 0x894bc396ce5da772,0x6b8bba8c328eb783,
+ 0xab9eb47c81f5114f,0x66ea92f3f326564,
+ 0xd686619ba27255a2,0xc80a537b0efefebd,
+ 0x8613fd0145877585,0xbd06742ce95f5f36,
+ 0xa798fc4196e952e7,0x2c48113823b73704,
+ 0xd17f3b51fca3a7a0,0xf75a15862ca504c5,
+ 0x82ef85133de648c4,0x9a984d73dbe722fb,
+ 0xa3ab66580d5fdaf5,0xc13e60d0d2e0ebba,
+ 0xcc963fee10b7d1b3,0x318df905079926a8,
+ 0xffbbcfe994e5c61f,0xfdf17746497f7052,
+ 0x9fd561f1fd0f9bd3,0xfeb6ea8bedefa633,
+ 0xc7caba6e7c5382c8,0xfe64a52ee96b8fc0,
+ 0xf9bd690a1b68637b,0x3dfdce7aa3c673b0,
+ 0x9c1661a651213e2d,0x6bea10ca65c084e,
+ 0xc31bfa0fe5698db8,0x486e494fcff30a62,
+ 0xf3e2f893dec3f126,0x5a89dba3c3efccfa,
+ 0x986ddb5c6b3a76b7,0xf89629465a75e01c,
+ 0xbe89523386091465,0xf6bbb397f1135823,
+ 0xee2ba6c0678b597f,0x746aa07ded582e2c,
+ 0x94db483840b717ef,0xa8c2a44eb4571cdc,
+ 0xba121a4650e4ddeb,0x92f34d62616ce413,
+ 0xe896a0d7e51e1566,0x77b020baf9c81d17,
+ 0x915e2486ef32cd60,0xace1474dc1d122e,
+ 0xb5b5ada8aaff80b8,0xd819992132456ba,
+ 0xe3231912d5bf60e6,0x10e1fff697ed6c69,
+ 0x8df5efabc5979c8f,0xca8d3ffa1ef463c1,
+ 0xb1736b96b6fd83b3,0xbd308ff8a6b17cb2,
+ 0xddd0467c64bce4a0,0xac7cb3f6d05ddbde,
+ 0x8aa22c0dbef60ee4,0x6bcdf07a423aa96b,
+ 0xad4ab7112eb3929d,0x86c16c98d2c953c6,
+ 0xd89d64d57a607744,0xe871c7bf077ba8b7,
+ 0x87625f056c7c4a8b,0x11471cd764ad4972,
+ 0xa93af6c6c79b5d2d,0xd598e40d3dd89bcf,
+ 0xd389b47879823479,0x4aff1d108d4ec2c3,
+ 0x843610cb4bf160cb,0xcedf722a585139ba,
+ 0xa54394fe1eedb8fe,0xc2974eb4ee658828,
+ 0xce947a3da6a9273e,0x733d226229feea32,
+ 0x811ccc668829b887,0x806357d5a3f525f,
+ 0xa163ff802a3426a8,0xca07c2dcb0cf26f7,
+ 0xc9bcff6034c13052,0xfc89b393dd02f0b5,
+ 0xfc2c3f3841f17c67,0xbbac2078d443ace2,
+ 0x9d9ba7832936edc0,0xd54b944b84aa4c0d,
+ 0xc5029163f384a931,0xa9e795e65d4df11,
+ 0xf64335bcf065d37d,0x4d4617b5ff4a16d5,
+ 0x99ea0196163fa42e,0x504bced1bf8e4e45,
+ 0xc06481fb9bcf8d39,0xe45ec2862f71e1d6,
+ 0xf07da27a82c37088,0x5d767327bb4e5a4c,
+ 0x964e858c91ba2655,0x3a6a07f8d510f86f,
+ 0xbbe226efb628afea,0x890489f70a55368b,
+ 0xeadab0aba3b2dbe5,0x2b45ac74ccea842e,
+ 0x92c8ae6b464fc96f,0x3b0b8bc90012929d,
+ 0xb77ada0617e3bbcb,0x9ce6ebb40173744,
+ 0xe55990879ddcaabd,0xcc420a6a101d0515,
+ 0x8f57fa54c2a9eab6,0x9fa946824a12232d,
+ 0xb32df8e9f3546564,0x47939822dc96abf9,
+ 0xdff9772470297ebd,0x59787e2b93bc56f7,
+ 0x8bfbea76c619ef36,0x57eb4edb3c55b65a,
+ 0xaefae51477a06b03,0xede622920b6b23f1,
+ 0xdab99e59958885c4,0xe95fab368e45eced,
+ 0x88b402f7fd75539b,0x11dbcb0218ebb414,
+ 0xaae103b5fcd2a881,0xd652bdc29f26a119,
+ 0xd59944a37c0752a2,0x4be76d3346f0495f,
+ 0x857fcae62d8493a5,0x6f70a4400c562ddb,
+ 0xa6dfbd9fb8e5b88e,0xcb4ccd500f6bb952,
+ 0xd097ad07a71f26b2,0x7e2000a41346a7a7,
+ 0x825ecc24c873782f,0x8ed400668c0c28c8,
+ 0xa2f67f2dfa90563b,0x728900802f0f32fa,
+ 0xcbb41ef979346bca,0x4f2b40a03ad2ffb9,
+ 0xfea126b7d78186bc,0xe2f610c84987bfa8,
+ 0x9f24b832e6b0f436,0xdd9ca7d2df4d7c9,
+ 0xc6ede63fa05d3143,0x91503d1c79720dbb,
+ 0xf8a95fcf88747d94,0x75a44c6397ce912a,
+ 0x9b69dbe1b548ce7c,0xc986afbe3ee11aba,
+ 0xc24452da229b021b,0xfbe85badce996168,
+ 0xf2d56790ab41c2a2,0xfae27299423fb9c3,
+ 0x97c560ba6b0919a5,0xdccd879fc967d41a,
+ 0xbdb6b8e905cb600f,0x5400e987bbc1c920,
+ 0xed246723473e3813,0x290123e9aab23b68,
+ 0x9436c0760c86e30b,0xf9a0b6720aaf6521,
+ 0xb94470938fa89bce,0xf808e40e8d5b3e69,
+ 0xe7958cb87392c2c2,0xb60b1d1230b20e04,
+ 0x90bd77f3483bb9b9,0xb1c6f22b5e6f48c2,
+ 0xb4ecd5f01a4aa828,0x1e38aeb6360b1af3,
+ 0xe2280b6c20dd5232,0x25c6da63c38de1b0,
+ 0x8d590723948a535f,0x579c487e5a38ad0e,
+ 0xb0af48ec79ace837,0x2d835a9df0c6d851,
+ 0xdcdb1b2798182244,0xf8e431456cf88e65,
+ 0x8a08f0f8bf0f156b,0x1b8e9ecb641b58ff,
+ 0xac8b2d36eed2dac5,0xe272467e3d222f3f,
+ 0xd7adf884aa879177,0x5b0ed81dcc6abb0f,
+ 0x86ccbb52ea94baea,0x98e947129fc2b4e9,
+ 0xa87fea27a539e9a5,0x3f2398d747b36224,
+ 0xd29fe4b18e88640e,0x8eec7f0d19a03aad,
+ 0x83a3eeeef9153e89,0x1953cf68300424ac,
+ 0xa48ceaaab75a8e2b,0x5fa8c3423c052dd7,
+ 0xcdb02555653131b6,0x3792f412cb06794d,
+ 0x808e17555f3ebf11,0xe2bbd88bbee40bd0,
+ 0xa0b19d2ab70e6ed6,0x5b6aceaeae9d0ec4,
+ 0xc8de047564d20a8b,0xf245825a5a445275,
+ 0xfb158592be068d2e,0xeed6e2f0f0d56712,
+ 0x9ced737bb6c4183d,0x55464dd69685606b,
+ 0xc428d05aa4751e4c,0xaa97e14c3c26b886,
+ 0xf53304714d9265df,0xd53dd99f4b3066a8,
+ 0x993fe2c6d07b7fab,0xe546a8038efe4029,
+ 0xbf8fdb78849a5f96,0xde98520472bdd033,
+ 0xef73d256a5c0f77c,0x963e66858f6d4440,
+ 0x95a8637627989aad,0xdde7001379a44aa8,
+ 0xbb127c53b17ec159,0x5560c018580d5d52,
+ 0xe9d71b689dde71af,0xaab8f01e6e10b4a6,
+ 0x9226712162ab070d,0xcab3961304ca70e8,
+ 0xb6b00d69bb55c8d1,0x3d607b97c5fd0d22,
+ 0xe45c10c42a2b3b05,0x8cb89a7db77c506a,
+ 0x8eb98a7a9a5b04e3,0x77f3608e92adb242,
+ 0xb267ed1940f1c61c,0x55f038b237591ed3,
+ 0xdf01e85f912e37a3,0x6b6c46dec52f6688,
+ 0x8b61313bbabce2c6,0x2323ac4b3b3da015,
+ 0xae397d8aa96c1b77,0xabec975e0a0d081a,
+ 0xd9c7dced53c72255,0x96e7bd358c904a21,
+ 0x881cea14545c7575,0x7e50d64177da2e54,
+ 0xaa242499697392d2,0xdde50bd1d5d0b9e9,
+ 0xd4ad2dbfc3d07787,0x955e4ec64b44e864,
+ 0x84ec3c97da624ab4,0xbd5af13bef0b113e,
+ 0xa6274bbdd0fadd61,0xecb1ad8aeacdd58e,
+ 0xcfb11ead453994ba,0x67de18eda5814af2,
+ 0x81ceb32c4b43fcf4,0x80eacf948770ced7,
+ 0xa2425ff75e14fc31,0xa1258379a94d028d,
+ 0xcad2f7f5359a3b3e,0x96ee45813a04330,
+ 0xfd87b5f28300ca0d,0x8bca9d6e188853fc,
+ 0x9e74d1b791e07e48,0x775ea264cf55347e,
+ 0xc612062576589dda,0x95364afe032a81a0,
+ 0xf79687aed3eec551,0x3a83ddbd83f52210,
+ 0x9abe14cd44753b52,0xc4926a9672793580,
+ 0xc16d9a0095928a27,0x75b7053c0f178400,
+ 0xf1c90080baf72cb1,0x5324c68b12dd6800,
+ 0x971da05074da7bee,0xd3f6fc16ebca8000,
+ 0xbce5086492111aea,0x88f4bb1ca6bd0000,
+ 0xec1e4a7db69561a5,0x2b31e9e3d0700000,
+ 0x9392ee8e921d5d07,0x3aff322e62600000,
+ 0xb877aa3236a4b449,0x9befeb9fad487c3,
+ 0xe69594bec44de15b,0x4c2ebe687989a9b4,
+ 0x901d7cf73ab0acd9,0xf9d37014bf60a11,
+ 0xb424dc35095cd80f,0x538484c19ef38c95,
+ 0xe12e13424bb40e13,0x2865a5f206b06fba,
+ 0x8cbccc096f5088cb,0xf93f87b7442e45d4,
+ 0xafebff0bcb24aafe,0xf78f69a51539d749,
+ 0xdbe6fecebdedd5be,0xb573440e5a884d1c,
+ 0x89705f4136b4a597,0x31680a88f8953031,
+ 0xabcc77118461cefc,0xfdc20d2b36ba7c3e,
+ 0xd6bf94d5e57a42bc,0x3d32907604691b4d,
+ 0x8637bd05af6c69b5,0xa63f9a49c2c1b110,
+ 0xa7c5ac471b478423,0xfcf80dc33721d54,
+ 0xd1b71758e219652b,0xd3c36113404ea4a9,
+ 0x83126e978d4fdf3b,0x645a1cac083126ea,
+ 0xa3d70a3d70a3d70a,0x3d70a3d70a3d70a4,
+ 0xcccccccccccccccc,0xcccccccccccccccd,
+ 0x8000000000000000,0x0,
+ 0xa000000000000000,0x0,
+ 0xc800000000000000,0x0,
+ 0xfa00000000000000,0x0,
+ 0x9c40000000000000,0x0,
+ 0xc350000000000000,0x0,
+ 0xf424000000000000,0x0,
+ 0x9896800000000000,0x0,
+ 0xbebc200000000000,0x0,
+ 0xee6b280000000000,0x0,
+ 0x9502f90000000000,0x0,
+ 0xba43b74000000000,0x0,
+ 0xe8d4a51000000000,0x0,
+ 0x9184e72a00000000,0x0,
+ 0xb5e620f480000000,0x0,
+ 0xe35fa931a0000000,0x0,
+ 0x8e1bc9bf04000000,0x0,
+ 0xb1a2bc2ec5000000,0x0,
+ 0xde0b6b3a76400000,0x0,
+ 0x8ac7230489e80000,0x0,
+ 0xad78ebc5ac620000,0x0,
+ 0xd8d726b7177a8000,0x0,
+ 0x878678326eac9000,0x0,
+ 0xa968163f0a57b400,0x0,
+ 0xd3c21bcecceda100,0x0,
+ 0x84595161401484a0,0x0,
+ 0xa56fa5b99019a5c8,0x0,
+ 0xcecb8f27f4200f3a,0x0,
+ 0x813f3978f8940984,0x4000000000000000,
+ 0xa18f07d736b90be5,0x5000000000000000,
+ 0xc9f2c9cd04674ede,0xa400000000000000,
+ 0xfc6f7c4045812296,0x4d00000000000000,
+ 0x9dc5ada82b70b59d,0xf020000000000000,
+ 0xc5371912364ce305,0x6c28000000000000,
+ 0xf684df56c3e01bc6,0xc732000000000000,
+ 0x9a130b963a6c115c,0x3c7f400000000000,
+ 0xc097ce7bc90715b3,0x4b9f100000000000,
+ 0xf0bdc21abb48db20,0x1e86d40000000000,
+ 0x96769950b50d88f4,0x1314448000000000,
+ 0xbc143fa4e250eb31,0x17d955a000000000,
+ 0xeb194f8e1ae525fd,0x5dcfab0800000000,
+ 0x92efd1b8d0cf37be,0x5aa1cae500000000,
+ 0xb7abc627050305ad,0xf14a3d9e40000000,
+ 0xe596b7b0c643c719,0x6d9ccd05d0000000,
+ 0x8f7e32ce7bea5c6f,0xe4820023a2000000,
+ 0xb35dbf821ae4f38b,0xdda2802c8a800000,
+ 0xe0352f62a19e306e,0xd50b2037ad200000,
+ 0x8c213d9da502de45,0x4526f422cc340000,
+ 0xaf298d050e4395d6,0x9670b12b7f410000,
+ 0xdaf3f04651d47b4c,0x3c0cdd765f114000,
+ 0x88d8762bf324cd0f,0xa5880a69fb6ac800,
+ 0xab0e93b6efee0053,0x8eea0d047a457a00,
+ 0xd5d238a4abe98068,0x72a4904598d6d880,
+ 0x85a36366eb71f041,0x47a6da2b7f864750,
+ 0xa70c3c40a64e6c51,0x999090b65f67d924,
+ 0xd0cf4b50cfe20765,0xfff4b4e3f741cf6d,
+ 0x82818f1281ed449f,0xbff8f10e7a8921a4,
+ 0xa321f2d7226895c7,0xaff72d52192b6a0d,
+ 0xcbea6f8ceb02bb39,0x9bf4f8a69f764490,
+ 0xfee50b7025c36a08,0x2f236d04753d5b4,
+ 0x9f4f2726179a2245,0x1d762422c946590,
+ 0xc722f0ef9d80aad6,0x424d3ad2b7b97ef5,
+ 0xf8ebad2b84e0d58b,0xd2e0898765a7deb2,
+ 0x9b934c3b330c8577,0x63cc55f49f88eb2f,
+ 0xc2781f49ffcfa6d5,0x3cbf6b71c76b25fb,
+ 0xf316271c7fc3908a,0x8bef464e3945ef7a,
+ 0x97edd871cfda3a56,0x97758bf0e3cbb5ac,
+ 0xbde94e8e43d0c8ec,0x3d52eeed1cbea317,
+ 0xed63a231d4c4fb27,0x4ca7aaa863ee4bdd,
+ 0x945e455f24fb1cf8,0x8fe8caa93e74ef6a,
+ 0xb975d6b6ee39e436,0xb3e2fd538e122b44,
+ 0xe7d34c64a9c85d44,0x60dbbca87196b616,
+ 0x90e40fbeea1d3a4a,0xbc8955e946fe31cd,
+ 0xb51d13aea4a488dd,0x6babab6398bdbe41,
+ 0xe264589a4dcdab14,0xc696963c7eed2dd1,
+ 0x8d7eb76070a08aec,0xfc1e1de5cf543ca2,
+ 0xb0de65388cc8ada8,0x3b25a55f43294bcb,
+ 0xdd15fe86affad912,0x49ef0eb713f39ebe,
+ 0x8a2dbf142dfcc7ab,0x6e3569326c784337,
+ 0xacb92ed9397bf996,0x49c2c37f07965404,
+ 0xd7e77a8f87daf7fb,0xdc33745ec97be906,
+ 0x86f0ac99b4e8dafd,0x69a028bb3ded71a3,
+ 0xa8acd7c0222311bc,0xc40832ea0d68ce0c,
+ 0xd2d80db02aabd62b,0xf50a3fa490c30190,
+ 0x83c7088e1aab65db,0x792667c6da79e0fa,
+ 0xa4b8cab1a1563f52,0x577001b891185938,
+ 0xcde6fd5e09abcf26,0xed4c0226b55e6f86,
+ 0x80b05e5ac60b6178,0x544f8158315b05b4,
+ 0xa0dc75f1778e39d6,0x696361ae3db1c721,
+ 0xc913936dd571c84c,0x3bc3a19cd1e38e9,
+ 0xfb5878494ace3a5f,0x4ab48a04065c723,
+ 0x9d174b2dcec0e47b,0x62eb0d64283f9c76,
+ 0xc45d1df942711d9a,0x3ba5d0bd324f8394,
+ 0xf5746577930d6500,0xca8f44ec7ee36479,
+ 0x9968bf6abbe85f20,0x7e998b13cf4e1ecb,
+ 0xbfc2ef456ae276e8,0x9e3fedd8c321a67e,
+ 0xefb3ab16c59b14a2,0xc5cfe94ef3ea101e,
+ 0x95d04aee3b80ece5,0xbba1f1d158724a12,
+ 0xbb445da9ca61281f,0x2a8a6e45ae8edc97,
+ 0xea1575143cf97226,0xf52d09d71a3293bd,
+ 0x924d692ca61be758,0x593c2626705f9c56,
+ 0xb6e0c377cfa2e12e,0x6f8b2fb00c77836c,
+ 0xe498f455c38b997a,0xb6dfb9c0f956447,
+ 0x8edf98b59a373fec,0x4724bd4189bd5eac,
+ 0xb2977ee300c50fe7,0x58edec91ec2cb657,
+ 0xdf3d5e9bc0f653e1,0x2f2967b66737e3ed,
+ 0x8b865b215899f46c,0xbd79e0d20082ee74,
+ 0xae67f1e9aec07187,0xecd8590680a3aa11,
+ 0xda01ee641a708de9,0xe80e6f4820cc9495,
+ 0x884134fe908658b2,0x3109058d147fdcdd,
+ 0xaa51823e34a7eede,0xbd4b46f0599fd415,
+ 0xd4e5e2cdc1d1ea96,0x6c9e18ac7007c91a,
+ 0x850fadc09923329e,0x3e2cf6bc604ddb0,
+ 0xa6539930bf6bff45,0x84db8346b786151c,
+ 0xcfe87f7cef46ff16,0xe612641865679a63,
+ 0x81f14fae158c5f6e,0x4fcb7e8f3f60c07e,
+ 0xa26da3999aef7749,0xe3be5e330f38f09d,
+ 0xcb090c8001ab551c,0x5cadf5bfd3072cc5,
+ 0xfdcb4fa002162a63,0x73d9732fc7c8f7f6,
+ 0x9e9f11c4014dda7e,0x2867e7fddcdd9afa,
+ 0xc646d63501a1511d,0xb281e1fd541501b8,
+ 0xf7d88bc24209a565,0x1f225a7ca91a4226,
+ 0x9ae757596946075f,0x3375788de9b06958,
+ 0xc1a12d2fc3978937,0x52d6b1641c83ae,
+ 0xf209787bb47d6b84,0xc0678c5dbd23a49a,
+ 0x9745eb4d50ce6332,0xf840b7ba963646e0,
+ 0xbd176620a501fbff,0xb650e5a93bc3d898,
+ 0xec5d3fa8ce427aff,0xa3e51f138ab4cebe,
+ 0x93ba47c980e98cdf,0xc66f336c36b10137,
+ 0xb8a8d9bbe123f017,0xb80b0047445d4184,
+ 0xe6d3102ad96cec1d,0xa60dc059157491e5,
+ 0x9043ea1ac7e41392,0x87c89837ad68db2f,
+ 0xb454e4a179dd1877,0x29babe4598c311fb,
+ 0xe16a1dc9d8545e94,0xf4296dd6fef3d67a,
+ 0x8ce2529e2734bb1d,0x1899e4a65f58660c,
+ 0xb01ae745b101e9e4,0x5ec05dcff72e7f8f,
+ 0xdc21a1171d42645d,0x76707543f4fa1f73,
+ 0x899504ae72497eba,0x6a06494a791c53a8,
+ 0xabfa45da0edbde69,0x487db9d17636892,
+ 0xd6f8d7509292d603,0x45a9d2845d3c42b6,
+ 0x865b86925b9bc5c2,0xb8a2392ba45a9b2,
+ 0xa7f26836f282b732,0x8e6cac7768d7141e,
+ 0xd1ef0244af2364ff,0x3207d795430cd926,
+ 0x8335616aed761f1f,0x7f44e6bd49e807b8,
+ 0xa402b9c5a8d3a6e7,0x5f16206c9c6209a6,
+ 0xcd036837130890a1,0x36dba887c37a8c0f,
+ 0x802221226be55a64,0xc2494954da2c9789,
+ 0xa02aa96b06deb0fd,0xf2db9baa10b7bd6c,
+ 0xc83553c5c8965d3d,0x6f92829494e5acc7,
+ 0xfa42a8b73abbf48c,0xcb772339ba1f17f9,
+ 0x9c69a97284b578d7,0xff2a760414536efb,
+ 0xc38413cf25e2d70d,0xfef5138519684aba,
+ 0xf46518c2ef5b8cd1,0x7eb258665fc25d69,
+ 0x98bf2f79d5993802,0xef2f773ffbd97a61,
+ 0xbeeefb584aff8603,0xaafb550ffacfd8fa,
+ 0xeeaaba2e5dbf6784,0x95ba2a53f983cf38,
+ 0x952ab45cfa97a0b2,0xdd945a747bf26183,
+ 0xba756174393d88df,0x94f971119aeef9e4,
+ 0xe912b9d1478ceb17,0x7a37cd5601aab85d,
+ 0x91abb422ccb812ee,0xac62e055c10ab33a,
+ 0xb616a12b7fe617aa,0x577b986b314d6009,
+ 0xe39c49765fdf9d94,0xed5a7e85fda0b80b,
+ 0x8e41ade9fbebc27d,0x14588f13be847307,
+ 0xb1d219647ae6b31c,0x596eb2d8ae258fc8,
+ 0xde469fbd99a05fe3,0x6fca5f8ed9aef3bb,
+ 0x8aec23d680043bee,0x25de7bb9480d5854,
+ 0xada72ccc20054ae9,0xaf561aa79a10ae6a,
+ 0xd910f7ff28069da4,0x1b2ba1518094da04,
+ 0x87aa9aff79042286,0x90fb44d2f05d0842,
+ 0xa99541bf57452b28,0x353a1607ac744a53,
+ 0xd3fa922f2d1675f2,0x42889b8997915ce8,
+ 0x847c9b5d7c2e09b7,0x69956135febada11,
+ 0xa59bc234db398c25,0x43fab9837e699095,
+ 0xcf02b2c21207ef2e,0x94f967e45e03f4bb,
+ 0x8161afb94b44f57d,0x1d1be0eebac278f5,
+ 0xa1ba1ba79e1632dc,0x6462d92a69731732,
+ 0xca28a291859bbf93,0x7d7b8f7503cfdcfe,
+ 0xfcb2cb35e702af78,0x5cda735244c3d43e,
+ 0x9defbf01b061adab,0x3a0888136afa64a7,
+ 0xc56baec21c7a1916,0x88aaa1845b8fdd0,
+ 0xf6c69a72a3989f5b,0x8aad549e57273d45,
+ 0x9a3c2087a63f6399,0x36ac54e2f678864b,
+ 0xc0cb28a98fcf3c7f,0x84576a1bb416a7dd,
+ 0xf0fdf2d3f3c30b9f,0x656d44a2a11c51d5,
+ 0x969eb7c47859e743,0x9f644ae5a4b1b325,
+ 0xbc4665b596706114,0x873d5d9f0dde1fee,
+ 0xeb57ff22fc0c7959,0xa90cb506d155a7ea,
+ 0x9316ff75dd87cbd8,0x9a7f12442d588f2,
+ 0xb7dcbf5354e9bece,0xc11ed6d538aeb2f,
+ 0xe5d3ef282a242e81,0x8f1668c8a86da5fa,
+ 0x8fa475791a569d10,0xf96e017d694487bc,
+ 0xb38d92d760ec4455,0x37c981dcc395a9ac,
+ 0xe070f78d3927556a,0x85bbe253f47b1417,
+ 0x8c469ab843b89562,0x93956d7478ccec8e,
+ 0xaf58416654a6babb,0x387ac8d1970027b2,
+ 0xdb2e51bfe9d0696a,0x6997b05fcc0319e,
+ 0x88fcf317f22241e2,0x441fece3bdf81f03,
+ 0xab3c2fddeeaad25a,0xd527e81cad7626c3,
+ 0xd60b3bd56a5586f1,0x8a71e223d8d3b074,
+ 0x85c7056562757456,0xf6872d5667844e49,
+ 0xa738c6bebb12d16c,0xb428f8ac016561db,
+ 0xd106f86e69d785c7,0xe13336d701beba52,
+ 0x82a45b450226b39c,0xecc0024661173473,
+ 0xa34d721642b06084,0x27f002d7f95d0190,
+ 0xcc20ce9bd35c78a5,0x31ec038df7b441f4,
+ 0xff290242c83396ce,0x7e67047175a15271,
+ 0x9f79a169bd203e41,0xf0062c6e984d386,
+ 0xc75809c42c684dd1,0x52c07b78a3e60868,
+ 0xf92e0c3537826145,0xa7709a56ccdf8a82,
+ 0x9bbcc7a142b17ccb,0x88a66076400bb691,
+ 0xc2abf989935ddbfe,0x6acff893d00ea435,
+ 0xf356f7ebf83552fe,0x583f6b8c4124d43,
+ 0x98165af37b2153de,0xc3727a337a8b704a,
+ 0xbe1bf1b059e9a8d6,0x744f18c0592e4c5c,
+ 0xeda2ee1c7064130c,0x1162def06f79df73,
+ 0x9485d4d1c63e8be7,0x8addcb5645ac2ba8,
+ 0xb9a74a0637ce2ee1,0x6d953e2bd7173692,
+ 0xe8111c87c5c1ba99,0xc8fa8db6ccdd0437,
+ 0x910ab1d4db9914a0,0x1d9c9892400a22a2,
+ 0xb54d5e4a127f59c8,0x2503beb6d00cab4b,
+ 0xe2a0b5dc971f303a,0x2e44ae64840fd61d,
+ 0x8da471a9de737e24,0x5ceaecfed289e5d2,
+ 0xb10d8e1456105dad,0x7425a83e872c5f47,
+ 0xdd50f1996b947518,0xd12f124e28f77719,
+ 0x8a5296ffe33cc92f,0x82bd6b70d99aaa6f,
+ 0xace73cbfdc0bfb7b,0x636cc64d1001550b,
+ 0xd8210befd30efa5a,0x3c47f7e05401aa4e,
+ 0x8714a775e3e95c78,0x65acfaec34810a71,
+ 0xa8d9d1535ce3b396,0x7f1839a741a14d0d,
+ 0xd31045a8341ca07c,0x1ede48111209a050,
+ 0x83ea2b892091e44d,0x934aed0aab460432,
+ 0xa4e4b66b68b65d60,0xf81da84d5617853f,
+ 0xce1de40642e3f4b9,0x36251260ab9d668e,
+ 0x80d2ae83e9ce78f3,0xc1d72b7c6b426019,
+ 0xa1075a24e4421730,0xb24cf65b8612f81f,
+ 0xc94930ae1d529cfc,0xdee033f26797b627,
+ 0xfb9b7cd9a4a7443c,0x169840ef017da3b1,
+ 0x9d412e0806e88aa5,0x8e1f289560ee864e,
+ 0xc491798a08a2ad4e,0xf1a6f2bab92a27e2,
+ 0xf5b5d7ec8acb58a2,0xae10af696774b1db,
+ 0x9991a6f3d6bf1765,0xacca6da1e0a8ef29,
+ 0xbff610b0cc6edd3f,0x17fd090a58d32af3,
+ 0xeff394dcff8a948e,0xddfc4b4cef07f5b0,
+ 0x95f83d0a1fb69cd9,0x4abdaf101564f98e,
+ 0xbb764c4ca7a4440f,0x9d6d1ad41abe37f1,
+ 0xea53df5fd18d5513,0x84c86189216dc5ed,
+ 0x92746b9be2f8552c,0x32fd3cf5b4e49bb4,
+ 0xb7118682dbb66a77,0x3fbc8c33221dc2a1,
+ 0xe4d5e82392a40515,0xfabaf3feaa5334a,
+ 0x8f05b1163ba6832d,0x29cb4d87f2a7400e,
+ 0xb2c71d5bca9023f8,0x743e20e9ef511012,
+ 0xdf78e4b2bd342cf6,0x914da9246b255416,
+ 0x8bab8eefb6409c1a,0x1ad089b6c2f7548e,
+ 0xae9672aba3d0c320,0xa184ac2473b529b1,
+ 0xda3c0f568cc4f3e8,0xc9e5d72d90a2741e,
+ 0x8865899617fb1871,0x7e2fa67c7a658892,
+ 0xaa7eebfb9df9de8d,0xddbb901b98feeab7,
+ 0xd51ea6fa85785631,0x552a74227f3ea565,
+ 0x8533285c936b35de,0xd53a88958f87275f,
+ 0xa67ff273b8460356,0x8a892abaf368f137,
+ 0xd01fef10a657842c,0x2d2b7569b0432d85,
+ 0x8213f56a67f6b29b,0x9c3b29620e29fc73,
+ 0xa298f2c501f45f42,0x8349f3ba91b47b8f,
+ 0xcb3f2f7642717713,0x241c70a936219a73,
+ 0xfe0efb53d30dd4d7,0xed238cd383aa0110,
+ 0x9ec95d1463e8a506,0xf4363804324a40aa,
+ 0xc67bb4597ce2ce48,0xb143c6053edcd0d5,
+ 0xf81aa16fdc1b81da,0xdd94b7868e94050a,
+ 0x9b10a4e5e9913128,0xca7cf2b4191c8326,
+ 0xc1d4ce1f63f57d72,0xfd1c2f611f63a3f0,
+ 0xf24a01a73cf2dccf,0xbc633b39673c8cec,
+ 0x976e41088617ca01,0xd5be0503e085d813,
+ 0xbd49d14aa79dbc82,0x4b2d8644d8a74e18,
+ 0xec9c459d51852ba2,0xddf8e7d60ed1219e,
+ 0x93e1ab8252f33b45,0xcabb90e5c942b503,
+ 0xb8da1662e7b00a17,0x3d6a751f3b936243,
+ 0xe7109bfba19c0c9d,0xcc512670a783ad4,
+ 0x906a617d450187e2,0x27fb2b80668b24c5,
+ 0xb484f9dc9641e9da,0xb1f9f660802dedf6,
+ 0xe1a63853bbd26451,0x5e7873f8a0396973,
+ 0x8d07e33455637eb2,0xdb0b487b6423e1e8,
+ 0xb049dc016abc5e5f,0x91ce1a9a3d2cda62,
+ 0xdc5c5301c56b75f7,0x7641a140cc7810fb,
+ 0x89b9b3e11b6329ba,0xa9e904c87fcb0a9d,
+ 0xac2820d9623bf429,0x546345fa9fbdcd44,
+ 0xd732290fbacaf133,0xa97c177947ad4095,
+ 0x867f59a9d4bed6c0,0x49ed8eabcccc485d,
+ 0xa81f301449ee8c70,0x5c68f256bfff5a74,
+ 0xd226fc195c6a2f8c,0x73832eec6fff3111,
+ 0x83585d8fd9c25db7,0xc831fd53c5ff7eab,
+ 0xa42e74f3d032f525,0xba3e7ca8b77f5e55,
+ 0xcd3a1230c43fb26f,0x28ce1bd2e55f35eb,
+ 0x80444b5e7aa7cf85,0x7980d163cf5b81b3,
+ 0xa0555e361951c366,0xd7e105bcc332621f,
+ 0xc86ab5c39fa63440,0x8dd9472bf3fefaa7,
+ 0xfa856334878fc150,0xb14f98f6f0feb951,
+ 0x9c935e00d4b9d8d2,0x6ed1bf9a569f33d3,
+ 0xc3b8358109e84f07,0xa862f80ec4700c8,
+ 0xf4a642e14c6262c8,0xcd27bb612758c0fa,
+ 0x98e7e9cccfbd7dbd,0x8038d51cb897789c,
+ 0xbf21e44003acdd2c,0xe0470a63e6bd56c3,
+ 0xeeea5d5004981478,0x1858ccfce06cac74,
+ 0x95527a5202df0ccb,0xf37801e0c43ebc8,
+ 0xbaa718e68396cffd,0xd30560258f54e6ba,
+ 0xe950df20247c83fd,0x47c6b82ef32a2069,
+ 0x91d28b7416cdd27e,0x4cdc331d57fa5441,
+ 0xb6472e511c81471d,0xe0133fe4adf8e952,
+ 0xe3d8f9e563a198e5,0x58180fddd97723a6,
+ 0x8e679c2f5e44ff8f,0x570f09eaa7ea7648,};
+
+} // namespace internal
+} // namespace simdjson
+/* end file src/internal/numberparsing_tables.cpp */
+/* begin file src/internal/simdprune_tables.cpp */
+#if SIMDJSON_IMPLEMENTATION_ARM64 || SIMDJSON_IMPLEMENTATION_HASWELL || SIMDJSON_IMPLEMENTATION_WESTMERE || SIMDJSON_IMPLEMENTATION_PPC64
+
+#include <cstdint>
+
+namespace simdjson { // table modified and copied from
+namespace internal { // http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
+SIMDJSON_DLLIMPORTEXPORT const unsigned char BitsSetTable256mul2[256] = {
+ 0, 2, 2, 4, 2, 4, 4, 6, 2, 4, 4, 6, 4, 6, 6, 8, 2, 4, 4,
+ 6, 4, 6, 6, 8, 4, 6, 6, 8, 6, 8, 8, 10, 2, 4, 4, 6, 4, 6,
+ 6, 8, 4, 6, 6, 8, 6, 8, 8, 10, 4, 6, 6, 8, 6, 8, 8, 10, 6,
+ 8, 8, 10, 8, 10, 10, 12, 2, 4, 4, 6, 4, 6, 6, 8, 4, 6, 6, 8,
+ 6, 8, 8, 10, 4, 6, 6, 8, 6, 8, 8, 10, 6, 8, 8, 10, 8, 10, 10,
+ 12, 4, 6, 6, 8, 6, 8, 8, 10, 6, 8, 8, 10, 8, 10, 10, 12, 6, 8,
+ 8, 10, 8, 10, 10, 12, 8, 10, 10, 12, 10, 12, 12, 14, 2, 4, 4, 6, 4,
+ 6, 6, 8, 4, 6, 6, 8, 6, 8, 8, 10, 4, 6, 6, 8, 6, 8, 8, 10,
+ 6, 8, 8, 10, 8, 10, 10, 12, 4, 6, 6, 8, 6, 8, 8, 10, 6, 8, 8,
+ 10, 8, 10, 10, 12, 6, 8, 8, 10, 8, 10, 10, 12, 8, 10, 10, 12, 10, 12,
+ 12, 14, 4, 6, 6, 8, 6, 8, 8, 10, 6, 8, 8, 10, 8, 10, 10, 12, 6,
+ 8, 8, 10, 8, 10, 10, 12, 8, 10, 10, 12, 10, 12, 12, 14, 6, 8, 8, 10,
+ 8, 10, 10, 12, 8, 10, 10, 12, 10, 12, 12, 14, 8, 10, 10, 12, 10, 12, 12,
+ 14, 10, 12, 12, 14, 12, 14, 14, 16};
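+// Each entry above is 2 * popcount(index); e.g. BitsSetTable256mul2[0b1011] == 6.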
+
+SIMDJSON_DLLIMPORTEXPORT const uint8_t pshufb_combine_table[272] = {
+ 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b,
+ 0x0c, 0x0d, 0x0e, 0x0f, 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x08,
+ 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0xff, 0x00, 0x01, 0x02, 0x03,
+ 0x04, 0x05, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0xff, 0xff,
+ 0x00, 0x01, 0x02, 0x03, 0x04, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e,
+ 0x0f, 0xff, 0xff, 0xff, 0x00, 0x01, 0x02, 0x03, 0x08, 0x09, 0x0a, 0x0b,
+ 0x0c, 0x0d, 0x0e, 0x0f, 0xff, 0xff, 0xff, 0xff, 0x00, 0x01, 0x02, 0x08,
+ 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x01, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e,
+ 0x0f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x08, 0x09, 0x0a, 0x0b,
+ 0x0c, 0x0d, 0x0e, 0x0f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+};
+
+// 256 * 8 bytes = 2kB, easily fits in cache.
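+// Illustrative note: thintable_epi8[mask] packs, one per byte and in increasing order, the
+// indices of the zero bits of 'mask'; e.g. thintable_epi8[1] == 0x0007060504030201 drops
+// byte 0 and keeps bytes 1..7. The entries are used to build byte-shuffle masks that prune
+// masked-out bytes.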
+SIMDJSON_DLLIMPORTEXPORT const uint64_t thintable_epi8[256] = {
+ 0x0706050403020100, 0x0007060504030201, 0x0007060504030200,
+ 0x0000070605040302, 0x0007060504030100, 0x0000070605040301,
+ 0x0000070605040300, 0x0000000706050403, 0x0007060504020100,
+ 0x0000070605040201, 0x0000070605040200, 0x0000000706050402,
+ 0x0000070605040100, 0x0000000706050401, 0x0000000706050400,
+ 0x0000000007060504, 0x0007060503020100, 0x0000070605030201,
+ 0x0000070605030200, 0x0000000706050302, 0x0000070605030100,
+ 0x0000000706050301, 0x0000000706050300, 0x0000000007060503,
+ 0x0000070605020100, 0x0000000706050201, 0x0000000706050200,
+ 0x0000000007060502, 0x0000000706050100, 0x0000000007060501,
+ 0x0000000007060500, 0x0000000000070605, 0x0007060403020100,
+ 0x0000070604030201, 0x0000070604030200, 0x0000000706040302,
+ 0x0000070604030100, 0x0000000706040301, 0x0000000706040300,
+ 0x0000000007060403, 0x0000070604020100, 0x0000000706040201,
+ 0x0000000706040200, 0x0000000007060402, 0x0000000706040100,
+ 0x0000000007060401, 0x0000000007060400, 0x0000000000070604,
+ 0x0000070603020100, 0x0000000706030201, 0x0000000706030200,
+ 0x0000000007060302, 0x0000000706030100, 0x0000000007060301,
+ 0x0000000007060300, 0x0000000000070603, 0x0000000706020100,
+ 0x0000000007060201, 0x0000000007060200, 0x0000000000070602,
+ 0x0000000007060100, 0x0000000000070601, 0x0000000000070600,
+ 0x0000000000000706, 0x0007050403020100, 0x0000070504030201,
+ 0x0000070504030200, 0x0000000705040302, 0x0000070504030100,
+ 0x0000000705040301, 0x0000000705040300, 0x0000000007050403,
+ 0x0000070504020100, 0x0000000705040201, 0x0000000705040200,
+ 0x0000000007050402, 0x0000000705040100, 0x0000000007050401,
+ 0x0000000007050400, 0x0000000000070504, 0x0000070503020100,
+ 0x0000000705030201, 0x0000000705030200, 0x0000000007050302,
+ 0x0000000705030100, 0x0000000007050301, 0x0000000007050300,
+ 0x0000000000070503, 0x0000000705020100, 0x0000000007050201,
+ 0x0000000007050200, 0x0000000000070502, 0x0000000007050100,
+ 0x0000000000070501, 0x0000000000070500, 0x0000000000000705,
+ 0x0000070403020100, 0x0000000704030201, 0x0000000704030200,
+ 0x0000000007040302, 0x0000000704030100, 0x0000000007040301,
+ 0x0000000007040300, 0x0000000000070403, 0x0000000704020100,
+ 0x0000000007040201, 0x0000000007040200, 0x0000000000070402,
+ 0x0000000007040100, 0x0000000000070401, 0x0000000000070400,
+ 0x0000000000000704, 0x0000000703020100, 0x0000000007030201,
+ 0x0000000007030200, 0x0000000000070302, 0x0000000007030100,
+ 0x0000000000070301, 0x0000000000070300, 0x0000000000000703,
+ 0x0000000007020100, 0x0000000000070201, 0x0000000000070200,
+ 0x0000000000000702, 0x0000000000070100, 0x0000000000000701,
+ 0x0000000000000700, 0x0000000000000007, 0x0006050403020100,
+ 0x0000060504030201, 0x0000060504030200, 0x0000000605040302,
+ 0x0000060504030100, 0x0000000605040301, 0x0000000605040300,
+ 0x0000000006050403, 0x0000060504020100, 0x0000000605040201,
+ 0x0000000605040200, 0x0000000006050402, 0x0000000605040100,
+ 0x0000000006050401, 0x0000000006050400, 0x0000000000060504,
+ 0x0000060503020100, 0x0000000605030201, 0x0000000605030200,
+ 0x0000000006050302, 0x0000000605030100, 0x0000000006050301,
+ 0x0000000006050300, 0x0000000000060503, 0x0000000605020100,
+ 0x0000000006050201, 0x0000000006050200, 0x0000000000060502,
+ 0x0000000006050100, 0x0000000000060501, 0x0000000000060500,
+ 0x0000000000000605, 0x0000060403020100, 0x0000000604030201,
+ 0x0000000604030200, 0x0000000006040302, 0x0000000604030100,
+ 0x0000000006040301, 0x0000000006040300, 0x0000000000060403,
+ 0x0000000604020100, 0x0000000006040201, 0x0000000006040200,
+ 0x0000000000060402, 0x0000000006040100, 0x0000000000060401,
+ 0x0000000000060400, 0x0000000000000604, 0x0000000603020100,
+ 0x0000000006030201, 0x0000000006030200, 0x0000000000060302,
+ 0x0000000006030100, 0x0000000000060301, 0x0000000000060300,
+ 0x0000000000000603, 0x0000000006020100, 0x0000000000060201,
+ 0x0000000000060200, 0x0000000000000602, 0x0000000000060100,
+ 0x0000000000000601, 0x0000000000000600, 0x0000000000000006,
+ 0x0000050403020100, 0x0000000504030201, 0x0000000504030200,
+ 0x0000000005040302, 0x0000000504030100, 0x0000000005040301,
+ 0x0000000005040300, 0x0000000000050403, 0x0000000504020100,
+ 0x0000000005040201, 0x0000000005040200, 0x0000000000050402,
+ 0x0000000005040100, 0x0000000000050401, 0x0000000000050400,
+ 0x0000000000000504, 0x0000000503020100, 0x0000000005030201,
+ 0x0000000005030200, 0x0000000000050302, 0x0000000005030100,
+ 0x0000000000050301, 0x0000000000050300, 0x0000000000000503,
+ 0x0000000005020100, 0x0000000000050201, 0x0000000000050200,
+ 0x0000000000000502, 0x0000000000050100, 0x0000000000000501,
+ 0x0000000000000500, 0x0000000000000005, 0x0000000403020100,
+ 0x0000000004030201, 0x0000000004030200, 0x0000000000040302,
+ 0x0000000004030100, 0x0000000000040301, 0x0000000000040300,
+ 0x0000000000000403, 0x0000000004020100, 0x0000000000040201,
+ 0x0000000000040200, 0x0000000000000402, 0x0000000000040100,
+ 0x0000000000000401, 0x0000000000000400, 0x0000000000000004,
+ 0x0000000003020100, 0x0000000000030201, 0x0000000000030200,
+ 0x0000000000000302, 0x0000000000030100, 0x0000000000000301,
+ 0x0000000000000300, 0x0000000000000003, 0x0000000000020100,
+ 0x0000000000000201, 0x0000000000000200, 0x0000000000000002,
+ 0x0000000000000100, 0x0000000000000001, 0x0000000000000000,
+ 0x0000000000000000,
+}; //static uint64_t thintable_epi8[256]
+
+} // namespace internal
+} // namespace simdjson
+
+#endif // SIMDJSON_IMPLEMENTATION_ARM64 || SIMDJSON_IMPLEMENTATION_HASWELL || SIMDJSON_IMPLEMENTATION_WESTMERE || SIMDJSON_IMPLEMENTATION_PPC64
+/* end file src/internal/simdprune_tables.cpp */
+/* begin file src/implementation.cpp */
+#include <initializer_list>
+
+namespace simdjson {
+
+bool implementation::supported_by_runtime_system() const {
+ uint32_t required_instruction_sets = this->required_instruction_sets();
+ uint32_t supported_instruction_sets = internal::detect_supported_architectures();
+ return ((supported_instruction_sets & required_instruction_sets) == required_instruction_sets);
+}
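+// Usage sketch (illustrative): callers can check support before forcing an implementation:
+//   auto impl = simdjson::get_available_implementations()["haswell"];
+//   if (impl && impl->supported_by_runtime_system()) {
+//     simdjson::get_active_implementation() = impl;
+//   }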
+
+namespace internal {
+
+// Static array of known implementations. We're hoping these get baked into the executable
+// without requiring a static initializer.
+
+#if SIMDJSON_IMPLEMENTATION_HASWELL
+static const haswell::implementation* get_haswell_singleton() {
+ static const haswell::implementation haswell_singleton{};
+ return &haswell_singleton;
+}
+#endif
+#if SIMDJSON_IMPLEMENTATION_WESTMERE
+static const westmere::implementation* get_westmere_singleton() {
+ static const westmere::implementation westmere_singleton{};
+ return &westmere_singleton;
+}
+#endif // SIMDJSON_IMPLEMENTATION_WESTMERE
+#if SIMDJSON_IMPLEMENTATION_ARM64
+static const arm64::implementation* get_arm64_singleton() {
+ static const arm64::implementation arm64_singleton{};
+ return &arm64_singleton;
+}
+#endif // SIMDJSON_IMPLEMENTATION_ARM64
+#if SIMDJSON_IMPLEMENTATION_PPC64
+static const ppc64::implementation* get_ppc64_singleton() {
+ static const ppc64::implementation ppc64_singleton{};
+ return &ppc64_singleton;
+}
+#endif // SIMDJSON_IMPLEMENTATION_PPC64
+#if SIMDJSON_IMPLEMENTATION_FALLBACK
+static const fallback::implementation* get_fallback_singleton() {
+ static const fallback::implementation fallback_singleton{};
+ return &fallback_singleton;
+}
+#endif // SIMDJSON_IMPLEMENTATION_FALLBACK
+
+/**
+ * @private Detects best supported implementation on first use, and sets it
+ */
+class detect_best_supported_implementation_on_first_use final : public implementation {
+public:
+ const std::string &name() const noexcept final { return set_best()->name(); }
+ const std::string &description() const noexcept final { return set_best()->description(); }
+ uint32_t required_instruction_sets() const noexcept final { return set_best()->required_instruction_sets(); }
+ simdjson_warn_unused error_code create_dom_parser_implementation(
+ size_t capacity,
+ size_t max_length,
+ std::unique_ptr<internal::dom_parser_implementation>& dst
+ ) const noexcept final {
+ return set_best()->create_dom_parser_implementation(capacity, max_length, dst);
+ }
+ simdjson_warn_unused error_code minify(const uint8_t *buf, size_t len, uint8_t *dst, size_t &dst_len) const noexcept final {
+ return set_best()->minify(buf, len, dst, dst_len);
+ }
+ simdjson_warn_unused bool validate_utf8(const char * buf, size_t len) const noexcept final override {
+ return set_best()->validate_utf8(buf, len);
+ }
+ simdjson_really_inline detect_best_supported_implementation_on_first_use() noexcept : implementation("best_supported_detector", "Detects the best supported implementation and sets it", 0) {}
+private:
+ const implementation *set_best() const noexcept;
+};
+
+static const std::initializer_list<const implementation *>& get_available_implementation_pointers() {
+ static const std::initializer_list<const implementation *> available_implementation_pointers {
+#if SIMDJSON_IMPLEMENTATION_HASWELL
+ get_haswell_singleton(),
+#endif
+#if SIMDJSON_IMPLEMENTATION_WESTMERE
+ get_westmere_singleton(),
+#endif
+#if SIMDJSON_IMPLEMENTATION_ARM64
+ get_arm64_singleton(),
+#endif
+#if SIMDJSON_IMPLEMENTATION_PPC64
+ get_ppc64_singleton(),
+#endif
+#if SIMDJSON_IMPLEMENTATION_FALLBACK
+ get_fallback_singleton(),
+#endif
+ }; // available_implementation_pointers
+ return available_implementation_pointers;
+}
+
+// So we can return UNSUPPORTED_ARCHITECTURE from the parser when there is no support
+class unsupported_implementation final : public implementation {
+public:
+ simdjson_warn_unused error_code create_dom_parser_implementation(
+ size_t,
+ size_t,
+ std::unique_ptr<internal::dom_parser_implementation>&
+ ) const noexcept final {
+ return UNSUPPORTED_ARCHITECTURE;
+ }
+ simdjson_warn_unused error_code minify(const uint8_t *, size_t, uint8_t *, size_t &) const noexcept final override {
+ return UNSUPPORTED_ARCHITECTURE;
+ }
+ simdjson_warn_unused bool validate_utf8(const char *, size_t) const noexcept final override {
+ return false; // Just refuse to validate. Given that we have a fallback implementation
+ // it seems unlikely that unsupported_implementation will ever be used. If it is used,
+ // then it will flag all strings as invalid. The alternative is to return an error_code
+ // from which the user has to figure out whether the string is valid UTF-8... which seems
+ // like a lot of work just to handle the very unlikely case that we have an unsupported
+ // implementation. And, when it does happen (that we have an unsupported implementation),
+ // what are the chances that the programmer has a fallback? Given that *we* provide the
+ // fallback, it implies that the programmer would need a fallback for our fallback.
+ }
+ unsupported_implementation() : implementation("unsupported", "Unsupported CPU (no detected SIMD instructions)", 0) {}
+};
+
+const unsupported_implementation* get_unsupported_singleton() {
+ static const unsupported_implementation unsupported_singleton{};
+ return &unsupported_singleton;
+}
+
+size_t available_implementation_list::size() const noexcept {
+ return internal::get_available_implementation_pointers().size();
+}
+const implementation * const *available_implementation_list::begin() const noexcept {
+ return internal::get_available_implementation_pointers().begin();
+}
+const implementation * const *available_implementation_list::end() const noexcept {
+ return internal::get_available_implementation_pointers().end();
+}
+const implementation *available_implementation_list::detect_best_supported() const noexcept {
+ // They are prelisted in priority order, so we just go down the list
+ uint32_t supported_instruction_sets = internal::detect_supported_architectures();
+ for (const implementation *impl : internal::get_available_implementation_pointers()) {
+ uint32_t required_instruction_sets = impl->required_instruction_sets();
+ if ((supported_instruction_sets & required_instruction_sets) == required_instruction_sets) { return impl; }
+ }
+ return get_unsupported_singleton(); // this should never happen?
+}
+
+const implementation *detect_best_supported_implementation_on_first_use::set_best() const noexcept {
+ SIMDJSON_PUSH_DISABLE_WARNINGS
+ SIMDJSON_DISABLE_DEPRECATED_WARNING // Disable CRT_SECURE warning on MSVC: manually verified this is safe
+ char *force_implementation_name = getenv("SIMDJSON_FORCE_IMPLEMENTATION");
+ SIMDJSON_POP_DISABLE_WARNINGS
+
+ if (force_implementation_name) {
+ auto force_implementation = get_available_implementations()[force_implementation_name];
+ if (force_implementation) {
+ return get_active_implementation() = force_implementation;
+ } else {
+ // Note: abort() and stderr usage within the library is forbidden.
+ return get_active_implementation() = get_unsupported_singleton();
+ }
+ }
+ return get_active_implementation() = get_available_implementations().detect_best_supported();
+}
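+// Illustrative example: the environment variable above makes the choice overridable without
+// recompiling, e.g. SIMDJSON_FORCE_IMPLEMENTATION=fallback ./my_program; an unknown name
+// selects the unsupported implementation rather than aborting.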
+
+} // namespace internal
+
+SIMDJSON_DLLIMPORTEXPORT const internal::available_implementation_list& get_available_implementations() {
+ static const internal::available_implementation_list available_implementations{};
+ return available_implementations;
+}
+
+SIMDJSON_DLLIMPORTEXPORT internal::atomic_ptr<const implementation>& get_active_implementation() {
+ static const internal::detect_best_supported_implementation_on_first_use detect_best_supported_implementation_on_first_use_singleton;
+ static internal::atomic_ptr<const implementation> active_implementation{&detect_best_supported_implementation_on_first_use_singleton};
+ return active_implementation;
+}
+
+simdjson_warn_unused error_code minify(const char *buf, size_t len, char *dst, size_t &dst_len) noexcept {
+ return get_active_implementation()->minify(reinterpret_cast<const uint8_t *>(buf), len, reinterpret_cast<uint8_t *>(dst), dst_len);
+}
+simdjson_warn_unused bool validate_utf8(const char *buf, size_t len) noexcept {
+ return get_active_implementation()->validate_utf8(buf, len);
+}
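+// Usage sketch (illustrative; 'src' and 'out' are placeholder buffers):
+//   size_t out_len{};
+//   if (simdjson::minify(src, src_len, out, out_len) == simdjson::SUCCESS) { /* use out */ }
+//   bool valid = simdjson::validate_utf8(src, src_len);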
+
+const implementation * builtin_implementation() {
+ static const implementation * builtin_impl = get_available_implementations()[SIMDJSON_STRINGIFY(SIMDJSON_BUILTIN_IMPLEMENTATION)];
+ assert(builtin_impl);
+ return builtin_impl;
+}
+
+
+} // namespace simdjson
+/* end file src/implementation.cpp */
+
+#if SIMDJSON_IMPLEMENTATION_ARM64
+/* begin file src/arm64/implementation.cpp */
+/* begin file include/simdjson/arm64/begin.h */
+// redefining SIMDJSON_IMPLEMENTATION to "arm64"
+// #define SIMDJSON_IMPLEMENTATION arm64
+/* end file include/simdjson/arm64/begin.h */
+
+namespace simdjson {
+namespace arm64 {
+
+simdjson_warn_unused error_code implementation::create_dom_parser_implementation(
+ size_t capacity,
+ size_t max_depth,
+ std::unique_ptr<internal::dom_parser_implementation>& dst
+) const noexcept {
+ dst.reset( new (std::nothrow) dom_parser_implementation() );
+ if (!dst) { return MEMALLOC; }
+ if (auto err = dst->set_capacity(capacity))
+ return err;
+ if (auto err = dst->set_max_depth(max_depth))
+ return err;
+ return SUCCESS;
+}
+
+} // namespace arm64
+} // namespace simdjson
+
+/* begin file include/simdjson/arm64/end.h */
+/* end file include/simdjson/arm64/end.h */
+/* end file src/arm64/implementation.cpp */
+/* begin file src/arm64/dom_parser_implementation.cpp */
+/* begin file include/simdjson/arm64/begin.h */
+// redefining SIMDJSON_IMPLEMENTATION to "arm64"
+// #define SIMDJSON_IMPLEMENTATION arm64
+/* end file include/simdjson/arm64/begin.h */
+
+//
+// Stage 1
+//
+namespace simdjson {
+namespace arm64 {
+namespace {
+
+using namespace simd;
+
+struct json_character_block {
+ static simdjson_really_inline json_character_block classify(const simd::simd8x64<uint8_t>& in);
+
+ simdjson_really_inline uint64_t whitespace() const noexcept { return _whitespace; }
+ simdjson_really_inline uint64_t op() const noexcept { return _op; }
+ simdjson_really_inline uint64_t scalar() const noexcept { return ~(op() | whitespace()); }
+
+ uint64_t _whitespace;
+ uint64_t _op;
+};
+
+simdjson_really_inline json_character_block json_character_block::classify(const simd::simd8x64<uint8_t>& in) {
+ // Functional programming causes trouble with Visual Studio.
+ // Keeping this version in comments since it is much nicer:
+ // auto v = in.map<uint8_t>([&](simd8<uint8_t> chunk) {
+ // auto nib_lo = chunk & 0xf;
+ // auto nib_hi = chunk.shr<4>();
+ // auto shuf_lo = nib_lo.lookup_16<uint8_t>(16, 0, 0, 0, 0, 0, 0, 0, 0, 8, 12, 1, 2, 9, 0, 0);
+ // auto shuf_hi = nib_hi.lookup_16<uint8_t>(8, 0, 18, 4, 0, 1, 0, 1, 0, 0, 0, 3, 2, 1, 0, 0);
+ // return shuf_lo & shuf_hi;
+ // });
+ const simd8<uint8_t> table1(16, 0, 0, 0, 0, 0, 0, 0, 0, 8, 12, 1, 2, 9, 0, 0);
+ const simd8<uint8_t> table2(8, 0, 18, 4, 0, 1, 0, 1, 0, 0, 0, 3, 2, 1, 0, 0);
+
+ simd8x64<uint8_t> v(
+ (in.chunks[0] & 0xf).lookup_16(table1) & (in.chunks[0].shr<4>()).lookup_16(table2),
+ (in.chunks[1] & 0xf).lookup_16(table1) & (in.chunks[1].shr<4>()).lookup_16(table2),
+ (in.chunks[2] & 0xf).lookup_16(table1) & (in.chunks[2].shr<4>()).lookup_16(table2),
+ (in.chunks[3] & 0xf).lookup_16(table1) & (in.chunks[3].shr<4>()).lookup_16(table2)
+ );
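+ // Worked example (illustrative): ',' (0x2C) yields table1[0xC] & table2[0x2] = 2 & 18 = 2,
+ // flagged below by any_bits_set(0x7) as a structural 'op'; ' ' (0x20) yields 16 & 18 = 16,
+ // flagged by any_bits_set(0x18) as whitespace.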
+
+
+ // We compute whitespace and op separately. If the code later only uses one or the
+ // other, given the fact that all functions are aggressively inlined, we can
+ // hope that useless computations will be omitted. This is namely the case when
+ // minifying (we only need whitespace). *However*, if we only need spaces,
+ // it is likely that we will still compute 'v' above with two lookup_16: one
+ // could do it a bit cheaper. This is in contrast with the x64 implementations
+ // where we can, efficiently, do the white space and structural matching
+ // separately. One reason for this difference is that on ARM NEON, the table
+ // lookups either zero or leave unchanged the characters exceeding 0xF whereas
+ // on x64, the equivalent instruction (pshufb) automatically applies a mask,
+ // ignoring the 4 most significant bits. Thus the x64 implementation is
+ // optimized differently. This being said, if you use this code strictly
+ // just for minification (or just to identify the structural characters),
+ // there is a small untaken optimization opportunity here. We deliberately
+ // do not pick it up.
+
+ uint64_t op = simd8x64<bool>(
+ v.chunks[0].any_bits_set(0x7),
+ v.chunks[1].any_bits_set(0x7),
+ v.chunks[2].any_bits_set(0x7),
+ v.chunks[3].any_bits_set(0x7)
+ ).to_bitmask();
+
+ uint64_t whitespace = simd8x64<bool>(
+ v.chunks[0].any_bits_set(0x18),
+ v.chunks[1].any_bits_set(0x18),
+ v.chunks[2].any_bits_set(0x18),
+ v.chunks[3].any_bits_set(0x18)
+ ).to_bitmask();
+
+ return { whitespace, op };
+}
+
+simdjson_really_inline bool is_ascii(const simd8x64<uint8_t>& input) {
+ simd8<uint8_t> bits = input.reduce_or();
+ return bits.max_val() < 0b10000000u;
+}
+
+simdjson_unused simdjson_really_inline simd8<bool> must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2, const simd8<uint8_t> prev3) {
+ simd8<bool> is_second_byte = prev1 >= uint8_t(0b11000000u);
+ simd8<bool> is_third_byte = prev2 >= uint8_t(0b11100000u);
+ simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
+ // Use ^ instead of | for is_*_byte, because ^ is commutative, and the caller is using ^ as well.
+ // This will work fine because we only have to report errors for cases with 0-1 lead bytes.
+ // Multiple lead bytes implies 2 overlapping multibyte characters, and if that happens, there is
+ // guaranteed to be at least *one* lead byte that is part of only 1 other multibyte character.
+ // The error will be detected there.
+ return is_second_byte ^ is_third_byte ^ is_fourth_byte;
+}
+
+simdjson_really_inline simd8<bool> must_be_2_3_continuation(const simd8<uint8_t> prev2, const simd8<uint8_t> prev3) {
+ simd8<bool> is_third_byte = prev2 >= uint8_t(0b11100000u);
+ simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
+ return is_third_byte ^ is_fourth_byte;
+}
+
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdjson
+
+/* begin file src/generic/stage1/utf8_lookup4_algorithm.h */
+namespace simdjson {
+namespace arm64 {
+namespace {
+namespace utf8_validation {
+
+using namespace simd;
+
+ simdjson_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+// Bit 1 = Too Long (ASCII followed by continuation)
+// Bit 2 = Overlong 3-byte
+// Bit 4 = Surrogate
+// Bit 5 = Overlong 2-byte
+// Bit 7 = Two Continuations
+ constexpr const uint8_t TOO_SHORT = 1<<0; // 11______ 0_______
+ // 11______ 11______
+ constexpr const uint8_t TOO_LONG = 1<<1; // 0_______ 10______
+ constexpr const uint8_t OVERLONG_3 = 1<<2; // 11100000 100_____
+ constexpr const uint8_t SURROGATE = 1<<4; // 11101101 101_____
+ constexpr const uint8_t OVERLONG_2 = 1<<5; // 1100000_ 10______
+ constexpr const uint8_t TWO_CONTS = 1<<7; // 10______ 10______
+ constexpr const uint8_t TOO_LARGE = 1<<3; // 11110100 1001____
+ // 11110100 101_____
+ // 11110101 1001____
+ // 11110101 101_____
+ // 1111011_ 1001____
+ // 1111011_ 101_____
+ // 11111___ 1001____
+ // 11111___ 101_____
+ constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
+ // 11110101 1000____
+ // 1111011_ 1000____
+ // 11111___ 1000____
+ constexpr const uint8_t OVERLONG_4 = 1<<6; // 11110000 1000____
+
+ const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+ // 0_______ ________
+ TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+ TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+ // 10______ ________
+ TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+ // 1100____ ________
+ TOO_SHORT | OVERLONG_2,
+ // 1101____ ________
+ TOO_SHORT,
+ // 1110____ ________
+ TOO_SHORT | OVERLONG_3 | SURROGATE,
+ // 1111____ ________
+ TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
+ );
+ constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
+ const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
+ // ____0000 ________
+ CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+ // ____0001 ________
+ CARRY | OVERLONG_2,
+ // ____001_ ________
+ CARRY,
+ CARRY,
+
+ // ____0100 ________
+ CARRY | TOO_LARGE,
+ // ____0101 ________
+ CARRY | TOO_LARGE | TOO_LARGE_1000,
+ // ____011_ ________
+ CARRY | TOO_LARGE | TOO_LARGE_1000,
+ CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+ // ____1___ ________
+ CARRY | TOO_LARGE | TOO_LARGE_1000,
+ CARRY | TOO_LARGE | TOO_LARGE_1000,
+ CARRY | TOO_LARGE | TOO_LARGE_1000,
+ CARRY | TOO_LARGE | TOO_LARGE_1000,
+ CARRY | TOO_LARGE | TOO_LARGE_1000,
+ // ____1101 ________
+ CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+ CARRY | TOO_LARGE | TOO_LARGE_1000,
+ CARRY | TOO_LARGE | TOO_LARGE_1000
+ );
+ const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+ // ________ 0_______
+ TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+ TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+
+ // ________ 1000____
+ TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+ // ________ 1001____
+ TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+ // ________ 101_____
+ TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+ TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+
+ // ________ 11______
+ TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
+ );
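+ // Worked example (illustrative): for the overlong pair 0xC0 0x80, OVERLONG_2 is set in all
+ // three lookups (byte_1_high via high nibble 0xC, byte_1_low via low nibble 0x0, and
+ // byte_2_high via continuation nibble 0x8), so the AND below reports it as an error.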
+ return (byte_1_high & byte_1_low & byte_2_high);
+ }
+ simdjson_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
+ const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
+ simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+ simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+ simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+ simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+ return must23_80 ^ sc;
+ }
+
+ //
+ // Return nonzero if there are incomplete multibyte characters at the end of the block:
+ // e.g. if there is a 4-byte character, but it's 3 bytes from the end.
+ //
+ simdjson_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
+ // If the previous input's last 3 bytes match this, they're too short (they ended at EOF):
+ // ... 1111____ 111_____ 11______
+ static const uint8_t max_array[32] = {
+ 255, 255, 255, 255, 255, 255, 255, 255,
+ 255, 255, 255, 255, 255, 255, 255, 255,
+ 255, 255, 255, 255, 255, 255, 255, 255,
+ 255, 255, 255, 255, 255, 0b11110000u-1, 0b11100000u-1, 0b11000000u-1
+ };
+ const simd8<uint8_t> max_value(&max_array[sizeof(max_array)-sizeof(simd8<uint8_t>)]);
+ return input.gt_bits(max_value);
+ }
+
+ struct utf8_checker {
+ // If this is nonzero, there has been a UTF-8 error.
+ simd8<uint8_t> error;
+ // The last input we received
+ simd8<uint8_t> prev_input_block;
+ // Whether the last input we received was incomplete (used for ASCII fast path)
+ simd8<uint8_t> prev_incomplete;
+
+ //
+ // Check whether the current bytes are valid UTF-8.
+ //
+ simdjson_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
+ // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
+ // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
+ simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+ simd8<uint8_t> sc = check_special_cases(input, prev1);
+ this->error |= check_multibyte_lengths(input, prev_input, sc);
+ }
+
+ // The only problem that can happen at EOF is that a multibyte character is too short
+ // or a byte value too large in the last bytes: check_special_cases only checks for bytes
+ // too large in the first of two bytes.
+ simdjson_really_inline void check_eof() {
+ // If the previous block had incomplete UTF-8 characters at the end, an ASCII block can't
+ // possibly finish them.
+ this->error |= this->prev_incomplete;
+ }
+
+ simdjson_really_inline void check_next_input(const simd8x64<uint8_t>& input) {
+ if(simdjson_likely(is_ascii(input))) {
+ this->error |= this->prev_incomplete;
+ } else {
+ // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
+ static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+ "We support either two or four chunks per 64-byte block.");
+ if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+ this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+ this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+ } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+ this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+ this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+ this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+ this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+ }
+ this->prev_incomplete = is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS-1]);
+ this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS-1];
+ }
+ }
+ // do not forget to call check_eof!
+ simdjson_really_inline error_code errors() {
+ return this->error.any_bits_set_anywhere() ? error_code::UTF8_ERROR : error_code::SUCCESS;
+ }
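+ // Usage sketch (illustrative): feed 64-byte blocks, then finalize:
+ //   utf8_checker checker{};
+ //   for (/* each simd8x64<uint8_t> block */) { checker.check_next_input(block); }
+ //   checker.check_eof();
+ //   error_code err = checker.errors();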
+
+ }; // struct utf8_checker
+} // namespace utf8_validation
+
+using utf8_validation::utf8_checker;
+
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdjson
+/* end file src/generic/stage1/utf8_lookup4_algorithm.h */
+/* begin file src/generic/stage1/json_structural_indexer.h */
+// This file contains the common code every implementation uses in stage1
+// It is intended to be included multiple times and compiled multiple times
+// We assume the file in which it is included already includes
+// "simdjson/stage1.h" (this simplifies amalgation)
+
+/* begin file src/generic/stage1/buf_block_reader.h */
+namespace simdjson {
+namespace arm64 {
+namespace {
+
+// Walks through a buffer in block-sized increments, loading the last part with spaces
+template<size_t STEP_SIZE>
+struct buf_block_reader {
+public:
+ simdjson_really_inline buf_block_reader(const uint8_t *_buf, size_t _len);
+ simdjson_really_inline size_t block_index();
+ simdjson_really_inline bool has_full_block() const;
+ simdjson_really_inline const uint8_t *full_block() const;
+ /**
+ * Get the last block, padded with spaces.
+ *
+ * There will always be a last block, with at least 1 byte, unless len == 0 (in which case this
+ * function fills the buffer with spaces and returns 0). In particular, if len == STEP_SIZE there
+ * will be 0 full_blocks and 1 remainder block with STEP_SIZE bytes and no spaces for padding.
+ *
+ * @return the number of effective characters in the last block.
+ */
+ simdjson_really_inline size_t get_remainder(uint8_t *dst) const;
+ simdjson_really_inline void advance();
+private:
+ const uint8_t *buf;
+ const size_t len;
+ const size_t lenminusstep;
+ size_t idx;
+};
+
+// Routines to print masks and text for debugging bitmask operations
+simdjson_unused static char * format_input_text_64(const uint8_t *text) {
+ static char buf[sizeof(simd8x64<uint8_t>) + 1];
+ for (size_t i=0; i<sizeof(simd8x64<uint8_t>); i++) {
+ buf[i] = int8_t(text[i]) < ' ' ? '_' : int8_t(text[i]);
+ }
+ buf[sizeof(simd8x64<uint8_t>)] = '\0';
+ return buf;
+}
+
+// Routines to print masks and text for debugging bitmask operations
+simdjson_unused static char * format_input_text(const simd8x64<uint8_t>& in) {
+ static char buf[sizeof(simd8x64<uint8_t>) + 1];
+ in.store(reinterpret_cast<uint8_t*>(buf));
+ for (size_t i=0; i<sizeof(simd8x64<uint8_t>); i++) {
+ if (buf[i] < ' ') { buf[i] = '_'; }
+ }
+ buf[sizeof(simd8x64<uint8_t>)] = '\0';
+ return buf;
+}
+
+simdjson_unused static char * format_mask(uint64_t mask) {
+ static char buf[sizeof(simd8x64<uint8_t>) + 1];
+ for (size_t i=0; i<64; i++) {
+ buf[i] = (mask & (size_t(1) << i)) ? 'X' : ' ';
+ }
+ buf[64] = '\0';
+ return buf;
+}
+
+template<size_t STEP_SIZE>
+simdjson_really_inline buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len) : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE}, idx{0} {}
+
+template<size_t STEP_SIZE>
+simdjson_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() { return idx; }
+
+template<size_t STEP_SIZE>
+simdjson_really_inline bool buf_block_reader<STEP_SIZE>::has_full_block() const {
+ return idx < lenminusstep;
+}
+
+template<size_t STEP_SIZE>
+simdjson_really_inline const uint8_t *buf_block_reader<STEP_SIZE>::full_block() const {
+ return &buf[idx];
+}
+
+template<size_t STEP_SIZE>
+simdjson_really_inline size_t buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
+ if(len == idx) { return 0; } // memcpy(dst, null, 0) will trigger an error with some sanitizers
+ std::memset(dst, 0x20, STEP_SIZE); // std::memset STEP_SIZE because it's more efficient to write out 8 or 16 bytes at once.
+ std::memcpy(dst, buf + idx, len - idx);
+ return len - idx;
+}
+
+template<size_t STEP_SIZE>
+simdjson_really_inline void buf_block_reader<STEP_SIZE>::advance() {
+ idx += STEP_SIZE;
+}
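+// Usage sketch (illustrative; a STEP_SIZE of 64 is assumed):
+//   buf_block_reader<64> reader(buf, len);
+//   while (reader.has_full_block()) { process(reader.full_block()); reader.advance(); }
+//   uint8_t block[64];
+//   if (reader.get_remainder(block) > 0) { process(block); }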
+
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdjson
+/* end file src/generic/stage1/buf_block_reader.h */
+/* begin file src/generic/stage1/json_string_scanner.h */
+namespace simdjson {
+namespace arm64 {
+namespace {
+namespace stage1 {
+
+struct json_string_block {
+ // We spell out the constructors in the hope of resolving inlining issues with Visual Studio 2017
+ simdjson_really_inline json_string_block(uint64_t backslash, uint64_t escaped, uint64_t quote, uint64_t in_string) :
+ _backslash(backslash), _escaped(escaped), _quote(quote), _in_string(in_string) {}
+
+ // Escaped characters (characters following an escape() character)
+ simdjson_really_inline uint64_t escaped() const { return _escaped; }
+ // Escape characters (backslashes that are not escaped--i.e. in \\, includes only the first \)
+ simdjson_really_inline uint64_t escape() const { return _backslash & ~_escaped; }
+ // Real (non-backslashed) quotes
+ simdjson_really_inline uint64_t quote() const { return _quote; }
+ // Start quotes of strings
+ simdjson_really_inline uint64_t string_start() const { return _quote & _in_string; }
+ // End quotes of strings
+ simdjson_really_inline uint64_t string_end() const { return _quote & ~_in_string; }
+ // Only characters inside the string (not including the quotes)
+ simdjson_really_inline uint64_t string_content() const { return _in_string & ~_quote; }
+ // Return a mask of whether the given characters are inside a string (only works on non-quotes)
+ simdjson_really_inline uint64_t non_quote_inside_string(uint64_t mask) const { return mask & _in_string; }
+ // Return a mask of whether the given characters are inside a string (only works on non-quotes)
+ simdjson_really_inline uint64_t non_quote_outside_string(uint64_t mask) const { return mask & ~_in_string; }
+ // Tail of string (everything except the start quote)
+ simdjson_really_inline uint64_t string_tail() const { return _in_string ^ _quote; }
+
+ // backslash characters
+ uint64_t _backslash;
+ // escaped characters (backslashed--does not include the hex characters after \u)
+ uint64_t _escaped;
+ // real quotes (non-backslashed ones)
+ uint64_t _quote;
+ // string characters (includes start quote but not end quote)
+ uint64_t _in_string;
+};
+
+// Scans blocks for string characters, storing the state necessary to do so
+class json_string_scanner {
+public:
+ simdjson_really_inline json_string_block next(const simd::simd8x64<uint8_t>& in);
+ // Returns either UNCLOSED_STRING or SUCCESS
+ simdjson_really_inline error_code finish();
+
+private:
+ // Intended to be defined by the implementation
+ simdjson_really_inline uint64_t find_escaped(uint64_t escape);
+ simdjson_really_inline uint64_t find_escaped_branchless(uint64_t escape);
+
+ // Whether the last iteration was still inside a string (all 1's = true, all 0's = false).
+ uint64_t prev_in_string = 0ULL;
+ // Whether the first character of the next iteration is escaped.
+ uint64_t prev_escaped = 0ULL;
+};
+
+//
+// Finds escaped characters (characters following \).
+//
+// Handles runs of backslashes like \\\" and \\\\" correctly (yielding 0101 and 01010, respectively).
+//
+// Does this by:
+// - Shift the escape mask to get potentially escaped characters (characters after backslashes).
+// - Mask escaped sequences that start on *even* bits with 1010101010 (odd bits are escaped, even bits are not)
+// - Mask escaped sequences that start on *odd* bits with 0101010101 (even bits are escaped, odd bits are not)
+//
+// To distinguish between escaped sequences starting on even/odd bits, it finds the start of all
+// escape sequences, filters out the ones that start on even bits, and adds that to the mask of
+// escape sequences. This causes the addition to clear out the sequences starting on odd bits (since
+// the start bit causes a carry), and leaves even-bit sequences alone.
+//
+// Example:
+//
+// text | \\\ | \\\"\\\" \\\" \\"\\" |
+// escape | xxx | xx xxx xxx xx xx | Removed overflow backslash; will | it into follows_escape
+// odd_starts | x | x x x | escape & ~even_bits & ~follows_escape
+// even_seq | c| cxxx c xx c | c = carry bit -- will be masked out later
+// invert_mask | | cxxx c xx c| even_seq << 1
+// follows_escape | xx | x xx xxx xxx xx xx | Includes overflow bit
+// escaped | x | x x x x x x x x |
+// desired | x | x x x x x x x x |
+// text | \\\ | \\\"\\\" \\\" \\"\\" |
+//
+simdjson_really_inline uint64_t json_string_scanner::find_escaped_branchless(uint64_t backslash) {
+ // If there was overflow, pretend the first character isn't a backslash
+ backslash &= ~prev_escaped;
+ uint64_t follows_escape = backslash << 1 | prev_escaped;
+
+ // Get sequences starting on even bits by clearing out the odd series using +
+ const uint64_t even_bits = 0x5555555555555555ULL;
+ uint64_t odd_sequence_starts = backslash & ~even_bits & ~follows_escape;
+ uint64_t sequences_starting_on_even_bits;
+ prev_escaped = add_overflow(odd_sequence_starts, backslash, &sequences_starting_on_even_bits);
+ uint64_t invert_mask = sequences_starting_on_even_bits << 1; // The mask we want to return is the *escaped* bits, not escapes.
+
+ // Mask every other backslashed character as an escaped character
+ // Flip the mask for sequences that start on even bits, to correct them
+ return (even_bits ^ invert_mask) & follows_escape;
+}
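+
+// Worked trace (an illustration added for this write-up): input a\\" with
+// backslashes at bits 1-2 and prev_escaped == 0 (bit 0 on the left):
+//
+//   backslash                         . x x .
+//   follows_escape (backslash << 1)   . . x x
+//   odd_sequence_starts               . x . .
+//   odd_starts + backslash            . . . x    the carry clears the odd run
+//   invert_mask (<< 1)                . . . . x
+//   even_bits ^ invert_mask           x . x . . . x .   bit 4 flipped off
+//   ... & follows_escape              . . x .
+//
+// Only bit 2 (the second backslash) is escaped; the quote at bit 3 is real.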
+
+//
+// Return a mask of all string characters plus end quotes.
+//
+// prev_escaped is overflow saying whether the next character is escaped.
+// prev_in_string is overflow saying whether we're still in a string.
+//
+// Backslash sequences outside of quotes will be detected in stage 2.
+//
+simdjson_really_inline json_string_block json_string_scanner::next(const simd::simd8x64<uint8_t>& in) {
+ const uint64_t backslash = in.eq('\\');
+ const uint64_t escaped = find_escaped(backslash);
+ const uint64_t quote = in.eq('"') & ~escaped;
+
+ //
+ // prefix_xor flips on bits inside the string (and flips off the end quote).
+ //
+ // Then we xor with prev_in_string: if we were in a string already, its effect is flipped
+ // (characters inside strings are outside, and characters outside strings are inside).
+ //
+ const uint64_t in_string = prefix_xor(quote) ^ prev_in_string;
+
+ //
+ // Check if we're still in a string at the end of the block so the next block will know
+ //
+ // Right shift of a signed value is expected to be well-defined and standard
+ // compliant as of C++20; John Regehr from Utah U. says this is fine code.
+ //
+ prev_in_string = uint64_t(static_cast<int64_t>(in_string) >> 63);
+
+ // Use ^ to turn the beginning quote off, and the end quote on.
+
+ // We are returning a function-local object so either we get a move constructor
+ // or we get copy elision.
+ return json_string_block(
+ backslash,
+ escaped,
+ quote,
+ in_string
+ );
+}
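+
+// Note (added for this write-up): if a block ends mid-string, the top bit of
+// in_string is set, the arithmetic shift above smears it across all 64 bits,
+// and prev_in_string becomes all 1's; the next block's prefix_xor result is
+// then fully inverted, so its leading characters correctly start out "inside".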
+
+simdjson_really_inline error_code json_string_scanner::finish() {
+ if (prev_in_string) {
+ return UNCLOSED_STRING;
+ }
+ return SUCCESS;
+}
+
+} // namespace stage1
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdjson
+/* end file src/generic/stage1/json_string_scanner.h */
+/* begin file src/generic/stage1/json_scanner.h */
+namespace simdjson {
+namespace arm64 {
+namespace {
+namespace stage1 {
+
+/**
+ * A block of scanned json, with information on operators and scalars.
+ *
+ * We seek to identify pseudo-structural characters. Anything that is inside
+ * a string must be omitted (hence & ~_string.string_tail()).
+ * Otherwise, pseudo-structural characters come in two forms.
+ * 1. We have the structural characters ([,],{,},:, comma). The
+ * term 'structural character' is from the JSON RFC.
+ * 2. We have the 'scalar pseudo-structural characters'.
+ * Scalars are quotes, and any character except structural characters and white space.
+ *
+ * To identify the scalar pseudo-structural characters, we must look at what comes
+ * before them: it must be a space, a quote or a structural character.
+ * Starting with simdjson v0.3, we identify them by
+ * negation: we identify everything that is followed by a non-quote scalar,
+ * and we negate that. Whatever remains must be a 'scalar pseudo-structural character'.
+ */
+struct json_block {
+public:
+ // We spell out the constructors in the hope of resolving inlining issues with Visual Studio 2017
+ simdjson_really_inline json_block(json_string_block&& string, json_character_block characters, uint64_t follows_potential_nonquote_scalar) :
+ _string(std::move(string)), _characters(characters), _follows_potential_nonquote_scalar(follows_potential_nonquote_scalar) {}
+ simdjson_really_inline json_block(json_string_block string, json_character_block characters, uint64_t follows_potential_nonquote_scalar) :
+ _string(string), _characters(characters), _follows_potential_nonquote_scalar(follows_potential_nonquote_scalar) {}
+
+ /**
+ * The start of structurals.
+ * In simdjson prior to v0.3, these were called the pseudo-structural characters.
+ **/
+ simdjson_really_inline uint64_t structural_start() const noexcept { return potential_structural_start() & ~_string.string_tail(); }
+ /** All JSON whitespace (i.e. not in a string) */
+ simdjson_really_inline uint64_t whitespace() const noexcept { return non_quote_outside_string(_characters.whitespace()); }
+
+ // Helpers
+
+ /** Whether the given characters are inside a string (only works on non-quotes) */
+ simdjson_really_inline uint64_t non_quote_inside_string(uint64_t mask) const noexcept { return _string.non_quote_inside_string(mask); }
+ /** Whether the given characters are outside a string (only works on non-quotes) */
+ simdjson_really_inline uint64_t non_quote_outside_string(uint64_t mask) const noexcept { return _string.non_quote_outside_string(mask); }
+
+ // string and escape characters
+ json_string_block _string;
+ // whitespace, structural characters ('operators'), scalars
+ json_character_block _characters;
+ // whether the previous character was a scalar
+ uint64_t _follows_potential_nonquote_scalar;
+private:
+ // Potential structurals (i.e. disregarding strings)
+
+ /**
+ * structural elements ([,],{,},:, comma) plus scalar starts like 123, true and "abc".
+ * They may reside inside a string.
+ **/
+ simdjson_really_inline uint64_t potential_structural_start() const noexcept { return _characters.op() | potential_scalar_start(); }
+ /**
+ * The start of non-operator runs, like 123, true and "abc".
+ * It may reside inside a string.
+ **/
+ simdjson_really_inline uint64_t potential_scalar_start() const noexcept {
+ // The term "scalar" refers to anything except structural characters and white space
+ // (so letters, numbers, quotes).
+ // Whenever it is preceded by something that is not a structural element ({,},[,],:, ") nor a white-space
+ // then we know that it is irrelevant structurally.
+ return _characters.scalar() & ~follows_potential_scalar();
+ }
+ /**
+ * Whether the given character is immediately after a non-operator like 123, true.
+ * The characters following a quote are not included.
+ */
+ simdjson_really_inline uint64_t follows_potential_scalar() const noexcept {
+ // _follows_potential_nonquote_scalar: is defined as marking any character that follows a character
+ // that is not a structural element ({,},[,],:, comma) nor a quote (") and that is not a
+ // white space.
+ // It is understood that within a quoted region, anything at all could be marked (irrelevant).
+ return _follows_potential_nonquote_scalar;
+ }
+};
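+
+// Worked example (an illustration added for this write-up): for the input
+// [10, true] with no strings involved (bit 0 on the left, _ is a space):
+//
+//                              [ 1 0 , _ t r u e ]
+//   op                         x . . x . . . . . x
+//   scalar                     . x x . . x x x x .
+//   follows_potential_scalar   . . x x . . x x x x
+//   potential_scalar_start     . x . . . x . . . .
+//   structural_start           x x . x . x . . . x
+//
+// These are exactly the positions stage 2 needs: the brackets, the comma, and
+// the first character of each scalar token.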
+
+/**
+ * Scans JSON for important bits: structural characters or 'operators', strings, and scalars.
+ *
+ * The scanner starts by calculating two distinct things:
+ * - string characters (taking \" into account)
+ * - structural characters or 'operators' ([]{},:, comma)
+ * and scalars (runs of non-operators like 123, true and "abc")
+ *
+ * To minimize data dependency (a key component of the scanner's speed), it finds these in parallel:
+ * in particular, the operator/scalar bit will find plenty of things that are actually part of
+ * strings. When we're done, json_block will fuse the two together by masking out tokens that are
+ * part of a string.
+ */
+class json_scanner {
+public:
+ json_scanner() {}
+ simdjson_really_inline json_block next(const simd::simd8x64<uint8_t>& in);
+ // Returns either UNCLOSED_STRING or SUCCESS
+ simdjson_really_inline error_code finish();
+
+private:
+ // Whether the last character of the previous iteration is part of a scalar token
+ // (anything except whitespace or a structural character/'operator').
+ uint64_t prev_scalar = 0ULL;
+ json_string_scanner string_scanner{};
+};
+
+
+//
+// Check if the current character immediately follows a matching character.
+//
+// For example, this checks for quotes with backslashes in front of them:
+//
+// const uint64_t backslashed_quote = in.eq('"') & immediately_follows(in.eq('\\'), prev_backslash);
+//
+simdjson_really_inline uint64_t follows(const uint64_t match, uint64_t &overflow) {
+ const uint64_t result = match << 1 | overflow;
+ overflow = match >> 63;
+ return result;
+}
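+
+// A minimal sketch of how the overflow threads a match across two consecutive
+// blocks (illustrative only; added for this write-up):
+#if 0
+uint64_t overflow = 0;
+uint64_t r1 = follows(1ULL << 63, overflow);
+// r1 == 0: no bit inside this block follows the match in bit 63;
+// overflow == 1: the match spills into the next block.
+uint64_t r2 = follows(0, overflow);
+// r2 == 1: bit 0 of the next block follows the block-crossing match.
+#endif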
+
+simdjson_really_inline json_block json_scanner::next(const simd::simd8x64<uint8_t>& in) {
+ json_string_block strings = string_scanner.next(in);
+ // identifies the white-space and the structural characters
+ json_character_block characters = json_character_block::classify(in);
+ // The term "scalar" refers to anything except structural characters and white space
+ // (so letters, numbers, quotes).
+ // We want follows_scalar to mark anything that follows a non-quote scalar (so letters and numbers).
+ //
+ // A terminal quote should either be followed by a structural character (comma, brace, bracket, colon)
+ // or nothing. However, we still want ' "a string"true ' to mark the 't' of 'true' as a potential
+ // pseudo-structural character just like we would if we had ' "a string" true '; otherwise we
+ // may need to add an extra check when parsing strings.
+ //
+ // Performance: there are many ways to skin this cat.
+ const uint64_t nonquote_scalar = characters.scalar() & ~strings.quote();
+ uint64_t follows_nonquote_scalar = follows(nonquote_scalar, prev_scalar);
+ // We are returning a function-local object so either we get a move constructor
+ // or we get copy elision.
+ return json_block(
+ strings,// strings is a function-local object so either it moves or the copy is elided.
+ characters,
+ follows_nonquote_scalar
+ );
+}
+
+simdjson_really_inline error_code json_scanner::finish() {
+ return string_scanner.finish();
+}
+
+} // namespace stage1
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdjson
+/* end file src/generic/stage1/json_scanner.h */
+/* begin file src/generic/stage1/json_minifier.h */
+// This file contains the common code every implementation uses in stage1
+// It is intended to be included multiple times and compiled multiple times
+// We assume the file in which it is included already includes
+// "simdjson/stage1.h" (this simplifies amalgation)
+
+namespace simdjson {
+namespace arm64 {
+namespace {
+namespace stage1 {
+
+class json_minifier {
+public:
+ template<size_t STEP_SIZE>
+ static error_code minify(const uint8_t *buf, size_t len, uint8_t *dst, size_t &dst_len) noexcept;
+
+private:
+ simdjson_really_inline json_minifier(uint8_t *_dst)
+ : dst{_dst}
+ {}
+ template<size_t STEP_SIZE>
+ simdjson_really_inline void step(const uint8_t *block_buf, buf_block_reader<STEP_SIZE> &reader) noexcept;
+ simdjson_really_inline void next(const simd::simd8x64<uint8_t>& in, const json_block& block);
+ simdjson_really_inline error_code finish(uint8_t *dst_start, size_t &dst_len);
+ json_scanner scanner{};
+ uint8_t *dst;
+};
+
+simdjson_really_inline void json_minifier::next(const simd::simd8x64<uint8_t>& in, const json_block& block) {
+ uint64_t mask = block.whitespace();
+ dst += in.compress(mask, dst);
+}
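+
+// Worked example (an illustration added for this write-up): for a 64-byte
+// block holding { "a" : 1 } padded with trailing spaces, block.whitespace()
+// marks every space outside the string, and in.compress(mask, dst) writes
+// only the surviving bytes -- {"a":1} -- returning the count written (7),
+// which advances dst.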
+
+simdjson_really_inline error_code json_minifier::finish(uint8_t *dst_start, size_t &dst_len) {
+ error_code error = scanner.finish();
+ if (error) { dst_len = 0; return error; }
+ dst_len = dst - dst_start;
+ return SUCCESS;
+}
+
+template<>
+simdjson_really_inline void json_minifier::step<128>(const uint8_t *block_buf, buf_block_reader<128> &reader) noexcept {
+ simd::simd8x64<uint8_t> in_1(block_buf);
+ simd::simd8x64<uint8_t> in_2(block_buf+64);
+ json_block block_1 = scanner.next(in_1);
+ json_block block_2 = scanner.next(in_2);
+ this->next(in_1, block_1);
+ this->next(in_2, block_2);
+ reader.advance();
+}
+
+template<>
+simdjson_really_inline void json_minifier::step<64>(const uint8_t *block_buf, buf_block_reader<64> &reader) noexcept {
+ simd::simd8x64<uint8_t> in_1(block_buf);
+ json_block block_1 = scanner.next(in_1);
+ this->next(in_1, block_1); // reuse the block we already loaded instead of implicitly re-converting block_buf
+ reader.advance();
+}
+
+template<size_t STEP_SIZE>
+error_code json_minifier::minify(const uint8_t *buf, size_t len, uint8_t *dst, size_t &dst_len) noexcept {
+ buf_block_reader<STEP_SIZE> reader(buf, len);
+ json_minifier minifier(dst);
+
+ // Minify the first n-1 blocks
+ while (reader.has_full_block()) {
+ minifier.step<STEP_SIZE>(reader.full_block(), reader);
+ }
+
+ // Minify the last (remainder) block, padded with spaces
+ uint8_t block[STEP_SIZE];
+ size_t remaining_bytes = reader.get_remainder(block);
+ if (remaining_bytes > 0) {
+ // We do not want to write directly to the output stream. Rather, we write
+ // to a local buffer (for safety).
+ uint8_t out_block[STEP_SIZE];
+ uint8_t * const guarded_dst{minifier.dst};
+ minifier.dst = out_block;
+ minifier.step<STEP_SIZE>(block, reader);
+ size_t to_write = minifier.dst - out_block;
+ // In some cases, we could be enticed to consider the padded spaces
+ // as part of the string. This is fine as long as we do not write more
+ // than we consumed.
+ if(to_write > remaining_bytes) { to_write = remaining_bytes; }
+ memcpy(guarded_dst, out_block, to_write);
+ minifier.dst = guarded_dst + to_write;
+ }
+ return minifier.finish(dst, dst_len);
+}
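+
+// A usage sketch of the public entry point that dispatches to this routine
+// (illustrative only; added for this write-up and assuming the
+// simdjson::minify(const char *, size_t, char *, size_t &) overload declared
+// in the public headers):
+#if 0
+#include <cstdio>
+#include <cstring>
+#include <memory>
+const char *json = "{ \"a\" : [ 1, 2 ] }";
+size_t len = std::strlen(json);
+std::unique_ptr<char[]> out(new char[len]); // minified output is never longer
+size_t out_len{};
+if (simdjson::minify(json, len, out.get(), out_len) == simdjson::SUCCESS) {
+  std::printf("%.*s\n", int(out_len), out.get()); // prints {"a":[1,2]}
+}
+#endif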
+
+} // namespace stage1
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdjson
+/* end file src/generic/stage1/json_minifier.h */
+/* begin file src/generic/stage1/find_next_document_index.h */
+namespace simdjson {
+namespace arm64 {
+namespace {
+
+/**
+ * This algorithm is used to quickly identify the last structural position that
+ * makes up a complete document.
+ *
+ * It does this by going backwards and finding the last *document boundary* (a
+ * place where one value follows another without a comma between them). If the
+ * last document (the characters after the boundary) has an equal number of
+ * start and end brackets, it is considered complete.
+ *
+ * Simply put, we iterate over the structural characters, starting from
+ * the end. We consider that we found the end of a JSON document when the
+ * first element of the pair is NOT one of these characters: '{' '[' ':' ','
+ * and when the second element is NOT one of these characters: '}' ']' ':' ','.
+ *
+ * This simple comparison works most of the time, but it does not cover cases
+ * where the batch's structural indexes contain a perfect amount of documents.
+ * In such a case, we do not have access to the structural index which follows
+ * the last document, therefore, we do not have access to the second element in
+ * the pair, and that means we cannot identify the last document. To fix this
+ * issue, we keep a count of the open and closed curly/square braces we found
+ * while searching for the pair. When we find a pair AND the count of open and
+ * closed curly/square braces is the same, we know that we just passed a
+ * complete document, therefore the last json buffer location is the end of the
+ * batch.
+ */
+simdjson_really_inline uint32_t find_next_document_index(dom_parser_implementation &parser) {
+ // Variant: do not count separately, just figure out depth
+ if(parser.n_structural_indexes == 0) { return 0; }
+ auto arr_cnt = 0;
+ auto obj_cnt = 0;
+ for (auto i = parser.n_structural_indexes - 1; i > 0; i--) {
+ auto idxb = parser.structural_indexes[i];
+ switch (parser.buf[idxb]) {
+ case ':':
+ case ',':
+ continue;
+ case '}':
+ obj_cnt--;
+ continue;
+ case ']':
+ arr_cnt--;
+ continue;
+ case '{':
+ obj_cnt++;
+ break;
+ case '[':
+ arr_cnt++;
+ break;
+ }
+ auto idxa = parser.structural_indexes[i - 1];
+ switch (parser.buf[idxa]) {
+ case '{':
+ case '[':
+ case ':':
+ case ',':
+ continue;
+ }
+ // Last document is complete, so the next document will appear after!
+ if (!arr_cnt && !obj_cnt) {
+ return parser.n_structural_indexes;
+ }
+ // Last document is incomplete; mark the document at i + 1 as the next one
+ return i;
+ }
+ // If we made it to the end, we want to finish counting to see if we have a full document.
+ switch (parser.buf[parser.structural_indexes[0]]) {
+ case '}':
+ obj_cnt--;
+ break;
+ case ']':
+ arr_cnt--;
+ break;
+ case '{':
+ obj_cnt++;
+ break;
+ case '[':
+ arr_cnt++;
+ break;
+ }
+ if (!arr_cnt && !obj_cnt) {
+ // We have a complete document.
+ return parser.n_structural_indexes;
+ }
+ return 0;
+}
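+
+// Worked example (an illustration added for this write-up): for the truncated
+// batch {"a":1} {"b":2 the loop walks the structural characters backwards,
+// updating the brace counts. It stops at the pair ('}', '{'): a '}' ending one
+// value directly followed by a '{' starting another, i.e. a document boundary.
+// Since the dangling {"b":2 has left obj_cnt at 1, the counts are unbalanced,
+// so the index of that '{' is returned and the incomplete document is deferred
+// to the next batch. Had the batch ended exactly at {"a":1}, the bookkeeping
+// after the loop would find balanced counts and return n_structural_indexes.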
+
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdjson
+/* end file src/generic/stage1/find_next_document_index.h */
+
+namespace simdjson {
+namespace arm64 {
+namespace {
+namespace stage1 {
+
+class bit_indexer {
+public:
+ uint32_t *tail;
+
+ simdjson_really_inline bit_indexer(uint32_t *index_buf) : tail(index_buf) {}
+
+ // flatten out values in 'bits' assuming that they are to have values of idx
+ // plus their position in the bitvector, and store these indexes at
+ // base_ptr[base] incrementing base as we go
+ // will potentially store extra values beyond end of valid bits, so base_ptr
+ // needs to be large enough to handle this
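+ //
+ // Worked example (an illustration added for this write-up): write(128,
+ // 0b1001) appends 128 + 0 and 128 + 3 -- the absolute positions of the two
+ // set bits -- to the tail, which then advances by the popcount (2).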
+ simdjson_really_inline void write(uint32_t idx, uint64_t bits) {
+ // In some instances, the next branch is expensive because it is mispredicted.
+ // Unfortunately, in other cases,
+ // it helps tremendously.
+ if (bits == 0)
+ return;
+#if defined(SIMDJSON_PREFER_REVERSE_BITS)
+ /**
+ * ARM lacks a fast trailing zero instruction, but it has a fast
+ * bit reversal instruction and a fast leading zero instruction.
+ * Thus it may be profitable to reverse the bits (once) and then
+ * to rely on a sequence of instructions that call the leading
+ * zero instruction.
+ *
+ * Performance notes:
+ * The chosen routine is not optimal in terms of data dependency
+ * since zero_leading_bit might require two instructions. However,
+ * it tends to minimize the total number of instructions which is
+ * beneficial.
+ */
+
+ uint64_t rev_bits = reverse_bits(bits);
+ int cnt = static_cast<int>(count_ones(bits));
+ int i = 0;
+ // Do the first 8 all together
+ for (; i<8; i++) {
+ int lz = leading_zeroes(rev_bits);
+ this->tail[i] = static_cast<uint32_t>(idx) + lz;
+ rev_bits = zero_leading_bit(rev_bits, lz);
+ }
+ // Do the next 8 all together (we hope in most cases it won't happen at all
+ // and the branch is easily predicted).
+ if (simdjson_unlikely(cnt > 8)) {
+ i = 8;
+ for (; i<16; i++) {
+ int lz = leading_zeroes(rev_bits);
+ this->tail[i] = static_cast