Merge branch 'develop' into dac

commit b248cf9b69

@ -33,7 +33,7 @@ If applicable, add screenshots to help explain your problem.
- Python Version [e.g. 3.7]
- PaddlePaddle Version [e.g. 2.0.0]
- Model Version [e.g. 2.0.0]
- GPU/DRIVER Informationo [e.g. Tesla V100-SXM2-32GB/440.64.00]
- GPU/DRIVER Information [e.g. Tesla V100-SXM2-32GB/440.64.00]
- CUDA/CUDNN Version [e.g. cuda-10.2]
- MKL Version
- TensorRT Version

@ -32,7 +32,7 @@ If applicable, add screenshots to help explain your problem.
- Python Version [e.g. 3.7]
- PaddlePaddle Version [e.g. 2.0.0]
- Model Version [e.g. 2.0.0]
- GPU/DRIVER Informationo [e.g. Tesla V100-SXM2-32GB/440.64.00]
- GPU/DRIVER Information [e.g. Tesla V100-SXM2-32GB/440.64.00]
- CUDA/CUDNN Version [e.g. cuda-10.2]
- MKL Version
- TensorRT Version

@ -46,14 +46,14 @@
<tbody>
<tr>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200 style="max-width: 100%;"></a><br>
</td>
<td >I knocked at the door on the ancient side of the building.</td>
</tr>
<tr>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
<td>我认为跑步最重要的就是给我带来了身体健康。</td>
@ -76,7 +76,7 @@
<tbody>
<tr>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200 style="max-width: 100%;"></a><br>
</td>
<td >我 在 这栋 建筑 的 古老 门上 敲门。</td>
@ -99,42 +99,42 @@
<tr>
<td>Life was like a box of chocolates, you never know what you're gonna get.</td>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/tacotron2_ljspeech_waveflow_samples_0.2/sentence_1.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/tacotron2_ljspeech_waveflow_samples_0.2/sentence_1.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
<tr>
<td>早上好今天是2020/10/29最低温度是-3°C。</td>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
<tr>
<td>季姬寂,集鸡,鸡即棘鸡。棘鸡饥叽,季姬及箕稷济鸡。鸡既济,跻姬笈,季姬忌,急咭鸡,鸡急,继圾几,季姬急,即籍箕击鸡,箕疾击几伎,伎即齑,鸡叽集几基,季姬急极屐击鸡,鸡既殛,季姬激,即记《季姬击鸡记》。</td>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/jijiji.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/jijiji.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
<tr>
<td>大家好,我是 parrot 虚拟老师我们来读一首诗我与春风皆过客I and the spring breeze are passing by你携秋水揽星河you take the autumn water to take the galaxy。</td>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/labixiaoxin.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/labixiaoxin.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
<tr>
<td>宜家唔系事必要你讲,但系你所讲嘅说话将会变成呈堂证供。</td>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/chengtangzhenggong.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/chengtangzhenggong.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
<tr>
<td>各个国家有各个国家嘅国歌</td>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/gegege.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/gegege.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
@ -173,7 +173,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🏆 **Streaming ASR and TTS System**: we provide production-ready streaming ASR and streaming TTS systems.
- 💯 **Rule-based Chinese frontend**: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt to Chinese context.
- 📦 **Varieties of Functions that Vitalize both Industry and Academia**:
- 🛎️ *Implementation of critical audio tasks*: this toolkit contains audio functions like Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verfication, KeyWord Spotting, Audio Classification, and Speech Translation, etc.
- 🛎️ *Implementation of critical audio tasks*: this toolkit contains audio functions like Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verification, KeyWord Spotting, Audio Classification, and Speech Translation, etc.
- 🔬 *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details.
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).
@ -228,12 +228,12 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
## Installation
We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.8* and *paddlepaddle<=2.5.1*. Some new versions of Paddle do not have support for adaptation in PaddleSpeech, so currently only versions 2.5.1 and earlier can be supported.
We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.8*.
### **Dependency Introduction**
+ gcc >= 4.8.5
+ paddlepaddle <= 2.5.1
+ paddlepaddle
+ python >= 3.8
+ OS support: Linux (recommended), Windows, Mac OSX
@ -265,6 +265,8 @@ git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech
pip install pytest-runner
pip install .
# If you need to install in editable mode, you need to use --use-pep517. The command is as follows:
# pip install -e . --use-pep517
```
For more installation problems, such as conda environment, librosa-dependent, gcc problems, kaldi installation, etc., you can refer to this [installation document](./docs/source/install.md). If you encounter problems during installation, you can leave a message on [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) and find related problems
@ -281,8 +283,8 @@ Developers can have a try of our models with [PaddleSpeech Command Line](./paddl
Test audio sample download
```shell
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
```
### Automatic Speech Recognition
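As a quick smoke test, a minimal sketch using the Python API (this mirrors the command-line usage; `ASRExecutor` and the `zh.wav` sample downloaded above are assumed to be available):
```python
from paddlespeech.cli.asr.infer import ASRExecutor

asr = ASRExecutor()
# transcribe the Mandarin test clip downloaded above
text = asr(audio_file="zh.wav")
print(text)
```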
@ -1023,7 +1025,7 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P
- Many thanks to [vpegasus](https://github.com/vpegasus)/[xuesebot](https://github.com/vpegasus/xuesebot) for developing a rasa chatbot,which is able to speak and listen thanks to PaddleSpeech.
- Many thanks to [chenkui164](https://github.com/chenkui164)/[FastASR](https://github.com/chenkui164/FastASR) for the C++ inference implementation of PaddleSpeech ASR.
- Many thanks to [heyudage](https://github.com/heyudage)/[VoiceTyping](https://github.com/heyudage/VoiceTyping) for the real-time voice typing tool implementation of PaddleSpeech ASR streaming services.
- Many thanks to [EscaticZheng](https://github.com/EscaticZheng)/[ps3.9wheel-install](https://github.com/EscaticZheng/ps3.9wheel-install) for the python3.9 prebuilt wheel for PaddleSpeech installation in Windows without Viusal Studio.
- Many thanks to [EscaticZheng](https://github.com/EscaticZheng)/[ps3.9wheel-install](https://github.com/EscaticZheng/ps3.9wheel-install) for the python3.9 prebuilt wheel for PaddleSpeech installation in Windows without Visual Studio.
Besides, PaddleSpeech depends on a lot of open source repositories. See [references](./docs/source/reference.md) for more information.
- Many thanks to [chinobing](https://github.com/chinobing)/[FastAPI-PaddleSpeech-Audio-To-Text](https://github.com/chinobing/FastAPI-PaddleSpeech-Audio-To-Text) for converting audio to text based on FastAPI and PaddleSpeech.
- Many thanks to [MistEO](https://github.com/MistEO)/[Pallas-Bot](https://github.com/MistEO/Pallas-Bot) for QQ bot based on PaddleSpeech TTS.

@ -51,14 +51,14 @@
<tbody>
<tr>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200 style="max-width: 100%;"></a><br>
</td>
<td >I knocked at the door on the ancient side of the building.</td>
</tr>
<tr>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
<td>我认为跑步最重要的就是给我带来了身体健康。</td>
@ -81,7 +81,7 @@
<tbody>
<tr>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200 style="max-width: 100%;"></a><br>
</td>
<td >我 在 这栋 建筑 的 古老 门上 敲门。</td>
@ -104,42 +104,42 @@
<tr>
<td >Life was like a box of chocolates, you never know what you're gonna get.</td>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/tacotron2_ljspeech_waveflow_samples_0.2/sentence_1.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/tacotron2_ljspeech_waveflow_samples_0.2/sentence_1.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
<tr>
<td >早上好今天是2020/10/29最低温度是-3°C。</td>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
<tr>
<td >季姬寂,集鸡,鸡即棘鸡。棘鸡饥叽,季姬及箕稷济鸡。鸡既济,跻姬笈,季姬忌,急咭鸡,鸡急,继圾几,季姬急,即籍箕击鸡,箕疾击几伎,伎即齑,鸡叽集几基,季姬急极屐击鸡,鸡既殛,季姬激,即记《季姬击鸡记》。</td>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/jijiji.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/jijiji.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
<tr>
<td>大家好,我是 parrot 虚拟老师我们来读一首诗我与春风皆过客I and the spring breeze are passing by你携秋水揽星河you take the autumn water to take the galaxy。</td>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/labixiaoxin.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/labixiaoxin.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
<tr>
<td>宜家唔系事必要你讲,但系你所讲嘅说话将会变成呈堂证供。</td>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/chengtangzhenggong.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/chengtangzhenggong.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
<tr>
<td>各个国家有各个国家嘅国歌</td>
<td align = "center">
<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/gegege.wav" rel="nofollow">
<a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/gegege.wav" rel="nofollow">
<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
</td>
</tr>
@ -238,11 +238,11 @@
<a name="安装"></a>
## 安装
我们强烈建议用户在 **Linux** 环境下,*3.8* 以上版本的 *python* 上安装 PaddleSpeech。同时有一些Paddle新版本的内容没有在做适配的支持因此目前只能使用2.5.1及之前的版本。
我们强烈建议用户在 **Linux** 环境下,*3.8* 以上版本的 *python* 上安装 PaddleSpeech。
### 相关依赖
+ gcc >= 4.8.5
+ paddlepaddle <= 2.5.1
+ paddlepaddle
+ python >= 3.8
+ linux(推荐), mac, windows
@ -272,6 +272,8 @@ git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech
pip install pytest-runner
pip install .
# 如果需要在可编辑模式下安装,需要使用 --use-pep517命令如下
# pip install -e . --use-pep517
```
更多关于安装问题,如 conda 环境librosa 依赖的系统库gcc 环境问题kaldi 安装等,可以参考这篇[安装文档](docs/source/install_cn.md),如安装上遇到问题可以在 [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) 上留言以及查找相关问题
@ -284,8 +286,8 @@ pip install .
测试音频示例下载
```shell
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
```
### 语音识别

@ -1,4 +1,5 @@
# Token form https://github.com/pytorch/audio/blob/main/torchaudio/backend/common.py with modification.
# Taken from https://github.com/pytorch/audio/blob/main/torchaudio/backend/common.py with modification.
class AudioInfo:
"""return of info function.
@ -30,13 +31,12 @@ class AudioInfo:
"""
def __init__(
self,
sample_rate: int,
num_frames: int,
num_channels: int,
bits_per_sample: int,
encoding: str,
):
self,
sample_rate: int,
num_frames: int,
num_channels: int,
bits_per_sample: int,
encoding: str, ):
self.sample_rate = sample_rate
self.num_frames = num_frames
self.num_channels = num_channels
@ -44,12 +44,10 @@ class AudioInfo:
self.encoding = encoding
def __str__(self):
return (
f"AudioMetaData("
f"sample_rate={self.sample_rate}, "
f"num_frames={self.num_frames}, "
f"num_channels={self.num_channels}, "
f"bits_per_sample={self.bits_per_sample}, "
f"encoding={self.encoding}"
f")"
)
return (f"AudioMetaData("
f"sample_rate={self.sample_rate}, "
f"num_frames={self.num_frames}, "
f"num_channels={self.num_channels}, "
f"bits_per_sample={self.bits_per_sample}, "
f"encoding={self.encoding}"
f")")

@ -61,7 +61,7 @@ def resample(y: np.ndarray,
if mode == 'kaiser_best':
warnings.warn(
f'Using resampy in kaiser_best to {src_sr}=>{target_sr}. This function is pretty slow, \
we recommend the mode kaiser_fast in large scale audio trainning')
we recommend the mode kaiser_fast in large scale audio training')
if not isinstance(y, np.ndarray):
raise ParameterError(
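A hedged usage sketch of `resample`; the keyword names follow the docstring and warning message above and should be treated as assumptions:
```python
import numpy as np

y = np.random.randn(44100).astype("float32")  # 1 s of fake audio at 44.1 kHz
# kaiser_fast is the mode recommended above for large-scale training
y_16k = resample(y, src_sr=44100, target_sr=16000, mode="kaiser_fast")
```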
@ -183,7 +183,7 @@ def soundfile_save(y: np.ndarray, sr: int, file: os.PathLike) -> None:
Args:
y (np.ndarray): Input waveform array in 1D or 2D.
sr (int): Sample rate.
file (os.PathLike): Path of auido file to save.
file (os.PathLike): Path of audio file to save.
"""
if not file.endswith('.wav'):
raise ParameterError(
@ -216,10 +216,10 @@ def soundfile_load(
duration: Optional[int]=None,
dtype: str='float32',
resample_mode: str='kaiser_fast') -> Tuple[np.ndarray, int]:
"""Load audio file from disk. This function loads audio from disk using using audio beackend.
"""Load audio file from disk. This function loads audio from disk using using audio backend.
Args:
file (os.PathLike): Path of auido file to load.
file (os.PathLike): Path of audio file to load.
sr (Optional[int], optional): Sample rate of loaded waveform. Defaults to None.
mono (bool, optional): Return waveform with mono channel. Defaults to True.
merge_type (str, optional): Merge type of multi-channels waveform. Defaults to 'average'.
@ -250,14 +250,14 @@ def soundfile_load(
if normal:
y = normalize(y, norm_type, norm_mul_factor)
elif dtype in ['int8', 'int16']:
# still need to do normalization, before depth convertion
# still need to do normalization, before depth conversion
y = normalize(y, 'linear', 1.0)
y = depth_convert(y, dtype)
return y, r
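Putting the documented arguments together, a hedged usage sketch of `soundfile_load`:
```python
# load, downmix to mono, and resample on the fly; kwargs follow the docstring
y, sr = soundfile_load("zh.wav", sr=16000, mono=True, dtype="float32",
                       resample_mode="kaiser_fast")
print(y.shape, sr)
```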
#the code below token form: https://github.com/pytorch/audio/blob/main/torchaudio/backend/soundfile_backend.py with modificaion.
#The code below is taken from: https://github.com/pytorch/audio/blob/main/torchaudio/backend/soundfile_backend.py, with some modifications.
def _get_subtype_for_wav(dtype: paddle.dtype,
@ -382,7 +382,7 @@ def save(
channels_first (bool, optional): If ``True``, the given tensor is interpreted as `[channel, time]`,
otherwise `[time, channel]`.
compression (float of None, optional): Not used.
It is here only for interface compatibility reson with "sox_io" backend.
It is here only for interface compatibility reason with "sox_io" backend.
format (str or None, optional): Override the audio format.
When ``filepath`` argument is path-like object, audio format is
inferred from file extension. If the file extension is missing or
@ -394,8 +394,8 @@ def save(
Valid values are ``"wav"``, ``"ogg"``, ``"vorbis"``,
``"flac"`` and ``"sph"``.
encoding (str or None, optional): Changes the encoding for supported formats.
This argument is effective only for supported formats, sush as
``"wav"``, ``""flac"`` and ``"sph"``. Valid values are;
This argument is effective only for supported formats, such as
``"wav"``, ``""flac"`` and ``"sph"``. Valid values are:
- ``"PCM_S"`` (signed integer Linear PCM)
- ``"PCM_U"`` (unsigned integer Linear PCM)

@ -233,7 +233,7 @@ def spectrogram(waveform: Tensor,
round_to_power_of_two (bool, optional): If True, round window size to power of two by zero-padding input
to FFT. Defaults to True.
sr (int, optional): Sample rate of input waveform. Defaults to 16000.
snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a singal frame when it
snip_edges (bool, optional): Drop samples in the end of waveform that can't fit a signal frame when it
is set True. Otherwise performs reflect padding to the end of waveform. Defaults to True.
subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False.
window_type (str, optional): Choose type of window for FFT computation. Defaults to "povey".
@ -443,7 +443,7 @@ def fbank(waveform: Tensor,
round_to_power_of_two (bool, optional): If True, round window size to power of two by zero-padding input
to FFT. Defaults to True.
sr (int, optional): Sample rate of input waveform. Defaults to 16000.
snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a singal frame when it
snip_edges (bool, optional): Drop samples in the end of waveform that can't fit a signal frame when it
is set True. Otherwise performs reflect padding to the end of waveform. Defaults to True.
subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False.
use_energy (bool, optional): Add a dimension with the energy of the spectrogram to the output. Defaults to False.
@ -566,7 +566,7 @@ def mfcc(waveform: Tensor,
round_to_power_of_two (bool, optional): If True, round window size to power of two by zero-padding input
to FFT. Defaults to True.
sr (int, optional): Sample rate of input waveform. Defaults to 16000.
snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a singal frame when it
snip_edges (bool, optional): Drop samples in the end of waveform that can't fit a signal frame when it
is set True. Otherwise performs reflect padding to the end of waveform. Defaults to True.
subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False.
use_energy (bool, optional): Add a dimension with the energy of the spectrogram to the output. Defaults to False.
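To make `snip_edges` concrete, a sketch computing fbank features; the `paddleaudio.compliance.kaldi` import path and the `[channel, time]` input layout are assumptions:
```python
import paddle
from paddleaudio.compliance.kaldi import fbank  # assumed module path

wav = paddle.randn([1, 16000])  # 1 s of fake mono audio at 16 kHz
# snip_edges=True drops trailing samples that cannot fill a whole frame
feats = fbank(wav, sr=16000, snip_edges=True)
print(feats.shape)  # roughly [num_frames, n_mels]
```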

@ -527,7 +527,7 @@ def melspectrogram(x: np.ndarray,
if fmax is None:
fmax = sr // 2
if fmin < 0 or fmin >= fmax:
raise ParameterError('fmin and fmax must statisfy 0<fmin<fmax')
raise ParameterError('fmin and fmax must satisfy 0<fmin<fmax')
s = stft(
x,
@ -626,7 +626,7 @@ def mu_decode(y: np.ndarray, mu: int=255, quantized: bool=True) -> np.ndarray:
def _randint(high: int) -> int:
"""Generate one random integer in range [0 high)
This is a helper function for random data augmentaiton
This is a helper function for random data augmentation
"""
return int(np.random.randint(0, high=high))
@ -659,7 +659,7 @@ def depth_augment(y: np.ndarray,
def adaptive_spect_augment(spect: np.ndarray,
tempo_axis: int=0,
level: float=0.1) -> np.ndarray:
"""Do adpative spectrogram augmentation. The level of the augmentation is gowern by the paramter level, ranging from 0 to 1, with 0 represents no augmentation.
"""Do adaptive spectrogram augmentation. The level of the augmentation is govern by the parameter level, ranging from 0 to 1, with 0 represents no augmentation.
Args:
spect (np.ndarray): Input spectrogram.
@ -711,9 +711,9 @@ def spect_augment(spect: np.ndarray,
spect (np.ndarray): Input spectrogram.
tempo_axis (int, optional): Indicate the tempo axis. Defaults to 0.
max_time_mask (int, optional): Maximum number of time masking. Defaults to 3.
max_freq_mask (int, optional): Maximum number of frenquence masking. Defaults to 3.
max_freq_mask (int, optional): Maximum number of frequency masking. Defaults to 3.
max_time_mask_width (int, optional): Maximum width of time masking. Defaults to 30.
max_freq_mask_width (int, optional): Maximum width of frenquence masking. Defaults to 20.
max_freq_mask_width (int, optional): Maximum width of frequency masking. Defaults to 20.
Returns:
np.ndarray: The augmented spectrogram.
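A hedged usage sketch of `spect_augment` following the docstring above; the import path is an assumption:
```python
import numpy as np
from paddleaudio import spect_augment  # assumed import path

spec = np.random.rand(200, 80).astype("float32")  # [time, freq], tempo_axis=0
aug = spect_augment(spec, tempo_axis=0, max_time_mask=3, max_freq_mask=3,
                    max_time_mask_width=30, max_freq_mask_width=20)
```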

@ -43,11 +43,11 @@ class AudioClassificationDataset(paddle.io.Dataset):
sample_rate: int=None,
**kwargs):
"""
Ags:
Args:
files (:obj:`List[str]`): A list of absolute path of audio files.
labels (:obj:`List[int]`): Labels of audio files.
feat_type (:obj:`str`, `optional`, defaults to `raw`):
It identifies the feature type that user wants to extrace of an audio file.
It identifies the feature type that user wants to extract of an audio file.
"""
super(AudioClassificationDataset, self).__init__()

@ -35,7 +35,7 @@ class ESC50(AudioClassificationDataset):
http://dx.doi.org/10.1145/2733373.2806390
"""
archieves = [
archives = [
{
'url':
'https://paddleaudio.bj.bcebos.com/datasets/ESC-50-master.zip',
@ -111,13 +111,13 @@ class ESC50(AudioClassificationDataset):
feat_type: str='raw',
**kwargs):
"""
Ags:
Args:
mode (:obj:`str`, `optional`, defaults to `train`):
It identifies the dataset mode (train or dev).
split (:obj:`int`, `optional`, defaults to 1):
It specifies the fold of the dev dataset.
feat_type (:obj:`str`, `optional`, defaults to `raw`):
It identifies the feature type that user wants to extrace of an audio file.
It identifies the feature type that user wants to extract of an audio file.
"""
files, labels = self._get_data(mode, split)
super(ESC50, self).__init__(
@ -133,7 +133,7 @@ class ESC50(AudioClassificationDataset):
def _get_data(self, mode: str, split: int) -> Tuple[List[str], List[int]]:
if not os.path.isdir(os.path.join(DATA_HOME, self.audio_path)) or \
not os.path.isfile(os.path.join(DATA_HOME, self.meta)):
download_and_decompress(self.archieves, DATA_HOME)
download_and_decompress(self.archives, DATA_HOME)
meta_info = self._get_meta_info()
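Putting it together, a hedged sketch of using the dataset class; as `_get_data` above shows, the archive is downloaded and decompressed on first use (the import path is an assumption):
```python
from paddleaudio.datasets import ESC50  # assumed import path

train_ds = ESC50(mode="train", split=1, feat_type="raw")
waveform, label = train_ds[0]  # raw waveform and its integer label
```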

@ -35,7 +35,7 @@ class GTZAN(AudioClassificationDataset):
https://ieeexplore.ieee.org/document/1021072/
"""
archieves = [
archives = [
{
'url': 'http://opihi.cs.uvic.ca/sound/genres.tar.gz',
'md5': '5b3d6dddb579ab49814ab86dba69e7c7',
@ -57,7 +57,7 @@ class GTZAN(AudioClassificationDataset):
feat_type='raw',
**kwargs):
"""
Ags:
Args:
mode (:obj:`str`, `optional`, defaults to `train`):
It identifies the dataset mode (train or dev).
seed (:obj:`int`, `optional`, defaults to 0):
@ -67,7 +67,7 @@ class GTZAN(AudioClassificationDataset):
split (:obj:`int`, `optional`, defaults to 1):
It specifies the fold of the dev dataset.
feat_type (:obj:`str`, `optional`, defaults to `raw`):
It identifies the feature type that user wants to extrace of an audio file.
It identifies the feature type that user wants to extract of an audio file.
"""
assert split <= n_folds, f'The selected split should not be larger than n_fold, but got {split} > {n_folds}'
files, labels = self._get_data(mode, seed, n_folds, split)
@ -85,7 +85,7 @@ class GTZAN(AudioClassificationDataset):
split) -> Tuple[List[str], List[int]]:
if not os.path.isdir(os.path.join(DATA_HOME, self.audio_path)) or \
not os.path.isfile(os.path.join(DATA_HOME, self.meta)):
download_and_decompress(self.archieves, DATA_HOME)
download_and_decompress(self.archives, DATA_HOME)
meta_info = self._get_meta_info()
random.seed(seed) # shuffle samples to split data

@ -30,7 +30,7 @@ __all__ = ['OpenRIRNoise']
class OpenRIRNoise(Dataset):
archieves = [
archives = [
{
'url': 'http://www.openslr.org/resources/28/rirs_noises.zip',
'md5': 'e6f48e257286e05de56413b4779d8ffb',
@ -76,7 +76,7 @@ class OpenRIRNoise(Dataset):
print(f"rirs noises base path: {self.base_path}")
if not os.path.isdir(self.base_path):
download_and_decompress(
self.archieves, self.base_path, decompress=True)
self.archives, self.base_path, decompress=True)
else:
print(
f"{self.base_path} already exists, we will not download and decompress again"

@ -37,7 +37,7 @@ class TESS(AudioClassificationDataset):
https://doi.org/10.5683/SP2/E8H2MF
"""
archieves = [
archives = [
{
'url':
'https://bj.bcebos.com/paddleaudio/datasets/TESS_Toronto_emotional_speech_set.zip',
@ -66,7 +66,7 @@ class TESS(AudioClassificationDataset):
feat_type='raw',
**kwargs):
"""
Ags:
Args:
mode (:obj:`str`, `optional`, defaults to `train`):
It identifies the dataset mode (train or dev).
seed (:obj:`int`, `optional`, defaults to 0):
@ -76,7 +76,7 @@ class TESS(AudioClassificationDataset):
split (:obj:`int`, `optional`, defaults to 1):
It specifies the fold of the dev dataset.
feat_type (:obj:`str`, `optional`, defaults to `raw`):
It identifies the feature type that user wants to extrace of an audio file.
It identifies the feature type that user wants to extract of an audio file.
"""
assert split <= n_folds, f'The selected split should not be larger than n_fold, but got {split} > {n_folds}'
files, labels = self._get_data(mode, seed, n_folds, split)
@ -93,7 +93,7 @@ class TESS(AudioClassificationDataset):
def _get_data(self, mode, seed, n_folds,
split) -> Tuple[List[str], List[int]]:
if not os.path.isdir(os.path.join(DATA_HOME, self.audio_path)):
download_and_decompress(self.archieves, DATA_HOME)
download_and_decompress(self.archives, DATA_HOME)
wav_files = []
for root, _, files in os.walk(os.path.join(DATA_HOME, self.audio_path)):

@ -35,7 +35,7 @@ class UrbanSound8K(AudioClassificationDataset):
https://dl.acm.org/doi/10.1145/2647868.2655045
"""
archieves = [
archives = [
{
'url':
'https://zenodo.org/record/1203745/files/UrbanSound8K.tar.gz',
@ -62,13 +62,13 @@ class UrbanSound8K(AudioClassificationDataset):
super(UrbanSound8K, self).__init__(
files=files, labels=labels, feat_type=feat_type, **kwargs)
"""
Ags:
Args:
mode (:obj:`str`, `optional`, defaults to `train`):
It identifies the dataset mode (train or dev).
split (:obj:`int`, `optional`, defaults to 1):
It specifies the fold of the dev dataset.
feat_type (:obj:`str`, `optional`, defaults to `raw`):
It identifies the feature type that user wants to extrace of an audio file.
It identifies the feature type that user wants to extract of an audio file.
"""
def _get_meta_info(self):
@ -81,7 +81,7 @@ class UrbanSound8K(AudioClassificationDataset):
def _get_data(self, mode: str, split: int) -> Tuple[List[str], List[int]]:
if not os.path.isdir(os.path.join(DATA_HOME, self.audio_path)) or \
not os.path.isfile(os.path.join(DATA_HOME, self.meta)):
download_and_decompress(self.archieves, DATA_HOME)
download_and_decompress(self.archives, DATA_HOME)
meta_info = self._get_meta_info()

@ -34,7 +34,7 @@ __all__ = ['VoxCeleb']
class VoxCeleb(Dataset):
source_url = 'https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/'
archieves_audio_dev = [
archives_audio_dev = [
{
'url': source_url + 'vox1_dev_wav_partaa',
'md5': 'e395d020928bc15670b570a21695ed96',
@ -52,13 +52,13 @@ class VoxCeleb(Dataset):
'md5': '7bb1e9f70fddc7a678fa998ea8b3ba19',
},
]
archieves_audio_test = [
archives_audio_test = [
{
'url': source_url + 'vox1_test_wav.zip',
'md5': '185fdc63c3c739954633d50379a3d102',
},
]
archieves_meta = [
archives_meta = [
{
'url':
'https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test2.txt',
@ -135,11 +135,11 @@ class VoxCeleb(Dataset):
if not os.path.isdir(self.wav_path):
print("start to download the voxceleb1 dataset")
download_and_decompress( # multi-zip parts concatenate to vox1_dev_wav.zip
self.archieves_audio_dev,
self.archives_audio_dev,
self.base_path,
decompress=False)
download_and_decompress( # download the vox1_test_wav.zip and unzip
self.archieves_audio_test,
self.archives_audio_test,
self.base_path,
decompress=True)
@ -157,7 +157,7 @@ class VoxCeleb(Dataset):
if not os.path.isdir(self.meta_path):
print("prepare the meta data")
download_and_decompress(
self.archieves_meta, self.meta_path, decompress=False)
self.archives_meta, self.meta_path, decompress=False)
# Data preparation.
if not os.path.isdir(self.csv_path):
@ -262,8 +262,8 @@ class VoxCeleb(Dataset):
split_chunks: bool=True):
print(f'Generating csv: {output_file}')
header = ["id", "duration", "wav", "start", "stop", "spk_id"]
# Note: this may occurs c++ execption, but the program will execute fine
# so we can ignore the execption
# Note: this may raise a C++ exception, but the program will execute fine
# so we can ignore the exception
with Pool(cpu_count()) as p:
infos = list(
tqdm(

@ -34,7 +34,7 @@ __all__ = [
class Spectrogram(nn.Layer):
"""Compute spectrogram of given signals, typically audio waveforms.
The spectorgram is defined as the complex norm of the short-time Fourier transformation.
The spectrogram is defined as the complex norm of the short-time Fourier transformation.
Args:
n_fft (int, optional): The number of frequency components of the discrete Fourier transform. Defaults to 512.
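A minimal sketch of the layer in use; the `paddleaudio.features` import path and the output layout are assumptions:
```python
import paddle
from paddleaudio.features import Spectrogram  # assumed import path

x = paddle.randn([2, 16000])      # a batch of two 1-second waveforms
spec = Spectrogram(n_fft=512)(x)  # complex norm of the STFT
print(spec.shape)                 # expected [2, n_fft // 2 + 1, num_frames]
```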

@ -247,7 +247,7 @@ def create_dct(n_mfcc: int,
Args:
n_mfcc (int): Number of mel frequency cepstral coefficients.
n_mels (int): Number of mel filterbanks.
norm (Optional[str], optional): Normalizaiton type. Defaults to 'ortho'.
norm (Optional[str], optional): Normalization type. Defaults to 'ortho'.
dtype (str, optional): The data type of the return matrix. Defaults to 'float32'.
Returns:
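For example, building the DCT matrix that maps mel filterbank energies to MFCCs; the return shape is an assumption based on the torchaudio counterpart:
```python
dct = create_dct(n_mfcc=13, n_mels=40, norm="ortho", dtype="float32")
print(dct.shape)  # expected [n_mels, n_mfcc], i.e. [40, 13]
```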

@ -22,8 +22,8 @@ def compute_eer(labels: np.ndarray, scores: np.ndarray) -> List[float]:
"""Compute EER and return score threshold.
Args:
labels (np.ndarray): the trial label, shape: [N], one-dimention, N refer to the samples num
scores (np.ndarray): the trial scores, shape: [N], one-dimention, N refer to the samples num
labels (np.ndarray): the trial labels, shape: [N], one-dimensional, N refers to the number of samples
scores (np.ndarray): the trial scores, shape: [N], one-dimensional, N refers to the number of samples
Returns:
List[float]: eer and the specific threshold
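A small worked example with hypothetical trial data, following the docstring above:
```python
import numpy as np

labels = np.array([1, 0, 1, 1, 0])  # 1 = target trial, 0 = non-target trial
scores = np.array([0.9, 0.2, 0.8, 0.6, 0.4])
eer, threshold = compute_eer(labels, scores)
```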

@ -121,8 +121,8 @@ def apply_effects_tensor(
"""
tensor_np = tensor.numpy()
ret = paddleaudio._paddleaudio.sox_effects_apply_effects_tensor(tensor_np, sample_rate,
effects, channels_first)
ret = paddleaudio._paddleaudio.sox_effects_apply_effects_tensor(
tensor_np, sample_rate, effects, channels_first)
if ret is not None:
return (paddle.to_tensor(ret[0]), ret[1])
raise RuntimeError("Failed to apply sox effect")
@ -139,7 +139,7 @@ def apply_effects_file(
Note:
This function works in the way very similar to ``sox`` command, however there are slight
differences. For example, ``sox`` commnad adds certain effects automatically (such as
differences. For example, ``sox`` command adds certain effects automatically (such as
``rate`` effect after ``speed``, ``pitch`` etc), but this function only applies the given
effects. Therefore, to actually apply ``speed`` effect, you also need to give ``rate``
effect with desired sampling rate, because internally, ``speed`` effects only alter sampling
@ -228,14 +228,14 @@ def apply_effects_file(
>>> pass
"""
if hasattr(path, "read"):
ret = paddleaudio._paddleaudio.apply_effects_fileobj(path, effects, normalize,
channels_first, format)
ret = paddleaudio._paddleaudio.apply_effects_fileobj(
path, effects, normalize, channels_first, format)
if ret is None:
raise RuntimeError("Failed to load audio from {}".format(path))
return (paddle.to_tensor(ret[0]), ret[1])
path = os.fspath(path)
ret = paddleaudio._paddleaudio.sox_effects_apply_effects_file(path, effects, normalize,
channels_first, format)
ret = paddleaudio._paddleaudio.sox_effects_apply_effects_file(
path, effects, normalize, channels_first, format)
if ret is not None:
return (paddle.to_tensor(ret[0]), ret[1])
raise RuntimeError("Failed to load audio from {}".format(path))

@ -26,7 +26,7 @@ template <class F>
bool StreamingFeatureTpl<F>::ComputeFeature(
const std::vector<float>& wav,
std::vector<float>* feats) {
// append remaned waves
// append remained waves
int wav_len = wav.size();
if (wav_len == 0) return false;
int left_len = remained_wav_.size();
@ -38,7 +38,7 @@ bool StreamingFeatureTpl<F>::ComputeFeature(
wav.data(),
wav_len * sizeof(float));
// cache remaned waves
// cache remained waves
knf::FrameExtractionOptions frame_opts = computer_.GetFrameOptions();
int num_frames = knf::NumFrames(waves.size(), frame_opts);
int frame_shift = frame_opts.WindowShift();

@ -44,5 +44,5 @@ py::array_t<float> KaldiFeatureWrapper::ComputeFbank(
return result.reshape(shape);
}
} // namesapce kaldi
} // namespace kaldi
} // namespace paddleaudio

@ -12,9 +12,9 @@ using namespace paddleaudio::sox_utils;
namespace paddleaudio::sox_effects {
// Streaming decoding over file-like object is tricky because libsox operates on
// FILE pointer. The folloing is what `sox` and `play` commands do
// FILE pointer. The following is what `sox` and `play` commands do
// - file input -> FILE pointer
// - URL input -> call wget in suprocess and pipe the data -> FILE pointer
// - URL input -> call wget in subprocess and pipe the data -> FILE pointer
// - stdin -> FILE pointer
//
// We want to, instead, fetch byte strings chunk by chunk, consume them, and
@ -127,12 +127,12 @@ namespace {
enum SoxEffectsResourceState { NotInitialized, Initialized, ShutDown };
SoxEffectsResourceState SOX_RESOURCE_STATE = NotInitialized;
std::mutex SOX_RESOUCE_STATE_MUTEX;
std::mutex SOX_RESOURCE_STATE_MUTEX;
} // namespace
void initialize_sox_effects() {
const std::lock_guard<std::mutex> lock(SOX_RESOUCE_STATE_MUTEX);
const std::lock_guard<std::mutex> lock(SOX_RESOURCE_STATE_MUTEX);
switch (SOX_RESOURCE_STATE) {
case NotInitialized:
@ -150,7 +150,7 @@ void initialize_sox_effects() {
};
void shutdown_sox_effects() {
const std::lock_guard<std::mutex> lock(SOX_RESOUCE_STATE_MUTEX);
const std::lock_guard<std::mutex> lock(SOX_RESOURCE_STATE_MUTEX);
switch (SOX_RESOURCE_STATE) {
case NotInitialized:

@ -14,7 +14,7 @@ namespace {
/// helper classes for passing the location of input tensor and output buffer
///
/// drain/flow callback functions require plaing C style function signature and
/// drain/flow callback functions require plain C style function signature and
/// the way to pass extra data is to attach data to sox_effect_t::priv pointer.
/// The following structs will be assigned to sox_effect_t::priv pointer which
/// gives sox_effect_t an access to input Tensor and output buffer object.
@ -50,7 +50,7 @@ int tensor_input_drain(sox_effect_t* effp, sox_sample_t* obuf, size_t* osamp) {
*osamp -= *osamp % num_channels;
// Slice the input Tensor
// refacor this module, chunk
// refactor this module, chunk
auto i_frame = index / num_channels;
auto num_frames = *osamp / num_channels;

@ -162,7 +162,7 @@ py::dtype get_dtype(
}
default:
// default to float32 for the other formats, including
// 32-bit flaoting-point WAV,
// 32-bit floating-point WAV,
// MP3,
// FLAC,
// VORBIS etc...
@ -177,7 +177,7 @@ py::array convert_to_tensor(
const py::dtype dtype,
const bool normalize,
const bool channels_first) {
// todo refector later(SGoat)
// todo refactor later(SGoat)
py::array t;
uint64_t dummy = 0;
SOX_SAMPLE_LOCALS;
@ -449,7 +449,7 @@ unsigned get_precision(const std::string filetype, py::dtype dtype) {
return SOX_UNSPEC;
if (filetype == "wav" || filetype == "amb") {
switch (dtype.num()) {
case 1: // byte in numpy dype num
case 1: // byte in numpy dtype num
return 8;
case 3: // short, in numpy dtype num
return 16;

@ -76,7 +76,7 @@ py::dtype get_dtype(
/// Tensor.
/// @param dtype Target dtype. Determines the output dtype and value range in
/// conjunction with normalization.
/// @param noramlize Perform normalization. Only effective when dtype is not
/// @param normalize Perform normalization. Only effective when dtype is not
/// kFloat32. When effective, the output tensor is kFloat32 type and value range
/// is [-1.0, 1.0]
/// @param channels_first When True, output Tensor has shape of [num_channels,

@ -8,9 +8,9 @@ set(patch_dir ${CMAKE_CURRENT_SOURCE_DIR}/../patches)
set(COMMON_ARGS --quiet --disable-shared --enable-static --prefix=${INSTALL_DIR} --with-pic --disable-dependency-tracking --disable-debug --disable-examples --disable-doc)
# To pass custom environment variables to ExternalProject_Add command,
# we need to do `${CMAKE_COMMAND} -E env ${envs} <COMMANAD>`.
# we need to do `${CMAKE_COMMAND} -E env ${envs} <COMMAND>`.
# https://stackoverflow.com/a/62437353
# We constrcut the custom environment variables here
# We construct the custom environment variables here
set(envs
"PKG_CONFIG_PATH=${INSTALL_DIR}/lib/pkgconfig"
"LDFLAGS=-L${INSTALL_DIR}/lib $ENV{LDFLAGS}"

@ -41,14 +41,14 @@ def download_and_decompress(archives: List[Dict[str, str]],
path: str,
decompress: bool=True):
"""
Download archieves and decompress to specific path.
Download archives and decompress to specific path.
"""
if not os.path.isdir(path):
os.makedirs(path)
for archive in archives:
assert 'url' in archive and 'md5' in archive, \
'Dictionary keys of "url" and "md5" are required in the archive, but got: {list(archieve.keys())}'
f'Dictionary keys of "url" and "md5" are required in the archive, but got: {list(archive.keys())}'
download.get_path_from_url(
archive['url'], path, archive['md5'], decompress=decompress)
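A hedged usage sketch, mirroring the `archives` dictionaries used by the dataset classes above:
```python
archives = [{
    "url": "https://paddleaudio.bj.bcebos.com/datasets/ESC-50-master.zip",
    "md5": "<expected-md5>",  # placeholder; supply the real checksum
}]
download_and_decompress(archives, "/tmp/paddleaudio_data", decompress=True)
```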

@ -58,7 +58,7 @@ log_config = {
class Logger(object):
'''
Deafult logger in PaddleAudio
Default logger in PaddleAudio
Args:
name(str) : Logger name, default is 'PaddleAudio'
'''

@ -55,7 +55,7 @@ def set_use_threads(use_threads: bool):
Args:
use_threads (bool): When ``True``, enables ``libsox``'s parallel effects channels processing.
To use mutlithread, the underlying ``libsox`` has to be compiled with OpenMP support.
To use multithread, the underlying ``libsox`` has to be compiled with OpenMP support.
See Also:
http://sox.sourceforge.net/sox.html

@ -11,7 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Unility functions for Transformer."""
"""Utility functions for Transformer."""
from typing import List
from typing import Tuple
@ -80,7 +80,7 @@ def pad_sequence(sequences: List[paddle.Tensor],
# assuming trailing dimensions and type of all the Tensors
# in sequences are same and fetching those from sequences[0]
max_size = paddle.shape(sequences[0])
# (TODO Hui Zhang): slice not supprot `end==start`
# (TODO Hui Zhang): slice not support `end==start`
# trailing_dims = max_size[1:]
trailing_dims = tuple(
max_size[1:].numpy().tolist()) if sequences[0].ndim >= 2 else ()
@ -94,7 +94,7 @@ def pad_sequence(sequences: List[paddle.Tensor],
length = tensor.shape[0]
# use index notation to prevent duplicate references to the tensor
if batch_first:
# TODO (Hui Zhang): set_value op not supprot `end==start`
# TODO (Hui Zhang): set_value op not support `end==start`
# TODO (Hui Zhang): set_value op not support int16
# TODO (Hui Zhang): set_varbase 2 rank not support [0,0,...]
# out_tensor[i, :length, ...] = tensor
@ -103,7 +103,7 @@ def pad_sequence(sequences: List[paddle.Tensor],
else:
out_tensor[i, length] = tensor
else:
# TODO (Hui Zhang): set_value op not supprot `end==start`
# TODO (Hui Zhang): set_value op not support `end==start`
# out_tensor[:length, i, ...] = tensor
if length != 0:
out_tensor[:length, i] = tensor
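For example, assuming the usual `padding_value=0` default:
```python
import paddle

seqs = [paddle.to_tensor([1., 2., 3.]), paddle.to_tensor([4., 5.])]
out = pad_sequence(seqs, batch_first=True)  # expected shape [2, 3], zero padded
```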

@ -21,7 +21,7 @@ __all__ = [
class Timer(object):
'''Calculate runing speed and estimated time of arrival(ETA)'''
'''Calculate running speed and estimated time of arrival (ETA)'''
def __init__(self, total_step: int):
self.total_step = total_step

@ -15,8 +15,8 @@ import os
import unittest
import urllib.request
mono_channel_wav = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
multi_channels_wav = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav'
mono_channel_wav = 'https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav'
multi_channels_wav = 'https://paddlespeech.cdn.bcebos.com/PaddleAudio/cat.wav'
class BackendTest(unittest.TestCase):
@ -30,5 +30,5 @@ class BackendTest(unittest.TestCase):
urllib.request.urlretrieve(url, os.path.basename(url))
self.files.append(os.path.basename(url))
def initParmas(self):
def initParams(self):
raise NotImplementedError

@ -15,8 +15,8 @@ import os
import unittest
import urllib.request
mono_channel_wav = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
multi_channels_wav = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav'
mono_channel_wav = 'https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav'
multi_channels_wav = 'https://paddlespeech.cdn.bcebos.com/PaddleAudio/cat.wav'
class BackendTest(unittest.TestCase):
@ -30,5 +30,5 @@ class BackendTest(unittest.TestCase):
urllib.request.urlretrieve(url, os.path.basename(url))
self.files.append(os.path.basename(url))
def initParmas(self):
def initParams(self):
raise NotImplementedError

@ -58,7 +58,7 @@ class MockedSaveTest(unittest.TestCase):
encoding=encoding,
bits_per_sample=bits_per_sample, )
# on +Py3.8 call_args.kwargs is more descreptive
# on +Py3.8 call_args.kwargs is more descriptive
args = mocked_write.call_args[1]
assert args["file"] == filepath
assert args["samplerate"] == sample_rate
@ -103,7 +103,7 @@ class MockedSaveTest(unittest.TestCase):
encoding=encoding,
bits_per_sample=bits_per_sample, )
# on +Py3.8 call_args.kwargs is more descreptive
# on +Py3.8 call_args.kwargs is more descriptive
args = mocked_write.call_args[1]
assert args["file"] == filepath
assert args["samplerate"] == sample_rate
@ -191,7 +191,7 @@ class SaveTestBase(TempDirMixin, unittest.TestCase):
def _assert_non_wav(self, fmt, dtype, sample_rate, num_channels):
"""`soundfile_backend.save` can save non-wav format.
Due to precision missmatch, and the lack of alternative way to decode the
Due to precision mismatch, and the lack of alternative way to decode the
resulting files without using soundfile, only meta data are validated.
"""
num_frames = sample_rate * 3

@ -41,7 +41,7 @@ class TestSaveBase(TempDirMixin):
test_mode: str="path", ):
"""`save` function produces file that is comparable with `sox` command
To compare that the file produced by `save` function agains the file produced by
To compare that the file produced by `save` function against the file produced by
the equivalent `sox` command, we need to load both files.
But there are many formats that cannot be opened with common Python modules (like
SciPy).

@ -21,11 +21,12 @@ import paddleaudio
import torch
import torchaudio
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
wav_url = 'https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav'
if not os.path.isfile(os.path.basename(wav_url)):
urllib.request.urlretrieve(wav_url, os.path.basename(wav_url))
waveform, sr = paddleaudio.backends.soundfile_load(os.path.abspath(os.path.basename(wav_url)))
waveform, sr = paddleaudio.backends.soundfile_load(
os.path.abspath(os.path.basename(wav_url)))
waveform_tensor = paddle.to_tensor(waveform).unsqueeze(0)
waveform_tensor_torch = torch.from_numpy(waveform).unsqueeze(0)

@ -21,11 +21,12 @@ import paddleaudio
import torch
import torchaudio
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
wav_url = 'https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav'
if not os.path.isfile(os.path.basename(wav_url)):
urllib.request.urlretrieve(wav_url, os.path.basename(wav_url))
waveform, sr = paddleaudio.backends.soundfile_load(os.path.abspath(os.path.basename(wav_url)))
waveform, sr = paddleaudio.backends.soundfile_load(
os.path.abspath(os.path.basename(wav_url)))
waveform_tensor = paddle.to_tensor(waveform).unsqueeze(0)
waveform_tensor_torch = torch.from_numpy(waveform).unsqueeze(0)

@ -21,11 +21,12 @@ import paddleaudio
import torch
import torchaudio
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
wav_url = 'https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav'
if not os.path.isfile(os.path.basename(wav_url)):
urllib.request.urlretrieve(wav_url, os.path.basename(wav_url))
waveform, sr = paddleaudio.backends.soundfile_load(os.path.abspath(os.path.basename(wav_url)))
waveform, sr = paddleaudio.backends.soundfile_load(
os.path.abspath(os.path.basename(wav_url)))
waveform_tensor = paddle.to_tensor(waveform).unsqueeze(0)
waveform_tensor_torch = torch.from_numpy(waveform).unsqueeze(0)

@ -81,7 +81,7 @@ def convert_tensor_encoding(
#dtype = getattr(paddle, dtype)
#if dtype not in [paddle.float64, paddle.float32, paddle.int32, paddle.int16, paddle.uint8]:
#raise NotImplementedError(f"dtype {dtype} is not supported.")
## According to the doc, folking rng on all CUDA devices is slow when there are many CUDA devices,
## According to the doc, forking rng on all CUDA devices is slow when there are many CUDA devices,
## so we only fork on CPU, generate values and move the data to the given device
#with paddle.random.fork_rng([]):
#paddle.random.manual_seed(seed)

@ -24,20 +24,21 @@ def get_bit_depth(dtype):
def gen_audio_file(
path,
sample_rate,
num_channels,
*,
encoding=None,
bit_depth=None,
compression=None,
attenuation=None,
duration=1,
comment_file=None,
):
path,
sample_rate,
num_channels,
*,
encoding=None,
bit_depth=None,
compression=None,
attenuation=None,
duration=1,
comment_file=None, ):
"""Generate synthetic audio file with `sox` command."""
if path.endswith(".wav"):
warnings.warn("Use get_wav_data and save_wav to generate wav file for accurate result.")
warnings.warn(
"Use get_wav_data and save_wav to generate wav file for accurate result."
)
command = [
"sox",
"-V3", # verbose
@ -81,7 +82,12 @@ def gen_audio_file(
subprocess.run(command, check=True)
def convert_audio_file(src_path, dst_path, *, encoding=None, bit_depth=None, compression=None):
def convert_audio_file(src_path,
dst_path,
*,
encoding=None,
bit_depth=None,
compression=None):
"""Convert audio file with `sox` command."""
command = ["sox", "-V3", "--no-dither", "-R", str(src_path)]
if encoding is not None:
@ -95,7 +101,7 @@ def convert_audio_file(src_path, dst_path, *, encoding=None, bit_depth=None, com
subprocess.run(command, check=True)
def _flattern(effects):
def _flatten(effects):
if not effects:
return effects
if isinstance(effects[0], str):
@ -103,9 +109,14 @@ def _flattern(effects):
return [item for sublist in effects for item in sublist]
def run_sox_effect(input_file, output_file, effect, *, output_sample_rate=None, output_bitdepth=None):
def run_sox_effect(input_file,
output_file,
effect,
*,
output_sample_rate=None,
output_bitdepth=None):
"""Run sox effects"""
effect = _flattern(effect)
effect = _flatten(effect)
command = ["sox", "-V", "--no-dither", input_file]
if output_bitdepth:
command += ["--bits", str(output_bitdepth)]

@ -19,12 +19,12 @@ import numpy as np
import paddle
from paddleaudio.backends import soundfile_load as load
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
wav_url = 'https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav'
class FeatTest(unittest.TestCase):
def setUp(self):
self.initParmas()
self.initParams()
self.initWavInput()
self.setUpDevice()
@ -44,5 +44,5 @@ class FeatTest(unittest.TestCase):
if dim == 1:
self.waveform = np.expand_dims(self.waveform, 0)
def initParmas(self):
def initParams(self):
raise NotImplementedError

@ -23,7 +23,7 @@ from paddlespeech.audio.transform.spectrogram import Stft
class TestIstft(FeatTest):
def initParmas(self):
def initParams(self):
self.n_fft = 512
self.hop_length = 128
self.window_str = 'hann'

@ -18,12 +18,11 @@ import paddle
import paddleaudio
import torch
import torchaudio
from base import FeatTest
class TestKaldi(FeatTest):
def initParmas(self):
def initParams(self):
self.window_size = 1024
self.dtype = 'float32'

@ -17,13 +17,12 @@ import librosa
import numpy as np
import paddle
import paddleaudio
from paddleaudio.functional.window import get_window
from base import FeatTest
from paddleaudio.functional.window import get_window
class TestLibrosa(FeatTest):
def initParmas(self):
def initParams(self):
self.n_fft = 512
self.hop_length = 128
self.n_mels = 40

@ -22,7 +22,7 @@ from paddlespeech.audio.transform.spectrogram import LogMelSpectrogram
class TestLogMelSpectrogram(FeatTest):
def initParmas(self):
def initParams(self):
self.n_fft = 512
self.hop_length = 128
self.n_mels = 40

@ -22,7 +22,7 @@ from paddlespeech.audio.transform.spectrogram import Spectrogram
class TestSpectrogram(FeatTest):
def initParmas(self):
def initParams(self):
self.n_fft = 512
self.hop_length = 128

@ -22,7 +22,7 @@ from paddlespeech.audio.transform.spectrogram import Stft
class TestStft(FeatTest):
def initParmas(self):
def initParams(self):
self.n_fft = 512
self.hop_length = 128
self.window_str = 'hann'
@ -30,7 +30,7 @@ class TestStft(FeatTest):
def test_stft(self):
ps_stft = Stft(self.n_fft, self.hop_length)
ps_res = ps_stft(
self.waveform.T).squeeze(1).T # (n_fft//2 + 1, n_frmaes)
self.waveform.T).squeeze(1).T # (n_fft//2 + 1, n_frames)
x = paddle.to_tensor(self.waveform)
window = get_window(self.window_str, self.n_fft, dtype=x.dtype)

@ -58,7 +58,7 @@ def download(url, md5sum, target_dir, filename=None):
if not (os.path.exists(filepath) and md5file(filepath) == md5sum):
print("Downloading %s ..." % url)
wget.download(url, target_dir)
print("\nMD5 Chesksum %s ..." % filepath)
print("\nMD5 Checksum %s ..." % filepath)
if not md5file(filepath) == md5sum:
raise RuntimeError("MD5 checksum failed.")
else:
@ -109,7 +109,7 @@ def create_manifest(data_dir, manifest_path):
def prepare_chime3(url, md5sum, target_dir, manifest_path):
"""Download, unpack and create summmary manifest file."""
"""Download, unpack and create summary manifest file."""
if not os.path.exists(os.path.join(target_dir, "CHiME3")):
# download
filepath = download(url, md5sum, target_dir,

@ -132,7 +132,7 @@ def create_manifest(data_dir, manifest_path):
def prepare_dataset(url, md5sum, target_dir, manifest_path):
"""Download, unpack and create summmary manifest file.
"""Download, unpack and create summary manifest file.
"""
if not os.path.exists(os.path.join(target_dir, "LibriSpeech")):
# download

@ -108,7 +108,7 @@ def create_manifest(data_dir, manifest_path):
def prepare_dataset(url, md5sum, target_dir, manifest_path):
"""Download, unpack and create summmary manifest file.
"""Download, unpack and create summary manifest file.
"""
if not os.path.exists(os.path.join(target_dir, "LibriSpeech")):
# download

@ -13,7 +13,7 @@
# limitations under the License.
"""Prepare Ted-En-Zh speech translation dataset
Create manifest files from splited datased.
Create manifest files from the split dataset.
dev set: tst2010, test set: tst2015
Manifest file is a json-format file with each line containing the
meta data (i.e. audio filepath, transcript and audio duration)

@ -71,7 +71,7 @@ def read_trn(filepath):
with open(filepath, 'r') as f:
lines = f.read().strip().split('\n')
assert len(lines) == 3, lines
# charactor text, remove withespace
# character text, remove whitespace
texts.append(''.join(lines[0].split()))
texts.extend(lines[1:])
return texts
@ -127,7 +127,7 @@ def create_manifest(data_dir, manifest_path_prefix):
'utt2spk': spk,
'feat': audio_path,
'feat_shape': (duration, ), # second
'text': word_text, # charactor
'text': word_text, # character
'syllable': syllable_text,
'phone': phone_text,
},

@ -123,7 +123,7 @@ def read_algin(filepath: str) -> str:
filepath (str): [description]
Returns:
str: token sepearte by <space>
str: tokens separated by <space>
"""
aligns = [] # (start, end, token)
with open(filepath, 'r') as f:
@ -210,7 +210,7 @@ def create_manifest(data_dir, manifest_path_prefix):
def prepare_dataset(url, md5sum, target_dir, manifest_path):
"""Download, unpack and create summmary manifest file.
"""Download, unpack and create summary manifest file.
"""
filepath = os.path.join(target_dir, "TIMIT.zip")
if not os.path.exists(filepath):

@ -13,7 +13,7 @@
# limitations under the License.
"""Prepare TIMIT dataset (Standard split from Kaldi)
Create manifest files from splited datased.
Create manifest files from the split dataset.
Manifest file is a json-format file with each line containing the
meta data (i.e. audio filepath, transcript and audio duration)
of each audio file in the data set.

@ -167,7 +167,7 @@ def prepare_dataset(base_url, data_list, target_dir, manifest_path,
# check the target zip file md5sum
if not check_md5sum(target_name, target_md5sum):
raise RuntimeError("{} MD5 checkssum failed".format(target_name))
raise RuntimeError("{} MD5 checksum failed".format(target_name))
else:
print("Check {} md5sum successfully".format(target_name))

@ -179,7 +179,7 @@ def download_dataset(base_url, data_list, target_data, target_dir, dataset):
# check the target zip file md5sum
if not check_md5sum(target_name, target_md5sum):
raise RuntimeError("{} MD5 checkssum failed".format(target_name))
raise RuntimeError("{} MD5 checksum failed".format(target_name))
else:
print("Check {} md5sum successfully".format(target_name))
@ -187,7 +187,7 @@ def download_dataset(base_url, data_list, target_data, target_dir, dataset):
# we need to make the test directory
unzip(target_name, os.path.join(target_dir, "test"))
else:
# upzip dev zip pacakge and will create the dev directory
# unzip dev zip package and will create the dev directory
unzip(target_name, target_dir)

@ -8,7 +8,7 @@
### 环境准备
1. 在本地环境安装好 Android Studio 工具,详细安装方法请见 [Android Stuido 官网](https://developer.android.com/studio)。
1. 在本地环境安装好 Android Studio 工具,详细安装方法请见 [Android Studio 官网](https://developer.android.com/studio)。
2. 准备一部 Android 手机,并开启 USB 调试模式。开启方法: `手机设置 -> 查找开发者选项 -> 打开开发者选项和 USB 调试模式`
**注意**
@ -20,10 +20,10 @@
2. 手机连接电脑,打开 USB 调试和文件传输模式,并在 Android Studio 上连接自己的手机设备(手机需要开启允许从 USB 安装软件权限)。
**注意:**
>1. 如果您在导入项目、编译或者运行过程中遇到 NDK 配置错误的提示,请打开 `File > Project Structure > SDK Location`,修改 `Andriod NDK location` 为您本机配置的 NDK 所在路径。
>2. 如果您是通过 Andriod Studio 的 SDK Tools 下载的 NDK (见本章节"环境准备"),可以直接点击下拉框选择默认路径。
>1. 如果您在导入项目、编译或者运行过程中遇到 NDK 配置错误的提示,请打开 `File > Project Structure > SDK Location`,修改 `Android NDK location` 为您本机配置的 NDK 所在路径。
>2. 如果您是通过 Android Studio 的 SDK Tools 下载的 NDK (见本章节"环境准备"),可以直接点击下拉框选择默认路径。
>3. 还有一种 NDK 配置方法,你可以在 `TTSAndroid/local.properties` 文件中手动添加 NDK 路径配置 `ndk.dir=/root/android-ndk-r20b`。
>4. 如果以上步骤仍旧无法解决 NDK 配置错误,请尝试根据 Andriod Studio 官方文档中的[更新 Android Gradle 插件](https://developer.android.com/studio/releases/gradle-plugin?hl=zh-cn#updating-plugin)章节,尝试更新 Android Gradle plugin 版本。
>4. 如果以上步骤仍旧无法解决 NDK 配置错误,请尝试根据 Android Studio 官方文档中的[更新 Android Gradle 插件](https://developer.android.com/studio/releases/gradle-plugin?hl=zh-cn#updating-plugin)章节,尝试更新 Android Gradle plugin 版本。
3. 点击 Run 按钮,自动编译 APP 并安装到手机。(该过程会自动下载 Paddle Lite 预测库和模型,需要联网)
成功后效果如下:
@ -70,8 +70,8 @@ TTSAndroid/app/src/main/java/com/baidu/paddle/lite/demo/tts/Predictor.java
```
2. `fastspeech2_csmsc_arm.nb` and `mb_melgan_csmsc_arm.nb`: model files (Paddle Lite models converted with the opt tool),
taken from [fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip)
and [mb_melgan_csmsc_pdlite_1.3.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_pdlite_1.3.0.zip) respectively.
taken from [fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip](https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip)
and [mb_melgan_csmsc_pdlite_1.3.0.zip](https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_csmsc_pdlite_1.3.0.zip) respectively.
```bash
# Location:
@ -161,7 +161,7 @@ The Android demo is built on the Java API; calling the Paddle Lite `Java API` involves the
- C++ Chinese text frontend [lym0302/paddlespeech_tts_cpp](https://github.com/lym0302/paddlespeech_tts_cpp)
- C++ English g2p [yazone/g2pE_mobile](https://github.com/yazone/g2pE_mobile)
For `phone_id_map.txt`, see [fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip).
For `phone_id_map.txt`, see [fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip](https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_pdlite_1.3.0.zip).
## Updating TTS parameters via the Settings screen
@ -186,7 +186,7 @@ The Android demo is built on the Java API; calling the Paddle Lite `Java API` involves the
## Release
[2022-11-29-app-release.apk](https://paddlespeech.bj.bcebos.com/demos/TTSAndroid/2022-11-29-app-release.apk)
[2022-11-29-app-release.apk](https://paddlespeech.cdn.bcebos.com/demos/TTSAndroid/2022-11-29-app-release.apk)
## More
This demo was merged from [yt605155624/TTSAndroid](https://github.com/yt605155624/TTSAndroid).

@ -31,7 +31,7 @@ dependencies {
implementation files('libs/PaddlePredictor.jar')
}
def paddleLiteLibs = 'https://paddlespeech.bj.bcebos.com/demos/TTSAndroid/paddle_lite_libs_68b66fd3.tar.gz'
def paddleLiteLibs = 'https://paddlespeech.cdn.bcebos.com/demos/TTSAndroid/paddle_lite_libs_68b66fd3.tar.gz'
task downloadAndExtractPaddleLiteLibs(type: DefaultTask) {
doFirst {
println "Downloading and extracting Paddle Lite libs"
@ -73,7 +73,7 @@ task downloadAndExtractPaddleLiteLibs(type: DefaultTask) {
}
preBuild.dependsOn downloadAndExtractPaddleLiteLibs
def paddleLiteModels = [['src' : 'https://paddlespeech.bj.bcebos.com/demos/TTSAndroid/fs2cnn_mbmelgan_cpu_v1.3.0.tar.gz',
def paddleLiteModels = [['src' : 'https://paddlespeech.cdn.bcebos.com/demos/TTSAndroid/fs2cnn_mbmelgan_cpu_v1.3.0.tar.gz',
'dest': 'src/main/assets/models'],]
task downloadAndExtractPaddleLiteModels(type: DefaultTask) {
doFirst {

@ -21,7 +21,7 @@ sudo yum install cmake wget tar unzip
### Download the Paddle Lite Libraries and Model Files
The precompiled binaries use the same Paddle Lite inference library ([Paddle-Lite:68b66fd35](https://github.com/PaddlePaddle/Paddle-Lite/tree/68b66fd356c875c92167d311ad458e6093078449)) and model ([fs2cnn_mbmelgan_cpu_v1.3.0](https://paddlespeech.bj.bcebos.com/demos/TTSAndroid/fs2cnn_mbmelgan_cpu_v1.3.0.tar.gz)) as the Android demo.
The precompiled binaries use the same Paddle Lite inference library ([Paddle-Lite:68b66fd35](https://github.com/PaddlePaddle/Paddle-Lite/tree/68b66fd356c875c92167d311ad458e6093078449)) and model ([fs2cnn_mbmelgan_cpu_v1.3.0](https://paddlespeech.cdn.bcebos.com/demos/TTSAndroid/fs2cnn_mbmelgan_cpu_v1.3.0.tar.gz)) as the Android demo.
They can be downloaded with the following commands:

@ -45,17 +45,17 @@ download() {
echo "Download models..."
download 'inference_lite_lib.armlinux.armv8.gcc.with_extra.with_cv.tar.gz' \
'https://paddlespeech.bj.bcebos.com/demos/TTSArmLinux/inference_lite_lib.armlinux.armv8.gcc.with_extra.with_cv.tar.gz' \
'https://paddlespeech.cdn.bcebos.com/demos/TTSArmLinux/inference_lite_lib.armlinux.armv8.gcc.with_extra.with_cv.tar.gz' \
'39e0c6604f97c70f5d13c573d7e709b9' \
"$LIBS_DIR"
download 'inference_lite_lib.armlinux.armv7hf.gcc.with_extra.with_cv.tar.gz' \
'https://paddlespeech.bj.bcebos.com/demos/TTSArmLinux/inference_lite_lib.armlinux.armv7hf.gcc.with_extra.with_cv.tar.gz' \
'https://paddlespeech.cdn.bcebos.com/demos/TTSArmLinux/inference_lite_lib.armlinux.armv7hf.gcc.with_extra.with_cv.tar.gz' \
'f5ceb509f0b610dafb8379889c5f36f8' \
"$LIBS_DIR"
download 'fs2cnn_mbmelgan_cpu_v1.3.0.tar.gz' \
'https://paddlespeech.bj.bcebos.com/demos/TTSAndroid/fs2cnn_mbmelgan_cpu_v1.3.0.tar.gz' \
'https://paddlespeech.cdn.bcebos.com/demos/TTSAndroid/fs2cnn_mbmelgan_cpu_v1.3.0.tar.gz' \
'93ef17d44b498aff3bea93e2c5c09a1e' \
"$MODELS_DIR"

@ -40,22 +40,22 @@ DIST_DIR="$PWD/front_demo/dict"
mkdir -p "$DIST_DIR"
download 'fastspeech2_nosil_baker_ckpt_0.4.tar.gz' \
'https://paddlespeech.bj.bcebos.com/t2s/text_frontend/fastspeech2_nosil_baker_ckpt_0.4.tar.gz' \
'https://paddlespeech.cdn.bcebos.com/t2s/text_frontend/fastspeech2_nosil_baker_ckpt_0.4.tar.gz' \
'7bf1bab1737375fa123c413eb429c573' \
"$DIST_DIR"
download 'speedyspeech_nosil_baker_ckpt_0.5.tar.gz' \
'https://paddlespeech.bj.bcebos.com/t2s/text_frontend/speedyspeech_nosil_baker_ckpt_0.5.tar.gz' \
'https://paddlespeech.cdn.bcebos.com/t2s/text_frontend/speedyspeech_nosil_baker_ckpt_0.5.tar.gz' \
'0b7754b21f324789aef469c61f4d5b8f' \
"$DIST_DIR"
download 'jieba.tar.gz' \
'https://paddlespeech.bj.bcebos.com/t2s/text_frontend/jieba.tar.gz' \
'https://paddlespeech.cdn.bcebos.com/t2s/text_frontend/jieba.tar.gz' \
'6d30f426bd8c0025110a483f051315ca' \
"$DIST_DIR"
download 'tranditional_to_simplified.tar.gz' \
'https://paddlespeech.bj.bcebos.com/t2s/text_frontend/tranditional_to_simplified.tar.gz' \
'https://paddlespeech.cdn.bcebos.com/t2s/text_frontend/tranditional_to_simplified.tar.gz' \
'258f5b59d5ebfe96d02007ca1d274a7f' \
"$DIST_DIR"

@ -115,27 +115,27 @@ int FrontEngineInterface::init() {
// Build the dictionary (word-to-phoneme mapping)
if (0 != GenDict(_word2phone_path, &word_phone_map)) {
LOG(ERROR) << "Genarate word2phone dict failed";
LOG(ERROR) << "Generate word2phone dict failed";
return -1;
}
// Build the phoneme dictionary (phoneme-to-phoneme-id mapping)
if (0 != GenDict(_phone2id_path, &phone_id_map)) {
LOG(ERROR) << "Genarate phone2id dict failed";
LOG(ERROR) << "Generate phone2id dict failed";
return -1;
}
// Build the tone dictionary (tone-to-tone-id mapping)
if (_separate_tone == "true") {
if (0 != GenDict(_tone2id_path, &tone_id_map)) {
LOG(ERROR) << "Genarate tone2id dict failed";
LOG(ERROR) << "Generate tone2id dict failed";
return -1;
}
}
// Build the traditional-to-simplified dictionary (traditional-to-simplified character mapping)
if (0 != GenDict(_trand2simp_path, &trand_simp_map)) {
LOG(ERROR) << "Genarate trand2simp dict failed";
LOG(ERROR) << "Generate trand2simp dict failed";
return -1;
}
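`GenDict` appears only at its call sites here; it loads a mapping file into an in-memory dictionary. A hedged Python sketch under the assumption of simple "key value" lines (the real C++ parsing rules may differ):
```python
from typing import Dict

def gen_dict(path: str) -> Dict[str, str]:
    """Parse 'key value ...' lines into a mapping, mirroring the GenDict call sites."""
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                key, _, value = line.partition(" ")
                mapping[key] = value
    return mapping

# Hypothetical paths; the real ones come from the engine configuration.
word_phone_map = gen_dict("word2phone.dict")  # word -> phone sequence
phone_id_map = gen_dict("phone_id_map.txt")   # phone -> id
```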
@ -263,7 +263,7 @@ int FrontEngineInterface::GetWordsIds(
if (0 !=
GetInitialsFinals(word, &word_initials, &word_finals)) {
LOG(ERROR)
<< "Genarate the word_initials and word_finals of "
<< "Generate the word_initials and word_finals of "
<< word << " failed";
return -1;
}
@ -304,7 +304,7 @@ int FrontEngineInterface::GetWordsIds(
// Phoneme to phoneme id
if (0 != Phone2Phoneid(phone, phoneids, toneids)) {
LOG(ERROR) << "Genarate the phone id of " << word << " failed";
LOG(ERROR) << "Generate the phone id of " << word << " failed";
return -1;
}
}
@ -916,11 +916,11 @@ int FrontEngineInterface::NeuralSandhi(const std::string &word,
if (find(must_neural_tone_words.begin(),
must_neural_tone_words.end(),
word) != must_neural_tone_words.end() ||
(word_num >= 2 &&
find(must_neural_tone_words.begin(),
must_neural_tone_words.end(),
ppspeech::wstring2utf8string(word_wstr.substr(
word_num - 2))) != must_neural_tone_words.end())) {
(word_num >= 2 && find(must_neural_tone_words.begin(),
must_neural_tone_words.end(),
ppspeech::wstring2utf8string(
word_wstr.substr(word_num - 2))) !=
must_neural_tone_words.end())) {
(*finals).back() =
(*finals).back().replace((*finals).back().length() - 1, 1, "5");
}
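The reformatted condition marks a word's last syllable as neutral tone (tone 5) when the whole word, or its final two characters, appears in `must_neural_tone_words`. The same rule in a compact Python sketch (set contents are illustrative):
```python
from typing import List

# Illustrative entries only; the real list is much longer.
must_neural_tone_words = {"妈妈", "意思"}

def neural_sandhi(word: str, finals: List[str]) -> List[str]:
    """Rewrite the last final's tone digit to '5' (neutral tone) when the rule fires."""
    if word in must_neural_tone_words or (
            len(word) >= 2 and word[-2:] in must_neural_tone_words):
        finals[-1] = finals[-1][:-1] + "5"
    return finals

print(neural_sandhi("妈妈", ["ma1", "ma1"]))  # ['ma1', 'ma5']
```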

@ -14,7 +14,7 @@ Now, the search word in demo is:
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from meduim and hard to install paddlespeech.
You can choose one way from medium and hard to install paddlespeech.
The dependencies are listed in requirements.txt; install them as follows:
@ -27,7 +27,7 @@ The input of this demo should be a WAV file(`.wav`), and the sample rate must be
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav
```
### 3. run paddlespeech_server

@ -27,7 +27,7 @@ pip install -r requirements.txt
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav
```
### 3. Start the server

@ -19,7 +19,7 @@ Note: this demo uses the [CN-Celeb](http://openslr.org/82/) dataset of at least
### 1. Prepare PaddleSpeech
Audio vector extraction requires a trained PaddleSpeech model, so please make sure PaddleSpeech is installed before running. For specific installation steps, see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech.
You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare MySQL and Milvus services by docker-compose
The audio similarity search system requires the Milvus and MySQL services. These containers can be started with one click through [docker-compose.yaml](./docker-compose.yaml), so please make sure you have [installed Docker Engine](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) before running. Then
@ -128,7 +128,7 @@ Then to start the system server, and it provides HTTP backend services.
Output
```bash
Downloading https://paddlespeech.bj.bcebos.com/vector/audio/example_audio.tar.gz ...
Downloading https://paddlespeech.cdn.bcebos.com/vector/audio/example_audio.tar.gz ...
...
Unpacking ./example_audio.tar.gz ...
[2022-03-26 22:50:54,987] [ INFO] - checking the aduio file format......
@ -136,7 +136,7 @@ Then to start the system server, and it provides HTTP backend services.
[2022-03-26 22:50:54,987] [ INFO] - The audio file format is right
[2022-03-26 22:50:54,988] [ INFO] - device type: cpu
[2022-03-26 22:50:54,988] [ INFO] - load the pretrained model: ecapatdnn_voxceleb12-16k
[2022-03-26 22:50:54,990] [ INFO] - Downloading sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_0.tar.gz from https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_0.tar.gz
[2022-03-26 22:50:54,990] [ INFO] - Downloading sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_0.tar.gz from https://paddlespeech.cdn.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_0.tar.gz
...
[2022-03-26 22:51:17,285] [ INFO] - start to dynamic import the model class
[2022-03-26 22:51:17,285] [ INFO] - model name ecapatdnn
@ -217,7 +217,7 @@ Then to start the system server, and it provides HTTP backend services.
- memory: 132G
dataset
- CN-Celeb, train size 650,000, test size 10,000, dimention 192, distance L2
- CN-Celeb, train size 650,000, test size 10,000, dimension 192, distance L2
recall and elapsed time statistics are shown in the following figure
@ -226,7 +226,7 @@ recall and elapsed time statistics are shown in the following figure
With a recall rate of 90%, the Milvus-based retrieval framework takes about 2.9 milliseconds per query, and feature extraction takes about 500 milliseconds (for a test audio of about 5 seconds); a single audio query therefore takes about 503 milliseconds in total, which meets the needs of most application scenarios.
* compute embeding takes 500 ms
* compute embedding takes 500 ms
* retrieval with cosine takes 2.9 ms
* total takes 503 ms

@ -130,7 +130,7 @@ ffce340b3790 minio/minio:RELEASE.2020-12-03T00-03-10Z "/usr/bin/docker-ent…"
Output:
```bash
Downloading https://paddlespeech.bj.bcebos.com/vector/audio/example_audio.tar.gz ...
Downloading https://paddlespeech.cdn.bcebos.com/vector/audio/example_audio.tar.gz ...
...
Unpacking ./example_audio.tar.gz ...
[2022-03-26 22:50:54,987] [ INFO] - checking the aduio file format......
@ -138,7 +138,7 @@ ffce340b3790 minio/minio:RELEASE.2020-12-03T00-03-10Z "/usr/bin/docker-ent…"
[2022-03-26 22:50:54,987] [ INFO] - The audio file format is right
[2022-03-26 22:50:54,988] [ INFO] - device type: cpu
[2022-03-26 22:50:54,988] [ INFO] - load the pretrained model: ecapatdnn_voxceleb12-16k
[2022-03-26 22:50:54,990] [ INFO] - Downloading sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_0.tar.gz from https://paddlespeech.bj.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_0.tar.gz
[2022-03-26 22:50:54,990] [ INFO] - Downloading sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_0.tar.gz from https://paddlespeech.cdn.bcebos.com/vector/voxceleb/sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_0.tar.gz
...
[2022-03-26 22:51:17,285] [ INFO] - start to dynamic import the model class
[2022-03-26 22:51:17,285] [ INFO] - model name ecapatdnn

@ -1,5 +1,4 @@
diskcache
dtaidistane
fastapi
librosa==0.8.0
numpy==1.22.0

@ -77,13 +77,13 @@ class MilvusHelper:
field1 = FieldSchema(
name="id",
dtype=DataType.INT64,
descrition="int64",
description="int64",
is_primary=True,
auto_id=True)
field2 = FieldSchema(
name="embedding",
dtype=DataType.FLOAT_VECTOR,
descrition="speaker embeddings",
description="speaker embeddings",
dim=VECTOR_DIMENSION,
is_primary=False)
schema = CollectionSchema(
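For context, these `FieldSchema` objects are assembled into a collection; a hedged pymilvus sketch (connection details and collection name are placeholders):
```python
from pymilvus import (Collection, CollectionSchema, DataType, FieldSchema,
                      connections)

VECTOR_DIMENSION = 192  # matches the 192-dim speaker embeddings used above

connections.connect(host="127.0.0.1", port="19530")  # default Milvus port

field1 = FieldSchema(name="id", dtype=DataType.INT64, description="int64",
                     is_primary=True, auto_id=True)
field2 = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR,
                     description="speaker embeddings",
                     dim=VECTOR_DIMENSION, is_primary=False)
schema = CollectionSchema(fields=[field1, field2],
                          description="speaker embedding collection")
collection = Collection(name="audio_table", schema=schema)  # placeholder name
```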

@ -24,7 +24,7 @@ def download_audio_data():
"""
Download audio data
"""
url = "https://paddlespeech.bj.bcebos.com/vector/audio/example_audio.tar.gz"
url = "https://paddlespeech.cdn.bcebos.com/vector/audio/example_audio.tar.gz"
md5sum = "52ac69316c1aa1fdef84da7dd2c67b39"
target_dir = "./"
filepath = download(url, md5sum, target_dir)

@ -24,7 +24,7 @@ def download_audio_data():
"""
Download audio data
"""
url = "https://paddlespeech.bj.bcebos.com/vector/audio/example_audio.tar.gz"
url = "https://paddlespeech.cdn.bcebos.com/vector/audio/example_audio.tar.gz"
md5sum = "52ac69316c1aa1fdef84da7dd2c67b39"
target_dir = "./"
filepath = download(url, md5sum, target_dir)

@ -11,14 +11,14 @@ This demo is an implementation to tag an audio file with 527 [AudioSet](https://
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech.
You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File
The input of this demo should be a WAV file(`.wav`).
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/dog.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.cdn.bcebos.com/PaddleAudio/dog.wav
```
### 3. Usage

@ -18,7 +18,7 @@
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/dog.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.cdn.bcebos.com/PaddleAudio/dog.wav
```
### 3. Usage

@ -1,4 +1,4 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/dog.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.cdn.bcebos.com/PaddleAudio/dog.wav
paddlespeech cls --input ./cat.wav --topk 10
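The same tagging is available from Python; a sketch using the `paddlespeech.cli` executor (import path as in recent PaddleSpeech releases; verify against your installed version):
```python
from paddlespeech.cli.cls.infer import CLSExecutor

cls = CLSExecutor()
# Tag the downloaded sample with its top-10 AudioSet labels.
result = cls(audio_file="./cat.wav", topk=10)
print(result)
```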

@ -10,12 +10,12 @@ This demo is an implementation of automatic video subtitling from a video file. I
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech.
You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input
Get a video file with the speech of the specific language:
```bash
wget -c https://paddlespeech.bj.bcebos.com/demos/asr_demos/subtitle_demo1.mp4
wget -c https://paddlespeech.cdn.bcebos.com/demos/asr_demos/subtitle_demo1.mp4
```
Extract `.wav` with one channel and 16000 sample rate from the video:
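A typical way to do this uses ffmpeg's `-ac 1 -ar 16000` flags; here it is wrapped in Python for consistency with the other sketches (the output filename is arbitrary):
```python
import subprocess

# -ac 1: one channel (mono); -ar 16000: 16 kHz sample rate.
subprocess.run(
    ["ffmpeg", "-i", "subtitle_demo1.mp4",
     "-ac", "1", "-ar", "16000", "input.wav"],
    check=True,
)
```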

@ -13,7 +13,7 @@
### 2. Prepare Input
Get a video file containing speech in the specific language:
```bash
wget -c https://paddlespeech.bj.bcebos.com/demos/asr_demos/subtitle_demo1.mp4
wget -c https://paddlespeech.cdn.bcebos.com/demos/asr_demos/subtitle_demo1.mp4
```
Extract a mono `.wav` file with a 16 kHz sample rate from the video:
```bash

@ -1,6 +1,6 @@
#!/bin/bash
video_url=https://paddlespeech.bj.bcebos.com/demos/asr_demos/subtitle_demo1.mp4
video_url=https://paddlespeech.cdn.bcebos.com/demos/asr_demos/subtitle_demo1.mp4
video_file=$(basename ${video_url})
audio_file=$(echo ${video_file} | awk -F'.' '{print $1}').wav
num_channels=1

@ -14,7 +14,7 @@ cmvn=./data/cmvn.ark
#paddle_asr_online/resource.tar.gz
if [ ! -f $cmvn ]; then
wget -c https://paddlespeech.bj.bcebos.com/s2t/paddle_asr_online/resource.tar.gz
wget -c https://paddlespeech.cdn.bcebos.com/s2t/paddle_asr_online/resource.tar.gz
tar xzfv resource.tar.gz
ln -s ./resource/data .
fi

@ -10,14 +10,14 @@ This demo is an implementation to recognize keyword from a specific audio file.
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech.
You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File
The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/kws/hey_snips.wav https://paddlespeech.bj.bcebos.com/kws/non-keyword.wav
wget -c https://paddlespeech.cdn.bcebos.com/kws/hey_snips.wav https://paddlespeech.cdn.bcebos.com/kws/non-keyword.wav
```
### 3. Usage

@ -16,7 +16,7 @@
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/kws/hey_snips.wav https://paddlespeech.bj.bcebos.com/kws/non-keyword.wav
wget -c https://paddlespeech.cdn.bcebos.com/kws/hey_snips.wav https://paddlespeech.cdn.bcebos.com/kws/non-keyword.wav
```
### 3. Usage
- Command line (recommended)

@ -1,6 +1,6 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/kws/hey_snips.wav https://paddlespeech.bj.bcebos.com/kws/non-keyword.wav
wget -c https://paddlespeech.cdn.bcebos.com/kws/hey_snips.wav https://paddlespeech.cdn.bcebos.com/kws/non-keyword.wav
# kws
paddlespeech kws --input ./hey_snips.wav

@ -25,12 +25,12 @@ fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
# download pretrained tts models and unzip
wget -P download https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_ckpt_0.4.zip
wget -P download https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_ckpt_0.4.zip
unzip -d download download/pwg_baker_ckpt_0.4.zip
wget -P download https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip
wget -P download https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip
unzip -d download download/fastspeech2_nosil_baker_ckpt_0.4.zip
# download sources
wget -P download https://paddlespeech.bj.bcebos.com/demos/metaverse/Lamarr.png
wget -P download https://paddlespeech.cdn.bcebos.com/demos/metaverse/Lamarr.png
fi

@ -9,7 +9,7 @@ This demo is an implementation to restore punctuation from raw text. It can be d
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech.
You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input
The input of this demo should be text in the specific language, passed in as an argument.

@ -11,15 +11,15 @@ This demo is an implementation to extract speaker embedding from a specific audi
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech.
You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File
The input of this cli demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/123456789.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/123456789.wav
```
### 3. Usage

@ -18,8 +18,8 @@
Here are sample files for this demo that can be downloaded:
```bash
# The content of this audio is the digit string 85236145389
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/123456789.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/123456789.wav
```
### 3. Usage
- Command line (recommended)

@ -1,7 +1,7 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/123456789.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/123456789.wav
# vector
paddlespeech vector --task spk --input ./85236145389.wav
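The embedding can also be extracted from Python; a sketch with the `paddlespeech.cli` vector executor (the defaults select the speaker-verification model; verify the import path against your release):
```python
from paddlespeech.cli.vector import VectorExecutor

vector = VectorExecutor()
# Extract a 192-dimensional speaker embedding from the sample audio.
audio_emb = vector(audio_file="./85236145389.wav")
print(audio_emb.shape)
```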

@ -10,14 +10,14 @@ This demo is an implementation to recognize text from a specific audio file. It
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech.
You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File
The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav https://paddlespeech.cdn.bcebos.com/PaddleAudio/ch_zh_mix.wav
```
### 3. Usage

@ -17,7 +17,7 @@
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav https://paddlespeech.cdn.bcebos.com/PaddleAudio/ch_zh_mix.wav
```
### 3. Usage
- Command line (recommended)

@ -1,8 +1,8 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/ch_zh_mix.wav
# asr
paddlespeech asr --input ./zh.wav
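Equivalently from Python, a sketch with the `paddlespeech.cli` ASR executor (the default model is a Mandarin one, matching `./zh.wav`; verify the import path against your release):
```python
from paddlespeech.cli.asr.infer import ASRExecutor

asr = ASRExecutor()
# Transcribe the Mandarin sample with the default Chinese ASR model.
text = asr(audio_file="./zh.wav")
print(text)
```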

@ -15,7 +15,7 @@ see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/doc
It is recommended to use **paddlepaddle 2.4rc** or above.
You can choose one way from easy, meduim and hard to install paddlespeech.
You can choose one way from easy, medium and hard to install paddlespeech.
**If you install in easy mode, you need to prepare the yaml file yourself; you can refer to the yaml files in the conf directory.**
@ -42,7 +42,7 @@ Currently the engine type supports two forms: python and inference (Paddle Infer
paddlespeech_server start --help
```
Arguments:
- `config_file`: yaml file of the app, defalut: ./conf/application.yaml
- `config_file`: yaml file of the app, default: ./conf/application.yaml
- `log_file`: log file. Default: ./log/paddlespeech.log
Output:
@ -85,9 +85,9 @@ The input of ASR client demo should be a WAV file(`.wav`), and the sample rate
Here are sample files for this ASR client demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/ch_zh_mix.wav
```
**Note:** The response time will be slightly longer when using the client for the first time
@ -204,7 +204,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
Here are sample files for this CLS Client demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav
```
**Note:** The response time will be slightly longer when using the client for the first time
@ -257,8 +257,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
Here are sample files for this Speaker Verification Client demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/123456789.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/123456789.wav
```
#### 7.1 Extract speaker embedding

@ -89,9 +89,9 @@ The input of the ASR client is a WAV file (`.wav`), and the sample rate must
Here are sample files for this ASR client demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/ch_zh_mix.wav
```
**Note:** The response time will be slightly longer the first time the client is used
@ -211,7 +211,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
Here are sample files for this CLS client demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav
```
**Note:** The response time will be slightly longer the first time the client is used
@ -264,8 +264,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
Here are sample files for this Speaker Verification client demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/123456789.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/123456789.wav
```
#### 7.1 Extract speaker embedding

@ -1,6 +1,6 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
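The same request can be issued from Python; a sketch based on the documented client executor (parameters mirror the CLI flags above; confirm the import path for your version):
```python
from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor

client = ASRClientExecutor()
# Same request as the CLI: send zh.wav to the server at 127.0.0.1:8090.
res = client(
    input="./zh.wav",
    server_ip="127.0.0.1",
    port=8090,
    sample_rate=16000,
    lang="zh_cn",
    audio_format="wav",
)
print(res)
```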

@ -1,6 +1,6 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
# If `127.0.0.1` is not accessible, you need to use the actual service IP address.
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --topk 1

@ -1,7 +1,7 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/123456789.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.cdn.bcebos.com/vector/audio/123456789.wav
# sid extract
paddlespeech_client vector --server_ip 127.0.0.1 --port 8090 --task spk --input ./85236145389.wav

@ -10,14 +10,14 @@ This demo is an implementation to recognize text or produce the acoustic represe
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech.
You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File
The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
```
### 3. Usage

@ -17,7 +17,7 @@
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
```
### 3. Usage
- Command line (recommended)

@ -1,7 +1,7 @@
#!/bin/bash
# audio download
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
# to recognize text
paddlespeech ssl --task asr --lang en --input ./en.wav

@ -9,7 +9,7 @@ This demo is an implementation to recognize text from a specific audio file and
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech.
You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File
@ -17,7 +17,7 @@ The input of this demo should be a WAV file(`.wav`).
Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
```
### 3. Usage (not supported on Windows yet)

@ -17,7 +17,7 @@
Here are some sample files for this demo:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
```
### 3. Usage (not supported on Windows yet)

@ -1,4 +1,4 @@
#!/bin/bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav
paddlespeech st --input ./en.wav

@ -100,43 +100,43 @@ cd speech_server
mkdir -p source/model
cd source
# Download & unzip the wav package (contains VC test audio)
wget https://paddlespeech.bj.bcebos.com/demos/speech_web/wav_vc.zip
wget https://paddlespeech.cdn.bcebos.com/demos/speech_web/wav_vc.zip
unzip wav_vc.zip
cd model
# Download the GE2E models
wget https://bj.bcebos.com/paddlespeech/Parakeet/released_models/ge2e/ge2e_ckpt_0.3.zip
unzip ge2e_ckpt_0.3.zip
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip
wget https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip
unzip pwg_aishell3_ckpt_0.5.zip
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
wget https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
unzip fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip
# Download the ECAPA-TDNN models
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
wget https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
unzip fastspeech2_aishell3_ckpt_vc2_1.2.0.zip
# Download the ERNIE-SAT models
# aishell3 ERNIE-SAT
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_ckpt_1.2.0.zip
wget https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_ckpt_1.2.0.zip
unzip erniesat_aishell3_ckpt_1.2.0.zip
# vctk ERNIE-SAT
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_vctk_ckpt_1.2.0.zip
wget https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_vctk_ckpt_1.2.0.zip
unzip erniesat_vctk_ckpt_1.2.0.zip
# aishell3_vctk ERNIE-SAT
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_vctk_ckpt_1.2.0.zip
wget https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/ernie_sat/erniesat_aishell3_vctk_ckpt_1.2.0.zip
unzip erniesat_aishell3_vctk_ckpt_1.2.0.zip
# Download the finetune models
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_1.1.0.zip
wget https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_1.1.0.zip
unzip fastspeech2_aishell3_ckpt_1.1.0.zip
# Download the vocoders
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip
wget https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip
unzip hifigan_aishell3_ckpt_0.2.0.zip
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip
wget https://paddlespeech.cdn.bcebos.com/Parakeet/released_models/hifigan/hifigan_vctk_ckpt_0.2.0.zip
unzip hifigan_vctk_ckpt_0.2.0.zip
cd ../../../

Some files were not shown because too many files have changed in this diff.