Merge branch 'PaddlePaddle:develop' into doc

pull/3982/head
zxcd 8 months ago committed by GitHub
commit 142318d1b1

@ -33,7 +33,7 @@ If applicable, add screenshots to help explain your problem.
- Python Version [e.g. 3.7] - Python Version [e.g. 3.7]
- PaddlePaddle Version [e.g. 2.0.0] - PaddlePaddle Version [e.g. 2.0.0]
- Model Version [e.g. 2.0.0] - Model Version [e.g. 2.0.0]
- GPU/DRIVER Informationo [e.g. Tesla V100-SXM2-32GB/440.64.00] - GPU/DRIVER Information [e.g. Tesla V100-SXM2-32GB/440.64.00]
- CUDA/CUDNN Version [e.g. cuda-10.2] - CUDA/CUDNN Version [e.g. cuda-10.2]
- MKL Version - MKL Version
- TensorRT Version - TensorRT Version

@ -32,7 +32,7 @@ If applicable, add screenshots to help explain your problem.
- Python Version [e.g. 3.7] - Python Version [e.g. 3.7]
- PaddlePaddle Version [e.g. 2.0.0] - PaddlePaddle Version [e.g. 2.0.0]
- Model Version [e.g. 2.0.0] - Model Version [e.g. 2.0.0]
- GPU/DRIVER Informationo [e.g. Tesla V100-SXM2-32GB/440.64.00] - GPU/DRIVER Information [e.g. Tesla V100-SXM2-32GB/440.64.00]
- CUDA/CUDNN Version [e.g. cuda-10.2] - CUDA/CUDNN Version [e.g. cuda-10.2]
- MKL Version - MKL Version
- TensorRT Version - TensorRT Version

@ -265,6 +265,8 @@ git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech cd PaddleSpeech
pip install pytest-runner pip install pytest-runner
pip install . pip install .
# If you need to install in editable mode, you need to use --use-pep517. The command is as follows:
# pip install -e . --use-pep517
``` ```
For more installation problems, such as conda environment, librosa-dependent, gcc problems, kaldi installation, etc., you can refer to this [installation document](./docs/source/install.md). If you encounter problems during installation, you can leave a message on [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) and find related problems For more installation problems, such as conda environment, librosa-dependent, gcc problems, kaldi installation, etc., you can refer to this [installation document](./docs/source/install.md). If you encounter problems during installation, you can leave a message on [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) and find related problems

@ -272,6 +272,8 @@ git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech cd PaddleSpeech
pip install pytest-runner pip install pytest-runner
pip install . pip install .
# If you need to install in editable mode, you need to use --use-pep517. The command is as follows:
# pip install -e . --use-pep517
``` ```
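For more installation issues, such as the conda environment, system libraries required by librosa, gcc environment problems, kaldi installation, etc., refer to this [installation document](docs/source/install_cn.md). If you run into problems during installation, you can leave a message on [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) and search for related issues For more installation issues, such as the conda environment, system libraries required by librosa, gcc environment problems, kaldi installation, etc., refer to this [installation document](docs/source/install_cn.md). If you run into problems during installation, you can leave a message on [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) and search for related issues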

@ -61,7 +61,7 @@ def resample(y: np.ndarray,
if mode == 'kaiser_best': if mode == 'kaiser_best':
warnings.warn( warnings.warn(
f'Using resampy in kaiser_best to {src_sr}=>{target_sr}. This function is pretty slow, \ f'Using resampy in kaiser_best to {src_sr}=>{target_sr}. This function is pretty slow, \
we recommend the mode kaiser_fast in large scale audio trainning') we recommend the mode kaiser_fast in large scale audio training')
if not isinstance(y, np.ndarray): if not isinstance(y, np.ndarray):
raise ParameterError( raise ParameterError(

@ -233,7 +233,7 @@ def spectrogram(waveform: Tensor,
round_to_power_of_two (bool, optional): If True, round window size to power of two by zero-padding input round_to_power_of_two (bool, optional): If True, round window size to power of two by zero-padding input
to FFT. Defaults to True. to FFT. Defaults to True.
sr (int, optional): Sample rate of input waveform. Defaults to 16000. sr (int, optional): Sample rate of input waveform. Defaults to 16000.
snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a singal frame when it snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a signal frame when it
is set True. Otherwise performs reflect padding to the end of waveform. Defaults to True. is set True. Otherwise performs reflect padding to the end of waveform. Defaults to True.
subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False. subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False.
window_type (str, optional): Choose type of window for FFT computation. Defaults to "povey". window_type (str, optional): Choose type of window for FFT computation. Defaults to "povey".
@ -443,7 +443,7 @@ def fbank(waveform: Tensor,
round_to_power_of_two (bool, optional): If True, round window size to power of two by zero-padding input round_to_power_of_two (bool, optional): If True, round window size to power of two by zero-padding input
to FFT. Defaults to True. to FFT. Defaults to True.
sr (int, optional): Sample rate of input waveform. Defaults to 16000. sr (int, optional): Sample rate of input waveform. Defaults to 16000.
snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a singal frame when it snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a signal frame when it
is set True. Otherwise performs reflect padding to the end of waveform. Defaults to True. is set True. Otherwise performs reflect padding to the end of waveform. Defaults to True.
subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False. subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False.
use_energy (bool, optional): Add an dimension with energy of spectrogram to the output. Defaults to False. use_energy (bool, optional): Add an dimension with energy of spectrogram to the output. Defaults to False.
@ -566,7 +566,7 @@ def mfcc(waveform: Tensor,
round_to_power_of_two (bool, optional): If True, round window size to power of two by zero-padding input round_to_power_of_two (bool, optional): If True, round window size to power of two by zero-padding input
to FFT. Defaults to True. to FFT. Defaults to True.
sr (int, optional): Sample rate of input waveform. Defaults to 16000. sr (int, optional): Sample rate of input waveform. Defaults to 16000.
snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a singal frame when it snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a signal frame when it
is set True. Otherwise performs reflect padding to the end of waveform. Defaults to True. is set True. Otherwise performs reflect padding to the end of waveform. Defaults to True.
subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False. subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False.
use_energy (bool, optional): Add an dimension with energy of spectrogram to the output. Defaults to False. use_energy (bool, optional): Add an dimension with energy of spectrogram to the output. Defaults to False.
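
The `snip_edges` behaviour described in the three docstrings above can be made concrete with a frame-count calculation. This is a minimal sketch that follows the Kaldi convention these functions appear to mirror; the helper name `num_frames` and the 400-sample window with a 160-sample shift are illustrative assumptions, not part of this diff.

```python
def num_frames(num_samples: int,
               frame_length: int,
               frame_shift: int,
               snip_edges: bool=True) -> int:
    """Count frames the way Kaldi-style feature extraction does (assumed)."""
    if snip_edges:
        # Only frames that fit entirely inside the waveform are kept;
        # trailing samples that cannot fill a frame are dropped.
        if num_samples < frame_length:
            return 0
        return 1 + (num_samples - frame_length) // frame_shift
    # Without snipping, the waveform is padded at the end, so the count
    # depends only on the frame shift.
    return (num_samples + frame_shift // 2) // frame_shift


# One second of 16 kHz audio, 25 ms window, 10 ms shift (hypothetical values).
frames_snipped = num_frames(16000, 400, 160, snip_edges=True)
frames_padded = num_frames(16000, 400, 160, snip_edges=False)
```

With these values the padded variant yields two more frames than the snipped one, which is the difference a caller sees when toggling `snip_edges`.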

@ -47,7 +47,7 @@ class AudioClassificationDataset(paddle.io.Dataset):
files (:obj:`List[str]`): A list of absolute path of audio files. files (:obj:`List[str]`): A list of absolute path of audio files.
labels (:obj:`List[int]`): Labels of audio files. labels (:obj:`List[int]`): Labels of audio files.
feat_type (:obj:`str`, `optional`, defaults to `raw`): feat_type (:obj:`str`, `optional`, defaults to `raw`):
It identifies the feature type that user wants to extrace of an audio file. It identifies the feature type that user wants to extract of an audio file.
""" """
super(AudioClassificationDataset, self).__init__() super(AudioClassificationDataset, self).__init__()

@ -117,7 +117,7 @@ class ESC50(AudioClassificationDataset):
split (:obj:`int`, `optional`, defaults to 1): split (:obj:`int`, `optional`, defaults to 1):
It specify the fold of dev dataset. It specify the fold of dev dataset.
feat_type (:obj:`str`, `optional`, defaults to `raw`): feat_type (:obj:`str`, `optional`, defaults to `raw`):
It identifies the feature type that user wants to extrace of an audio file. It identifies the feature type that user wants to extract of an audio file.
""" """
files, labels = self._get_data(mode, split) files, labels = self._get_data(mode, split)
super(ESC50, self).__init__( super(ESC50, self).__init__(

@ -67,7 +67,7 @@ class GTZAN(AudioClassificationDataset):
split (:obj:`int`, `optional`, defaults to 1): split (:obj:`int`, `optional`, defaults to 1):
It specify the fold of dev dataset. It specify the fold of dev dataset.
feat_type (:obj:`str`, `optional`, defaults to `raw`): feat_type (:obj:`str`, `optional`, defaults to `raw`):
It identifies the feature type that user wants to extrace of an audio file. It identifies the feature type that user wants to extract of an audio file.
""" """
assert split <= n_folds, f'The selected split should not be larger than n_fold, but got {split} > {n_folds}' assert split <= n_folds, f'The selected split should not be larger than n_fold, but got {split} > {n_folds}'
files, labels = self._get_data(mode, seed, n_folds, split) files, labels = self._get_data(mode, seed, n_folds, split)

@ -76,7 +76,7 @@ class TESS(AudioClassificationDataset):
split (:obj:`int`, `optional`, defaults to 1): split (:obj:`int`, `optional`, defaults to 1):
It specify the fold of dev dataset. It specify the fold of dev dataset.
feat_type (:obj:`str`, `optional`, defaults to `raw`): feat_type (:obj:`str`, `optional`, defaults to `raw`):
It identifies the feature type that user wants to extrace of an audio file. It identifies the feature type that user wants to extract of an audio file.
""" """
assert split <= n_folds, f'The selected split should not be larger than n_fold, but got {split} > {n_folds}' assert split <= n_folds, f'The selected split should not be larger than n_fold, but got {split} > {n_folds}'
files, labels = self._get_data(mode, seed, n_folds, split) files, labels = self._get_data(mode, seed, n_folds, split)

@ -68,7 +68,7 @@ class UrbanSound8K(AudioClassificationDataset):
split (:obj:`int`, `optional`, defaults to 1): split (:obj:`int`, `optional`, defaults to 1):
It specify the fold of dev dataset. It specify the fold of dev dataset.
feat_type (:obj:`str`, `optional`, defaults to `raw`): feat_type (:obj:`str`, `optional`, defaults to `raw`):
It identifies the feature type that user wants to extrace of an audio file. It identifies the feature type that user wants to extract of an audio file.
""" """
def _get_meta_info(self): def _get_meta_info(self):

@ -262,8 +262,8 @@ class VoxCeleb(Dataset):
split_chunks: bool=True): split_chunks: bool=True):
print(f'Generating csv: {output_file}') print(f'Generating csv: {output_file}')
header = ["id", "duration", "wav", "start", "stop", "spk_id"] header = ["id", "duration", "wav", "start", "stop", "spk_id"]
# Note: this may occurs c++ execption, but the program will execute fine # Note: this may occurs c++ exception, but the program will execute fine
# so we can ignore the execption # so we can ignore the exception
with Pool(cpu_count()) as p: with Pool(cpu_count()) as p:
infos = list( infos = list(
tqdm( tqdm(

@ -34,7 +34,7 @@ __all__ = [
class Spectrogram(nn.Layer): class Spectrogram(nn.Layer):
"""Compute spectrogram of given signals, typically audio waveforms. """Compute spectrogram of given signals, typically audio waveforms.
The spectorgram is defined as the complex norm of the short-time Fourier transformation. The spectrogram is defined as the complex norm of the short-time Fourier transformation.
Args: Args:
n_fft (int, optional): The number of frequency components of the discrete Fourier transform. Defaults to 512. n_fft (int, optional): The number of frequency components of the discrete Fourier transform. Defaults to 512.

@ -247,7 +247,7 @@ def create_dct(n_mfcc: int,
Args: Args:
n_mfcc (int): Number of mel frequency cepstral coefficients. n_mfcc (int): Number of mel frequency cepstral coefficients.
n_mels (int): Number of mel filterbanks. n_mels (int): Number of mel filterbanks.
norm (Optional[str], optional): Normalizaiton type. Defaults to 'ortho'. norm (Optional[str], optional): Normalization type. Defaults to 'ortho'.
dtype (str, optional): The data type of the return matrix. Defaults to 'float32'. dtype (str, optional): The data type of the return matrix. Defaults to 'float32'.
Returns: Returns:

@ -22,8 +22,8 @@ def compute_eer(labels: np.ndarray, scores: np.ndarray) -> List[float]:
"""Compute EER and return score threshold. """Compute EER and return score threshold.
Args: Args:
labels (np.ndarray): the trial label, shape: [N], one-dimention, N refer to the samples num labels (np.ndarray): the trial label, shape: [N], one-dimension, N refer to the samples num
scores (np.ndarray): the trial scores, shape: [N], one-dimention, N refer to the samples num scores (np.ndarray): the trial scores, shape: [N], one-dimension, N refer to the samples num
Returns: Returns:
List[float]: eer and the specific threshold List[float]: eer and the specific threshold

@ -121,8 +121,8 @@ def apply_effects_tensor(
""" """
tensor_np = tensor.numpy() tensor_np = tensor.numpy()
ret = paddleaudio._paddleaudio.sox_effects_apply_effects_tensor(tensor_np, sample_rate, ret = paddleaudio._paddleaudio.sox_effects_apply_effects_tensor(
effects, channels_first) tensor_np, sample_rate, effects, channels_first)
if ret is not None: if ret is not None:
return (paddle.to_tensor(ret[0]), ret[1]) return (paddle.to_tensor(ret[0]), ret[1])
raise RuntimeError("Failed to apply sox effect") raise RuntimeError("Failed to apply sox effect")
@ -139,7 +139,7 @@ def apply_effects_file(
Note: Note:
This function works in the way very similar to ``sox`` command, however there are slight This function works in the way very similar to ``sox`` command, however there are slight
differences. For example, ``sox`` commnad adds certain effects automatically (such as differences. For example, ``sox`` command adds certain effects automatically (such as
``rate`` effect after ``speed``, ``pitch`` etc), but this function only applies the given ``rate`` effect after ``speed``, ``pitch`` etc), but this function only applies the given
effects. Therefore, to actually apply ``speed`` effect, you also need to give ``rate`` effects. Therefore, to actually apply ``speed`` effect, you also need to give ``rate``
effect with desired sampling rate, because internally, ``speed`` effects only alter sampling effect with desired sampling rate, because internally, ``speed`` effects only alter sampling
@ -228,14 +228,14 @@ def apply_effects_file(
>>> pass >>> pass
""" """
if hasattr(path, "read"): if hasattr(path, "read"):
ret = paddleaudio._paddleaudio.apply_effects_fileobj(path, effects, normalize, ret = paddleaudio._paddleaudio.apply_effects_fileobj(
channels_first, format) path, effects, normalize, channels_first, format)
if ret is None: if ret is None:
raise RuntimeError("Failed to load audio from {}".format(path)) raise RuntimeError("Failed to load audio from {}".format(path))
return (paddle.to_tensor(ret[0]), ret[1]) return (paddle.to_tensor(ret[0]), ret[1])
path = os.fspath(path) path = os.fspath(path)
ret = paddleaudio._paddleaudio.sox_effects_apply_effects_file(path, effects, normalize, ret = paddleaudio._paddleaudio.sox_effects_apply_effects_file(
channels_first, format) path, effects, normalize, channels_first, format)
if ret is not None: if ret is not None:
return (paddle.to_tensor(ret[0]), ret[1]) return (paddle.to_tensor(ret[0]), ret[1])
raise RuntimeError("Failed to load audio from {}".format(path)) raise RuntimeError("Failed to load audio from {}".format(path))
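
The note in `apply_effects_file` says that `speed` only alters the sampling rate, so it has to be paired with an explicit `rate` effect. A minimal sketch of an effects list built that way follows. `speed_effects` is a hypothetical helper, and the list-of-string-lists layout is an assumption based on how the `sox` command line takes its arguments; the diff itself does not show the expected type of `effects`.

```python
from typing import List


def speed_effects(factor: float, sample_rate: int) -> List[List[str]]:
    """Build a sox effect chain that changes speed and then resamples.

    Hypothetical helper: each effect is a list of strings, mirroring the
    arguments that would follow the effect name on the sox command line.
    """
    return [
        ["speed", str(factor)],      # alters the sampling rate only
        ["rate", str(sample_rate)],  # resamples back to the desired rate
    ]


effects = speed_effects(1.2, 16000)
# The list would then be passed as the `effects` argument, for example:
# waveform, sr = apply_effects_file("input.wav", effects)  # not run here
```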

@ -26,7 +26,7 @@ template <class F>
bool StreamingFeatureTpl<F>::ComputeFeature( bool StreamingFeatureTpl<F>::ComputeFeature(
const std::vector<float>& wav, const std::vector<float>& wav,
std::vector<float>* feats) { std::vector<float>* feats) {
// append remaned waves // append remained waves
int wav_len = wav.size(); int wav_len = wav.size();
if (wav_len == 0) return false; if (wav_len == 0) return false;
int left_len = remained_wav_.size(); int left_len = remained_wav_.size();
@ -38,7 +38,7 @@ bool StreamingFeatureTpl<F>::ComputeFeature(
wav.data(), wav.data(),
wav_len * sizeof(float)); wav_len * sizeof(float));
// cache remaned waves // cache remained waves
knf::FrameExtractionOptions frame_opts = computer_.GetFrameOptions(); knf::FrameExtractionOptions frame_opts = computer_.GetFrameOptions();
int num_frames = knf::NumFrames(waves.size(), frame_opts); int num_frames = knf::NumFrames(waves.size(), frame_opts);
int frame_shift = frame_opts.WindowShift(); int frame_shift = frame_opts.WindowShift();
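
The "append remained waves" and "cache remained waves" steps in this hunk can be illustrated in a few lines of Python. This is only a sketch of the general streaming pattern: the class name `StreamingFramer`, the list-based buffer, and the rule that the cache starts at `num_frames * frame_shift` are assumptions, since the diff shows only the comments and the first few statements of the C++ function.

```python
from typing import List


class StreamingFramer:
    """Hypothetical stand-in for the streaming feature wrapper."""

    def __init__(self, frame_length: int, frame_shift: int):
        self.frame_length = frame_length
        self.frame_shift = frame_shift
        self.remained: List[float] = []  # samples left over from the last call

    def accept(self, wav: List[float]) -> int:
        """Consume a chunk and return how many full frames it produced."""
        if not wav:
            return 0
        # Append the cached remainder in front of the new chunk.
        waves = self.remained + list(wav)
        if len(waves) < self.frame_length:
            n = 0
        else:
            n = 1 + (len(waves) - self.frame_length) // self.frame_shift
        # Cache everything the next frame would still need.
        self.remained = waves[n * self.frame_shift:]
        return n


framer = StreamingFramer(frame_length=400, frame_shift=160)
chunked_total = framer.accept([0.0] * 500) + framer.accept([0.0] * 300)
one_shot_total = StreamingFramer(400, 160).accept([0.0] * 800)
```

Feeding the audio in two chunks produces the same number of frames as feeding it at once, which is the property the cached remainder is there to guarantee.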

@ -44,5 +44,5 @@ py::array_t<float> KaldiFeatureWrapper::ComputeFbank(
return result.reshape(shape); return result.reshape(shape);
} }
} // namesapce kaldi } // namespace kaldi
} // namespace paddleaudio } // namespace paddleaudio

@ -12,9 +12,9 @@ using namespace paddleaudio::sox_utils;
namespace paddleaudio::sox_effects { namespace paddleaudio::sox_effects {
// Streaming decoding over file-like object is tricky because libsox operates on // Streaming decoding over file-like object is tricky because libsox operates on
// FILE pointer. The folloing is what `sox` and `play` commands do // FILE pointer. The following is what `sox` and `play` commands do
// - file input -> FILE pointer // - file input -> FILE pointer
// - URL input -> call wget in suprocess and pipe the data -> FILE pointer // - URL input -> call wget in subprocess and pipe the data -> FILE pointer
// - stdin -> FILE pointer // - stdin -> FILE pointer
// //
// We want to, instead, fetch byte strings chunk by chunk, consume them, and // We want to, instead, fetch byte strings chunk by chunk, consume them, and
@ -127,12 +127,12 @@ namespace {
enum SoxEffectsResourceState { NotInitialized, Initialized, ShutDown }; enum SoxEffectsResourceState { NotInitialized, Initialized, ShutDown };
SoxEffectsResourceState SOX_RESOURCE_STATE = NotInitialized; SoxEffectsResourceState SOX_RESOURCE_STATE = NotInitialized;
std::mutex SOX_RESOUCE_STATE_MUTEX; std::mutex SOX_RESOURCE_STATE_MUTEX;
} // namespace } // namespace
void initialize_sox_effects() { void initialize_sox_effects() {
const std::lock_guard<std::mutex> lock(SOX_RESOUCE_STATE_MUTEX); const std::lock_guard<std::mutex> lock(SOX_RESOURCE_STATE_MUTEX);
switch (SOX_RESOURCE_STATE) { switch (SOX_RESOURCE_STATE) {
case NotInitialized: case NotInitialized:
@ -150,7 +150,7 @@ void initialize_sox_effects() {
}; };
void shutdown_sox_effects() { void shutdown_sox_effects() {
const std::lock_guard<std::mutex> lock(SOX_RESOUCE_STATE_MUTEX); const std::lock_guard<std::mutex> lock(SOX_RESOURCE_STATE_MUTEX);
switch (SOX_RESOURCE_STATE) { switch (SOX_RESOURCE_STATE) {
case NotInitialized: case NotInitialized:

@ -14,7 +14,7 @@ namespace {
/// helper classes for passing the location of input tensor and output buffer /// helper classes for passing the location of input tensor and output buffer
/// ///
/// drain/flow callback functions require plaing C style function signature and /// drain/flow callback functions require plain C style function signature and
/// the way to pass extra data is to attach data to sox_effect_t::priv pointer. /// the way to pass extra data is to attach data to sox_effect_t::priv pointer.
/// The following structs will be assigned to sox_effect_t::priv pointer which /// The following structs will be assigned to sox_effect_t::priv pointer which
/// gives sox_effect_t an access to input Tensor and output buffer object. /// gives sox_effect_t an access to input Tensor and output buffer object.
@ -50,7 +50,7 @@ int tensor_input_drain(sox_effect_t* effp, sox_sample_t* obuf, size_t* osamp) {
*osamp -= *osamp % num_channels; *osamp -= *osamp % num_channels;
// Slice the input Tensor // Slice the input Tensor
// refacor this module, chunk // refactor this module, chunk
auto i_frame = index / num_channels; auto i_frame = index / num_channels;
auto num_frames = *osamp / num_channels; auto num_frames = *osamp / num_channels;

@ -162,7 +162,7 @@ py::dtype get_dtype(
} }
default: default:
// default to float32 for the other formats, including // default to float32 for the other formats, including
// 32-bit flaoting-point WAV, // 32-bit floating-point WAV,
// MP3, // MP3,
// FLAC, // FLAC,
// VORBIS etc... // VORBIS etc...
@ -177,7 +177,7 @@ py::array convert_to_tensor(
const py::dtype dtype, const py::dtype dtype,
const bool normalize, const bool normalize,
const bool channels_first) { const bool channels_first) {
// todo refector later(SGoat) // todo refactor later(SGoat)
py::array t; py::array t;
uint64_t dummy = 0; uint64_t dummy = 0;
SOX_SAMPLE_LOCALS; SOX_SAMPLE_LOCALS;

@ -76,7 +76,7 @@ py::dtype get_dtype(
/// Tensor. /// Tensor.
/// @param dtype Target dtype. Determines the output dtype and value range in /// @param dtype Target dtype. Determines the output dtype and value range in
/// conjunction with normalization. /// conjunction with normalization.
/// @param noramlize Perform normalization. Only effective when dtype is not /// @param normalize Perform normalization. Only effective when dtype is not
/// kFloat32. When effective, the output tensor is kFloat32 type and value range /// kFloat32. When effective, the output tensor is kFloat32 type and value range
/// is [-1.0, 1.0] /// is [-1.0, 1.0]
/// @param channels_first When True, output Tensor has shape of [num_channels, /// @param channels_first When True, output Tensor has shape of [num_channels,

@ -8,9 +8,9 @@ set(patch_dir ${CMAKE_CURRENT_SOURCE_DIR}/../patches)
set(COMMON_ARGS --quiet --disable-shared --enable-static --prefix=${INSTALL_DIR} --with-pic --disable-dependency-tracking --disable-debug --disable-examples --disable-doc) set(COMMON_ARGS --quiet --disable-shared --enable-static --prefix=${INSTALL_DIR} --with-pic --disable-dependency-tracking --disable-debug --disable-examples --disable-doc)
# To pass custom environment variables to ExternalProject_Add command, # To pass custom environment variables to ExternalProject_Add command,
# we need to do `${CMAKE_COMMAND} -E env ${envs} <COMMANAD>`. # we need to do `${CMAKE_COMMAND} -E env ${envs} <COMMAND>`.
# https://stackoverflow.com/a/62437353 # https://stackoverflow.com/a/62437353
# We constrcut the custom environment variables here # We construct the custom environment variables here
set(envs set(envs
"PKG_CONFIG_PATH=${INSTALL_DIR}/lib/pkgconfig" "PKG_CONFIG_PATH=${INSTALL_DIR}/lib/pkgconfig"
"LDFLAGS=-L${INSTALL_DIR}/lib $ENV{LDFLAGS}" "LDFLAGS=-L${INSTALL_DIR}/lib $ENV{LDFLAGS}"

@ -41,14 +41,14 @@ def download_and_decompress(archives: List[Dict[str, str]],
path: str, path: str,
decompress: bool=True): decompress: bool=True):
""" """
Download archieves and decompress to specific path. Download archives and decompress to specific path.
""" """
if not os.path.isdir(path): if not os.path.isdir(path):
os.makedirs(path) os.makedirs(path)
for archive in archives: for archive in archives:
assert 'url' in archive and 'md5' in archive, \ assert 'url' in archive and 'md5' in archive, \
'Dictionary keys of "url" and "md5" are required in the archive, but got: {list(archieve.keys())}' 'Dictionary keys of "url" and "md5" are required in the archive, but got: {list(archive.keys())}'
download.get_path_from_url( download.get_path_from_url(
archive['url'], path, archive['md5'], decompress=decompress) archive['url'], path, archive['md5'], decompress=decompress)
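
As displayed in this hunk, the assertion message carries no `f` prefix, so correcting the name inside the braces would not by itself make the keys appear in the message. A minimal sketch of the difference, using a made-up `archive` dictionary:

```python
# Hypothetical archive entry with the "md5" key deliberately missing.
archive = {"url": "https://example.com/data.tar.gz"}

# Plain string: the braces are kept literally, nothing is evaluated.
plain_message = 'but got: {list(archive.keys())}'

# f-string: the expression inside the braces is evaluated and inserted.
formatted_message = f'but got: {list(archive.keys())}'
```

Only the second form would report which keys were actually present.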

@ -58,7 +58,7 @@ log_config = {
class Logger(object): class Logger(object):
''' '''
Deafult logger in PaddleAudio Default logger in PaddleAudio
Args: Args:
name(str) : Logger name, default is 'PaddleAudio' name(str) : Logger name, default is 'PaddleAudio'
''' '''

@ -55,7 +55,7 @@ def set_use_threads(use_threads: bool):
Args: Args:
use_threads (bool): When ``True``, enables ``libsox``'s parallel effects channels processing. use_threads (bool): When ``True``, enables ``libsox``'s parallel effects channels processing.
To use mutlithread, the underlying ``libsox`` has to be compiled with OpenMP support. To use multithread, the underlying ``libsox`` has to be compiled with OpenMP support.
See Also: See Also:
http://sox.sourceforge.net/sox.html http://sox.sourceforge.net/sox.html

@ -11,7 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
"""Unility functions for Transformer.""" """Utility functions for Transformer."""
from typing import List from typing import List
from typing import Tuple from typing import Tuple
@ -80,7 +80,7 @@ def pad_sequence(sequences: List[paddle.Tensor],
# assuming trailing dimensions and type of all the Tensors # assuming trailing dimensions and type of all the Tensors
# in sequences are same and fetching those from sequences[0] # in sequences are same and fetching those from sequences[0]
max_size = paddle.shape(sequences[0]) max_size = paddle.shape(sequences[0])
# (TODO Hui Zhang): slice not supprot `end==start` # (TODO Hui Zhang): slice not support `end==start`
# trailing_dims = max_size[1:] # trailing_dims = max_size[1:]
trailing_dims = tuple( trailing_dims = tuple(
max_size[1:].numpy().tolist()) if sequences[0].ndim >= 2 else () max_size[1:].numpy().tolist()) if sequences[0].ndim >= 2 else ()
@ -94,7 +94,7 @@ def pad_sequence(sequences: List[paddle.Tensor],
length = tensor.shape[0] length = tensor.shape[0]
# use index notation to prevent duplicate references to the tensor # use index notation to prevent duplicate references to the tensor
if batch_first: if batch_first:
# TODO (Hui Zhang): set_value op not supprot `end==start` # TODO (Hui Zhang): set_value op not support `end==start`
# TODO (Hui Zhang): set_value op not support int16 # TODO (Hui Zhang): set_value op not support int16
# TODO (Hui Zhang): set_varbase 2 rank not support [0,0,...] # TODO (Hui Zhang): set_varbase 2 rank not support [0,0,...]
# out_tensor[i, :length, ...] = tensor # out_tensor[i, :length, ...] = tensor
@ -103,7 +103,7 @@ def pad_sequence(sequences: List[paddle.Tensor],
else: else:
out_tensor[i, length] = tensor out_tensor[i, length] = tensor
else: else:
# TODO (Hui Zhang): set_value op not supprot `end==start` # TODO (Hui Zhang): set_value op not support `end==start`
# out_tensor[:length, i, ...] = tensor # out_tensor[:length, i, ...] = tensor
if length != 0: if length != 0:
out_tensor[:length, i] = tensor out_tensor[:length, i] = tensor

@ -21,7 +21,7 @@ __all__ = [
class Timer(object): class Timer(object):
'''Calculate runing speed and estimated time of arrival(ETA)''' '''Calculate running speed and estimated time of arrival(ETA)'''
def __init__(self, total_step: int): def __init__(self, total_step: int):
self.total_step = total_step self.total_step = total_step

@ -30,5 +30,5 @@ class BackendTest(unittest.TestCase):
urllib.request.urlretrieve(url, os.path.basename(url)) urllib.request.urlretrieve(url, os.path.basename(url))
self.files.append(os.path.basename(url)) self.files.append(os.path.basename(url))
def initParmas(self): def initParams(self):
raise NotImplementedError raise NotImplementedError

@ -30,5 +30,5 @@ class BackendTest(unittest.TestCase):
urllib.request.urlretrieve(url, os.path.basename(url)) urllib.request.urlretrieve(url, os.path.basename(url))
self.files.append(os.path.basename(url)) self.files.append(os.path.basename(url))
def initParmas(self): def initParams(self):
raise NotImplementedError raise NotImplementedError

@ -103,7 +103,7 @@ class MockedSaveTest(unittest.TestCase):
encoding=encoding, encoding=encoding,
bits_per_sample=bits_per_sample, ) bits_per_sample=bits_per_sample, )
# on +Py3.8 call_args.kwargs is more descreptive # on +Py3.8 call_args.kwargs is more descriptive
args = mocked_write.call_args[1] args = mocked_write.call_args[1]
assert args["file"] == filepath assert args["file"] == filepath
assert args["samplerate"] == sample_rate assert args["samplerate"] == sample_rate
@ -191,7 +191,7 @@ class SaveTestBase(TempDirMixin, unittest.TestCase):
def _assert_non_wav(self, fmt, dtype, sample_rate, num_channels): def _assert_non_wav(self, fmt, dtype, sample_rate, num_channels):
"""`soundfile_backend.save` can save non-wav format. """`soundfile_backend.save` can save non-wav format.
Due to precision missmatch, and the lack of alternative way to decode the Due to precision mismatch, and the lack of alternative way to decode the
resulting files without using soundfile, only meta data are validated. resulting files without using soundfile, only meta data are validated.
""" """
num_frames = sample_rate * 3 num_frames = sample_rate * 3

@ -81,7 +81,7 @@ def convert_tensor_encoding(
#dtype = getattr(paddle, dtype) #dtype = getattr(paddle, dtype)
#if dtype not in [paddle.float64, paddle.float32, paddle.int32, paddle.int16, paddle.uint8]: #if dtype not in [paddle.float64, paddle.float32, paddle.int32, paddle.int16, paddle.uint8]:
#raise NotImplementedError(f"dtype {dtype} is not supported.") #raise NotImplementedError(f"dtype {dtype} is not supported.")
## According to the doc, folking rng on all CUDA devices is slow when there are many CUDA devices, ## According to the doc, forking rng on all CUDA devices is slow when there are many CUDA devices,
## so we only fork on CPU, generate values and move the data to the given device ## so we only fork on CPU, generate values and move the data to the given device
#with paddle.random.fork_rng([]): #with paddle.random.fork_rng([]):
#paddle.random.manual_seed(seed) #paddle.random.manual_seed(seed)

@ -33,11 +33,12 @@ def gen_audio_file(
compression=None, compression=None,
attenuation=None, attenuation=None,
duration=1, duration=1,
comment_file=None, comment_file=None, ):
):
"""Generate synthetic audio file with `sox` command.""" """Generate synthetic audio file with `sox` command."""
if path.endswith(".wav"): if path.endswith(".wav"):
warnings.warn("Use get_wav_data and save_wav to generate wav file for accurate result.") warnings.warn(
"Use get_wav_data and save_wav to generate wav file for accurate result."
)
command = [ command = [
"sox", "sox",
"-V3", # verbose "-V3", # verbose
@ -81,7 +82,12 @@ def gen_audio_file(
subprocess.run(command, check=True) subprocess.run(command, check=True)
def convert_audio_file(src_path, dst_path, *, encoding=None, bit_depth=None, compression=None): def convert_audio_file(src_path,
dst_path,
*,
encoding=None,
bit_depth=None,
compression=None):
"""Convert audio file with `sox` command.""" """Convert audio file with `sox` command."""
command = ["sox", "-V3", "--no-dither", "-R", str(src_path)] command = ["sox", "-V3", "--no-dither", "-R", str(src_path)]
if encoding is not None: if encoding is not None:
@ -95,7 +101,7 @@ def convert_audio_file(src_path, dst_path, *, encoding=None, bit_depth=None, com
subprocess.run(command, check=True) subprocess.run(command, check=True)
def _flattern(effects): def _flatten(effects):
if not effects: if not effects:
return effects return effects
if isinstance(effects[0], str): if isinstance(effects[0], str):
@ -103,9 +109,14 @@ def _flattern(effects):
return [item for sublist in effects for item in sublist] return [item for sublist in effects for item in sublist]
def run_sox_effect(input_file, output_file, effect, *, output_sample_rate=None, output_bitdepth=None): def run_sox_effect(input_file,
output_file,
effect,
*,
output_sample_rate=None,
output_bitdepth=None):
"""Run sox effects""" """Run sox effects"""
effect = _flattern(effect) effect = _flatten(effect)
command = ["sox", "-V", "--no-dither", input_file] command = ["sox", "-V", "--no-dither", input_file]
if output_bitdepth: if output_bitdepth:
command += ["--bits", str(output_bitdepth)] command += ["--bits", str(output_bitdepth)]
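A minimal sketch of what the renamed `_flatten` helper appears to do, based only on the lines visible in this hunk. The function name `flatten_effects` is hypothetical, and the branch hidden behind the hunk header (returning an already-flat list unchanged) is an assumption, not the repository's confirmed code.

```python
def flatten_effects(effects):
    """Turn a list of sox effects into one flat list of argument strings."""
    if not effects:
        # Empty or None input is returned unchanged.
        return effects
    if isinstance(effects[0], str):
        # Assumption: a list of plain strings is already flat.
        return effects
    # A list of lists, e.g. [["rate", "8000"], ["gain", "-n"]], is concatenated.
    return [item for sublist in effects for item in sublist]


print(flatten_effects([["rate", "8000"], ["gain", "-n"]]))
```

Because `run_sox_effect` calls the helper by name, the definition and the call site have to be renamed together, which is what this hunk does.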

@ -24,7 +24,7 @@ wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
class FeatTest(unittest.TestCase): class FeatTest(unittest.TestCase):
def setUp(self): def setUp(self):
self.initParmas() self.initParams()
self.initWavInput() self.initWavInput()
self.setUpDevice() self.setUpDevice()
@ -44,5 +44,5 @@ class FeatTest(unittest.TestCase):
if dim == 1: if dim == 1:
self.waveform = np.expand_dims(self.waveform, 0) self.waveform = np.expand_dims(self.waveform, 0)
def initParmas(self): def initParams(self):
raise NotImplementedError raise NotImplementedError
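The `initParmas` to `initParams` rename touches the base class and every subclass because `setUp` dispatches to the hook by name. The sketch below illustrates that pattern under simplifying assumptions: the class names `FeatTestSketch` and `StftParamsSketch` are hypothetical, and the real `setUp` also calls `initWavInput` and `setUpDevice`, which are omitted here.

```python
import unittest


class FeatTestSketch(unittest.TestCase):
    """Base class: setUp calls a hook that each subclass must override."""

    def setUp(self):
        self.initParams()

    def initParams(self):
        raise NotImplementedError


class StftParamsSketch(FeatTestSketch):
    """Subclass: overrides the hook under exactly the same name."""

    def initParams(self):
        self.n_fft = 512
        self.hop_length = 128

    def test_params(self):
        # A one-sided spectrum of a 512-point FFT has 257 frequency bins.
        self.assertEqual(self.n_fft // 2 + 1, 257)
```

If a subclass kept the old spelling, the base `initParams` would run instead and `setUp` would raise `NotImplementedError`, so the rename is only safe when applied to all files at once.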

@ -23,7 +23,7 @@ from paddlespeech.audio.transform.spectrogram import Stft
class TestIstft(FeatTest): class TestIstft(FeatTest):
def initParmas(self): def initParams(self):
self.n_fft = 512 self.n_fft = 512
self.hop_length = 128 self.hop_length = 128
self.window_str = 'hann' self.window_str = 'hann'

@ -18,12 +18,11 @@ import paddle
import paddleaudio import paddleaudio
import torch import torch
import torchaudio import torchaudio
from base import FeatTest from base import FeatTest
class TestKaldi(FeatTest): class TestKaldi(FeatTest):
def initParmas(self): def initParams(self):
self.window_size = 1024 self.window_size = 1024
self.dtype = 'float32' self.dtype = 'float32'

@ -17,13 +17,12 @@ import librosa
import numpy as np import numpy as np
import paddle import paddle
import paddleaudio import paddleaudio
from paddleaudio.functional.window import get_window
from base import FeatTest from base import FeatTest
from paddleaudio.functional.window import get_window
class TestLibrosa(FeatTest): class TestLibrosa(FeatTest):
def initParmas(self): def initParams(self):
self.n_fft = 512 self.n_fft = 512
self.hop_length = 128 self.hop_length = 128
self.n_mels = 40 self.n_mels = 40

@ -22,7 +22,7 @@ from paddlespeech.audio.transform.spectrogram import LogMelSpectrogram
class TestLogMelSpectrogram(FeatTest): class TestLogMelSpectrogram(FeatTest):
def initParmas(self): def initParams(self):
self.n_fft = 512 self.n_fft = 512
self.hop_length = 128 self.hop_length = 128
self.n_mels = 40 self.n_mels = 40

@ -22,7 +22,7 @@ from paddlespeech.audio.transform.spectrogram import Spectrogram
class TestSpectrogram(FeatTest): class TestSpectrogram(FeatTest):
def initParmas(self): def initParams(self):
self.n_fft = 512 self.n_fft = 512
self.hop_length = 128 self.hop_length = 128

@ -22,7 +22,7 @@ from paddlespeech.audio.transform.spectrogram import Stft
class TestStft(FeatTest): class TestStft(FeatTest):
def initParmas(self): def initParams(self):
self.n_fft = 512 self.n_fft = 512
self.hop_length = 128 self.hop_length = 128
self.window_str = 'hann' self.window_str = 'hann'
@ -30,7 +30,7 @@ class TestStft(FeatTest):
def test_stft(self): def test_stft(self):
ps_stft = Stft(self.n_fft, self.hop_length) ps_stft = Stft(self.n_fft, self.hop_length)
ps_res = ps_stft( ps_res = ps_stft(
self.waveform.T).squeeze(1).T # (n_fft//2 + 1, n_frmaes) self.waveform.T).squeeze(1).T # (n_fft//2 + 1, n_frames)
x = paddle.to_tensor(self.waveform) x = paddle.to_tensor(self.waveform)
window = get_window(self.window_str, self.n_fft, dtype=x.dtype) window = get_window(self.window_str, self.n_fft, dtype=x.dtype)

@ -132,7 +132,7 @@ def create_manifest(data_dir, manifest_path):
def prepare_dataset(url, md5sum, target_dir, manifest_path): def prepare_dataset(url, md5sum, target_dir, manifest_path):
"""Download, unpack and create summmary manifest file. """Download, unpack and create summary manifest file.
""" """
if not os.path.exists(os.path.join(target_dir, "LibriSpeech")): if not os.path.exists(os.path.join(target_dir, "LibriSpeech")):
# download # download

@ -13,7 +13,7 @@
# limitations under the License. # limitations under the License.
"""Prepare Ted-En-Zh speech translation dataset """Prepare Ted-En-Zh speech translation dataset
Create manifest files from splited datased. Create manifest files from splited dataset.
dev set: tst2010, test set: tst2015 dev set: tst2010, test set: tst2015
Manifest file is a json-format file with each line containing the Manifest file is a json-format file with each line containing the
meta data (i.e. audio filepath, transcript and audio duration) meta data (i.e. audio filepath, transcript and audio duration)

@ -71,7 +71,7 @@ def read_trn(filepath):
with open(filepath, 'r') as f: with open(filepath, 'r') as f:
lines = f.read().strip().split('\n') lines = f.read().strip().split('\n')
assert len(lines) == 3, lines assert len(lines) == 3, lines
# charactor text, remove withespace # character text, remove whitespace
texts.append(''.join(lines[0].split())) texts.append(''.join(lines[0].split()))
texts.extend(lines[1:]) texts.extend(lines[1:])
return texts return texts
@ -127,7 +127,7 @@ def create_manifest(data_dir, manifest_path_prefix):
'utt2spk': spk, 'utt2spk': spk,
'feat': audio_path, 'feat': audio_path,
'feat_shape': (duration, ), # second 'feat_shape': (duration, ), # second
'text': word_text, # charactor 'text': word_text, # character
'syllable': syllable_text, 'syllable': syllable_text,
'phone': phone_text, 'phone': phone_text,
}, },

@ -123,7 +123,7 @@ def read_algin(filepath: str) -> str:
filepath (str): [description] filepath (str): [description]
Returns: Returns:
str: token sepearte by <space> str: token separate by <space>
""" """
aligns = [] # (start, end, token) aligns = [] # (start, end, token)
with open(filepath, 'r') as f: with open(filepath, 'r') as f:

@ -13,7 +13,7 @@
# limitations under the License. # limitations under the License.
"""Prepare TIMIT dataset (Standard split from Kaldi) """Prepare TIMIT dataset (Standard split from Kaldi)
Create manifest files from splited datased. Create manifest files from splited dataset.
Manifest file is a json-format file with each line containing the Manifest file is a json-format file with each line containing the
meta data (i.e. audio filepath, transcript and audio duration) meta data (i.e. audio filepath, transcript and audio duration)
of each audio file in the data set. of each audio file in the data set.

@ -167,7 +167,7 @@ def prepare_dataset(base_url, data_list, target_dir, manifest_path,
# check the target zip file md5sum # check the target zip file md5sum
if not check_md5sum(target_name, target_md5sum): if not check_md5sum(target_name, target_md5sum):
raise RuntimeError("{} MD5 checkssum failed".format(target_name)) raise RuntimeError("{} MD5 checksum failed".format(target_name))
else: else:
print("Check {} md5sum successfully".format(target_name)) print("Check {} md5sum successfully".format(target_name))
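For context, a download check of this kind can be sketched with the standard library alone. The helpers `md5_of_file` and `verify_md5` are hypothetical stand-ins; the repository's own `check_md5sum` is not shown in this diff and may differ.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_md5):
    """Raise RuntimeError with the corrected message if the digest differs."""
    if md5_of_file(path) != expected_md5:
        raise RuntimeError("{} MD5 checksum failed".format(path))
    print("Check {} md5sum successfully".format(path))
```

Reading in chunks keeps memory use flat for large dataset archives.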

@ -179,7 +179,7 @@ def download_dataset(base_url, data_list, target_data, target_dir, dataset):
# check the target zip file md5sum # check the target zip file md5sum
if not check_md5sum(target_name, target_md5sum): if not check_md5sum(target_name, target_md5sum):
raise RuntimeError("{} MD5 checkssum failed".format(target_name)) raise RuntimeError("{} MD5 checksum failed".format(target_name))
else: else:
print("Check {} md5sum successfully".format(target_name)) print("Check {} md5sum successfully".format(target_name))
@ -187,7 +187,7 @@ def download_dataset(base_url, data_list, target_data, target_dir, dataset):
# we need make the test directory # we need make the test directory
unzip(target_name, os.path.join(target_dir, "test")) unzip(target_name, os.path.join(target_dir, "test"))
else: else:
# upzip dev zip pacakge and will create the dev directory # unzip dev zip package and will create the dev directory
unzip(target_name, target_dir) unzip(target_name, target_dir)

@ -14,7 +14,7 @@ Now, the search word in demo is:
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from meduim and hard to install paddlespeech. You can choose one way from medium and hard to install paddlespeech.
The dependency refers to the requirements.txt, and install the dependency as follows: The dependency refers to the requirements.txt, and install the dependency as follows:

@ -19,7 +19,7 @@ Note: this demo uses the [CN-Celeb](http://openslr.org/82/) dataset of at least
### 1. Prepare PaddleSpeech ### 1. Prepare PaddleSpeech
Audio vector extraction requires PaddleSpeech training model, so please make sure that PaddleSpeech has been installed before running. Specific installation steps: See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). Audio vector extraction requires PaddleSpeech training model, so please make sure that PaddleSpeech has been installed before running. Specific installation steps: See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare MySQL and Milvus services by docker-compose ### 2. Prepare MySQL and Milvus services by docker-compose
The audio similarity search system requires Milvus, MySQL services. We can start these containers with one click through [docker-compose.yaml](./docker-compose.yaml), so please make sure you have [installed Docker Engine](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) before running. then The audio similarity search system requires Milvus, MySQL services. We can start these containers with one click through [docker-compose.yaml](./docker-compose.yaml), so please make sure you have [installed Docker Engine](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) before running. then

@ -11,7 +11,7 @@ This demo is an implementation to tag an audio file with 527 [AudioSet](https://
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File ### 2. Prepare Input File
The input of this demo should be a WAV file(`.wav`). The input of this demo should be a WAV file(`.wav`).

@ -10,7 +10,7 @@ This demo is an implementation to automatic video subtitles from a video file. I
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input ### 2. Prepare Input
Get a video file with the speech of the specific language: Get a video file with the speech of the specific language:

@ -10,7 +10,7 @@ This demo is an implementation to recognize keyword from a specific audio file.
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File ### 2. Prepare Input File
The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model. The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.

@ -9,7 +9,7 @@ This demo is an implementation to restore punctuation from raw text. It can be d
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input ### 2. Prepare Input
The input of this demo should be a text of the specific language that can be passed via argument. The input of this demo should be a text of the specific language that can be passed via argument.

@ -11,7 +11,7 @@ This demo is an implementation to extract speaker embedding from a specific audi
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File ### 2. Prepare Input File
The input of this cli demo should be a WAV file(`.wav`), and the sample rate must be the same as the model. The input of this cli demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.

@ -10,7 +10,7 @@ This demo is an implementation to recognize text from a specific audio file. It
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File ### 2. Prepare Input File
The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model. The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.

@ -15,7 +15,7 @@ see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/doc
It is recommended to use **paddlepaddle 2.4rc** or above. It is recommended to use **paddlepaddle 2.4rc** or above.
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
**If you install in easy mode, you need to prepare the yaml file by yourself, you can refer to the yaml file in the conf directory.** **If you install in easy mode, you need to prepare the yaml file by yourself, you can refer to the yaml file in the conf directory.**

@ -10,7 +10,7 @@ This demo is an implementation to recognize text or produce the acoustic represe
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File ### 2. Prepare Input File
The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model. The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.

@ -9,7 +9,7 @@ This demo is an implementation to recognize text from a specific audio file and
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File ### 2. Prepare Input File

@ -18,7 +18,7 @@ see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/doc
It is recommended to use **paddlepaddle 2.4rc** or above. It is recommended to use **paddlepaddle 2.4rc** or above.
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
**If you install in easy mode, you need to prepare the yaml file by yourself, you can refer to **If you install in easy mode, you need to prepare the yaml file by yourself, you can refer to

@ -15,7 +15,7 @@ see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/doc
It is recommended to use **paddlepaddle 2.4rc** or above. It is recommended to use **paddlepaddle 2.4rc** or above.
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
**If you install in easy mode, you need to prepare the yaml file by yourself, you can refer to the yaml file in the conf directory.** **If you install in easy mode, you need to prepare the yaml file by yourself, you can refer to the yaml file in the conf directory.**

@ -10,7 +10,7 @@ This demo is an implementation to generate audio from the given text. It can be
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input ### 2. Prepare Input
The input of this demo should be a text of the specific language that can be passed via argument. The input of this demo should be a text of the specific language that can be passed via argument.

@ -9,7 +9,7 @@ Whisper model trained by OpenAI whisper https://github.com/openai/whisper
### 1. Installation ### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
You can choose one way from easy, meduim and hard to install paddlespeech. You can choose one way from easy, medium and hard to install paddlespeech.
### 2. Prepare Input File ### 2. Prepare Input File
The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model. The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
