diff --git a/.github/ISSUE_TEMPLATE/bug-report-s2t.md b/.github/ISSUE_TEMPLATE/bug-report-s2t.md
index 512cdbb01..e9732ad8c 100644
--- a/.github/ISSUE_TEMPLATE/bug-report-s2t.md
+++ b/.github/ISSUE_TEMPLATE/bug-report-s2t.md
@@ -33,7 +33,7 @@ If applicable, add screenshots to help explain your problem.
  - Python Version [e.g. 3.7]
  - PaddlePaddle Version [e.g. 2.0.0]
  - Model Version [e.g. 2.0.0]
- - GPU/DRIVER Informationo [e.g. Tesla V100-SXM2-32GB/440.64.00]
+ - GPU/DRIVER Information [e.g. Tesla V100-SXM2-32GB/440.64.00]
  - CUDA/CUDNN Version [e.g. cuda-10.2]
  - MKL Version
  - TensorRT Version
diff --git a/.github/ISSUE_TEMPLATE/bug-report-tts.md b/.github/ISSUE_TEMPLATE/bug-report-tts.md
index e2322c239..b4c5dabdd 100644
--- a/.github/ISSUE_TEMPLATE/bug-report-tts.md
+++ b/.github/ISSUE_TEMPLATE/bug-report-tts.md
@@ -32,7 +32,7 @@ If applicable, add screenshots to help explain your problem.
  - Python Version [e.g. 3.7]
  - PaddlePaddle Version [e.g. 2.0.0]
  - Model Version [e.g. 2.0.0]
- - GPU/DRIVER Informationo [e.g. Tesla V100-SXM2-32GB/440.64.00]
+ - GPU/DRIVER Information [e.g. Tesla V100-SXM2-32GB/440.64.00]
  - CUDA/CUDNN Version [e.g. cuda-10.2]
  - MKL Version
  - TensorRT Version
diff --git a/README.md b/README.md
index 39cb1bc9d..6594a4b8f 100644
--- a/README.md
+++ b/README.md
@@ -265,6 +265,8 @@ git clone https://github.com/PaddlePaddle/PaddleSpeech.git
 cd PaddleSpeech
 pip install pytest-runner
 pip install .
+# If you need to install in editable mode, you need to use --use-pep517. The command is as follows:
+# pip install -e . --use-pep517
 ```
 
 For more installation problems, such as conda environment, librosa-dependent, gcc problems, kaldi installation, etc., you can refer to this [installation document](./docs/source/install.md). If you encounter problems during installation, you can leave a message on [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) and find related problems
diff --git a/README_cn.md b/README_cn.md
index a644e4c9f..5b95a2879 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -272,6 +272,8 @@ git clone https://github.com/PaddlePaddle/PaddleSpeech.git
 cd PaddleSpeech
 pip install pytest-runner
 pip install .
+# 如果需要在可编辑模式下安装,需要使用 --use-pep517,命令如下
+# pip install -e . --use-pep517
 ```
 
 更多关于安装问题,如 conda 环境,librosa 依赖的系统库,gcc 环境问题,kaldi 安装等,可以参考这篇[安装文档](docs/source/install_cn.md),如安装上遇到问题可以在 [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) 上留言以及查找相关问题
diff --git a/audio/paddleaudio/backends/soundfile_backend.py b/audio/paddleaudio/backends/soundfile_backend.py
index 9195ea097..dcd2b4b1e 100644
--- a/audio/paddleaudio/backends/soundfile_backend.py
+++ b/audio/paddleaudio/backends/soundfile_backend.py
@@ -61,7 +61,7 @@ def resample(y: np.ndarray,
     if mode == 'kaiser_best':
         warnings.warn(
             f'Using resampy in kaiser_best to {src_sr}=>{target_sr}. This function is pretty slow, \
-        we recommend the mode kaiser_fast in large scale audio trainning')
+        we recommend the mode kaiser_fast in large scale audio training')
 
     if not isinstance(y, np.ndarray):
         raise ParameterError(
diff --git a/audio/paddleaudio/compliance/kaldi.py b/audio/paddleaudio/compliance/kaldi.py
index eb92ec1f2..a94ec4053 100644
--- a/audio/paddleaudio/compliance/kaldi.py
+++ b/audio/paddleaudio/compliance/kaldi.py
@@ -233,7 +233,7 @@ def spectrogram(waveform: Tensor,
         round_to_power_of_two (bool, optional): If True, round window size to power of two by zero-padding input
             to FFT. Defaults to True.
         sr (int, optional): Sample rate of input waveform. Defaults to 16000.
-        snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a singal frame when it
+        snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a signal frame when it
             is set True. Otherwise performs reflect padding to the end of waveform. Defaults to True.
         subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False.
         window_type (str, optional): Choose type of window for FFT computation. Defaults to "povey".
@@ -443,7 +443,7 @@ def fbank(waveform: Tensor,
         round_to_power_of_two (bool, optional): If True, round window size to power of two by zero-padding input
             to FFT. Defaults to True.
         sr (int, optional): Sample rate of input waveform. Defaults to 16000.
-        snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a singal frame when it
+        snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a signal frame when it
             is set True. Otherwise performs reflect padding to the end of waveform. Defaults to True.
         subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False.
         use_energy (bool, optional): Add an dimension with energy of spectrogram to the output. Defaults to False.
@@ -566,7 +566,7 @@ def mfcc(waveform: Tensor,
         round_to_power_of_two (bool, optional): If True, round window size to power of two by zero-padding input
             to FFT. Defaults to True.
         sr (int, optional): Sample rate of input waveform. Defaults to 16000.
-        snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a singal frame when it
+        snip_edges (bool, optional): Drop samples in the end of waveform that cann't fit a signal frame when it
             is set True. Otherwise performs reflect padding to the end of waveform. Defaults to True.
         subtract_mean (bool, optional): Whether to subtract mean of feature files. Defaults to False.
         use_energy (bool, optional): Add an dimension with energy of spectrogram to the output. Defaults to False.
diff --git a/audio/paddleaudio/datasets/dataset.py b/audio/paddleaudio/datasets/dataset.py
index f1dfc1ea3..170e91669 100644
--- a/audio/paddleaudio/datasets/dataset.py
+++ b/audio/paddleaudio/datasets/dataset.py
@@ -47,7 +47,7 @@ class AudioClassificationDataset(paddle.io.Dataset):
             files (:obj:`List[str]`): A list of absolute path of audio files.
             labels (:obj:`List[int]`): Labels of audio files.
             feat_type (:obj:`str`, `optional`, defaults to `raw`):
-                It identifies the feature type that user wants to extrace of an audio file.
+                It identifies the feature type that user wants to extract of an audio file.
         """
         super(AudioClassificationDataset, self).__init__()
 
diff --git a/audio/paddleaudio/datasets/esc50.py b/audio/paddleaudio/datasets/esc50.py
index e7477d40e..fd8c8503e 100644
--- a/audio/paddleaudio/datasets/esc50.py
+++ b/audio/paddleaudio/datasets/esc50.py
@@ -117,7 +117,7 @@ class ESC50(AudioClassificationDataset):
             split (:obj:`int`, `optional`, defaults to 1):
                 It specify the fold of dev dataset.
             feat_type (:obj:`str`, `optional`, defaults to `raw`):
-                It identifies the feature type that user wants to extrace of an audio file.
+                It identifies the feature type that user wants to extract of an audio file.
         """
         files, labels = self._get_data(mode, split)
         super(ESC50, self).__init__(
diff --git a/audio/paddleaudio/datasets/gtzan.py b/audio/paddleaudio/datasets/gtzan.py
index cfea6f37e..a76e9208e 100644
--- a/audio/paddleaudio/datasets/gtzan.py
+++ b/audio/paddleaudio/datasets/gtzan.py
@@ -67,7 +67,7 @@ class GTZAN(AudioClassificationDataset):
             split (:obj:`int`, `optional`, defaults to 1):
                 It specify the fold of dev dataset.
             feat_type (:obj:`str`, `optional`, defaults to `raw`):
-                It identifies the feature type that user wants to extrace of an audio file.
+                It identifies the feature type that user wants to extract of an audio file.
         """
         assert split <= n_folds, f'The selected split should not be larger than n_fold, but got {split} > {n_folds}'
         files, labels = self._get_data(mode, seed, n_folds, split)
diff --git a/audio/paddleaudio/datasets/tess.py b/audio/paddleaudio/datasets/tess.py
index 8faab9c39..e34eaea37 100644
--- a/audio/paddleaudio/datasets/tess.py
+++ b/audio/paddleaudio/datasets/tess.py
@@ -76,7 +76,7 @@ class TESS(AudioClassificationDataset):
             split (:obj:`int`, `optional`, defaults to 1):
                 It specify the fold of dev dataset.
             feat_type (:obj:`str`, `optional`, defaults to `raw`):
-                It identifies the feature type that user wants to extrace of an audio file.
+                It identifies the feature type that user wants to extract of an audio file.
         """
         assert split <= n_folds, f'The selected split should not be larger than n_fold, but got {split} > {n_folds}'
         files, labels = self._get_data(mode, seed, n_folds, split)
diff --git a/audio/paddleaudio/datasets/urban_sound.py b/audio/paddleaudio/datasets/urban_sound.py
index d97c4d1dc..43d1b36c4 100644
--- a/audio/paddleaudio/datasets/urban_sound.py
+++ b/audio/paddleaudio/datasets/urban_sound.py
@@ -68,7 +68,7 @@ class UrbanSound8K(AudioClassificationDataset):
             split (:obj:`int`, `optional`, defaults to 1):
                 It specify the fold of dev dataset.
             feat_type (:obj:`str`, `optional`, defaults to `raw`):
-                It identifies the feature type that user wants to extrace of an audio file.
+                It identifies the feature type that user wants to extract of an audio file.
         """
 
     def _get_meta_info(self):
diff --git a/audio/paddleaudio/datasets/voxceleb.py b/audio/paddleaudio/datasets/voxceleb.py
index b7160b24c..1fafb5176 100644
--- a/audio/paddleaudio/datasets/voxceleb.py
+++ b/audio/paddleaudio/datasets/voxceleb.py
@@ -262,8 +262,8 @@ class VoxCeleb(Dataset):
                      split_chunks: bool=True):
         print(f'Generating csv: {output_file}')
         header = ["id", "duration", "wav", "start", "stop", "spk_id"]
-        # Note: this may occurs c++ execption, but the program will execute fine
-        # so we can ignore the execption
+        # Note: this may occurs c++ exception, but the program will execute fine
+        # so we can ignore the exception
         with Pool(cpu_count()) as p:
             infos = list(
                 tqdm(
diff --git a/audio/paddleaudio/features/layers.py b/audio/paddleaudio/features/layers.py
index 292363e64..801ae34ce 100644
--- a/audio/paddleaudio/features/layers.py
+++ b/audio/paddleaudio/features/layers.py
@@ -34,7 +34,7 @@ __all__ = [
 class Spectrogram(nn.Layer):
     """Compute spectrogram of given signals, typically audio waveforms.
 
-    The spectorgram is defined as the complex norm of the short-time Fourier transformation.
+    The spectrogram is defined as the complex norm of the short-time Fourier transformation.
 
     Args:
         n_fft (int, optional): The number of frequency components of the discrete Fourier transform. Defaults to 512.
diff --git a/audio/paddleaudio/functional/functional.py b/audio/paddleaudio/functional/functional.py
index 19c63a9ae..7c20f9013 100644
--- a/audio/paddleaudio/functional/functional.py
+++ b/audio/paddleaudio/functional/functional.py
@@ -247,7 +247,7 @@ def create_dct(n_mfcc: int,
     Args:
         n_mfcc (int): Number of mel frequency cepstral coefficients.
         n_mels (int): Number of mel filterbanks.
-        norm (Optional[str], optional): Normalizaiton type. Defaults to 'ortho'.
+        norm (Optional[str], optional): Normalization type. Defaults to 'ortho'.
         dtype (str, optional): The data type of the return matrix. Defaults to 'float32'.
 
     Returns:
diff --git a/audio/paddleaudio/metric/eer.py b/audio/paddleaudio/metric/eer.py
index a1166d3f9..a55695ac1 100644
--- a/audio/paddleaudio/metric/eer.py
+++ b/audio/paddleaudio/metric/eer.py
@@ -22,8 +22,8 @@ def compute_eer(labels: np.ndarray, scores: np.ndarray) -> List[float]:
     """Compute EER and return score threshold.
 
     Args:
-        labels (np.ndarray): the trial label, shape: [N], one-dimention, N refer to the samples num
-        scores (np.ndarray): the trial scores, shape: [N], one-dimention, N refer to the samples num
+        labels (np.ndarray): the trial label, shape: [N], one-dimension, N refer to the samples num
+        scores (np.ndarray): the trial scores, shape: [N], one-dimension, N refer to the samples num
 
     Returns:
         List[float]: eer and the specific threshold
diff --git a/audio/paddleaudio/sox_effects/sox_effects.py b/audio/paddleaudio/sox_effects/sox_effects.py
index cb7e1b0b9..aa282b572 100644
--- a/audio/paddleaudio/sox_effects/sox_effects.py
+++ b/audio/paddleaudio/sox_effects/sox_effects.py
@@ -121,8 +121,8 @@ def apply_effects_tensor(
 
     """
     tensor_np = tensor.numpy()
-    ret = paddleaudio._paddleaudio.sox_effects_apply_effects_tensor(tensor_np, sample_rate,
-                                                                    effects, channels_first)
+    ret = paddleaudio._paddleaudio.sox_effects_apply_effects_tensor(
+        tensor_np, sample_rate, effects, channels_first)
     if ret is not None:
         return (paddle.to_tensor(ret[0]), ret[1])
     raise RuntimeError("Failed to apply sox effect")
@@ -139,7 +139,7 @@ def apply_effects_file(
 
     Note:
         This function works in the way very similar to ``sox`` command, however there are slight
-        differences. For example, ``sox`` commnad adds certain effects automatically (such as
+        differences. For example, ``sox`` command adds certain effects automatically (such as
         ``rate`` effect after ``speed``, ``pitch`` etc), but this function only applies the given
         effects. Therefore, to actually apply ``speed`` effect, you also need to give ``rate``
         effect with desired sampling rate, because internally, ``speed`` effects only alter sampling
@@ -228,14 +228,14 @@ def apply_effects_file(
         >>> pass
     """
     if hasattr(path, "read"):
-        ret = paddleaudio._paddleaudio.apply_effects_fileobj(path, effects, normalize,
-                                                             channels_first, format)
+        ret = paddleaudio._paddleaudio.apply_effects_fileobj(
+            path, effects, normalize, channels_first, format)
         if ret is None:
             raise RuntimeError("Failed to load audio from {}".format(path))
         return (paddle.to_tensor(ret[0]), ret[1])
     path = os.fspath(path)
-    ret = paddleaudio._paddleaudio.sox_effects_apply_effects_file(path, effects, normalize,
-                                                                  channels_first, format)
+    ret = paddleaudio._paddleaudio.sox_effects_apply_effects_file(
+        path, effects, normalize, channels_first, format)
     if ret is not None:
         return (paddle.to_tensor(ret[0]), ret[1])
     raise RuntimeError("Failed to load audio from {}".format(path))
diff --git a/audio/paddleaudio/src/pybind/kaldi/feature_common_inl.h b/audio/paddleaudio/src/pybind/kaldi/feature_common_inl.h
index 985d586fe..3c62bb0d4 100644
--- a/audio/paddleaudio/src/pybind/kaldi/feature_common_inl.h
+++ b/audio/paddleaudio/src/pybind/kaldi/feature_common_inl.h
@@ -26,7 +26,7 @@
 template
 bool StreamingFeatureTpl::ComputeFeature(
     const std::vector& wav, std::vector* feats) {
-    // append remaned waves
+    // append remained waves
     int wav_len = wav.size();
     if (wav_len == 0) return false;
     int left_len = remained_wav_.size();
@@ -38,7 +38,7 @@ bool StreamingFeatureTpl::ComputeFeature(
            wav.data(),
            wav_len * sizeof(float));
 
-    // cache remaned waves
+    // cache remained waves
     knf::FrameExtractionOptions frame_opts = computer_.GetFrameOptions();
     int num_frames = knf::NumFrames(waves.size(), frame_opts);
     int frame_shift = frame_opts.WindowShift();
diff --git a/audio/paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.cc b/audio/paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.cc
index 8b8ff18be..6fdf68af2 100644
--- a/audio/paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.cc
+++ b/audio/paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.cc
@@ -44,5 +44,5 @@ py::array_t KaldiFeatureWrapper::ComputeFbank(
     return result.reshape(shape);
 }
 
-}  // namesapce kaldi
+}  // namespace kaldi
 }  // namespace paddleaudio
diff --git a/audio/paddleaudio/src/pybind/sox/effects.cpp b/audio/paddleaudio/src/pybind/sox/effects.cpp
index ea77527bb..5b8959f6c 100644
--- a/audio/paddleaudio/src/pybind/sox/effects.cpp
+++ b/audio/paddleaudio/src/pybind/sox/effects.cpp
@@ -12,9 +12,9 @@ using namespace paddleaudio::sox_utils;
 namespace paddleaudio::sox_effects {
 
 // Streaming decoding over file-like object is tricky because libsox operates on
-// FILE pointer. The folloing is what `sox` and `play` commands do
+// FILE pointer. The following is what `sox` and `play` commands do
 // - file input -> FILE pointer
-// - URL input -> call wget in suprocess and pipe the data -> FILE pointer
+// - URL input -> call wget in subprocess and pipe the data -> FILE pointer
 // - stdin -> FILE pointer
 //
 // We want to, instead, fetch byte strings chunk by chunk, consume them, and
@@ -127,12 +127,12 @@ namespace {
 
 enum SoxEffectsResourceState { NotInitialized, Initialized, ShutDown };
 SoxEffectsResourceState SOX_RESOURCE_STATE = NotInitialized;
-std::mutex SOX_RESOUCE_STATE_MUTEX;
+std::mutex SOX_RESOURCE_STATE_MUTEX;
 
 } // namespace
 
 void initialize_sox_effects() {
-  const std::lock_guard lock(SOX_RESOUCE_STATE_MUTEX);
+  const std::lock_guard lock(SOX_RESOURCE_STATE_MUTEX);
 
   switch (SOX_RESOURCE_STATE) {
     case NotInitialized:
@@ -150,7 +150,7 @@ void initialize_sox_effects() {
 };
 
 void shutdown_sox_effects() {
-  const std::lock_guard lock(SOX_RESOUCE_STATE_MUTEX);
+  const std::lock_guard lock(SOX_RESOURCE_STATE_MUTEX);
 
   switch (SOX_RESOURCE_STATE) {
     case NotInitialized:
diff --git a/audio/paddleaudio/src/pybind/sox/effects_chain.cpp b/audio/paddleaudio/src/pybind/sox/effects_chain.cpp
index 0204fb309..54f54840f 100644
--- a/audio/paddleaudio/src/pybind/sox/effects_chain.cpp
+++ b/audio/paddleaudio/src/pybind/sox/effects_chain.cpp
@@ -14,7 +14,7 @@ namespace {
 
 /// helper classes for passing the location of input tensor and output buffer
 ///
-/// drain/flow callback functions require plaing C style function signature and
+/// drain/flow callback functions require plain C style function signature and
 /// the way to pass extra data is to attach data to sox_effect_t::priv pointer.
 /// The following structs will be assigned to sox_effect_t::priv pointer which
 /// gives sox_effect_t an access to input Tensor and output buffer object.
@@ -50,7 +50,7 @@ int tensor_input_drain(sox_effect_t* effp, sox_sample_t* obuf, size_t* osamp) {
   *osamp -= *osamp % num_channels;
 
   // Slice the input Tensor
-  // refacor this module, chunk
+  // refactor this module, chunk
   auto i_frame = index / num_channels;
   auto num_frames = *osamp / num_channels;
 
diff --git a/audio/paddleaudio/src/pybind/sox/utils.cpp b/audio/paddleaudio/src/pybind/sox/utils.cpp
index bc32b7407..acdef8040 100644
--- a/audio/paddleaudio/src/pybind/sox/utils.cpp
+++ b/audio/paddleaudio/src/pybind/sox/utils.cpp
@@ -162,7 +162,7 @@ py::dtype get_dtype(
     }
     default:
       // default to float32 for the other formats, including
-      // 32-bit flaoting-point WAV,
+      // 32-bit floating-point WAV,
       // MP3,
       // FLAC,
       // VORBIS etc...
@@ -177,7 +177,7 @@ py::array convert_to_tensor(
     const py::dtype dtype,
     const bool normalize,
     const bool channels_first) {
-  // todo refector later(SGoat)
+  // todo refactor later(SGoat)
   py::array t;
   uint64_t dummy = 0;
   SOX_SAMPLE_LOCALS;
diff --git a/audio/paddleaudio/src/pybind/sox/utils.h b/audio/paddleaudio/src/pybind/sox/utils.h
index 6fce66714..c98e8f9ed 100644
--- a/audio/paddleaudio/src/pybind/sox/utils.h
+++ b/audio/paddleaudio/src/pybind/sox/utils.h
@@ -76,7 +76,7 @@ py::dtype get_dtype(
 /// Tensor.
 /// @param dtype Target dtype. Determines the output dtype and value range in
 /// conjunction with normalization.
-/// @param noramlize Perform normalization. Only effective when dtype is not
+/// @param normalize Perform normalization. Only effective when dtype is not
 /// kFloat32. When effective, the output tensor is kFloat32 type and value range
 /// is [-1.0, 1.0]
 /// @param channels_first When True, output Tensor has shape of [num_channels,
diff --git a/audio/paddleaudio/third_party/sox/CMakeLists.txt b/audio/paddleaudio/third_party/sox/CMakeLists.txt
index 8a5bc55c7..91be289bd 100644
--- a/audio/paddleaudio/third_party/sox/CMakeLists.txt
+++ b/audio/paddleaudio/third_party/sox/CMakeLists.txt
@@ -8,9 +8,9 @@ set(patch_dir ${CMAKE_CURRENT_SOURCE_DIR}/../patches)
 set(COMMON_ARGS --quiet --disable-shared --enable-static --prefix=${INSTALL_DIR} --with-pic --disable-dependency-tracking --disable-debug --disable-examples --disable-doc)
 
 # To pass custom environment variables to ExternalProject_Add command,
-# we need to do `${CMAKE_COMMAND} -E env ${envs} `.
+# we need to do `${CMAKE_COMMAND} -E env ${envs} `.
 # https://stackoverflow.com/a/62437353
-# We constrcut the custom environment variables here
+# We construct the custom environment variables here
 set(envs
   "PKG_CONFIG_PATH=${INSTALL_DIR}/lib/pkgconfig"
   "LDFLAGS=-L${INSTALL_DIR}/lib $ENV{LDFLAGS}"
diff --git a/audio/paddleaudio/utils/download.py b/audio/paddleaudio/utils/download.py
index 07d5eea84..f47345dfc 100644
--- a/audio/paddleaudio/utils/download.py
+++ b/audio/paddleaudio/utils/download.py
@@ -41,14 +41,14 @@ def download_and_decompress(archives: List[Dict[str, str]],
                             path: str,
                             decompress: bool=True):
     """
-    Download archieves and decompress to specific path.
+    Download archives and decompress to specific path.
     """
     if not os.path.isdir(path):
         os.makedirs(path)
 
     for archive in archives:
         assert 'url' in archive and 'md5' in archive, \
-            'Dictionary keys of "url" and "md5" are required in the archive, but got: {list(archieve.keys())}'
+            'Dictionary keys of "url" and "md5" are required in the archive, but got: {list(archive.keys())}'
 
         download.get_path_from_url(
             archive['url'], path, archive['md5'], decompress=decompress)
diff --git a/audio/paddleaudio/utils/log.py b/audio/paddleaudio/utils/log.py
index 5656b286a..ddc8fd669 100644
--- a/audio/paddleaudio/utils/log.py
+++ b/audio/paddleaudio/utils/log.py
@@ -58,7 +58,7 @@ log_config = {
 
 class Logger(object):
     '''
-    Deafult logger in PaddleAudio
+    Default logger in PaddleAudio
     Args:
         name(str) : Logger name, default is 'PaddleAudio'
     '''
diff --git a/audio/paddleaudio/utils/sox_utils.py b/audio/paddleaudio/utils/sox_utils.py
index 305bb68b0..7665238ef 100644
--- a/audio/paddleaudio/utils/sox_utils.py
+++ b/audio/paddleaudio/utils/sox_utils.py
@@ -55,7 +55,7 @@ def set_use_threads(use_threads: bool):
 
     Args:
         use_threads (bool): When ``True``, enables ``libsox``'s parallel effects channels processing.
-            To use mutlithread, the underlying ``libsox`` has to be compiled with OpenMP support.
+            To use multithread, the underlying ``libsox`` has to be compiled with OpenMP support.
 
     See Also:
         http://sox.sourceforge.net/sox.html
diff --git a/audio/paddleaudio/utils/tensor_utils.py b/audio/paddleaudio/utils/tensor_utils.py
index cfd490b9a..1448d48a3 100644
--- a/audio/paddleaudio/utils/tensor_utils.py
+++ b/audio/paddleaudio/utils/tensor_utils.py
@@ -11,7 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-"""Unility functions for Transformer."""
+"""Utility functions for Transformer."""
 from typing import List
 from typing import Tuple
 
@@ -80,7 +80,7 @@ def pad_sequence(sequences: List[paddle.Tensor],
     # assuming trailing dimensions and type of all the Tensors
     # in sequences are same and fetching those from sequences[0]
     max_size = paddle.shape(sequences[0])
-    # (TODO Hui Zhang): slice not supprot `end==start`
+    # (TODO Hui Zhang): slice not support `end==start`
     # trailing_dims = max_size[1:]
     trailing_dims = tuple(
         max_size[1:].numpy().tolist()) if sequences[0].ndim >= 2 else ()
@@ -94,7 +94,7 @@ def pad_sequence(sequences: List[paddle.Tensor],
         length = tensor.shape[0]
         # use index notation to prevent duplicate references to the tensor
         if batch_first:
-            # TODO (Hui Zhang): set_value op not supprot `end==start`
+            # TODO (Hui Zhang): set_value op not support `end==start`
             # TODO (Hui Zhang): set_value op not support int16
             # TODO (Hui Zhang): set_varbase 2 rank not support [0,0,...]
             # out_tensor[i, :length, ...] = tensor
@@ -103,7 +103,7 @@ def pad_sequence(sequences: List[paddle.Tensor],
                 else:
                     out_tensor[i, length] = tensor
         else:
-            # TODO (Hui Zhang): set_value op not supprot `end==start`
+            # TODO (Hui Zhang): set_value op not support `end==start`
             # out_tensor[:length, i, ...] = tensor
             if length != 0:
                 out_tensor[:length, i] = tensor
diff --git a/audio/paddleaudio/utils/time.py b/audio/paddleaudio/utils/time.py
index 105208f91..4ea413282 100644
--- a/audio/paddleaudio/utils/time.py
+++ b/audio/paddleaudio/utils/time.py
@@ -21,7 +21,7 @@ __all__ = [
 
 
 class Timer(object):
-    '''Calculate runing speed and estimated time of arrival(ETA)'''
+    '''Calculate running speed and estimated time of arrival(ETA)'''
 
     def __init__(self, total_step: int):
         self.total_step = total_step
diff --git a/audio/tests/backends/base.py b/audio/tests/backends/base.py
index a67191887..c2d53d209 100644
--- a/audio/tests/backends/base.py
+++ b/audio/tests/backends/base.py
@@ -30,5 +30,5 @@ class BackendTest(unittest.TestCase):
             urllib.request.urlretrieve(url, os.path.basename(url))
             self.files.append(os.path.basename(url))
 
-    def initParmas(self):
+    def initParams(self):
         raise NotImplementedError
diff --git a/audio/tests/backends/soundfile/base.py b/audio/tests/backends/soundfile/base.py
index a67191887..c2d53d209 100644
--- a/audio/tests/backends/soundfile/base.py
+++ b/audio/tests/backends/soundfile/base.py
@@ -30,5 +30,5 @@ class BackendTest(unittest.TestCase):
             urllib.request.urlretrieve(url, os.path.basename(url))
             self.files.append(os.path.basename(url))
 
-    def initParmas(self):
+    def initParams(self):
         raise NotImplementedError
diff --git a/audio/tests/backends/soundfile/save_test.py b/audio/tests/backends/soundfile/save_test.py
index 4f3df6e48..4b5facd08 100644
--- a/audio/tests/backends/soundfile/save_test.py
+++ b/audio/tests/backends/soundfile/save_test.py
@@ -103,7 +103,7 @@ class MockedSaveTest(unittest.TestCase):
             encoding=encoding,
             bits_per_sample=bits_per_sample,
         )
-        # on +Py3.8 call_args.kwargs is more descreptive
+        # on +Py3.8 call_args.kwargs is more descriptive
         args = mocked_write.call_args[1]
         assert args["file"] == filepath
         assert args["samplerate"] == sample_rate
@@ -191,7 +191,7 @@ class SaveTestBase(TempDirMixin, unittest.TestCase):
     def _assert_non_wav(self, fmt, dtype, sample_rate, num_channels):
         """`soundfile_backend.save` can save non-wav format.
 
-        Due to precision missmatch, and the lack of alternative way to decode the
+        Due to precision mismatch, and the lack of alternative way to decode the
         resulting files without using soundfile, only meta data are validated.
         """
         num_frames = sample_rate * 3
diff --git a/audio/tests/common_utils/data_utils.py b/audio/tests/common_utils/data_utils.py
index b5618618c..16f575701 100644
--- a/audio/tests/common_utils/data_utils.py
+++ b/audio/tests/common_utils/data_utils.py
@@ -81,7 +81,7 @@ def convert_tensor_encoding(
 #dtype = getattr(paddle, dtype)
 #if dtype not in [paddle.float64, paddle.float32, paddle.int32, paddle.int16, paddle.uint8]:
 #raise NotImplementedError(f"dtype {dtype} is not supported.")
-## According to the doc, folking rng on all CUDA devices is slow when there are many CUDA devices,
+## According to the doc, forking rng on all CUDA devices is slow when there are many CUDA devices,
 ## so we only fork on CPU, generate values and move the data to the given device
 #with paddle.random.fork_rng([]):
 #paddle.random.manual_seed(seed)
diff --git a/audio/tests/common_utils/sox_utils.py b/audio/tests/common_utils/sox_utils.py
index 6ceae081e..4c0866ed9 100644
--- a/audio/tests/common_utils/sox_utils.py
+++ b/audio/tests/common_utils/sox_utils.py
@@ -24,20 +24,21 @@ def get_bit_depth(dtype):
 
 
 def gen_audio_file(
-    path,
-    sample_rate,
-    num_channels,
-    *,
-    encoding=None,
-    bit_depth=None,
-    compression=None,
-    attenuation=None,
-    duration=1,
-    comment_file=None,
-):
+        path,
+        sample_rate,
+        num_channels,
+        *,
+        encoding=None,
+        bit_depth=None,
+        compression=None,
+        attenuation=None,
+        duration=1,
+        comment_file=None, ):
     """Generate synthetic audio file with `sox` command."""
     if path.endswith(".wav"):
-        warnings.warn("Use get_wav_data and save_wav to generate wav file for accurate result.")
+        warnings.warn(
+            "Use get_wav_data and save_wav to generate wav file for accurate result."
+        )
     command = [
         "sox",
         "-V3",  # verbose
@@ -81,7 +82,12 @@ def gen_audio_file(
     subprocess.run(command, check=True)
 
 
-def convert_audio_file(src_path, dst_path, *, encoding=None, bit_depth=None, compression=None):
+def convert_audio_file(src_path,
+                       dst_path,
+                       *,
+                       encoding=None,
+                       bit_depth=None,
+                       compression=None):
     """Convert audio file with `sox` command."""
     command = ["sox", "-V3", "--no-dither", "-R", str(src_path)]
     if encoding is not None:
@@ -95,7 +101,7 @@ def convert_audio_file(src_path, dst_path, *, encoding=None, bit_depth=None, com
     subprocess.run(command, check=True)
 
 
-def _flattern(effects):
+def _flatten(effects):
     if not effects:
         return effects
     if isinstance(effects[0], str):
@@ -103,9 +109,14 @@ def _flattern(effects):
     return [item for sublist in effects for item in sublist]
 
 
-def run_sox_effect(input_file, output_file, effect, *, output_sample_rate=None, output_bitdepth=None):
+def run_sox_effect(input_file,
+                   output_file,
+                   effect,
+                   *,
+                   output_sample_rate=None,
+                   output_bitdepth=None):
     """Run sox effects"""
-    effect = _flattern(effect)
+    effect = _flatten(effect)
     command = ["sox", "-V", "--no-dither", input_file]
     if output_bitdepth:
         command += ["--bits", str(output_bitdepth)]
diff --git a/audio/tests/features/base.py b/audio/tests/features/base.py
index 3bb1d1dde..4a44e04bb 100644
--- a/audio/tests/features/base.py
+++ b/audio/tests/features/base.py
@@ -24,7 +24,7 @@ wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
 
 class FeatTest(unittest.TestCase):
     def setUp(self):
-        self.initParmas()
+        self.initParams()
         self.initWavInput()
         self.setUpDevice()
 
@@ -44,5 +44,5 @@ class FeatTest(unittest.TestCase):
         if dim == 1:
             self.waveform = np.expand_dims(self.waveform, 0)
 
-    def initParmas(self):
+    def initParams(self):
         raise NotImplementedError
diff --git a/audio/tests/features/test_istft.py b/audio/tests/features/test_istft.py
index ea1ee5cb6..862a1d753 100644
--- a/audio/tests/features/test_istft.py
+++ b/audio/tests/features/test_istft.py
@@ -23,7 +23,7 @@ from paddlespeech.audio.transform.spectrogram import Stft
 
 
 class TestIstft(FeatTest):
-    def initParmas(self):
+    def initParams(self):
         self.n_fft = 512
         self.hop_length = 128
         self.window_str = 'hann'
diff --git a/audio/tests/features/test_kaldi.py b/audio/tests/features/test_kaldi.py
index 2bd5dc734..50e2571ca 100644
--- a/audio/tests/features/test_kaldi.py
+++ b/audio/tests/features/test_kaldi.py
@@ -18,12 +18,11 @@ import paddle
 import paddleaudio
 import torch
 import torchaudio
-
 from base import FeatTest
 
 
 class TestKaldi(FeatTest):
-    def initParmas(self):
+    def initParams(self):
         self.window_size = 1024
         self.dtype = 'float32'
 
diff --git a/audio/tests/features/test_librosa.py b/audio/tests/features/test_librosa.py
index 8cda25b19..07b117cb0 100644
--- a/audio/tests/features/test_librosa.py
+++ b/audio/tests/features/test_librosa.py
@@ -17,13 +17,12 @@ import librosa
 import numpy as np
 import paddle
 import paddleaudio
-from paddleaudio.functional.window import get_window
-
 from base import FeatTest
+from paddleaudio.functional.window import get_window
 
 
 class TestLibrosa(FeatTest):
-    def initParmas(self):
+    def initParams(self):
         self.n_fft = 512
         self.hop_length = 128
         self.n_mels = 40
diff --git a/audio/tests/features/test_log_melspectrogram.py b/audio/tests/features/test_log_melspectrogram.py
index b2765d3be..6152d6ff2 100644
--- a/audio/tests/features/test_log_melspectrogram.py
+++ b/audio/tests/features/test_log_melspectrogram.py
@@ -22,7 +22,7 @@ from paddlespeech.audio.transform.spectrogram import LogMelSpectrogram
 
 
 class TestLogMelSpectrogram(FeatTest):
-    def initParmas(self):
+    def initParams(self):
         self.n_fft = 512
         self.hop_length = 128
         self.n_mels = 40
diff --git a/audio/tests/features/test_spectrogram.py b/audio/tests/features/test_spectrogram.py
index 6f4609632..c2dced2e7 100644
--- a/audio/tests/features/test_spectrogram.py
+++ b/audio/tests/features/test_spectrogram.py
@@ -22,7 +22,7 @@ from paddlespeech.audio.transform.spectrogram import Spectrogram
 
 
 class TestSpectrogram(FeatTest):
-    def initParmas(self):
+    def initParams(self):
         self.n_fft = 512
         self.hop_length = 128
 
diff --git a/audio/tests/features/test_stft.py b/audio/tests/features/test_stft.py
index 9511a2926..5bab170be 100644
--- a/audio/tests/features/test_stft.py
+++ b/audio/tests/features/test_stft.py
@@ -22,7 +22,7 @@ from paddlespeech.audio.transform.spectrogram import Stft
 
 
 class TestStft(FeatTest):
-    def initParmas(self):
+    def initParams(self):
         self.n_fft = 512
         self.hop_length = 128
         self.window_str = 'hann'
@@ -30,7 +30,7 @@ class TestStft(FeatTest):
     def test_stft(self):
         ps_stft = Stft(self.n_fft, self.hop_length)
         ps_res = ps_stft(
-            self.waveform.T).squeeze(1).T  # (n_fft//2 + 1, n_frmaes)
+            self.waveform.T).squeeze(1).T  # (n_fft//2 + 1, n_frames)
 
         x = paddle.to_tensor(self.waveform)
         window = get_window(self.window_str, self.n_fft, dtype=x.dtype)
diff --git a/dataset/librispeech/librispeech.py b/dataset/librispeech/librispeech.py
index 2f5f9016c..ccf8d4b49 100644
--- a/dataset/librispeech/librispeech.py
+++ b/dataset/librispeech/librispeech.py
@@ -132,7 +132,7 @@ def create_manifest(data_dir, manifest_path):
 
 
 def prepare_dataset(url, md5sum, target_dir, manifest_path):
-    """Download, unpack and create summmary manifest file.
+    """Download, unpack and create summary manifest file.
     """
     if not os.path.exists(os.path.join(target_dir, "LibriSpeech")):
         # download
diff --git a/dataset/ted_en_zh/ted_en_zh.py b/dataset/ted_en_zh/ted_en_zh.py
index 2d1fc6710..66810c85e 100644
--- a/dataset/ted_en_zh/ted_en_zh.py
+++ b/dataset/ted_en_zh/ted_en_zh.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 """Prepare Ted-En-Zh speech translation dataset
 
-Create manifest files from splited datased.
+Create manifest files from splited dataset.
 dev set: tst2010, test set: tst2015
 Manifest file is a json-format file with each line containing the
 meta data (i.e. audio filepath, transcript and audio duration)
diff --git a/dataset/thchs30/thchs30.py b/dataset/thchs30/thchs30.py
index c5c3eb7a8..fc8338984 100644
--- a/dataset/thchs30/thchs30.py
+++ b/dataset/thchs30/thchs30.py
@@ -71,7 +71,7 @@ def read_trn(filepath):
     with open(filepath, 'r') as f:
         lines = f.read().strip().split('\n')
         assert len(lines) == 3, lines
-        # charactor text, remove withespace
+        # character text, remove whitespace
         texts.append(''.join(lines[0].split()))
         texts.extend(lines[1:])
     return texts
@@ -127,7 +127,7 @@ def create_manifest(data_dir, manifest_path_prefix):
                     'utt2spk': spk,
                     'feat': audio_path,
                     'feat_shape': (duration, ),  # second
-                    'text': word_text,  # charactor
+                    'text': word_text,  # character
                     'syllable': syllable_text,
                     'phone': phone_text,
                 },
diff --git a/dataset/timit/timit.py b/dataset/timit/timit.py
index f3889d176..2943ff548 100644
--- a/dataset/timit/timit.py
+++ b/dataset/timit/timit.py
@@ -123,7 +123,7 @@ def read_algin(filepath: str) -> str:
         filepath (str): [description]
 
     Returns:
-        str: token sepearte by
+        str: token separate by
     """
     aligns = []  # (start, end, token)
     with open(filepath, 'r') as f:
diff --git a/dataset/timit/timit_kaldi_standard_split.py b/dataset/timit/timit_kaldi_standard_split.py
index 473fc856f..59ce2e64a 100644
--- a/dataset/timit/timit_kaldi_standard_split.py
+++ b/dataset/timit/timit_kaldi_standard_split.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 """Prepare TIMIT dataset (Standard split from Kaldi)
 
-Create manifest files from splited datased.
+Create manifest files from splited dataset.
 Manifest file is a json-format file with each line containing the
 meta data (i.e. audio filepath, transcript and audio duration)
 of each audio file in the data set.
diff --git a/dataset/voxceleb/voxceleb1.py b/dataset/voxceleb/voxceleb1.py index 8d4100678..49a2a6baa 100644 --- a/dataset/voxceleb/voxceleb1.py +++ b/dataset/voxceleb/voxceleb1.py @@ -167,7 +167,7 @@ def prepare_dataset(base_url, data_list, target_dir, manifest_path, # check the target zip file md5sum if not check_md5sum(target_name, target_md5sum): - raise RuntimeError("{} MD5 checkssum failed".format(target_name)) + raise RuntimeError("{} MD5 checksum failed".format(target_name)) else: print("Check {} md5sum successfully".format(target_name)) diff --git a/dataset/voxceleb/voxceleb2.py b/dataset/voxceleb/voxceleb2.py index 6df6d1f38..faa3b99bc 100644 --- a/dataset/voxceleb/voxceleb2.py +++ b/dataset/voxceleb/voxceleb2.py @@ -179,7 +179,7 @@ def download_dataset(base_url, data_list, target_data, target_dir, dataset): # check the target zip file md5sum if not check_md5sum(target_name, target_md5sum): - raise RuntimeError("{} MD5 checkssum failed".format(target_name)) + raise RuntimeError("{} MD5 checksum failed".format(target_name)) else: print("Check {} md5sum successfully".format(target_name)) @@ -187,7 +187,7 @@ def download_dataset(base_url, data_list, target_data, target_dir, dataset): # we need make the test directory unzip(target_name, os.path.join(target_dir, "test")) else: - # upzip dev zip pacakge and will create the dev directory + # unzip dev zip package and will create the dev directory unzip(target_name, target_dir) diff --git a/demos/audio_content_search/README.md b/demos/audio_content_search/README.md index f04ac447e..89b1c0d89 100644 --- a/demos/audio_content_search/README.md +++ b/demos/audio_content_search/README.md @@ -14,7 +14,7 @@ Now, the search word in demo is: ### 1. Installation see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). -You can choose one way from meduim and hard to install paddlespeech. +You can choose one way from medium and hard to install paddlespeech. 
The dependency refers to the requirements.txt, and install the dependency as follows: diff --git a/demos/audio_searching/README.md b/demos/audio_searching/README.md index 0fc901432..528fce9e8 100644 --- a/demos/audio_searching/README.md +++ b/demos/audio_searching/README.md @@ -19,7 +19,7 @@ Note:this demo uses the [CN-Celeb](http://openslr.org/82/) dataset of at least ### 1. Prepare PaddleSpeech Audio vector extraction requires PaddleSpeech training model, so please make sure that PaddleSpeech has been installed before running. Specific installation steps: See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). -You can choose one way from easy, meduim and hard to install paddlespeech. +You can choose one way from easy, medium and hard to install paddlespeech. ### 2. Prepare MySQL and Milvus services by docker-compose The audio similarity search system requires Milvus, MySQL services. We can start these containers with one click through [docker-compose.yaml](./docker-compose.yaml), so please make sure you have [installed Docker Engine](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) before running. then diff --git a/demos/audio_tagging/README.md b/demos/audio_tagging/README.md index fc4a334ea..b602c6022 100644 --- a/demos/audio_tagging/README.md +++ b/demos/audio_tagging/README.md @@ -11,7 +11,7 @@ This demo is an implementation to tag an audio file with 527 [AudioSet](https:// ### 1. Installation see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). -You can choose one way from easy, meduim and hard to install paddlespeech. +You can choose one way from easy, medium and hard to install paddlespeech. ### 2. Prepare Input File The input of this demo should be a WAV file(`.wav`). 
diff --git a/demos/automatic_video_subtitiles/README.md b/demos/automatic_video_subtitiles/README.md index b815425ec..89d8c73c9 100644 --- a/demos/automatic_video_subtitiles/README.md +++ b/demos/automatic_video_subtitiles/README.md @@ -10,7 +10,7 @@ This demo is an implementation to automatic video subtitles from a video file. I ### 1. Installation see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). -You can choose one way from easy, meduim and hard to install paddlespeech. +You can choose one way from easy, medium and hard to install paddlespeech. ### 2. Prepare Input Get a video file with the speech of the specific language: diff --git a/demos/keyword_spotting/README.md b/demos/keyword_spotting/README.md index 6544cf71e..b55c71124 100644 --- a/demos/keyword_spotting/README.md +++ b/demos/keyword_spotting/README.md @@ -10,7 +10,7 @@ This demo is an implementation to recognize keyword from a specific audio file. ### 1. Installation see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). -You can choose one way from easy, meduim and hard to install paddlespeech. +You can choose one way from easy, medium and hard to install paddlespeech. ### 2. Prepare Input File The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model. diff --git a/demos/punctuation_restoration/README.md b/demos/punctuation_restoration/README.md index 458ab92f9..3544a2060 100644 --- a/demos/punctuation_restoration/README.md +++ b/demos/punctuation_restoration/README.md @@ -9,7 +9,7 @@ This demo is an implementation to restore punctuation from raw text. It can be d ### 1. Installation see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). -You can choose one way from easy, meduim and hard to install paddlespeech. +You can choose one way from easy, medium and hard to install paddlespeech. ### 2. 
Prepare Input The input of this demo should be a text of the specific language that can be passed via argument. diff --git a/demos/speaker_verification/README.md b/demos/speaker_verification/README.md index 55f9a7360..37c6bf3b9 100644 --- a/demos/speaker_verification/README.md +++ b/demos/speaker_verification/README.md @@ -11,7 +11,7 @@ This demo is an implementation to extract speaker embedding from a specific audi ### 1. Installation see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). -You can choose one way from easy, meduim and hard to install paddlespeech. +You can choose one way from easy, medium and hard to install paddlespeech. ### 2. Prepare Input File The input of this cli demo should be a WAV file(`.wav`), and the sample rate must be the same as the model. diff --git a/demos/speech_recognition/README.md b/demos/speech_recognition/README.md index ee2acd6fd..e406590d2 100644 --- a/demos/speech_recognition/README.md +++ b/demos/speech_recognition/README.md @@ -10,7 +10,7 @@ This demo is an implementation to recognize text from a specific audio file. It ### 1. Installation see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). -You can choose one way from easy, meduim and hard to install paddlespeech. +You can choose one way from easy, medium and hard to install paddlespeech. ### 2. Prepare Input File The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model. diff --git a/demos/speech_server/README.md b/demos/speech_server/README.md index 116f1fd7b..08788a89e 100644 --- a/demos/speech_server/README.md +++ b/demos/speech_server/README.md @@ -15,7 +15,7 @@ see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/doc It is recommended to use **paddlepaddle 2.4rc** or above. -You can choose one way from easy, meduim and hard to install paddlespeech. 
+You can choose one way from easy, medium and hard to install paddlespeech. **If you install in easy mode, you need to prepare the yaml file by yourself, you can refer to the yaml file in the conf directory.** diff --git a/demos/speech_ssl/README.md b/demos/speech_ssl/README.md index ef9b2237d..8677ebc57 100644 --- a/demos/speech_ssl/README.md +++ b/demos/speech_ssl/README.md @@ -10,7 +10,7 @@ This demo is an implementation to recognize text or produce the acoustic represe ### 1. Installation see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). -You can choose one way from easy, meduim and hard to install paddlespeech. +You can choose one way from easy, medium and hard to install paddlespeech. ### 2. Prepare Input File The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model. diff --git a/demos/speech_translation/README.md b/demos/speech_translation/README.md index 00a9c7932..4866336c0 100644 --- a/demos/speech_translation/README.md +++ b/demos/speech_translation/README.md @@ -9,7 +9,7 @@ This demo is an implementation to recognize text from a specific audio file and ### 1. Installation see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). -You can choose one way from easy, meduim and hard to install paddlespeech. +You can choose one way from easy, medium and hard to install paddlespeech. ### 2. Prepare Input File diff --git a/demos/streaming_asr_server/README.md b/demos/streaming_asr_server/README.md index 136863b96..423485466 100644 --- a/demos/streaming_asr_server/README.md +++ b/demos/streaming_asr_server/README.md @@ -18,7 +18,7 @@ see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/doc It is recommended to use **paddlepaddle 2.4rc** or above. -You can choose one way from easy, meduim and hard to install paddlespeech. +You can choose one way from easy, medium and hard to install paddlespeech. 
**If you install in easy mode, you need to prepare the yaml file by yourself, you can refer to diff --git a/demos/streaming_tts_server/README.md b/demos/streaming_tts_server/README.md index ca5d6f1f8..ad87bebdc 100644 --- a/demos/streaming_tts_server/README.md +++ b/demos/streaming_tts_server/README.md @@ -15,7 +15,7 @@ see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/doc It is recommended to use **paddlepaddle 2.4rc** or above. -You can choose one way from easy, meduim and hard to install paddlespeech. +You can choose one way from easy, medium and hard to install paddlespeech. **If you install in easy mode, you need to prepare the yaml file by yourself, you can refer to the yaml file in the conf directory.** diff --git a/demos/text_to_speech/README.md b/demos/text_to_speech/README.md index d7bb8ca1c..b58777def 100644 --- a/demos/text_to_speech/README.md +++ b/demos/text_to_speech/README.md @@ -10,7 +10,7 @@ This demo is an implementation to generate audio from the given text. It can be ### 1. Installation see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). -You can choose one way from easy, meduim and hard to install paddlespeech. +You can choose one way from easy, medium and hard to install paddlespeech. ### 2. Prepare Input The input of this demo should be a text of the specific language that can be passed via argument. diff --git a/demos/whisper/README.md b/demos/whisper/README.md index 9b12554e6..6e1b8011f 100644 --- a/demos/whisper/README.md +++ b/demos/whisper/README.md @@ -9,7 +9,7 @@ Whisper model trained by OpenAI whisper https://github.com/openai/whisper ### 1. Installation see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md). - You can choose one way from easy, meduim and hard to install paddlespeech. + You can choose one way from easy, medium and hard to install paddlespeech. ### 2. 
Prepare Input File The input of this demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.