PaddleSpeech/deepspeech/frontend/normalizer.py

# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Contains feature normalizers."""

import numpy as np
import random
from deepspeech.frontend.utility import read_manifest
from deepspeech.frontend.audio import AudioSegment


class FeatureNormalizer(object):
    """Feature normalizer. Normalize features to be of zero mean and unit
    stddev.

    If mean_std_filepath is provided (not None), the normalizer will directly
    initialize from the file. Otherwise, both manifest_path and featurize_func
    must be given so that mean and stddev can be computed on the fly.

    :param mean_std_filepath: File containing the pre-computed mean and stddev.
    :type mean_std_filepath: None|str
    :param manifest_path: Manifest of instances for computing mean and stddev.
    :type manifest_path: None|str
    :param featurize_func: Function to extract features. It should be callable
                           with ``featurize_func(audio_segment)``.
    :type featurize_func: None|callable
    :param num_samples: Number of random samples for computing mean and stddev.
    :type num_samples: int
    :param random_seed: Random seed for sampling instances.
    :type random_seed: int
    :raises ValueError: If both mean_std_filepath and manifest_path
                        (or both mean_std_filepath and featurize_func) are None.
    """

    def __init__(self,
                 mean_std_filepath,
                 manifest_path=None,
                 featurize_func=None,
                 num_samples=500,
                 random_seed=0):
        if not mean_std_filepath:
            if not (manifest_path and featurize_func):
                raise ValueError("If mean_std_filepath is None, manifest_path "
                                 "and featurize_func should not be None.")
            self._rng = random.Random(random_seed)
            self._compute_mean_std(manifest_path, featurize_func, num_samples)
        else:
            self._read_mean_std_from_file(mean_std_filepath)

    def apply(self, features, eps=1e-14):
        """Normalize features to be of zero mean and unit stddev.

        :param features: Input features to be normalized.
        :type features: ndarray
        :param eps: Small constant added to stddev for numerical stability.
        :type eps: float
        :return: Normalized features.
        :rtype: ndarray
        """
        return (features - self._mean) / (self._std + eps)

    def write_to_file(self, filepath):
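        """Write the mean and stddev to the file.

        The statistics are saved with ``np.savez``, which appends ``.npz``
        to ``filepath`` if it does not already end with that extension.

        :param filepath: File to write mean and stddev.
        :type filepath: str
        """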
        np.savez(filepath, mean=self._mean, std=self._std)

    def _read_mean_std_from_file(self, filepath):
        """Load mean and std from file."""
        npzfile = np.load(filepath)
        self._mean = npzfile["mean"]
        self._std = npzfile["std"]

    def _compute_mean_std(self, manifest_path, featurize_func, num_samples):
        """Compute mean and std from randomly sampled instances."""
        manifest = read_manifest(manifest_path)
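        # random.Random.sample raises ValueError if num_samples exceeds the
        # number of entries in the manifest.
        sampled_manifest = self._rng.sample(manifest, num_samples)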
        features = []
        for instance in sampled_manifest:
            features.append(
                featurize_func(
                    AudioSegment.from_file(instance["audio_filepath"])))
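        # Concatenate per-utterance features along axis 1 (assumed to be the
        # time axis), then take per-dimension statistics as column vectors.
        features = np.hstack(features)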
        self._mean = np.mean(features, axis=1).reshape([-1, 1])
        self._std = np.std(features, axis=1).reshape([-1, 1])
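

The statistics computed by `_compute_mean_std` and applied by `apply` can be sketched in isolation. This is a minimal sketch, not part of the original module. It assumes each feature array has shape `(feature_dim, num_frames)`, and it uses synthetic arrays (the hypothetical `utterances` list) in place of `read_manifest`, `AudioSegment.from_file` and `featurize_func`.

```python
import os
import random
import tempfile

import numpy as np

# Hypothetical stand-in for featurize_func(AudioSegment): each "utterance"
# is a synthetic (feature_dim, num_frames) array, not real audio features.
np_rng = np.random.RandomState(0)
utterances = [
    np_rng.normal(loc=5.0, scale=3.0, size=(4, n_frames))
    for n_frames in (30, 50, 20, 40)
]

# Draw a reproducible subset with a seeded random.Random instance.
sampled = random.Random(0).sample(utterances, 3)

# Concatenate along the frame axis, then take per-dimension statistics.
stacked = np.hstack(sampled)  # shape: (4, total_frames)
mean = np.mean(stacked, axis=1).reshape([-1, 1])  # shape: (4, 1)
std = np.std(stacked, axis=1).reshape([-1, 1])  # shape: (4, 1)

# Round-trip the statistics through an .npz file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mean_std.npz")
    np.savez(path, mean=mean, std=std)
    with np.load(path) as npzfile:
        loaded_mean, loaded_std = npzfile["mean"], npzfile["std"]

# Broadcasting applies the (4, 1) statistics to every frame.
normalized = (stacked - loaded_mean) / (loaded_std + 1e-14)
```

Keeping the mean and stddev as column vectors lets NumPy broadcasting normalize every frame in one expression, so no explicit loop over frames is needed.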