# Customize Dataset for Audio Classification

Following this tutorial, you can customize your dataset for an audio classification task using `paddlespeech` and `paddleaudio`.

The base class for classification datasets is `paddleaudio.datasets.dataset.AudioClassificationDataset`. To customize your dataset, write a dataset class derived from `AudioClassificationDataset`.

Suppose you have some wave files stored in a directory of your own. You should prepare a meta file that lists the file paths and labels. For example, with the absolute path `/PATH/TO/META_FILE.txt`:

```
/PATH/TO/WAVE_FILE/1.wav cat
/PATH/TO/WAVE_FILE/2.wav cat
/PATH/TO/WAVE_FILE/3.wav dog
/PATH/TO/WAVE_FILE/4.wav dog
```
Here is an example of building your custom dataset in `custom_dataset.py`:

```python
from paddleaudio.datasets.dataset import AudioClassificationDataset


class CustomDataset(AudioClassificationDataset):
    # All *.wav files must have the same sample rate, e.g. 16k/24k/32k/44k.
    sample_rate = 16000
    meta_file = '/PATH/TO/META_FILE.txt'
    # List all the class labels.
    label_list = [
        'cat',
        'dog',
    ]

    def __init__(self):
        files, labels = self._get_data()
        super(CustomDataset, self).__init__(
            files=files, labels=labels, feat_type='raw')

    def _get_data(self):
        '''
        Collect the wave file paths and labels from the meta file.
        '''
        files = []
        labels = []

        with open(self.meta_file) as f:
            for line in f:
                file, label_str = line.strip().split(' ')
                files.append(file)
                labels.append(self.label_list.index(label_str))

        return files, labels
```
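Before building a data loader, you can sanity-check the dataset by indexing a sample. A minimal sketch, assuming that with `feat_type='raw'` each item is a `(waveform, label)` pair (an assumption, not stated above):

```python
from custom_dataset import CustomDataset

ds = CustomDataset()
# Assumption: feat_type='raw' yields the raw waveform array and an integer label.
waveform, label = ds[0]
print(waveform.shape, ds.label_list[int(label)])
```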
Then you can build a dataset and a data loader from `CustomDataset`:

```python
import paddle
from paddleaudio.features import LogMelSpectrogram

from custom_dataset import CustomDataset

train_ds = CustomDataset()
feature_extractor = LogMelSpectrogram(sr=train_ds.sample_rate)

train_sampler = paddle.io.DistributedBatchSampler(
    train_ds, batch_size=4, shuffle=True, drop_last=False)
train_loader = paddle.io.DataLoader(
    train_ds,
    batch_sampler=train_sampler,
    return_list=True,
    use_buffer_reader=True)
```
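To verify the pipeline before training, you can pull one batch and compute features. A minimal sketch; the commented shapes follow the transpose used in the training loop below and depend on the audio lengths and the extractor's defaults:

```python
# Pull a single batch and inspect shapes (illustrative only).
for waveforms, labels in train_loader:
    feats = feature_extractor(waveforms)        # assumed [batch, n_mels, time]
    feats = paddle.transpose(feats, [0, 2, 1])  # [batch, time, n_mels]
    print(waveforms.shape, feats.shape, labels.shape)
    break
```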
Train a model with `CustomDataset`:

```python
from paddlespeech.cls.models import cnn14
from paddlespeech.cls.models import SoundClassifier

backbone = cnn14(pretrained=True, extract_embedding=True)
model = SoundClassifier(backbone, num_class=len(train_ds.label_list))
optimizer = paddle.optimizer.Adam(
    learning_rate=1e-6, parameters=model.parameters())
criterion = paddle.nn.loss.CrossEntropyLoss()

steps_per_epoch = len(train_sampler)
epochs = 10
for epoch in range(1, epochs + 1):
    model.train()

    for batch_idx, batch in enumerate(train_loader):
        waveforms, labels = batch
        # Padding is needed when waveform lengths differ within a batch.
        feats = feature_extractor(waveforms)
        feats = paddle.transpose(feats, [0, 2, 1])  # [batch, time, n_mels]
        logits = model(feats)

        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        if isinstance(optimizer._learning_rate,
                      paddle.optimizer.lr.LRScheduler):
            optimizer._learning_rate.step()
        optimizer.clear_grad()

        # Calculate loss
        avg_loss = loss.numpy()[0]

        # Calculate metrics
        preds = paddle.argmax(logits, axis=1)
        num_corrects = (preds == labels).numpy().sum()
        num_samples = feats.shape[0]

        avg_acc = num_corrects / num_samples

        print_msg = 'Epoch={}/{}, Step={}/{}'.format(
            epoch, epochs, batch_idx + 1, steps_per_epoch)
        print_msg += ' loss={:.4f}'.format(avg_loss)
        print_msg += ' acc={:.4f}'.format(avg_acc)
        print_msg += ' lr={:.6f}'.format(optimizer.get_lr())
        print(print_msg)
```
If you want to save model checkpoints and evaluate on a specific dataset, please see `paddlespeech/cls/exps/panns/train.py` for more details.
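For reference, a minimal sketch of saving a checkpoint with `paddle.save`; the directory layout here is an assumption, not the exact scheme used by that script:

```python
import os
import paddle

# Save model and optimizer state at the end of an epoch (layout assumed).
ckpt_dir = os.path.join('checkpoint', 'epoch_{}'.format(epoch))
os.makedirs(ckpt_dir, exist_ok=True)
paddle.save(model.state_dict(), os.path.join(ckpt_dir, 'model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(ckpt_dir, 'model.pdopt'))
```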
# Quick Start of Audio Classification

Several shell scripts provided in `./examples/esc50/cls0` will help you quickly try out the major modules, including data preparation, model training, and model evaluation, with the [ESC50](https://github.com/karolpiczak/ESC-50) dataset.
Some of the scripts in `./examples` are not configured to use GPUs. If you want to train with 8 GPUs, please set `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`. If you don't have any GPU available, please set `CUDA_VISIBLE_DEVICES=` to use CPUs instead.

Let's start an audio classification task with the following steps:
- Go to the directory

```bash
cd examples/esc50/cls0
```

- Source the environment

```bash
source path.sh
```

- Run the main entry point

```bash
CUDA_VISIBLE_DEVICES=0 ./run.sh 1
```
This demo includes fine-tuning, evaluating, and deploying an audio classification model. More detailed information is provided in the following sections.

## Fine-tuning a model

PANNs ([PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition](https://arxiv.org/pdf/1912.10211.pdf)) are models pretrained on [AudioSet](https://research.google.com/audioset/). They can easily be used to extract audio embeddings for audio classification tasks.
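For instance, here is a minimal sketch of extracting an utterance-level embedding with the CNN14 backbone; the input shape `[batch, time, n_mels]` follows the customization tutorial above, and the mel-bin count of 64 is an assumption rather than the fine-tuning script's exact pipeline:

```python
import paddle
from paddlespeech.cls.models import cnn14

# Pretrained PANNs CNN14 backbone used as a fixed embedding extractor.
backbone = cnn14(pretrained=True, extract_embedding=True)
backbone.eval()

# Dummy log-mel features; shape [1, time, n_mels=64] is an assumption.
feats = paddle.randn([1, 100, 64])
with paddle.no_grad():
    embedding = backbone(feats)
print(embedding.shape)  # [1, embedding_dim]
```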
To start fine-tuning a model, please run:

```bash
ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
feat_backend=numpy
./local/train.sh ${ngpu} ${feat_backend}
```
## Deploying a model

Once you have saved a model checkpoint, you can export it to a static graph and deploy it with a Python script:

- Export to a static graph

```bash
./local/export.sh ${ckpt_dir} ./export
```
The argument `ckpt_dir` should be a directory in which a model checkpoint is stored, for example `checkpoint/epoch_50`.

The static graph will be exported to `./export`.
- Inference

```bash
./local/static_model_infer.sh ${infer_device} ./export ${audio_file}
```

The argument `infer_device` can be `cpu` or `gpu` and specifies the device used for inference. `audio_file` should be a wave file named `*.wav`.
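Under the hood, static-graph inference can be driven by the Paddle Inference Python API. A minimal sketch; the exported file names (`inference.pdmodel` / `inference.pdiparams`) and the dummy feature shape are assumptions, and a real deployment would compute log-mel features from `audio_file`:

```python
import numpy as np
import paddle.inference as paddle_infer

# Load the exported static graph (file names in ./export are assumed).
config = paddle_infer.Config('./export/inference.pdmodel',
                             './export/inference.pdiparams')
predictor = paddle_infer.create_predictor(config)

# Feed dummy log-mel features shaped [batch, time, n_mels] (assumed).
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(np.random.randn(1, 100, 64).astype('float32'))

predictor.run()
logits = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print(logits.shape)  # [batch, num_classes]
```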