Merge pull request #1178 from KPatr1ck/doc
[DOC] Add cls docs of training and custom dataset.
commit f5d038e7ab
@ -0,0 +1,117 @@

# Customize Dataset for Audio Classification

Following this tutorial, you can customize your dataset for the audio classification task by using `paddlespeech` and `paddleaudio`.

The base class for classification datasets is `paddleaudio.datasets.dataset.AudioClassificationDataset`. To customize your dataset, you should write a dataset class derived from `AudioClassificationDataset`.

Assume you have some wave files stored in your own directory. You should prepare a meta file containing the file paths and labels. For example, suppose its absolute path is `/PATH/TO/META_FILE.txt`:
```
/PATH/TO/WAVE_FILE/1.wav cat
/PATH/TO/WAVE_FILE/2.wav cat
/PATH/TO/WAVE_FILE/3.wav dog
/PATH/TO/WAVE_FILE/4.wav dog
```
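
If your wave files are grouped into one sub-directory per label, you can generate a meta file in this format with a few lines of plain Python. This is a minimal sketch assuming the hypothetical layout `/PATH/TO/WAVE_FILE/<label>/<name>.wav`; adapt the paths to your own data:

```python
import os

# Hypothetical root directory with one sub-directory per label.
data_dir = '/PATH/TO/WAVE_FILE'

with open('/PATH/TO/META_FILE.txt', 'w') as f:
    for label in sorted(os.listdir(data_dir)):
        label_dir = os.path.join(data_dir, label)
        if not os.path.isdir(label_dir):
            continue
        for name in sorted(os.listdir(label_dir)):
            if name.endswith('.wav'):
                f.write('{} {}\n'.format(os.path.join(label_dir, name), label))
```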

Here is an example of building your custom dataset in `custom_dataset.py`:

```python
from paddleaudio.datasets.dataset import AudioClassificationDataset


class CustomDataset(AudioClassificationDataset):
    # All *.wav files should share the same sample rate: 16k/24k/32k/44k.
    sample_rate = 16000
    meta_file = '/PATH/TO/META_FILE.txt'
    # List all the class labels.
    label_list = [
        'cat',
        'dog',
    ]

    def __init__(self):
        files, labels = self._get_data()
        super(CustomDataset, self).__init__(
            files=files, labels=labels, feat_type='raw')

    def _get_data(self):
        '''
        Return the wave file paths and their label indices from the meta file.
        '''
        files = []
        labels = []

        with open(self.meta_file) as f:
            for line in f:
                file, label_str = line.strip().split(' ')
                files.append(file)
                labels.append(self.label_list.index(label_str))

        return files, labels
```
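
As a quick sanity check, you can instantiate the dataset and inspect one record. The snippet below is an illustrative sketch: with `feat_type='raw'`, each item is expected to be a (waveform, label) pair, but the exact return type depends on the `paddleaudio` base class:

```python
from custom_dataset import CustomDataset

train_ds = CustomDataset()
print('Number of samples: {}'.format(len(train_ds)))

# Expected to be a (waveform, label) pair when feat_type='raw'.
waveform, label = train_ds[0]
print('First label: {}'.format(train_ds.label_list[label]))
```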

Then you can build a dataset and a data loader from `CustomDataset`:

```python
import paddle
from paddleaudio.features import LogMelSpectrogram

from custom_dataset import CustomDataset

train_ds = CustomDataset()
feature_extractor = LogMelSpectrogram(sr=train_ds.sample_rate)

train_sampler = paddle.io.DistributedBatchSampler(
    train_ds, batch_size=4, shuffle=True, drop_last=False)
train_loader = paddle.io.DataLoader(
    train_ds,
    batch_sampler=train_sampler,
    return_list=True,
    use_buffer_reader=True)
```
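
Note that the default batching simply stacks samples, which fails when the wave files in a batch have different lengths. One way to handle this is a custom `collate_fn` that zero-pads every waveform to the longest one in the batch. This is a sketch under the assumption that each sample is a (waveform, label) pair of 1-D numpy arrays:

```python
import numpy as np


def pad_collate(batch):
    # Zero-pad each waveform to the length of the longest one in the batch.
    max_len = max(len(waveform) for waveform, _ in batch)
    padded, labels = [], []
    for waveform, label in batch:
        padded.append(
            np.pad(waveform, (0, max_len - len(waveform)), mode='constant'))
        labels.append(label)
    return np.stack(padded), np.array(labels)


train_loader = paddle.io.DataLoader(
    train_ds,
    batch_sampler=train_sampler,
    collate_fn=pad_collate,
    return_list=True)
```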

Train a model with `CustomDataset`:

```python
from paddlespeech.cls.models import cnn14
from paddlespeech.cls.models import SoundClassifier

backbone = cnn14(pretrained=True, extract_embedding=True)
model = SoundClassifier(backbone, num_class=len(train_ds.label_list))
optimizer = paddle.optimizer.Adam(
    learning_rate=1e-6, parameters=model.parameters())
criterion = paddle.nn.loss.CrossEntropyLoss()

steps_per_epoch = len(train_sampler)
epochs = 10
for epoch in range(1, epochs + 1):
    model.train()

    for batch_idx, batch in enumerate(train_loader):
        waveforms, labels = batch
        # Padding is needed when the lengths of waveforms in a batch differ.
        feats = feature_extractor(waveforms)
        feats = paddle.transpose(feats, [0, 2, 1])
        logits = model(feats)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        if isinstance(optimizer._learning_rate,
                      paddle.optimizer.lr.LRScheduler):
            optimizer._learning_rate.step()
        optimizer.clear_grad()

        # Calculate loss
        avg_loss = loss.numpy()[0]

        # Calculate metrics
        preds = paddle.argmax(logits, axis=1)
        num_corrects = (preds == labels).numpy().sum()
        num_samples = feats.shape[0]

        avg_acc = num_corrects / num_samples

        print_msg = 'Epoch={}/{}, Step={}/{}'.format(
            epoch, epochs, batch_idx + 1, steps_per_epoch)
        print_msg += ' loss={:.4f}'.format(avg_loss)
        print_msg += ' acc={:.4f}'.format(avg_acc)
        print_msg += ' lr={:.6f}'.format(optimizer.get_lr())
        print(print_msg)
```

If you want to save model checkpoints and evaluate on a specific dataset, please see `paddlespeech/cli/exp/panns/train.py` for more details.
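
For reference, a minimal way to save a checkpoint at the end of each epoch is to call `paddle.save` directly; the directory layout below is just an assumption for illustration, not the convention of the referenced script:

```python
import os

# Assumed checkpoint layout: checkpoint/epoch_<N>/model.pdparams
ckpt_dir = os.path.join('checkpoint', 'epoch_{}'.format(epoch))
os.makedirs(ckpt_dir, exist_ok=True)
paddle.save(model.state_dict(), os.path.join(ckpt_dir, 'model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(ckpt_dir, 'model.pdopt'))
```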
@ -0,0 +1,51 @@

# Quick Start of Audio Classification

Several shell scripts provided in `./examples/esc50/cls0` will help you quickly try out the major modules, including data preparation, model training, and model evaluation, with the [ESC-50](https://github.com/karolpiczak/ESC-50) dataset.

Some of the scripts in `./examples` are not configured with GPUs. If you want to train with 8 GPUs, please set `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`. If you don't have any GPU available, please set `CUDA_VISIBLE_DEVICES=` to use CPUs instead.

Let's start an audio classification task with the following steps:

- Go to the directory

    ```bash
    cd examples/esc50/cls0
    ```

- Source env

    ```bash
    source path.sh
    ```

- Main entry point

    ```bash
    CUDA_VISIBLE_DEVICES=0 ./run.sh 1
    ```

This demo includes fine-tuning, evaluating, and deploying an audio classification model. More detailed information is provided in the following sections.

## Fine-tuning a model

PANNs ([PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition](https://arxiv.org/pdf/1912.10211.pdf)) are models pretrained on [AudioSet](https://research.google.com/audioset/). They can easily be used to extract audio embeddings for the audio classification task.

To start fine-tuning a model, please run:

```bash
ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
feat_backend=numpy
./local/train.sh ${ngpu} ${feat_backend}
```

## Deploy a model

Once you have saved a model checkpoint, you can export it to a static graph and deploy it with Python scripts:

- Export to a static graph

    ```bash
    ./local/export.sh ${ckpt_dir} ./export
    ```

    The argument `ckpt_dir` should be a directory in which a model checkpoint is stored, for example `checkpoint/epoch_50`.

    The static graph will be exported to `./export`.

- Inference

    ```bash
    ./local/static_model_infer.sh ${infer_device} ./export ${audio_file}
    ```

    The argument `infer_device` can be `cpu` or `gpu`, specifying the device used for inference. `audio_file` should be a wave file ending in `.wav`.
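
If you prefer to call the exported static graph directly from Python instead of going through the shell script, the `paddle.inference` API can load it. This is a minimal sketch; the exported file names (`inference.pdmodel` and `inference.pdiparams`) and the input shape are assumptions, so check the actual files under `./export` and the preprocessing used in the infer script:

```python
import numpy as np
from paddle.inference import Config, create_predictor

# Assumed file names of the exported static graph under ./export.
config = Config('./export/inference.pdmodel', './export/inference.pdiparams')
predictor = create_predictor(config)

input_handle = predictor.get_input_handle(predictor.get_input_names()[0])

# A placeholder feature batch; real inputs should be the same features
# (e.g. log-mel) the model was trained with.
feats = np.random.rand(1, 100, 64).astype('float32')
input_handle.copy_from_cpu(feats)

predictor.run()

output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
logits = output_handle.copy_to_cpu()
print('Predicted class index: {}'.format(np.argmax(logits, axis=1)))
```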