
([简体中文](./README_cn.md)|English)

# Audio Tagging

## Introduction
Audio tagging is the task of labeling an audio clip with one or more labels or tags, including music tagging, acoustic scene classification, audio event classification, etc.

This demo tags an audio file with the 527 [AudioSet](https://research.google.com/audioset/) labels. It can be run with a single command or a few lines of Python using `PaddleSpeech`.

## Usage
### 1. Installation
See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

You can install paddlespeech using any one of the three methods described there: easy, medium, or hard.

### 2. Prepare Input File
The input of this demo should be a WAV file (`.wav`).

Sample files for this demo can be downloaded with:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/dog.wav
```
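
Before running the demo, it can help to confirm that a downloaded file is a readable WAV. The following is a minimal sketch that uses only Python's standard `wave` module; the helper name `describe_wav` is hypothetical and not part of PaddleSpeech, and it assumes the file is uncompressed PCM, which is the only format `wave` reads.

```python
import wave


def describe_wav(path):
    """Return basic header information for a PCM WAV file.

    Hypothetical helper for illustration; not part of PaddleSpeech.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration in seconds, derived from frame count and sample rate.
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav('./cat.wav')` after the download returns a dictionary with the channel count, sample rate, sample width, and duration. A `wave.Error` usually means the file is not a PCM WAV.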

### 3. Usage
- Command Line (Recommended)
  ```bash
  paddlespeech cls --input ./cat.wav --topk 10
  ```
  Usage:
  ```bash
  paddlespeech cls --help
  ```
  Arguments:
  - `input` (required): The audio file to tag.
  - `model`: Model type of the tagging task. Default: `panns_cnn14`.
  - `config`: Config file of the tagging task. If `None`, the pretrained model's config is used. Default: `None`.
  - `ckpt_path`: Model checkpoint. If `None`, the pretrained model is used. Default: `None`.
  - `label_file`: Label file of the tagging task. If `None`, the AudioSet labels are used. Default: `None`.
  - `topk`: Number of top-scoring labels to show in the result. Default: `1`.
  - `device`: Device on which to run model inference. Default: the default PaddlePaddle device in the current environment.

  Output:
  ```bash
  [2021-12-08 14:49:40,671] [    INFO] [utils.py] [L225] - CLS Result:
  Cat: 0.8991316556930542
  Domestic animals, pets: 0.8806838393211365
  Meow: 0.8784668445587158
  Animal: 0.8776564598083496
  Caterwaul: 0.2232048511505127
  Speech: 0.03101264126598835
  Music: 0.02870696596801281
  Inside, small room: 0.016673989593982697
  Purr: 0.008387474343180656
  Bird: 0.006304860580712557
  ```

- Python API
  ```python
  import paddle
  from paddlespeech.cli.cls import CLSExecutor

  cls_executor = CLSExecutor()
  result = cls_executor(
      model='panns_cnn14',
      config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
      label_file=None,
      ckpt_path=None,
      audio_file='./cat.wav',
      topk=10,
      device=paddle.get_device())
  print('CLS Result: \n{}'.format(result))
  ```
  Output:
  ```bash
  CLS Result:
  Cat: 0.8991316556930542
  Domestic animals, pets: 0.8806838393211365
  Meow: 0.8784668445587158
  Animal: 0.8776564598083496
  Caterwaul: 0.2232048511505127
  Speech: 0.03101264126598835
  Music: 0.02870696596801281
  Inside, small room: 0.016673989593982697
  Purr: 0.008387474343180656
  Bird: 0.006304860580712557
  ```
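
The scores in the output above are easier to work with as numbers. The sketch below parses text in the `label: score` layout shown in the sample output and keeps the labels at or above a threshold. The function names `parse_cls_result` and `labels_above` are hypothetical, and the sketch assumes the text follows the printed layout, which may differ between PaddleSpeech versions.

```python
def parse_cls_result(text):
    """Parse lines of the form 'label: score' into (label, score) tuples."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if ": " not in line:
            continue  # blank lines and headers such as 'CLS Result:'
        # Split on the last ': ' so labels containing commas stay intact.
        label, score = line.rsplit(": ", 1)
        try:
            pairs.append((label, float(score)))
        except ValueError:
            continue  # the part after ': ' was not a number
    return pairs


def labels_above(pairs, threshold=0.5):
    """Return the labels whose score is at least `threshold`."""
    return [label for label, score in pairs if score >= threshold]


# Three lines copied from the sample output above.
sample = """Cat: 0.8991316556930542
Domestic animals, pets: 0.8806838393211365
Caterwaul: 0.2232048511505127"""

pairs = parse_cls_result(sample)
print(labels_above(pairs))  # ['Cat', 'Domestic animals, pets']
```

The threshold of `0.5` is an arbitrary choice for illustration. Audio tagging is multi-label, so several labels can pass the threshold for the same clip.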

### 4. Pretrained Models

The following pretrained models released by PaddleSpeech can be used from both the command line and the Python API:

| Model | Sample Rate (Hz) |
| :--- | :---: |
| panns_cnn6 | 32000 |
| panns_cnn10 | 32000 |
| panns_cnn14 | 32000 |
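
To compare the listed models on the same clip, one option is to call the executor once per model name. The following is a rough sketch: the helper `tag_with_all_models` is hypothetical, and it assumes the executor accepts the `model`, `audio_file`, and `topk` keyword arguments used in the Python API example and falls back to its defaults for the remaining arguments.

```python
# Model names taken from the table above.
PANNS_MODELS = ("panns_cnn6", "panns_cnn10", "panns_cnn14")


def tag_with_all_models(executor, audio_file, topk=3, models=PANNS_MODELS):
    """Call `executor` once per model and collect the results by model name.

    `executor` is any callable that accepts `model`, `audio_file` and `topk`
    keyword arguments (assumed here to be a `CLSExecutor` instance).
    """
    results = {}
    for model in models:
        results[model] = executor(model=model, audio_file=audio_file, topk=topk)
    return results
```

With PaddleSpeech installed, `tag_with_all_models(CLSExecutor(), './cat.wav')` would return a dictionary keyed by model name. Each pretrained model is downloaded the first time it is used, so the first run can take a while.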