
Audio Tagging

Introduction

Audio tagging is the task of labelling an audio clip with one or more labels or tags. It covers tasks such as music tagging, acoustic scene classification and audio event classification.

This demo tags an audio file with the 527 labels of AudioSet. It can be run with a single command or with a few lines of Python using PaddleSpeech.
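Because one clip can carry several tags at once, multi-label taggers usually score each label independently instead of normalising over all labels with a softmax. This is why the scores in the outputs further down do not sum to 1. The sketch below illustrates the idea in plain Python. It assumes per-class sigmoid scores, as used in PANNs-style models; the tag_clip helper and the example logits are hypothetical and not part of PaddleSpeech.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tag_clip(logits: Dict[str, float], topk: int = 1) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every label independently and keep the top-k.

    Unlike softmax, sigmoid scores are not normalised across labels,
    so several related labels (e.g. "Cat" and "Animal") can all score highly.
    """
    scores = {label: sigmoid(logit) for label, logit in logits.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:topk]


if __name__ == "__main__":
    # Made-up logits, for illustration only.
    example_logits = {"Cat": 2.2, "Animal": 2.0, "Speech": -3.4, "Music": -3.5}
    for label, score in tag_clip(example_logits, topk=3):
        print(f"{label}: {score:.4f}")
```

Ranking the independent scores and cutting the list at k is the behaviour the topk argument exposes, although PaddleSpeech's internals may differ in detail.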

Usage

1. Installation

pip install paddlespeech

2. Prepare Input File

The input to this demo should be a WAV file (.wav).
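Before tagging a file, it can help to confirm that it really is a WAV file and to check its format. The sketch below uses only Python's standard wave module, which reads PCM WAV headers. describe_wav and WavInfo are hypothetical names, and comparing the result against 32000 Hz assumes the sample rate listed for the pretrained models in section 4. Whether PaddleSpeech resamples other rates automatically is not stated here, so treat a mismatch as something to verify rather than as an error.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    sample_rate: int         # frames per second (Hz)
    channels: int            # 1 = mono, 2 = stereo
    sample_width_bytes: int  # bytes per sample, e.g. 2 for 16-bit PCM
    duration_s: float        # length of the clip in seconds


def describe_wav(path: str) -> WavInfo:
    """Read the header of a PCM WAV file (raises wave.Error if it is not one)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return WavInfo(
            sample_rate=rate,
            channels=wav.getnchannels(),
            sample_width_bytes=wav.getsampwidth(),
            duration_s=frames / rate,
        )
```

For example, describe_wav("./cat.wav") on one of the sample files below reports its sample rate, channel count and duration.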

Sample files for this demo can be downloaded with:

wget https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/dog.wav
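If wget is not available, the same two sample files can also be fetched from Python. This is a minimal sketch using the standard library's urllib.request.urlretrieve; download_samples and SAMPLE_URLS are hypothetical names, and the fetch parameter exists only so the function can be exercised without network access.

```python
import os
import urllib.request
from typing import Callable, Iterable, List

SAMPLE_URLS = [
    "https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav",
    "https://paddlespeech.bj.bcebos.com/PaddleAudio/dog.wav",
]


def download_samples(
    urls: Iterable[str] = SAMPLE_URLS,
    dest_dir: str = ".",
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> List[str]:
    """Download each URL into dest_dir, skipping files that already exist."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in urls:
        # Use the last path component of the URL (e.g. "cat.wav") as the filename.
        path = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        if not os.path.exists(path):
            fetch(url, path)
        paths.append(path)
    return paths
```

Calling download_samples() with no arguments saves cat.wav and dog.wav in the current directory, which matches the ./cat.wav path used in the Python API example.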

3. Usage

  • Command Line (Recommended)

    paddlespeech cls --input ./cat.wav --topk 10
    

    Usage:

    paddlespeech cls --help
    

    Arguments:

    • input (required): Audio file to tag.
    • model: Model type of the tagging task. Default: panns_cnn14.
    • config: Config of the tagging task. The pretrained model is used when it is None. Default: None.
    • ckpt_path: Model checkpoint. The pretrained model is used when it is None. Default: None.
    • label_file: Label file of the tagging task. The AudioSet labels are used when it is None. Default: None.
    • topk: Number of top-scoring labels to show in the result. Default: 1.
    • device: Device on which to run model inference. Default: the default device of PaddlePaddle in the current environment.

    Output:

    [2021-12-08 14:49:40,671] [    INFO] [utils.py] [L225] - CLS Result:
    Cat: 0.8991316556930542
    Domestic animals, pets: 0.8806838393211365
    Meow: 0.8784668445587158
    Animal: 0.8776564598083496
    Caterwaul: 0.2232048511505127
    Speech: 0.03101264126598835
    Music: 0.02870696596801281
    Inside, small room: 0.016673989593982697
    Purr: 0.008387474343180656
    Bird: 0.006304860580712557
    
  • Python API

    import paddle
    from paddlespeech.cli import CLSExecutor
    
    cls_executor = CLSExecutor()
    result = cls_executor(
        model='panns_cnn14',
        config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
        label_file=None,
        ckpt_path=None,
        audio_file='./cat.wav',
        topk=10,
        device=paddle.get_device())
    print('CLS Result: \n{}'.format(result))
    

    Output:

    CLS Result:
    Cat: 0.8991316556930542
    Domestic animals, pets: 0.8806838393211365
    Meow: 0.8784668445587158
    Animal: 0.8776564598083496
    Caterwaul: 0.2232048511505127
    Speech: 0.03101264126598835
    Music: 0.02870696596801281
    Inside, small room: 0.016673989593982697
    Purr: 0.008387474343180656
    Bird: 0.006304860580712557
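The printed result is plain text with one "label: score" line per tag. To use it programmatically, it can be parsed into (label, score) pairs. The sketch below assumes the result has exactly the format shown in the outputs above; parse_cls_result and labels_above are hypothetical helpers, not part of PaddleSpeech. Labels can contain punctuation (for example "Domestic animals, pets"), so each line is split on its last colon, and header lines ending in a colon are skipped.

```python
from typing import List, Tuple


def parse_cls_result(text: str) -> List[Tuple[str, float]]:
    """Parse 'label: score' lines into (label, score) pairs, keeping their order."""
    pairs = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.endswith(":"):
            continue  # skip blank lines and headers such as "CLS Result:"
        # Split on the last colon so labels containing punctuation stay intact.
        label, score = line.rsplit(":", 1)
        pairs.append((label.strip(), float(score)))
    return pairs


def labels_above(pairs: List[Tuple[str, float]], threshold: float) -> List[str]:
    """Keep only labels whose score reaches the given (user-chosen) threshold."""
    return [label for label, score in pairs if score >= threshold]
```

Since the scores are independent per label, a fixed threshold is one simple way to turn them into a set of tags; the threshold value is a choice for the user, not something the demo prescribes.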
    

4. Pretrained Models

The following pretrained models released by PaddleSpeech can be used with both the command line and the Python API:

Model          Sample Rate (Hz)
panns_cnn6     32000
panns_cnn10    32000
panns_cnn14    32000
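To compare the pretrained models above on the same clip, one option is to run the documented command once per model. The sketch below builds argument lists for Python's subprocess module using only the flags listed under Arguments (--input, --model, --topk, --device). build_cls_command, compare_models and PRETRAINED_MODELS are hypothetical names, and running the commands assumes the paddlespeech executable is on PATH.

```python
import subprocess
from typing import Dict, List, Optional

# Model names and sample rates (Hz) copied from the table above.
PRETRAINED_MODELS = {"panns_cnn6": 32000, "panns_cnn10": 32000, "panns_cnn14": 32000}


def build_cls_command(audio_path: str, model: str = "panns_cnn14", topk: int = 1,
                      device: Optional[str] = None) -> List[str]:
    """Build a `paddlespeech cls` argument list using only the documented flags."""
    if model not in PRETRAINED_MODELS:
        raise ValueError(f"unknown pretrained model: {model}")
    cmd = ["paddlespeech", "cls", "--input", audio_path,
           "--model", model, "--topk", str(topk)]
    if device is not None:
        cmd += ["--device", device]  # otherwise PaddlePaddle's default device is used
    return cmd


def compare_models(audio_path: str, topk: int = 5) -> Dict[str, str]:
    """Run the CLI once per pretrained model and collect its text output."""
    outputs = {}
    for model in PRETRAINED_MODELS:
        proc = subprocess.run(build_cls_command(audio_path, model, topk),
                              capture_output=True, text=True, check=True)
        # The result is printed through a logger, so keep both streams to be safe.
        outputs[model] = proc.stdout + proc.stderr
    return outputs
```

For example, compare_models("./cat.wav", topk=5) returns one block of "label: score" text per model, which makes differences between panns_cnn6, panns_cnn10 and panns_cnn14 easy to inspect side by side.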