All the scripts you need are in `run.sh`. There are several stages in `run.sh`:
| 3 | Test the final model performance |
| 4 | Get CTC alignment of test data using the final model |
| 5 | Infer a single audio file |
| 51 | Export the final model to a JIT (Just-In-Time) static graph format for deployment |
You can choose to run a range of stages by setting the `stage` and `stop_stage` parameters.
For example, if you only want to run `stage 0`, you can use the script below:
```bash
bash run.sh --stage 0 --stop_stage 0
```
The script `run.sh` utilizes configuration files, GPU resources, and local scripts to perform the tasks outlined in each stage. Specifically:
- Configuration files are loaded from `conf/transformer.yaml` and `conf/tuning/decode.yaml`.
- GPU devices are specified via the `gpus` variable (e.g., `gpus=0,1,2,3`).
- Local scripts (e.g., `data.sh`, `train.sh`, `avg.sh`, `test.sh`, `align.sh`, `test_wav.sh`, and `export.sh`) handle the respective tasks for data preparation, model training, averaging, testing, alignment, single audio file inference, and model export.
The document below describes the scripts in `run.sh` in detail.
## The Environment Variables
The `path.sh` file contains the essential environment variables required for the scripts to run correctly.
```bash
. ./path.sh || exit 1;
. ./cmd.sh || exit 1;
```
These scripts need to be sourced first to ensure that all necessary paths and environment variables are set up properly.
Additionally, another script is also required:
```bash
. ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
```
This script enables the use of `--variable value` options in the shell scripts, allowing flexible configuration without modifying the scripts directly.
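As a minimal sketch of the usage pattern (the script name and variable values here are illustrative): defaults must be declared before `parse_options.sh` is sourced, because it only overrides variables that already exist.

```bash
#!/bin/bash
. ./path.sh || exit 1;   # sets MAIN_ROOT, among other things

# Defaults must exist before parse_options.sh is sourced;
# it only overrides variables that are already defined.
stage=0
stop_stage=50
avg_num=30

. ${MAIN_ROOT}/utils/parse_options.sh || exit 1;

echo "stage=${stage} stop_stage=${stop_stage} avg_num=${avg_num}"
# Running: bash my_script.sh --stage 1 --avg_num 20
# prints:  stage=1 stop_stage=50 avg_num=20
```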
The environment variables set in `path.sh` and `cmd.sh` include paths to directories, executable files, and other configuration settings; for example, `MAIN_ROOT` is typically set in `path.sh` to point to the main directory of the project. The `run.sh` script also uses several variables that are set either directly in the script or through command-line arguments parsed by `parse_options.sh`. The key variables used throughout the scripts are:
- `MAIN_ROOT`: The root directory of the project.
- `gpus`: A comma-separated list of GPU IDs to use for training and inference.
- `stage` and `stop_stage`: The range of stages to run. For instance, setting `stage=1` and `stop_stage=3` runs only stages 1, 2, and 3, which is useful for partial execution during debugging or testing.
- `conf_path`: The path to the configuration file for the model.
- `decode_conf_path`: The path to the decoding configuration file.
- `avg_num`: The number of best models to average to obtain the final model.
- `audio_file`: The path to an audio file for single-file inference.
- `ips`: A comma-separated list of IP addresses, used for distributed training.
These variables control the behavior of the different stages, such as data preparation, model training, model averaging, testing, alignment, and model export. Here is an example of how they are set at the top of `run.sh`:

```bash
#!/bin/bash
set -e
. ./path.sh || exit 1;
. ./cmd.sh || exit 1;
gpus=0,1,2,3
stage=0
stop_stage=50
conf_path=conf/transformer.yaml
ips=            #xx.xx.xx.xx,xx.xx.xx.xx (fill with actual IP addresses for multi-machine training)
decode_conf_path=conf/tuning/decode.yaml
avg_num=30
audio_file=data/demo_002_en.wav
```
By setting these environment variables appropriately, users can customize the behavior of the script to suit their specific needs and resources.
Ensure that `path.sh` and `cmd.sh` are sourced correctly at the beginning of your scripts to avoid errors related to missing environment variables.
## The Local Variables
Some local variables are set in the `run.sh` script and can be configured to customize the execution of the experiment. Here is a detailed explanation of each variable:
`gpus` denotes the GPU number(s) you want to use for training or inference. If you set `gpus=` (an empty value), only the CPU is used. Multiple GPUs can be specified with a comma-separated list, e.g., `0,1,2,3`.

`stage` denotes the number of the stage you want to start from in the experiments. This allows you to skip stages, such as data preparation, that have already been completed.

`stop_stage` denotes the number of the stage you want to end at in the experiments. This is useful when you only want to run a subset of the stages.

`conf_path` denotes the path to the configuration file of the model. This YAML file contains all the necessary parameters and settings for the model.

`avg_num` denotes the number K of top-K models you want to average to get the final model. Averaging multiple checkpoints can improve the robustness and performance of the final model.

`decode_conf_path` denotes the path to the decoding configuration file, which contains settings used during testing, such as beam size and language model weight.

`audio_file` denotes the file path of the single audio file you want to infer in stage 5. This is useful for testing the model on a specific audio sample.

`ips` (optional) specifies the IP addresses of multiple machines for distributed training. It is not needed for single-machine setups and is left empty by default.

`ckpt` denotes the checkpoint prefix of the model, e.g., "conformer". This is the name of the directory under which the model checkpoints are saved. Note that you cannot set this variable on the command line; it is derived from `conf_path`.
You can set the local variables (except `ckpt`, which is derived from `conf_path`) when you use the `run.sh` script via command-line options. For example, you can set `gpus` and `avg_num` with the following command:

```bash
bash run.sh --gpus 0,1 --avg_num 20
```
The script uses the `parse_options.sh` utility to parse these command-line options and update the corresponding variables. It then executes the stages selected by `stage` and `stop_stage`: preparing data, training the model, averaging the best checkpoints, testing the averaged model, aligning the test data, testing a single `.wav` file, and exporting the model for inference. Each stage is conditionally executed based on these two variables.
## Stage 0: Data Processing
To use this example, you need to process the data first. Stage 0 in the `run.sh` script handles this task. The relevant code snippet is shown below:
```bash
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    # prepare data
    bash ./local/data.sh || exit -1
fi
```
Stage 0 prepares all the datasets and metadata required for the subsequent training and evaluation stages.
If you only want to process the data without proceeding to the other stages, you can run the following command:
```bash
bash run.sh --stage 0 --stop_stage 0
```
Alternatively, you can run the data processing script manually in the command line. Make sure you source the environment setup scripts first:
```bash
. ./path.sh
. ./cmd.sh
bash ./local/data.sh
```
After successfully processing the data, the `data` directory will be populated with the following structure:
```bash
data/
|-- ...    # manifest files, vocabulary, mean-std normalization parameters, etc.
`-- train.meta
```
This directory structure contains manifests, metadata files, vocabulary files, and mean-std normalization parameters necessary for the training and evaluation of your model. Each file and directory serves a specific purpose and is essential for the pipeline to function correctly.
## Stage 1: Model Training
If you want to train the model, you can use stage 1 in the `run.sh` script. The relevant code segment is shown below:
```bash
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    # train model, all checkpoints are saved under the `exp` dir
    CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt} ${ips}
fi
```
To run the training step manually, set `CUDA_VISIBLE_DEVICES` to the GPU devices you want to use (e.g., `0,1,2,3` for multiple GPUs or `0` for a single GPU):
```bash
CUDA_VISIBLE_DEVICES=0 ./local/train.sh ${conf_path} ${ckpt} ${ips}   # single GPU
# or
CUDA_VISIBLE_DEVICES= ./local/train.sh ${conf_path} ${ckpt} ${ips}    # CPU only (much slower)
```
Replace `${conf_path}`, `${ckpt}`, and `${ips}` with your actual configuration file path, checkpoint name, and the IP addresses of the machines for distributed training, respectively. If you are running on a single machine, leave the `ips` variable empty.

By running the above commands, the script prepares the data (if stage 0 is included) and then trains the model (stage 1). The trained checkpoints are saved under the `exp` directory.
## Stage 2: Top-k Models Averaging
After training the model, we need to get the final model for testing and inference. A model checkpoint is saved at every epoch, so we can either pick the best checkpoint based on validation loss or sort the checkpoints and average the parameters of the top-k models. Averaging the top-k models (model averaging) reduces variance and usually yields a more robust and generalizable final model.

We use stage 2 to perform this model averaging. The relevant code snippet is shown below:
```bash
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
    # avg n best model
    avg.sh best exp/${ckpt}/checkpoints ${avg_num}
fi
```
Here, `avg.sh` is a script located in the `../../../utils/` directory, which is made available through `path.sh`. It averages the parameters of the top-k checkpoints. The `${ckpt}` variable is the basename of the configuration file (without the extension), and `${avg_num}` specifies the number of top models to average.

To execute this stage along with the previous stages (stage 0 for data preparation and stage 1 for model training), you can use the following command:
```bash
bash run.sh --stage 0 --stop_stage 2
```
Alternatively, you can run the individual scripts step-by-step in the command line (using only the CPU if desired).
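A minimal sketch of that sequence, using the example values described below (adjust the paths to your setup):

```bash
. ./path.sh
. ./cmd.sh
bash ./local/data.sh
CUDA_VISIBLE_DEVICES= ./local/train.sh conf/transformer.yaml transformer
avg.sh best exp/transformer/checkpoints 30
```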
In this example, `conf/transformer.yaml` is the configuration file for the transformer model, `transformer` is the basename used for the checkpoints, and `30` is the number of top models to average.
This will train the model, save the checkpoints, and then average the top-k models specified by `avg_num`. The averaged model will be used in subsequent stages for testing and inference.
Make sure to adjust the `conf_path`, `ckpt`, `gpus`, and `avg_num` variables in `run.sh` according to your specific setup and requirements. The resulting averaged checkpoint is referenced through the `avg_ckpt` variable and is used in the subsequent stages for testing and inference.

## Stage 3: Model Testing
After averaging the top-k models, we proceed to the testing stage to evaluate the performance of the final averaged model on unseen data. This stage uses the averaged checkpoint (`avg_n`) obtained from the previous stage. The code snippet responsible for this stage is provided below:
```bash
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    # test ckpt avg_n
    CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi
```
Here's a breakdown of the script:
- `CUDA_VISIBLE_DEVICES=0` specifies that only the first GPU (index 0) should be used for testing.
- `./local/test.sh` is the script that performs the testing.
- `${conf_path}` is the path to the configuration file that defines the model architecture and other training parameters.
- `${decode_conf_path}` is the path to the decoding configuration file, which includes parameters such as beam size and language model weight.
- `exp/${ckpt}/checkpoints/${avg_ckpt}` is the location of the averaged checkpoint to be tested.

If you want to train a model from scratch, average the top-k models, and test the final averaged model, you can use the script below to execute stages 0 through 3:
```bash
bash run.sh --stage 0 --stop_stage 3
```
Alternatively, you can run the relevant scripts manually in the command line. Below is an example using only the CPU (by leaving `CUDA_VISIBLE_DEVICES` empty):
```bash
. ./path.sh
. ./cmd.sh
# Assuming `conf_path`, `decode_conf_path`, `ckpt`, `avg_num`, and the other variables are set correctly
bash ./local/data.sh
CUDA_VISIBLE_DEVICES= ./local/train.sh ${conf_path} ${ckpt} ${ips}
avg.sh best exp/${ckpt}/checkpoints ${avg_num}
CUDA_VISIBLE_DEVICES= ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/avg_${avg_num}
```
Remember to customize the paths and parameters according to your specific setup and model configuration. This sequence prepares the data, trains the model, averages the top-k checkpoints, and finally tests the averaged model with `test.sh`. The test stage reports performance metrics such as word error rate (WER) or character error rate (CER), depending on your evaluation setup.
## Pretrained Model
You can get the pretrained transformer or conformer models from [this](../../../docs/source/released_model.md) page.

Once you have downloaded a model, you can use the `tar` command to unpack it. After unpacking, you can use the scripts described above to train, average the best checkpoints, test the model, align the test data, test a single `.wav` file, and export the model for inference.

Here's a step-by-step guide to using these scripts:
Replace `${ckpt_name}` with an appropriate name for your checkpoint and `${ips}` with the IP addresses of the nodes (if you're using multiple nodes for distributed training).
### Averaging the Best Models
After training, you can average the best `n` models to improve performance:
```bash
stage=2
stop_stage=2
avg_num=30
. ${MAIN_ROOT}/utils/parse_options.sh
avg.sh best exp/${ckpt_name}/checkpoints ${avg_num}
```
### Testing the Model
You can test the averaged model using the same `test.sh` script as in Stage 3.
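A sketch of that command, assuming the averaged checkpoint is `avg_30` under `exp/${ckpt_name}/checkpoints` (adjust the paths and checkpoint name to your model):

```bash
CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt_name}/checkpoints/avg_30
```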
The performance of the released models is shown in [this](./RESULTS.md) document.
## Stage 4: CTC Alignment
This stage obtains the alignment between the audio and the text using Connectionist Temporal Classification (CTC). CTC alignment shows how the model maps audio frames to output tokens without requiring a pre-specified alignment. The code for this stage is shown below:
```bash
if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
    # ctc alignment of test data
    CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi
```
To perform CTC alignment, you need a trained model and its averaged checkpoint. The script above uses the specified configuration file (`conf_path`), decoding configuration (`decode_conf_path`), and the averaged checkpoint (`avg_ckpt`) located in the `exp/${ckpt}/checkpoints/` directory. `CUDA_VISIBLE_DEVICES=0` runs the alignment on the first GPU; if you do not have a GPU or want to use a different one, adjust this setting accordingly.
If you want to train the model, test it, and perform the alignment in sequence, you can use the following command to execute stages 0 through 4:
```bash
bash run.sh --stage 0 --stop_stage 4
```
Alternatively, if you have already trained and averaged your model, you can run only the alignment stage:
```bash
bash run.sh --stage 4 --stop_stage 4
```
Or, you can manually run the necessary scripts in the command line. Below is an example using only the CPU:
```bash
# Load the necessary environment variables
. ./path.sh
. ./cmd.sh
bash ./local/data.sh
# Assuming you have already trained and averaged your model
# (the train.sh and avg.sh commands would have been run previously)
# and that `conf_path` and `decode_conf_path` are set correctly
CUDA_VISIBLE_DEVICES= ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}
```
Make sure to set `conf_path` (e.g., `conf/transformer.yaml`) and the other paths to the actual configuration and checkpoint paths you are using. The `decode_conf_path` is also important, as it contains decoding parameters that can affect the alignment result.
Note that the `align.sh` script generates alignment information for your test data, which can be useful for visualization, error analysis, and more.

If you encounter any issues during this stage, ensure that all dependencies are correctly installed and that the paths to the configuration files and checkpoints are correct. Additionally, check the logs for any error messages that might provide insight into the problem.

## Stage 51: Model Export
This stage exports the trained model to a JIT (Just-In-Time) compiled static graph, which is optimized for deployment. The JIT format allows for faster and more efficient model loading and execution. The code for this stage is shown below:
```bash
if [ ${stage} -le 51 ] && [ ${stop_stage} -ge 51 ]; then
    # export ckpt avg_n
    ./local/export.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} exp/${ckpt}/checkpoints/${avg_ckpt}.jit
fi
```
If you have successfully trained and averaged your model in the previous stages, you can export it to the JIT format by running the script above. The script takes the configuration file path, the path to the averaged checkpoint, and the desired output path for the JIT model as arguments.

Here is an example command to export a model (a sketch; adjust the paths to your setup):
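```bash
./local/export.sh conf/transformer.yaml exp/transformer/checkpoints/avg_30 exp/transformer/checkpoints/avg_30.jit
```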
In this example, `conf/transformer.yaml` is the configuration file used for training, `exp/transformer/checkpoints/avg_30` is the path to the averaged checkpoint, and `exp/transformer/checkpoints/avg_30.jit` is the desired output path for the JIT model.
Make sure to adjust the paths and filenames according to your specific setup. Once the export process is complete, you will have a JIT model ready for inference.
## Stage 5: Single Audio File Inference
In some situations, you may want to use the trained model to perform inference on a single audio file. You can use stage 5 for this purpose. The relevant code snippet is shown below:
```bash
if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
    # test a single .wav file
    CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
fi
```
Before running this stage, you can train the model yourself using the following command:
```bash
bash run.sh --stage 0 --stop_stage 3
```
Alternatively, you can download a pretrained model. The configuration file and model checkpoint vary depending on the specific experiment; for this example, let's assume you are using a Conformer model trained on Librispeech. Make sure to use the download URL that matches the specific model you want.
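As a sketch of the download-and-unpack step (the URL and filename are placeholders; copy the real link from the released model page):

```bash
# Placeholder URL: use the Librispeech Conformer link from released_model.md
wget <pretrained_conformer_model_url>
tar -xzvf <downloaded_model>.tar.gz
```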
Please ensure that your audio file, whether it is the provided demo or your own recording, has a sample rate of 16 kHz. To run inference on the demo audio file, you can use the following command:
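A sketch of that command; the Conformer configuration path `conf/conformer.yaml` is an assumption, so use the configuration file shipped with your model:

```bash
CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh conf/conformer.yaml conf/tuning/decode.yaml exp/conformer/checkpoints/avg_20 data/demo_002_en.wav
```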
- `exp/conformer/checkpoints/avg_20` is the path to the averaged checkpoint of the trained model.
- `data/demo_002_en.wav` is the path to the audio file you want to transcribe.
Make sure to adjust the paths and filenames according to your specific setup. If the inference runs successfully, you should see the transcription result of the audio file printed in the console or saved to a file (depending on how the `test_wav.sh` script is implemented).
To run single audio file inference as part of the full pipeline, you can use the script below to execute all stages up to and including stage 5:
```bash
bash run.sh --stage 0 --stop_stage 5
```
Alternatively, you can manually run these scripts in the command line (using only the CPU for inference if desired). Below is an example of the full sequence of commands.
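A minimal sketch of that sequence, assuming the variables below are set as in `run.sh`:

```bash
. ./path.sh
. ./cmd.sh
bash ./local/data.sh
CUDA_VISIBLE_DEVICES= ./local/train.sh ${conf_path} ${ckpt} ${ips}
avg.sh best exp/${ckpt}/checkpoints ${avg_num}
CUDA_VISIBLE_DEVICES= ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/avg_${avg_num}
CUDA_VISIBLE_DEVICES= ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/avg_${avg_num} ${audio_file}
```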
Remember to replace `${conf_path}`, `${ckpt}`, `${ips}`, `${decode_conf_path}`, `${avg_num}`, and `${audio_file}` with your actual configuration file path, checkpoint name, IP addresses, decode configuration file path, average number of models, and the path to your audio file, respectively.
Adjust these paths according to your specific setup and the model you are using.