diff --git a/.github/CODE_OF_CONDUCT.md b/.github/CODE_OF_CONDUCT.md
deleted file mode 100644
index 33d53d9f5..000000000
--- a/.github/CODE_OF_CONDUCT.md
+++ /dev/null
@@ -1,77 +0,0 @@
-# Contributor Covenant Code of Conduct
-
-## Our Pledge
-
-In the interest of fostering an open and welcoming environment, we as
-contributors and maintainers pledge to making participation in our project and
-our community a harassment-free experience for everyone, regardless of age, body
-size, disability, ethnicity, sex characteristics, gender identity and expression,
-level of experience, education, socio-economic status, nationality, personal
-appearance, race, religion, or sexual identity and orientation.
-
-## Our Standards
-
-Examples of behavior that contributes to creating a positive environment
-include:
-
-* Using welcoming and inclusive language
-* Being respectful of differing viewpoints and experiences
-* Gracefully accepting constructive criticism
-* Focusing on what is best for the community
-* Showing empathy towards other community members
-
-Examples of unacceptable behavior by participants include:
-
-* The use of sexualized language or imagery and unwelcome sexual attention or
- advances
-* Racial or political allusions
-* Trolling, insulting/derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or electronic
- address, without explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
- professional setting
-
-## Our Responsibilities
-
-Project maintainers are responsible for clarifying the standards of acceptable
-behavior and are expected to take appropriate and fair corrective action in
-response to any instances of unacceptable behavior.
-
-Project maintainers have the right and responsibility to remove, edit, or
-reject comments, commits, code, wiki edits, issues, and other contributions
-that are not aligned to this Code of Conduct, or to ban temporarily or
-permanently any contributor for other behaviors that they deem inappropriate,
-threatening, offensive, or harmful.
-
-## Scope
-
-This Code of Conduct applies both within project spaces and in public spaces
-when an individual is representing the project or its community. Examples of
-representing a project or community include using an official project e-mail
-address, posting via an official social media account, or acting as an appointed
-representative at an online or offline event. Representation of a project may be
-further defined and clarified by project maintainers.
-
-## Enforcement
-
-Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported by contacting the project team at paddlespeech@baidu.com. All
-complaints will be reviewed and investigated and will result in a response that
-is deemed necessary and appropriate to the circumstances. The project team is
-obligated to maintain confidentiality with regard to the reporter of an incident.
-Further details of specific enforcement policies may be posted separately.
-
-Project maintainers who do not follow or enforce the Code of Conduct in good
-faith may face temporary or permanent repercussions as determined by other
-members of the project's leadership.
-
-## Attribution
-
-This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
-available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
-
-[homepage]: https://www.contributor-covenant.org
-
-For answers to common questions about this code of conduct, see
-https://www.contributor-covenant.org/faq
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
deleted file mode 100644
index 1ff473308..000000000
--- a/.github/CONTRIBUTING.md
+++ /dev/null
@@ -1,30 +0,0 @@
-# 💡 paddlespeech 提交代码须知
-
-### Discussed in https://github.com/PaddlePaddle/PaddleSpeech/discussions/1326
-
-
-
-Originally posted by **yt605155624** January 12, 2022
-1. 写完代码之后可以用我们的 pre-commit 检查一下代码格式,注意只改自己修改的代码的格式即可,其他的代码有可能也被改了格式,不要 add 就好
-```
-pip install pre-commit
-pre-commit run --file 你修改的代码
-```
-2. 提交 commit 中增加必要信息跳过不必要的 CI
-- 提交 asr 相关代码
-```text
-git commit -m "xxxxxx, test=asr"
-```
-- 提交 tts 相关代码
-```text
-git commit -m "xxxxxx, test=tts"
-```
-- 仅修改文档
-```text
-git commit -m "xxxxxx, test=doc"
-```
-注意:
-1. 虽然跳过了 CI,但是还要先排队排到才能跳过,所以非自己方向看到 pending 不要着急 🤣
-2. 在 `git commit --amend` 的时候才加 `test=xxx` 可能不太有效
-3. 一个 pr 多次提交 commit 注意每次都要加 `test=xxx`,因为每个 commit 都会触发 CI
-4. 删除 python 环境中已经安装好的 paddlespeech,否则可能会影响 import paddlespeech 的顺序
diff --git a/.github/ISSUE_TEMPLATE/bug-report-tts.md b/.github/ISSUE_TEMPLATE/bug-report-tts.md
index e2322c239..64b33c32e 100644
--- a/.github/ISSUE_TEMPLATE/bug-report-tts.md
+++ b/.github/ISSUE_TEMPLATE/bug-report-tts.md
@@ -3,6 +3,7 @@ name: "\U0001F41B TTS Bug Report"
about: Create a report to help us improve
title: "[TTS]XXXX"
labels: Bug, T2S
+assignees: yt605155624
---
diff --git a/.github/stale.yml b/.github/stale.yml
index 6b0da9b98..da19b6606 100644
--- a/.github/stale.yml
+++ b/.github/stale.yml
@@ -6,8 +6,7 @@ daysUntilClose: 30
exemptLabels:
- Roadmap
- Bug
- - feature request
- - Tips
+ - New Feature
# Label to use when marking an issue as stale
staleLabel: Stale
# Comment to post when marking an issue as stale. Set to `false` to disable
@@ -18,4 +17,4 @@ markComment: >
unmarkComment: false
# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: >
- This issue is closed. Please re-open if needed.
+ This issue is closed. Please re-open if needed.
\ No newline at end of file
diff --git a/.gitignore b/.gitignore
index 4a0c43312..75f56b604 100644
--- a/.gitignore
+++ b/.gitignore
@@ -15,7 +15,6 @@
*.egg-info
build
*output/
-.history
audio/dist/
audio/fc_patch/
diff --git a/.pre-commit-hooks/copyright-check.hook b/.pre-commit-hooks/copyright-check.hook
index 5a409e062..761edbc01 100644
--- a/.pre-commit-hooks/copyright-check.hook
+++ b/.pre-commit-hooks/copyright-check.hook
@@ -19,7 +19,7 @@ import subprocess
import platform
COPYRIGHT = '''
-Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -128,4 +128,4 @@ def main(argv=None):
if __name__ == '__main__':
- exit(main())
+ exit(main())
\ No newline at end of file
diff --git a/README.md b/README.md
index 9ed823116..0a12ec049 100644
--- a/README.md
+++ b/README.md
@@ -97,47 +97,26 @@
- Life was like a box of chocolates, you never know what you're gonna get. |
+ Life was like a box of chocolates, you never know what you're gonna get. |

|
- 早上好,今天是2020/10/29,最低温度是-3°C。 |
+ 早上好,今天是2020/10/29,最低温度是-3°C。 |

|
- 季姬寂,集鸡,鸡即棘鸡。棘鸡饥叽,季姬及箕稷济鸡。鸡既济,跻姬笈,季姬忌,急咭鸡,鸡急,继圾几,季姬急,即籍箕击鸡,箕疾击几伎,伎即齑,鸡叽集几基,季姬急极屐击鸡,鸡既殛,季姬激,即记《季姬击鸡记》。 |
+ 季姬寂,集鸡,鸡即棘鸡。棘鸡饥叽,季姬及箕稷济鸡。鸡既济,跻姬笈,季姬忌,急咭鸡,鸡急,继圾几,季姬急,即籍箕击鸡,箕疾击几伎,伎即齑,鸡叽集几基,季姬急极屐击鸡,鸡既殛,季姬激,即记《季姬击鸡记》。 |

|
-
- 大家好,我是 parrot 虚拟老师,我们来读一首诗,我与春风皆过客,I and the spring breeze are passing by,你携秋水揽星河,you take the autumn water to take the galaxy。 |
-
-
- 
- |
-
-
- 宜家唔系事必要你讲,但系你所讲嘅说话将会变成呈堂证供。 |
-
-
- 
- |
-
-
- 各个国家有各个国家嘅国歌 |
-
-
- 
- |
-
@@ -178,24 +157,16 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).
### Recent Update
-- 🔥 2023.04.06: Add [subtitle file (.srt format) generation example](./demos/streaming_asr_server).
-- 🔥 2023.03.14: Add SVS(Singing Voice Synthesis) examples with Opencpop dataset, including [DiffSinger](./examples/opencpop/svs1)、[PWGAN](./examples/opencpop/voc1) and [HiFiGAN](./examples/opencpop/voc5), the effect is continuously optimized.
-- 👑 2023.03.09: Add [Wav2vec2ASR-zh](./examples/aishell/asr3).
-- 🎉 2023.03.07: Add [TTS ARM Linux C++ Demo (with C++ Chinese Text Frontend)](./demos/TTSArmLinux).
-- 🔥 2023.03.03 Add Voice Conversion [StarGANv2-VC synthesize pipeline](./examples/vctk/vc3).
-- 🎉 2023.02.16: Add [Cantonese TTS](./examples/canton/tts3).
-- 🔥 2023.01.10: Add [code-switch asr CLI and Demos](./demos/speech_recognition).
-- 👑 2023.01.06: Add [code-switch asr tal_cs recipe](./examples/tal_cs/asr1/).
-- 🎉 2022.12.02: Add [end-to-end Prosody Prediction pipeline](./examples/csmsc/tts3_rhy) (including using prosody labels in Acoustic Model).
-- 🎉 2022.11.30: Add [TTS Android Demo](./demos/TTSAndroid).
+- 🎉 2022.12.02: Add [end-to-end Prosody Prediction pipeline](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3_rhy) (including using prosody labels in Acoustic Model).
+- 🎉 2022.11.30: Add [TTS Android Demo](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid).
- 🤗 2022.11.28: PP-TTS and PP-ASR demos are available in [AIStudio](https://aistudio.baidu.com/aistudio/modelsoverview) and [official website
of paddlepaddle](https://www.paddlepaddle.org.cn/models).
- 👑 2022.11.18: Add [Whisper CLI and Demos](https://github.com/PaddlePaddle/PaddleSpeech/pull/2640), support multi language recognition and translation.
-- 🔥 2022.11.18: Add [Wav2vec2 CLI and Demos](./demos/speech_ssl), Support ASR and Feature Extraction.
+- 🔥 2022.11.18: Add [Wav2vec2 CLI and Demos](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_ssl), Support ASR and Feature Extraction.
- 🎉 2022.11.17: Add [male voice for TTS](https://github.com/PaddlePaddle/PaddleSpeech/pull/2660).
- 🔥 2022.11.07: Add [U2/U2++ C++ High Performance Streaming ASR Deployment](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/runtime/examples/u2pp_ol/wenetspeech).
- 👑 2022.11.01: Add [Adversarial Loss](https://arxiv.org/pdf/1907.04448.pdf) for [Chinese English mixed TTS](./examples/zh_en_tts/tts3).
-- 🔥 2022.10.26: Add [Prosody Prediction](./examples/other/rhy) for TTS.
+- 🔥 2022.10.26: Add [Prosody Prediction](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/rhy) for TTS.
- 🎉 2022.10.21: Add [SSML](https://github.com/PaddlePaddle/PaddleSpeech/discussions/2538) for TTS Chinese Text Frontend.
- 👑 2022.10.11: Add [Wav2vec2ASR-en](./examples/librispeech/asr3), wav2vec2.0 fine-tuning for ASR on LibriSpeech.
- 🔥 2022.09.26: Add Voice Cloning, TTS finetune, and [ERNIE-SAT](https://arxiv.org/abs/2211.03545) in [PaddleSpeech Web Demo](./demos/speech_web).
@@ -209,16 +180,16 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🎉 2022.06.22: All TTS models support ONNX format.
- 🍀 2022.06.17: Add [PaddleSpeech Web Demo](./demos/speech_web).
- 👑 2022.05.13: Release [PP-ASR](./docs/source/asr/PPASR.md)、[PP-TTS](./docs/source/tts/PPTTS.md)、[PP-VPR](docs/source/vpr/PPVPR.md).
-- 👏🏻 2022.05.06: `PaddleSpeech Streaming Server` is available for `Streaming ASR` with `Punctuation Restoration` and `Token Timestamp` and `Text-to-Speech`.
-- 👏🏻 2022.05.06: `PaddleSpeech Server` is available for `Audio Classification`, `Automatic Speech Recognition` and `Text-to-Speech`, `Speaker Verification` and `Punctuation Restoration`.
-- 👏🏻 2022.03.28: `PaddleSpeech CLI` is available for `Speaker Verification`.
-- 👏🏻 2021.12.10: `PaddleSpeech CLI` is available for `Audio Classification`, `Automatic Speech Recognition`, `Speech Translation (English to Chinese)` and `Text-to-Speech`.
+- 👏🏻 2022.05.06: `PaddleSpeech Streaming Server` is available for `Streaming ASR` with `Punctuation Restoration` and `Token Timestamp` and `Text-to-Speech`.
+- 👏🏻 2022.05.06: `PaddleSpeech Server` is available for `Audio Classification`, `Automatic Speech Recognition` and `Text-to-Speech`, `Speaker Verification` and `Punctuation Restoration`.
+- 👏🏻 2022.03.28: `PaddleSpeech CLI` is available for `Speaker Verification`.
+- 👏🏻 2021.12.10: `PaddleSpeech CLI` is available for `Audio Classification`, `Automatic Speech Recognition`, `Speech Translation (English to Chinese)` and `Text-to-Speech`.
### Community
- Scan the QR code below with your Wechat, you can access to official technical exchange group and get the bonus ( more than 20GB learning materials, such as papers, codes and videos ) and the live link of the lessons. Look forward to your participation.
-

+
## Installation
@@ -579,14 +550,14 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
- Text Frontend |
- |
-
- tn / g2p
- |
+ Text Frontend |
+ |
+
+ tn / g2p
+ |
- Acoustic Model |
+ Acoustic Model |
Tacotron2 |
LJSpeech / CSMSC |
@@ -621,13 +592,6 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
ERNIE-SAT-vctk / ERNIE-SAT-aishell3 / ERNIE-SAT-zh_en
|
-
- DiffSinger |
- Opencpop |
-
- DiffSinger-opencpop
- |
-
Vocoder |
WaveFlow |
@@ -638,9 +602,9 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
Parallel WaveGAN |
- LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop |
+ LJSpeech / VCTK / CSMSC / AISHELL-3 |
- PWGAN-ljspeech / PWGAN-vctk / PWGAN-csmsc / PWGAN-aishell3 / PWGAN-opencpop
+ PWGAN-ljspeech / PWGAN-vctk / PWGAN-csmsc / PWGAN-aishell3
|
@@ -659,9 +623,9 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
HiFiGAN |
- LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop |
+ LJSpeech / VCTK / CSMSC / AISHELL-3 |
- HiFiGAN-ljspeech / HiFiGAN-vctk / HiFiGAN-csmsc / HiFiGAN-aishell3 / HiFiGAN-opencpop
+ HiFiGAN-ljspeech / HiFiGAN-vctk / HiFiGAN-csmsc / HiFiGAN-aishell3
|
@@ -1021,16 +985,10 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P
- Many thanks to [vpegasus](https://github.com/vpegasus)/[xuesebot](https://github.com/vpegasus/xuesebot) for developing a rasa chatbot,which is able to speak and listen thanks to PaddleSpeech.
- Many thanks to [chenkui164](https://github.com/chenkui164)/[FastASR](https://github.com/chenkui164/FastASR) for the C++ inference implementation of PaddleSpeech ASR.
- Many thanks to [heyudage](https://github.com/heyudage)/[VoiceTyping](https://github.com/heyudage/VoiceTyping) for the real-time voice typing tool implementation of PaddleSpeech ASR streaming services.
-- Many thanks to [EscaticZheng](https://github.com/EscaticZheng)/[ps3.9wheel-install](https://github.com/EscaticZheng/ps3.9wheel-install) for the python3.9 prebuilt wheel for PaddleSpeech installation in Windows without Viusal Studio.
+
Besides, PaddleSpeech depends on a lot of open source repositories. See [references](./docs/source/reference.md) for more information.
-- Many thanks to [chinobing](https://github.com/chinobing)/[FastAPI-PaddleSpeech-Audio-To-Text](https://github.com/chinobing/FastAPI-PaddleSpeech-Audio-To-Text) for converting audio to text based on FastAPI and PaddleSpeech.
-- Many thanks to [MistEO](https://github.com/MistEO)/[Pallas-Bot](https://github.com/MistEO/Pallas-Bot) for QQ bot based on PaddleSpeech TTS.
## License
PaddleSpeech is provided under the [Apache-2.0 License](./LICENSE).
-
-## Stargazers over time
-
-[](https://starchart.cc/PaddlePaddle/PaddleSpeech)
diff --git a/README_cn.md b/README_cn.md
index 8b98b61ce..5cc156c9f 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -122,27 +122,6 @@

-
- 大家好,我是 parrot 虚拟老师,我们来读一首诗,我与春风皆过客,I and the spring breeze are passing by,你携秋水揽星河,you take the autumn water to take the galaxy。 |
-
-
- 
- |
-
-
- 宜家唔系事必要你讲,但系你所讲嘅说话将会变成呈堂证供。 |
-
-
- 
- |
-
-
- 各个国家有各个国家嘅国歌 |
-
-
- 
- |
-
@@ -182,24 +161,18 @@
- 🔬 主流模型及数据集: 本工具包实现了参与整条语音任务流水线的各个模块,并且采用了主流数据集如 LibriSpeech、LJSpeech、AIShell、CSMSC,详情请见 [模型列表](#model-list)。
- 🧩 级联模型应用: 作为传统语音任务的扩展,我们结合了自然语言处理、计算机视觉等任务,实现更接近实际需求的产业级应用。
+
+
### 近期更新
-- 👑 2023.04.06: 新增 [srt格式字幕生成功能](./demos/streaming_asr_server)。
-- 🔥 2023.03.14: 新增基于 Opencpop 数据集的 SVS (歌唱合成) 示例,包含 [DiffSinger](./examples/opencpop/svs1)、[PWGAN](./examples/opencpop/voc1) 和 [HiFiGAN](./examples/opencpop/voc5),效果持续优化中。
-- 👑 2023.03.09: 新增 [Wav2vec2ASR-zh](./examples/aishell/asr3)。
-- 🎉 2023.03.07: 新增 [TTS ARM Linux C++ 部署示例 (包含 C++ 中文文本前端模块)](./demos/TTSArmLinux)。
-- 🔥 2023.03.03: 新增声音转换模型 [StarGANv2-VC 合成流程](./examples/vctk/vc3)。
-- 🎉 2023.02.16: 新增[粤语语音合成](./examples/canton/tts3)。
-- 🔥 2023.01.10: 新增[中英混合 ASR CLI 和 Demos](./demos/speech_recognition)。
-- 👑 2023.01.06: 新增 [ASR 中英混合 tal_cs 训练推理流程](./examples/tal_cs/asr1/)。
-- 🎉 2022.12.02: 新增[端到端韵律预测全流程](./examples/csmsc/tts3_rhy) (包含在声学模型中使用韵律标签)。
-- 🎉 2022.11.30: 新增 [TTS Android 部署示例](./demos/TTSAndroid)。
+- 🎉 2022.12.02: 新增 [端到端韵律预测全流程](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3_rhy) (包含在声学模型中使用韵律标签)。
+- 🎉 2022.11.30: 新增 [TTS Android 部署示例](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid)。
- 🤗 2022.11.28: PP-TTS and PP-ASR 示例可在 [AIStudio](https://aistudio.baidu.com/aistudio/modelsoverview) 和[飞桨官网](https://www.paddlepaddle.org.cn/models)体验!
- 👑 2022.11.18: 新增 [Whisper CLI 和 Demos](https://github.com/PaddlePaddle/PaddleSpeech/pull/2640), 支持多种语言的识别与翻译。
-- 🔥 2022.11.18: 新增 [Wav2vec2 CLI 和 Demos](./demos/speech_ssl), 支持 ASR 和特征提取。
+- 🔥 2022.11.18: 新增 [Wav2vec2 CLI 和 Demos](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_ssl), 支持 ASR 和 特征提取.
- 🎉 2022.11.17: TTS 新增[高质量男性音色](https://github.com/PaddlePaddle/PaddleSpeech/pull/2660)。
-- 🔥 2022.11.07: 新增 [U2/U2++ 高性能流式 ASR C++ 部署](./speechx/examples/u2pp_ol/wenetspeech)。
+- 🔥 2022.11.07: 新增 [U2/U2++ 高性能流式 ASR C++ 部署](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/speechx/examples/u2pp_ol/wenetspeech)。
- 👑 2022.11.01: [中英文混合 TTS](./examples/zh_en_tts/tts3) 新增 [Adversarial Loss](https://arxiv.org/pdf/1907.04448.pdf) 模块。
-- 🔥 2022.10.26: TTS 新增[韵律预测](./develop/examples/other/rhy)功能。
+- 🔥 2022.10.26: TTS 新增[韵律预测](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/rhy)功能。
- 🎉 2022.10.21: TTS 中文文本前端新增 [SSML](https://github.com/PaddlePaddle/PaddleSpeech/discussions/2538) 功能。
- 👑 2022.10.11: 新增 [Wav2vec2ASR-en](./examples/librispeech/asr3), 在 LibriSpeech 上针对 ASR 任务对 wav2vec2.0 的 finetuning。
- 🔥 2022.09.26: 新增 Voice Cloning, TTS finetune 和 [ERNIE-SAT](https://arxiv.org/abs/2211.03545) 到 [PaddleSpeech 网页应用](./demos/speech_web)。
@@ -227,7 +200,7 @@
微信扫描二维码关注公众号,点击“马上报名”填写问卷加入官方交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。
-

+
@@ -578,50 +551,43 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
tn / g2p
|
-
-
- 声学模型 |
+
+
+ 声学模型 |
Tacotron2 |
LJSpeech / CSMSC |
tacotron2-ljspeech / tacotron2-csmsc
|
-
-
+
+
Transformer TTS |
LJSpeech |
transformer-ljspeech
|
-
-
+
+
SpeedySpeech |
CSMSC |
speedyspeech-csmsc
|
-
-
+
+
FastSpeech2 |
LJSpeech / VCTK / CSMSC / AISHELL-3 / ZH_EN / finetune |
fastspeech2-ljspeech / fastspeech2-vctk / fastspeech2-csmsc / fastspeech2-aishell3 / fastspeech2-zh_en / fastspeech2-finetune
|
-
-
+
+
ERNIE-SAT |
VCTK / AISHELL-3 / ZH_EN |
ERNIE-SAT-vctk / ERNIE-SAT-aishell3 / ERNIE-SAT-zh_en
|
-
-
- DiffSinger |
- Opencpop |
-
- DiffSinger-opencpop
- |
-
+
声码器 |
WaveFlow |
@@ -632,9 +598,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
Parallel WaveGAN |
- LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop |
+ LJSpeech / VCTK / CSMSC / AISHELL-3 |
- PWGAN-ljspeech / PWGAN-vctk / PWGAN-csmsc / PWGAN-aishell3 / PWGAN-opencpop
+ PWGAN-ljspeech / PWGAN-vctk / PWGAN-csmsc / PWGAN-aishell3
|
@@ -653,9 +619,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
HiFiGAN |
- LJSpeech / VCTK / CSMSC / AISHELL-3 / Opencpop |
+ LJSpeech / VCTK / CSMSC / AISHELL-3 |
- HiFiGAN-ljspeech / HiFiGAN-vctk / HiFiGAN-csmsc / HiFiGAN-aishell3 / HiFiGAN-opencpop
+ HiFiGAN-ljspeech / HiFiGAN-vctk / HiFiGAN-csmsc / HiFiGAN-aishell3
|
@@ -712,7 +678,6 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
-
**声音分类**
@@ -1021,19 +986,13 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
- 非常感谢 [awmmmm](https://github.com/awmmmm) 提供 fastspeech2 aishell3 conformer 预训练模型。
- 非常感谢 [phecda-xu](https://github.com/phecda-xu)/[PaddleDubbing](https://github.com/phecda-xu/PaddleDubbing) 基于 PaddleSpeech 的 TTS 模型搭建带 GUI 操作界面的配音工具。
- 非常感谢 [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) 基于 PaddleSpeech 的 TTS GUI 界面和基于 ASR 制作数据集的相关代码。
+
- 非常感谢 [vpegasus](https://github.com/vpegasus)/[xuesebot](https://github.com/vpegasus/xuesebot) 基于 PaddleSpeech 的 ASR 与 TTS 设计的可听、说对话机器人。
- 非常感谢 [chenkui164](https://github.com/chenkui164)/[FastASR](https://github.com/chenkui164/FastASR) 对 PaddleSpeech 的 ASR 进行 C++ 推理实现。
- 非常感谢 [heyudage](https://github.com/heyudage)/[VoiceTyping](https://github.com/heyudage/VoiceTyping) 基于 PaddleSpeech 的 ASR 流式服务实现的实时语音输入法工具。
-- 非常感谢 [EscaticZheng](https://github.com/EscaticZheng)/[ps3.9wheel-install](https://github.com/EscaticZheng/ps3.9wheel-install) 对PaddleSpeech在Windows下的安装提供了无需Visua Studio,基于python3.9的预编译依赖安装包。
-- 非常感谢 [chinobing](https://github.com/chinobing)/[FastAPI-PaddleSpeech-Audio-To-Text](https://github.com/chinobing/FastAPI-PaddleSpeech-Audio-To-Text) 利用 FastAPI 实现 PaddleSpeech 语音转文字,文件上传、分割、转换进度显示、后台更新任务并以 csv 格式输出。
-- 非常感谢 [MistEO](https://github.com/MistEO)/[Pallas-Bot](https://github.com/MistEO/Pallas-Bot) 基于 PaddleSpeech TTS 的 QQ Bot 项目。
此外,PaddleSpeech 依赖于许多开源存储库。有关更多信息,请参阅 [references](./docs/source/reference.md)。
## License
PaddleSpeech 在 [Apache-2.0 许可](./LICENSE) 下提供。
-
-## Stargazers over time
-
-[](https://starchart.cc/PaddlePaddle/PaddleSpeech)
diff --git a/audio/CMakeLists.txt b/audio/CMakeLists.txt
index 021e24477..d9ae63cd2 100644
--- a/audio/CMakeLists.txt
+++ b/audio/CMakeLists.txt
@@ -41,18 +41,24 @@ option(BUILD_PADDLEAUDIO_PYTHON_EXTENSION "Build Python extension" ON)
# cmake
set(CMAKE_MODULE_PATH "${CMAKE_MODULE_PATH};${PROJECT_SOURCE_DIR}/cmake;${PROJECT_SOURCE_DIR}/cmake/external")
+if (NOT MSVC)
+ find_package(GFortranLibs REQUIRED)
+ include(FortranCInterface)
+ include(FindGFortranLibs REQUIRED)
+endif()
+
# fc_patch dir
set(FETCHCONTENT_QUIET off)
get_filename_component(fc_patch "fc_patch" REALPATH BASE_DIR "${CMAKE_SOURCE_DIR}")
set(FETCHCONTENT_BASE_DIR ${fc_patch})
set(THIRD_PARTY_PATH ${fc_patch})
+include(openblas)
+
set(PYBIND11_PYTHON_VERSION ${PY_VERSION})
include(cmake/pybind.cmake)
include_directories(${PYTHON_INCLUDE_DIR})
-include_directories(${CMAKE_CURRENT_SOURCE_DIR}/paddleaudio/third_party/)
-
# packages
find_package(Python3 COMPONENTS Interpreter Development)
diff --git a/audio/README.md b/audio/README.md
index d42d41229..bfd8625f0 100644
--- a/audio/README.md
+++ b/audio/README.md
@@ -2,22 +2,33 @@
安装方式: pip install paddleaudio
-目前支持的平台:Linux, Mac, Windows
+目前支持的平台:Linux
## Environment
## Build wheel
-cmd: python setup.py bdist_wheel
Linux test build whl environment:
+* docker - `registry.baidubce.com/paddlepaddle/paddle:2.2.2`
* os - Ubuntu 16.04.7 LTS
-* gcc/g++ - 8.2.0
+* gcc/g++/gfortran - 8.2.0
* cmake - 3.18.0 (need install)
+* [How to Install Docker](https://docs.docker.com/engine/install/)
+* [A Docker Tutorial for Beginners](https://docker-curriculum.com/)
+
+1. First, launch the docker container.
+
+```
+docker run --privileged --net=host --ipc=host -it --rm -v $PWD:/workspace --name=dev registry.baidubce.com/paddlepaddle/paddle:2.2.2 /bin/bash
+```
+2. python setup.py bdist_wheel
+
MAC: test build whl environment:
* os
-* gcc/g++ 12.2.0
+* gcc/g++/gfortran 12.2.0
* cpu Intel Xeon E5 x86_64
Windows:
-not support paddleaudio C++ extension lib (sox io, kaldi native fbank)
+Not supported: paddleaudio C++ extension lib (sox io, kaldi native fbank)
+python setup.py bdist_wheel
diff --git a/audio/paddleaudio/CMakeLists.txt b/audio/paddleaudio/CMakeLists.txt
index c6b43c780..dbf2bd3eb 100644
--- a/audio/paddleaudio/CMakeLists.txt
+++ b/audio/paddleaudio/CMakeLists.txt
@@ -1,3 +1,19 @@
add_subdirectory(third_party)
add_subdirectory(src)
+
+if (APPLE)
+ file(COPY ${GFORTRAN_LIBRARIES_DIR}/libgcc_s.1.1.dylib
+ DESTINATION ${CMAKE_CURRENT_SOURCE_DIR}/lib)
+endif(APPLE)
+
+if (UNIX AND NOT APPLE)
+ file(COPY ${GFORTRAN_LIBRARIES_DIR}/libgfortran.so.5
+ DESTINATION ${CMAKE_CURRENT_SOURCE_DIR}/lib FOLLOW_SYMLINK_CHAIN)
+
+ file(COPY ${GFORTRAN_LIBRARIES_DIR}/libquadmath.so.0
+ DESTINATION ${CMAKE_CURRENT_SOURCE_DIR}/lib FOLLOW_SYMLINK_CHAIN)
+
+ file(COPY ${GFORTRAN_LIBRARIES_DIR}/libgcc_s.so.1
+ DESTINATION ${CMAKE_CURRENT_SOURCE_DIR}/lib FOLLOW_SYMLINK_CHAIN)
+endif()
diff --git a/audio/paddleaudio/_internal/module_utils.py b/audio/paddleaudio/_internal/module_utils.py
index becd23cd8..7b3230de9 100644
--- a/audio/paddleaudio/_internal/module_utils.py
+++ b/audio/paddleaudio/_internal/module_utils.py
@@ -67,11 +67,8 @@ def deprecated(direction: str, version: Optional[str]=None):
def is_kaldi_available():
- try:
- from paddleaudio import _paddleaudio
- return True
- except Exception:
- return False
+ return is_module_available("paddleaudio._paddleaudio")
+
def requires_kaldi():
if is_kaldi_available():
@@ -131,11 +128,9 @@ def requires_soundfile():
def is_sox_available():
- try:
- from paddleaudio import _paddleaudio
- return True
- except Exception:
+ if platform.system() == "Windows": # sox is not supported on Windows
return False
+ return is_module_available("paddleaudio._paddleaudio")
def requires_sox():
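
For context, a minimal, hypothetical sketch of how callers typically consume these availability helpers (the guarded import below is an illustration, not code from this patch):

```python
# Hypothetical caller-side sketch: gate optional features on the helpers above
# instead of importing the C++ extension directly.
from paddleaudio._internal import module_utils

if module_utils.is_kaldi_available():
    from paddleaudio.kaldi import fbank  # backed by the _paddleaudio extension

if module_utils.is_sox_available():
    # is_sox_available() short-circuits to False on Windows, so this branch
    # only runs where the sox-backed extension can actually be imported.
    print("sox-backed I/O is available")
```
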
diff --git a/audio/paddleaudio/backends/soundfile_backend.py b/audio/paddleaudio/backends/soundfile_backend.py
index 9195ea097..ae7b5b52d 100644
--- a/audio/paddleaudio/backends/soundfile_backend.py
+++ b/audio/paddleaudio/backends/soundfile_backend.py
@@ -191,7 +191,7 @@ def soundfile_save(y: np.ndarray, sr: int, file: os.PathLike) -> None:
if sr <= 0:
raise ParameterError(
- f'Sample rate should be larger than 0, received sr = {sr}')
+ f'Sample rate should be larger than 0, received sr = {sr}')
if y.dtype not in ['int16', 'int8']:
warnings.warn(
diff --git a/audio/paddleaudio/kaldi/__init__.py b/audio/paddleaudio/kaldi/__init__.py
index a0ae644d1..f951e280a 100644
--- a/audio/paddleaudio/kaldi/__init__.py
+++ b/audio/paddleaudio/kaldi/__init__.py
@@ -12,4 +12,4 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .kaldi import fbank
-#from .kaldi import pitch
+from .kaldi import pitch
diff --git a/audio/paddleaudio/kaldi/kaldi.py b/audio/paddleaudio/kaldi/kaldi.py
index 0f080de04..16969d772 100644
--- a/audio/paddleaudio/kaldi/kaldi.py
+++ b/audio/paddleaudio/kaldi/kaldi.py
@@ -16,6 +16,7 @@ from paddleaudio._internal import module_utils
__all__ = [
'fbank',
+ 'pitch',
]
@@ -32,6 +33,8 @@ def fbank(
round_to_power_of_two: bool=True,
blackman_coeff: float=0.42,
snip_edges: bool=True,
+ allow_downsample: bool=False,
+ allow_upsample: bool=False,
max_feature_vectors: int=-1,
num_bins: int=23,
low_freq: float=20,
@@ -59,6 +62,8 @@ def fbank(
frame_opts.round_to_power_of_two = round_to_power_of_two
frame_opts.blackman_coeff = blackman_coeff
frame_opts.snip_edges = snip_edges
+ frame_opts.allow_downsample = allow_downsample
+ frame_opts.allow_upsample = allow_upsample
frame_opts.max_feature_vectors = max_feature_vectors
mel_opts.num_bins = num_bins
@@ -80,48 +85,48 @@ def fbank(
return feat
-#@module_utils.requires_kaldi()
-#def pitch(wav,
-#samp_freq: int=16000,
-#frame_shift_ms: float=10.0,
-#frame_length_ms: float=25.0,
-#preemph_coeff: float=0.0,
-#min_f0: int=50,
-#max_f0: int=400,
-#soft_min_f0: float=10.0,
-#penalty_factor: float=0.1,
-#lowpass_cutoff: int=1000,
-#resample_freq: int=4000,
-#delta_pitch: float=0.005,
-#nccf_ballast: int=7000,
-#lowpass_filter_width: int=1,
-#upsample_filter_width: int=5,
-#max_frames_latency: int=0,
-#frames_per_chunk: int=0,
-#simulate_first_pass_online: bool=False,
-#recompute_frame: int=500,
-#nccf_ballast_online: bool=False,
-#snip_edges: bool=True):
-#pitch_opts = paddleaudio._paddleaudio.PitchExtractionOptions()
-#pitch_opts.samp_freq = samp_freq
-#pitch_opts.frame_shift_ms = frame_shift_ms
-#pitch_opts.frame_length_ms = frame_length_ms
-#pitch_opts.preemph_coeff = preemph_coeff
-#pitch_opts.min_f0 = min_f0
-#pitch_opts.max_f0 = max_f0
-#pitch_opts.soft_min_f0 = soft_min_f0
-#pitch_opts.penalty_factor = penalty_factor
-#pitch_opts.lowpass_cutoff = lowpass_cutoff
-#pitch_opts.resample_freq = resample_freq
-#pitch_opts.delta_pitch = delta_pitch
-#pitch_opts.nccf_ballast = nccf_ballast
-#pitch_opts.lowpass_filter_width = lowpass_filter_width
-#pitch_opts.upsample_filter_width = upsample_filter_width
-#pitch_opts.max_frames_latency = max_frames_latency
-#pitch_opts.frames_per_chunk = frames_per_chunk
-#pitch_opts.simulate_first_pass_online = simulate_first_pass_online
-#pitch_opts.recompute_frame = recompute_frame
-#pitch_opts.nccf_ballast_online = nccf_ballast_online
-#pitch_opts.snip_edges = snip_edges
-#pitch = paddleaudio._paddleaudio.ComputeKaldiPitch(pitch_opts, wav)
-#return pitch
+@module_utils.requires_kaldi()
+def pitch(wav,
+ samp_freq: int=16000,
+ frame_shift_ms: float=10.0,
+ frame_length_ms: float=25.0,
+ preemph_coeff: float=0.0,
+ min_f0: int=50,
+ max_f0: int=400,
+ soft_min_f0: float=10.0,
+ penalty_factor: float=0.1,
+ lowpass_cutoff: int=1000,
+ resample_freq: int=4000,
+ delta_pitch: float=0.005,
+ nccf_ballast: int=7000,
+ lowpass_filter_width: int=1,
+ upsample_filter_width: int=5,
+ max_frames_latency: int=0,
+ frames_per_chunk: int=0,
+ simulate_first_pass_online: bool=False,
+ recompute_frame: int=500,
+ nccf_ballast_online: bool=False,
+ snip_edges: bool=True):
+ pitch_opts = paddleaudio._paddleaudio.PitchExtractionOptions()
+ pitch_opts.samp_freq = samp_freq
+ pitch_opts.frame_shift_ms = frame_shift_ms
+ pitch_opts.frame_length_ms = frame_length_ms
+ pitch_opts.preemph_coeff = preemph_coeff
+ pitch_opts.min_f0 = min_f0
+ pitch_opts.max_f0 = max_f0
+ pitch_opts.soft_min_f0 = soft_min_f0
+ pitch_opts.penalty_factor = penalty_factor
+ pitch_opts.lowpass_cutoff = lowpass_cutoff
+ pitch_opts.resample_freq = resample_freq
+ pitch_opts.delta_pitch = delta_pitch
+ pitch_opts.nccf_ballast = nccf_ballast
+ pitch_opts.lowpass_filter_width = lowpass_filter_width
+ pitch_opts.upsample_filter_width = upsample_filter_width
+ pitch_opts.max_frames_latency = max_frames_latency
+ pitch_opts.frames_per_chunk = frames_per_chunk
+ pitch_opts.simulate_first_pass_online = simulate_first_pass_online
+ pitch_opts.recompute_frame = recompute_frame
+ pitch_opts.nccf_ballast_online = nccf_ballast_online
+ pitch_opts.snip_edges = snip_edges
+ pitch = paddleaudio._paddleaudio.ComputeKaldiPitch(pitch_opts, wav)
+ return pitch
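
For reference, a minimal, hypothetical usage sketch of the `fbank` wrapper and the re-enabled `pitch` wrapper above (it assumes paddleaudio was built with the Kaldi extension and that both accept a float32 NumPy waveform; the shape comments follow the usual Kaldi conventions and are not verified here):

```python
# Hypothetical usage sketch of paddleaudio.kaldi.fbank / pitch (see assumptions above).
import numpy as np
from paddleaudio.kaldi import fbank, pitch

wav = np.random.uniform(-1.0, 1.0, size=16000).astype(np.float32)  # 1 s of dummy 16 kHz audio

fbank_feats = fbank(wav, num_bins=23, allow_downsample=False)  # expected: (num_frames, num_bins)
pitch_feats = pitch(wav, samp_freq=16000)                      # expected: per-frame (NCCF, pitch) pairs
print(fbank_feats.shape, pitch_feats.shape)
```
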
diff --git a/audio/paddleaudio/src/CMakeLists.txt b/audio/paddleaudio/src/CMakeLists.txt
index 21e0f170d..fb6f32092 100644
--- a/audio/paddleaudio/src/CMakeLists.txt
+++ b/audio/paddleaudio/src/CMakeLists.txt
@@ -52,7 +52,7 @@ if(BUILD_KALDI)
list(
APPEND
LIBPADDLEAUDIO_LINK_LIBRARIES
- kaldi-native-fbank-core
+ libkaldi
)
list(
APPEND
@@ -92,6 +92,14 @@ define_library(
"${LIBPADDLEAUDIO_COMPILE_DEFINITIONS}"
)
+if (APPLE)
+ add_custom_command(TARGET libpaddleaudio POST_BUILD COMMAND install_name_tool -change "${GFORTRAN_LIBRARIES_DIR}/libgcc_s.1.1.dylib" "@loader_path/libgcc_s.1.1.dylib" libpaddleaudio.so)
+endif(APPLE)
+
+if (UNIX AND NOT APPLE)
+ set_target_properties(libpaddleaudio PROPERTIES INSTALL_RPATH "$ORIGIN")
+endif()
+
if (APPLE)
set(AUDIO_LIBRARY libpaddleaudio CACHE INTERNAL "")
else()
@@ -199,3 +207,11 @@ define_extension(
# )
# endif()
endif()
+
+if (APPLE)
+ add_custom_command(TARGET _paddleaudio POST_BUILD COMMAND install_name_tool -change "${GFORTRAN_LIBRARIES_DIR}/libgcc_s.1.1.dylib" "@loader_path/lib/libgcc_s.1.1.dylib" _paddleaudio.so)
+endif(APPLE)
+
+if (UNIX AND NOT APPLE)
+ set_target_properties(_paddleaudio PROPERTIES INSTALL_RPATH "$ORIGIN/lib")
+endif()
diff --git a/audio/paddleaudio/src/pybind/kaldi/feature_common.h b/audio/paddleaudio/src/pybind/kaldi/feature_common.h
index 6571fa3eb..05522bb7e 100644
--- a/audio/paddleaudio/src/pybind/kaldi/feature_common.h
+++ b/audio/paddleaudio/src/pybind/kaldi/feature_common.h
@@ -16,7 +16,7 @@
#include "pybind11/pybind11.h"
#include "pybind11/numpy.h"
-#include "kaldi-native-fbank/csrc/feature-window.h"
+#include "feat/feature-window.h"
namespace paddleaudio {
namespace kaldi {
@@ -28,18 +28,18 @@ class StreamingFeatureTpl {
public:
typedef typename F::Options Options;
StreamingFeatureTpl(const Options& opts);
- bool ComputeFeature(const std::vector<float>& wav,
- std::vector<float>* feats);
- void Reset() { remained_wav_.resize(0); }
+ bool ComputeFeature(const ::kaldi::VectorBase<::kaldi::BaseFloat>& wav,
+ ::kaldi::Vector<::kaldi::BaseFloat>* feats);
+ void Reset() { remained_wav_.Resize(0); }
int Dim() { return computer_.Dim(); }
private:
- bool Compute(const std::vector<float>& waves,
- std::vector<float>* feats);
+ bool Compute(const ::kaldi::Vector<::kaldi::BaseFloat>& waves,
+ ::kaldi::Vector<::kaldi::BaseFloat>* feats);
Options opts_;
- knf::FeatureWindowFunction window_function_;
- std::vector<float> remained_wav_;
+ ::kaldi::FeatureWindowFunction window_function_;
+ ::kaldi::Vector<::kaldi::BaseFloat> remained_wav_;
F computer_;
};
diff --git a/audio/paddleaudio/src/pybind/kaldi/feature_common_inl.h b/audio/paddleaudio/src/pybind/kaldi/feature_common_inl.h
index 985d586fe..c894b9775 100644
--- a/audio/paddleaudio/src/pybind/kaldi/feature_common_inl.h
+++ b/audio/paddleaudio/src/pybind/kaldi/feature_common_inl.h
@@ -12,6 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
+#include "base/kaldi-common.h"
namespace paddleaudio {
namespace kaldi {
@@ -24,29 +25,24 @@ StreamingFeatureTpl<F>::StreamingFeatureTpl(const Options& opts)
template <class F>
bool StreamingFeatureTpl<F>::ComputeFeature(
- const std::vector<float>& wav,
- std::vector<float>* feats) {
+ const ::kaldi::VectorBase<::kaldi::BaseFloat>& wav,
+ ::kaldi::Vector<::kaldi::BaseFloat>* feats) {
// append remained waves
- int wav_len = wav.size();
+ ::kaldi::int32 wav_len = wav.Dim();
if (wav_len == 0) return false;
- int left_len = remained_wav_.size();
- std::vector<float> waves(left_len + wav_len);
- std::memcpy(waves.data(),
- remained_wav_.data(),
- left_len * sizeof(float));
- std::memcpy(waves.data() + left_len,
- wav.data(),
- wav_len * sizeof(float));
+ ::kaldi::int32 left_len = remained_wav_.Dim();
+ ::kaldi::Vector<::kaldi::BaseFloat> waves(left_len + wav_len);
+ waves.Range(0, left_len).CopyFromVec(remained_wav_);
+ waves.Range(left_len, wav_len).CopyFromVec(wav);
// cache remained waves
- knf::FrameExtractionOptions frame_opts = computer_.GetFrameOptions();
- int num_frames = knf::NumFrames(waves.size(), frame_opts);
- int frame_shift = frame_opts.WindowShift();
- int left_samples = waves.size() - frame_shift * num_frames;
- remained_wav_.resize(left_samples);
- std::memcpy(remained_wav_.data(),
- waves.data() + frame_shift * num_frames,
- left_samples * sizeof(float));
+ ::kaldi::FrameExtractionOptions frame_opts = computer_.GetFrameOptions();
+ ::kaldi::int32 num_frames = ::kaldi::NumFrames(waves.Dim(), frame_opts);
+ ::kaldi::int32 frame_shift = frame_opts.WindowShift();
+ ::kaldi::int32 left_samples = waves.Dim() - frame_shift * num_frames;
+ remained_wav_.Resize(left_samples);
+ remained_wav_.CopyFromVec(
+ waves.Range(frame_shift * num_frames, left_samples));
// compute speech feature
Compute(waves, feats);
@@ -55,39 +51,40 @@ bool StreamingFeatureTpl<F>::ComputeFeature(
// Compute feat
template <class F>
-bool StreamingFeatureTpl<F>::Compute(const std::vector<float>& waves,
- std::vector<float>* feats) {
- const knf::FrameExtractionOptions& frame_opts = computer_.GetFrameOptions();
- int num_samples = waves.size();
- int frame_length = frame_opts.WindowSize();
- int sample_rate = frame_opts.samp_freq;
+bool StreamingFeatureTpl<F>::Compute(
+ const ::kaldi::Vector<::kaldi::BaseFloat>& waves,
+ ::kaldi::Vector<::kaldi::BaseFloat>* feats) {
+ ::kaldi::BaseFloat vtln_warp = 1.0;
+ const ::kaldi::FrameExtractionOptions& frame_opts =
+ computer_.GetFrameOptions();
+ ::kaldi::int32 num_samples = waves.Dim();
+ ::kaldi::int32 frame_length = frame_opts.WindowSize();
+ ::kaldi::int32 sample_rate = frame_opts.samp_freq;
if (num_samples < frame_length) {
- return true;
+ return false;
}
- int num_frames = knf::NumFrames(num_samples, frame_opts);
- feats->resize(num_frames * Dim());
+ ::kaldi::int32 num_frames = ::kaldi::NumFrames(num_samples, frame_opts);
+ feats->Resize(num_frames * Dim());
- std::vector<float> window;
+ ::kaldi::Vector<::kaldi::BaseFloat> window;
bool need_raw_log_energy = computer_.NeedRawLogEnergy();
- for (int frame = 0; frame < num_frames; frame++) {
- std::fill(window.begin(), window.end(), 0);
- float raw_log_energy = 0.0;
- float vtln_warp = 1.0;
- knf::ExtractWindow(0,
- waves,
- frame,
- frame_opts,
- window_function_,
- &window,
- need_raw_log_energy ? &raw_log_energy : NULL);
+ for (::kaldi::int32 frame = 0; frame < num_frames; frame++) {
+ ::kaldi::BaseFloat raw_log_energy = 0.0;
+ ::kaldi::ExtractWindow(0,
+ waves,
+ frame,
+ frame_opts,
+ window_function_,
+ &window,
+ need_raw_log_energy ? &raw_log_energy : NULL);
- std::vector<float> this_feature(computer_.Dim());
- computer_.Compute(
- raw_log_energy, vtln_warp, &window, this_feature.data());
- std::memcpy(feats->data() + frame * Dim(),
- this_feature.data(),
- sizeof(float) * Dim());
+ ::kaldi::Vector<::kaldi::BaseFloat> this_feature(computer_.Dim(),
+ ::kaldi::kUndefined);
+ computer_.Compute(raw_log_energy, vtln_warp, &window, &this_feature);
+ ::kaldi::SubVector<::kaldi::BaseFloat> output_row(
+ feats->Data() + frame * Dim(), Dim());
+ output_row.CopyFromVec(this_feature);
}
return true;
}
diff --git a/audio/paddleaudio/src/pybind/kaldi/kaldi_feature.cc b/audio/paddleaudio/src/pybind/kaldi/kaldi_feature.cc
index 83df454c5..40e3786e8 100644
--- a/audio/paddleaudio/src/pybind/kaldi/kaldi_feature.cc
+++ b/audio/paddleaudio/src/pybind/kaldi/kaldi_feature.cc
@@ -13,16 +13,16 @@
// limitations under the License.
#include "paddleaudio/src/pybind/kaldi/kaldi_feature.h"
-//#include "feat/pitch-functions.h"
+#include "feat/pitch-functions.h"
namespace paddleaudio {
namespace kaldi {
bool InitFbank(
- knf::FrameExtractionOptions frame_opts,
- knf::MelBanksOptions mel_opts,
+ ::kaldi::FrameExtractionOptions frame_opts,
+ ::kaldi::MelBanksOptions mel_opts,
FbankOptions fbank_opts) {
- knf::FbankOptions opts;
+ ::kaldi::FbankOptions opts;
opts.frame_opts = frame_opts;
opts.mel_opts = mel_opts;
opts.use_energy = fbank_opts.use_energy;
@@ -41,8 +41,8 @@ py::array_t<float> ComputeFbankStreaming(const py::array_t<float>& wav) {
}
py::array_t<float> ComputeFbank(
- knf::FrameExtractionOptions frame_opts,
- knf::MelBanksOptions mel_opts,
+ ::kaldi::FrameExtractionOptions frame_opts,
+ ::kaldi::MelBanksOptions mel_opts,
FbankOptions fbank_opts,
const py::array_t<float>& wav) {
InitFbank(frame_opts, mel_opts, fbank_opts);
@@ -55,21 +55,21 @@ void ResetFbank() {
paddleaudio::kaldi::KaldiFeatureWrapper::GetInstance()->ResetFbank();
}
-//py::array_t<float> ComputeKaldiPitch(
- //const ::kaldi::PitchExtractionOptions& opts,
- //const py::array_t<float>& wav) {
- //py::buffer_info info = wav.request();
- //::kaldi::SubVector<::kaldi::BaseFloat> input_wav((float*)info.ptr, info.size);
+py::array_t<float> ComputeKaldiPitch(
+ const ::kaldi::PitchExtractionOptions& opts,
+ const py::array_t<float>& wav) {
+ py::buffer_info info = wav.request();
+ ::kaldi::SubVector<::kaldi::BaseFloat> input_wav((float*)info.ptr, info.size);
- //::kaldi::Matrix<::kaldi::BaseFloat> features;
- //::kaldi::ComputeKaldiPitch(opts, input_wav, &features);
- //auto result = py::array_t<float>({features.NumRows(), features.NumCols()});
- //for (int row_idx = 0; row_idx < features.NumRows(); ++row_idx) {
- //std::memcpy(result.mutable_data(row_idx), features.Row(row_idx).Data(),
- //sizeof(float)*features.NumCols());
- //}
- //return result;
-//}
+ ::kaldi::Matrix<::kaldi::BaseFloat> features;
+ ::kaldi::ComputeKaldiPitch(opts, input_wav, &features);
+ auto result = py::array_t<float>({features.NumRows(), features.NumCols()});
+ for (int row_idx = 0; row_idx < features.NumRows(); ++row_idx) {
+ std::memcpy(result.mutable_data(row_idx), features.Row(row_idx).Data(),
+ sizeof(float)*features.NumCols());
+ }
+ return result;
+}
} // namespace kaldi
} // namespace paddleaudio
diff --git a/audio/paddleaudio/src/pybind/kaldi/kaldi_feature.h b/audio/paddleaudio/src/pybind/kaldi/kaldi_feature.h
index 031ec863b..e059c52c1 100644
--- a/audio/paddleaudio/src/pybind/kaldi/kaldi_feature.h
+++ b/audio/paddleaudio/src/pybind/kaldi/kaldi_feature.h
@@ -19,7 +19,7 @@
#include
#include "paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.h"
-//#include "feat/pitch-functions.h"
+#include "feat/pitch-functions.h"
namespace py = pybind11;
@@ -42,13 +42,13 @@ struct FbankOptions{
};
bool InitFbank(
- knf::FrameExtractionOptions frame_opts,
- knf::MelBanksOptions mel_opts,
+ ::kaldi::FrameExtractionOptions frame_opts,
+ ::kaldi::MelBanksOptions mel_opts,
FbankOptions fbank_opts);
py::array_t<float> ComputeFbank(
- knf::FrameExtractionOptions frame_opts,
- knf::MelBanksOptions mel_opts,
+ ::kaldi::FrameExtractionOptions frame_opts,
+ ::kaldi::MelBanksOptions mel_opts,
FbankOptions fbank_opts,
const py::array_t<float>& wav);
@@ -56,9 +56,9 @@ py::array_t<float> ComputeFbankStreaming(const py::array_t<float>& wav);
void ResetFbank();
-//py::array_t<float> ComputeKaldiPitch(
- //const ::kaldi::PitchExtractionOptions& opts,
- //const py::array_t<float>& wav);
+py::array_t<float> ComputeKaldiPitch(
+ const ::kaldi::PitchExtractionOptions& opts,
+ const py::array_t<float>& wav);
} // namespace kaldi
} // namespace paddleaudio
diff --git a/audio/paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.cc b/audio/paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.cc
index 8b8ff18be..79558046b 100644
--- a/audio/paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.cc
+++ b/audio/paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.cc
@@ -22,7 +22,7 @@ KaldiFeatureWrapper* KaldiFeatureWrapper::GetInstance() {
return &instance;
}
-bool KaldiFeatureWrapper::InitFbank(knf::FbankOptions opts) {
+bool KaldiFeatureWrapper::InitFbank(::kaldi::FbankOptions opts) {
fbank_.reset(new Fbank(opts));
return true;
}
@@ -30,18 +30,21 @@ bool KaldiFeatureWrapper::InitFbank(knf::FbankOptions opts) {
py::array_t<float> KaldiFeatureWrapper::ComputeFbank(
const py::array_t<float> wav) {
py::buffer_info info = wav.request();
- std::vector<float> input_wav((float*)info.ptr, (float*)info.ptr + info.size);
+ ::kaldi::SubVector<::kaldi::BaseFloat> input_wav((float*)info.ptr, info.size);
- std::vector<float> feats;
+ ::kaldi::Vector<::kaldi::BaseFloat> feats;
bool flag = fbank_->ComputeFeature(input_wav, &feats);
- if (flag == false || feats.size() == 0) return py::array_t<float>();
- auto result = py::array_t<float>(feats.size());
+ if (flag == false || feats.Dim() == 0) return py::array_t<float>();
+ auto result = py::array_t<float>(feats.Dim());
py::buffer_info xs = result.request();
+ std::cout << std::endl;
float* res_ptr = (float*)xs.ptr;
- std::memcpy(res_ptr, feats.data(), sizeof(float)*feats.size());
- std::vector shape{static_cast(feats.size() / Dim()),
- static_cast(Dim())};
- return result.reshape(shape);
+ for (int idx = 0; idx < feats.Dim(); ++idx) {
+ *res_ptr = feats(idx);
+ res_ptr++;
+ }
+
+ return result.reshape({feats.Dim() / Dim(), Dim()});
}
} // namespace kaldi
diff --git a/audio/paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.h b/audio/paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.h
index daad2d587..bee1eee02 100644
--- a/audio/paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.h
+++ b/audio/paddleaudio/src/pybind/kaldi/kaldi_feature_wrapper.h
@@ -14,18 +14,20 @@
#pragma once
-#include "paddleaudio/third_party/kaldi-native-fbank/csrc/feature-fbank.h"
+#include "base/kaldi-common.h"
+#include "feat/feature-fbank.h"
+
#include "paddleaudio/src/pybind/kaldi/feature_common.h"
namespace paddleaudio {
namespace kaldi {
-typedef StreamingFeatureTpl<knf::FbankComputer> Fbank;
+typedef StreamingFeatureTpl<::kaldi::FbankComputer> Fbank;
class KaldiFeatureWrapper {
public:
static KaldiFeatureWrapper* GetInstance();
- bool InitFbank(knf::FbankOptions opts);
+ bool InitFbank(::kaldi::FbankOptions opts);
py::array_t<float> ComputeFbank(const py::array_t<float> wav);
int Dim() { return fbank_->Dim(); }
void ResetFbank() { fbank_->Reset(); }
diff --git a/audio/paddleaudio/src/pybind/pybind.cpp b/audio/paddleaudio/src/pybind/pybind.cpp
index 510712034..692e80995 100644
--- a/audio/paddleaudio/src/pybind/pybind.cpp
+++ b/audio/paddleaudio/src/pybind/pybind.cpp
@@ -2,7 +2,7 @@
#ifdef INCLUDE_KALDI
#include "paddleaudio/src/pybind/kaldi/kaldi_feature.h"
-#include "paddleaudio/third_party/kaldi-native-fbank/csrc/feature-fbank.h"
+#include "paddleaudio/third_party/kaldi/feat/feature-fbank.h"
#endif
#ifdef INCLUDE_SOX
@@ -89,51 +89,53 @@ PYBIND11_MODULE(_paddleaudio, m) {
#ifdef INCLUDE_KALDI
m.def("ComputeFbank", &paddleaudio::kaldi::ComputeFbank, "compute fbank");
- //py::class_<kaldi::PitchExtractionOptions>(m, "PitchExtractionOptions")
- //.def(py::init<>())
- //.def_readwrite("samp_freq", &kaldi::PitchExtractionOptions::samp_freq)
- //.def_readwrite("frame_shift_ms", &kaldi::PitchExtractionOptions::frame_shift_ms)
- //.def_readwrite("frame_length_ms", &kaldi::PitchExtractionOptions::frame_length_ms)
- //.def_readwrite("preemph_coeff", &kaldi::PitchExtractionOptions::preemph_coeff)
- //.def_readwrite("min_f0", &kaldi::PitchExtractionOptions::min_f0)
- //.def_readwrite("max_f0", &kaldi::PitchExtractionOptions::max_f0)
- //.def_readwrite("soft_min_f0", &kaldi::PitchExtractionOptions::soft_min_f0)
- //.def_readwrite("penalty_factor", &kaldi::PitchExtractionOptions::penalty_factor)
- //.def_readwrite("lowpass_cutoff", &kaldi::PitchExtractionOptions::lowpass_cutoff)
- //.def_readwrite("resample_freq", &kaldi::PitchExtractionOptions::resample_freq)
- //.def_readwrite("delta_pitch", &kaldi::PitchExtractionOptions::delta_pitch)
- //.def_readwrite("nccf_ballast", &kaldi::PitchExtractionOptions::nccf_ballast)
- //.def_readwrite("lowpass_filter_width", &kaldi::PitchExtractionOptions::lowpass_filter_width)
- //.def_readwrite("upsample_filter_width", &kaldi::PitchExtractionOptions::upsample_filter_width)
- //.def_readwrite("max_frames_latency", &kaldi::PitchExtractionOptions::max_frames_latency)
- //.def_readwrite("frames_per_chunk", &kaldi::PitchExtractionOptions::frames_per_chunk)
- //.def_readwrite("simulate_first_pass_online", &kaldi::PitchExtractionOptions::simulate_first_pass_online)
- //.def_readwrite("recompute_frame", &kaldi::PitchExtractionOptions::recompute_frame)
- //.def_readwrite("nccf_ballast_online", &kaldi::PitchExtractionOptions::nccf_ballast_online)
- //.def_readwrite("snip_edges", &kaldi::PitchExtractionOptions::snip_edges);
- //m.def("ComputeKaldiPitch", &paddleaudio::kaldi::ComputeKaldiPitch, "compute kaldi pitch");
- py::class_<knf::FrameExtractionOptions>(m, "FrameExtractionOptions")
+ py::class_<kaldi::PitchExtractionOptions>(m, "PitchExtractionOptions")
+ .def(py::init<>())
+ .def_readwrite("samp_freq", &kaldi::PitchExtractionOptions::samp_freq)
+ .def_readwrite("frame_shift_ms", &kaldi::PitchExtractionOptions::frame_shift_ms)
+ .def_readwrite("frame_length_ms", &kaldi::PitchExtractionOptions::frame_length_ms)
+ .def_readwrite("preemph_coeff", &kaldi::PitchExtractionOptions::preemph_coeff)
+ .def_readwrite("min_f0", &kaldi::PitchExtractionOptions::min_f0)
+ .def_readwrite("max_f0", &kaldi::PitchExtractionOptions::max_f0)
+ .def_readwrite("soft_min_f0", &kaldi::PitchExtractionOptions::soft_min_f0)
+ .def_readwrite("penalty_factor", &kaldi::PitchExtractionOptions::penalty_factor)
+ .def_readwrite("lowpass_cutoff", &kaldi::PitchExtractionOptions::lowpass_cutoff)
+ .def_readwrite("resample_freq", &kaldi::PitchExtractionOptions::resample_freq)
+ .def_readwrite("delta_pitch", &kaldi::PitchExtractionOptions::delta_pitch)
+ .def_readwrite("nccf_ballast", &kaldi::PitchExtractionOptions::nccf_ballast)
+ .def_readwrite("lowpass_filter_width", &kaldi::PitchExtractionOptions::lowpass_filter_width)
+ .def_readwrite("upsample_filter_width", &kaldi::PitchExtractionOptions::upsample_filter_width)
+ .def_readwrite("max_frames_latency", &kaldi::PitchExtractionOptions::max_frames_latency)
+ .def_readwrite("frames_per_chunk", &kaldi::PitchExtractionOptions::frames_per_chunk)
+ .def_readwrite("simulate_first_pass_online", &kaldi::PitchExtractionOptions::simulate_first_pass_online)
+ .def_readwrite("recompute_frame", &kaldi::PitchExtractionOptions::recompute_frame)
+ .def_readwrite("nccf_ballast_online", &kaldi::PitchExtractionOptions::nccf_ballast_online)
+ .def_readwrite("snip_edges", &kaldi::PitchExtractionOptions::snip_edges);
+ m.def("ComputeKaldiPitch", &paddleaudio::kaldi::ComputeKaldiPitch, "compute kaldi pitch");
+ py::class_<kaldi::FrameExtractionOptions>(m, "FrameExtractionOptions")
.def(py::init<>())
- .def_readwrite("samp_freq", &knf::FrameExtractionOptions::samp_freq)
- .def_readwrite("frame_shift_ms", &knf::FrameExtractionOptions::frame_shift_ms)
- .def_readwrite("frame_length_ms", &knf::FrameExtractionOptions::frame_length_ms)
- .def_readwrite("dither", &knf::FrameExtractionOptions::dither)
- .def_readwrite("preemph_coeff", &knf::FrameExtractionOptions::preemph_coeff)
- .def_readwrite("remove_dc_offset", &knf::FrameExtractionOptions::remove_dc_offset)
- .def_readwrite("window_type", &knf::FrameExtractionOptions::window_type)
- .def_readwrite("round_to_power_of_two", &knf::FrameExtractionOptions::round_to_power_of_two)
- .def_readwrite("blackman_coeff", &knf::FrameExtractionOptions::blackman_coeff)
- .def_readwrite("snip_edges", &knf::FrameExtractionOptions::snip_edges)
- .def_readwrite("max_feature_vectors", &knf::FrameExtractionOptions::max_feature_vectors);
- py::class_<knf::MelBanksOptions>(m, "MelBanksOptions")
+ .def_readwrite("samp_freq", &kaldi::FrameExtractionOptions::samp_freq)
+ .def_readwrite("frame_shift_ms", &kaldi::FrameExtractionOptions::frame_shift_ms)
+ .def_readwrite("frame_length_ms", &kaldi::FrameExtractionOptions::frame_length_ms)
+ .def_readwrite("dither", &kaldi::FrameExtractionOptions::dither)
+ .def_readwrite("preemph_coeff", &kaldi::FrameExtractionOptions::preemph_coeff)
+ .def_readwrite("remove_dc_offset", &kaldi::FrameExtractionOptions::remove_dc_offset)
+ .def_readwrite("window_type", &kaldi::FrameExtractionOptions::window_type)
+ .def_readwrite("round_to_power_of_two", &kaldi::FrameExtractionOptions::round_to_power_of_two)
+ .def_readwrite("blackman_coeff", &kaldi::FrameExtractionOptions::blackman_coeff)
+ .def_readwrite("snip_edges", &kaldi::FrameExtractionOptions::snip_edges)
+ .def_readwrite("allow_downsample", &kaldi::FrameExtractionOptions::allow_downsample)
+ .def_readwrite("allow_upsample", &kaldi::FrameExtractionOptions::allow_upsample)
+ .def_readwrite("max_feature_vectors", &kaldi::FrameExtractionOptions::max_feature_vectors);
+ py::class_<kaldi::MelBanksOptions>(m, "MelBanksOptions")
.def(py::init<>())
- .def_readwrite("num_bins", &knf::MelBanksOptions::num_bins)
- .def_readwrite("low_freq", &knf::MelBanksOptions::low_freq)
- .def_readwrite("high_freq", &knf::MelBanksOptions::high_freq)
- .def_readwrite("vtln_low", &knf::MelBanksOptions::vtln_low)
- .def_readwrite("vtln_high", &knf::MelBanksOptions::vtln_high)
- .def_readwrite("debug_mel", &knf::MelBanksOptions::debug_mel)
- .def_readwrite("htk_mode", &knf::MelBanksOptions::htk_mode);
+ .def_readwrite("num_bins", &kaldi::MelBanksOptions::num_bins)
+ .def_readwrite("low_freq", &kaldi::MelBanksOptions::low_freq)
+ .def_readwrite("high_freq", &kaldi::MelBanksOptions::high_freq)
+ .def_readwrite("vtln_low", &kaldi::MelBanksOptions::vtln_low)
+ .def_readwrite("vtln_high", &kaldi::MelBanksOptions::vtln_high)
+ .def_readwrite("debug_mel", &kaldi::MelBanksOptions::debug_mel)
+ .def_readwrite("htk_mode", &kaldi::MelBanksOptions::htk_mode);
py::class_(m, "FbankOptions")
.def(py::init<>())
diff --git a/audio/paddleaudio/third_party/CMakeLists.txt b/audio/paddleaudio/third_party/CMakeLists.txt
index 4b85bada0..43288f39b 100644
--- a/audio/paddleaudio/third_party/CMakeLists.txt
+++ b/audio/paddleaudio/third_party/CMakeLists.txt
@@ -11,6 +11,5 @@ endif()
# kaldi
################################################################################
if (BUILD_KALDI)
- include_directories(${CMAKE_CURRENT_SOURCE_DIR})
- add_subdirectory(kaldi-native-fbank/csrc)
-endif()
+ add_subdirectory(kaldi)
+endif()
\ No newline at end of file
diff --git a/audio/paddleaudio/third_party/kaldi-native-fbank/csrc/CMakeLists.txt b/audio/paddleaudio/third_party/kaldi-native-fbank/csrc/CMakeLists.txt
deleted file mode 100644
index 176607fc0..000000000
--- a/audio/paddleaudio/third_party/kaldi-native-fbank/csrc/CMakeLists.txt
+++ /dev/null
@@ -1,22 +0,0 @@
-include_directories(${CMAKE_CURRENT_SOURCE_DIR}/../../)
-add_library(kaldi-native-fbank-core
- feature-fbank.cc
- feature-functions.cc
- feature-window.cc
- fftsg.c
- log.cc
- mel-computations.cc
- rfft.cc
-)
-# We are using std::call_once() in log.h,which requires us to link with -pthread
-if(NOT WIN32)
- target_link_libraries(kaldi-native-fbank-core -pthread)
-endif()
-
-if(KNF_HAVE_EXECINFO_H)
- target_compile_definitions(kaldi-native-fbank-core PRIVATE KNF_HAVE_EXECINFO_H=1)
-endif()
-
-if(KNF_HAVE_CXXABI_H)
- target_compile_definitions(kaldi-native-fbank-core PRIVATE KNF_HAVE_CXXABI_H=1)
-endif()
diff --git a/audio/paddleaudio/third_party/kaldi-native-fbank/csrc/feature-fbank.cc b/audio/paddleaudio/third_party/kaldi-native-fbank/csrc/feature-fbank.cc
deleted file mode 100644
index 740ee17e9..000000000
--- a/audio/paddleaudio/third_party/kaldi-native-fbank/csrc/feature-fbank.cc
+++ /dev/null
@@ -1,117 +0,0 @@
-/**
- * Copyright (c) 2022 Xiaomi Corporation (authors: Fangjun Kuang)
- *
- * See LICENSE for clarification regarding multiple authors
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-// This file is copied/modified from kaldi/src/feat/feature-fbank.cc
-//
-#include "kaldi-native-fbank/csrc/feature-fbank.h"
-
-#include <cmath>
-
-#include "kaldi-native-fbank/csrc/feature-functions.h"
-
-namespace knf {
-
-static void Sqrt(float *in_out, int32_t n) {
- for (int32_t i = 0; i != n; ++i) {
- in_out[i] = std::sqrt(in_out[i]);
- }
-}
-
-std::ostream &operator<<(std::ostream &os, const FbankOptions &opts) {
- os << opts.ToString();
- return os;
-}
-
-FbankComputer::FbankComputer(const FbankOptions &opts)
- : opts_(opts), rfft_(opts.frame_opts.PaddedWindowSize()) {
- if (opts.energy_floor > 0.0f) {
- log_energy_floor_ = logf(opts.energy_floor);
- }
-
- // We'll definitely need the filterbanks info for VTLN warping factor 1.0.
- // [note: this call caches it.]
- GetMelBanks(1.0f);
-}
-
-FbankComputer::~FbankComputer() {
- for (auto iter = mel_banks_.begin(); iter != mel_banks_.end(); ++iter)
- delete iter->second;
-}
-
-const MelBanks *FbankComputer::GetMelBanks(float vtln_warp) {
- MelBanks *this_mel_banks = nullptr;
-
- // std::map<float, MelBanks *>::iterator iter = mel_banks_.find(vtln_warp);
- auto iter = mel_banks_.find(vtln_warp);
- if (iter == mel_banks_.end()) {
- this_mel_banks = new MelBanks(opts_.mel_opts, opts_.frame_opts, vtln_warp);
- mel_banks_[vtln_warp] = this_mel_banks;
- } else {
- this_mel_banks = iter->second;
- }
- return this_mel_banks;
-}
-
-void FbankComputer::Compute(float signal_raw_log_energy, float vtln_warp,
- std::vector<float> *signal_frame, float *feature) {
- const MelBanks &mel_banks = *(GetMelBanks(vtln_warp));
-
- KNF_CHECK_EQ(signal_frame->size(), opts_.frame_opts.PaddedWindowSize());
-
- // Compute energy after window function (not the raw one).
- if (opts_.use_energy && !opts_.raw_energy) {
- signal_raw_log_energy = std::log(
- std::max(InnerProduct(signal_frame->data(), signal_frame->data(),
- signal_frame->size()),
- std::numeric_limits<float>::epsilon()));
- }
- rfft_.Compute(signal_frame->data()); // signal_frame is modified in-place
- ComputePowerSpectrum(signal_frame);
-
- // Use magnitude instead of power if requested.
- if (!opts_.use_power) {
- Sqrt(signal_frame->data(), signal_frame->size() / 2 + 1);
- }
-
- int32_t mel_offset = ((opts_.use_energy && !opts_.htk_compat) ? 1 : 0);
-
- // Its length is opts_.mel_opts.num_bins
- float *mel_energies = feature + mel_offset;
-
- // Sum with mel filter banks over the power spectrum
- mel_banks.Compute(signal_frame->data(), mel_energies);
-
- if (opts_.use_log_fbank) {
- // Avoid log of zero (which should be prevented anyway by dithering).
- for (int32_t i = 0; i != opts_.mel_opts.num_bins; ++i) {
- auto t = std::max(mel_energies[i], std::numeric_limits<float>::epsilon());
- mel_energies[i] = std::log(t);
- }
- }
-
- // Copy energy as first value (or the last, if htk_compat == true).
- if (opts_.use_energy) {
- if (opts_.energy_floor > 0.0 && signal_raw_log_energy < log_energy_floor_) {
- signal_raw_log_energy = log_energy_floor_;
- }
- int32_t energy_index = opts_.htk_compat ? opts_.mel_opts.num_bins : 0;
- feature[energy_index] = signal_raw_log_energy;
- }
-}
-
-} // namespace knf
diff --git a/audio/paddleaudio/third_party/kaldi-native-fbank/csrc/feature-fbank.h b/audio/paddleaudio/third_party/kaldi-native-fbank/csrc/feature-fbank.h
deleted file mode 100644
index 0ef3fac0d..000000000
--- a/audio/paddleaudio/third_party/kaldi-native-fbank/csrc/feature-fbank.h
+++ /dev/null
@@ -1,132 +0,0 @@
-/**
- * Copyright (c) 2022 Xiaomi Corporation (authors: Fangjun Kuang)
- *
- * See LICENSE for clarification regarding multiple authors
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-// This file is copied/modified from kaldi/src/feat/feature-fbank.h
-
-#ifndef KALDI_NATIVE_FBANK_CSRC_FEATURE_FBANK_H_
-#define KALDI_NATIVE_FBANK_CSRC_FEATURE_FBANK_H_
-
-#include