Speech Application based on PaddleSpeech
(Simplified Chinese|English)
This directory contains speech application demos covering a range of scenarios.
- audio searching - mass audio similarity retrieval
- audio tagging - multi-label tagging of an audio file
- automatic_video_subtitles - generate subtitles from a video
- metaverse - 2D AR with TTS
- punctuation_restoration - restore punctuation from raw text
- speech recognition - transcribe an audio file to text
- speech server - server for speech tasks, e.g. ASR, TTS, CLS
- streaming asr server - receive an audio stream over WebSocket and transcribe it
- streaming tts server - receive text over HTTP or WebSocket and stream back synthesized audio
- speech translation - end to end speech translation
- story talker - book reader based on OCR and TTS
- style_fs2 - multi-style control for the FastSpeech2 model
- text_to_speech - convert text into speech
- self-supervised pretraining - speech feature extraction and speech recognition based on wav2vec2
- Whisper - speech recognition and translation based on the Whisper model