TTS Papers
Text Frontend
Polyphone
- 【g2pM】g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset
- Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
- Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning
- WikipediaHomographData
Text Normalization
English
G2P
English
Acoustic Models
- 【AdaSpeech3】AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style
- 【AdaSpeech2】AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data
- 【AdaSpeech】AdaSpeech: Adaptive Text to Speech for Custom Voice
- 【FastSpeech2】FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
- 【FastPitch】FastPitch: Parallel Text-to-speech with Pitch Prediction
- 【SpeedySpeech】SpeedySpeech: Efficient Neural Speech Synthesis
- 【FastSpeech】FastSpeech: Fast, Robust and Controllable Text to Speech
- 【Transformer TTS】Neural Speech Synthesis with Transformer Network
- 【Tacotron2】Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Vocoders
- 【RefineGAN】RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses
- 【Fre-GAN】Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
- 【StyleMelGAN】StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization
- 【Multi-band MelGAN】Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
- 【HiFi-GAN】HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
- 【VocGAN】VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
- 【Parallel WaveGAN】Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
- 【MelGAN】MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
- 【WaveFlow】WaveFlow: A Compact Flow-based Model for Raw Audio
- 【LPCNet】LPCNet: Improving Neural Speech Synthesis Through Linear Prediction
- 【WaveRNN】Efficient Neural Audio Synthesis