TTS Papers

Text Frontend

Polyphone

【g2pM】g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT

Text Normalization

English

applenob/text_normalization

G2P

English

cmusphinx/g2p-seq2seq

Acoustic Models

Vocoders

【RefineGAN】RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses
【Fre-GAN】Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
【StyleMelGAN】StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization
【Multi-band MelGAN】Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
【HiFi-GAN】HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
【VocGAN】VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
【Parallel WaveGAN】Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
【MelGAN】MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
【WaveFlow】WaveFlow: A Compact Flow-based Model for Raw Audio
【LPCNet】LPCNet: Improving Neural Speech Synthesis Through Linear Prediction
【WaveRNN】Efficient Neural Audio Synthesis

GAN TTS

【GAN TTS】High Fidelity Speech Synthesis with Adversarial Networks

Voice Cloning