diff --git a/README.md b/README.md
index df143174..bf25c558 100644
--- a/README.md
+++ b/README.md
@@ -3,9 +3,7 @@
-This project is a part of [Mozilla Common Voice](https://voice.mozilla.org/en). TTS aims a deep learning based Text2Speech engine, low in cost and high in quality. To begin with, you can hear a sample generated voice from [here](https://soundcloud.com/user-565970875/commonvoice-loc-sens-attn).
-
-TTS includes two different model implementations which are based on [Tacotron](https://arxiv.org/abs/1703.10135) and [Tacotron2](https://arxiv.org/abs/1712.05884). Tacotron is smaller, efficient and easier to train but Tacotron2 provides better results, especially when it is combined with a Neural vocoder. Therefore, choose depending on your project requirements.
+This project is a part of [Mozilla Common Voice](https://voice.mozilla.org/en). TTS aims a deep learning based Text2Speech engine, low in cost and high in quality. To begin with, you can hear a sample synthesized voice from [here](https://soundcloud.com/user-565970875/commonvoice-loc-sens-attn).
 If you are new, you can also find [here](http://www.erogol.com/text-speech-deep-learning-architectures/) a brief post about TTS architectures and their comparisons.
@@ -16,6 +14,27 @@ If you are new, you can also find [here](http://www.erogol.com/text-speech-deep-
 [Details...](https://github.com/mozilla/TTS/wiki/Mean-Opinion-Score-Results)
+## Provided Models and Methods
+Text-to-Spectrogram:
+- Tacotron: [paper](https://arxiv.org/abs/1703.10135)
+- Tacotron2: [paper](https://arxiv.org/abs/1712.05884)
+
+Attention Methods:
+- Guided Attention [paper](https://arxiv.org/abs/1710.08969)
+- Forward Backward Decoding [paper](https://arxiv.org/abs/1907.09006)
+- Graves Attention [paper](https://arxiv.org/abs/1907.09006)
+- Double Decoder Consistency [blog](https://erogol.com/solving-attention-problems-of-tts-models-with-double-decoder-consistency/)
+
+Speaker Encoder:
+- GE2E: [paper](https://arxiv.org/abs/1710.10467)
+
+Vocoders:
+- MelGAN: [paper](https://arxiv.org/abs/1710.10467)
+- MultiBandMelGAN: [paper](https://arxiv.org/abs/2005.05106)
+- GAN-TTS discriminators: [paper](https://arxiv.org/abs/1909.11646)
+
+You can also help us implement more models. Some TTS related work can be found [here](https://github.com/erogol/TTS-papers).
+
 ## Features
 - High performance Deep Learning models for Text2Speech related tasks.
 - Text2Speech models (Tacotron, Tacotron2).
@@ -56,7 +75,7 @@ Or you can use ```requirements.txt``` to install the requirements only.
 | |- train.py (train your TTS model.)
 | |- distribute.py (train your TTS model using Multiple GPUs)
 | |- config.json (TTS model configuration file)
-| |- tf (Tensorflow 2 utilities and model implementations)
+| |- tf/ (Tensorflow 2 utilities and model implementations)
 | |- layers/ (model layer definitions)
 | |- models/ (model definitions)
 | |- notebooks/ (Jupyter Notebooks for model evaluation and parameter selection)