README updates: added models and methods section

2020-06-19 16:53:37 +02:00 · ec7aa4496e
parent 4b99eacb38
commit ec7aa4496e
1 changed file with 23 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -3,9 +3,7 @@

 <img src="https://travis-ci.org/mozilla/TTS.svg?branch=dev"/>

-This project is a part of [Mozilla Common Voice](https://voice.mozilla.org/en). TTS aims a deep learning based Text2Speech engine, low in cost and high in quality. To begin with, you can hear a sample generated voice from [here](https://soundcloud.com/user-565970875/commonvoice-loc-sens-attn).
-
-TTS includes two different model implementations which are based on [Tacotron](https://arxiv.org/abs/1703.10135) and [Tacotron2](https://arxiv.org/abs/1712.05884). Tacotron is smaller, efficient and easier to train but Tacotron2 provides better results, especially when it is combined with a Neural vocoder. Therefore, choose depending on your project requirements.
+This project is a part of [Mozilla Common Voice](https://voice.mozilla.org/en). TTS aims to be a deep learning based Text2Speech engine, low in cost and high in quality. To begin with, you can hear a sample synthesized voice [here](https://soundcloud.com/user-565970875/commonvoice-loc-sens-attn).

 If you are new, you can also find [here](http://www.erogol.com/text-speech-deep-learning-architectures/) a brief post about TTS architectures and their comparisons.

@@ -16,6 +14,27 @@ If you are new, you can also find [here](http://www.erogol.com/text-speech-deep-

 [Details...](https://github.com/mozilla/TTS/wiki/Mean-Opinion-Score-Results)

+## Provided Models and Methods
+Text-to-Spectrogram:
+- Tacotron: [paper](https://arxiv.org/abs/1703.10135)
+- Tacotron2: [paper](https://arxiv.org/abs/1712.05884)
+
+Attention Methods:
+- Guided Attention [paper](https://arxiv.org/abs/1710.08969)
+- Forward Backward Decoding [paper](https://arxiv.org/abs/1907.09006)
+- Graves Attention [paper](https://arxiv.org/abs/1910.10288)
+- Double Decoder Consistency [blog](https://erogol.com/solving-attention-problems-of-tts-models-with-double-decoder-consistency/)
+
+Speaker Encoder:
+- GE2E: [paper](https://arxiv.org/abs/1710.10467)
+
+Vocoders:
+- MelGAN: [paper](https://arxiv.org/abs/1910.06711)
+- MultiBandMelGAN: [paper](https://arxiv.org/abs/2005.05106)
+- GAN-TTS discriminators: [paper](https://arxiv.org/abs/1909.11646)
+
+You can also help us implement more models. Some TTS related work can be found [here](https://github.com/erogol/TTS-papers).
+
 ## Features
 - High performance Deep Learning models for Text2Speech related tasks.
    - Text2Speech models (Tacotron, Tacotron2).
@@ -56,7 +75,7 @@ Or you can use ```requirements.txt``` to install the requirements only.
 |   |- train.py         (train your TTS model.)
 |   |- distribute.py    (train your TTS model using Multiple GPUs)
 |   |- config.json      (TTS model configuration file)
-|   |- tf               (Tensorflow 2 utilities and model implementations)
+|   |- tf/              (Tensorflow 2 utilities and model implementations)
 |   |- layers/          (model layer definitions)
 |   |- models/          (model definitions)
 |   |- notebooks/       (Jupyter Notebooks for model evaluation and parameter selection)