# TTS (Work in Progress...)
Here we have a PyTorch implementation of:
- Tacotron: [A Fully End-to-End Text-To-Speech Synthesis Model](https://arxiv.org/abs/1703.10135).
- Tacotron2 (TODO): [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/pdf/1712.05884.pdf)

In the end, it should be easy to add new models and try different architectures.

You can find [here](https://www.evernote.com/shard/s146/sh/9544e7e9-d372-4610-a7b7-3ddcb63d5dac/d01d33837dab625229dec3cfb4cfb887) a brief note about possible TTS architectures and their comparisons.
## Requirements
We highly recommend using [miniconda](https://conda.io/miniconda.html) for easier installation.
* python 3.6
* pytorch > 0.2.0
* TODO
## Data
Currently, TTS provides data loaders for:
- [LJ Speech](https://keithito.com/LJ-Speech-Dataset/)
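The LJ Speech distribution follows a standard layout: a `metadata.csv` with pipe-separated fields (`id|raw text|normalized text`) and a `wavs/` folder of audio files. A minimal parsing sketch, independent of this repo's own loader (`read_ljspeech_metadata` is a hypothetical helper, not part of TTS):

```python
import os

def read_ljspeech_metadata(root):
    """Parse LJ Speech's metadata.csv ('id|raw text|normalized text')
    and yield (wav_path, text) pairs, preferring the normalized text."""
    items = []
    with open(os.path.join(root, "metadata.csv"), encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("|")
            wav_path = os.path.join(root, "wavs", parts[0] + ".wav")
            # Fall back to the raw text if the normalized column is missing.
            text = parts[2] if len(parts) > 2 else parts[1]
            items.append((wav_path, text))
    return items
```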
## Training the network
To run your own training, you need to define a ```config.json``` file (a simple template is below) and call it with the command:

```train.py --config_path config.json```

If you would like to use a specific set of GPUs:

```CUDA_VISIBLE_DEVICES="0,1,4" train.py --config_path config.json```

Each run creates an experiment folder with the corresponding date and time, under the folder you set in ```config.json```. If there is no checkpoint yet under that folder, it is removed when you press Ctrl+C.
Example ```config.json```:
```
{
  // Data loading parameters
  "num_mels": 80,
  "num_freq": 1024,
  "sample_rate": 20000,
  "frame_length_ms": 50.0,
  "frame_shift_ms": 12.5,
  "preemphasis": 0.97,
  "min_level_db": -100,
  "ref_level_db": 20,
  "hidden_size": 128,
  "embedding_size": 256,
  "text_cleaner": "english_cleaners",

  // Training parameters
  "epochs": 2000,
  "lr": 0.001,
  "batch_size": 256,
  "griffin_lim_iters": 60,
  "power": 1.5,
  "r": 5,  // number of decoder outputs for Tacotron

  // Number of data loader processes
  "num_loader_workers": 8,

  // Experiment logging parameters
  "checkpoint": true,  // save a checkpoint every save_step steps
  "save_step": 200,
  "data_path": "/path/to/KeithIto/LJSpeech-1.0",
  "output_path": "/path/to/my_experiment",
  "log_dir": "/path/to/my/tensorboard/logs/"
}
```
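Note that the template above contains ```//``` comments, which the standard ```json``` module rejects. If you want to inspect such a config outside the training script, one way (a sketch under that assumption; ```load_config``` is a hypothetical helper, not the repo's own loader) is to strip the comments before parsing:

```python
import json
import re

def load_config(path):
    """Load a config.json that may contain //-style line comments,
    which json.loads() would reject."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # Naive comment stripping: assumes no "//" occurs inside string
    # values (e.g. no "https://..." URLs in the config).
    text = re.sub(r"//[^\n]*", "", text)
    return json.loads(text)
```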