# TTS (Work in Progress...)

Here we have PyTorch implementations of:

- Tacotron: [A Fully End-to-End Text-To-Speech Synthesis Model](https://arxiv.org/abs/1703.10135).
- Tacotron2 (TODO): [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/pdf/1712.05884.pdf)

Ultimately, it should be easy to add new models and try different architectures.

You can find [here](https://www.evernote.com/shard/s146/sh/9544e7e9-d372-4610-a7b7-3ddcb63d5dac/d01d33837dab625229dec3cfb4cfb887) a brief note comparing possible TTS architectures.

## Requirements

It is highly recommended to use [miniconda](https://conda.io/miniconda.html) for easier installation.

* python 3.6
* pytorch > 0.2.0
* TODO
## Data

Currently TTS provides data loaders for:

- [LJ Speech](https://keithito.com/LJ-Speech-Dataset/)
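
For illustration, here is a minimal sketch of reading the LJ Speech layout (a ```metadata.csv``` plus a ```wavs/``` directory) with PyTorch's ```Dataset``` class; the class name and preprocessing are assumptions for illustration, not this repo's actual loader.

```
import os

import librosa
from torch.utils.data import Dataset


# Hypothetical minimal loader, assuming the standard LJ Speech layout:
# metadata.csv rows of "id|raw text|normalized text" and a wavs/ directory.
class LJSpeechDataset(Dataset):
    def __init__(self, root, sample_rate=20000):
        self.root = root
        self.sample_rate = sample_rate
        meta_path = os.path.join(root, "metadata.csv")
        with open(meta_path, encoding="utf-8") as f:
            self.items = [line.strip().split("|") for line in f if line.strip()]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        file_id, _, text = self.items[idx]
        wav_path = os.path.join(self.root, "wavs", file_id + ".wav")
        # resample to the "sample_rate" value used in config.json
        wav, _ = librosa.load(wav_path, sr=self.sample_rate)
        return text, wav
```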
## Training the network

To run your own training, you need to define a ```config.json``` file (a simple template is given below) and run the following command.

```train.py --config_path config.json```

If you would like to use a specific set of GPUs:

```CUDA_VISIBLE_DEVICES="0,1,4" train.py --config_path config.json```

Each run creates an experiment folder, named with the corresponding date and time, under the folder you set in ```config.json```. If there is no checkpoint under that folder yet, it is removed when you press Ctrl+C.
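
A minimal sketch of that housekeeping pattern; the folder naming scheme and checkpoint file extension below are assumptions for illustration, not the repo's exact behavior.

```
import glob
import os
import shutil
import time


def create_experiment_folder(output_path):
    # e.g. <output_path>/May-25-2018_03+45PM (naming scheme assumed)
    run_name = time.strftime("%b-%d-%Y_%I+%M%p")
    path = os.path.join(output_path, run_name)
    os.makedirs(path)
    return path


def remove_experiment_folder(path):
    # keep the folder only if a checkpoint was saved into it
    # (the ".pth" extension is an assumption)
    if not glob.glob(os.path.join(path, "*.pth")):
        shutil.rmtree(path)
```

In practice you would wrap the training loop in ```try/except KeyboardInterrupt``` and call ```remove_experiment_folder``` in the handler.
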
Example ```config.json```:

```
{
    // Data loading parameters
    "num_mels": 80,
    "num_freq": 1024,
    "sample_rate": 20000,
    "frame_length_ms": 50.0,
    "frame_shift_ms": 12.5,
    "preemphasis": 0.97,
    "min_level_db": -100,
    "ref_level_db": 20,
    "hidden_size": 128,
    "embedding_size": 256,
    "text_cleaner": "english_cleaners",

    // Training parameters
    "epochs": 2000,
    "lr": 0.001,
    "batch_size": 256,
    "griffin_lim_iters": 60,
    "power": 1.5,
    "r": 5, // number of decoder outputs for Tacotron

    // Number of data loader processes
    "num_loader_workers": 8,

    // Experiment logging parameters
    "checkpoint": true, // whether to save a checkpoint every save_step steps
    "save_step": 200,
    "data_path": "/path/to/KeithIto/LJSpeech-1.0",
    "output_path": "/path/to/my_experiment",
    "log_dir": "/path/to/my/tensorboard/logs/"
}
```
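
Note that the template above uses ```//``` comments, which a strict JSON parser rejects. Below is a minimal sketch of reading such a file by stripping the comments first; how this repo actually parses its config may differ.

```
import json
import re


def load_config(path):
    with open(path, "r") as f:
        text = f.read()
    # drop // comments; naive, assumes no "//" inside string values
    text = re.sub(r"//.*", "", text)
    return json.loads(text)


config = load_config("config.json")
print(config["num_mels"], config["r"])  # -> 80 5
```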