# 🐸💬 TTS Training Recipes
TTS recipes are intended to host scripts that run all the necessary steps to train a TTS model on a particular dataset. For each dataset, you need to download the data only once; then you can run the training for whichever model you want.
Run each script from the root TTS folder as follows.
```bash
$ sh ./recipes/<dataset>/download_<dataset>.sh
$ python recipes/<dataset>/<model_name>/train.py
```
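For example, for LJSpeech the two steps might look like this (the script names below follow the layout of the `ljspeech` recipe folder; check the recipe directory for the exact file names of other models):

```bash
$ sh ./recipes/ljspeech/download_ljspeech.sh
$ python recipes/ljspeech/glow_tts/train_glowtts.py
```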
For some datasets you might need to resample the audio files first. For example, the VCTK dataset can be resampled to 22050 Hz as follows.
```bash
python TTS/bin/resample.py \
    --input_dir recipes/vctk/VCTK/wav48_silence_trimmed \
    --output_sr 22050 \
    --output_dir recipes/vctk/VCTK/wav48_silence_trimmed \
    --n_jobs 8 \
    --file_ext flac
```
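Since `--input_dir` and `--output_dir` point to the same folder here, the files are resampled in place. Conceptually, the step amounts to the loop below; this is a minimal single-process sketch assuming `librosa` and `soundfile` are installed, and `TTS/bin/resample.py` remains the supported tool (it also parallelizes the work via `--n_jobs`):

```python
import glob

import librosa
import soundfile as sf

TARGET_SR = 22050  # matches --output_sr above

# Walk the dataset and rewrite every FLAC file at the target sample rate.
for path in glob.glob("recipes/vctk/VCTK/wav48_silence_trimmed/**/*.flac", recursive=True):
    audio, _ = librosa.load(path, sr=TARGET_SR)  # decode and resample
    sf.write(path, audio, TARGET_SR)             # overwrite in place
```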
If you train a new model using TTS, feel free to share your training recipe to expand the list.
You can also open a new discussion and share your progress with the 🐸 community.