* Fix checkpointing GAN models (#1641)
* checkpoint save step crash fix
* checkpoint save step crash fix
* Update gan.py
applied requested changes
* crash fix
* Fix: the --model_name and --vocoder_name arguments need a <model_type> element (#1469)
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Fix Publish CI (#1597)
* Try out manylinux
* temporary removal of useless pipeline
* remove check and use only manylinux
* Try --plat-name
* Add install requirements
* Add back other actions
* Add PR trigger
* Remove conditions
* Fix syntax
* Roll back some changes
* Add other python versions
* Add test pypi upload
* Add username
* Add back __token__ as username
* Modify name of entry to testpypi
* Set it to release only
* Fix version checking
* Fix tokenizer for punc only (#1717)
* Remove redundant config field
* Fix SSIM loss
* Separate loss tests
* Fix BCELoss, addressing #1192
* Make style
* Add durations as aux input for VITS (#1694)
* Add durations as aux input for VITS
* Make style
* Fix tts_tests
* Fix test_get_aux_input
* Make lint
* feat: updated recipes and lr fix (#1718)
- updated the recipes, activating more losses for more stable training
- re-enabled guided attention loss
- fixed a bug where the wrong learning rate was fetched for logging
* Implement VitsAudioConfig (#1556)
* Implement VitsAudioConfig
* Update VITS LJSpeech recipe
* Update VITS VCTK recipe
* Make style
* Add missing decorator
* Add missing param
* Make style
* Update recipes
* Fix test
* Bug fix
* Exclude tests folder
* Make linter
* Make style
* Fix device allocation
* Fix SSIM loss correction
* Fix aux tests (#1753)
* Set n_jobs to 1 for resample script
* Delete resample test
* Set n_jobs 1 in vad test
* delete vad test
* Revert "Delete resample test"
This reverts commit
Contents: configs/, datasets/, layers/, models/, utils/, README.md, __init__.py, pqmf_output.wav
README.md
Mozilla TTS Vocoders (Experimental)
Here you can find vocoder model implementations that can be combined with the other TTS models.
Currently, the following models are implemented:
- Melgan
- MultiBand-Melgan
- ParallelWaveGAN
- GAN-TTS (Discriminator Only)
It is also easy to adapt different vocoder models, as we provide a flexible and modular (but not too modular) framework.
Training a model
An example Colab notebook training MelGAN on the LJSpeech dataset will be available soon.
To train a new model, gather all wav files into a folder and set that folder as `data_path`
in `config.json`.
Define the other relevant parameters in your `config.json`,
then start training with the following command.
```
CUDA_VISIBLE_DEVICES='0' python tts/bin/train_vocoder.py --config_path path/to/config.json
```
Example config files can be found under the tts/vocoder/configs/ folder.
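For illustration, a minimal config sketch could look like the following; every field name here other than `data_path` is an assumption made for illustration, so consult the example configs under tts/vocoder/configs/ for the actual schema:

```json
{
  "run_name": "melgan_ljspeech",
  "data_path": "/path/to/your/wavs/",
  "sample_rate": 22050,
  "batch_size": 32,
  "epochs": 1000
}
```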
You can continue a previous training run with the following command.
```
CUDA_VISIBLE_DEVICES='0' python tts/bin/train_vocoder.py --continue_path path/to/your/model/folder
```
You can fine-tune a pre-trained model with the following command.
```
CUDA_VISIBLE_DEVICES='0' python tts/bin/train_vocoder.py --restore_path path/to/your/model.pth
```
Restoring a model starts a new training run in a different folder; only the model weights are loaded from the given checkpoint file. Continuing a training run, in contrast, resumes from the same directory where the previous run left off.
You can also follow your training runs on TensorBoard, as you do with our TTS models.
Acknowledgement
Thanks to @kan-bayashi for his repository, which was the starting point of our work.