Mozilla TTS Vocoders (Experimental)
This folder contains vocoder model implementations that can be combined with the other TTS models.
Currently, the following models are implemented:
- MelGAN
- MultiBand-MelGAN
- ParallelWaveGAN
- GAN-TTS (Discriminator Only)
It is also easy to adapt other vocoder models, since we provide a flexible and modular (but not too modular) framework.
Training a model
An example Colab notebook that trains MelGAN on the LJSpeech dataset will be available soon.
To train a new model, gather all wav files into a single folder and set data_path in config.json to that folder.
Define the other relevant parameters in your config.json, then start training with the following command.
CUDA_VISIBLE_DEVICES='0' python tts/bin/train_vocoder.py --config_path path/to/config.json
Example config files can be found in the tts/vocoder/configs/ folder.
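The data preparation step described above can be sketched in Python. This is a minimal sketch and not part of the repository: the helper name prepare_vocoder_data is hypothetical, and it assumes that your config file is plain JSON without comments and that data_path is a top-level key, as the text above suggests.

```python
import json
import shutil
from pathlib import Path


def prepare_vocoder_data(source_dirs, data_dir, config_in, config_out):
    """Copy every .wav file into one folder and point data_path at it.

    Hypothetical helper for illustration; not part of the TTS code base.
    Assumes the config file is plain JSON (no comments) and that data_dir
    lies outside of the source directories.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)

    copied = 0
    for source in source_dirs:
        # Search recursively so nested speaker/session folders are included.
        for wav in sorted(Path(source).rglob("*.wav")):
            target = data_dir / wav.name
            if target.exists():
                # Refuse to silently overwrite files that share a name.
                raise FileExistsError(f"duplicate file name: {wav.name}")
            shutil.copy2(wav, target)
            copied += 1

    # Load an example config, set data_path, and write a new config file.
    with open(config_in, encoding="utf-8") as f:
        config = json.load(f)
    config["data_path"] = str(data_dir)
    with open(config_out, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)

    return copied
```

The returned count is a quick sanity check that the expected number of wav files was collected. The file written to config_out can then be passed to --config_path.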
You can continue a previous training run with the following command.
CUDA_VISIBLE_DEVICES='0' python tts/bin/train_vocoder.py --continue_path path/to/your/model/folder
You can fine-tune a pre-trained model with the following command.
CUDA_VISIBLE_DEVICES='0' python tts/bin/train_vocoder.py --restore_path path/to/your/model.pth
Restoring a model starts a new training run in a different folder and loads only the model weights from the given checkpoint file. Continuing a training run, in contrast, resumes in the same directory where the previous run left off.
You can also follow your training runs on TensorBoard, as you do with our TTS models.
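If you launch training from a script, the three invocations above can be wrapped in a small helper. This is only a sketch: the mode names ("new", "continue", "finetune") and both function names are made up for illustration, while the script path and the flags are copied from the commands above.

```python
import os
import subprocess

# Script path as used in the commands above.
TRAIN_SCRIPT = "tts/bin/train_vocoder.py"

# Flags are taken from the commands above; the mode names are illustrative.
MODE_FLAGS = {
    "new": "--config_path",         # start a new run from a config file
    "continue": "--continue_path",  # resume in the previous run's folder
    "finetune": "--restore_path",   # load weights only, train in a new folder
}


def build_train_command(mode, path):
    """Return the argument list for one of the three training modes."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["python", TRAIN_SCRIPT, MODE_FLAGS[mode], str(path)]


def run_training(mode, path, gpu="0"):
    """Run the training script with CUDA_VISIBLE_DEVICES set to `gpu`."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    return subprocess.run(build_train_command(mode, path), env=env, check=True)
```

For example, run_training("continue", "path/to/your/model/folder") corresponds to the continue command shown above.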
Acknowledgement
Thanks to @kan-bayashi, whose repository was the starting point of our work.