diff --git a/docs/source/finetuning.md b/docs/source/finetuning.md
new file mode 100644
index 00000000..35cd6f19
--- /dev/null
+++ b/docs/source/finetuning.md
@@ -0,0 +1,115 @@
+# Fine-tuning a 🐸 TTS model
+
+## Fine-tuning
+
+Fine-tuning takes a pre-trained model and retrains it to improve its performance on a different task or dataset.
+In 🐸TTS we provide pre-trained models in different languages, each with its own pros and cons. You can take one of
+them and fine-tune it for your own dataset. This will help you in two main ways:
+
+1. Faster learning
+
+    Since a pre-trained model has already learned features that are relevant for the task, it will converge faster on
+    a new dataset. This will reduce the cost of training and let you experiment faster.
+
+2. Better results with small datasets
+
+    Deep learning models are data hungry and they give better performance with more data. However, it is not always
+    possible to have this abundance, especially in specific domains. For instance, the LJSpeech dataset, which we
+    released most of our English models with, is almost 24 hours long. Collecting this amount of data with the help
+    of a voice talent takes weeks.
+
+    Fine-tuning comes to the rescue in this case. You can take one of our pre-trained models, fine-tune it on your own
+    speech dataset, and achieve reasonable results with only a couple of hours of data in the worst case.
+
+    However, note that fine-tuning does not promise great results. The model performance still depends on the
+    {ref}`dataset quality <what_makes_a_good_dataset>` and the hyper-parameters you choose for fine-tuning. Therefore,
+    it still demands a bit of tinkering.
+
+
+## Steps to fine-tune a 🐸 TTS model
+
+1. Setup your dataset.
+
+    You need to format your target dataset in a certain way so that the 🐸TTS data loader is able to load it for
+    training. Please see {ref}`this page <formatting_your_dataset>` for more information about formatting.
+
+2. Choose the model you want to fine-tune.
+
+    You can list the available models on the terminal as
+
+    ```bash
+    tts --list-models
+    ```
+
+    The command above lists the models in a naming format as ```<model_type>/<language>/<dataset>/<model_name>```.
+
+    Or you can manually check the `.models.json` file in the project directory.
+
+    You should choose the model based on your requirements. Some models are fast and some are better in speech quality.
+    One lazy way to check a model is running it on the hardware you want to use and seeing how it works. For
+    simple testing, you can use the `tts` command on the terminal. For more info see {ref}`here <synthesizing_speech>`.
+
+3. Download the model.
+
+    You can download the model with the `tts` command. If you run `tts` with a particular model, it will be downloaded
+    automatically and the model path will be printed on the terminal.
+
+    ```bash
+    tts --model_name tts_models/es/mai/tacotron2-DDC --text "Ola."
+
+    > Downloading model to /home/ubuntu/.local/share/tts/tts_models--es--mai--tacotron2-DDC
+    ...
+    ```
+
+    In the example above, we called the Spanish Tacotron model, and the sample output shows the path where
+    the model was downloaded.
+
+4. Setup the model config for fine-tuning.
+
+    You need to change certain fields in the model config. You have 3 options for playing with the configuration.
+
+    1. Edit the fields in the ```config.json``` file if you want to use ```TTS/bin/train_tts.py``` to train the model.
+    2. Edit the fields in one of the training scripts in the ```recipes``` directory if you want to use Python.
+    3. Use the command-line arguments to override the fields, like ```--coqpit.lr 0.00001``` to change the learning rate.
+
+    Some of the important fields are as follows:
+
+    - `datasets` field: This is set to the dataset you want to fine-tune the model on.
+    - `run_name` field: This is the name of the run. It is used to name the output directory and the entry in the
+      logging dashboard.
+    - `output_path` field: This is the path where the fine-tuned model is saved.
+    - `lr` field: You may need to use a smaller learning rate for fine-tuning so as not to impair the features learned
+      by the pre-trained model with big update steps.
+    - `audio` fields: Different datasets have different audio characteristics. You must check the current audio
+      parameters and make sure that the values reflect your dataset. For instance, your dataset might have a different
+      audio sampling rate.
+
+    Apart from the fields above, you should check the whole configuration file and make sure that the values are
+    correct for your dataset and training.
+
+5. Start fine-tuning.
+
+    Whether you use one of the training scripts under the ```recipes``` folder or ```train_tts.py``` to start
+    your training, you should use the ```--restore_path``` flag to specify the path to the pre-trained model.
+
+    ```bash
+    CUDA_VISIBLE_DEVICES="0" python recipes/ljspeech/glow_tts/train_glowtts.py \
+        --restore_path /home/ubuntu/.local/share/tts/tts_models--en--ljspeech--glow-tts
+    ```
+
+    ```bash
+    CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_tts.py \
+        --config_path /home/ubuntu/.local/share/tts/tts_models--en--ljspeech--glow-tts/config.json \
+        --restore_path /home/ubuntu/.local/share/tts/tts_models--en--ljspeech--glow-tts
+    ```
+
+    As stated above, you can also use command-line arguments to change the model configuration.
+
+    ```bash
+    CUDA_VISIBLE_DEVICES="0" python recipes/ljspeech/glow_tts/train_glowtts.py \
+        --restore_path /home/ubuntu/.local/share/tts/tts_models--en--ljspeech--glow-tts \
+        --coqpit.run_name "glow-tts-finetune" \
+        --coqpit.lr 0.00001
+    ```
+
diff --git a/docs/source/formatting_your_dataset.md b/docs/source/formatting_your_dataset.md
index cc0e456a..cbefc61d 100644
--- a/docs/source/formatting_your_dataset.md
+++ b/docs/source/formatting_your_dataset.md
@@ -1,3 +1,4 @@
+(formatting_your_dataset)=
 # Formatting Your Dataset
 
 For training a TTS model, you need a dataset with speech recordings and transcriptions.
 The speech must be divided into audio clips and each clip needs a transcription.
@@ -18,15 +19,15 @@
 
 Let's assume you created the audio clips and their transcriptions. You can collect
 You can either create separate transcription files for each clip or create a text file that maps each audio clip to its
 transcription. In this file, each line must be delimited by a special character separating the audio file name from the
 transcription. And make sure that the delimiter is not used in the transcription text.
 
-We recommend the following format delimited by `|`.
+We recommend the following format delimited by `||`.
 
 ```
 # metadata.txt
 
-audio1.wav | This is my sentence.
-audio2.wav | This is maybe my sentence.
-audio3.wav | This is certainly my sentence.
-audio4.wav | Let this be your sentence.
+audio1.wav || This is my sentence.
+audio2.wav || This is maybe my sentence.
+audio3.wav || This is certainly my sentence.
+audio4.wav || Let this be your sentence.
 ...
 ```
diff --git a/docs/source/index.md b/docs/source/index.md
index fab14d12..d5f77ad4 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -22,6 +22,7 @@
 inference
 implementing_a_new_model
 training_a_model
+finetuning
 configuration
 formatting_your_dataset
 what_makes_a_good_dataset
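
The `metadata.txt` format this patch recommends is simple enough to sanity-check before launching a fine-tuning run. As a minimal sketch (the `load_metadata` helper below is hypothetical, not part of the 🐸TTS API or this patch), a `||`-delimited mapping file could be parsed like this:

```python
from pathlib import Path


def load_metadata(path, delimiter="||"):
    """Parse a clip-to-transcription mapping file into (audio, text) pairs.

    Hypothetical helper for sanity-checking a dataset before training;
    not part of the 🐸TTS API.
    """
    items = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "# metadata.txt"
        audio, sep, text = line.partition(delimiter)
        if not sep:
            raise ValueError(f"missing delimiter {delimiter!r} in line: {line!r}")
        items.append((audio.strip(), text.strip()))
    return items
```

Running this over the example file above yields pairs like `('audio1.wav', 'This is my sentence.')`, and it raises a `ValueError` on any line where the delimiter was accidentally dropped, which is exactly the kind of formatting error that otherwise surfaces much later inside the data loader.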