
Tutorial for nervous beginners

First, install Coqui TTS.

Synthesizing Speech

You can run tts to synthesize speech directly from the terminal.

$ tts -h # see the help
$ tts --list_models  # list the available models.
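For scripting, the same CLI can be driven from Python. This is a minimal sketch that only builds the argument list and shells out with subprocess. It assumes the tts executable is on your PATH and uses the --text, --model_name and --out_path options as listed by tts -h. The helper names build_tts_command and synthesize are hypothetical, not part of Coqui TTS.

```python
import shutil
import subprocess


def build_tts_command(text, model_name, out_path):
    """Build the argv list for one `tts` synthesis call.

    Assumption: the flags match those printed by `tts -h` on your install.
    """
    return ["tts", "--text", text, "--model_name", model_name, "--out_path", out_path]


def synthesize(text, model_name, out_path):
    """Run the tts CLI; raise if it is not installed or exits with an error."""
    if shutil.which("tts") is None:
        raise FileNotFoundError("tts executable not found on PATH; install Coqui TTS first")
    # Passing a list (no shell=True) avoids quoting problems with the input text.
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(build_tts_command(text, model_name, out_path), check=True)
```

For example, synthesize("Hello world.", "tts_models/en/ljspeech/glow-tts", "hello.wav") would write hello.wav. The model name is only an example; pick one from tts --list_models.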


You can run tts-server to start a local demo server that you can open in your favorite web browser and 🗣️ (make sure to install the additional server dependencies first with pip install "coqui-tts[server]").

$ tts-server -h # see the help
$ tts-server --list_models  # list the available models.
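Besides the browser UI, the demo server can be queried over HTTP. This is a minimal sketch, assuming the server runs locally on its default port 5002 and exposes an /api/tts endpoint that takes the input as a text query parameter and returns WAV audio; verify both against tts-server -h and your installed version. The helper names tts_url and fetch_wav are hypothetical.

```python
import urllib.parse
import urllib.request


def tts_url(text, host="localhost", port=5002):
    """Build a GET URL for the server's synthesis endpoint.

    Assumption: port 5002 and the /api/tts path with a `text` parameter.
    """
    query = urllib.parse.urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"


def fetch_wav(text, out_path, **kwargs):
    """Request audio from a running tts-server, save it, and return its size in bytes."""
    with urllib.request.urlopen(tts_url(text, **kwargs)) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)
```

urlencode takes care of escaping spaces and punctuation in the input text, so arbitrary sentences can be sent safely.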


See this page for more details on synthesizing speech with the CLI, server or Python API.

Training a TTS Model

Here is a breakdown of a simple script that trains a GlowTTS model on the LJSpeech dataset. For a more in-depth guide to training and fine-tuning, also see this page.

Pure Python Way

  1. Download your dataset.

    In this example, we download and use the LJSpeech dataset. Set the download directory based on your preferences.

    $ python -c 'from TTS.utils.downloaders import download_ljspeech; download_ljspeech("../recipes/ljspeech/");'
    
  2. Define train.py.

  3. Run the script.

    CUDA_VISIBLE_DEVICES=0 python train.py
    
    • Continue a previous run.

      CUDA_VISIBLE_DEVICES=0 python train.py --continue_path path/to/previous/run/folder/
      
    • Fine-tune a model.

      CUDA_VISIBLE_DEVICES=0 python train.py --restore_path path/to/model/checkpoint.pth
      
    • Run multi-GPU training.

      CUDA_VISIBLE_DEVICES=0,1,2 python -m trainer.distribute --script train.py
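Before training, it can help to check that the download finished and that the transcripts line up with the audio files. The sketch below reads LJSpeech's metadata.csv, whose lines are pipe-separated as ID|raw transcript|normalized transcript, and pairs each normalized transcript with wavs/<ID>.wav. It approximates what we understand the ljspeech formatter to do; read_ljspeech_metadata and missing_wavs are hypothetical helpers, not part of the TTS API.

```python
import os


def read_ljspeech_metadata(root_path, meta_file="metadata.csv"):
    """Parse an LJSpeech-style metadata file into {"audio_file", "text"} dicts.

    Assumption: each line is `ID|raw text|normalized text` and audio lives in
    <root_path>/wavs/<ID>.wav; the normalized (third) column is used.
    """
    items = []
    with open(os.path.join(root_path, meta_file), encoding="utf-8") as f:
        for line in f:
            cols = line.rstrip("\n").split("|")
            if len(cols) < 3:
                continue  # skip blank or malformed lines
            wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
            items.append({"audio_file": wav_file, "text": cols[2]})
    return items


def missing_wavs(items):
    """Return audio paths that do not exist on disk, e.g. after a partial download."""
    return [it["audio_file"] for it in items if not os.path.isfile(it["audio_file"])]
```

For the download in step 1, you would call read_ljspeech_metadata("../recipes/ljspeech/LJSpeech-1.1/"), assuming the archive extracts into an LJSpeech-1.1 subfolder as the config.json below suggests.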
      

CLI Way

We still support starting training from the CLI, as in the old days. The same training run can also be launched as follows.

  1. Define your config.json.

    {
        "run_name": "my_run",
        "model": "glow_tts",
        "batch_size": 32,
        "eval_batch_size": 16,
        "num_loader_workers": 4,
        "num_eval_loader_workers": 4,
        "run_eval": true,
        "test_delay_epochs": -1,
        "epochs": 1000,
        "text_cleaner": "english_cleaners",
        "use_phonemes": false,
        "phoneme_language": "en-us",
        "phoneme_cache_path": "phoneme_cache",
        "print_step": 25,
        "print_eval": true,
        "mixed_precision": false,
        "output_path": "recipes/ljspeech/glow_tts/",
        "datasets":[{"formatter": "ljspeech", "meta_file_train":"metadata.csv", "path": "recipes/ljspeech/LJSpeech-1.1/"}]
    }
    
  2. Start training.

    $ CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_tts.py --config_path config.json
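A typo in a dataset path often only surfaces once training starts. As a hypothetical pre-flight check (not part of Coqui TTS), the sketch below loads config.json and reports missing top-level keys and metadata files that cannot be found. Which keys count as required is our own assumption, and relative paths are resolved against the current working directory, so run it from the same directory you launch train_tts.py from.

```python
import json
import os

# Hypothetical choice of keys to require; adjust to your own config.
REQUIRED_KEYS = ("model", "output_path", "datasets")


def check_config(config_path):
    """Load a training config.json and return a list of problems (empty if none found)."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    # Each dataset entry points at a folder plus a metadata file inside it.
    for ds in config.get("datasets", []):
        meta = os.path.join(ds.get("path", ""), ds.get("meta_file_train", ""))
        if not os.path.isfile(meta):
            problems.append(f"metadata file not found: {meta}")
    return problems
```

With the config.json above, check_config("config.json") returns an empty list only if recipes/ljspeech/LJSpeech-1.1/metadata.csv exists relative to where you run it.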
    

Training a Vocoder Model

Note that you can also train vocoder models with train_vocoder.py, in the same way as the TTS models above.