From 71e24f422271025872ed313fb27fecd323dc4342 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Wed, 27 Jan 2021 11:21:50 +0100 Subject: [PATCH 01/18] Update README.md --- README.md | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 9fb5deb2..7d378f05 100644 --- a/README.md +++ b/README.md @@ -157,16 +157,35 @@ Some of the public datasets that we successfully applied TTS: After the installation, TTS provides a CLI interface for synthesizing speech using pre-trained models. You can either use your own model or the release models under the TTS project. Listing released TTS models. -```tts --list_models``` +``` +tts --list_models +``` Run a tts and a vocoder model from the released model list. (Simply copy and paste the full model names from the list as arguments for the command below.) -```tts --text "Text for TTS" --model_name "///" --vocoder_name "///" --output_path``` +```console +tts --text "Text for TTS" \ + --model_name "///" \ + --vocoder_name "///" \ + --out_path folder/to/save/output/ +``` Run your own TTS model (Using Griffin-Lim Vocoder) -```tts --text "Text for TTS" --model_path path/to/model.pth.tar --config_path path/to/config.json --out_path output/path/speech.wav``` +```console +tts --text "Text for TTS" \ + --model_path path/to/model.pth.tar \ + --config_path path/to/config.json \ + --out_path output/path/speech.wav +``` Run your own TTS and Vocoder models -```tts --text "Text for TTS" --model_path path/to/config.json --config_path path/to/model.pth.tar --out_path output/path/speech.wav --vocoder_path path/to/vocoder.pth.tar --vocoder_config_path path/to/vocoder_config.json``` +```console +tts --text "Text for TTS" \ + --model_path path/to/config.json \ + --config_path path/to/model.pth.tar \ + --out_path output/path/speech.wav \ + --vocoder_path path/to/vocoder.pth.tar \ + --vocoder_config_path path/to/vocoder_config.json +``` **Note:** You can use 
```./TTS/bin/synthesize.py``` if you prefer running ```tts``` from the TTS project folder. From 54139f6333194330b1efbda9c6c78f82e14a0d60 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Wed, 27 Jan 2021 11:26:38 +0100 Subject: [PATCH 02/18] Update README.md --- README.md | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index 7d378f05..ba036ddf 100644 --- a/README.md +++ b/README.md @@ -97,13 +97,13 @@ TTS supports **python >= 3.6, <3.9**. If you are only interested in [synthesizing speech](https://github.com/mozilla/TTS/tree/dev#example-synthesizing-speech-on-terminal-using-the-released-models) with the released TTS models, installing from PyPI is the easiest option. -``` +```bash pip install TTS ``` If you plan to code or train models, clone TTS and install it locally. -``` +```bash git clone https://github.com/mozilla/TTS pip install -e . ``` @@ -157,12 +157,12 @@ Some of the public datasets that we successfully applied TTS: After the installation, TTS provides a CLI interface for synthesizing speech using pre-trained models. You can either use your own model or the release models under the TTS project. Listing released TTS models. -``` +```bash tts --list_models ``` Run a tts and a vocoder model from the released model list. (Simply copy and paste the full model names from the list as arguments for the command below.) 
-```console +```bash tts --text "Text for TTS" \ --model_name "///" \ --vocoder_name "///" \ @@ -170,7 +170,7 @@ tts --text "Text for TTS" \ ``` Run your own TTS model (Using Griffin-Lim Vocoder) -```console +```bash tts --text "Text for TTS" \ --model_path path/to/model.pth.tar \ --config_path path/to/config.json \ @@ -178,7 +178,7 @@ tts --text "Text for TTS" \ ``` Run your own TTS and Vocoder models -```console +```bash tts --text "Text for TTS" \ --model_path path/to/config.json \ --config_path path/to/model.pth.tar \ @@ -204,19 +204,27 @@ To train a new model, you need to define your own ```config.json``` to define mo For instance, in order to train a tacotron or tacotron2 model on LJSpeech dataset, follow these steps. -```python TTS/bin/train_tacotron.py --config_path TTS/tts/configs/config.json``` +```bash +python TTS/bin/train_tacotron.py --config_path TTS/tts/configs/config.json +``` To fine-tune a model, use ```--restore_path```. -```python TTS/bin/train_tacotron.py --config_path TTS/tts/configs/config.json --restore_path /path/to/your/model.pth.tar``` +```bash +python TTS/bin/train_tacotron.py --config_path TTS/tts/configs/config.json --restore_path /path/to/your/model.pth.tar +``` To continue an old training run, use ```--continue_path```. -```python TTS/bin/train_tacotron.py --continue_path /path/to/your/run_folder/``` +```bash +python TTS/bin/train_tacotron.py --continue_path /path/to/your/run_folder/ +``` For multi-GPU training, call ```distribute.py```. It runs any provided train script in multi-GPU setting. -```CUDA_VISIBLE_DEVICES="0,1,4" python TTS/bin/distribute.py --script train_tacotron.py --config_path TTS/tts/configs/config.json``` +```bash +CUDA_VISIBLE_DEVICES="0,1,4" python TTS/bin/distribute.py --script train_tacotron.py --config_path TTS/tts/configs/config.json +``` Each run creates a new output folder accomodating used ```config.json```, model checkpoints and tensorboard logs. 
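
Note on the CLI examples in the README hunks above: the "Run your own TTS and Vocoder models" example passes `config.json` to `--model_path` and `model.pth.tar` to `--config_path`, which looks swapped relative to the Griffin-Lim example. The sketch below is only an illustration of the presumably intended pairing. `build_tts_command` is a hypothetical helper, not part of TTS; the flag names are the ones shown in the README, and the rule that both vocoder flags must be given together is an assumption made for this sketch.

```python
from typing import List, Optional


def build_tts_command(
    text: str,
    model_path: str,
    config_path: str,
    out_path: str,
    vocoder_path: Optional[str] = None,
    vocoder_config_path: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for the `tts` CLI (hypothetical helper)."""
    cmd = [
        "tts",
        "--text", text,
        "--model_path", model_path,    # checkpoint, e.g. path/to/model.pth.tar
        "--config_path", config_path,  # matching config, e.g. path/to/config.json
        "--out_path", out_path,
    ]
    # Assumption: the vocoder flags only make sense as a pair. With neither
    # given, the README describes the run as using the Griffin-Lim vocoder.
    if (vocoder_path is None) != (vocoder_config_path is None):
        raise ValueError("vocoder_path and vocoder_config_path must be given together")
    if vocoder_path is not None:
        cmd += [
            "--vocoder_path", vocoder_path,
            "--vocoder_config_path", vocoder_config_path,
        ]
    return cmd
```

The returned list could be handed to `subprocess.run(cmd, check=True)` once `tts` is installed; building the list separately keeps the flag pairing easy to inspect.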
From 25c86ca715d7bc90d01f081f8f62d292815b9262 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Wed, 27 Jan 2021 11:46:01 +0100 Subject: [PATCH 03/18] README update, set default models for synthesize.py and server.py. Disable verbose for ap init. --- README.md | 6 +++--- TTS/bin/synthesize.py | 7 +++++-- TTS/server/server.py | 4 ++-- TTS/utils/synthesizer.py | 4 ++-- 4 files changed, 12 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index ba036ddf..fc1598a6 100644 --- a/README.md +++ b/README.md @@ -10,11 +10,11 @@ TTS comes with [pretrained models](https://github.com/mozilla/TTS/wiki/Released- [![License]()](https://opensource.org/licenses/MPL-2.0) [![PyPI version](https://badge.fury.io/py/TTS.svg)](https://badge.fury.io/py/TTS) -:loudspeaker: [English Voice Samples](https://erogol.github.io/ddc-samples/) and [SoundCloud playlist](https://soundcloud.com/user-565970875/pocket-article-wavernn-and-tacotron2) +📢 [English Voice Samples](https://erogol.github.io/ddc-samples/) and [SoundCloud playlist](https://soundcloud.com/user-565970875/pocket-article-wavernn-and-tacotron2) -:man_cook: [TTS training recipes](https://github.com/erogol/TTS_recipes) +👩🏽‍🍳 [TTS training recipes](https://github.com/erogol/TTS_recipes) -:page_facing_up: [Text-to-Speech paper collection](https://github.com/erogol/TTS-papers) +📄 [Text-to-Speech paper collection](https://github.com/erogol/TTS-papers) ## 💬 Where to ask questions Please use our dedicated channels for questions and discussion. Help is much more valuable if it's shared publicly, so that more people can benefit from it. diff --git a/TTS/bin/synthesize.py b/TTS/bin/synthesize.py index b7ccf850..9a06c866 100755 --- a/TTS/bin/synthesize.py +++ b/TTS/bin/synthesize.py @@ -35,6 +35,9 @@ def main(): # list provided models ./TTS/bin/synthesize.py --list_models + # run tts with default models. 
+ ./TTS/bin synthesize.py --text "Text for TTS" + # run a model from the list ./TTS/bin/synthesize.py --text "Text for TTS" --model_name "//" --vocoder_name "//" --output_path @@ -67,14 +70,14 @@ def main(): parser.add_argument( '--model_name', type=str, - default=None, + default="tts_models/en/ljspeech/speedy-speech-wn", help= 'Name of one of the pre-trained tts models in format //' ) parser.add_argument( '--vocoder_name', type=str, - default=None, + default="vocoder_models/en/ljspeech/mulitband-melgan", help= 'Name of one of the pre-trained vocoder models in format //' ) diff --git a/TTS/server/server.py b/TTS/server/server.py index 1f7357af..425879cf 100644 --- a/TTS/server/server.py +++ b/TTS/server/server.py @@ -17,8 +17,8 @@ def create_argparser(): parser = argparse.ArgumentParser() parser.add_argument('--list_models', type=convert_boolean, nargs='?', const=True, default=False, help='list available pre-trained tts and vocoder models.') - parser.add_argument('--model_name', type=str, help='name of one of the released tts models.') - parser.add_argument('--vocoder_name', type=str, help='name of one of the released vocoder models.') + parser.add_argument('--model_name', type=str, default="tts_models/en/ljspeech/speedy-speech-wn", help='name of one of the released tts models.') + parser.add_argument('--vocoder_name', type=str, default="vocoder_models/en/ljspeech/mulitband-melgan", help='name of one of the released vocoder models.') parser.add_argument('--tts_checkpoint', type=str, help='path to custom tts checkpoint file') parser.add_argument('--tts_config', type=str, help='path to custom tts config.json file') parser.add_argument('--tts_speakers', type=str, help='path to JSON file containing speaker ids, if speaker ids are used in the model') diff --git a/TTS/utils/synthesizer.py b/TTS/utils/synthesizer.py index 615e0d1d..4131bc7c 100644 --- a/TTS/utils/synthesizer.py +++ b/TTS/utils/synthesizer.py @@ -79,7 +79,7 @@ class Synthesizer(object): self.tts_config = 
load_config(tts_config) self.use_phonemes = self.tts_config.use_phonemes - self.ap = AudioProcessor(**self.tts_config.audio) + self.ap = AudioProcessor(verbose=False, **self.tts_config.audio) if 'characters' in self.tts_config.keys(): symbols, phonemes = make_symbols(**self.tts_config.characters) @@ -96,7 +96,7 @@ class Synthesizer(object): def load_vocoder(self, model_file, model_config, use_cuda): self.vocoder_config = load_config(model_config) - self.vocoder_ap = AudioProcessor(**self.vocoder_config['audio']) + self.vocoder_ap = AudioProcessor(verbose=False, **self.vocoder_config['audio']) self.vocoder_model = setup_generator(self.vocoder_config) self.vocoder_model.load_checkpoint(self.vocoder_config, model_file, eval=True) if use_cuda: From ca28e05ed71cea7462d9a4517a121edabf900239 Mon Sep 17 00:00:00 2001 From: Alexander Korolev Date: Wed, 27 Jan 2021 16:33:25 +0100 Subject: [PATCH 04/18] update fixed stopnet_pos_weight parameter config parameter c.stopnet_pos_weight has currently no effect as it is not used. 
--- TTS/bin/train_tacotron.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/TTS/bin/train_tacotron.py b/TTS/bin/train_tacotron.py index ccb35a7c..be609905 100644 --- a/TTS/bin/train_tacotron.py +++ b/TTS/bin/train_tacotron.py @@ -534,7 +534,7 @@ def main(args): # pylint: disable=redefined-outer-name optimizer_st = None # setup criterion - criterion = TacotronLoss(c, stopnet_pos_weight=10.0, ga_sigma=0.4) + criterion = TacotronLoss(c, stopnet_pos_weight=c.stopnet_pos_weight, ga_sigma=0.4) if args.restore_path: checkpoint = torch.load(args.restore_path, map_location='cpu') From 8a6eee7fec46da19f486f392e3233f978ea85c5c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Thu, 28 Jan 2021 17:04:08 +0100 Subject: [PATCH 05/18] distill import statement, check python version in setup.py --- TTS/utils/synthesizer.py | 2 +- setup.py | 12 +++++++++--- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/TTS/utils/synthesizer.py b/TTS/utils/synthesizer.py index 4131bc7c..85e116cf 100644 --- a/TTS/utils/synthesizer.py +++ b/TTS/utils/synthesizer.py @@ -11,7 +11,7 @@ from TTS.tts.utils.speakers import load_speaker_mapping from TTS.vocoder.utils.generic_utils import setup_generator, interpolate_vocoder_input # pylint: disable=unused-wildcard-import # pylint: disable=wildcard-import -from TTS.tts.utils.synthesis import * +from TTS.tts.utils.synthesis import synthesis, trim_silence from TTS.tts.utils.text import make_symbols, phonemes, symbols diff --git a/setup.py b/setup.py index 6cc06f89..8df52e44 100644 --- a/setup.py +++ b/setup.py @@ -5,14 +5,20 @@ import os import shutil import subprocess import sys +from distutils.extension import Extension +from distutils.version import LooseVersion import numpy import setuptools.command.build_py import setuptools.command.develop - -from setuptools import find_packages, setup -from distutils.extension import Extension from Cython.Build import cythonize +from setuptools import find_packages, 
setup + +if LooseVersion(sys.version) < LooseVersion("3.6") or LooseVersion(sys.version) > LooseVersion("3.9"): + raise RuntimeError( + "TTS requires python >= 3.6 and <3.9 " + "but your Python version is {}".format(sys.version) + ) # parameters for wheeling server. parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False) From a926aa106de1846d72f29b5b662076720c3f5002 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Fri, 29 Jan 2021 01:36:21 +0100 Subject: [PATCH 06/18] reorder imports --- TTS/tts/utils/synthesis.py | 2 ++ requirements.txt | 1 - setup.py | 3 ++- 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/TTS/tts/utils/synthesis.py b/TTS/tts/utils/synthesis.py index 7e71df64..be587211 100644 --- a/TTS/tts/utils/synthesis.py +++ b/TTS/tts/utils/synthesis.py @@ -1,3 +1,5 @@ +import os +os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' import pkg_resources installed = {pkg.key for pkg in pkg_resources.working_set} #pylint: disable=not-an-iterable if 'tensorflow' in installed or 'tensorflow-gpu' in installed: diff --git a/requirements.txt b/requirements.txt index 31b49916..5b947f4e 100644 --- a/requirements.txt +++ b/requirements.txt @@ -6,7 +6,6 @@ numba==0.48 librosa==0.7.2 phonemizer>=2.2.0 unidecode==0.4.20 -attrdict tensorboardX matplotlib Pillow diff --git a/setup.py b/setup.py index 8df52e44..9ea48efa 100644 --- a/setup.py +++ b/setup.py @@ -11,8 +11,9 @@ from distutils.version import LooseVersion import numpy import setuptools.command.build_py import setuptools.command.develop -from Cython.Build import cythonize from setuptools import find_packages, setup +from Cython.Build import cythonize + if LooseVersion(sys.version) < LooseVersion("3.6") or LooseVersion(sys.version) > LooseVersion("3.9"): raise RuntimeError( From 094b39939f394b83ad4b9a0984ac29552aa20906 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Fri, 29 Jan 2021 01:36:35 +0100 Subject: [PATCH 07/18] pyaml --- requirements.txt | 2 +- 1 file 
changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements.txt b/requirements.txt index 5b947f4e..1e92f17e 100644 --- a/requirements.txt +++ b/requirements.txt @@ -22,4 +22,4 @@ pylint==2.5.3 gdown umap-learn cython -pyyaml +pyyaml \ No newline at end of file From 5a6abe78df8a6f1c72162a09bdd0765f92ca013c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Fri, 29 Jan 2021 01:40:51 +0100 Subject: [PATCH 08/18] setup import reset --- setup.py | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/setup.py b/setup.py index 9ea48efa..53a142a1 100644 --- a/setup.py +++ b/setup.py @@ -5,13 +5,12 @@ import os import shutil import subprocess import sys -from distutils.extension import Extension from distutils.version import LooseVersion import numpy import setuptools.command.build_py import setuptools.command.develop -from setuptools import find_packages, setup +from setuptools import setup, Extension, find_packages from Cython.Build import cythonize From e81ebec7a885b52d20506ffcdf6a30c4d058695f Mon Sep 17 00:00:00 2001 From: Alexander Korolev Date: Fri, 29 Jan 2021 15:18:59 +0100 Subject: [PATCH 09/18] fix device mismatch wavegrad training this should fix the device mismatch as seen here https://github.com/mozilla/TTS/issues/622#issue-789802916 --- TTS/bin/train_vocoder_wavegrad.py | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/TTS/bin/train_vocoder_wavegrad.py b/TTS/bin/train_vocoder_wavegrad.py index 73802c63..b104652d 100644 --- a/TTS/bin/train_vocoder_wavegrad.py +++ b/TTS/bin/train_vocoder_wavegrad.py @@ -344,6 +344,10 @@ def main(args): # pylint: disable=redefined-outer-name # setup criterion criterion = torch.nn.L1Loss().cuda() + + if use_cuda: + model.cuda() + criterion.cuda() if args.restore_path: checkpoint = torch.load(args.restore_path, map_location='cpu') @@ -378,10 +382,6 @@ def main(args): # pylint: disable=redefined-outer-name else: args.restore_step = 0 - if use_cuda: - model.cuda() - 
criterion.cuda() - # DISTRUBUTED if num_gpus > 1: model = DDP_th(model, device_ids=[args.rank]) From aa5f24608a2e9529ae2e2d2a807687898de7b038 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Fri, 29 Jan 2021 15:00:33 +0000 Subject: [PATCH 10/18] hubconf.py and load .models.json from the default location by manage.py --- TTS/hubconf.py | 26 ++++++++++++++++++++++++++ TTS/utils/manage.py | 16 +++++++++++----- 2 files changed, 37 insertions(+), 5 deletions(-) create mode 100644 TTS/hubconf.py diff --git a/TTS/hubconf.py b/TTS/hubconf.py new file mode 100644 index 00000000..c4e5bc99 --- /dev/null +++ b/TTS/hubconf.py @@ -0,0 +1,26 @@ +dependencies = ['torch', 'gdown'] +import torch +import os +import zipfile + +from TTS.utils.generic_utils import get_user_data_dir +from TTS.utils.synthesizer import Synthesizer +from TTS.utils.manage import ModelManager + + + +def tts(model_name='tts_models/en/ljspeech/tacotron2-DCA', vocoder_name='vocoder_models/en/ljspeech/mulitband-melgan', pretrained=True): + manager = ModelManager() + + model_path, config_path = manager.download_model(model_name) + vocoder_path, vocoder_config_path = manager.download_model(vocoder_name) + + # create synthesizer + synthesizer = Synthesizer(model_path, config_path, vocoder_path, vocoder_config_path) + return synthesizer + + +if __name__ == '__main__': + # synthesizer = torch.hub.load('/data/rw/home/projects/TTS/TTS', 'tts', source='local') + synthesizer = torch.hub.load('mozilla/TTS:hub_conf', 'tts', source='github') + synthesizer.tts("This is a test!") \ No newline at end of file diff --git a/TTS/utils/manage.py b/TTS/utils/manage.py index af741156..3cf8d67f 100644 --- a/TTS/utils/manage.py +++ b/TTS/utils/manage.py @@ -1,10 +1,11 @@ import json -import gdown -from pathlib import Path import os +from pathlib import Path -from TTS.utils.io import load_config +import gdown from TTS.utils.generic_utils import get_user_data_dir +from TTS.utils.io import load_config + class 
ModelManager(object): """Manage TTS models defined in .models.json. @@ -17,12 +18,17 @@ class ModelManager(object): Args: models_file (str): path to .model.json """ - def __init__(self, models_file): + def __init__(self, models_file=None): super().__init__() self.output_prefix = get_user_data_dir('tts') self.url_prefix = "https://drive.google.com/uc?id=" self.models_dict = None - self.read_models_file(models_file) + if models_file is not None: + self.read_models_file(models_file) + else: + # try the default location + path = Path(__file__).parent / "../.models.json" + self.read_models_file(path) def read_models_file(self, file_path): """Read .models.json as a dict From 0354b6f35ec31659a61182d4a7b32562704d08e0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Fri, 29 Jan 2021 15:02:45 +0000 Subject: [PATCH 11/18] move hubconf --- TTS/hubconf.py => hubconf.py | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename TTS/hubconf.py => hubconf.py (100%) diff --git a/TTS/hubconf.py b/hubconf.py similarity index 100% rename from TTS/hubconf.py rename to hubconf.py From 66c2a61f74188d506bd55afaa9d3826cfeee3983 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Fri, 29 Jan 2021 15:17:29 +0000 Subject: [PATCH 12/18] docstring hubconf --- hubconf.py | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/hubconf.py b/hubconf.py index c4e5bc99..0e2e60d8 100644 --- a/hubconf.py +++ b/hubconf.py @@ -8,8 +8,22 @@ from TTS.utils.synthesizer import Synthesizer from TTS.utils.manage import ModelManager +def tts(model_name='tts_models/en/ljspeech/tacotron2-DCA', vocoder_name='vocoder_models/en/ljspeech/mulitband-melgan'): + """TTS entry point for PyTorch Hub that provides a Synthesizer object to synthesize speech from a give text. 
-def tts(model_name='tts_models/en/ljspeech/tacotron2-DCA', vocoder_name='vocoder_models/en/ljspeech/mulitband-melgan', pretrained=True): + Example: + >>> synthesizer = torch.hub.load('mozilla/TTS', 'tts', source='github') + >>> wavs = synthesizer.tts("This is a test! This is also a test!!") + wavs - is a list of values of the synthesized speech. + + Args: + model_name (str, optional): One of the model names from .model.json. Defaults to 'tts_models/en/ljspeech/tacotron2-DCA'. + vocoder_name (str, optional): One of the model names from .model.json. Defaults to 'vocoder_models/en/ljspeech/mulitband-melgan'. + pretrained (bool, optional): [description]. Defaults to True. + + Returns: + TTS.utils.synthesizer.Synthesizer: Synthesizer object wrapping both vocoder and tts models. + """ manager = ModelManager() model_path, config_path = manager.download_model(model_name) @@ -21,6 +35,5 @@ def tts(model_name='tts_models/en/ljspeech/tacotron2-DCA', vocoder_name='vocoder if __name__ == '__main__': - # synthesizer = torch.hub.load('/data/rw/home/projects/TTS/TTS', 'tts', source='local') synthesizer = torch.hub.load('mozilla/TTS:hub_conf', 'tts', source='github') synthesizer.tts("This is a test!") \ No newline at end of file From c7407571fa902009ca4ebcf062f703d43eb7d3b1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Mon, 1 Feb 2021 10:05:55 +0000 Subject: [PATCH 13/18] fix #638 --- TTS/bin/train_vocoder_wavernn.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/TTS/bin/train_vocoder_wavernn.py b/TTS/bin/train_vocoder_wavernn.py index cad357dc..14d57837 100644 --- a/TTS/bin/train_vocoder_wavernn.py +++ b/TTS/bin/train_vocoder_wavernn.py @@ -200,7 +200,7 @@ def train(model, optimizer, criterion, scheduler, scaler, ap, global_step, epoch train_data[rand_idx], (tuple, list)) else train_data[rand_idx][0] wav = ap.load_wav(wav_path) ground_mel = ap.melspectrogram(wav) - sample_wav = model.generate(ground_mel, + sample_wav = 
model.inference(ground_mel, c.batched, c.target_samples, c.overlap_samples, @@ -287,7 +287,7 @@ def evaluate(model, criterion, ap, global_step, epoch): eval_data[rand_idx], (tuple, list)) else eval_data[rand_idx][0] wav = ap.load_wav(wav_path) ground_mel = ap.melspectrogram(wav) - sample_wav = model.generate(ground_mel, + sample_wav = model.inference(ground_mel, c.batched, c.target_samples, c.overlap_samples, From d003e593477da90e9c3850b22350be7a01b2e7a7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Mon, 1 Feb 2021 11:26:21 +0000 Subject: [PATCH 14/18] readme update for espeak install --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index fc1598a6..5c631140 100644 --- a/README.md +++ b/README.md @@ -93,7 +93,7 @@ Please use our dedicated channels for questions and discussion. Help is much mor You can also help us implement more models. Some TTS related work can be found [here](https://github.com/erogol/TTS-papers). ## Install TTS -TTS supports **python >= 3.6, <3.9**. +TTS is tested on Ubuntu 18.04 with **python >= 3.6, <3.9**. If you are only interested in [synthesizing speech](https://github.com/mozilla/TTS/tree/dev#example-synthesizing-speech-on-terminal-using-the-released-models) with the released TTS models, installing from PyPI is the easiest option. @@ -108,6 +108,11 @@ git clone https://github.com/mozilla/TTS pip install -e . ``` +We use ```espeak``` to convert graphemes to phonemes. You might need to install separately. +```bash +sudo apt-get install espeak +``` + ## Directory Structure ``` |- notebooks/ (Jupyter Notebooks for model evaluation, parameter selection and data analysis.) 
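
Note on the install section in PATCH 14/18: the README now states two environment requirements, Python `>= 3.6, <3.9` and a separately installed `espeak`. A minimal pre-flight check for both might look like the sketch below. `check_environment` is a hypothetical helper that is not part of TTS; it uses `sys.version_info` rather than the `LooseVersion(sys.version)` comparison from `setup.py`, and it assumes the phonemizer backend is reachable as an `espeak` executable on `PATH`.

```python
import shutil
import sys


def check_environment(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems; an empty list means both checks passed.

    `version_info` and `which` are parameters only so the checks can be
    exercised without changing the running interpreter or PATH.
    """
    problems = []

    # README range: python >= 3.6, < 3.9 (compare major/minor only).
    major_minor = tuple(version_info[:2])
    if not ((3, 6) <= major_minor < (3, 9)):
        problems.append(
            "Python {}.{} is outside the supported range >= 3.6, < 3.9".format(
                major_minor[0], major_minor[1]
            )
        )

    # Assumption: espeak is installed as an executable named `espeak`.
    if which("espeak") is None:
        problems.append("espeak not found on PATH (e.g. sudo apt-get install espeak)")

    return problems
```

Calling `check_environment()` with no arguments inspects the current interpreter and `PATH`; printing the returned list before `pip install` gives an early hint when phoneme conversion would fail later.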
From 699d2aa1c367d6ec21af456bb5082164771cd207 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Mon, 1 Feb 2021 11:26:46 +0000 Subject: [PATCH 15/18] pin cython version 0.29.20 --- requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements.txt b/requirements.txt index 1e92f17e..b1baadd7 100644 --- a/requirements.txt +++ b/requirements.txt @@ -21,5 +21,5 @@ cardboardlint==1.3.0 pylint==2.5.3 gdown umap-learn -cython +cython==0.29.20 # > 0.29.20 breaks pyworld installation with the min numpy req of Tensorflow 2.4.1 pyyaml \ No newline at end of file From 8774e374446ca1491ef9ed3dedc3bd9401c4195d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Mon, 1 Feb 2021 11:34:05 +0000 Subject: [PATCH 16/18] unpin cython version and comment out pyworld in audio.py causing dep issues --- TTS/utils/audio.py | 21 +++++++++++---------- requirements.txt | 2 +- 2 files changed, 12 insertions(+), 11 deletions(-) diff --git a/TTS/utils/audio.py b/TTS/utils/audio.py index 93a5880f..87ae4f5b 100644 --- a/TTS/utils/audio.py +++ b/TTS/utils/audio.py @@ -3,7 +3,7 @@ import soundfile as sf import numpy as np import scipy.io.wavfile import scipy.signal -import pyworld as pw +# import pyworld as pw from TTS.tts.utils.data import StandardScaler @@ -292,15 +292,16 @@ class AudioProcessor(object): return pad // 2, pad // 2 + pad % 2 ### Compute F0 ### - def compute_f0(self, x): - f0, t = pw.dio( - x.astype(np.double), - fs=self.sample_rate, - f0_ceil=self.mel_fmax, - frame_period=1000 * self.hop_length / self.sample_rate, - ) - f0 = pw.stonemask(x.astype(np.double), f0, t, self.sample_rate) - return f0 + # TODO: pw causes some dep issues + # def compute_f0(self, x): + # f0, t = pw.dio( + # x.astype(np.double), + # fs=self.sample_rate, + # f0_ceil=self.mel_fmax, + # frame_period=1000 * self.hop_length / self.sample_rate, + # ) + # f0 = pw.stonemask(x.astype(np.double), f0, t, self.sample_rate) + # return f0 ### Audio Processing 
### def find_endpoint(self, wav, threshold_db=-40, min_silence_sec=0.8): diff --git a/requirements.txt b/requirements.txt index b1baadd7..1e92f17e 100644 --- a/requirements.txt +++ b/requirements.txt @@ -21,5 +21,5 @@ cardboardlint==1.3.0 pylint==2.5.3 gdown umap-learn -cython==0.29.20 # > 0.29.20 breaks pyworld installation with the min numpy req of Tensorflow 2.4.1 +cython pyyaml \ No newline at end of file From 5c46543765192016a5638824cf3ff6fe88081088 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Mon, 1 Feb 2021 13:18:56 +0000 Subject: [PATCH 17/18] linter fixes and version updates for deps --- TTS/bin/train_vocoder_wavegrad.py | 2 +- TTS/utils/audio.py | 2 +- hubconf.py | 15 ++++++--------- pyproject.toml | 2 +- requirements.txt | 2 +- tests/test_vocoder_gan_datasets.py | 3 ++- 6 files changed, 12 insertions(+), 14 deletions(-) diff --git a/TTS/bin/train_vocoder_wavegrad.py b/TTS/bin/train_vocoder_wavegrad.py index b104652d..fe5fb3d7 100644 --- a/TTS/bin/train_vocoder_wavegrad.py +++ b/TTS/bin/train_vocoder_wavegrad.py @@ -344,7 +344,7 @@ def main(args): # pylint: disable=redefined-outer-name # setup criterion criterion = torch.nn.L1Loss().cuda() - + if use_cuda: model.cuda() criterion.cuda() diff --git a/TTS/utils/audio.py b/TTS/utils/audio.py index 87ae4f5b..3d31ce6e 100644 --- a/TTS/utils/audio.py +++ b/TTS/utils/audio.py @@ -292,7 +292,7 @@ class AudioProcessor(object): return pad // 2, pad // 2 + pad % 2 ### Compute F0 ### - # TODO: pw causes some dep issues + # TODO: pw causes some dep issues # def compute_f0(self, x): # f0, t = pw.dio( # x.astype(np.double), diff --git a/hubconf.py b/hubconf.py index 0e2e60d8..9de4f7b2 100644 --- a/hubconf.py +++ b/hubconf.py @@ -1,9 +1,6 @@ dependencies = ['torch', 'gdown'] import torch -import os -import zipfile -from TTS.utils.generic_utils import get_user_data_dir from TTS.utils.synthesizer import Synthesizer from TTS.utils.manage import ModelManager @@ -15,7 +12,7 @@ def 
tts(model_name='tts_models/en/ljspeech/tacotron2-DCA', vocoder_name='vocoder >>> synthesizer = torch.hub.load('mozilla/TTS', 'tts', source='github') >>> wavs = synthesizer.tts("This is a test! This is also a test!!") wavs - is a list of values of the synthesized speech. - + Args: model_name (str, optional): One of the model names from .model.json. Defaults to 'tts_models/en/ljspeech/tacotron2-DCA'. vocoder_name (str, optional): One of the model names from .model.json. Defaults to 'vocoder_models/en/ljspeech/mulitband-melgan'. @@ -23,15 +20,15 @@ def tts(model_name='tts_models/en/ljspeech/tacotron2-DCA', vocoder_name='vocoder Returns: TTS.utils.synthesizer.Synthesizer: Synthesizer object wrapping both vocoder and tts models. - """ + """ manager = ModelManager() - + model_path, config_path = manager.download_model(model_name) vocoder_path, vocoder_config_path = manager.download_model(vocoder_name) - + # create synthesizer - synthesizer = Synthesizer(model_path, config_path, vocoder_path, vocoder_config_path) - return synthesizer + synt = Synthesizer(model_path, config_path, vocoder_path, vocoder_config_path) + return synt if __name__ == '__main__': diff --git a/pyproject.toml b/pyproject.toml index fc0aca47..77d6b975 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,2 +1,2 @@ [build-system] -requires = ["setuptools", "wheel", "Cython", "numpy>=1.16.0"] \ No newline at end of file +requires = ["setuptools", "wheel", "Cython", "numpy==1.17.0"] \ No newline at end of file diff --git a/requirements.txt b/requirements.txt index 1e92f17e..a427062e 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,6 +1,6 @@ torch>=1.5 tensorflow==2.3.1 -numpy>=1.16.0 +numpy==1.17.0 scipy>=0.19.0 numba==0.48 librosa==0.7.2 diff --git a/tests/test_vocoder_gan_datasets.py b/tests/test_vocoder_gan_datasets.py index 2a487d9a..99a25dcf 100644 --- a/tests/test_vocoder_gan_datasets.py +++ b/tests/test_vocoder_gan_datasets.py @@ -61,7 +61,8 @@ def gan_dataset_case(batch_size, seq_len, 
hop_len, conv_pad, return_segments, us mel = ap.melspectrogram(audio) # the first 2 and the last 2 frames are skipped due to the padding # differences in stft - assert (feat - mel[:, :feat1.shape[-1]])[:, 2:-2].sum() <= 0, f' [!] {(feat - mel[:, :feat1.shape[-1]])[:, 2:-2].sum()}' + max_diff = abs((feat - mel[:, :feat1.shape[-1]])[:, 2:-2]).max() + assert max_diff <= 0, f' [!] {max_diff}' count_iter += 1 # if count_iter == max_iter: From 41f6579a746256d3d52598b1b4a7401b0f61a003 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Eren=20G=C3=B6lge?= Date: Mon, 1 Feb 2021 13:47:29 +0000 Subject: [PATCH 18/18] push numpy version up to 1.17.5 --- pyproject.toml | 2 +- requirements.txt | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/pyproject.toml b/pyproject.toml index 77d6b975..8b8da28d 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,2 +1,2 @@ [build-system] -requires = ["setuptools", "wheel", "Cython", "numpy==1.17.0"] \ No newline at end of file +requires = ["setuptools", "wheel", "Cython", "numpy==1.17.5"] \ No newline at end of file diff --git a/requirements.txt b/requirements.txt index a427062e..7a0d9f76 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,6 +1,6 @@ torch>=1.5 tensorflow==2.3.1 -numpy==1.17.0 +numpy==1.17.5 scipy>=0.19.0 numba==0.48 librosa==0.7.2
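
Note on the test change in PATCH 17/18 (`tests/test_vocoder_gan_datasets.py`): the assertion moves from a signed sum of differences to the maximum absolute difference. The likely motivation is that a signed sum can be zero or negative even when individual elements differ, so `sum() <= 0` can pass on mismatched features. The sketch below illustrates that with plain Python lists and made-up numbers instead of the real spectrogram arrays; `signed_sum` and `max_abs` are hypothetical names used only here.

```python
def signed_sum(diffs):
    """Old-style check value: positive and negative errors can cancel."""
    return sum(diffs)


def max_abs(diffs):
    """New-style check value: the largest single deviation, sign ignored."""
    return max(abs(d) for d in diffs)


# Illustrative element-wise differences between two feature sequences.
cancelling = [0.5, -0.5, 0.25, -0.25]   # errors cancel out in the sum
all_negative = [-1.0, -2.0]             # sum is negative, so `<= 0` still holds

for diffs in (cancelling, all_negative):
    old_check_passes = signed_sum(diffs) <= 0
    new_check_passes = max_abs(diffs) <= 0
    print(diffs, "old:", old_check_passes, "new:", new_check_passes)
```

In both made-up cases the old condition holds while the new one does not, which is the behaviour the revised assertion relies on to catch mismatches.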