Commit Graph

322 Commits

Author SHA1 Message Date
Edresson dcb2374bc9 Add multilingual training support to the VITS model 2021-12-20 11:54:09 +00:00
Edresson f996afedb0 Implement multilingual dataloader support 2021-12-20 11:54:09 +00:00
Edresson 5f1c18187f Fix pylint issues 2021-12-20 11:54:09 +00:00
Edresson d91c595c5a Implement training support with d_vecs in the VITS model 2021-12-20 11:54:09 +00:00
Edresson e0ad838066 Randomly select a speaker from the speaker manager for the test sentences 2021-12-20 11:54:09 +00:00
Eren Gölge babdd84f91 Fix GST inference
commit d3e477875a7e46a101fcf95a1794442823750fe2
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Wed Nov 3 10:16:12 2021 +0000

    Read .wav for GST conditioning from CL

commit 074e6d0874d3b34fb6a4991fc17d66dccd413fbb
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Fri Oct 29 14:43:47 2021 +0100

    Fix GST during inference in Tacotron2

commit fdece14585ab5a36eed1061a9a838d8e48aa6882
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Wed Nov 3 10:16:12 2021 +0000

    Read .wav for GST conditioning from CL

commit cd29e21b8d0a541ee298d2bf5f67223ad60be38f
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Fri Oct 29 14:43:47 2021 +0100

    Fix GST during inference in Tacotron2

commit 908ce39370eadcc9fa8510cdb26c9ead87305427
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Fri Oct 29 12:49:37 2021 +0100

    Make trim_db value negative

commit 1008a2e0f72fa7ca7f0307424f570386f2f16d42
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Fri Oct 29 12:22:24 2021 +0100

    Set find_endpoint db threshold in config.json
2021-12-07 13:28:49 +00:00
Michael Hansen 3bc043faeb Upgrade to gruut 2.0 (#882) 2021-10-31 11:41:55 +01:00
Eren Gölge 00becf2671 Fix import statements 2021-10-25 19:29:16 +02:00
Eren Gölge 3cb07fb6b5 Fix SpeakerManager init with data items 2021-10-21 13:54:39 +00:00
Eren Gölge aea90e2501 Comment synthesis.py 2021-10-21 13:53:45 +00:00
Eren Gölge 3da79a4de4 Comment Tacotron2 model 2021-10-20 18:14:04 +00:00
Eren Gölge 9f23ad6a0f Fix imports 2021-09-30 14:47:56 +00:00
Eren Gölge 26f76fce22 Remove SpeedySpeech from .models.json 2021-09-10 17:47:27 +00:00
Eren Gölge d6e29ef98a Style update 2021-09-10 08:30:33 +00:00
Eren Gölge ed4b1d8514 Test `TTS.tts.utils.helpers` 2021-09-10 08:25:21 +00:00
Eren Gölge bfc6ceac29 Move MAS to `TTS.tts.utils.helpers` 2021-09-09 10:57:19 +00:00
Eren Gölge 537c8576ec Stage `TTS.tts.utils.helpers` 2021-09-08 13:35:18 +00:00
Eren Gölge 4761853c5c Fix imports 2021-09-08 13:34:40 +00:00
Eren Gölge c1513ec4cd Plot pitch over spectrogram 2021-09-06 15:16:58 +00:00
Eren Gölge 42862f7fdb Format style of the recipes 2021-09-06 15:16:58 +00:00
Eren Gölge 8fffd4e813 Don't print computed phonemes
It causes noise in logs
2021-09-06 15:16:58 +00:00
Katsuya Iida 165e5814af Update Japanese phonemizer (#758)
* Update default ja vocoder

* update

* Japanese phonemizer test

* Run make style

Co-authored-by: Eren Gölge <egolge@coqui.ai>
2021-09-01 09:33:15 +02:00
Eren Gölge 49e1181ea4 Fixes for the vits model 2021-08-26 17:15:09 +00:00
Eren Gölge c312acac7d Implement VITS model 🚀
VITS model implementation built on Glow TTS and HiFiGAN
layers.
2021-08-09 18:02:36 +00:00
Eren Gölge f5a6aa974f Modify `symbols.py` not to add _arpanet 2021-08-09 18:02:36 +00:00
Eren Gölge 003e5579e8 Enable `custom_symbols` in text processing
Models can define their own custom symbols lists with custom
`make_symbols()`
2021-08-09 18:02:36 +00:00
Eren Gölge e4648ffef1 Fix multi-speaker init of Tacotron models & tests 2021-08-09 18:02:36 +00:00
Agrin Hilmkil ced4cfdbbf Allow saving / loading checkpoints from cloud paths (#683)
* Allow saving / loading checkpoints from cloud paths

Allows saving and loading checkpoints directly from cloud paths like
Amazon S3 (s3://) and Google Cloud Storage (gs://) by using fsspec.

Note: The user will have to install the relevant dependency for each
protocol. Otherwise fsspec will fail and specify which dependency is
missing.

* Append suffix _fsspec to save/load function names

* Add a lower bound to the fsspec dependency

Skips the 0 major version.

* Add missing changes from refactor

* Use fsspec for remaining artifacts

* Add test case with path requiring fsspec

* Avoid writing logs to file unless output_path is local

* Document the possibility of using paths supported by fsspec

* Fix style and lint

* Add missing lint fixes

* Add type annotations to new functions

* Use Coqpit method for converting config to dict

* Fix type annotation in semi-new function

* Add return type for load_fsspec

* Fix bug where fs not always created

* Restore the experiment removal functionality
2021-08-09 18:02:36 +00:00
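
The fsspec-based checkpoint handling described in ced4cfdbbf can be pictured with a minimal sketch. This is not the code from that commit: the names `open_any`, `save_state`, and `load_state` are hypothetical, and `pickle` stands in for whatever serializer the project actually uses. It assumes only that `fsspec.open(path, mode)` returns a context manager, so the same code path serves local files and, with the relevant protocol package installed, `s3://` or `gs://` URLs. The fallback to the built-in `open` exists only so the sketch still runs for local paths when fsspec is not installed.

```python
import pickle

try:
    import fsspec  # third-party backend named in the commit message
except ImportError:
    fsspec = None  # fallback below supports local paths only


def open_any(path, mode):
    """Return a context manager for a local or remote path (hypothetical helper)."""
    if fsspec is not None:
        # fsspec picks the filesystem from the URL scheme (file, s3, gs, ...).
        return fsspec.open(path, mode)
    if "://" in path:
        raise RuntimeError("fsspec is required for remote paths such as " + path)
    return open(path, mode)


def save_state(state, path):
    """Serialize a checkpoint-like object to any supported path."""
    with open_any(path, "wb") as f:
        pickle.dump(state, f)


def load_state(path):
    """Load a checkpoint-like object back from any supported path."""
    with open_any(path, "rb") as f:
        return pickle.load(f)
```

With a suitable protocol package installed, a call such as `save_state(state, "s3://bucket/run/checkpoint.pkl")` would take the same code path as a local file name; the bucket name here is purely illustrative.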
Eren Gölge 75b201c6c1 Merge pull request #673 from coqui-ai/fix_stopnet
Fix stopnet training for Tacotron models
2021-07-24 12:25:38 +02:00
Eren Gölge fc0c4600bd Fix stopnet training 2021-07-24 11:39:54 +02:00
Edresson 2e5baffa9c Merge fix and eval split as argparse 2021-07-13 01:47:32 -03:00
Eren Gölge c25a2184e7 Add docs for `SpeakerManager` 2021-07-03 13:55:27 +02:00
Eren Gölge ae6405bb76 Docstrings for `Trainer` 2021-06-28 17:03:47 +02:00
Eren Gölge f23b228e24 Update `speaker_manager` 2021-06-28 17:03:47 +02:00
Eren Gölge 98298ee671 Implement unified IO utils 2021-06-28 17:03:19 +02:00
Eren Gölge 00c82c516d rename to 2021-06-28 17:03:19 +02:00
Eren Gölge 166f0aeb9a merge if branches with the same implementation 2021-06-28 17:03:19 +02:00
Eren Gölge 03494ad642 adjust `distribute.py` for the `train_tts.py` 2021-06-28 17:03:19 +02:00
Eren Gölge 25238e0658 fix glow-tts `inference()` 2021-06-28 17:03:19 +02:00
Eren Gölge 419735f440 refactor and fix multi-speaker training in Trainer and Tacotron models 2021-06-28 17:03:19 +02:00
Eren Gölge 2c38ef8441 use get_speaker_manager in Trainer and save speakers.json file when needed
2021-06-28 17:03:19 +02:00
Eren Gölge db6a97d1a2 rename external speaker embedding arguments as `d_vectors` 2021-06-28 17:03:19 +02:00
Eren Gölge f82f1970b8 change `to(device)` to `type_as` in models 2021-06-28 17:03:19 +02:00
Eren Gölge 30211512a4 fix type annotations 2021-06-28 17:03:19 +02:00
Eren Gölge f840268181 refactor `SpeakerManager` 2021-06-28 17:03:19 +02:00
Eren Gölge 421194880d linter fixes 2021-06-28 17:03:19 +02:00
Eren Gölge d96ebcd6d3 make style 2021-06-28 17:03:19 +02:00
Eren Gölge b500338faa make style 2021-06-28 17:03:19 +02:00
Eren Gölge c680a07a20 fix `Synthesized` for the new `synthesis()` 2021-06-28 17:03:19 +02:00
Eren Gölge b8a4af4010 update `synthesis.py` for being more generic 2021-06-28 17:03:19 +02:00
Eren Gölge f4f83b6379 update `synthesis.py` for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 130781dab6 remove `tts.generic_utils` as all the functions are moved to other files 2021-06-28 17:03:19 +02:00
Eren Gölge ca302db7b0 add sequence_mask to `utils.data` 2021-06-28 17:03:19 +02:00
Eren Gölge 8def3c87af trainer-API updates 2021-06-28 17:03:19 +02:00
Edresson 1c4e806f54 use speaker manager on compute embeddings script 2021-06-27 03:35:34 -03:00
Michael Hansen 3f172b84d8 Fix linting issues 2021-06-25 14:41:31 +02:00
Michael Hansen 4d8426fa0a Use eSpeak IPA lexicons by default for phoneme models 2021-06-25 14:41:05 +02:00
Michael Hansen 618b509204 Use combined characters available in TTS phonemes (like ç) 2021-06-25 14:41:05 +02:00
Michael Hansen da6f6a4a01 Update docstring for clean_gruut_phonemes 2021-06-25 14:41:05 +02:00
Michael Hansen 47191f3ecc Add tests for gruut phonemization 2021-06-25 14:41:05 +02:00
Michael Hansen 67869e77f9 Use gruut for phonemization 2021-06-25 14:41:05 +02:00
Eren Gölge 49c5e5d820 make style japanese PR 2021-06-02 11:44:46 +02:00
Eren Gölge 73b4083c6c Merge pull request #502 from kaiidams/kaiidams/kokoro
Japanese Tacotron 2 model
2021-06-02 10:20:08 +02:00
Katsuya Iida 1cc18d1972 Move unittest of Japanese phonemizer. 2021-06-01 18:51:34 +09:00
Katsuya Iida d0c9c1ca5c Move TTS/tts/utils/japanese 2021-05-29 09:21:47 +09:00
Katsuya Iida c4987e9d4e Move import at the head of the file. 2021-05-28 00:22:57 +09:00
Eren Gölge 925c08cf95 replace unidecode with anyascii 2021-05-27 14:02:44 +02:00
Katsuya Iida f921a05bdb Fixed lint errors 2021-05-26 19:02:16 +09:00
Katsuya Iida 0536aa6d0f Japanese Tacotron 2 model 2021-05-22 17:12:19 +09:00
Eren Gölge 8a7c40736c set use_phonemes false 2021-05-19 01:27:26 +02:00
Eren Gölge ccfaa6b1d5 add `needs_phonemizer` field to models.json. If set true these models are only compatible with v0.0.13 or below.
2021-05-18 17:57:28 +02:00
Eren Gölge a14fcf2a13 remove text_processing test 2021-05-18 17:57:28 +02:00
Eren Gölge d7fae3f515 remove all espeaker and phonemizer deps 2021-05-18 17:57:28 +02:00
Eren Gölge ced05e812a move chinese phonemizer 2021-05-18 17:57:28 +02:00
Eren Gölge 19fb1d743d style update 2021-05-11 11:30:00 +02:00
Eren Gölge 21dd4d7960 fix load_config imports for Coqpit 2021-05-11 11:29:18 +02:00
Eren Gölge c57f0b46bb reintroduce use_gst for backwards compatibility 2021-05-11 11:29:18 +02:00
Eren Gölge 9ee70af9bb code styling 2021-05-11 11:29:18 +02:00
Eren Gölge 720fe13056 update glow_tts modules and training script for coqpit use 2021-05-11 11:29:17 +02:00
Eren Gölge 647163397d coqpit refactoring 2021-05-11 11:29:17 +02:00
Eren Gölge eaa130e813 fix tacotron for coqpit 2021-05-11 11:29:17 +02:00
Eren Gölge 05d9543ed8 init GST module using gst config in Tacotron models 2021-05-11 11:29:17 +02:00
Eren Gölge 93a00373f6 move split_dataset 2021-05-11 11:29:17 +02:00
Eren Gölge 79d7215142 config refactor #5 WIP 2021-05-11 11:29:17 +02:00
Eren Gölge dc50f5f0b0 config refactor #4 WIP 2021-05-11 11:28:35 +02:00
Eren Gölge a21c0b5585 config update 2 WIP 2021-05-11 11:28:35 +02:00
Eren Gölge e092ae40dc config update WIP 2021-05-11 11:28:35 +02:00
Eren Gölge f7582107da Merge pull request #453 from Edresson/dev
Script for spectrogram extraction using teacher forcing and Glow-TTS inference with MAS.
2021-05-06 17:53:28 +02:00
Eren Gölge 8cb27267a4 formatting 2021-05-03 14:26:35 +02:00
Eren Gölge 2f0716073e enable multi-speaker CoquiTTS models for synthesize.py 2021-04-26 19:36:53 +02:00
Eren Gölge b531fa699c remove conflict noise 2021-04-26 15:27:52 +02:00
Eren Gölge f37b488876 Merge branch 'speaker-manager' of https://github.com/coqui-ai/TTS into speaker-manager 2021-04-26 15:25:25 +02:00
Edresson 8228091f92 add script for extraction of tts spectrograms 2021-04-23 14:17:46 -03:00
Eren Gölge 4cf211348d styling and linting 2021-04-23 18:04:37 +02:00
Eren Gölge f69195739e let speaker manager compute mean x_vector from multiple wav files 2021-04-23 18:04:37 +02:00
Eren Gölge c80d21f311 load speaker_encoder_ap and compute x_vector directly from the input file in speaker manager 2021-04-23 18:04:37 +02:00
Eren Gölge e97126314c add ```unique``` argument to make_symbols to fix the incompatibility issue of the SC-Glow models
2021-04-23 18:04:37 +02:00
Eren Gölge d08888e603 formatting speakers.py 2021-04-23 18:04:37 +02:00
Eren Gölge df422223a3 initial SpeakerManager implementation 2021-04-23 18:04:37 +02:00
Eren Gölge 7a7aeb35f5 fix the glow-tts in setup_model 2021-04-23 18:04:37 +02:00
Eren Gölge 99dc07a7dd add ```unique``` param to keep scglow models compatible (there are duplicate symbols in the character set) 2021-04-23 18:04:37 +02:00
Eren Gölge aadb2106ec code styling 2021-04-23 18:04:37 +02:00
kirianguiller 7dccbfdcd5 handle multi speaker and gst in Synthesizer class 2021-04-23 18:04:37 +02:00
Eren Gölge 04b6881b66 add ```unique``` argument to make_symbols to fix the incompatibility issue of the SC-Glow models
2021-04-21 13:12:35 +02:00
Eren Gölge 790946faec formatting speakers.py 2021-04-21 13:12:11 +02:00
Eren Gölge ab313814de initial SpeakerManager implementation 2021-04-21 13:11:46 +02:00
Eren Gölge 09890c7421 fix the glow-tts in setup_model 2021-04-21 13:10:40 +02:00
Eren Gölge d2fa8add1f add ```unique``` param to keep scglow models compatible (there are duplicate symbols in the character set) 2021-04-16 19:40:13 +02:00
Eren Gölge 47e356cb48 code styling 2021-04-16 16:01:40 +02:00
kirianguiller 48ae52a9a3 handle multi speaker and gst in Synthesizer class 2021-04-16 15:54:49 +02:00
Eren Gölge 9cc17be53a formatting and a small bug fix in Tacotron model 2021-04-15 16:36:51 +02:00
Eren Gölge 3de5a89154 optionally enable prenet dropout at inference time for tacotron models 2021-04-13 13:24:56 +02:00
Eren Gölge 480e2f7888 docstring update and better handling make_symbols 2021-04-12 16:40:49 +02:00
Eren Gölge e5b9607bc3 isort all imports 2021-04-09 00:45:20 +02:00
Eren Gölge 0e79fa86ad format with black and pylint 2.7.3 2021-04-09 00:38:08 +02:00
Eren Gölge 7a382a5c2b stowed aligntts commit and small refactoring with feed_forward layers 2021-03-30 14:39:16 +02:00
Eren Gölge 844e8e0ed4 adapt align_tts and model name handling 2021-03-30 14:39:16 +02:00
Eren Gölge 2b3e12ea49 correct imports after refactoring, add AlignTTS (old SSMAS) and some formatting 2021-03-30 14:39:16 +02:00
Eren Gölge e8cf8cb00e restructure TF tacotron files 2021-03-30 14:39:16 +02:00
Eren Gölge bdfd1f8a89 linter fix 2021-03-16 19:13:32 +01:00
WeberJulian 11e25a7125 fix linter issues 2021-03-16 19:13:01 +01:00
WeberJulian 1574d8dd39 fix french_cleaners 2021-03-16 19:13:01 +01:00
Eren Gölge 94805236fb Merge branch 'dev' of https://github.com/coqui-ai/TTS into dev 2021-03-08 15:21:06 +01:00
Eren Gölge 9a48ba3821 a ton of linter updates 2021-03-08 05:06:54 +01:00
kirianguiller 557239db7f remove re.Match typing in '_number_replace()' 2021-03-08 02:59:48 +01:00
kirianguiller 9ab07f94e2 modify according to PR reviews 2021-03-08 02:59:48 +01:00
kirianguiller 42ba30eb8f <add> Chinese mandarin implementation (tacotron2) 2021-03-08 02:59:24 +01:00
kirianguiller e85658ac2b remove re.Match typing in '_number_replace()' 2021-03-08 02:57:11 +01:00
kirianguiller 0d4525322c modify according to PR reviews 2021-03-08 02:57:11 +01:00
kirianguiller e6fd118cf8 <add> Chinese mandarin implementation (tacotron2) 2021-03-08 02:57:11 +01:00
Eren Gölge 0e1e60bef0 remove redundancy 2021-03-08 02:54:47 +01:00
Eren Gölge 55fc50b26d update test_text_processing for espeak-ng 2021-03-08 02:54:47 +01:00
Eren Gölge 5b8a6736a7 remove _phoneme_punctuations 2021-03-08 02:54:47 +01:00
Eren Gölge 62a8eba3b2 parse_characters function 2021-03-08 02:54:47 +01:00
Eren Gölge 0b33acdcca enable saving model characters in io.py 2021-03-08 02:54:47 +01:00
Eren Gölge 9fefc79f0c fix make_symbols 2021-03-08 02:54:47 +01:00
Eren Gölge 5f1018abee fix spelling of a def argument and parse phonemes from config.json if use_phonemes is True
2021-03-08 02:54:47 +01:00
Eren Gölge 6cd642c2e1 add missing phonemes to test_config.json 2021-03-08 02:54:47 +01:00
Eren Gölge ee58ff2d38 add russian phoneme char 2021-03-08 02:54:47 +01:00
Eren Gölge 90d4f08d6c reorder imports 2021-03-08 02:48:31 +01:00
kirianguiller 7f36d91131 update chinese model 2021-03-01 14:55:05 +01:00
kirianguiller 3911b87e54 remove re.Match typing in '_number_replace()' 2021-02-17 20:53:56 +01:00
kirianguiller fb0655d1e7 modify according to PR reviews 2021-02-17 20:53:56 +01:00
kirianguiller c4c7bc1b88 <add> Chinese mandarin implementation (tacotron2) 2021-02-17 20:53:56 +01:00
Eren Gölge ff218e2370 remove redundancy 2021-02-15 12:07:02 +00:00
Eren Gölge 4244096ccb update test_text_processing for espeak-ng 2021-02-12 14:07:26 +00:00
Eren Gölge b28c724c04 remove _phoneme_punctuations 2021-02-12 12:10:57 +00:00
Eren Gölge 593cedee14 parse_characters function 2021-02-12 12:05:56 +00:00
Eren Gölge 2abfff17f9 enable saving model characters in io.py 2021-02-12 12:04:41 +00:00
Eren Gölge 43f54d2dce fix make_symbols 2021-02-11 15:26:52 +00:00