Commit Graph

487 Commits

Author SHA1 Message Date
Eren G??lge 2cf89b88c9 Make style 2022-07-12 14:12:57 +02:00
Eren G??lge a6f73a18cb Fix BCELoss adressing #1192 2022-07-12 14:11:34 +02:00
Eren G??lge eefd482f51 Separate loss tests 2022-07-12 12:35:46 +02:00
WeberJulian 5cef6facb0
Fix tokenizer for punc only (#1717) 2022-07-06 22:59:41 +02:00
Eren Gölge f70e82cd19
Use fsspec and torch for embedding file IO (#1581)
* Use fsspec and torch for embedding file

* Fixup

* Fix load and save files

* Fix compute embedding script

* Set use_cuda to true if available

* Add dummy speakers.pth file

* Make style

* Change default speakers file extension

Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
2022-06-01 13:49:42 +02:00
a-froghyar 8be21ec387
Capacitron (#977)
* new CI config

* initial Capacitron implementation

* delete old unused file

* fix empty formatting changes

* update losses and training script

* fix previous commit

* fix commit

* Add Capacitron test and first round of test fixes

* revert formatter change

* add changes to the synthesizer

* add stepwise gradual lr scheduler and changes to the recipe

* add inference script for dev use

* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting

* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak is now able to be utilised for phoneme generation
- necessary phonemizer for the Capacitron model

* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style

* chore: fix linting

* feat: add Tacotron 2 support

* leftover from dev

* chore:rename parser args

* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers

* chore: revert arbitrary trainer changes

* fmt: revert formatting bug

* formatting again

* formatting fixed

* fix: log func

* fix: update optimizer
- Implemented load_state_dict for continuing training

* fix: clean optimizer init for standard models

* improvement: purge espeak flags and add training scripts

* Delete capacitronT2.py

delete old training script, new one is pushed

* feat: capacitron trainer methods
- extracted capacitron specific training  operations from the trainer into custom
methods in taco1 and taco2 models

* chore: renaming and merging capacitron and gst style args

* fix: bug fixes from the previous commit

* fix: implement state_dict method on CapacitronOptimizer

* fix: call method

* fix: inference naming

* Delete train_capacitron.py

* fix: synthesize

* feat: update tests

* chore: fix style

* Delete capacitron_inference.py

* fix: fix train tts t2 capacitron tests

* fix: double forward in T2 train step

* fix: double forward in T1 train step

* fix: run make style

* fix: remove unused import

* fix: test for T1 capacitron

* fix: make lint

* feat: add blizzard2013 recipes

* make style

* fix: update recipes

* chore: make style

* Plot test sentences in Tacotron

* chore: make style and fix import

* fix: call forward first before problematic floordiv op

* fix: update recipes

* feat: add min_audio_len to recipes

* aux_input["style_mel"]

* chore: make style

* Make capacitron T2 recipe more stable

* Remove T1 capacitron Ljspeech

* feat: implement new grad clipping routine and update configs

* make style

* Add pretrained checkpoints

* Add default vocoder

* Change trainer package

* Fix grad clip issue for tacotron

* Fix scheduler issue with tacotron

Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-05-20 16:17:11 +02:00
Edresson Casanova ee99a6c1e2 Fix voice conversion inference (#1583)
* Add voice conversion zoo test

* Fix style

* Fix unit test
2022-05-20 15:50:25 +02:00
Edresson Casanova c6008e5235
Add audio length sampler balancer (#1561)
* Add audio length sampler balancer

* Add unit tests
2022-05-12 19:59:19 +02:00
Taras Sereda f9d91a55f2
Improve data_path resolvement (#1567) 2022-05-12 13:10:35 +02:00
Edresson Casanova 8d228ab22a
Trick to Upsampling to High sampling rates using VITS model (#1456)
* Add upsample VITS support

* Fix the bug in inference

* Fix lint checks

* Add RMS based norm in save_wav method

* Style fix

* Add the period for VITS multi-period discriminator in model_args

* Bug fix in speaker encoder load in inference time

* Add unit tests

* Remove useless detach_z_vocoder parameter

* Add docs for VITS upsampling

* Fix the docs

* Rename TTS_part_sample_rate to encoder_sample_rate

* Add upsampling_init and upsampling_z methods

* Add asserts for encoder_sample_rate part

* Move upsampling tests to test_vits.py
2022-04-26 11:47:46 +02:00
Edresson Casanova 060e0f9368
Add EmbeddingManager and BaseIDManager (#1374) 2022-03-31 13:41:16 +02:00
WeberJulian c66a6241fd
Enforce phonemizer definition for synthesis (#1441)
* Enforce phonemizer definition for synthesis

* Fix train_tts, tokenizer init can now edit config

* Add small change to trigger CI pipeline

* fix wrong output path for one tts_test

* Fix style

* Test config overides by args and tokenizer

* Fix style
2022-03-25 23:15:33 +01:00
Edresson Casanova 37896e1743
Bug fix in freeze encoder (#1391)
* Fix the bug in freeze encoder

* Remove emb_l definition for non-multilingual training

* Fix unit tests
2022-03-24 18:16:04 +01:00
Eren Gölge 72d85e53c9
Update model file extension (#1422)
* Update model file ext to ```.pth```

* Update docs

* Rename more

* Find model files
2022-03-22 17:55:00 +01:00
Eren Gölge 0870a4faa2
Make style (#1405) 2022-03-16 12:13:55 +01:00
Edresson Casanova f81892483d
REBASED: Transform Speaker Encoder in a Generic Encoder and Implement Emotion Encoder training support (#1349)
* Rename Speaker encoder module to encoder

* Add a generic emotion dataset formatter

* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config

* Add class map in emotion config

* Add Base encoder config

* Add evaluation encoder script

* Fix the bug in plot_embeddings

* Enable Weight decay for encoder training

* Add argumnet to disable storage

* Add Perfect Sampler and remove storage

* Add evaluation during encoder training

* Fix lint checks

* Remove useless config parameter

* Active evaluation in speaker encoder test and use multispeaker dataset for this test

* Unit tests fixs

* Remove useless tests for speedup the aux_tests

* Use get_optimizer in Encoder

* Add BaseEncoder Class

* Fix the unitests

* Add Perfect Batch Sampler unit test

* Add compute encoder accuracy in a function
2022-03-11 14:43:40 +01:00
Edresson Casanova 917f417ac4
Add alphas to control language and speaker balancer (#1216)
* Add alphas to control language and speaker balancer

* Add docs for speaker and language samplers

* Change the Samplers weights to float for save memory

* Change the test_samplers to unittest format

* Add get_sampler method in BaseTTS

* Fix rebase issues

* Add language and speaker samplers support for DDP training

* Rename distributed sampler wrapper

* Remove the DistributedSamplerWrapper and use the one from Trainer

* Bugfix after rebase

* Move the samplers config to tts config
2022-03-10 14:56:09 +01:00
Eren Gölge 1425a023fe Make style and lint 2022-03-02 13:25:35 +01:00
Eren Gölge 27b67b7945 Fix import 2022-03-02 09:15:20 +01:00
Eren Gölge 690de1ab06 Update Characters and add more tests 2022-02-25 11:32:44 +01:00
Eren Gölge 14c117978d Fix return outputs 2022-02-25 11:31:56 +01:00
Eren Gölge 424d04e4f6 Make stlye 2022-02-25 11:31:56 +01:00
Eren Gölge c0b40a0cb7 Update VITS tests 2022-02-25 11:31:20 +01:00
Eren Gölge b0cff949f5 Update tests 2022-02-25 11:28:14 +01:00
Eren Gölge 1f0c8179da Make style 2022-02-25 11:26:59 +01:00
Eren Gölge ef63c99524 Implement `start_by_longest` option for TTSDatase 2022-02-25 11:26:18 +01:00
Eren Gölge c4c471d61d Allow padding for shorter segments 2022-02-25 11:25:48 +01:00
Eren Gölge bc2243bac4 Fix tests 2022-02-25 11:25:00 +01:00
Eren Gölge 21940952bf Make lint 2022-02-25 11:25:00 +01:00
Eren Gölge 146fbfd7c9 Extend unittests 2022-02-25 11:25:00 +01:00
Eren Gölge 2fe16de8e3 Make lint 2022-02-25 11:25:00 +01:00
Eren Gölge d0eb3e4ef2 Add get_tests_data_path 2022-02-25 11:24:13 +01:00
Eren Gölge 235f7d9b02 Extend glow_tts model tests 2022-02-25 11:24:13 +01:00
Eren Gölge 5176ae9e53 Fixes small compat. issues 2022-02-25 11:21:19 +01:00
Eren Gölge edec27738b Delete `use_espeak_phonemes` from tests 2022-02-25 11:18:00 +01:00
Eren Gölge 0a47a7eac0 Update tests 2022-02-25 11:12:44 +01:00
Eren Gölge b341951b78 Update loader tests 2022-02-25 11:12:44 +01:00
Eren Gölge 196ae74273 Update data loader tests 2022-02-25 11:05:06 +01:00
Eren Gölge 75c507c36a Update VITS LJspeech recipe 2022-02-25 10:57:35 +01:00
Eren Gölge 04202da1ac Make style 2022-02-25 10:48:03 +01:00
Eren Gölge 961e98a461 Add OOV case to tokenizer tests 2022-02-25 10:48:03 +01:00
Eren Gölge 8c8093ce23 Make style 2022-02-25 10:48:03 +01:00
Eren Gölge f1ea3ad182 Remove old text processing tests 2022-02-25 10:48:02 +01:00
Eren Gölge ba3b60c90f Test TTSTokenizer 2022-02-25 10:48:02 +01:00
Eren Gölge 79a84410f2 Test punctuations 2022-02-25 10:48:02 +01:00
Eren Gölge 99d9bb7a17 Test Phonemizers 2022-02-25 10:48:02 +01:00
Eren Gölge a1df4f9887 Test character classes 2022-02-25 10:45:24 +01:00
Eren Gölge a51b031bff
Merge branch 'dev' into dev-fix-glowtts-infer 2022-02-21 12:01:40 +03:00
Edresson Casanova 28a7464975
Fix the bug in split dataset function (#1251)
* Fix the bug in split_dataset

* Make eval_split_size configurable

* Change test_loader to use load_tts_samples function

* Change eval_split_portion to eval_split_size and permits to set the absolute number of samples in eval

* Fix samplers unit test

* Add data unit test on GitHub workflow
2022-02-21 11:59:36 +03:00
Edresson Casanova 531821545e Fix inference test issue 2022-02-19 12:21:32 +00:00
Edresson Casanova 5218d6b7a4 Fix unit tests issue 2022-02-19 12:15:03 +00:00
Edresson Casanova fc7081fc5e Add Inference test using TTS API in all models unit tests 2022-02-18 21:06:08 +00:00
Edresson Casanova 5cca4aa8ae Add FastPitch Speaker embedding train unit test 2022-02-18 20:16:52 +00:00
Edresson Casanova 759f9ac76a Add Glow-TTS d-vectors training unit test 2022-02-18 20:03:36 +00:00
Edresson Casanova 06cad27e31 Add Glow-TTS multi-speaker unit test 2022-02-18 18:20:47 +00:00
Eren Gölge 127118c637
Update TTS.tts formatters (#1228)
* Return Dict from tts formatters

* Make style
2022-02-11 23:03:43 +01:00
Edresson Casanova 0860d73cf8
Remove Tensorflow requeriment (#1225)
* Remove TF modules

* Remove TF unit tests

* Remove TF vocoder modules

* Remove TF convert scripts

* Remove TF requirement

* Remove the Docs TF instructions

* Remove TF inference support
2022-02-10 16:14:54 +01:00
Eren Gölge 8fd1ee1926 Print urls when BadZipError 2022-01-01 15:26:35 +00:00
Eren Gölge 254c110ec1 Print testing model 2022-01-01 13:57:01 +00:00
Eren Gölge 61874bc0a0 Fix your_tts inference from the listed models 2021-12-31 13:45:05 +00:00
Eren Gölge 36cef5966b Fix resnet speaker encoder 2021-12-30 15:36:35 +00:00
Eren Gölge 348b5c96a2 Fix speaker encoder test 2021-12-30 15:36:35 +00:00
Eren Gölge 497332bd46 Add custom asserts to tests 2021-12-30 14:08:17 +00:00
Eren Gölge 2033e17c44 Add VITS model tests 2021-12-29 16:51:40 +00:00
Eren Gölge 56378b12f7 Fix speaker encoder init 2021-12-21 12:26:25 +00:00
Eren Gölge 704dddcffa Make style 2021-12-20 11:54:10 +00:00
WeberJulian 8b3769c957 Fix seed in test_samplers to avoid random fails 2021-12-20 11:54:10 +00:00
WeberJulian 6f01eed672 Add test for language_weighted_sampler 2021-12-20 11:54:10 +00:00
Edresson a57ddfb4ec Add remove silence vad script Unit test 2021-12-20 11:54:10 +00:00
Edresson e068fab6b2 Add find unique phonemes unit tests 2021-12-20 11:54:10 +00:00
WeberJulian 54e33bff61 Make a multilingual test use chars 2021-12-20 11:54:10 +00:00
WeberJulian 09eda31a3f Fix tests 2021-12-20 11:54:10 +00:00
Edresson 06d89f93a8 Add VITS multilingual d-vectors unit test 2021-12-20 11:54:10 +00:00
Edresson f394d60695 Fix the bug in multispeaker vits 2021-12-20 11:54:10 +00:00
WeberJulian 1472b6df49 make style 2021-12-20 11:54:10 +00:00
WeberJulian 3b5592abcf fix test vits 2021-12-20 11:54:10 +00:00
Edresson bbdb5c38e6 Add VITS multispeaker train unit test 2021-12-20 11:54:09 +00:00
Edresson 92f7f4f400 Active the multispeaker mode in multilingual training 2021-12-20 11:54:09 +00:00
Edresson e68b042493 Add VITS d-vector unit test 2021-12-20 11:54:09 +00:00
Edresson 959cc8f03c Add VITS multilingual unit test 2021-12-20 11:54:09 +00:00
Edresson 3fbbebd74d Fix pylint issues 2021-12-20 11:54:09 +00:00
Michael Hansen 3bc043faeb
Upgrade to gruut 2.0 (#882) 2021-10-31 11:41:55 +01:00
Eren Gölge 2df0752e73
Model zoo tests (#900)
* Fix VITS model multi-speaker init

* Remove gdrive support in model manager

* Add model zoo tests
2021-10-29 17:54:16 +02:00
Eren Gölge 25759d6a61 Split tests 2021-10-21 17:30:15 +00:00
Eren Gölge e62d3c5cf7 Use absolute imports for tts configs and models 2021-10-21 16:29:06 +00:00
Eren Gölge 4dbe7ed0de Fix all-zero duration case for GlowTTS 2021-10-01 09:24:26 +00:00
Eren Gölge 7edbe04fe0 Fix WaveRNN config and test 2021-09-30 16:20:12 +00:00
Eren Gölge 4cacbf0d45 Fix WaveRNN test 2021-09-30 14:47:56 +00:00
Eren Gölge 2766dd1d6e
Fix #813 - GlowTTS training (#814)
* Fix #813

* Update glow_tts recipe

* Fix glow-tts test

* Linter fix

* Run data dep init only in training
2021-09-17 20:06:55 +02:00
Eren Gölge 1e7db32e90 Test FastPitch train 2021-09-11 10:19:47 +00:00
Eren Gölge 26f76fce22 Remove SpeedySpeech from .models.json 2021-09-10 17:47:27 +00:00
Eren Gölge 7ec23e69d4 Skip TF tests on GPU 2021-09-10 17:28:58 +00:00
Eren Gölge 1ebf9ec6bf Remove speedy_speech implementation 2021-09-10 17:28:20 +00:00
Eren Gölge 7d8f77385a Use `glow-tts` in synthesis tests 2021-09-10 17:27:33 +00:00
Eren Gölge d6e29ef98a Style update 2021-09-10 08:30:33 +00:00
Eren Gölge 3abc3a1d32 Fix GPU init in tests 2021-09-10 08:28:10 +00:00
Eren Gölge ed4b1d8514 Test `TTS.tts.utils.helpers` 2021-09-10 08:25:21 +00:00
Eren Gölge 8b7e094bde Implement `forward_tts`
- Generic API for feed-forward TTS models (FastPitch, SpeedySpeech)

- Tests for `forward-tts`

- Edit  FastPitchConfig and SpeedySpeechConfig to use `forward_tts`
2021-09-10 08:24:33 +00:00
Eren Gölge 4761853c5c Fix imports 2021-09-08 13:34:40 +00:00
Eren Gölge e72c265cd4 Fix linter issues 2021-09-06 15:16:58 +00:00