25 Commits

Author SHA1 Message Date
Edresson Casanova c6008e5235
Add audio length sampler balancer (#1561)
* Add audio length sampler balancer

* Add unit tests
2022-05-12 19:59:19 +02:00
Edresson Casanova f81892483d
REBASED: Transform Speaker Encoder into a Generic Encoder and Implement Emotion Encoder training support (#1349)
* Rename Speaker encoder module to encoder

* Add a generic emotion dataset formatter

* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config

* Add class map in emotion config

* Add Base encoder config

* Add evaluation encoder script

* Fix the bug in plot_embeddings

* Enable Weight decay for encoder training

* Add argument to disable storage

* Add Perfect Sampler and remove storage

* Add evaluation during encoder training

* Fix lint checks

* Remove useless config parameter

* Activate evaluation in speaker encoder test and use multispeaker dataset for this test

* Unit test fixes

* Remove useless tests to speed up the aux_tests

* Use get_optimizer in Encoder

* Add BaseEncoder Class

* Fix the unit tests

* Add Perfect Batch Sampler unit test

* Add compute encoder accuracy in a function
2022-03-11 14:43:40 +01:00
Edresson Casanova 917f417ac4
Add alphas to control language and speaker balancer (#1216)
* Add alphas to control language and speaker balancer

* Add docs for speaker and language samplers

* Change the sampler weights to float to save memory

* Change the test_samplers to unittest format

* Add get_sampler method in BaseTTS

* Fix rebase issues

* Add language and speaker samplers support for DDP training

* Rename distributed sampler wrapper

* Remove the DistributedSamplerWrapper and use the one from Trainer

* Bugfix after rebase

* Move the samplers config to tts config
2022-03-10 14:56:09 +01:00
Eren Gölge 8b3ba02c95 Add vocab_dict to model config 2022-02-25 11:31:20 +01:00
Eren Gölge f70e4bb8c6 Add new speakers to the vits model 2022-02-25 11:26:59 +01:00
Eren Gölge ef63c99524 Implement `start_by_longest` option for TTSDataset 2022-02-25 11:26:18 +01:00
Eren Gölge cfaa51fddc Update BaseTTS config 2022-02-25 11:11:35 +01:00
Eren Gölge 4cd690e4c1 Updates BaseTTS and configs 2022-02-25 10:57:35 +01:00
Eren Gölge 3eca5ad060 Update config fields for phonemizer 2022-02-25 10:48:03 +01:00
Edresson Casanova 28a7464975
Fix the bug in split dataset function (#1251)
* Fix the bug in split_dataset

* Make eval_split_size configurable

* Change test_loader to use load_tts_samples function

* Change eval_split_portion to eval_split_size and allow setting the absolute number of samples in eval

* Fix samplers unit test

* Add data unit test on GitHub workflow
2022-02-21 11:59:36 +03:00
Eren Gölge 073a2d2eb0 Refactor VITS multi-speaker initialization 2021-10-15 10:20:00 +00:00
Eren Gölge 91a70e80b2 Refactor TTSDataset
Return a dict by `collate`
Refactor batch handling in `collate`
A couple of bug fixes
2021-09-06 15:16:58 +00:00
Eren Gölge 6e9d4062f2 Add `sort_by_audio_len` option 2021-09-06 15:16:58 +00:00
Eren Gölge 57b3aec1b9 Update docstring format 2021-09-06 15:16:58 +00:00
Eren Gölge bd4e29b4dd Add `compute_linear_spec=False` to `BaseTTSConfig` 2021-08-09 18:02:36 +00:00
Eren Gölge 786170fe7d Update tts model configs 2021-06-28 17:03:19 +02:00
Eren Gölge b500338faa make style 2021-06-28 17:03:19 +02:00
Eren Gölge 8def3c87af trainer-API updates 2021-06-28 17:03:19 +02:00
Michael Hansen 4d8426fa0a Use eSpeak IPA lexicons by default for phoneme models 2021-06-25 14:41:05 +02:00
Eren Gölge c6f22aaa67 fix #509 2021-05-27 13:09:15 +02:00
Eren Gölge 12722501bb styling 2021-05-15 23:48:31 +02:00
Eren Gölge 8b1014d188 add docstrings with default value fixes 2021-05-15 23:45:10 +02:00
Eren Gölge 19fb1d743d style update 2021-05-11 11:30:00 +02:00
Eren Gölge c57f0b46bb reintro use_gst for backwards compat 2021-05-11 11:29:18 +02:00
Eren Gölge 7663bc63c1 add Coqpit configs for the TTS models 2021-05-11 11:29:17 +02:00