1374 Commits

Author SHA1 Message Date
Eren Gölge 8d2bb284ac Add UK vocoder models 2021-12-21 13:13:35 +00:00
Eren Gölge 56378b12f7 Fix speaker encoder init 2021-12-21 12:26:25 +00:00
Eren Gölge c9c1fa0548 Fix multi-speaker init in Synthesizer 2021-12-21 09:44:07 +00:00
Eren Gölge f769595112 Add more listing options to ModelManager 2021-12-20 11:54:10 +00:00
Eren Gölge a25269d897 Remove commented code 2021-12-20 11:54:10 +00:00
Eren Gölge 473414d4af Implement init_speaker_encoder and change arg names 2021-12-20 11:54:10 +00:00
Eren Gölge d29c3780d1 Use speaker_encoder from speaker manager in Vits 2021-12-20 11:54:10 +00:00
Eren Gölge 4d13b887f5 Change speaker_idx to speaker_name 2021-12-20 11:54:10 +00:00
Eren Gölge 4c50f6f4df Add functions to get and check and argument in config and config.model_args 2021-12-20 11:54:10 +00:00
Eren Gölge 3c6d7f495c Fixup 2021-12-20 11:54:10 +00:00
Eren Gölge 3818bd0c23 Fixup 2021-12-20 11:54:10 +00:00
Eren Gölge 79de38ca76 Rename setup_model to setup_speaker_encoder_model 2021-12-20 11:54:10 +00:00
Eren Gölge 35a781fb90 Fix synthesizer reading `use_language_embedding` 2021-12-20 11:54:10 +00:00
Eren Gölge 7a987db62b Use torchaudio for ResNet speaker encoder 2021-12-20 11:54:10 +00:00
Eren Gölge 649dc9e9da Remove redundant code 2021-12-20 11:54:10 +00:00
Eren Gölge 704dddcffa Make style 2021-12-20 11:54:10 +00:00
WeberJulian 54b7fb4e4a Fix zoo tests 2021-12-20 11:54:10 +00:00
WeberJulian a564eb9f54 Add support for multi-lingual models in CLI 2021-12-20 11:54:10 +00:00
WeberJulian 2bbcb558dc Prevent weighted sampler use when num_gpus > 1 2021-12-20 11:54:10 +00:00
WeberJulian 74cedfac38 Revert init multispeaker change 2021-12-20 11:54:10 +00:00
WeberJulian 9cfbacc622 Fix trailing space 2021-12-20 11:54:10 +00:00
WeberJulian 6b03943526 Move multilingual logic out of the trainer 2021-12-20 11:54:10 +00:00
Edresson 818dc4ccd8 Add Docstring for TorchSTFT 2021-12-20 11:54:10 +00:00
Edresson 67dda0abe1 Add the SCL resample TODO 2021-12-20 11:54:10 +00:00
WeberJulian 8b52fb89d1 Fix merge bug 2021-12-20 11:54:10 +00:00
WeberJulian 09eda31a3f Fix tests 2021-12-20 11:54:10 +00:00
Edresson 78a23e19df Fix pylint checks 2021-12-20 11:54:10 +00:00
WeberJulian 4cd0e4eb0d Remove self.audio_config from VITS 2021-12-20 11:54:10 +00:00
Edresson d39200e69b Remove torchaudio requeriment 2021-12-20 11:54:10 +00:00
WeberJulian 2e516869a1 Fix trailing whitespace 2021-12-20 11:54:10 +00:00
WeberJulian ffc269eaf4 Update docstring 2021-12-20 11:54:10 +00:00
Edresson 12968532fe Add the language embedding dim in the duration predictor class 2021-12-20 11:54:10 +00:00
Edresson 4196a42de7 Get the number speaker from the Speaker Manager property 2021-12-20 11:54:10 +00:00
Edresson f394d60695 Fix the bug in multispeaker vits 2021-12-20 11:54:10 +00:00
Edresson 90eac13bb2 Rename ununsed_speakers to ignored_speakers 2021-12-20 11:54:10 +00:00
Edresson f34596d957 Fix function name 2021-12-20 11:54:10 +00:00
Edresson 45d0b04179 Lint fixs 2021-12-20 11:54:10 +00:00
Edresson 85418ffeaa Fix the bug in extract tts spectrograms 2021-12-20 11:54:10 +00:00
Edresson 2b2cecaea2 Set the new_fields in copy_model_files as None by default 2021-12-20 11:54:10 +00:00
Edresson 34749f8727 Remove the call to get_speaker_manager 2021-12-20 11:54:10 +00:00
Edresson b769b49e34 Remove the data from the set_d_vectors_from_file function 2021-12-20 11:54:10 +00:00
Edresson 9daa33d1fd Remove unusable speaker manager function 2021-12-20 11:54:10 +00:00
Edresson 8c22d5ac49 Turn more clear the VITS loss function 2021-12-20 11:54:10 +00:00
Edresson 6fc3b9e679 Remove the unusable fine-tuning model 2021-12-20 11:54:10 +00:00
Edresson 352aa69eca Create a module for the VAD script 2021-12-20 11:54:10 +00:00
WeberJulian 631addf33b fix d-vector 2021-12-20 11:54:10 +00:00
WeberJulian da6c1e858c Fix small issues 2021-12-20 11:54:10 +00:00
WeberJulian e8af6a9f08 Fix use_speaker_embedding logic 2021-12-20 11:54:10 +00:00
WeberJulian 23d789c072 Fix continue path 2021-12-20 11:54:10 +00:00
WeberJulian 120332d53f Fix phonemes 2021-12-20 11:54:10 +00:00
WeberJulian 846bf16f02 fix imports for load_meta_data 2021-12-20 11:54:10 +00:00
WeberJulian 1340938159 fix phonemes per language 2021-12-20 11:54:10 +00:00
WeberJulian e995a63bd6 fix linter 2021-12-20 11:54:10 +00:00
WeberJulian 1472b6df49 make style 2021-12-20 11:54:10 +00:00
WeberJulian 4d721bcabd fix test sentence synthesis 2021-12-20 11:54:10 +00:00
WeberJulian 0804806727 fix f0_cache_path in dataset 2021-12-20 11:54:10 +00:00
WeberJulian 3b5592abcf fix test vits 2021-12-20 11:54:10 +00:00
WeberJulian 2a2b5767c2 fix collate_fn 2021-12-20 11:54:10 +00:00
Julian WEBER 78c2d12a91 PitchExtractor 2021-12-20 11:54:10 +00:00
Julian WEBER 9a2f91327c get_aux_input 2021-12-20 11:54:10 +00:00
Julian WEBER b3abd01793 Merge dataset 2021-12-20 11:54:10 +00:00
Edresson 10ff90d6d2 Add remove silence VAD script 2021-12-20 11:54:10 +00:00
Edresson 1bd1a0546b Add audio resample in the speaker consistency loss 2021-12-20 11:54:10 +00:00
Edresson 1c6bcda950 Add freeze vocoder generator and flow-based decoder option 2021-12-20 11:54:10 +00:00
WeberJulian 2b952d8b97 freeze vits parts 2021-12-20 11:54:10 +00:00
WeberJulian 005bba60b0 get_speaker_weighted_sampler 2021-12-20 11:54:10 +00:00
Edresson 9de4539422 Update the VITS model docs 2021-12-20 11:54:10 +00:00
Edresson eeb8ac07d9 Add voice conversion fine tuning mode 2021-12-20 11:54:10 +00:00
Edresson 690b37d0ab Add support to use the speaker encoder as loss function in VITS model 2021-12-20 11:54:09 +00:00
Edresson 9b011b1cb3 Add H/ASP original checkpoint support 2021-12-20 11:54:09 +00:00
Edresson 0bdfd3cb50 Add the ValueError in the restore checkpoint exception to avoid problems with the optimizer restauration when new keys are addition 2021-12-20 11:54:09 +00:00
Edresson de78556655 Fix the optimizer parameters bug in multilingual and multispeaker training 2021-12-20 11:54:09 +00:00
Edresson 9be5b75da3 Fix bug after merge 2021-12-20 11:54:09 +00:00
Edresson 76251b619a Fix d-vector multispeaker training bug 2021-12-20 11:54:09 +00:00
Edresson 7ef3ddc6ff Fix unit tests 2021-12-20 11:54:09 +00:00
Edresson 36dcd11453 Fix pylint issues 2021-12-20 11:54:09 +00:00
Edresson c53693c155 Implement vocoder Fine Tuning like SC-GlowTTS paper 2021-12-20 11:54:09 +00:00
Edresson f1f016314e Fix the bug in M-AILABS formatter 2021-12-20 11:54:09 +00:00
Edresson c334d39acc Add voice conversion support for the model VITS trained with external speaker embedding 2021-12-20 11:54:09 +00:00
Edresson e997889ba8 Fix bug in VITS multilingual inference 2021-12-20 11:54:09 +00:00
Edresson 7c0b8ec572 Fix bugs in the non-multilingual VITS inference 2021-12-20 11:54:09 +00:00
Edresson 3fbbebd74d Fix pylint issues 2021-12-20 11:54:09 +00:00
Edresson ac9416fb86 Add multilingual inference support 2021-12-20 11:54:09 +00:00
Edresson dcb2374bc9 Add multilingual training support to the VITS model 2021-12-20 11:54:09 +00:00
Edresson f996afedb0 Implement multilingual dataloader support 2021-12-20 11:54:09 +00:00
Edresson 5f1c18187f Fix pylint issues 2021-12-20 11:54:09 +00:00
Edresson d91c595c5a Implement training support with d_vecs in the VITS model 2021-12-20 11:54:09 +00:00
Edresson 6a7db67a91 Allow ignore speakers for all multispeaker datasets 2021-12-20 11:54:09 +00:00
Edresson e0ad838066 Select randomly a speaker from the speaker manager for the test setences 2021-12-20 11:54:09 +00:00
Edresson eb3e8affe1 Save speakers embeddings/ids before starting training 2021-12-20 11:54:09 +00:00
Eren Gölge 37803467aa Merge pull request #1021 from loganhart420/dataset_downloaders
Add addtional datasets
2021-12-20 10:42:20 +01:00
Reuben Morais 859ac1a54c Include usage instructions in README 2021-12-17 11:37:19 +01:00
loganhart420 103c010eca Add addtional datasets 2021-12-16 07:21:27 -05:00
Jörg Thalheim bce143c738 server: fix compatibility with tts_models/en/ljspeech/fast_pitch (#893) 2021-12-07 14:36:29 +01:00
Eren Gölge babdd84f91 Fix GST inference
commit d3e477875a7e46a101fcf95a1794442823750fe2
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Wed Nov 3 10:16:12 2021 +0000

    Read .wav for GST conditioning from CL

commit 074e6d0874d3b34fb6a4991fc17d66dccd413fbb
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Fri Oct 29 14:43:47 2021 +0100

    Fix GST during inference in Tacotron2

commit fdece14585ab5a36eed1061a9a838d8e48aa6882
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Wed Nov 3 10:16:12 2021 +0000

    Read .wav for GST conditioning from CL

commit cd29e21b8d0a541ee298d2bf5f67223ad60be38f
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Fri Oct 29 14:43:47 2021 +0100

    Fix GST during inference in Tacotron2

commit 908ce39370eadcc9fa8510cdb26c9ead87305427
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Fri Oct 29 12:49:37 2021 +0100

    Make trim_db value negative

commit 1008a2e0f72fa7ca7f0307424f570386f2f16d42
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Fri Oct 29 12:22:24 2021 +0100

    Set find_endpoint db threshold in config.json
2021-12-07 13:28:49 +00:00
Eren Gölge ce45d9e1af Make style and lint 2021-12-01 10:42:52 +00:00
Eren Gölge 40cb8ac966 Fix #958 2021-12-01 10:33:34 +00:00
Eren Gölge 512ada7548 Fix callbacks against multi-gpu training 2021-12-01 10:32:14 +00:00
Eren Gölge 2ed9e3c241 Fix constant use of noise augment 2021-11-08 09:20:34 +01:00
Eren Gölge b6b14a76af Fix VITS stochastic duration predictor 2021-11-08 09:20:11 +01:00