Commit Graph

781 Commits

Author SHA1 Message Date
Eren Gölge b3fb0e19e8 Implement get_state_dict 2022-05-17 13:46:05 +02:00
Eren Gölge 96779e75ba Return duration by ForwardTTS inference 2022-05-17 13:46:05 +02:00
Eren Gölge 9291d13c69 Make style 2022-05-17 13:46:05 +02:00
Eren Gölge 0b585b46c1 Refactor TTSDataset to use numpy transforms 2022-05-17 13:44:01 +02:00
Eren Gölge 4171f4e9c6 Update ForwardTTSE2eLoss 2022-05-17 13:44:01 +02:00
Eren Gölge dbe5eb992e Make AP optional in BaseTTS 2022-05-17 13:44:01 +02:00
Eren Gölge c3fb49bf76 Refactor ForwardTTS to skip decoder 2022-05-17 13:44:01 +02:00
Eren Gölge e7c5db0d97 Add missing kernel size attr to transformer layer 2022-05-17 13:44:01 +02:00
Eren Gölge 231c69b12e Remove AP from FastPitchE2e 2022-05-17 13:44:01 +02:00
Eren Gölge 4556c61902 Update fastpitche2e recipe 2022-05-17 13:44:01 +02:00
Eren Gölge 5f9d559419 Update import statements 2022-05-17 13:44:01 +02:00
Eren Gölge 9f8d86b716 Remove redundancy 2022-05-17 13:42:09 +02:00
Eren Gölge 760f045aaa Rename vars in VITS 2022-05-17 13:42:09 +02:00
Eren Gölge 775a6ab6ee Add cond layer in decoder 2022-05-17 13:38:53 +02:00
Eren Gölge 28a53c7462 Refactor multi-speaker init in ForwardTTS 2022-05-17 13:38:53 +02:00
Eren Gölge c125024da0 Implement BaseTTSE2E 2022-05-17 13:38:53 +02:00
Eren Gölge b16613c5ad Implement ForwardTTSE2E Loss 2022-05-17 13:38:53 +02:00
Eren Gölge 85731482e1 Implement FastPitchE2EConfig 2022-05-17 13:38:53 +02:00
Eren Gölge fccda5ae7b Implement ForwardTTSE2Eg 2022-05-17 13:38:53 +02:00
Edresson Casanova e5d8ec2402
Change the VITS upsampling interpolation trick to linear (#1564) 2022-05-13 10:52:39 +02:00
Edresson Casanova c6008e5235
Add audio length sampler balancer (#1561)
* Add audio length sampler balancer

* Add unit tests
2022-05-12 19:59:19 +02:00
Eren Gölge 6e460b7e42
Add an assert for the upsampling trick (#1538) 2022-05-12 19:55:24 +02:00
Edresson Casanova a97eed696a
Fix the bug in eSpeak wrapper for eSpeak version 1.48.15 (#1560) 2022-05-12 15:15:18 +02:00
Eren Gölge e45ae57aef
Merge pull request #1550 from coqui-ai/fix-upsampling-asserts
Fix VITS upsampling asserts
2022-05-12 14:51:41 +02:00
Edresson Casanova 175ca06388 Add reinit text encoder and duration predictor parameter (#1562)
* Add reinit encoder and duration predictor option

* Add .data to prevent any overlooked autograd hook
2022-05-12 09:08:36 -03:00
Edresson Casanova 182711043c Fix the VITS upsampling asserts
Fix style
2022-05-12 09:08:29 -03:00
Eren Gölge c3f8c4d5eb Return default SpeakerManager if no d_vector_file 2022-05-11 11:31:45 +02:00
Eren Gölge 121e9ed685 Pass use_cuda to init_encoder 2022-05-11 11:31:17 +02:00
Eren Gölge c18bd21b3f Return durations at VITS inference 2022-05-11 11:30:05 +02:00
Eren Gölge 5021a03de0 Use torch.no_grad for VITS inference 2022-05-11 11:29:36 +02:00
Eren Gölge 3f03e3012c Fix batch_group_size in VITS 2022-05-07 13:44:44 +02:00
code-review-doctor fa887ef5f9
Fix issue probably-meant-fstring found at https://codereview.doctor (#1532) 2022-05-07 13:33:40 +02:00
WeberJulian fbdf76b2fc returns y_mask in VITS inference (#1540)
* returns y_mask

* make style
2022-05-03 13:49:24 +02:00
Edresson Casanova 8d228ab22a
Trick to Upsampling to High sampling rates using VITS model (#1456)
* Add upsample VITS support

* Fix the bug in inference

* Fix lint checks

* Add RMS based norm in save_wav method

* Style fix

* Add the period for VITS multi-period discriminator in model_args

* Bug fix in speaker encoder load in inference time

* Add unit tests

* Remove useless detach_z_vocoder parameter

* Add docs for VITS upsampling

* Fix the docs

* Rename TTS_part_sample_rate to encoder_sample_rate

* Add upsampling_init and upsampling_z methods

* Add asserts for encoder_sample_rate part

* Move upsampling tests to test_vits.py
2022-04-26 11:47:46 +02:00
Edresson Casanova 060e0f9368
Add EmbeddingManager and BaseIDManager (#1374) 2022-03-31 13:41:16 +02:00
WeberJulian c66a6241fd
Enforce phonemizer definition for synthesis (#1441)
* Enforce phonemizer definition for synthesis

* Fix train_tts, tokenizer init can now edit config

* Add small change to trigger CI pipeline

* fix wrong output path for one tts_test

* Fix style

* Test config overides by args and tokenizer

* Fix style
2022-03-25 23:15:33 +01:00
Edresson Casanova 37896e1743
Bug fix in freeze encoder (#1391)
* Fix the bug in freeze encoder

* Remove emb_l definition for non-multilingual training

* Fix unit tests
2022-03-24 18:16:04 +01:00
Eren Gölge 1c3623af33
Fix model manager (#1436)
* Fix manager

* Make style
2022-03-23 12:57:14 +01:00
Eren Gölge fd56fabb21
Fix #1380 (#1409) 2022-03-16 12:38:27 +01:00
Eren Gölge 0870a4faa2
Make style (#1405) 2022-03-16 12:13:55 +01:00
WeberJulian 690c96ed28
Fix default phonemizer for ja and zh (#1399) 2022-03-16 12:13:22 +01:00
Edresson Casanova f81892483d
REBASED: Transform Speaker Encoder in a Generic Encoder and Implement Emotion Encoder training support (#1349)
* Rename Speaker encoder module to encoder

* Add a generic emotion dataset formatter

* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config

* Add class map in emotion config

* Add Base encoder config

* Add evaluation encoder script

* Fix the bug in plot_embeddings

* Enable Weight decay for encoder training

* Add argumnet to disable storage

* Add Perfect Sampler and remove storage

* Add evaluation during encoder training

* Fix lint checks

* Remove useless config parameter

* Active evaluation in speaker encoder test and use multispeaker dataset for this test

* Unit tests fixs

* Remove useless tests for speedup the aux_tests

* Use get_optimizer in Encoder

* Add BaseEncoder Class

* Fix the unitests

* Add Perfect Batch Sampler unit test

* Add compute encoder accuracy in a function
2022-03-11 14:43:40 +01:00
Edresson Casanova 36e9ea2f97
Open bible dataset formatter (#1365)
* Add support for voice conversion inference

* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json

* Rebase bug fix

* Use the average d-vector for inference

* Fix the bug in find unique chars script

* Add OpenBible formatter

Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-03-11 10:43:31 +01:00
Edresson Casanova dbe9da7f15
Add Voice conversion inference support (#1337)
* Add support for voice conversion inference

* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json

* Rebase bug fix

* Use the average d-vector for inference
2022-03-10 14:57:12 +01:00
Edresson Casanova 917f417ac4
Add alphas to control language and speaker balancer (#1216)
* Add alphas to control language and speaker balancer

* Add docs for speaker and language samplers

* Change the Samplers weights to float for save memory

* Change the test_samplers to unittest format

* Add get_sampler method in BaseTTS

* Fix rebase issues

* Add language and speaker samplers support for DDP training

* Rename distributed sampler wrapper

* Remove the DistributedSamplerWrapper and use the one from Trainer

* Bugfix after rebase

* Move the samplers config to tts config
2022-03-10 14:56:09 +01:00
Edresson Casanova f381e29b91
REBASED: Add support for the speaker encoder training using torch spectrograms (#1348)
* Add support for the speaker encoder training using torch spectrograms

* Remove useless function in speaker encoder dataset class
2022-03-10 14:54:51 +01:00
Eren Gölge c670365507 Fix VCTK recipe and formatter 2022-03-08 14:20:34 +01:00
Eren Gölge e9d9028b4d Revert cleaner name 2022-03-06 12:57:06 +01:00
Eren Gölge 764c7fa4a4 Rename phoneme_cleaners 2022-03-06 12:09:54 +01:00
Eren Gölge dd4287de1f Update models 2022-03-03 20:23:00 +01:00