Commit Graph

135 Commits

Author SHA1 Message Date
Edresson Casanova bd35371944 Add prosody encoder inference support 2022-05-27 16:00:41 -03:00
Edresson Casanova 2568b722dd Add an option to detach the prosody encoder input 2022-05-27 13:19:06 -03:00
Edresson Casanova a2aecea8f3 Add VAE prosody encoder 2022-05-27 12:53:56 -03:00
Edresson Casanova 312789edbf Condition the prosody encoder on z_p 2022-05-26 15:41:24 -03:00
Edresson Casanova e667fcb057 Support prosody conditional model on decoder input 2022-05-25 17:03:38 -03:00
Edresson Casanova 6c23398518 Add emotion classifier loss 2022-05-25 10:05:52 -03:00
Edresson Casanova d699416735 Add conditional module 2022-05-23 14:05:17 -03:00
Edresson Casanova 6e4b13c6cc Fix unit tests 2022-05-23 10:26:45 -03:00
Edresson Casanova 749b217884 Fix rebase issues 2022-05-20 18:29:39 -03:00
Edresson Casanova 1a88191a5a Disable the reversal prosody encoder speaker loss 2022-05-20 15:18:02 -03:00
Edresson Casanova 8505cd09e8 Add text encoder reversal speaker classifier loss 2022-05-20 15:18:02 -03:00
Edresson Casanova 024e567849 Clean up old code 2022-05-20 15:18:00 -03:00
Edresson Casanova dbaa71c944 Add prosody encoder params on config 2022-05-20 15:03:16 -03:00
Edresson Casanova d49c6ab72f Add reversal classifier loss 2022-05-20 14:56:02 -03:00
Edresson Casanova 004862a79b Add prosody encoder training support 2022-05-20 14:56:02 -03:00
Edresson Casanova d9d9415513 Add emotion embedding in the encoder 2022-05-20 14:56:02 -03:00
Edresson Casanova b2b54668bc Add formatter for the Emotional Speech Dataset 2022-05-20 14:56:02 -03:00
Edresson Casanova d2b5db84f0 Remove useless encoder weights reload 2022-05-20 14:56:00 -03:00
Edresson Casanova 01dd4e4051 Fix Style tests 2022-05-20 14:50:52 -03:00
Edresson Casanova 46762ccf35 Fix style tests 2022-05-20 14:37:43 -03:00
Edresson Casanova 39575f2937 Bug fix in single speaker emotion embedding training 2022-05-20 14:35:12 -03:00
Edresson Casanova a3ecaf3bdd Add emotion external embeddings training unit test 2022-05-20 14:34:22 -03:00
Edresson Casanova 5e0286c534 Add emotion consistency loss 2022-05-20 14:31:50 -03:00
Edresson Casanova 6f95522edf Add Emotion Support for the VITS model 2022-05-20 14:31:09 -03:00
Edresson Casanova ee99a6c1e2 Fix voice conversion inference (#1583)
* Add voice conversion zoo test

* Fix style

* Fix unit test
2022-05-20 15:50:25 +02:00
Edresson Casanova e5d8ec2402 Change the VITS upsampling interpolation trick to linear (#1564) 2022-05-13 10:52:39 +02:00
Eren Gölge 6e460b7e42 Add an assert for the upsampling trick (#1538) 2022-05-12 19:55:24 +02:00
Eren Gölge e45ae57aef Merge pull request #1550 from coqui-ai/fix-upsampling-asserts
Fix VITS upsampling asserts
2022-05-12 14:51:41 +02:00
Edresson Casanova 175ca06388 Add reinit text encoder and duration predictor parameter (#1562)
* Add reinit encoder and duration predictor option

* Add .data to prevent any overlooked autograd hook
2022-05-12 09:08:36 -03:00
Edresson Casanova 182711043c Fix the VITS upsampling asserts
Fix style
2022-05-12 09:08:29 -03:00
Eren Gölge c18bd21b3f Return durations at VITS inference 2022-05-11 11:30:05 +02:00
Eren Gölge 5021a03de0 Use torch.no_grad for VITS inference 2022-05-11 11:29:36 +02:00
Eren Gölge 3f03e3012c Fix batch_group_size in VITS 2022-05-07 13:44:44 +02:00
WeberJulian fbdf76b2fc returns y_mask in VITS inference (#1540)
* returns y_mask

* make style
2022-05-03 13:49:24 +02:00
Edresson Casanova 8d228ab22a Trick to Upsampling to High sampling rates using VITS model (#1456)
* Add upsample VITS support

* Fix the bug in inference

* Fix lint checks

* Add RMS based norm in save_wav method

* Style fix

* Add the period for VITS multi-period discriminator in model_args

* Bug fix in speaker encoder load in inference time

* Add unit tests

* Remove useless detach_z_vocoder parameter

* Add docs for VITS upsampling

* Fix the docs

* Rename TTS_part_sample_rate to encoder_sample_rate

* Add upsampling_init and upsampling_z methods

* Add asserts for encoder_sample_rate part

* Move upsampling tests to test_vits.py
2022-04-26 11:47:46 +02:00
Edresson Casanova 060e0f9368 Add EmbeddingManager and BaseIDManager (#1374) 2022-03-31 13:41:16 +02:00
Edresson Casanova 37896e1743 Bug fix in freeze encoder (#1391)
* Fix the bug in freeze encoder

* Remove emb_l definition for non-multilingual training

* Fix unit tests
2022-03-24 18:16:04 +01:00
Eren Gölge 0870a4faa2 Make style (#1405) 2022-03-16 12:13:55 +01:00
Edresson Casanova dbe9da7f15 Add Voice conversion inference support (#1337)
* Add support for voice conversion inference

* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json

* Rebase bug fix

* Use the average d-vector for inference
2022-03-10 14:57:12 +01:00
Edresson Casanova 917f417ac4 Add alphas to control language and speaker balancer (#1216)
* Add alphas to control language and speaker balancer

* Add docs for speaker and language samplers

* Change the Samplers weights to float for save memory

* Change the test_samplers to unittest format

* Add get_sampler method in BaseTTS

* Fix rebase issues

* Add language and speaker samplers support for DDP training

* Rename distributed sampler wrapper

* Remove the DistributedSamplerWrapper and use the one from Trainer

* Bugfix after rebase

* Move the samplers config to tts config
2022-03-10 14:56:09 +01:00
Eren Gölge dd4287de1f Update models 2022-03-03 20:23:00 +01:00
Eren Gölge 1425a023fe Make style and lint 2022-03-02 13:25:35 +01:00
Eren Gölge c68885b3fd Update Vits speaker encoder init 2022-03-02 13:20:23 +01:00
Eren Gölge 942df0fb05 Update vits dataset 2022-03-02 09:14:32 +01:00
Eren Gölge 1e414b3a09 Make stlye 2022-02-25 11:31:56 +01:00
Eren Gölge acc83cd3e6 Update Vits model API 2022-02-25 11:31:56 +01:00
Eren Gölge 83c5ddc5b7 Update imports 2022-02-25 11:31:56 +01:00
Eren Gölge 14c117978d Fix return outputs 2022-02-25 11:31:56 +01:00
Eren Gölge 424d04e4f6 Make stlye 2022-02-25 11:31:56 +01:00
Eren Gölge 00c7600103 Update Vits model API 2022-02-25 11:30:24 +01:00