Commit Graph

4111 Commits

Author SHA1 Message Date
Edresson Casanova d6d8d0e3e1 Fix the VITS GAN loss 2022-06-08 09:52:38 -03:00
Edresson Casanova e07fcc7a8c Add text encoder adversarial loss on the VITS 2022-06-08 09:52:38 -03:00
Edresson Casanova 4e94b46d5e Add end2end VITS loss 2022-06-08 09:52:38 -03:00
Edresson Casanova ec8c8dc5a2 Recreate the prior distribution of Capacitron VAE on the right device 2022-06-08 09:52:38 -03:00
Edresson Casanova a822f21b78 Add prosody encoder inference support 2022-06-08 09:52:38 -03:00
Edresson Casanova 010f847929 Add an option to detach the prosody encoder input 2022-06-08 09:52:38 -03:00
Edresson Casanova 2cac18c7b7 Add VAE prosody encoder 2022-06-08 09:52:37 -03:00
Edresson Casanova f774cf0648 Condition the prosody encoder on z_p 2022-06-08 09:52:37 -03:00
Edresson Casanova 512525cc39 Support prosody conditional model on decoder input 2022-06-08 09:52:37 -03:00
Edresson Casanova 02194367d7 Add emotion classifier loss 2022-06-08 09:52:37 -03:00
Edresson Casanova f50819a5f6 Fix compute embeddings issue 2022-06-08 09:52:37 -03:00
Edresson Casanova a6c8fea192 Add conditional module 2022-06-08 09:52:37 -03:00
Edresson Casanova bce4a41b9c Fix unit tests 2022-06-08 09:52:37 -03:00
Edresson Casanova 0fb1b200c6 Fix rebase issues 2022-06-08 09:52:37 -03:00
Edresson Casanova 98c2834b17 Disable the reversal prosody encoder speaker loss 2022-06-08 09:52:37 -03:00
Edresson Casanova ac3f98cefb Add text encoder reversal speaker classifier loss 2022-06-08 09:52:37 -03:00
Edresson Casanova a543d71352 Clean up old code 2022-06-08 09:52:36 -03:00
Edresson Casanova 66e3f5388e Add prosody encoder params on config 2022-06-08 09:52:36 -03:00
Edresson Casanova 95409be0bc Add Speech style balancer 2022-06-08 09:52:36 -03:00
Edresson Casanova 050f7707e2 Add reversal classifier loss 2022-06-08 09:52:36 -03:00
Edresson Casanova 44ec2ab387 Add prosody encoder training support 2022-06-08 09:52:36 -03:00
Edresson Casanova 6126e5e588 Add emotion embedding in the encoder 2022-06-08 09:52:36 -03:00
Edresson Casanova 1fdef1c4c9 Add formatter for the Emotional Speech Dataset 2022-06-08 09:52:36 -03:00
Edresson Casanova 61a04a7855 Remove useless encoder weights reload 2022-06-08 09:52:36 -03:00
Edresson Casanova 836c4c6801 Fix emotion unit test 2022-06-08 09:52:36 -03:00
Edresson Casanova e8c4417f07 Fix Style tests 2022-06-08 09:52:36 -03:00
Edresson Casanova 730befebcc Fix style tests 2022-06-08 09:52:36 -03:00
Edresson Casanova a8292c7c03 Fix the Bug in Synthesizer 2022-06-08 09:52:36 -03:00
Edresson Casanova e409f3588b Bug fix in single speaker emotion embedding training 2022-06-08 09:52:36 -03:00
Edresson Casanova 6f33506d89 Fix unit tests 2022-06-08 09:52:35 -03:00
Edresson Casanova 7a0eba517f Add emotion external embeddings training unit test 2022-06-08 09:52:35 -03:00
Edresson Casanova 5a10ef27b3 Add emotion consistency loss 2022-06-08 09:52:35 -03:00
Edresson Casanova c54e6ae1e4 Fix the bug in synthesizer 2022-06-08 09:52:35 -03:00
Edresson Casanova bd99548016 Add Emotion Support for the VITS model 2022-06-08 09:52:35 -03:00
Edresson Casanova ad7ce05ac9 Add emotion manager 2022-06-08 09:52:35 -03:00
WeberJulian f09ea11c71 Internal formatter (#1629)
* Add coqui formatter

* Make style
2022-06-08 14:31:03 +02:00
Aya-AlJafari 68cef28a88 Adding TTS Tutorials (#1584)
* Adding inferencing notebook

* added multispeaker explanation and usecase and renamed the file

* Adding training tutorial

* fixed dummy paths

* fixed review comments

* fixed metadata extension

Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-06-02 12:23:00 +02:00
Eren Gölge f70e82cd19 Use fsspec and torch for embedding file IO (#1581)
* Use fsspec and torch for embedding file

* Fixup

* Fix load and save files

* Fix compute embedding script

* Set use_cuda to true if available

* Add dummy speakers.pth file

* Make style

* Change default speakers file extension

Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
2022-06-01 13:49:42 +02:00
Ryan Le-Nguyen b6bd74a9a9 fix invalid json (#1599) 2022-05-31 10:20:10 +02:00
Noran Raskin a790df4e94 Training recipes for thorsten dataset (#1020)
* Fix style

* Fix isort

* Remove tensorboardX from requirements

Co-authored-by: logan hart <72301874+loganhart420@users.noreply.github.com>
Co-authored-by: Eren Gölge <egolge@coqui.ai>
2022-05-30 12:07:31 +02:00
Eren Gölge 71111d14e4 Merge pull request #1587 from ribeiromiranda/patch-1
Fixed use_cuda issue in compute_embeddings.py
2022-05-29 14:51:08 +02:00
André R. de Miranda 3b84ef9524 Fixed use_cuda issue in compute_embeddings.py
Added use_cuda argument in self.init_encoder method
2022-05-20 12:46:46 -03:00
a-froghyar 8be21ec387 Capacitron (#977)
* new CI config

* initial Capacitron implementation

* delete old unused file

* fix empty formatting changes

* update losses and training script

* fix previous commit

* fix commit

* Add Capacitron test and first round of test fixes

* revert formatter change

* add changes to the synthesizer

* add stepwise gradual lr scheduler and changes to the recipe

* add inference script for dev use

* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting

* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak is now able to be utilised for phoneme generation
- necessary phonemizer for the Capacitron model

* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style

* chore: fix linting

* feat: add Tacotron 2 support

* leftover from dev

* chore:rename parser args

* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers

* chore: revert arbitrary trainer changes

* fmt: revert formatting bug

* formatting again

* formatting fixed

* fix: log func

* fix: update optimizer
- Implemented load_state_dict for continuing training

* fix: clean optimizer init for standard models

* improvement: purge espeak flags and add training scripts

* Delete capacitronT2.py

delete old training script, new one is pushed

* feat: capacitron trainer methods
- extracted capacitron specific training operations from the trainer into custom methods in taco1 and taco2 models

* chore: renaming and merging capacitron and gst style args

* fix: bug fixes from the previous commit

* fix: implement state_dict method on CapacitronOptimizer

* fix: call method

* fix: inference naming

* Delete train_capacitron.py

* fix: synthesize

* feat: update tests

* chore: fix style

* Delete capacitron_inference.py

* fix: fix train tts t2 capacitron tests

* fix: double forward in T2 train step

* fix: double forward in T1 train step

* fix: run make style

* fix: remove unused import

* fix: test for T1 capacitron

* fix: make lint

* feat: add blizzard2013 recipes

* make style

* fix: update recipes

* chore: make style

* Plot test sentences in Tacotron

* chore: make style and fix import

* fix: call forward first before problematic floordiv op

* fix: update recipes

* feat: add min_audio_len to recipes

* aux_input["style_mel"]

* chore: make style

* Make capacitron T2 recipe more stable

* Remove T1 capacitron Ljspeech

* feat: implement new grad clipping routine and update configs

* make style

* Add pretrained checkpoints

* Add default vocoder

* Change trainer package

* Fix grad clip issue for tacotron

* Fix scheduler issue with tacotron

Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-05-20 16:17:11 +02:00
Edresson Casanova ee99a6c1e2 Fix voice conversion inference (#1583)
* Add voice conversion zoo test

* Fix style

* Fix unit test
2022-05-20 15:50:25 +02:00
Eren Gölge e282da5161 Update CI badges 2022-05-13 14:56:49 +02:00
Edresson Casanova e5d8ec2402 Change the VITS upsampling interpolation trick to linear (#1564) 2022-05-13 10:52:39 +02:00
Edresson Casanova c6008e5235 Add audio length sampler balancer (#1561)
* Add audio length sampler balancer

* Add unit tests
2022-05-12 19:59:19 +02:00
Eren Gölge 6e460b7e42 Add an assert for the upsampling trick (#1538) 2022-05-12 19:55:24 +02:00
Eren Gölge 6048959e24 Add CPU only Docker image (#1573)
Co-authored-by: Reuben Morais <reuben.morais@gmail.com>
2022-05-12 19:33:27 +02:00
Eren Gölge 27cf388a79 Update CI tests (#1572)
* Use direct model URLs in CI

* Fixup

* Fixup
2022-05-12 18:41:01 +02:00