Commit Graph

1612 Commits

Author SHA1 Message Date
Edresson Casanova bf45319f64 Add speaker and emotion squeezer layers 2022-06-07 09:27:08 -03:00
Edresson Casanova 0f1dde642f Remove VITS conditional flow module 2022-06-06 19:48:56 -03:00
Edresson Casanova 5bd59a6023 Remove VITS End2End loss 2022-06-06 15:10:00 -03:00
Edresson Casanova 8ea72356a2 Fix Lint checks 2022-06-06 14:59:21 -03:00
Edresson Casanova 0d7f8e24b2 Add Noise scale predictor 2022-06-06 13:22:26 -03:00
Edresson Casanova cbc81b55cb Fix the VITS GAN loss 2022-06-03 13:05:40 +00:00
Edresson Casanova 7f8c12888c Add text encoder adversarial loss on the VITS 2022-06-02 18:40:57 -03:00
Edresson Casanova d9452d7038 Add end2end VITS loss 2022-06-02 13:50:08 -03:00
Edresson Casanova b7080f0f64 Recreate the prior distribution of Capacitron VAE on the right device 2022-05-27 16:41:13 -03:00
Edresson Casanova bd35371944 Add prosody encoder inference support 2022-05-27 16:00:41 -03:00
Edresson Casanova 2568b722dd Add an option to detach the prosody encoder input 2022-05-27 13:19:06 -03:00
Edresson Casanova a2aecea8f3 Add VAE prosody encoder 2022-05-27 12:53:56 -03:00
Edresson Casanova 312789edbf Condition the prosody encoder on z_p 2022-05-26 15:41:24 -03:00
Edresson Casanova e667fcb057 Support prosody conditional model on decoder input 2022-05-25 17:03:38 -03:00
Edresson Casanova 6c23398518 Add emotion classifier loss 2022-05-25 10:05:52 -03:00
Edresson Casanova 5b641ff0e3 Fix compute embeddings issue 2022-05-25 10:05:28 -03:00
Edresson Casanova d699416735 Add conditional module 2022-05-23 14:05:17 -03:00
Edresson Casanova 6e4b13c6cc Fix unit tests 2022-05-23 10:26:45 -03:00
Edresson Casanova 749b217884 Fix rebase issues 2022-05-20 18:29:39 -03:00
Edresson Casanova 1a88191a5a Disable the reversal prosody encoder speaker loss 2022-05-20 15:18:02 -03:00
Edresson Casanova 8505cd09e8 Add text encoder reversal speaker classifier loss 2022-05-20 15:18:02 -03:00
Edresson Casanova 024e567849 Clean up old code 2022-05-20 15:18:00 -03:00
Edresson Casanova dbaa71c944 Add prosody encoder params on config 2022-05-20 15:03:16 -03:00
Edresson Casanova 4107a1ef85 Add Speech style balancer 2022-05-20 14:57:40 -03:00
Edresson Casanova d49c6ab72f Add reversal classifier loss 2022-05-20 14:56:02 -03:00
Edresson Casanova 004862a79b Add prosody encoder training support 2022-05-20 14:56:02 -03:00
Edresson Casanova d9d9415513 Add emotion embedding in the encoder 2022-05-20 14:56:02 -03:00
Edresson Casanova b2b54668bc Add formatter for the Emotional Speech Dataset 2022-05-20 14:56:02 -03:00
Edresson Casanova d2b5db84f0 Remove useless encoder weights reload 2022-05-20 14:56:00 -03:00
Edresson Casanova 721a81b1d8 Fix emotion unit test 2022-05-20 14:51:26 -03:00
Edresson Casanova 01dd4e4051 Fix Style tests 2022-05-20 14:50:52 -03:00
Edresson Casanova 46762ccf35 Fix style tests 2022-05-20 14:37:43 -03:00
Edresson Casanova d8d1775273 Fix the Bug in Synthesizer 2022-05-20 14:36:28 -03:00
Edresson Casanova 39575f2937 Bug fix in single speaker emotion embedding training 2022-05-20 14:35:12 -03:00
Edresson Casanova d1ab3298ba Fix unit tests 2022-05-20 14:35:09 -03:00
Edresson Casanova a3ecaf3bdd Add emotion external embeddings training unit test 2022-05-20 14:34:22 -03:00
Edresson Casanova 5e0286c534 Add emotion consistency loss 2022-05-20 14:31:50 -03:00
Edresson Casanova e5c7ae9f1b Fix the bug in synthesizer 2022-05-20 14:31:12 -03:00
Edresson Casanova 6f95522edf Add Emotion Support for the VITS model 2022-05-20 14:31:09 -03:00
Edresson Casanova d8f5cb2674 Add emotion manager 2022-05-20 14:26:43 -03:00
a-froghyar 8be21ec387 Capacitron (#977)
* new CI config

* initial Capacitron implementation

* delete old unused file

* fix empty formatting changes

* update losses and training script

* fix previous commit

* fix commit

* Add Capacitron test and first round of test fixes

* revert formatter change

* add changes to the synthesizer

* add stepwise gradual lr scheduler and changes to the recipe

* add inference script for dev use

* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting

* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak is now able to be utilised for phoneme generation
- necessary phonemizer for the Capacitron model

* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style

* chore: fix linting

* feat: add Tacotron 2 support

* leftover from dev

* chore: rename parser args

* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers

* chore: revert arbitrary trainer changes

* fmt: revert formatting bug

* formatting again

* formatting fixed

* fix: log func

* fix: update optimizer
- Implemented load_state_dict for continuing training

* fix: clean optimizer init for standard models

* improvement: purge espeak flags and add training scripts

* Delete capacitronT2.py

delete old training script, new one is pushed

* feat: capacitron trainer methods
- extracted capacitron specific training operations from the trainer into custom methods in taco1 and taco2 models

* chore: renaming and merging capacitron and gst style args

* fix: bug fixes from the previous commit

* fix: implement state_dict method on CapacitronOptimizer

* fix: call method

* fix: inference naming

* Delete train_capacitron.py

* fix: synthesize

* feat: update tests

* chore: fix style

* Delete capacitron_inference.py

* fix: fix train tts t2 capacitron tests

* fix: double forward in T2 train step

* fix: double forward in T1 train step

* fix: run make style

* fix: remove unused import

* fix: test for T1 capacitron

* fix: make lint

* feat: add blizzard2013 recipes

* make style

* fix: update recipes

* chore: make style

* Plot test sentences in Tacotron

* chore: make style and fix import

* fix: call forward first before problematic floordiv op

* fix: update recipes

* feat: add min_audio_len to recipes

* aux_input["style_mel"]

* chore: make style

* Make capacitron T2 recipe more stable

* Remove T1 capacitron Ljspeech

* feat: implement new grad clipping routine and update configs

* make style

* Add pretrained checkpoints

* Add default vocoder

* Change trainer package

* Fix grad clip issue for tacotron

* Fix scheduler issue with tacotron

Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-05-20 16:17:11 +02:00
Edresson Casanova ee99a6c1e2 Fix voice conversion inference (#1583)
* Add voice conversion zoo test

* Fix style

* Fix unit test
2022-05-20 15:50:25 +02:00
Edresson Casanova e5d8ec2402 Change the VITS upsampling interpolation trick to linear (#1564) 2022-05-13 10:52:39 +02:00
Edresson Casanova c6008e5235 Add audio length sampler balancer (#1561)
* Add audio length sampler balancer

* Add unit tests
2022-05-12 19:59:19 +02:00
Eren Gölge 6e460b7e42 Add an assert for the upsampling trick (#1538) 2022-05-12 19:55:24 +02:00
Eren Gölge 4857967063 🐍 Python 3.10.x support and drop Python 3.6 support (#1565)
* Update requirements

* Update CI for p3.10

* Update numpy requirement

* Drop 🐍p3.6 support

Numpy also dropped support for p3.6

* Bind cython v0.29.28

* Bind pyworld to v0.2.10

> 0.2.10 is not p3.10.x compatible

* Update Dockerfile
2022-05-12 15:50:25 +02:00
Edresson Casanova a97eed696a Fix the bug in eSpeak wrapper for eSpeak version 1.48.15 (#1560) 2022-05-12 15:15:18 +02:00
Eren Gölge e45ae57aef Merge pull request #1550 from coqui-ai/fix-upsampling-asserts
Fix VITS upsampling asserts
2022-05-12 14:51:41 +02:00
Edresson Casanova 175ca06388 Add reinit text encoder and duration predictor parameter (#1562)
* Add reinit encoder and duration predictor option

* Add .data to prevent any overlooked autograd hook
2022-05-12 09:08:36 -03:00
Edresson Casanova 182711043c Fix the VITS upsampling asserts
Fix style
2022-05-12 09:08:29 -03:00