coqui-tts

Commit Graph

Author	SHA1	Message	Date
Edresson Casanova	bf45319f64	Add speaker and emotion squeezer layers	2022-06-07 09:27:08 -03:00
Edresson Casanova	0f1dde642f	Remove VITS conditional flow module	2022-06-06 19:48:56 -03:00
Edresson Casanova	5bd59a6023	Remove VITS End2End loss	2022-06-06 15:10:00 -03:00
Edresson Casanova	8ea72356a2	Fix Lint checks	2022-06-06 14:59:21 -03:00
Edresson Casanova	0d7f8e24b2	Add Noise scale predictor	2022-06-06 13:22:26 -03:00
Edresson Casanova	cbc81b55cb	Fix the VITS GAN loss	2022-06-03 13:05:40 +00:00
Edresson Casanova	7f8c12888c	Add text encoder adversarial loss on the VITS	2022-06-02 18:40:57 -03:00
Edresson Casanova	d9452d7038	Add end2end VITS loss	2022-06-02 13:50:08 -03:00
Edresson Casanova	b7080f0f64	Recreate the prior distribution of Capacitron VAE on the right device	2022-05-27 16:41:13 -03:00
Edresson Casanova	bd35371944	Add prosody encoder inference support	2022-05-27 16:00:41 -03:00
Edresson Casanova	2568b722dd	Add an option to detach the prosody encoder input	2022-05-27 13:19:06 -03:00
Edresson Casanova	a2aecea8f3	Add VAE prosody encoder	2022-05-27 12:53:56 -03:00
Edresson Casanova	312789edbf	Condition the prosody encoder on z_p	2022-05-26 15:41:24 -03:00
Edresson Casanova	e667fcb057	Support prosody conditional model on decoder input	2022-05-25 17:03:38 -03:00
Edresson Casanova	6c23398518	Add emotion classifier loss	2022-05-25 10:05:52 -03:00
Edresson Casanova	5b641ff0e3	Fix compute embeddings issue	2022-05-25 10:05:28 -03:00
Edresson Casanova	d699416735	Add conditional module	2022-05-23 14:05:17 -03:00
Edresson Casanova	6e4b13c6cc	Fix unit tests	2022-05-23 10:26:45 -03:00
Edresson Casanova	749b217884	Fix rebase issues	2022-05-20 18:29:39 -03:00
Edresson Casanova	1a88191a5a	Disable the reversal prosody encoder speaker loss	2022-05-20 15:18:02 -03:00
Edresson Casanova	8505cd09e8	Add text encoder reversal speaker classifier loss	2022-05-20 15:18:02 -03:00
Edresson Casanova	024e567849	Clean up old code	2022-05-20 15:18:00 -03:00
Edresson Casanova	dbaa71c944	Add prosody encoder params on config	2022-05-20 15:03:16 -03:00
Edresson Casanova	4107a1ef85	Add Speech style balancer	2022-05-20 14:57:40 -03:00
Edresson Casanova	d49c6ab72f	Add reversal classifier loss	2022-05-20 14:56:02 -03:00
Edresson Casanova	004862a79b	Add prosody encoder training support	2022-05-20 14:56:02 -03:00
Edresson Casanova	d9d9415513	Add emotion embedding in the encoder	2022-05-20 14:56:02 -03:00
Edresson Casanova	b2b54668bc	Add formatter for the Emotional Speech Dataset	2022-05-20 14:56:02 -03:00
Edresson Casanova	d2b5db84f0	Remove useless encoder weights reload	2022-05-20 14:56:00 -03:00
Edresson Casanova	721a81b1d8	Fix emotion unit test	2022-05-20 14:51:26 -03:00
Edresson Casanova	01dd4e4051	Fix Style tests	2022-05-20 14:50:52 -03:00
Edresson Casanova	46762ccf35	Fix style tests	2022-05-20 14:37:43 -03:00
Edresson Casanova	d8d1775273	Fix the Bug in Synthesizer	2022-05-20 14:36:28 -03:00
Edresson Casanova	39575f2937	Bug fix in single speaker emotion embedding training	2022-05-20 14:35:12 -03:00
Edresson Casanova	d1ab3298ba	Fix unit tests	2022-05-20 14:35:09 -03:00
Edresson Casanova	a3ecaf3bdd	Add emotion external embeddings training unit test	2022-05-20 14:34:22 -03:00
Edresson Casanova	5e0286c534	Add emotion consistency loss	2022-05-20 14:31:50 -03:00
Edresson Casanova	e5c7ae9f1b	Fix the bug in sythesizer	2022-05-20 14:31:12 -03:00
Edresson Casanova	6f95522edf	Add Emotion Support for the VITS model	2022-05-20 14:31:09 -03:00
Edresson Casanova	d8f5cb2674	Add emotion manager	2022-05-20 14:26:43 -03:00
a-froghyar	8be21ec387	Capacitron (#977) * new CI config * initial Capacitron implementation * delete old unused file * fix empty formatting changes * update losses and training script * fix previous commit * fix commit * Add Capacitron test and first round of test fixes * revert formatter change * add changes to the synthesizer * add stepwise gradual lr scheduler and changes to the recipe * add inference script for dev use * feat: add posterior inference arguments to synth methods - added reference wav and text args for posterior inference - some formatting * fix: add espeak flag to base_tts and dataset APIs - use_espeak_phonemes flag was not implemented in those APIs - espeak is now able to be utilised for phoneme generation - necessary phonemizer for the Capacitron model * chore: update training script and style - training script includes the espeak flag and other hyperparams - made style * chore: fix linting * feat: add Tacotron 2 support * leftover from dev * chore:rename parser args * feat: extract optimizers - created a separate optimizer class to merge the two optimizers * chore: revert arbitrary trainer changes * fmt: revert formatting bug * formatting again * formatting fixed * fix: log func * fix: update optimizer - Implemented load_state_dict for continuing training * fix: clean optimizer init for standard models * improvement: purge espeak flags and add training scripts * Delete capacitronT2.py delete old training script, new one is pushed * feat: capacitron trainer methods - extracted capacitron specific training operations from the trainer into custom methods in taco1 and taco2 models * chore: renaming and merging capacitron and gst style args * fix: bug fixes from the previous commit * fix: implement state_dict method on CapacitronOptimizer * fix: call method * fix: inference naming * Delete train_capacitron.py * fix: synthesize * feat: update tests * chore: fix style * Delete capacitron_inference.py * fix: fix train tts t2 capacitron tests * fix: double forward in T2 train step * fix: double forward in T1 train step * fix: run make style * fix: remove unused import * fix: test for T1 capacitron * fix: make lint * feat: add blizzard2013 recipes * make style * fix: update recipes * chore: make style * Plot test sentences in Tacotron * chore: make style and fix import * fix: call forward first before problematic floordiv op * fix: update recipes * feat: add min_audio_len to recipes * aux_input["style_mel"] * chore: make style * Make capacitron T2 recipe more stable * Remove T1 capacitron Ljspeech * feat: implement new grad clipping routine and update configs * make style * Add pretrained checkpoints * Add default vocoder * Change trainer package * Fix grad clip issue for tacotron * Fix scheduler issue with tacotron Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: WeberJulian <julian.weber@hotmail.fr> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-05-20 16:17:11 +02:00
Edresson Casanova	ee99a6c1e2	Fix voice conversion inference (#1583) * Add voice conversion zoo test * Fix style * Fix unit test	2022-05-20 15:50:25 +02:00
Edresson Casanova	e5d8ec2402	Change the VITS upsampling interpolation trick to linear (#1564)	2022-05-13 10:52:39 +02:00
Edresson Casanova	c6008e5235	Add audio length sampler balancer (#1561) * Add audio length sampler balancer * Add unit tests	2022-05-12 19:59:19 +02:00
Eren Gölge	6e460b7e42	Add an assert for the upsampling trick (#1538)	2022-05-12 19:55:24 +02:00
Eren Gölge	4857967063	🐍 Python 3.10.x support and drop Python 3.6 support (#1565) * Update requirements * Update CI for p3.10 * Update numpy requirement * Drop 🐍p3.6 support Numpy also dropped support for p3.6 * Bind cython v0.29.28 * Bind pyworld to v0.2.10 > 0.2.10 is not p3.10.x compatible * Update Dockerfile	2022-05-12 15:50:25 +02:00
Edresson Casanova	a97eed696a	Fix the bug in eSpeak wrapper for eSpeak version 1.48.15 (#1560)	2022-05-12 15:15:18 +02:00
Eren Gölge	e45ae57aef	Merge pull request #1550 from coqui-ai/fix-upsampling-asserts Fix VITS upsampling asserts	2022-05-12 14:51:41 +02:00
Edresson Casanova	175ca06388	Add reinit text encoder and duration predictor parameter (#1562) * Add reinit encoder and duration predictor option * Add .data to prevent any overlooked autograd hook	2022-05-12 09:08:36 -03:00
Edresson Casanova	182711043c	Fix the VITS upsampling asserts Fix style	2022-05-12 09:08:29 -03:00

1612 Commits