coqui-tts

Commit Graph

Author	SHA1	Message	Date
Edresson Casanova	bd35371944	Add prosody encoder inference support	2022-05-27 16:00:41 -03:00
Edresson Casanova	2568b722dd	Add an option to detach the prosody encoder input	2022-05-27 13:19:06 -03:00
Edresson Casanova	a2aecea8f3	Add VAE prosody encoder	2022-05-27 12:53:56 -03:00
Edresson Casanova	312789edbf	Condition the prosody encoder on z_p	2022-05-26 15:41:24 -03:00
Edresson Casanova	e667fcb057	Support prosody conditional model on decoder input	2022-05-25 17:03:38 -03:00
Edresson Casanova	6c23398518	Add emotion classifier loss	2022-05-25 10:05:52 -03:00
Edresson Casanova	d699416735	Add conditional module	2022-05-23 14:05:17 -03:00
Edresson Casanova	6e4b13c6cc	Fix unit tests	2022-05-23 10:26:45 -03:00
Edresson Casanova	749b217884	Fix rebase issues	2022-05-20 18:29:39 -03:00
Edresson Casanova	1a88191a5a	Disable the reversal prosody encoder speaker loss	2022-05-20 15:18:02 -03:00
Edresson Casanova	8505cd09e8	Add text encoder reversal speaker classifier loss	2022-05-20 15:18:02 -03:00
Edresson Casanova	024e567849	Clean up old code	2022-05-20 15:18:00 -03:00
Edresson Casanova	dbaa71c944	Add prosody encoder params on config	2022-05-20 15:03:16 -03:00
Edresson Casanova	d49c6ab72f	Add reversal classifier loss	2022-05-20 14:56:02 -03:00
Edresson Casanova	004862a79b	Add prosody encoder training support	2022-05-20 14:56:02 -03:00
Edresson Casanova	d9d9415513	Add emotion embedding in the encoder	2022-05-20 14:56:02 -03:00
Edresson Casanova	b2b54668bc	Add formatter for the Emotional Speech Dataset	2022-05-20 14:56:02 -03:00
Edresson Casanova	d2b5db84f0	Remove useless encoder weights reload	2022-05-20 14:56:00 -03:00
Edresson Casanova	01dd4e4051	Fix Style tests	2022-05-20 14:50:52 -03:00
Edresson Casanova	46762ccf35	Fix style tests	2022-05-20 14:37:43 -03:00
Edresson Casanova	39575f2937	Bug fix in single speaker emotion embedding training	2022-05-20 14:35:12 -03:00
Edresson Casanova	a3ecaf3bdd	Add emotion external embeddings training unit test	2022-05-20 14:34:22 -03:00
Edresson Casanova	5e0286c534	Add emotion consistency loss	2022-05-20 14:31:50 -03:00
Edresson Casanova	6f95522edf	Add Emotion Support for the VITS model	2022-05-20 14:31:09 -03:00
Edresson Casanova	ee99a6c1e2	Fix voice conversion inference (#1583) * Add voice conversion zoo test * Fix style * Fix unit test	2022-05-20 15:50:25 +02:00
Edresson Casanova	e5d8ec2402	Change the VITS upsampling interpolation trick to linear (#1564)	2022-05-13 10:52:39 +02:00
Eren Gölge	6e460b7e42	Add an assert for the upsampling trick (#1538)	2022-05-12 19:55:24 +02:00
Eren Gölge	e45ae57aef	Merge pull request #1550 from coqui-ai/fix-upsampling-asserts Fix VITS upsampling asserts	2022-05-12 14:51:41 +02:00
Edresson Casanova	175ca06388	Add reinit text encoder and duration predictor parameter (#1562) * Add reinit encoder and duration predictor option * Add .data to prevent any overlooked autograd hook	2022-05-12 09:08:36 -03:00
Edresson Casanova	182711043c	Fix the VITS upsampling asserts Fix style	2022-05-12 09:08:29 -03:00
Eren Gölge	c18bd21b3f	Return durations at VITS inference	2022-05-11 11:30:05 +02:00
Eren Gölge	5021a03de0	Use torch.no_grad for VITS inference	2022-05-11 11:29:36 +02:00
Eren Gölge	3f03e3012c	Fix batch_group_size in VITS	2022-05-07 13:44:44 +02:00
WeberJulian	fbdf76b2fc	returns y_mask in VITS inference (#1540) * returns y_mask * make style	2022-05-03 13:49:24 +02:00
Edresson Casanova	8d228ab22a	Trick to Upsampling to High sampling rates using VITS model (#1456) * Add upsample VITS support * Fix the bug in inference * Fix lint checks * Add RMS based norm in save_wav method * Style fix * Add the period for VITS multi-period discriminator in model_args * Bug fix in speaker encoder load in inference time * Add unit tests * Remove useless detach_z_vocoder parameter * Add docs for VITS upsampling * Fix the docs * Rename TTS_part_sample_rate to encoder_sample_rate * Add upsampling_init and upsampling_z methods * Add asserts for encoder_sample_rate part * Move upsampling tests to test_vits.py	2022-04-26 11:47:46 +02:00
Edresson Casanova	060e0f9368	Add EmbeddingManager and BaseIDManager (#1374)	2022-03-31 13:41:16 +02:00
Edresson Casanova	37896e1743	Bug fix in freeze encoder (#1391) * Fix the bug in freeze encoder * Remove emb_l definition for non-multilingual training * Fix unit tests	2022-03-24 18:16:04 +01:00
Eren Gölge	0870a4faa2	Make style (#1405)	2022-03-16 12:13:55 +01:00
Edresson Casanova	dbe9da7f15	Add Voice conversion inference support (#1337) * Add support for voice conversion inference * Cache d_vectors_by_speaker for fast inference using a bigger speakers.json * Rebase bug fix * Use the average d-vector for inference	2022-03-10 14:57:12 +01:00
Edresson Casanova	917f417ac4	Add alphas to control language and speaker balancer (#1216) * Add alphas to control language and speaker balancer * Add docs for speaker and language samplers * Change the Samplers weights to float for save memory * Change the test_samplers to unittest format * Add get_sampler method in BaseTTS * Fix rebase issues * Add language and speaker samplers support for DDP training * Rename distributed sampler wrapper * Remove the DistributedSamplerWrapper and use the one from Trainer * Bugfix after rebase * Move the samplers config to tts config	2022-03-10 14:56:09 +01:00
Eren Gölge	dd4287de1f	Update models	2022-03-03 20:23:00 +01:00
Eren Gölge	1425a023fe	Make style and lint	2022-03-02 13:25:35 +01:00
Eren Gölge	c68885b3fd	Update Vits speaker encoder init	2022-03-02 13:20:23 +01:00
Eren Gölge	942df0fb05	Update vits dataset	2022-03-02 09:14:32 +01:00
Eren Gölge	1e414b3a09	Make stlye	2022-02-25 11:31:56 +01:00
Eren Gölge	acc83cd3e6	Update Vits model API	2022-02-25 11:31:56 +01:00
Eren Gölge	83c5ddc5b7	Update imports	2022-02-25 11:31:56 +01:00
Eren Gölge	14c117978d	Fix return outputs	2022-02-25 11:31:56 +01:00
Eren Gölge	424d04e4f6	Make stlye	2022-02-25 11:31:56 +01:00
Eren Gölge	00c7600103	Update Vits model API	2022-02-25 11:30:24 +01:00

135 Commits