Edresson Casanova
6186da855f
Bug fix on pre-compute F0
2022-06-16 19:38:11 +00:00
Edresson Casanova
6a573065f4
Add pitch predictor
2022-06-16 19:34:54 +00:00
Edresson Casanova
92e7391a5d
Add speaker embedding on prosody encoder
2022-06-16 19:06:48 +00:00
Edresson Casanova
856e185641
Add Resnet prosody encoder support
2022-06-13 13:47:22 +00:00
Edresson Casanova
0844d9225d
Fix unit tests
2022-06-08 10:18:19 -03:00
Edresson Casanova
4b59f07946
Support the use of speaker embedding as emotion embedding
2022-06-08 09:52:39 -03:00
Edresson Casanova
360b969c23
Fix rebase issues
2022-06-08 09:52:39 -03:00
Edresson Casanova
e069985f17
Add speaker and emotion squeezer layers
2022-06-08 09:52:39 -03:00
Edresson Casanova
a309edacb4
Remove VITS conditional flow module
2022-06-08 09:52:39 -03:00
Edresson Casanova
a1d0088087
Remove VITS End2End loss
2022-06-08 09:52:38 -03:00
Edresson Casanova
ae55bdae6c
Fix Lint checks
2022-06-08 09:52:38 -03:00
Edresson Casanova
fd1036f4ba
Add Noise scale predictor
2022-06-08 09:52:38 -03:00
Edresson Casanova
d6d8d0e3e1
Fix the VITS GAN loss
2022-06-08 09:52:38 -03:00
Edresson Casanova
e07fcc7a8c
Add text encoder adversarial loss on the VITS
2022-06-08 09:52:38 -03:00
Edresson Casanova
4e94b46d5e
Add end2end VITS loss
2022-06-08 09:52:38 -03:00
Edresson Casanova
a822f21b78
Add prosody encoder inference support
2022-06-08 09:52:38 -03:00
Edresson Casanova
010f847929
Add an option to detach the prosody encoder input
2022-06-08 09:52:38 -03:00
Edresson Casanova
2cac18c7b7
Add VAE prosody encoder
2022-06-08 09:52:37 -03:00
Edresson Casanova
f774cf0648
Condition the prosody encoder on z_p
2022-06-08 09:52:37 -03:00
Edresson Casanova
512525cc39
Support prosody conditional model on decoder input
2022-06-08 09:52:37 -03:00
Edresson Casanova
02194367d7
Add emotion classifier loss
2022-06-08 09:52:37 -03:00
Edresson Casanova
a6c8fea192
Add conditional module
2022-06-08 09:52:37 -03:00
Edresson Casanova
bce4a41b9c
Fix unit tests
2022-06-08 09:52:37 -03:00
Edresson Casanova
0fb1b200c6
Fix rebase issues
2022-06-08 09:52:37 -03:00
Edresson Casanova
98c2834b17
Disable the reversal prosody encoder speaker loss
2022-06-08 09:52:37 -03:00
Edresson Casanova
ac3f98cefb
Add text encoder reversal speaker classifier loss
2022-06-08 09:52:37 -03:00
Edresson Casanova
a543d71352
Clean up old code
2022-06-08 09:52:36 -03:00
Edresson Casanova
66e3f5388e
Add prosody encoder params on config
2022-06-08 09:52:36 -03:00
Edresson Casanova
050f7707e2
Add reversal classifier loss
2022-06-08 09:52:36 -03:00
Edresson Casanova
44ec2ab387
Add prosody encoder training support
2022-06-08 09:52:36 -03:00
Edresson Casanova
6126e5e588
Add emotion embedding in the encoder
2022-06-08 09:52:36 -03:00
Edresson Casanova
1fdef1c4c9
Add formatter for the Emotional Speech Dataset
2022-06-08 09:52:36 -03:00
Edresson Casanova
61a04a7855
Remove useless encoder weights reload
2022-06-08 09:52:36 -03:00
Edresson Casanova
e8c4417f07
Fix Style tests
2022-06-08 09:52:36 -03:00
Edresson Casanova
730befebcc
Fix style tests
2022-06-08 09:52:36 -03:00
Edresson Casanova
e409f3588b
Bug fix in single speaker emotion embedding training
2022-06-08 09:52:36 -03:00
Edresson Casanova
7a0eba517f
Add emotion external embeddings training unit test
2022-06-08 09:52:35 -03:00
Edresson Casanova
5a10ef27b3
Add emotion consistency loss
2022-06-08 09:52:35 -03:00
Edresson Casanova
bd99548016
Add Emotion Support for the VITS model
2022-06-08 09:52:35 -03:00
Edresson Casanova
ee99a6c1e2
Fix voice conversion inference (#1583)
* Add voice conversion zoo test
* Fix style
* Fix unit test
2022-05-20 15:50:25 +02:00
Edresson Casanova
e5d8ec2402
Change the VITS upsampling interpolation trick to linear (#1564)
2022-05-13 10:52:39 +02:00
Eren Gölge
6e460b7e42
Add an assert for the upsampling trick (#1538)
2022-05-12 19:55:24 +02:00
Eren Gölge
e45ae57aef
Merge pull request #1550 from coqui-ai/fix-upsampling-asserts
Fix VITS upsampling asserts
2022-05-12 14:51:41 +02:00
Edresson Casanova
175ca06388
Add reinit text encoder and duration predictor parameter (#1562)
* Add reinit encoder and duration predictor option
* Add .data to prevent any overlooked autograd hook
2022-05-12 09:08:36 -03:00
Edresson Casanova
182711043c
Fix the VITS upsampling asserts
Fix style
2022-05-12 09:08:29 -03:00
Eren Gölge
c18bd21b3f
Return durations at VITS inference
2022-05-11 11:30:05 +02:00
Eren Gölge
5021a03de0
Use torch.no_grad for VITS inference
2022-05-11 11:29:36 +02:00
Eren Gölge
3f03e3012c
Fix batch_group_size in VITS
2022-05-07 13:44:44 +02:00
WeberJulian
fbdf76b2fc
Return y_mask in VITS inference (#1540)
* Return y_mask
* Make style
2022-05-03 13:49:24 +02:00
Edresson Casanova
8d228ab22a
Add a trick for upsampling to high sampling rates with the VITS model (#1456)
* Add upsample VITS support
* Fix the bug in inference
* Fix lint checks
* Add RMS based norm in save_wav method
* Style fix
* Add the period for VITS multi-period discriminator in model_args
* Fix bug in speaker encoder loading at inference time
* Add unit tests
* Remove useless detach_z_vocoder parameter
* Add docs for VITS upsampling
* Fix the docs
* Rename TTS_part_sample_rate to encoder_sample_rate
* Add upsampling_init and upsampling_z methods
* Add asserts for encoder_sample_rate part
* Move upsampling tests to test_vits.py
2022-04-26 11:47:46 +02:00