4090 Commits

Author SHA1 Message Date
Eren Gölge edd59c81e8 Update ForwardTTSe2e tests 2022-05-17 13:46:05 +02:00
Eren Gölge 0b585b46c1 Refactor TTSDataset to use numpy transforms 2022-05-17 13:44:01 +02:00
Eren Gölge 4171f4e9c6 Update ForwardTTSE2eLoss 2022-05-17 13:44:01 +02:00
Eren Gölge dbe5eb992e Make AP optional in BaseTTS 2022-05-17 13:44:01 +02:00
Eren Gölge 6a53b77a95 Add numpy and torch transforms 2022-05-17 13:44:01 +02:00
Eren Gölge c3fb49bf76 Refactor ForwardTTS to skip decoder 2022-05-17 13:44:01 +02:00
Eren Gölge cc57c20162 Make plot results more general 2022-05-17 13:44:01 +02:00
Eren Gölge e7c5db0d97 Add missing kernel size attr to transformer layer 2022-05-17 13:44:01 +02:00
Eren Gölge 231c69b12e Remove AP from FastPitchE2e 2022-05-17 13:44:01 +02:00
Eren Gölge 4556c61902 Update fastpitche2e recipe 2022-05-17 13:44:01 +02:00
Eren Gölge 5f9d559419 Update import statements 2022-05-17 13:44:01 +02:00
Eren Gölge 9f8d86b716 Remove redundancy 2022-05-17 13:42:09 +02:00
Eren Gölge 0738cb0efe Fix Vocoder logging 2022-05-17 13:42:09 +02:00
Eren Gölge 760f045aaa Rename vars in VITS 2022-05-17 13:42:09 +02:00
Eren Gölge 775a6ab6ee Add cond layer in decoder 2022-05-17 13:38:53 +02:00
Eren Gölge 28a53c7462 Refactor multi-speaker init in ForwardTTS 2022-05-17 13:38:53 +02:00
Eren Gölge c125024da0 Implement BaseTTSE2E 2022-05-17 13:38:53 +02:00
Eren Gölge b16613c5ad Implement ForwardTTSE2E Loss 2022-05-17 13:38:53 +02:00
Eren Gölge aea8cb7668 Implement FastPitchE2E LJSpeech recipe 2022-05-17 13:38:53 +02:00
Eren Gölge 2a61b8fdaf Implement ForwardTTSE2E tests 2022-05-17 13:38:53 +02:00
Eren Gölge 85731482e1 Implement FastPitchE2EConfig 2022-05-17 13:38:53 +02:00
Eren Gölge fccda5ae7b Implement ForwardTTSE2Eg 2022-05-17 13:38:53 +02:00
Eren Gölge f237e4ccd9 Merge pull request #1574 from coqui-ai/update_badge
Update CI badges
2022-05-13 14:58:05 +02:00
Eren Gölge e282da5161 Update CI badges 2022-05-13 14:56:49 +02:00
Edresson Casanova e5d8ec2402 Change the VITS upsampling interpolation trick to linear (#1564) 2022-05-13 10:52:39 +02:00
Edresson Casanova c6008e5235 Add audio length sampler balancer (#1561)
* Add audio length sampler balancer

* Add unit tests
2022-05-12 19:59:19 +02:00
Eren Gölge 6e460b7e42 Add an assert for the upsampling trick (#1538) 2022-05-12 19:55:24 +02:00
Eren Gölge 6048959e24 Add CPU only Docker image (#1573)
Co-authored-by: Reuben Morais <reuben.morais@gmail.com>
2022-05-12 19:33:27 +02:00
Eren Gölge 27cf388a79 Update CI tests (#1572)
* Use direct model URLs in CI

* Fixup

* Fixup
2022-05-12 18:41:01 +02:00
Eren Gölge 4857967063 🐍 Python 3.10.x support and drop Python 3.6 support (#1565)
* Update requirements

* Update CI for p3.10

* Update numpy requirement

* Drop 🐍p3.6 support

Numpy also dropped support for p3.6

* Bind cython v0.29.28

* Bind pyworld to v0.2.10

> 0.2.10 is not p3.10.x compatible

* Update Dockerfile
2022-05-12 15:50:25 +02:00
Edresson Casanova a97eed696a Fix the bug in eSpeak wrapper for eSpeak version 1.48.15 (#1560) 2022-05-12 15:15:18 +02:00
Eren Gölge e45ae57aef Merge pull request #1550 from coqui-ai/fix-upsampling-asserts
Fix VITS upsampling asserts
2022-05-12 14:51:41 +02:00
Edresson Casanova 175ca06388 Add reinit text encoder and duration predictor parameter (#1562)
* Add reinit encoder and duration predictor option

* Add .data to prevent any overlooked autograd hook
2022-05-12 09:08:36 -03:00
Edresson Casanova 182711043c Fix the VITS upsampling asserts
Fix style
2022-05-12 09:08:29 -03:00
Taras Sereda f9d91a55f2 Improve data_path resolvement (#1567) 2022-05-12 13:10:35 +02:00
Eren Gölge 2fc38f67d2 Update SpeakerManager init in Synthesizer 2022-05-11 11:32:27 +02:00
Eren Gölge c3f8c4d5eb Return default SpeakerManager if no d_vector_file 2022-05-11 11:31:45 +02:00
Eren Gölge 121e9ed685 Pass use_cuda to init_encoder 2022-05-11 11:31:17 +02:00
Eren Gölge c18bd21b3f Return durations at VITS inference 2022-05-11 11:30:05 +02:00
Eren Gölge 5021a03de0 Use torch.no_grad for VITS inference 2022-05-11 11:29:36 +02:00
Eren Gölge 3f03e3012c Fix batch_group_size in VITS 2022-05-07 13:44:44 +02:00
code-review-doctor fa887ef5f9 Fix issue probably-meant-fstring found at https://codereview.doctor (#1532) 2022-05-07 13:33:40 +02:00
Arvind Suresh a34076af35 Update documentation for multi-gpu training 2022-05-07 13:30:03 +02:00
Eren Gölge a0a9279e4b Fix GAN optimizer order
commit 212d330929
Author: Edresson Casanova <edresson1@gmail.com>
Date:   Fri Apr 29 16:29:44 2022 -0300

    Fix unit test

commit 44456b0483
Author: Edresson Casanova <edresson1@gmail.com>
Date:   Fri Apr 29 07:28:39 2022 -0300

    Fix style

commit d545beadb9
Author: Edresson Casanova <edresson1@gmail.com>
Date:   Thu Apr 28 17:08:04 2022 -0300

    Change order of HIFI-GAN optimizers to be equal than the original repository

commit 657c5442e5
Author: Edresson Casanova <edresson1@gmail.com>
Date:   Thu Apr 28 15:40:16 2022 -0300

    Remove audio padding before mel spec extraction

commit 76b274e690
Merge: 379ccd7b 6233f4fc
Author: Edresson Casanova <edresson1@gmail.com>
Date:   Wed Apr 27 07:28:48 2022 -0300

    Merge pull request #1541 from coqui-ai/comp_emb_fix

    Bug fix in compute embedding without eval partition

commit 379ccd7ba6
Author: WeberJulian <julian.weber@hotmail.fr>
Date:   Wed Apr 27 10:42:26 2022 +0200

    returns y_mask in VITS inference (#1540)

    * returns y_mask

    * make style
2022-05-07 13:29:11 +02:00
Edresson Casanova 60034674f9 Remove audio padding before mel spec extraction 2022-05-07 13:12:09 +02:00
WeberJulian fbdf76b2fc returns y_mask in VITS inference (#1540)
* returns y_mask

* make style
2022-05-03 13:49:24 +02:00
Edresson Casanova 6233f4fcd7 Bug fix in compute embedding without eval partition 2022-04-26 13:58:03 -03:00
Edresson Casanova a41e860a66 Update Coqpit requirement (#1539) 2022-04-26 17:39:36 +02:00
Edresson Casanova 8d228ab22a Trick to Upsampling to High sampling rates using VITS model (#1456)
* Add upsample VITS support

* Fix the bug in inference

* Fix lint checks

* Add RMS based norm in save_wav method

* Style fix

* Add the period for VITS multi-period discriminator in model_args

* Bug fix in speaker encoder load in inference time

* Add unit tests

* Remove useless detach_z_vocoder parameter

* Add docs for VITS upsampling

* Fix the docs

* Rename TTS_part_sample_rate to encoder_sample_rate

* Add upsampling_init and upsampling_z methods

* Add asserts for encoder_sample_rate part

* Move upsampling tests to test_vits.py
2022-04-26 11:47:46 +02:00
Eren Gölge c410bc58ef Bump to v0.6.2 2022-04-20 11:46:26 +02:00