Commit Graph

1658 Commits

Author SHA1 Message Date
Eren Gölge 5021a03de0 Use torch.no_grad for VITS inference 2022-05-11 11:29:36 +02:00
Eren Gölge 3f03e3012c Fix batch_group_size in VITS 2022-05-07 13:44:44 +02:00
code-review-doctor fa887ef5f9
Fix issue probably-meant-fstring found at https://codereview.doctor (#1532) 2022-05-07 13:33:40 +02:00
Eren Gölge a0a9279e4b Fix GAN optimizer order
commit 212d330929
Author: Edresson Casanova <edresson1@gmail.com>
Date:   Fri Apr 29 16:29:44 2022 -0300

    Fix unit test

commit 44456b0483
Author: Edresson Casanova <edresson1@gmail.com>
Date:   Fri Apr 29 07:28:39 2022 -0300

    Fix style

commit d545beadb9
Author: Edresson Casanova <edresson1@gmail.com>
Date:   Thu Apr 28 17:08:04 2022 -0300

    Change order of HiFi-GAN optimizers to match the original repository

commit 657c5442e5
Author: Edresson Casanova <edresson1@gmail.com>
Date:   Thu Apr 28 15:40:16 2022 -0300

    Remove audio padding before mel spec extraction

commit 76b274e690
Merge: 379ccd7b 6233f4fc
Author: Edresson Casanova <edresson1@gmail.com>
Date:   Wed Apr 27 07:28:48 2022 -0300

    Merge pull request #1541 from coqui-ai/comp_emb_fix

    Bug fix in compute embedding without eval partition

commit 379ccd7ba6
Author: WeberJulian <julian.weber@hotmail.fr>
Date:   Wed Apr 27 10:42:26 2022 +0200

    returns y_mask in VITS inference (#1540)

    * returns y_mask

    * make style
2022-05-07 13:29:11 +02:00
Edresson Casanova 60034674f9 Remove audio padding before mel spec extraction 2022-05-07 13:12:09 +02:00
WeberJulian fbdf76b2fc returns y_mask in VITS inference (#1540)
* returns y_mask

* make style
2022-05-03 13:49:24 +02:00
Edresson Casanova 6233f4fcd7 Bug fix in compute embedding without eval partition 2022-04-26 13:58:03 -03:00
Edresson Casanova 8d228ab22a
Trick for upsampling to high sampling rates using the VITS model (#1456)
* Add upsample VITS support

* Fix the bug in inference

* Fix lint checks

* Add RMS based norm in save_wav method

* Style fix

* Add the period for VITS multi-period discriminator in model_args

* Bug fix in speaker encoder load in inference time

* Add unit tests

* Remove useless detach_z_vocoder parameter

* Add docs for VITS upsampling

* Fix the docs

* Rename TTS_part_sample_rate to encoder_sample_rate

* Add upsampling_init and upsampling_z methods

* Add asserts for encoder_sample_rate part

* Move upsampling tests to test_vits.py
2022-04-26 11:47:46 +02:00
Eren Gölge c410bc58ef Bump to v0.6.2 2022-04-20 11:46:26 +02:00
WeberJulian 30bea7d53c
Update manage.py (#1514) 2022-04-19 14:27:32 +02:00
Eren Gölge 7133f8f47d
Print Model's license when downloading (#1512)
* Print model license while downloading

* Make style

* Add a new license link

* Make style
2022-04-19 14:18:49 +02:00
WeberJulian 4953636b14
Add African models (#1511)
* Add african models

* Set default license for all models
2022-04-19 14:18:30 +02:00
Edresson Casanova 060e0f9368
Add EmbeddingManager and BaseIDManager (#1374) 2022-03-31 13:41:16 +02:00
WeberJulian 1b22f03e98
Fix G2P backend of the released models (#1461)
* Fix enforce phonemizer

* Add new models

* Fix .model.json
2022-03-30 12:47:11 +02:00
WeberJulian c66a6241fd
Enforce phonemizer definition for synthesis (#1441)
* Enforce phonemizer definition for synthesis

* Fix train_tts, tokenizer init can now edit config

* Add small change to trigger CI pipeline

* fix wrong output path for one tts_test

* Fix style

* Test config overrides by args and tokenizer

* Fix style
2022-03-25 23:15:33 +01:00
Edresson Casanova 37896e1743
Bug fix in freeze encoder (#1391)
* Fix the bug in freeze encoder

* Remove emb_l definition for non-multilingual training

* Fix unit tests
2022-03-24 18:16:04 +01:00
Edresson Casanova 3435bc8fca Fix style tests 2022-03-23 15:05:32 -03:00
Edresson Casanova 0ae1e0248c Fix the bug for empty audio files 2022-03-23 14:39:31 -03:00
Edresson Casanova ea53d6feb3 Replace webrtcvad by silero-vad 2022-03-23 14:39:31 -03:00
Eren Gölge 3af01cfe3b
Update base model wrt 👟 (#1406) 2022-03-23 17:24:20 +01:00
Eren Gölge 1c3623af33
Fix model manager (#1436)
* Fix manager

* Make style
2022-03-23 12:57:14 +01:00
Eren Gölge 72d85e53c9
Update model file extension (#1422)
* Update model file ext to `.pth`

* Update docs

* Rename more

* Find model files
2022-03-22 17:55:00 +01:00
Eren Gölge fd56fabb21
Fix #1380 (#1409) 2022-03-16 12:38:27 +01:00
Eren Gölge 0870a4faa2
Make style (#1405) 2022-03-16 12:13:55 +01:00
WeberJulian 690c96ed28
Fix default phonemizer for ja and zh (#1399) 2022-03-16 12:13:22 +01:00
Edresson Casanova f81892483d
REBASED: Transform Speaker Encoder in a Generic Encoder and Implement Emotion Encoder training support (#1349)
* Rename Speaker encoder module to encoder

* Add a generic emotion dataset formatter

* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config

* Add class map in emotion config

* Add Base encoder config

* Add evaluation encoder script

* Fix the bug in plot_embeddings

* Enable Weight decay for encoder training

* Add argument to disable storage

* Add Perfect Sampler and remove storage

* Add evaluation during encoder training

* Fix lint checks

* Remove useless config parameter

* Activate evaluation in speaker encoder test and use multispeaker dataset for this test

* Unit test fixes

* Remove useless tests to speed up the aux_tests

* Use get_optimizer in Encoder

* Add BaseEncoder Class

* Fix the unit tests

* Add Perfect Batch Sampler unit test

* Add compute encoder accuracy in a function
2022-03-11 14:43:40 +01:00
Edresson Casanova 36e9ea2f97
Open bible dataset formatter (#1365)
* Add support for voice conversion inference

* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json

* Rebase bug fix

* Use the average d-vector for inference

* Fix the bug in find unique chars script

* Add OpenBible formatter

Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-03-11 10:43:31 +01:00
Edresson Casanova dbe9da7f15
Add Voice conversion inference support (#1337)
* Add support for voice conversion inference

* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json

* Rebase bug fix

* Use the average d-vector for inference
2022-03-10 14:57:12 +01:00
Edresson Casanova 917f417ac4
Add alphas to control language and speaker balancer (#1216)
* Add alphas to control language and speaker balancer

* Add docs for speaker and language samplers

* Change the sampler weights to float to save memory

* Change the test_samplers to unittest format

* Add get_sampler method in BaseTTS

* Fix rebase issues

* Add language and speaker samplers support for DDP training

* Rename distributed sampler wrapper

* Remove the DistributedSamplerWrapper and use the one from Trainer

* Bugfix after rebase

* Move the samplers config to tts config
2022-03-10 14:56:09 +01:00
Edresson Casanova f381e29b91
REBASED: Add support for the speaker encoder training using torch spectrograms (#1348)
* Add support for the speaker encoder training using torch spectrograms

* Remove useless function in speaker encoder dataset class
2022-03-10 14:54:51 +01:00
Eren Gölge c670365507 Fix VCTK recipe and formatter 2022-03-08 14:20:34 +01:00
Eren Gölge 8feb41d361 Bump up to v0.6.1 2022-03-07 15:57:44 +01:00
Eren Gölge ee02bc3823 Bump up to v0.6.0 2022-03-07 12:08:22 +01:00
Eren Gölge dc280819be Add new models 2022-03-07 12:08:09 +01:00
Eren Gölge e9d9028b4d Revert cleaner name 2022-03-06 12:57:06 +01:00
Eren Gölge 764c7fa4a4 Rename phoneme_cleaners 2022-03-06 12:09:54 +01:00
Eren Gölge dd4287de1f Update models 2022-03-03 20:23:00 +01:00
Eren Gölge 6cb00be795 Update your_tts model URL 2022-03-02 18:04:49 +01:00
Eren Gölge 1425a023fe Make style and lint 2022-03-02 13:25:35 +01:00
Eren Gölge c68885b3fd Update Vits speaker encoder init 2022-03-02 13:20:23 +01:00
Eren Gölge 27b67b7945 Fix import 2022-03-02 09:15:20 +01:00
Eren Gölge 942df0fb05 Update vits dataset 2022-03-02 09:14:32 +01:00
Eren Gölge 6a9f8074f0 Fix TTSDataset 2022-03-01 07:57:48 +01:00
Eren Gölge 690de1ab06 Update Characters and add more tests 2022-02-25 11:32:44 +01:00
Eren Gölge 9063397892 Fix FastSpeech config 2022-02-25 11:31:56 +01:00
Eren Gölge 1e414b3a09 Make style 2022-02-25 11:31:56 +01:00
Eren Gölge acc83cd3e6 Update Vits model API 2022-02-25 11:31:56 +01:00
Eren Gölge fe656659be Implement BaseTTS 2022-02-25 11:31:56 +01:00
Eren Gölge bed4afd4ee Implement BaseVocabulary 2022-02-25 11:31:56 +01:00
Eren Gölge e0f9be76c0 Update test_run in wavernn and wavegrad 2022-02-25 11:31:56 +01:00
Eren Gölge bf540f4323 Update imports for trainer 2022-02-25 11:31:56 +01:00
Eren Gölge 4c43eda414 Update BaseTrainerModel 2022-02-25 11:31:56 +01:00
Eren Gölge 83c5ddc5b7 Update imports 2022-02-25 11:31:56 +01:00
Eren Gölge 14c117978d Fix return outputs 2022-02-25 11:31:56 +01:00
Eren Gölge 424d04e4f6 Make style 2022-02-25 11:31:56 +01:00
Eren Gölge 8b3ba02c95 Add vocab_dict to model config 2022-02-25 11:31:20 +01:00
Eren Gölge ff23dce081 Update TTSDataset 2022-02-25 11:31:20 +01:00
Eren Gölge 750903d2ba Add VCTK formatter docstring 2022-02-25 11:30:24 +01:00
Eren Gölge 52a7896668 Update VITS loss 2022-02-25 11:30:24 +01:00
Eren Gölge c68962c574 Update forward tts binary loss 2022-02-25 11:30:24 +01:00
Eren Gölge c11944022d Revert back again rand_segment 2022-02-25 11:30:24 +01:00
Eren Gölge 00c7600103 Update Vits model API 2022-02-25 11:30:24 +01:00
Eren Gölge 935a604046 Delete trainer_utils 2022-02-25 11:29:41 +01:00
Eren Gölge d0c27a9661 Update synthesis.py 2022-02-25 11:29:41 +01:00
Eren Gölge 35fc7270ff Implement BaseTTS 2022-02-25 11:28:47 +01:00
Eren Gölge 2bad098625 Implement BaseVocabulary 2022-02-25 11:28:47 +01:00
Eren Gölge 833de62e30 Update base_vocoder 2022-02-25 11:28:14 +01:00
Eren Gölge fc3b6d2861 Update gan 2022-02-25 11:28:14 +01:00
Eren Gölge 20a677c623 Update test_run in wavernn and wavegrad 2022-02-25 11:28:14 +01:00
Eren Gölge be3a03126a Update imports for trainer 2022-02-25 11:28:14 +01:00
Eren Gölge c911729896 Update BaseTrainerModel 2022-02-25 11:28:14 +01:00
Eren Gölge 1e219fef0a Revert drop_last 2022-02-25 11:26:59 +01:00
Eren Gölge 7dfd753d91 Add a cheap trick to avoid short audio clips 2022-02-25 11:26:59 +01:00
Eren Gölge 1a43e05460 Fix VITS loss bug
Fake and real features were given in the wrong args order to
the loss function
2022-02-25 11:26:59 +01:00
Eren Gölge 4b96bfe925 Fix train logging 2022-02-25 11:26:59 +01:00
Eren Gölge ab8a4ca2c3 Revert random segment 2022-02-25 11:26:59 +01:00
Eren Gölge 8622226f3f Make style 2022-02-25 11:26:59 +01:00
Eren Gölge 27db089d6c Change TrainingArgs -> TrainerArgs 2022-02-25 11:26:59 +01:00
Eren Gölge aa81454721 Update BaseTrainingConfig 2022-02-25 11:26:59 +01:00
Eren Gölge d3a58ed07a Fix default values 2022-02-25 11:26:59 +01:00
Eren Gölge 54c6bb2a8c Fix add speaker VITS 2022-02-25 11:26:59 +01:00
Eren Gölge 590b04fb89 Fix espeak_wrapper 2022-02-25 11:26:59 +01:00
Eren Gölge a013566d15 Delete trainer related code 2022-02-25 11:26:59 +01:00
Eren Gölge 38314194e7 Set `drop_last` 2022-02-25 11:26:59 +01:00
Eren Gölge f70e4bb8c6 Add new speakers to the vits model 2022-02-25 11:26:59 +01:00
Eren Gölge d5c0e17548 Load right char class dynamically 2022-02-25 11:26:59 +01:00
Eren Gölge 1f0c8179da Make style 2022-02-25 11:26:59 +01:00
Eren Gölge b3ed6ff6b7 Update FastPitchConfig 2022-02-25 11:26:59 +01:00
Eren Gölge 1932401e8d Fix dataset preprocessing 2022-02-25 11:26:59 +01:00
Eren Gölge 34c4be5e49 Update forwardtts 2022-02-25 11:26:59 +01:00
Eren Gölge bb37462794 Update language manager 2022-02-25 11:26:59 +01:00
Eren Gölge 5169d4eb32 Plot pitch over input characters 2022-02-25 11:26:59 +01:00
Eren Gölge cd5d1497cf Add pitch_fmin pitch_fmax args to the audio 2022-02-25 11:26:59 +01:00
Eren Gölge 1445a46e9e Update synthesizer to use init_from_config 2022-02-25 11:26:59 +01:00
Eren Gölge 7058fcc3ff Take file extension as an argument 2022-02-25 11:26:59 +01:00
Eren Gölge 13482dde1f Update GAN model 2022-02-25 11:26:59 +01:00
Eren Gölge 2829027d8b Refactor VITS model 2022-02-25 11:26:59 +01:00
Eren Gölge ef63c99524 Implement `start_by_longest` option for TTSDataset 2022-02-25 11:26:18 +01:00
Eren Gölge c4c471d61d Allow padding for shorter segments 2022-02-25 11:25:48 +01:00
Eren Gölge 47fbddc8d4 Fix docstring 2022-02-25 11:25:48 +01:00
Eren Gölge bc2243bac4 Fix tests 2022-02-25 11:25:00 +01:00
Eren Gölge 146fbfd7c9 Extend unittests 2022-02-25 11:25:00 +01:00
Eren Gölge 2fe16de8e3 Make lint 2022-02-25 11:25:00 +01:00
Eren Gölge 7b49a4aa2b Fix glow_tts_config missing field 2022-02-25 11:24:13 +01:00
Eren Gölge 07b0a80d57 Fix tokenizer init_from_config 2022-02-25 11:24:13 +01:00
Eren Gölge 50e17097a7 Add verbose option to AudioProcessor 2022-02-25 11:24:13 +01:00
Eren Gölge 235f7d9b02 Extend glow_tts model tests 2022-02-25 11:24:13 +01:00
Eren Gölge 8e248913d6 Update train_tts for the new API 2022-02-25 11:24:13 +01:00
Eren Gölge 001da8afc8 Update Vits for the new model API 2022-02-25 11:21:19 +01:00
Eren Gölge 5176ae9e53 Fixes small compat. issues 2022-02-25 11:21:19 +01:00
Eren Gölge 131bc0cfc0 Fix synthesis.py 🔧 2022-02-25 11:18:00 +01:00
Eren Gölge c0746f23df Fix `too many open files` 2022-02-25 11:16:30 +01:00
Eren Gölge df0d58bf09 Update VCTK recipes 2022-02-25 11:16:30 +01:00
Eren Gölge 730f7c0df4 Add file_ext args to resample.py 2022-02-25 11:15:46 +01:00
Eren Gölge 28d98da422 Update VCTK formatter 2022-02-25 11:15:46 +01:00
Eren Gölge 4d99fee3e2 Update spec extractor 2022-02-25 11:12:44 +01:00
Eren Gölge 38a0b3b6c7 Update train_tts.py 2022-02-25 11:11:35 +01:00
Eren Gölge cfaa51fddc Update BaseTTS config 2022-02-25 11:11:35 +01:00
Eren Gölge 4c5cb44eeb Update setup_model 2022-02-25 11:11:35 +01:00
Eren Gölge 7c4243fba7 Update GlowTTS 2022-02-25 11:11:35 +01:00
Eren Gölge bacf79f4fb Update AlignTTS 2022-02-25 11:11:35 +01:00
Eren Gölge 18f726af65 Update ForwardTTS 2022-02-25 11:11:35 +01:00
Eren Gölge d0ec4b91e5 Update Tacotron models 2022-02-25 11:11:35 +01:00
Eren Gölge ea965a5683 Update VITS for the new API 2022-02-25 11:11:35 +01:00
Eren Gölge f802a931a3 Pass samples to init_from_config in SpeakerManager 2022-02-25 11:07:34 +01:00
Eren Gölge bde68d9f25 Use the same phonemizer for `en` to `en-us` 2022-02-25 11:07:34 +01:00
Eren Gölge 8649d4fd36 Allow None pad and blank tokens 2022-02-25 11:07:34 +01:00
Eren Gölge c9972e6f14 Make lint 2022-02-25 11:07:34 +01:00
Eren Gölge 30cfafce56 Add init_from_config 2022-02-25 11:05:54 +01:00
Eren Gölge 90cc45dd4e Update data loader tests 2022-02-25 11:05:54 +01:00
Eren Gölge 93957d58a1 Refactoring VITS for the tokenizer API 2022-02-25 11:05:06 +01:00
Eren Gölge 04df0a3d9f Refactor TTSDataset 2022-02-25 11:05:06 +01:00
Eren Gölge 9bb347a52b Update for tokenizer API 2022-02-25 11:05:06 +01:00
Eren Gölge 452dbc43d8 Update imports for symbols -> characters 2022-02-25 11:05:06 +01:00
Eren Gölge 8071fa0020 Refactor GlowTTS model and recipe for TTSTokenizer 2022-02-25 11:05:06 +01:00
Eren Gölge b6c2bfdf08 Refactor synthesis.py for TTSTokenizer 2022-02-25 11:05:06 +01:00
Eren Gölge b2bb954a51 Refactor TTSDataset to use TTSTokenizer 2022-02-25 11:05:06 +01:00
Eren Gölge 84091096a6 Refactor Synthesizer class for TTSTokenizer 2022-02-25 11:05:06 +01:00
Eren Gölge 196ae74273 Update data loader tests 2022-02-25 11:05:06 +01:00
Eren Gölge 98057a00ae Make style 2022-02-25 10:57:35 +01:00
Eren Gölge 7575367b9f Refactoring VITS for the tokenizer API 2022-02-25 10:57:35 +01:00
Eren Gölge 4cd690e4c1 Updates BaseTTS and configs 2022-02-25 10:57:35 +01:00
Eren Gölge 176b712c1a Refactor TTSDataset 2022-02-25 10:57:35 +01:00
Eren Gölge 4597d4e5b6 Remove get_characters from BaseTTS 2022-02-25 10:48:03 +01:00
Eren Gölge 1df1d6c4a9 Update for tokenizer API 2022-02-25 10:48:03 +01:00
Eren Gölge 2d8ce98d2a Update imports for symbols -> characters 2022-02-25 10:48:03 +01:00
Eren Gölge 9a95e15483 Refactor GlowTTS model and recipe for TTSTokenizer 2022-02-25 10:48:03 +01:00
Eren Gölge d0eb642d88 Refactor synthesis.py for TTSTokenizer 2022-02-25 10:48:03 +01:00
Eren Gölge 3476be30d7 Refactor Synthesizer class for TTSTokenizer 2022-02-25 10:48:03 +01:00
Eren Gölge 9397a56b13 Allow init_from_config from model or audio config 2022-02-25 10:48:03 +01:00