Commit Graph

1549 Commits

Author SHA1 Message Date
WeberJulian 09eda31a3f Fix tests 2021-12-20 11:54:10 +00:00
Edresson 78a23e19df Fix pylint checks 2021-12-20 11:54:10 +00:00
WeberJulian 4cd0e4eb0d Remove self.audio_config from VITS 2021-12-20 11:54:10 +00:00
Edresson d39200e69b Remove torchaudio requirement 2021-12-20 11:54:10 +00:00
WeberJulian 2e516869a1 Fix trailing whitespace 2021-12-20 11:54:10 +00:00
WeberJulian ffc269eaf4 Update docstring 2021-12-20 11:54:10 +00:00
Edresson 12968532fe Add the language embedding dim in the duration predictor class 2021-12-20 11:54:10 +00:00
Edresson 4196a42de7 Get the number of speakers from the Speaker Manager property 2021-12-20 11:54:10 +00:00
Edresson f394d60695 Fix the bug in multispeaker vits 2021-12-20 11:54:10 +00:00
Edresson 90eac13bb2 Rename ununsed_speakers to ignored_speakers 2021-12-20 11:54:10 +00:00
Edresson f34596d957 Fix function name 2021-12-20 11:54:10 +00:00
Edresson 45d0b04179 Lint fixes 2021-12-20 11:54:10 +00:00
Edresson 85418ffeaa Fix the bug in extract tts spectrograms 2021-12-20 11:54:10 +00:00
Edresson 2b2cecaea2 Set the new_fields in copy_model_files as None by default 2021-12-20 11:54:10 +00:00
Edresson 34749f8727 Remove the call to get_speaker_manager 2021-12-20 11:54:10 +00:00
Edresson b769b49e34 Remove the data from the set_d_vectors_from_file function 2021-12-20 11:54:10 +00:00
Edresson 9daa33d1fd Remove unusable speaker manager function 2021-12-20 11:54:10 +00:00
Edresson 8c22d5ac49 Make the VITS loss function clearer 2021-12-20 11:54:10 +00:00
Edresson 6fc3b9e679 Remove the unusable fine-tuning model 2021-12-20 11:54:10 +00:00
Edresson 352aa69eca Create a module for the VAD script 2021-12-20 11:54:10 +00:00
WeberJulian 631addf33b fix d-vector 2021-12-20 11:54:10 +00:00
WeberJulian da6c1e858c Fix small issues 2021-12-20 11:54:10 +00:00
WeberJulian e8af6a9f08 Fix use_speaker_embedding logic 2021-12-20 11:54:10 +00:00
WeberJulian 23d789c072 Fix continue path 2021-12-20 11:54:10 +00:00
WeberJulian 120332d53f Fix phonemes 2021-12-20 11:54:10 +00:00
WeberJulian 846bf16f02 fix imports for load_meta_data 2021-12-20 11:54:10 +00:00
WeberJulian 1340938159 fix phonemes per language 2021-12-20 11:54:10 +00:00
WeberJulian e995a63bd6 fix linter 2021-12-20 11:54:10 +00:00
WeberJulian 1472b6df49 make style 2021-12-20 11:54:10 +00:00
WeberJulian 4d721bcabd fix test sentence synthesis 2021-12-20 11:54:10 +00:00
WeberJulian 0804806727 fix f0_cache_path in dataset 2021-12-20 11:54:10 +00:00
WeberJulian 3b5592abcf fix test vits 2021-12-20 11:54:10 +00:00
WeberJulian 2a2b5767c2 fix collate_fn 2021-12-20 11:54:10 +00:00
Julian WEBER 78c2d12a91 PitchExtractor 2021-12-20 11:54:10 +00:00
Julian WEBER 9a2f91327c get_aux_input 2021-12-20 11:54:10 +00:00
Julian WEBER b3abd01793 Merge dataset 2021-12-20 11:54:10 +00:00
Edresson 10ff90d6d2 Add remove silence VAD script 2021-12-20 11:54:10 +00:00
Edresson 1bd1a0546b Add audio resample in the speaker consistency loss 2021-12-20 11:54:10 +00:00
Edresson 1c6bcda950 Add freeze vocoder generator and flow-based decoder option 2021-12-20 11:54:10 +00:00
WeberJulian 2b952d8b97 freeze vits parts 2021-12-20 11:54:10 +00:00
WeberJulian 005bba60b0 get_speaker_weighted_sampler 2021-12-20 11:54:10 +00:00
Edresson 9de4539422 Update the VITS model docs 2021-12-20 11:54:10 +00:00
Edresson eeb8ac07d9 Add voice conversion fine tuning mode 2021-12-20 11:54:10 +00:00
Edresson 690b37d0ab Add support to use the speaker encoder as loss function in VITS model 2021-12-20 11:54:09 +00:00
Edresson 9b011b1cb3 Add H/ASP original checkpoint support 2021-12-20 11:54:09 +00:00
Edresson 0bdfd3cb50 Add ValueError to the restore-checkpoint exception handling to avoid problems with optimizer restoration when new keys are added 2021-12-20 11:54:09 +00:00
Edresson de78556655 Fix the optimizer parameters bug in multilingual and multispeaker training 2021-12-20 11:54:09 +00:00
Edresson 9be5b75da3 Fix bug after merge 2021-12-20 11:54:09 +00:00
Edresson 76251b619a Fix d-vector multispeaker training bug 2021-12-20 11:54:09 +00:00
Edresson 7ef3ddc6ff Fix unit tests 2021-12-20 11:54:09 +00:00
Edresson 36dcd11453 Fix pylint issues 2021-12-20 11:54:09 +00:00
Edresson c53693c155 Implement vocoder Fine Tuning like SC-GlowTTS paper 2021-12-20 11:54:09 +00:00
Edresson f1f016314e Fix the bug in M-AILABS formatter 2021-12-20 11:54:09 +00:00
Edresson c334d39acc Add voice conversion support for the model VITS trained with external speaker embedding 2021-12-20 11:54:09 +00:00
Edresson e997889ba8 Fix bug in VITS multilingual inference 2021-12-20 11:54:09 +00:00
Edresson 7c0b8ec572 Fix bugs in the non-multilingual VITS inference 2021-12-20 11:54:09 +00:00
Edresson 3fbbebd74d Fix pylint issues 2021-12-20 11:54:09 +00:00
Edresson ac9416fb86 Add multilingual inference support 2021-12-20 11:54:09 +00:00
Edresson dcb2374bc9 Add multilingual training support to the VITS model 2021-12-20 11:54:09 +00:00
Edresson f996afedb0 Implement multilingual dataloader support 2021-12-20 11:54:09 +00:00
Edresson 5f1c18187f Fix pylint issues 2021-12-20 11:54:09 +00:00
Edresson d91c595c5a Implement training support with d_vecs in the VITS model 2021-12-20 11:54:09 +00:00
Edresson 6a7db67a91 Allow ignore speakers for all multispeaker datasets 2021-12-20 11:54:09 +00:00
Edresson e0ad838066 Randomly select a speaker from the speaker manager for the test sentences 2021-12-20 11:54:09 +00:00
Edresson eb3e8affe1 Save speakers embeddings/ids before starting training 2021-12-20 11:54:09 +00:00
Eren Gölge 37803467aa
Merge pull request #1021 from loganhart420/dataset_downloaders
Add additional datasets
2021-12-20 10:42:20 +01:00
Reuben Morais 859ac1a54c Include usage instructions in README 2021-12-17 11:37:19 +01:00
loganhart420 103c010eca Add additional datasets 2021-12-16 07:21:27 -05:00
Jörg Thalheim bce143c738
server: fix compatibility with tts_models/en/ljspeech/fast_pitch (#893) 2021-12-07 14:36:29 +01:00
Eren Gölge babdd84f91 Fix GST inference
commit d3e477875a7e46a101fcf95a1794442823750fe2
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Wed Nov 3 10:16:12 2021 +0000

    Read .wav for GST conditioning from CL

commit 074e6d0874d3b34fb6a4991fc17d66dccd413fbb
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Fri Oct 29 14:43:47 2021 +0100

    Fix GST during inference in Tacotron2

commit fdece14585ab5a36eed1061a9a838d8e48aa6882
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Wed Nov 3 10:16:12 2021 +0000

    Read .wav for GST conditioning from CL

commit cd29e21b8d0a541ee298d2bf5f67223ad60be38f
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Fri Oct 29 14:43:47 2021 +0100

    Fix GST during inference in Tacotron2

commit 908ce39370eadcc9fa8510cdb26c9ead87305427
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Fri Oct 29 12:49:37 2021 +0100

    Make trim_db value negative

commit 1008a2e0f72fa7ca7f0307424f570386f2f16d42
Author: George Rousssos <25833833+george-roussos@users.noreply.github.com>
Date:   Fri Oct 29 12:22:24 2021 +0100

    Set find_endpoint db threshold in config.json
2021-12-07 13:28:49 +00:00
Eren Gölge ce45d9e1af Make style and lint 2021-12-01 10:42:52 +00:00
Eren Gölge 40cb8ac966 Fix #958 2021-12-01 10:33:34 +00:00
Eren Gölge 512ada7548 Fix callbacks against multi-gpu training 2021-12-01 10:32:14 +00:00
Eren Gölge 2ed9e3c241 Fix constant use of noise augment 2021-11-08 09:20:34 +01:00
Eren Gölge b6b14a76af Fix VITS stochastic duration predictor 2021-11-08 09:20:11 +01:00
Eren Gölge dc3dd55dd9 Add collect_env_info.py 2021-11-08 08:59:08 +01:00
Eren Gölge faafea4cf2 Fix style 2021-11-04 17:04:40 +01:00
Eren Gölge d227aaebcc Print when using Griffin-Lim in Synthesizer 2021-11-01 16:52:26 +01:00
Eren Gölge c5077c6c3f Merge branch 'dev' of https://github.com/coqui-ai/TTS into dev 2021-11-01 16:42:27 +01:00
Eren Gölge 20cebde1c9 Add docstring to MAI labs formatter 2021-11-01 16:41:55 +01:00
Eren Gölge 608f437545 Add a function to find unique chars 2021-11-01 16:41:33 +01:00
Eren Gölge d6d780e758 Fix FastSpeech config 2021-11-01 16:41:15 +01:00
Eren Gölge 5ba47081ee Use GL for VCTK FastPitch models 2021-11-01 16:39:03 +01:00
Michael Hansen 3bc043faeb
Upgrade to gruut 2.0 (#882) 2021-10-31 11:41:55 +01:00
George 37eaefc085
Optional silence trimming during inference and find_endpoint() fix (#898)
* Set find_endpoint db threshold in config.json

* Optional silence trimming during inference

* Make trim_db value negative
2021-10-29 18:28:55 +02:00
Eren Gölge 7293abada2 Bump up to v0.4.2 2021-10-29 17:57:30 +02:00
Eren Gölge 2df0752e73
Model zoo tests (#900)
* Fix VITS model multi-speaker init

* Remove gdrive support in model manager

* Add model zoo tests
2021-10-29 17:54:16 +02:00
Eren Gölge aaaa591485 Bump up version to v0.4.1 2021-10-26 19:24:17 +02:00
Eren Gölge 3ea1c2037b Fix model entry in .models.json 2021-10-26 19:14:29 +02:00
Eren Gölge fa4ec83c6e Bump up version to v0.4.0 2021-10-26 18:27:39 +02:00
Eren Gölge 035ed432bc
Doc update (#889)
* Link source files from the docs

* Update glowTTS recipes for docs

* Add dataset downloaders
2021-10-26 17:41:33 +02:00
Eren Gölge 0cac3f330a Enable custom formatter in load_tts_samples 2021-10-26 13:07:11 +02:00
Eren Gölge 7c10574931 Gateway for TTS models 2021-10-26 13:04:51 +02:00
Eren Gölge 00becf2671 Fix import statements 2021-10-25 19:29:16 +02:00
Eren Gölge 027424dda8 Add VCTK fast_pitch and UK glow-tts 2021-10-25 19:29:16 +02:00
Eren Gölge 70e4d0e524 Fix grad_norm handling 2021-10-21 16:29:06 +00:00
Eren Gölge a409e0f8f8 Update train_tts for multi-speaker 2021-10-21 16:29:06 +00:00
Eren Gölge 2b7d159383 Update BaseTTS for multi-speaker training 2021-10-21 16:29:06 +00:00
Eren Gölge e62d3c5cf7 Use absolute imports for tts configs and models 2021-10-21 16:29:06 +00:00
Eren Gölge 82fed4add2 Make style 2021-10-21 16:05:51 +00:00
Eren Gölge 3cb07fb6b5 Fix SpeakerManager init with data items 2021-10-21 13:54:39 +00:00
Eren Gölge aea90e2501 Comment synthesis.py 2021-10-21 13:53:45 +00:00
Eren Gölge 1987aaaaed Update d-vector reshape in synthesizer 2021-10-21 13:53:25 +00:00
Eren Gölge 3ab009ca8d Edit model configs for multi-speaker 2021-10-21 13:51:37 +00:00
Eren Gölge cea8e1739b Update AlignTTS to use SpeakerManager 2021-10-20 18:22:41 +00:00
Eren Gölge 0e768dd4c5 Update comments 2021-10-20 18:21:26 +00:00
Eren Gölge 7c2cb7cc30 Update BaseTTS 2021-10-20 18:18:22 +00:00
Eren Gölge 330ee7d208 Comment BaseTacotron and remove unused funcs 2021-10-20 18:17:25 +00:00
Eren Gölge aa25f70b95 Update ForwardTTS for multi-speaker 2021-10-20 18:16:41 +00:00
Eren Gölge 0ebc2a400e Implement `_set_speaker_embedding` in GlowTTS 2021-10-20 18:15:20 +00:00
Eren Gölge 3da79a4de4 Comment Tacotron2 model 2021-10-20 18:14:04 +00:00
Eren Gölge 92b6d98443 Set pitch frame alignment wrt spec computation 2021-10-20 18:12:38 +00:00
Eren Gölge 0a3d1cc7ee Pass speaker manager to the model in synthesizer 2021-10-20 18:11:36 +00:00
Eren Gölge 588da1a24e Simplify grad_norm handling in trainer 2021-10-19 16:33:04 +00:00
Eren Gölge 3c7848e9b1 Don't OOR values in train console log 2021-10-19 16:32:16 +00:00
Eren Gölge c514351c0e Refactor multi-speaker init in BaseTTS-Tacotron1-2 2021-10-18 08:55:45 +00:00
Eren Gölge 127571423c Update multi-speaker init in BaseTTS 2021-10-18 08:54:41 +00:00
Eren Gölge a0a5d580e9 Approximate audio length from file size 2021-10-18 08:54:02 +00:00
Eren Gölge b4b890df03 Update trainer's initialization 2021-10-18 08:53:19 +00:00
Eren Gölge fcbfc53cb7 Fix linter 2021-10-15 10:24:19 +00:00
Eren Gölge 700b056117 Update Synthesizer multi-speaker handling 2021-10-15 10:21:12 +00:00
Eren Gölge 073a2d2eb0 Refactor VITS multi-speaker initialization 2021-10-15 10:20:00 +00:00
Eren Gölge 0565457faa Fix #846 2021-10-14 14:46:14 +00:00
Eren Gölge e15bc157d8 Fix #873 2021-10-14 14:39:45 +00:00
Eren Gölge 21cc0517a3 Fix WaveRNN test 2021-10-01 10:21:37 +00:00
Eren Gölge 4dbe7ed0de Fix all-zero duration case for GlowTTS 2021-10-01 09:24:26 +00:00
Eren Gölge 37959ad0c7 Make linter 2021-09-30 23:02:16 +00:00
Eren Gölge 0b1986384f Make style 2021-09-30 16:21:18 +00:00
Eren Gölge 7edbe04fe0 Fix WaveRNN config and test 2021-09-30 16:20:12 +00:00
Eren Gölge 55d9209221 Remote STT tokenizer 2021-09-30 14:58:26 +00:00
Eren Gölge ba2b8c827f Update `train_tts.py` and `train_vocoder.py` 2021-09-30 14:47:56 +00:00
Eren Gölge 2e9b6b4f90 Refactor Speaker Encoder training 2021-09-30 14:47:56 +00:00
Eren Gölge 043dca61b4 Rename `load_meta_data` as `load_tts_data` 2021-09-30 14:47:56 +00:00
Eren Gölge 9f23ad6a0f Fix imports 2021-09-30 14:47:56 +00:00
Eren Gölge 16b70be0dd Add `_set_model_args` to BaseModel 2021-09-30 14:47:56 +00:00
Eren Gölge 9a0d8fa027 Update `copy_model_files()` 2021-09-30 14:47:56 +00:00
Eren Gölge 4163b4f2e4 Update Tacotron models 2021-09-30 14:47:56 +00:00
Eren Gölge e27feade38 Fixup wavernn 2021-09-30 14:47:56 +00:00
Eren Gölge 45889804c2 Update VITS 2021-09-30 14:47:56 +00:00
Eren Gölge 4f94f91305 Update WaveRNN 2021-09-30 14:47:56 +00:00
Eren Gölge 3d5205d66f Update WaveGrad 2021-09-30 14:47:56 +00:00
Eren Gölge fd95926009 Update GlowTTS 2021-09-30 14:47:56 +00:00
Eren Gölge 4baecdf92a Update GAN for Trainer_v2 2021-09-30 14:47:56 +00:00
Eren Gölge a156a40b47 Update ForwardTTS for Trainer_v2 2021-09-30 14:19:19 +00:00
Eren Gölge d9df33f837 Update `align_tts` for trainer_v2 2021-09-30 14:18:10 +00:00
Eren Gölge 8ada870a57 Refactor `trainer.py` for v2 2021-09-30 14:16:34 +00:00
Eren Gölge 7f388f26e3 Bump up to v0.3.1 2021-09-17 23:53:22 +00:00
Eren Gölge 2766dd1d6e
Fix #813 - GlowTTS training (#814)
* Fix #813

* Update glow_tts recipe

* Fix glow-tts test

* Linter fix

* Run data dep init only in training
2021-09-17 20:06:55 +02:00
Eren Gölge f563415052 Bump up to v0.3.0 2021-09-13 09:40:38 +00:00
Eren Gölge a97dc8d09f Fix trainer malformatted print 2021-09-13 08:32:02 +00:00
Eren Gölge 91bebebe18 Add new models to `.models.json`
SpeedySpeech model using `ForwardTTS`
UnivNet model fine-tuned on TacotronDDC_ph spectrograms
2021-09-13 08:22:14 +00:00
Eren Gölge 1ea011571a Update SpeedySpeech config 2021-09-12 15:33:27 +00:00
Eren Gölge cbbc9e0172 Add FastSpeechConfig 2021-09-11 10:20:37 +00:00
Eren Gölge 26f76fce22 Remove SpeedySpeech from .models.json 2021-09-10 17:47:27 +00:00
Eren Gölge d97952611d Remove unused import 2021-09-10 17:31:41 +00:00
Eren Gölge 7d8f77385a Use `glow-tts` in synthesis tests 2021-09-10 17:27:33 +00:00
Eren Gölge d5f256b34c Update tacotron `r` init 2021-09-10 17:26:23 +00:00
Eren Gölge ab37fa9c39 Edit AlignTTS 2021-09-10 17:25:00 +00:00
Eren Gölge 66732025e1 Add `base_model` field to `forward_tts` configs 2021-09-10 17:23:48 +00:00
Eren Gölge d6e29ef98a Style update 2021-09-10 08:30:33 +00:00
Eren Gölge a89eb12aca Fix glow_tts imports 2021-09-10 08:29:51 +00:00
Eren Gölge 570d5971be Implement `ForwardTTSLoss` 2021-09-10 08:29:12 +00:00
Eren Gölge 0541a25e90 Remove `fastpitch.py` and `speedy_speech.py` 2021-09-10 08:27:48 +00:00
Eren Gölge 3c16013199 Fix Vits imports 2021-09-10 08:26:34 +00:00
Eren Gölge 742f9c54da Warn user if nan in GL 2021-09-10 08:26:05 +00:00
Eren Gölge ed4b1d8514 Test `TTS.tts.utils.helpers` 2021-09-10 08:25:21 +00:00
Eren Gölge 8b7e094bde Implement `forward_tts`
- Generic API for feed-forward TTS models (FastPitch, SpeedySpeech)

- Tests for `forward-tts`

- Edit FastPitchConfig and SpeedySpeechConfig to use `forward_tts`
2021-09-10 08:24:33 +00:00
Eren Gölge 3c740d4893 Style extract_tts_spectrogram.py 2021-09-10 08:21:21 +00:00
Eren Gölge bfc6ceac29 Move MAS to `TTS.tts.utils.helpers` 2021-09-09 10:57:19 +00:00
Eren Gölge 2dfc5bdd11 Fix best_model_path init if no best_mode 2021-09-09 09:01:52 +00:00
Eren Gölge abf5e48177 Fix logging current learning rate in trainer 2021-09-09 09:01:04 +00:00
Eren Gölge 6c4c1065b0 Fix trainer's scheduler restoring 2021-09-09 09:00:27 +00:00
Eren Gölge 807f1d3817 Fix `extract_tts_spectrograms.py` model init 2021-09-09 08:59:55 +00:00
Eren Gölge 537c8576ec Stage `TTS.tts.utils.helpers` 2021-09-08 13:35:18 +00:00
Eren Gölge 4761853c5c Fix imports 2021-09-08 13:34:40 +00:00
Eren Gölge e20ea57c87 Update comment and add a warning 2021-09-07 12:23:32 +00:00
Eren Gölge 82598f3fdb Bump up to v0.2.2 2021-09-06 16:59:41 +00:00
Eren Gölge 4cc544bc46 Add FastPitch model to `.models.json` 2021-09-06 16:59:22 +00:00
Eren Gölge 2c4bbbf9b9 Use pyworld for pitch 2021-09-06 15:16:58 +00:00
Eren Gölge c1513ec4cd Plot pitch over spectrogram 2021-09-06 15:16:58 +00:00
Eren Gölge d847a68e42 Reformat multi-speaker handling in GlowTTS 2021-09-06 15:16:58 +00:00
Eren Gölge 8d41060d36 Plot unnormalized pitch by `FastPitch` 2021-09-06 15:16:58 +00:00
Eren Gölge 2b59da802c Fix loader setup in `base_tts` 2021-09-06 15:16:58 +00:00
Eren Gölge 76c4929ab2 Fix attn mask reading bug 2021-09-06 15:16:58 +00:00
Eren Gölge 91a70e80b2 Refactor TTSDataset
Return a dict by `collate`
Refactor batch handling in `collate`
A couple of bug fixes
2021-09-06 15:16:58 +00:00
Eren Gölge 29248536c9 Update `PositionalEncoding` 2021-09-06 15:16:58 +00:00
Eren Gölge 4672889549 Update `generic.FFTransformer` 2021-09-06 15:16:58 +00:00
Eren Gölge 2bf9e83c49 FastPitch refactor and commenting 2021-09-06 15:16:58 +00:00
Eren Gölge 59b24e66cf Add `AlignerNetwork` 2021-09-06 15:16:58 +00:00
Eren Gölge 648655fa03 Add `PitchExtractor` and return dict by `collate` 2021-09-06 15:16:58 +00:00
Eren Gölge debf772ec5 Implement binary alignment loss 2021-09-06 15:16:58 +00:00
Eren Gölge 6e9d4062f2 Add `sort_by_audio_len` option 2021-09-06 15:16:58 +00:00
Eren Gölge 59d52a4cd8 Disable autocast for criterions 2021-09-06 15:16:58 +00:00
Eren Gölge 98a7271ce8 Refactor FastPitchv2 2021-09-06 15:16:58 +00:00
Eren Gölge e429afbce4 Enable aligner for FastPitch 2021-09-06 15:16:58 +00:00
Eren Gölge 81c228a2d8 Update FastPitch don't detach duration network inputs 2021-09-06 15:16:58 +00:00
Eren Gölge ca29033ef4 Refactor FastPitch model 2021-09-06 15:16:58 +00:00
Eren Gölge 42862f7fdb Format style of the recipes 2021-09-06 15:16:58 +00:00
Eren Gölge 5d59100a88 Don't use align_score for models with duration predictor 2021-09-06 15:16:58 +00:00
Eren Gölge fac9dbe661 Update FastPitchLoss 2021-09-06 15:16:58 +00:00
Eren Gölge b81560607b Update docstrings 2021-09-06 15:16:58 +00:00
Eren Gölge 57b3aec1b9 Update docstring format 2021-09-06 15:16:58 +00:00
Eren Gölge 7692bfe7f8 Update FastPitch config 2021-09-06 15:16:58 +00:00
Eren Gölge 8584f2b82d Update docstring format 2021-09-06 15:16:58 +00:00
Eren Gölge b7caad39e0 Make optional to detach duration predictor input 2021-09-06 15:16:58 +00:00
Eren Gölge 9af42f7886 Restore `last_epoch` of the scheduler 2021-09-06 15:16:58 +00:00
Eren Gölge aacbb3ed77 Fix SpeakerManager usage in `synthesize.py` 2021-09-06 15:16:58 +00:00
Eren Gölge 545a00fc04 Use absolute paths of the attention masks 2021-09-06 15:16:58 +00:00
Eren Gölge bc396c393f Add FastPitch model and FastPitchconfig 2021-09-06 15:16:58 +00:00
Eren Gölge 5a6ffaee08 Add yin based pitch computation 2021-09-06 15:16:58 +00:00
Eren Gölge e802b24ad0 Compute mean and std pitch 2021-09-06 15:16:58 +00:00
Eren Gölge 8fffd4e813 Don't print computed phonemes
It causes noise in logs
2021-09-06 15:16:58 +00:00
Eren Gölge d085642ac1 Cache pitch features
Cache the features at the beginning of `BaseTTS` training.
2021-09-06 15:16:58 +00:00
Eren Gölge 7590c7db7a Fix `base_tacotron` `aux_input` handling 2021-09-06 15:16:58 +00:00
Eren Gölge db32162eae Fix `FastPitchLoss` 2021-09-06 15:16:58 +00:00
Eren Gölge 94e8e0d416 Fix configs 2021-09-06 15:16:58 +00:00
Eren Gölge 0f19f8c911 Fix `compute_attention_masks.py` 2021-09-06 15:16:58 +00:00
Eren Gölge 994f2be2c1 Add comput_f0 field 2021-09-06 15:16:58 +00:00
Eren Gölge c8d999b010 Add FastPitchLoss 2021-09-06 15:16:58 +00:00
Eren Gölge fba257104d Compute F0 using librosa 2021-09-06 15:16:58 +00:00
Katsuya Iida 165e5814af
Update Japanese phonemizer (#758)
* Update default ja vocoder

* update

* Japanese phonemizer test

* Run make style

Co-authored-by: Eren Gölge <egolge@coqui.ai>
2021-09-01 09:33:15 +02:00
Eren Gölge 2b7e55f01f Fix vits args types 2021-08-30 23:24:20 +00:00
Eren Gölge b910a6ddce Bump up to v0.2.1 2021-08-30 16:31:24 +00:00
Eren Gölge d16da949a5 Merge branch 'fix_distribute' into dev 2021-08-30 16:31:07 +00:00
Eren Gölge 6782d3eab7 Fix linter issues for p3.6 2021-08-30 16:18:33 +00:00
Eren Gölge 738eee0cf9 Fix style 2021-08-30 13:12:13 +00:00
Eren Gölge 5255e089e6 Fix #767 2021-08-30 13:10:08 +00:00
Eren Gölge c560114324 Fix #750 2021-08-30 13:06:50 +00:00
Eren Gölge 18b2e41e5a Use `coqui_tts` as the default run name 2021-08-30 12:56:47 +00:00
Eren Gölge 9c86f1ac68 Fix usage of abstract class in vocoders 2021-08-30 08:10:35 +00:00
Eren Gölge 18da8f5dbd Update pylint 2.10.2 and fix lint issues 2021-08-30 08:10:35 +00:00
Eren Gölge f186856e5d Add option to sort input sequence by audio len 2021-08-30 08:10:35 +00:00
Eren Gölge 2620f62ea8 Move duration_loss inside VitsGeneratorLoss 2021-08-27 07:07:07 +00:00
Eren Gölge 1692b8e4d9
Merge pull request #726 from fijipants/patch-1
Fix bug with log_func
2021-08-26 22:11:29 +02:00
Eren Gölge 49e1181ea4 Fixes for the vits model 2021-08-26 17:15:09 +00:00
Eren Gölge 5911eec3b1 Small trainer refactoring
1. Use a single Gradscaler for all the optimizers
2. Save terminal logs to a file. In DDP mode, each worker creates `trainer_N_log.txt`.
3. Fixes to allow only the main worker (rank==0) writing to Tensorboard
4. Pass parameters owned by the target optimizer to the grad_clip_norm
2021-08-26 17:08:58 +00:00
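
Items 2 and 3 of the refactoring above can be illustrated with a minimal sketch. It assumes that `N` in `trainer_N_log.txt` is the worker's rank; the helper names `log_file_name` and `is_main_worker` are hypothetical and are not taken from the trainer code.

```python
def log_file_name(rank: int) -> str:
    """Build the per-worker log file name (assumes N is the worker rank)."""
    return f"trainer_{rank}_log.txt"


def is_main_worker(rank: int) -> bool:
    """Only the main worker (rank 0) is meant to write to Tensorboard."""
    return rank == 0
```

In a DDP run, each process would call `log_file_name` with its own rank, while Tensorboard writes would be guarded by `is_main_worker`.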
fijipants e9e01b09b0 Fix bug with log_func 2021-08-18 19:59:51 -04:00
fijipants 8f57f8adfd Update synthesizer.py 2021-08-18 19:56:52 -04:00
Eren Gölge 3ab8cef99e Fix VITS model SPD 2021-08-18 14:55:46 +00:00
Eren Gölge c5d1dd9d1b Fix restoring best_loss
Keep the default value if model checkpoint has no `model_loss`
2021-08-17 12:12:36 +00:00
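
The fallback described in this commit could look roughly like the following sketch. Only the `model_loss` key comes from the commit message; the function name, the dict-shaped checkpoint, and the infinite default are assumptions made for illustration.

```python
def restore_best_loss(checkpoint: dict, default: float = float("inf")) -> float:
    """Return the stored loss, or keep the default when the key is absent."""
    # Checkpoints without a `model_loss` entry fall back to the default value
    # instead of raising a KeyError.
    return checkpoint.get("model_loss", default)
```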
Eren Gölge c8bbcdfd07 Fix `test_run` for DDP 2021-08-13 19:39:02 +00:00
Eren Gölge 7c0d564965 Synchronize DDP processes 2021-08-13 10:40:50 +00:00
Eren Gölge ecf5f17dca Fix distribute.py and ddp training 2021-08-12 22:22:32 +00:00
Eren Gölge b02c4fe347 Bump up to v0.2.0 2021-08-11 08:15:39 +00:00
Eren Gölge 537bc8487a Print model count when listing models 2021-08-10 16:25:11 +00:00
Eren Gölge 09ed8426e8 Add the models released with v0.2.0 2021-08-10 15:46:31 +00:00
Eren Gölge 39004484b9 Fix 🐛
Fix synthesizer multi-speaker init
Fix #712
2021-08-10 12:56:32 +00:00
Eren Gölge c8b9ca3d71 Fix Tacotron num_char init 2021-08-10 08:56:34 +00:00
Eren Gölge 7eb94f760b Remove Ruslan model 2021-08-09 21:48:36 +00:00
Eren Gölge 6af03ac476 Fix `num_char` init in Tacotron models 2021-08-09 21:46:15 +00:00
Ayush Chaurasia e685ddfca7 Update trainer.py 2021-08-09 18:37:46 +00:00
Ayush Chaurasia 28870f8df4 update docstring 2021-08-09 18:35:35 +00:00
Ayush Chaurasia 8a246cbb66 Update trainer.py 2021-08-09 18:35:08 +00:00
Ayush Chaurasia f3e9d61330 Refactor logging initialization 2021-08-09 18:35:08 +00:00
Ayush Chaurasia 79b74a989d Update: add_text 2021-08-09 18:34:38 +00:00
Ayush Chaurasia 9fcf48b760 Delete logger_base.py 2021-08-09 18:34:00 +00:00
Ayush Chaurasia 290972fd35 reformat 2021-08-09 18:34:00 +00:00
Ayush Chaurasia 936a47504d Update Logger API, recipes 2021-08-09 18:34:00 +00:00
Ayush Chaurasia f63cf46c55 Unified logger API 2021-08-09 18:34:00 +00:00
Ayush Chaurasia f4434da5a3 Update disabled structure 2021-08-09 18:31:16 +00:00
Ayush Chaurasia f606741dc4 Add artifacts logging , wandb args 2021-08-09 18:31:16 +00:00
Ayush Chaurasia f5e50ad502 WandbLogger 2021-08-09 18:27:06 +00:00
Eren Gölge 06018251e6 Add VITS and GlowTTS class docs 🗒️ 2021-08-09 18:02:36 +00:00
Eren Gölge 6a7275881d Add VitsConfig docstring 2021-08-09 18:02:36 +00:00
Eren Gölge f7a72552f1 Make duration predictor dropout configurable 2021-08-09 18:02:36 +00:00
Eren Gölge c312acac7d Implement VITS model 🚀
VITS model implementation built on Glow TTS and HiFiGAN
layers.
2021-08-09 18:02:36 +00:00
Eren Gölge 060e746e21 Add `do_amp_to_db` option 2021-08-09 18:02:36 +00:00
Eren Gölge e94c1f894d Simplify `console_logger` 2021-08-09 18:02:36 +00:00
Eren Gölge dd55960732 Update `synthesizer.py`
Fixes and changes for multi-speaker model init and custom symbols made
by model.make_symbols()
2021-08-09 18:02:36 +00:00
Eren Gölge 232a5abb6a Update `tts.setup_model`
Run `model.make_symbols()` if available to set the symbol list
2021-08-09 18:02:36 +00:00
Eren Gölge f5a6aa974f Modify `symbols.py` not to add _arpanet 2021-08-09 18:02:36 +00:00
Eren Gölge d4deb2716f Modify `get_optimizer` to accept a model argument 2021-08-09 18:02:36 +00:00
Eren Gölge 003e5579e8 Enable `custom_symbols` in text processing
Models can define their own custom symbols lists with custom
`make_symbols()`
2021-08-09 18:02:36 +00:00
Eren Gölge bd4e29b4dd Add `compute_linear_spec=False` to `BaseTTSConfig` 2021-08-09 18:02:36 +00:00
Eren Gölge 960a35a121 Add `scheduler_after_epoch` to `BaseTrainingConfig` 2021-08-09 18:02:36 +00:00
Eren Gölge e4648ffef1 Fix multi-speaker init of Tacotron models & tests 2021-08-09 18:02:36 +00:00
Eren Gölge 01324c8e70 Update `base_tts.py`
Enable calling `make_symbols()` from the model if defined.
Compatibility changes for end2end `tts` models in batch formatting.
Changes in multi-speaker initialization.
Modify `test_run()` to work with dict output of `synthesis`
2021-08-09 18:02:36 +00:00
Eren Gölge bf562cf437 Update `trainer.py`
Fix multi-speaker initialization of models. Add changes for end2end `tts`
models.
2021-08-09 18:02:36 +00:00
Agrin Hilmkil ced4cfdbbf Allow saving / loading checkpoints from cloud paths (#683)
* Allow saving / loading checkpoints from cloud paths

Allows saving and loading checkpoints directly from cloud paths like
Amazon S3 (s3://) and Google Cloud Storage (gs://) by using fsspec.

Note: The user will have to install the relevant dependency for each
protocol. Otherwise fsspec will fail and specify which dependency is
missing.

* Append suffix _fsspec to save/load function names

* Add a lower bound to the fsspec dependency

Skips the 0 major version.

* Add missing changes from refactor

* Use fsspec for remaining artifacts

* Add test case with path requiring fsspec

* Avoid writing logs to file unless output_path is local

* Document the possibility of using paths supported by fsspec

* Fix style and lint

* Add missing lint fixes

* Add type annotations to new functions

* Use Coqpit method for converting config to dict

* Fix type annotation in semi-new function

* Add return type for load_fsspec

* Fix bug where fs not always created

* Restore the experiment removal functionality
2021-08-09 18:02:36 +00:00
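
One item in this change, avoiding log files unless `output_path` is local, depends on telling local paths from remote ones such as `s3://` or `gs://`. A minimal standard-library sketch of such a check follows. The helper name `is_local_path` is hypothetical; the commit itself uses fsspec for path handling, so this only illustrates the idea.

```python
from urllib.parse import urlsplit


def is_local_path(path: str) -> bool:
    """Return True when `path` has no URL scheme or uses the `file` scheme."""
    # Caveat: a Windows drive path such as "C:\\runs" may be parsed as having
    # the scheme "c"; this sketch does not handle that case.
    scheme = urlsplit(path).scheme
    return scheme in ("", "file")
```

A trainer could call this on its output path and skip file logging whenever the result is false.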
Eren Gölge d9e18e009b Skip phoneme cache pre-compute if the path exists 2021-08-09 18:02:36 +00:00
Eren Gölge 6c131d168e Bump the version to 0.1.3 2021-07-26 21:32:27 +02:00
Eren Gölge febd6105b5 Update default vocoder for de-thorsten 2021-07-26 16:08:52 +02:00
Eren Gölge 4b7b88dd3d Add fullband-melgan DE vocoder 2021-07-26 15:38:30 +02:00
Eren Gölge 764f684e1b Fix `server.py` for multi-speaker models 2021-07-26 15:38:30 +02:00
Eren Gölge 75b201c6c1
Merge pull request #673 from coqui-ai/fix_stopnet
Fix stopnet training for Tacotron models
2021-07-24 12:25:38 +02:00
Eren Gölge fc0c4600bd Fix stopnet training 2021-07-24 11:39:54 +02:00
Eren Gölge 30eed347b6
Merge pull request #581 from Edresson/dev
Compute speaker embeddings in batch for the LSTM Speaker Encoder and compute embeddings/find chars using config file.
2021-07-23 17:22:51 +02:00
Edresson Casanova d5adc35fdf
Add docstring to compute_embeddings script 2021-07-21 07:16:10 -03:00
Eren Gölge 05c75aa9d5 Fix linter issues 2021-07-16 13:37:38 +02:00
Eren Gölge 58cc414477 Fix WaveGrad `test_run` 2021-07-16 13:02:25 +02:00
WeberJulian 25832eb97b Changes for review 2021-07-15 11:38:45 +02:00
Edresson b1620d1f3f remove ignore generate eval flag 2021-07-15 03:34:28 -03:00
WeberJulian c79a82ed07 refix linter 2021-07-13 23:12:18 +02:00
WeberJulian 7d92b30946 Fix tests 2021-07-13 23:00:34 +02:00
WeberJulian 32974dd6a9 Fix test sentences synthesis 2021-07-13 16:07:13 +02:00
Edresson d906fea08c lint fix and eval as argparse in extract tts spectrograms 2021-07-13 02:15:31 -03:00
Edresson 2e5baffa9c Merge fix and eval split as argparse 2021-07-13 01:47:32 -03:00
Eren Gölge 93a74cbb71
Merge pull request #628 from Aloento/patch-2
Change to _get_preprocessor_by_name
2021-07-11 22:17:50 +02:00
Edresson 4eac1c4651 bug fix on train_encoder and unit tests 2021-07-11 12:00:39 -03:00
Aloento 6e3e6d5756
Change to _get_preprocessor_by_name 2021-07-08 09:53:13 +02:00