384 Commits

Author SHA1 Message Date
Eren Gölge 2d8ce98d2a Update imports for symbols -> characters 2022-02-25 10:48:03 +01:00
Eren Gölge 9a95e15483 Refactor GlowTTS model and recipe for TTSTokenizer 2022-02-25 10:48:03 +01:00
Eren Gölge 04202da1ac Make style 2022-02-25 10:48:03 +01:00
Eren Gölge bb389479a4 Update setup_model for TTS.tts models 2022-02-25 10:48:03 +01:00
Eren Gölge d2525abe8c Remove get_characters from BaseTTS 2022-02-25 10:48:03 +01:00
Eren Gölge 73d27ebd45 Fix GlowTTS 2022-02-25 10:48:03 +01:00
Eren Gölge fbad17e084 Update imports for symbols -> characters 2022-02-25 10:48:02 +01:00
Eren Gölge bd461ace33 Refactor GlowTTS model and recipe for TTSTokenizer 2022-02-25 10:45:24 +01:00
Edresson Casanova ba6e56e01c Fix Glow-TTS multi-speaker inference 2022-02-18 19:25:29 +00:00
Eren Gölge 127118c637 Update TTS.tts formatters (#1228)
* Return Dict from tts formatters

* Make style
2022-02-11 23:03:43 +01:00
WeberJulian e778bad626 Add argument to enable dp speaker conditioning 2022-01-06 15:07:27 +01:00
WeberJulian e1accb6e28 Fix train_tts.py and uncomment code (#1051)
* Fix SE loading and language embedding logic

* remove trailing white space

* Uncomment resampling code for SCL
2022-01-03 17:44:57 +01:00
Eren Gölge 36cef5966b Fix resnet speaker encoder 2021-12-30 15:36:35 +00:00
Eren Gölge 348b5c96a2 Fix speaker encoder test 2021-12-30 15:36:35 +00:00
Eren Gölge 7129b04d46 Update VITS model 2021-12-30 14:08:17 +00:00
Eren Gölge d29c3780d1 Use speaker_encoder from speaker manager in Vits 2021-12-20 11:54:10 +00:00
Eren Gölge 649dc9e9da Remove redundant code 2021-12-20 11:54:10 +00:00
Eren Gölge 704dddcffa Make style 2021-12-20 11:54:10 +00:00
WeberJulian 2bbcb558dc Prevent weighted sampler use when num_gpus > 1 2021-12-20 11:54:10 +00:00
WeberJulian 74cedfac38 Revert init multispeaker change 2021-12-20 11:54:10 +00:00
WeberJulian 6b03943526 Move multilingual logic out of the trainer 2021-12-20 11:54:10 +00:00
Edresson 67dda0abe1 Add the SCL resample TODO 2021-12-20 11:54:10 +00:00
WeberJulian 8b52fb89d1 Fix merge bug 2021-12-20 11:54:10 +00:00
WeberJulian 09eda31a3f Fix tests 2021-12-20 11:54:10 +00:00
Edresson 78a23e19df Fix pylint checks 2021-12-20 11:54:10 +00:00
WeberJulian 4cd0e4eb0d Remove self.audio_config from VITS 2021-12-20 11:54:10 +00:00
Edresson d39200e69b Remove torchaudio requirement 2021-12-20 11:54:10 +00:00
WeberJulian 2e516869a1 Fix trailing whitespace 2021-12-20 11:54:10 +00:00
WeberJulian ffc269eaf4 Update docstring 2021-12-20 11:54:10 +00:00
Edresson 12968532fe Add the language embedding dim in the duration predictor class 2021-12-20 11:54:10 +00:00
Edresson f34596d957 Fix function name 2021-12-20 11:54:10 +00:00
Edresson 9daa33d1fd Remove unusable speaker manager function 2021-12-20 11:54:10 +00:00
Edresson 6fc3b9e679 Remove the unusable fine-tuning model 2021-12-20 11:54:10 +00:00
WeberJulian da6c1e858c Fix small issues 2021-12-20 11:54:10 +00:00
WeberJulian e8af6a9f08 Fix use_speaker_embedding logic 2021-12-20 11:54:10 +00:00
WeberJulian 120332d53f Fix phonemes 2021-12-20 11:54:10 +00:00
WeberJulian e995a63bd6 fix linter 2021-12-20 11:54:10 +00:00
WeberJulian 1472b6df49 make style 2021-12-20 11:54:10 +00:00
WeberJulian 3b5592abcf fix test vits 2021-12-20 11:54:10 +00:00
Julian WEBER 9a2f91327c get_aux_input 2021-12-20 11:54:10 +00:00
Edresson 1bd1a0546b Add audio resample in the speaker consistency loss 2021-12-20 11:54:10 +00:00
Edresson 1c6bcda950 Add freeze vocoder generator and flow-based decoder option 2021-12-20 11:54:10 +00:00
WeberJulian 2b952d8b97 freeze vits parts 2021-12-20 11:54:10 +00:00
WeberJulian 005bba60b0 get_speaker_weighted_sampler 2021-12-20 11:54:10 +00:00
Edresson 9de4539422 Update the VITS model docs 2021-12-20 11:54:10 +00:00
Edresson eeb8ac07d9 Add voice conversion fine tuning mode 2021-12-20 11:54:10 +00:00
Edresson 690b37d0ab Add support to use the speaker encoder as loss function in VITS model 2021-12-20 11:54:09 +00:00
Edresson de78556655 Fix the optimizer parameters bug in multilingual and multispeaker training 2021-12-20 11:54:09 +00:00
Edresson 9be5b75da3 Fix bug after merge 2021-12-20 11:54:09 +00:00
Edresson 76251b619a Fix d-vector multispeaker training bug 2021-12-20 11:54:09 +00:00
Edresson 7ef3ddc6ff Fix unit tests 2021-12-20 11:54:09 +00:00
Edresson 36dcd11453 Fix pylint issues 2021-12-20 11:54:09 +00:00
Edresson c53693c155 Implement vocoder Fine Tuning like SC-GlowTTS paper 2021-12-20 11:54:09 +00:00
Edresson c334d39acc Add voice conversion support for the model VITS trained with external speaker embedding 2021-12-20 11:54:09 +00:00
Edresson e997889ba8 Fix bug in VITS multilingual inference 2021-12-20 11:54:09 +00:00
Edresson 7c0b8ec572 Fix bugs in the non-multilingual VITS inference 2021-12-20 11:54:09 +00:00
Edresson 3fbbebd74d Fix pylint issues 2021-12-20 11:54:09 +00:00
Edresson ac9416fb86 Add multilingual inference support 2021-12-20 11:54:09 +00:00
Edresson dcb2374bc9 Add multilingual training support to the VITS model 2021-12-20 11:54:09 +00:00
Edresson f996afedb0 Implement multilingual dataloader support 2021-12-20 11:54:09 +00:00
Edresson 5f1c18187f Fix pylint issues 2021-12-20 11:54:09 +00:00
Edresson d91c595c5a Implement training support with d_vecs in the VITS model 2021-12-20 11:54:09 +00:00
Edresson e0ad838066 Randomly select a speaker from the speaker manager for the test sentences 2021-12-20 11:54:09 +00:00
Edresson eb3e8affe1 Save speakers embeddings/ids before starting training 2021-12-20 11:54:09 +00:00
Eren Gölge 2ed9e3c241 Fix constant use of noise augment 2021-11-08 09:20:34 +01:00
Eren Gölge 2df0752e73 Model zoo tests (#900)
* Fix VITS model multi-speaker init

* Remove gdrive support in model manager

* Add model zoo tests
2021-10-29 17:54:16 +02:00
Eren Gölge 00becf2671 Fix import statements 2021-10-25 19:29:16 +02:00
Eren Gölge 2b7d159383 Update BaseTTS for multi-speaker training 2021-10-21 16:29:06 +00:00
Eren Gölge 82fed4add2 Make style 2021-10-21 16:05:51 +00:00
Eren Gölge cea8e1739b Update AlignTTS to use SpeakerManager 2021-10-20 18:22:41 +00:00
Eren Gölge 0e768dd4c5 Update comments 2021-10-20 18:21:26 +00:00
Eren Gölge 7c2cb7cc30 Update BaseTTS 2021-10-20 18:18:22 +00:00
Eren Gölge 330ee7d208 Comment BaseTacotron and remove unused funcs 2021-10-20 18:17:25 +00:00
Eren Gölge aa25f70b95 Update ForwardTTS for multi-speaker 2021-10-20 18:16:41 +00:00
Eren Gölge 0ebc2a400e Implement `_set_speaker_embedding` in GlowTTS 2021-10-20 18:15:20 +00:00
Eren Gölge 3da79a4de4 Comment Tacotron2 model 2021-10-20 18:14:04 +00:00
Eren Gölge c514351c0e Refactor multi-speaker init in BaseTTS-Tacotron1-2 2021-10-18 08:55:45 +00:00
Eren Gölge 127571423c Update multi-speaker init in BaseTTS 2021-10-18 08:54:41 +00:00
Eren Gölge a0a5d580e9 Approximate audio length from file size 2021-10-18 08:54:02 +00:00
Eren Gölge fcbfc53cb7 Fix linter 2021-10-15 10:24:19 +00:00
Eren Gölge 073a2d2eb0 Refactor VITS multi-speaker initialization 2021-10-15 10:20:00 +00:00
Eren Gölge 0565457faa Fix #846 2021-10-14 14:46:14 +00:00
Eren Gölge 4dbe7ed0de Fix all-zero duration case for GlowTTS 2021-10-01 09:24:26 +00:00
Eren Gölge 37959ad0c7 Make linter 2021-09-30 23:02:16 +00:00
Eren Gölge 4163b4f2e4 Update Tacotron models 2021-09-30 14:47:56 +00:00
Eren Gölge 45889804c2 Update VITS 2021-09-30 14:47:56 +00:00
Eren Gölge fd95926009 Update GlowTTS 2021-09-30 14:47:56 +00:00
Eren Gölge a156a40b47 Update ForwardTTS for Trainer_v2 2021-09-30 14:19:19 +00:00
Eren Gölge d9df33f837 Update `align_tts` for trainer_v2 2021-09-30 14:18:10 +00:00
Eren Gölge 8ada870a57 Refactor `trainer.py` for v2 2021-09-30 14:16:34 +00:00
Eren Gölge 2766dd1d6e Fix #813 - GlowTTS training (#814)
* Fix #813

* Update glow_tts recipe

* Fix glow-tts test

* Linter fix

* Run data dep init only in training
2021-09-17 20:06:55 +02:00
Eren Gölge cbbc9e0172 Add FastSpeechConfig 2021-09-11 10:20:37 +00:00
Eren Gölge d97952611d Remove unused import 2021-09-10 17:31:41 +00:00
Eren Gölge d5f256b34c Update tacotron `r` init 2021-09-10 17:26:23 +00:00
Eren Gölge ab37fa9c39 Edit AlignTTS 2021-09-10 17:25:00 +00:00
Eren Gölge 66732025e1 Add `base_model` field to `forward_tts` configs 2021-09-10 17:23:48 +00:00
Eren Gölge a89eb12aca Fix glow_tts imports 2021-09-10 08:29:51 +00:00
Eren Gölge 0541a25e90 Remove `fastpitch.py` and `speedy_speech.py` 2021-09-10 08:27:48 +00:00
Eren Gölge 3c16013199 Fix Vits imports 2021-09-10 08:26:34 +00:00
Eren Gölge 8b7e094bde Implement `forward_tts`
- Generic API for feed-forward TTS models (FastPitch, SpeedySpeech)

- Tests for `forward-tts`

- Edit FastPitchConfig and SpeedySpeechConfig to use `forward_tts`
2021-09-10 08:24:33 +00:00
Eren Gölge bfc6ceac29 Move MAS to `TTS.tts.utils.helpers` 2021-09-09 10:57:19 +00:00
Eren Gölge 4761853c5c Fix imports 2021-09-08 13:34:40 +00:00
Eren Gölge c1513ec4cd Plot pitch over spectrogram 2021-09-06 15:16:58 +00:00
Eren Gölge d847a68e42 Reformat multi-speaker handling in GlowTTS 2021-09-06 15:16:58 +00:00
Eren Gölge 8d41060d36 Plot unnormalized pitch by `FastPitch` 2021-09-06 15:16:58 +00:00
Eren Gölge 2b59da802c Fix loader setup in `base_tts` 2021-09-06 15:16:58 +00:00
Eren Gölge 2bf9e83c49 FastPitch refactor and commenting 2021-09-06 15:16:58 +00:00
Eren Gölge 648655fa03 Add `PitchExtractor` and return dict by `collate` 2021-09-06 15:16:58 +00:00
Eren Gölge 59d52a4cd8 Disable autocast for criterions 2021-09-06 15:16:58 +00:00
Eren Gölge 98a7271ce8 Refactor FastPitchv2 2021-09-06 15:16:58 +00:00
Eren Gölge e429afbce4 Enable aligner for FastPitch 2021-09-06 15:16:58 +00:00
Eren Gölge 81c228a2d8 Update FastPitch don't detach duration network inputs 2021-09-06 15:16:58 +00:00
Eren Gölge ca29033ef4 Refactor FastPitch model 2021-09-06 15:16:58 +00:00
Eren Gölge 5d59100a88 Don't use align_score for models with duration predictor 2021-09-06 15:16:58 +00:00
Eren Gölge b7caad39e0 Make optional to detach duration predictor input 2021-09-06 15:16:58 +00:00
Eren Gölge bc396c393f Add FastPitch model and FastPitchconfig 2021-09-06 15:16:58 +00:00
Eren Gölge e802b24ad0 Compute mean and std pitch 2021-09-06 15:16:58 +00:00
Eren Gölge d085642ac1 Cache pitch features
Cache the features at the beginning of `BaseTTS` training.
2021-09-06 15:16:58 +00:00
Eren Gölge 7590c7db7a Fix `base_tacotron` `aux_input` handling 2021-09-06 15:16:58 +00:00
Eren Gölge 994f2be2c1 Add comput_f0 field 2021-09-06 15:16:58 +00:00
Eren Gölge 2b7e55f01f Fix vits args types 2021-08-30 23:24:20 +00:00
Eren Gölge 18da8f5dbd Update pylint 2.10.2 and fix lint issues 2021-08-30 08:10:35 +00:00
Eren Gölge f186856e5d Add option to sort input sequence by audio len 2021-08-30 08:10:35 +00:00
Eren Gölge 2620f62ea8 Move duration_loss inside VitsGeneratorLoss 2021-08-27 07:07:07 +00:00
Eren Gölge 49e1181ea4 Fixes for the vits model 2021-08-26 17:15:09 +00:00
Eren Gölge 3ab8cef99e Fix VITS model SPD 2021-08-18 14:55:46 +00:00
Eren Gölge 7c0d564965 Synchronize DDP processes 2021-08-13 10:40:50 +00:00
Eren Gölge ecf5f17dca Fix distribute.py and ddp training 2021-08-12 22:22:32 +00:00
Eren Gölge c8b9ca3d71 Fix Tacotron num_char init 2021-08-10 08:56:34 +00:00
Eren Gölge 6af03ac476 Fix `num_char` init in Tacotron models 2021-08-09 21:46:15 +00:00
Eren Gölge 06018251e6 Add VITS and GlowTTS class docs 🗒️ 2021-08-09 18:02:36 +00:00
Eren Gölge f7a72552f1 Make duration predictor dropout configurable 2021-08-09 18:02:36 +00:00
Eren Gölge c312acac7d Implement VITS model 🚀
VITS model implementation built on Glow TTS and HiFiGAN
layers.
2021-08-09 18:02:36 +00:00
Eren Gölge 232a5abb6a Update `tts.setup_model`
Run `model.make_symbols()` if available to set the symbol list
2021-08-09 18:02:36 +00:00
Eren Gölge e4648ffef1 Fix multi-speaker init of Tacotron models & tests 2021-08-09 18:02:36 +00:00
Eren Gölge 01324c8e70 Update `base_tts.py`
Enable calling `make_symbols()` from the model if defined.
Compatibility changes for end2end `tts` models in batch formatting.
Changes in multi-speaker initialization.
Modify `test_run()` to work with dict output of `synthesis`
2021-08-09 18:02:36 +00:00
Agrin Hilmkil ced4cfdbbf Allow saving / loading checkpoints from cloud paths (#683)
* Allow saving / loading checkpoints from cloud paths

Allows saving and loading checkpoints directly from cloud paths like
Amazon S3 (s3://) and Google Cloud Storage (gs://) by using fsspec.

Note: The user will have to install the relevant dependency for each
protocol. Otherwise fsspec will fail and specify which dependency is
missing.

* Append suffix _fsspec to save/load function names

* Add a lower bound to the fsspec dependency

Skips the 0 major version.

* Add missing changes from refactor

* Use fsspec for remaining artifacts

* Add test case with path requiring fsspec

* Avoid writing logs to file unless output_path is local

* Document the possibility of using paths supported by fsspec

* Fix style and lint

* Add missing lint fixes

* Add type annotations to new functions

* Use Coqpit method for converting config to dict

* Fix type annotation in semi-new function

* Add return type for load_fsspec

* Fix bug where fs not always created

* Restore the experiment removal functionality
2021-08-09 18:02:36 +00:00
Eren Gölge d9e18e009b Skip phoneme cache pre-compute if the path exists 2021-08-09 18:02:36 +00:00
Eren Gölge fc0c4600bd Fix stopnet training 2021-07-24 11:39:54 +02:00
WeberJulian 25832eb97b Changes for review 2021-07-15 11:38:45 +02:00
WeberJulian c79a82ed07 refix linter 2021-07-13 23:12:18 +02:00
WeberJulian 7d92b30946 Fix tests 2021-07-13 23:00:34 +02:00
WeberJulian 32974dd6a9 Fix test sentences synthesis 2021-07-13 16:07:13 +02:00
eren golge 3c0454490f Fix #616 2021-07-06 14:44:03 +02:00
Eren Gölge f382e4c700 Fix linter warnings 2021-07-03 13:30:24 +02:00
Eren Gölge 196876feb1 Fix `ModelManager` model download 2021-07-02 10:47:05 +02:00
Eren Gölge 9352cb4136 Format Align TTS docstrings 2021-07-02 10:45:58 +02:00
Eren Gölge 95ad72f38f Fix glow tts initialization 2021-07-02 10:45:37 +02:00
Eren Gölge 40b0b5365e Let `get_characters` return `num_chars` 2021-07-02 10:45:00 +02:00
Eren Gölge 2e1a428b83 Update glowtts docstrings and docs 2021-06-30 14:30:55 +02:00
Eren Gölge 9790eddada Fix wrong argument name 🛠️ 2021-06-28 17:03:47 +02:00
Eren Gölge 51005cdab4 Update `tts.models.setup_model` 2021-06-28 17:03:19 +02:00
Eren Gölge 7b8c15ac49 Create base 🐸TTS model abstraction for tts models 2021-06-28 17:03:19 +02:00
Eren Gölge c7aad884cd Implement unified trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 6d7b5fbcde `tts` model abstraction with `TTSModel` 2021-06-28 17:03:19 +02:00
Eren Gölge 00c82c516d rename to 2021-06-28 17:03:19 +02:00
Eren Gölge 25238e0658 fix glow-tts `inference()` 2021-06-28 17:03:19 +02:00
Eren Gölge 419735f440 refactor and fix multi-speaker training in Trainer and Tacotron models 2021-06-28 17:03:19 +02:00
Eren Gölge 269e5a734e add max_decoder_steps argument to tacotron models 2021-06-28 17:03:19 +02:00
Eren Gölge db6a97d1a2 rename external speaker embedding arguments as `d_vectors` 2021-06-28 17:03:19 +02:00
Eren Gölge f82f1970b8 change `to(device)` to `type_as` in models 2021-06-28 17:03:19 +02:00
Eren Gölge 1fa15c195a docstring fix 2021-06-28 17:03:19 +02:00
Eren Gölge 1c8a3d7c86 make style 2021-06-28 17:03:19 +02:00
Eren Gölge b22b7620c3 update glow-tts output shapes to match [B, T, C] 2021-06-28 17:03:19 +02:00
Eren Gölge 8381379938 formatting `cond_input` with a function in Tacotron models 2021-06-28 17:03:19 +02:00
Eren Gölge 6c495c6a6e fix glow-tts inference and forward functions for handling `cond_input`
and refactor its test
2021-06-28 17:03:19 +02:00
Eren Gölge 421194880d linter fixes 2021-06-28 17:03:19 +02:00
Eren Gölge d96ebcd6d3 make style 2021-06-28 17:03:19 +02:00
Eren Gölge b500338faa make style 2021-06-28 17:03:19 +02:00
Eren Gölge bb355b7441 update align_tts.py model for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge c70d0c9dae update `speedy_speech.py` model for trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 4e910993f1 update tacotron model to return `model_outputs` 2021-06-28 17:03:19 +02:00
Eren Gölge bb4deee64c update glow-tts for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 9134c7dfb6 update `sequence_mask` import globally 2021-06-28 17:03:19 +02:00
Eren Gölge 535a458f40 update Tacotron models for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge bdbfc95618 add `gradual_training` argument to tacotron.py 2021-06-28 17:03:19 +02:00
Eren Gölge 5a2e75f0ee add missing imports for tacotron.py 2021-06-28 17:03:19 +02:00
Eren Gölge da7d10e53c move `setup_model()` to `models/__init__.py` 2021-06-28 17:03:19 +02:00
Alexander Korolev c1eb9bdcca fix speaker dim inference 2021-06-01 15:15:26 +02:00
Alexander Korolev 5b89ef2c6e fix speaker-embeddings dimension during inference 2021-06-01 11:06:35 +02:00
Eren Gölge c57f0b46bb reintro use_gst for backwards compat 2021-05-11 11:29:18 +02:00
Eren Gölge 05d9543ed8 init GST module using gst config in Tacotron models 2021-05-11 11:29:17 +02:00
Eren Gölge a21c0b5585 config update 2 WIP 2021-05-11 11:28:35 +02:00
Eren Gölge f7582107da Merge pull request #453 from Edresson/dev
Script for spectrogram extraction using teacher forcing and Glow-TTS inference with MAS.
2021-05-06 17:53:28 +02:00
Eren Gölge 8cb27267a4 formatting 2021-05-03 14:26:35 +02:00
Edresson 8228091f92 add script for extraction of tts spectrograms 2021-04-23 14:17:46 -03:00
Eren Gölge d42748082a update argument name external_speaker_embedding_dim -> speaker_embedding_dim
add inference_noise_scale argument to glow-tts
2021-04-23 18:04:37 +02:00
Eren Gölge c955a12428 set the default layer size compatible with scglow 2021-04-23 18:04:37 +02:00
Eren Gölge 9cc17be53a formatting and a small bug fix in Tacotron model 2021-04-15 16:36:51 +02:00
Eren Gölge 3de5a89154 optionally enable prenet dropout at inference time for tacotron models 2021-04-13 13:24:56 +02:00
Eren Gölge b735076bb4 linter fixes 2021-04-12 13:14:11 +02:00
Eren Gölge a7f6045644 Merge branch 'reformat' into hifigan-reformat 2021-04-12 12:00:17 +02:00
Eren Gölge f519012dea reformatting and styling 2021-04-12 11:47:39 +02:00
Eren Gölge e5b9607bc3 isort all imports 2021-04-09 00:45:20 +02:00
Eren Gölge 0e79fa86ad format with black and pylint 2.7.3 2021-04-09 00:38:08 +02:00
Eren Gölge a3a840fd78 linter fixes 2021-03-30 14:39:16 +02:00
Eren Gölge 6b2e13bf62 compute normalized logp using torch primitives 2021-03-30 14:39:16 +02:00
Eren Gölge 7a382a5c2b stowed aligntts commit and small refactoring with feed_forward layers 2021-03-30 14:39:16 +02:00
Eren Gölge aec0b78aff duration predictor fix 2 2021-03-30 14:39:16 +02:00
Eren Gölge 07269e639b fix duration predictor in AlignTTS 2021-03-30 14:39:16 +02:00