Commit Graph

373 Commits

Author SHA1 Message Date
Edresson d91c595c5a Implement training support with d_vecs in the VITS model 2021-12-20 11:54:09 +00:00
Edresson e0ad838066 Select randomly a speaker from the speaker manager for the test setences 2021-12-20 11:54:09 +00:00
Edresson eb3e8affe1 Save speakers embeddings/ids before starting training 2021-12-20 11:54:09 +00:00
Eren Gölge 2ed9e3c241 Fix constant use of noise augment 2021-11-08 09:20:34 +01:00
Eren Gölge 2df0752e73
Model zoo tests (#900)
* Fix VITS model multi-speaker init

* Remove gdrive support in model manager

* Add model zoo tests
2021-10-29 17:54:16 +02:00
Eren Gölge 00becf2671 Fix import statements 2021-10-25 19:29:16 +02:00
Eren Gölge 2b7d159383 Update BaseTTS for multi-speaker training 2021-10-21 16:29:06 +00:00
Eren Gölge 82fed4add2 Make style 2021-10-21 16:05:51 +00:00
Eren Gölge cea8e1739b Update AlignTTS to use SpeakerManager 2021-10-20 18:22:41 +00:00
Eren Gölge 0e768dd4c5 Update comments 2021-10-20 18:21:26 +00:00
Eren Gölge 7c2cb7cc30 Update BaseTTS 2021-10-20 18:18:22 +00:00
Eren Gölge 330ee7d208 Comment BaseTacotron and remove unused funcs 2021-10-20 18:17:25 +00:00
Eren Gölge aa25f70b95 Update ForwardTTS for multi-speaker 2021-10-20 18:16:41 +00:00
Eren Gölge 0ebc2a400e Implement `_set_speaker_embedding` in GlowTTS 2021-10-20 18:15:20 +00:00
Eren Gölge 3da79a4de4 Comment Tacotron2 model 2021-10-20 18:14:04 +00:00
Eren Gölge c514351c0e Refactor multi-speaker init in BaseTTS-Tacotron1-2 2021-10-18 08:55:45 +00:00
Eren Gölge 127571423c Update multi-speaker init in BaseTTS 2021-10-18 08:54:41 +00:00
Eren Gölge a0a5d580e9 Approximate audio length from file size 2021-10-18 08:54:02 +00:00
Eren Gölge fcbfc53cb7 Fix linter 2021-10-15 10:24:19 +00:00
Eren Gölge 073a2d2eb0 Refactor VITS multi-speaker initialization 2021-10-15 10:20:00 +00:00
Eren Gölge 0565457faa Fix #846 2021-10-14 14:46:14 +00:00
Eren Gölge 4dbe7ed0de Fix all-zero duration case for GlowTTS 2021-10-01 09:24:26 +00:00
Eren Gölge 37959ad0c7 Make linter 2021-09-30 23:02:16 +00:00
Eren Gölge 4163b4f2e4 Update Tacotron models 2021-09-30 14:47:56 +00:00
Eren Gölge 45889804c2 Update VITS 2021-09-30 14:47:56 +00:00
Eren Gölge fd95926009 Update GlowTTS 2021-09-30 14:47:56 +00:00
Eren Gölge a156a40b47 Update ForwardTTS for Trainer_v2 2021-09-30 14:19:19 +00:00
Eren Gölge d9df33f837 Update `align_tts` for trainer_v2 2021-09-30 14:18:10 +00:00
Eren Gölge 8ada870a57 Refactor `trainer.py` for v2 2021-09-30 14:16:34 +00:00
Eren Gölge 2766dd1d6e
Fix #813 - GlowTTS training (#814)
* Fix #813

* Update glow_tts recipe

* Fix glow-tts test

* Linter fix

* Run data dep init only in training
2021-09-17 20:06:55 +02:00
Eren Gölge cbbc9e0172 Add FastSpeechConfig 2021-09-11 10:20:37 +00:00
Eren Gölge d97952611d Remove unused import 2021-09-10 17:31:41 +00:00
Eren Gölge d5f256b34c Update tacotron `r` init 2021-09-10 17:26:23 +00:00
Eren Gölge ab37fa9c39 Edit AlignTTS 2021-09-10 17:25:00 +00:00
Eren Gölge 66732025e1 Add `base_model` field to `forward_tts` configs 2021-09-10 17:23:48 +00:00
Eren Gölge a89eb12aca Fix glow_tts imports 2021-09-10 08:29:51 +00:00
Eren Gölge 0541a25e90 Remove `fastpitch.py` and `speedy_speech.py` 2021-09-10 08:27:48 +00:00
Eren Gölge 3c16013199 Fix Vits imports 2021-09-10 08:26:34 +00:00
Eren Gölge 8b7e094bde Implement `forward_tts`
- Generic API for feed-forward TTS models (FastPitch, SpeedySpeech)

- Tests for `forward-tts`

- Edit  FastPitchConfig and SpeedySpeechConfig to use `forward_tts`
2021-09-10 08:24:33 +00:00
Eren Gölge bfc6ceac29 Move MAS to `TTS.tts.utils.helpers` 2021-09-09 10:57:19 +00:00
Eren Gölge 4761853c5c Fix imports 2021-09-08 13:34:40 +00:00
Eren Gölge c1513ec4cd Plot pitch over spectrogram 2021-09-06 15:16:58 +00:00
Eren Gölge d847a68e42 Reformat multi-speaker handling in GlowTTS 2021-09-06 15:16:58 +00:00
Eren Gölge 8d41060d36 Plot unnormalized pitch by `FastPitch` 2021-09-06 15:16:58 +00:00
Eren Gölge 2b59da802c Fix loader setup in `base_tts` 2021-09-06 15:16:58 +00:00
Eren Gölge 2bf9e83c49 FastPitch refactor and commenting 2021-09-06 15:16:58 +00:00
Eren Gölge 648655fa03 Add `PitchExtractor` and return dict by `collate` 2021-09-06 15:16:58 +00:00
Eren Gölge 59d52a4cd8 Disable autcast for criterions 2021-09-06 15:16:58 +00:00
Eren Gölge 98a7271ce8 Refactor FastPitchv2 2021-09-06 15:16:58 +00:00
Eren Gölge e429afbce4 Enable aligner for FastPitch 2021-09-06 15:16:58 +00:00
Eren Gölge 81c228a2d8 Update FastPitch don't detach duration network inputs 2021-09-06 15:16:58 +00:00
Eren Gölge ca29033ef4 Refactor FastPitch model 2021-09-06 15:16:58 +00:00
Eren Gölge 5d59100a88 Don't use align_score for models with duration predictor 2021-09-06 15:16:58 +00:00
Eren Gölge b7caad39e0 Make optional to detach duration predictor input 2021-09-06 15:16:58 +00:00
Eren Gölge bc396c393f Add FastPitch model and FastPitchconfig 2021-09-06 15:16:58 +00:00
Eren Gölge e802b24ad0 Compute mean and std pitch 2021-09-06 15:16:58 +00:00
Eren Gölge d085642ac1 Cache pitch features
Cache the features at the beginning of `BaseTTS` training.
2021-09-06 15:16:58 +00:00
Eren Gölge 7590c7db7a Fix `base_tacotron` `aux_input` handling 2021-09-06 15:16:58 +00:00
Eren Gölge 994f2be2c1 Add comput_f0 field 2021-09-06 15:16:58 +00:00
Eren Gölge 2b7e55f01f Fix vits args types 2021-08-30 23:24:20 +00:00
Eren Gölge 18da8f5dbd Update pylint 2.10.2 and fix lint issues 2021-08-30 08:10:35 +00:00
Eren Gölge f186856e5d Add option to sort input sequnce by audio len 2021-08-30 08:10:35 +00:00
Eren Gölge 2620f62ea8 Move duration_loss inside VitsGeneratorLoss 2021-08-27 07:07:07 +00:00
Eren Gölge 49e1181ea4 Fixes for the vits model 2021-08-26 17:15:09 +00:00
Eren Gölge 3ab8cef99e Fix VITS model SPD 2021-08-18 14:55:46 +00:00
Eren Gölge 7c0d564965 Syncronize DDP processes 2021-08-13 10:40:50 +00:00
Eren Gölge ecf5f17dca Fix distribute.py and ddp training 2021-08-12 22:22:32 +00:00
Eren Gölge c8b9ca3d71 Fix Tacotron num_char init 2021-08-10 08:56:34 +00:00
Eren Gölge 6af03ac476 Fix `num_char` init in Tacotron models 2021-08-09 21:46:15 +00:00
Eren Gölge 06018251e6 Add VITS and GlowTTS class docs 🗒️ 2021-08-09 18:02:36 +00:00
Eren Gölge f7a72552f1 Make duration predictor dropout configurable 2021-08-09 18:02:36 +00:00
Eren Gölge c312acac7d Implement VITS model 🚀
VITS model implementation built on Glow TTS and HiFiGAN
layers.
2021-08-09 18:02:36 +00:00
Eren Gölge 232a5abb6a Update `tts.setup_model`
Run `model.make_symbols()` if availabe to set the symbol list
2021-08-09 18:02:36 +00:00
Eren Gölge e4648ffef1 Fix multi-speaker init of Tacotron models & tests 2021-08-09 18:02:36 +00:00
Eren Gölge 01324c8e70 Update `base_tts.py`
Enable calling `make_symbols()` from the model if defined.
Compatibility changes for end2end `tts` models in batch formatting.
Changes in multi-speaker initialization.
Modify `test_run()` to work with dict output iof `synthesis`
2021-08-09 18:02:36 +00:00
Agrin Hilmkil ced4cfdbbf Allow saving / loading checkpoints from cloud paths (#683)
* Allow saving / loading checkpoints from cloud paths

Allows saving and loading checkpoints directly from cloud paths like
Amazon S3 (s3://) and Google Cloud Storage (gs://) by using fsspec.

Note: The user will have to install the relevant dependency for each
protocol. Otherwise fsspec will fail and specify which dependency is
missing.

* Append suffix _fsspec to save/load function names

* Add a lower bound to the fsspec dependency

Skips the 0 major version.

* Add missing changes from refactor

* Use fsspec for remaining artifacts

* Add test case with path requiring fsspec

* Avoid writing logs to file unless output_path is local

* Document the possibility of using paths supported by fsspec

* Fix style and lint

* Add missing lint fixes

* Add type annotations to new functions

* Use Coqpit method for converting config to dict

* Fix type annotation in semi-new function

* Add return type for load_fsspec

* Fix bug where fs not always created

* Restore the experiment removal functionality
2021-08-09 18:02:36 +00:00
Eren Gölge d9e18e009b Skip phoneme cache pre-compute if the path exists 2021-08-09 18:02:36 +00:00
Eren Gölge fc0c4600bd Fix stopnet training 2021-07-24 11:39:54 +02:00
WeberJulian 25832eb97b Changes for review 2021-07-15 11:38:45 +02:00
WeberJulian c79a82ed07 refix linter 2021-07-13 23:12:18 +02:00
WeberJulian 7d92b30946 Fix tests 2021-07-13 23:00:34 +02:00
WeberJulian 32974dd6a9 Fix test sentences synthesis 2021-07-13 16:07:13 +02:00
eren golge 3c0454490f Fix #616 2021-07-06 14:44:03 +02:00
Eren Gölge f382e4c700 Fix linter warnings 2021-07-03 13:30:24 +02:00
Eren Gölge 196876feb1 Fix `ModelManager` model download 2021-07-02 10:47:05 +02:00
Eren Gölge 9352cb4136 Format Align TTS docstrings 2021-07-02 10:45:58 +02:00
Eren Gölge 95ad72f38f Fix glow tts initialization 2021-07-02 10:45:37 +02:00
Eren Gölge 40b0b5365e Let `get_characters` return `num_chars` 2021-07-02 10:45:00 +02:00
Eren Gölge 2e1a428b83 Update glowtts docstrings and docs 2021-06-30 14:30:55 +02:00
Eren Gölge 9790eddada Fix wrong argument name 🛠️ 2021-06-28 17:03:47 +02:00
Eren Gölge 51005cdab4 Update `tts.models.setup_model` 2021-06-28 17:03:19 +02:00
Eren Gölge 7b8c15ac49 Create base 🐸TTS model abstraction for tts models 2021-06-28 17:03:19 +02:00
Eren Gölge c7aad884cd Implement unified trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 6d7b5fbcde `tts` model abstraction with `TTSModel` 2021-06-28 17:03:19 +02:00
Eren Gölge 00c82c516d rename to 2021-06-28 17:03:19 +02:00
Eren Gölge 25238e0658 fix glow-tts `inference()` 2021-06-28 17:03:19 +02:00
Eren Gölge 419735f440 refactor and fix multi-speaker training in Trainer and Tacotron models 2021-06-28 17:03:19 +02:00
Eren Gölge 269e5a734e add max_decoder_steps argument to tacotron models 2021-06-28 17:03:19 +02:00
Eren Gölge db6a97d1a2 rename external speaker embedding arguments as `d_vectors` 2021-06-28 17:03:19 +02:00
Eren Gölge f82f1970b8 change `to(device)` to `type_as` in models 2021-06-28 17:03:19 +02:00
Eren Gölge 1fa15c195a docstring fix 2021-06-28 17:03:19 +02:00
Eren Gölge 1c8a3d7c86 make style 2021-06-28 17:03:19 +02:00
Eren Gölge b22b7620c3 update glow-tts output shapes to match [B, T, C] 2021-06-28 17:03:19 +02:00
Eren Gölge 8381379938 formating `cond_input` with a function in Tacotron models 2021-06-28 17:03:19 +02:00
Eren Gölge 6c495c6a6e fix glow-tts inference and forward functions for handling `cond_input`
and refactor its test
2021-06-28 17:03:19 +02:00
Eren Gölge 421194880d linter fixes 2021-06-28 17:03:19 +02:00
Eren Gölge d96ebcd6d3 make style 2021-06-28 17:03:19 +02:00
Eren Gölge b500338faa make style 2021-06-28 17:03:19 +02:00
Eren Gölge bb355b7441 update align_tts.py model for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge c70d0c9dae update `speedy_speech.py` model for trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 4e910993f1 update tacotron model to return `model_outputs` 2021-06-28 17:03:19 +02:00
Eren Gölge bb4deee64c update glow-tts for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 9134c7dfb6 update `sequence_mask` import globally 2021-06-28 17:03:19 +02:00
Eren Gölge 535a458f40 update Tacotron models for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge bdbfc95618 add `gradual_training` argument to tacotron.py 2021-06-28 17:03:19 +02:00
Eren Gölge 5a2e75f0ee import missings for tacotron.py 2021-06-28 17:03:19 +02:00
Eren Gölge da7d10e53c mode `setup_model()` to `models/__init__.py` 2021-06-28 17:03:19 +02:00
Alexander Korolev c1eb9bdcca
fix speaker dim inference 2021-06-01 15:15:26 +02:00
Alexander Korolev 5b89ef2c6e
fix speaker-embeddings dimension during inference 2021-06-01 11:06:35 +02:00
Eren Gölge c57f0b46bb reintro use_gst for backwars compat 2021-05-11 11:29:18 +02:00
Eren Gölge 05d9543ed8 init GST module using gst config in Tacotron models 2021-05-11 11:29:17 +02:00
Eren Gölge a21c0b5585 config update 2 WIP 2021-05-11 11:28:35 +02:00
Eren Gölge f7582107da
Merge pull request #453 from Edresson/dev
Script for spectrogram extraction using teacher forcing and Glow-TTS inference with MAS.
2021-05-06 17:53:28 +02:00
Eren Gölge 8cb27267a4 formatting 2021-05-03 14:26:35 +02:00
Edresson 8228091f92 add script for extraction of tts spectrograms 2021-04-23 14:17:46 -03:00
Eren Gölge d42748082a update argument name external_speaker_embedding_dim -> speaker_embedding_dim
add inference_noise_scale argument to glow-tts
2021-04-23 18:04:37 +02:00
Eren Gölge c955a12428 set the default layer size compatible with scglow 2021-04-23 18:04:37 +02:00
Eren Gölge 9cc17be53a formatting and a small bug fix in Tacotron model 2021-04-15 16:36:51 +02:00
Eren Gölge 3de5a89154 optionally enable prenet dropout at inference time for tacotron models 2021-04-13 13:24:56 +02:00
Eren Gölge b735076bb4 linter fixes 2021-04-12 13:14:11 +02:00
Eren Gölge a7f6045644 Merge branch 'reformat' into hifigan-reformat 2021-04-12 12:00:17 +02:00
Eren Gölge f519012dea reformatting and styling 2021-04-12 11:47:39 +02:00
Eren Gölge e5b9607bc3 isort all imports 2021-04-09 00:45:20 +02:00
Eren Gölge 0e79fa86ad format with black and pylint 2.7.3 2021-04-09 00:38:08 +02:00
Eren Gölge a3a840fd78 linter fixes 2021-03-30 14:39:16 +02:00
Eren Gölge 6b2e13bf62 compute normalized logp using torch primitives 2021-03-30 14:39:16 +02:00
Eren Gölge 7a382a5c2b stowed aligntts commit and small refactoring with feed_forward layers 2021-03-30 14:39:16 +02:00
Eren Gölge aec0b78aff duration predictor fix 2 2021-03-30 14:39:16 +02:00
Eren Gölge 07269e639b fix duration predictor in AlignTTS 2021-03-30 14:39:16 +02:00
Eren Gölge 2b3e12ea49 correct imports after refactoring, add AlignTTS (old SSMAS) and some formatting 2021-03-30 14:39:16 +02:00
Eren Gölge ecb6b0d6ad rename GlowTtts as GlowTTS 2021-03-30 14:39:16 +02:00
Eren Gölge 0514330869 fix mozilla/TTS#685 2021-03-18 13:33:23 +01:00
Eren Gölge 5c657715f2 fix #382 2021-03-16 17:31:48 +01:00
Eren Gölge 9a48ba3821 a ton of linter updates 2021-03-08 05:06:54 +01:00
Eren Gölge c990b3a59c linter fixes and test fixes 2021-01-22 02:32:35 +01:00
root 1faf565e3a add load_checkpoint func to tts models 2021-01-20 02:10:56 +00:00
erogol 428c224b88 commet update 2021-01-12 17:31:04 +01:00
erogol 79c841ccd3 mass refactoring and update 2021-01-11 17:26:58 +01:00
erogol d382d759b3 small fixes and test fixes 2021-01-08 15:48:40 +01:00
erogol a6259041d3 docstring for speedyspeech 2021-01-07 14:35:22 +01:00
erogol 14d33662ea input shapes for tacotron models 2021-01-06 13:19:40 +01:00
erogol f288e9a260 docstrings for taoctron models 2021-01-06 13:19:40 +01:00
erogol 7586fbc4de SS refactoring 2021-01-06 13:19:40 +01:00
erogol e82d31b6ac glow ttss refactoring 2021-01-06 13:19:40 +01:00
erogol aa40fe1aa0 SS model refacotring for multi speaker 2021-01-06 13:19:40 +01:00
erogol eb555855e4 small fixes 2021-01-06 13:19:40 +01:00
erogol ac5c9217d1 positional encoding masking for SS 2021-01-06 13:19:40 +01:00
erogol fede46e96e pylint and test fixes 2021-01-06 13:19:40 +01:00
erogol cf869e8922 add SS files 2021-01-06 13:19:40 +01:00
erogol fa6907fa0e update glow-tts parameters and fix rel-attn-win size 2021-01-06 13:19:40 +01:00
erogol 7b20d8cbd3 implement residual BN convolution and add it as an alternative encoder for glow-tts. also generic layers to layers/generic 2021-01-06 13:19:40 +01:00
erogol 7c3cdced1a make speaker_mapping a global variable to prevent reload. Fix glow-tts training 2020-12-01 03:23:25 +01:00
Edresson 07345099ee GlowTTS zero-shot TTS Support 2020-10-24 15:58:39 -03:00
Edresson b7f9ebd32b add check arguments for GlowTTS and multispeaker training bug fix 2020-10-19 17:17:58 -03:00
Edresson 99d5a0ac07 add Speaker Conditional GST support 2020-09-29 16:09:27 -03:00
erogol 10258724d1 linter fixes 2020-09-22 03:54:16 +02:00
erogol e0b9fa887f glow-tts modules added 2020-09-21 14:15:40 +02:00
erogol c008003506 do not check sample rate as loading stats file for normalization to enable interpolation for different sample rate vocoder 2020-09-18 12:52:19 +02:00
erogol 3660c57f1e time seperable convolution encoder, huber loss for duration predictor 2020-09-17 03:10:58 +02:00
erogol f1a75468c2 fix arguments 2020-09-12 04:00:25 +02:00
erogol 45fbc0d003 convolution encoder with GLU and res connections 2020-09-12 03:40:21 +02:00
erogol 15e6ab3912 glow-tts module renaming updates 2020-09-12 03:33:36 +02:00
erogol df19428ec6 rename the project to old TTS 2020-09-09 12:27:23 +02:00