Commit Graph

1549 Commits

Author SHA1 Message Date
Eren Gölge b81560607b Update docstrings 2021-09-06 15:16:58 +00:00
Eren Gölge 57b3aec1b9 Update docstring format 2021-09-06 15:16:58 +00:00
Eren Gölge 7692bfe7f8 Update FastPitch config 2021-09-06 15:16:58 +00:00
Eren Gölge 8584f2b82d Update docstring format 2021-09-06 15:16:58 +00:00
Eren Gölge b7caad39e0 Make optional to detach duration predictor input 2021-09-06 15:16:58 +00:00
Eren Gölge 9af42f7886 Restore `last_epoch` of the scheduler 2021-09-06 15:16:58 +00:00
Eren Gölge aacbb3ed77 Fix SpeakerManager usage in `synthesize.py` 2021-09-06 15:16:58 +00:00
Eren Gölge 545a00fc04 Use absolute paths of the attention masks 2021-09-06 15:16:58 +00:00
Eren Gölge bc396c393f Add FastPitch model and FastPitchConfig 2021-09-06 15:16:58 +00:00
Eren Gölge 5a6ffaee08 Add yin based pitch computation 2021-09-06 15:16:58 +00:00
Eren Gölge e802b24ad0 Compute mean and std pitch 2021-09-06 15:16:58 +00:00
Eren Gölge 8fffd4e813 Don't print computed phonemes
It causes noise in logs
2021-09-06 15:16:58 +00:00
Eren Gölge d085642ac1 Cache pitch features
Cache the features at the beginning of `BaseTTS` training.
2021-09-06 15:16:58 +00:00
Eren Gölge 7590c7db7a Fix `base_tacotron` `aux_input` handling 2021-09-06 15:16:58 +00:00
Eren Gölge db32162eae Fix `FastPitchLoss` 2021-09-06 15:16:58 +00:00
Eren Gölge 94e8e0d416 Fix configs 2021-09-06 15:16:58 +00:00
Eren Gölge 0f19f8c911 Fix `compute_attention_masks.py` 2021-09-06 15:16:58 +00:00
Eren Gölge 994f2be2c1 Add compute_f0 field 2021-09-06 15:16:58 +00:00
Eren Gölge c8d999b010 Add FastPitchLoss 2021-09-06 15:16:58 +00:00
Eren Gölge fba257104d Compute F0 using librosa 2021-09-06 15:16:58 +00:00
Katsuya Iida 165e5814af Update Japanese phonemizer (#758)
* Update default ja vocoder

* update

* Japanese phonemizer test

* Run make style

Co-authored-by: Eren Gölge <egolge@coqui.ai>
2021-09-01 09:33:15 +02:00
Eren Gölge 2b7e55f01f Fix vits args types 2021-08-30 23:24:20 +00:00
Eren Gölge b910a6ddce Bump up to v0.2.1 2021-08-30 16:31:24 +00:00
Eren Gölge d16da949a5 Merge branch 'fix_distribute' into dev 2021-08-30 16:31:07 +00:00
Eren Gölge 6782d3eab7 Fix linter issues for Python 3.6 2021-08-30 16:18:33 +00:00
Eren Gölge 738eee0cf9 Fix style 2021-08-30 13:12:13 +00:00
Eren Gölge 5255e089e6 Fix #767 2021-08-30 13:10:08 +00:00
Eren Gölge c560114324 Fix #750 2021-08-30 13:06:50 +00:00
Eren Gölge 18b2e41e5a Use `coqui_tts` as the default run name 2021-08-30 12:56:47 +00:00
Eren Gölge 9c86f1ac68 Fix usage of abstract class in vocoders 2021-08-30 08:10:35 +00:00
Eren Gölge 18da8f5dbd Update pylint 2.10.2 and fix lint issues 2021-08-30 08:10:35 +00:00
Eren Gölge f186856e5d Add option to sort input sequence by audio len 2021-08-30 08:10:35 +00:00
Eren Gölge 2620f62ea8 Move duration_loss inside VitsGeneratorLoss 2021-08-27 07:07:07 +00:00
Eren Gölge 1692b8e4d9 Merge pull request #726 from fijipants/patch-1
Fix bug with log_func
2021-08-26 22:11:29 +02:00
Eren Gölge 49e1181ea4 Fixes for the vits model 2021-08-26 17:15:09 +00:00
Eren Gölge 5911eec3b1 Small trainer refactoring
1. Use a single Gradscaler for all the optimizers
2. Save terminal logs to a file. In DDP mode, each worker creates `trainer_N_log.txt`.
3. Fixes to allow only the main worker (rank==0) to write to TensorBoard
4. Pass parameters owned by the target optimizer to the grad_clip_norm
2021-08-26 17:08:58 +00:00
fijipants e9e01b09b0 Fix bug with log_func 2021-08-18 19:59:51 -04:00
fijipants 8f57f8adfd Update synthesizer.py 2021-08-18 19:56:52 -04:00
Eren Gölge 3ab8cef99e Fix VITS model SPD 2021-08-18 14:55:46 +00:00
Eren Gölge c5d1dd9d1b Fix restoring best_loss
Keep the default value if model checkpoint has no `model_loss`
2021-08-17 12:12:36 +00:00
Eren Gölge c8bbcdfd07 Fix `test_run` for DDP 2021-08-13 19:39:02 +00:00
Eren Gölge 7c0d564965 Synchronize DDP processes 2021-08-13 10:40:50 +00:00
Eren Gölge ecf5f17dca Fix distribute.py and ddp training 2021-08-12 22:22:32 +00:00
Eren Gölge b02c4fe347 Bump up to v0.2.0 2021-08-11 08:15:39 +00:00
Eren Gölge 537bc8487a Print model count when listing models 2021-08-10 16:25:11 +00:00
Eren Gölge 09ed8426e8 Add the models released with v0.2.0 2021-08-10 15:46:31 +00:00
Eren Gölge 39004484b9 Fix 🐛
Fix synthesizer multi-speaker init
Fix #712
2021-08-10 12:56:32 +00:00
Eren Gölge c8b9ca3d71 Fix Tacotron num_char init 2021-08-10 08:56:34 +00:00
Eren Gölge 7eb94f760b Remove Ruslan model 2021-08-09 21:48:36 +00:00
Eren Gölge 6af03ac476 Fix `num_char` init in Tacotron models 2021-08-09 21:46:15 +00:00
Ayush Chaurasia e685ddfca7 Update trainer.py 2021-08-09 18:37:46 +00:00
Ayush Chaurasia 28870f8df4 update docstring 2021-08-09 18:35:35 +00:00
Ayush Chaurasia 8a246cbb66 Update trainer.py 2021-08-09 18:35:08 +00:00
Ayush Chaurasia f3e9d61330 Refactor logging initialization 2021-08-09 18:35:08 +00:00
Ayush Chaurasia 79b74a989d Update: add_text 2021-08-09 18:34:38 +00:00
Ayush Chaurasia 9fcf48b760 Delete logger_base.py 2021-08-09 18:34:00 +00:00
Ayush Chaurasia 290972fd35 reformat 2021-08-09 18:34:00 +00:00
Ayush Chaurasia 936a47504d Update Logger API, recipes 2021-08-09 18:34:00 +00:00
Ayush Chaurasia f63cf46c55 Unified logger API 2021-08-09 18:34:00 +00:00
Ayush Chaurasia f4434da5a3 Update disabled structure 2021-08-09 18:31:16 +00:00
Ayush Chaurasia f606741dc4 Add artifacts logging , wandb args 2021-08-09 18:31:16 +00:00
Ayush Chaurasia f5e50ad502 WandbLogger 2021-08-09 18:27:06 +00:00
Eren Gölge 06018251e6 Add VITS and GlowTTS class docs 🗒️ 2021-08-09 18:02:36 +00:00
Eren Gölge 6a7275881d Add VitsConfig docstring 2021-08-09 18:02:36 +00:00
Eren Gölge f7a72552f1 Make duration predictor dropout configurable 2021-08-09 18:02:36 +00:00
Eren Gölge c312acac7d Implement VITS model 🚀
VITS model implementation built on Glow TTS and HiFiGAN
layers.
2021-08-09 18:02:36 +00:00
Eren Gölge 060e746e21 Add `do_amp_to_db` option 2021-08-09 18:02:36 +00:00
Eren Gölge e94c1f894d Simplify `console_logger` 2021-08-09 18:02:36 +00:00
Eren Gölge dd55960732 Update `synthesizer.py`
Fixes and changes for multi-speaker model init and custom symbols made
by `model.make_symbols()`
2021-08-09 18:02:36 +00:00
Eren Gölge 232a5abb6a Update `tts.setup_model`
Run `model.make_symbols()` if available to set the symbol list
2021-08-09 18:02:36 +00:00
Eren Gölge f5a6aa974f Modify `symbols.py` not to add _arpabet 2021-08-09 18:02:36 +00:00
Eren Gölge d4deb2716f Modify `get_optimizer` to accept a model argument 2021-08-09 18:02:36 +00:00
Eren Gölge 003e5579e8 Enable `custom_symbols` in text processing
Models can define their own custom symbols lists with custom
`make_symbols()`
2021-08-09 18:02:36 +00:00
Eren Gölge bd4e29b4dd Add `compute_linear_spec=False` to `BaseTTSConfig` 2021-08-09 18:02:36 +00:00
Eren Gölge 960a35a121 Add `scheduler_after_epoch` to `BaseTrainingConfig` 2021-08-09 18:02:36 +00:00
Eren Gölge e4648ffef1 Fix multi-speaker init of Tacotron models & tests 2021-08-09 18:02:36 +00:00
Eren Gölge 01324c8e70 Update `base_tts.py`
Enable calling `make_symbols()` from the model if defined.
Compatibility changes for end2end `tts` models in batch formatting.
Changes in multi-speaker initialization.
Modify `test_run()` to work with dict output of `synthesis`
2021-08-09 18:02:36 +00:00
Eren Gölge bf562cf437 Update `trainer.py`
Fix multi-speaker initialization of models. Add changes for end2end `tts`
models.
2021-08-09 18:02:36 +00:00
Agrin Hilmkil ced4cfdbbf Allow saving / loading checkpoints from cloud paths (#683)
* Allow saving / loading checkpoints from cloud paths

Allows saving and loading checkpoints directly from cloud paths like
Amazon S3 (s3://) and Google Cloud Storage (gs://) by using fsspec.

Note: The user will have to install the relevant dependency for each
protocol. Otherwise fsspec will fail and specify which dependency is
missing.

* Append suffix _fsspec to save/load function names

* Add a lower bound to the fsspec dependency

Skips the 0 major version.

* Add missing changes from refactor

* Use fsspec for remaining artifacts

* Add test case with path requiring fsspec

* Avoid writing logs to file unless output_path is local

* Document the possibility of using paths supported by fsspec

* Fix style and lint

* Add missing lint fixes

* Add type annotations to new functions

* Use Coqpit method for converting config to dict

* Fix type annotation in semi-new function

* Add return type for load_fsspec

* Fix bug where fs not always created

* Restore the experiment removal functionality
2021-08-09 18:02:36 +00:00
Eren Gölge d9e18e009b Skip phoneme cache pre-compute if the path exists 2021-08-09 18:02:36 +00:00
Eren Gölge 6c131d168e Bump the version to 0.1.3 2021-07-26 21:32:27 +02:00
Eren Gölge febd6105b5 Update default vocoder for de-thorsten 2021-07-26 16:08:52 +02:00
Eren Gölge 4b7b88dd3d Add fullband-melgan DE vocoder 2021-07-26 15:38:30 +02:00
Eren Gölge 764f684e1b Fix `server.py` for multi-speaker models 2021-07-26 15:38:30 +02:00
Eren Gölge 75b201c6c1 Merge pull request #673 from coqui-ai/fix_stopnet
Fix stopnet training for Tacotron models
2021-07-24 12:25:38 +02:00
Eren Gölge fc0c4600bd Fix stopnet training 2021-07-24 11:39:54 +02:00
Eren Gölge 30eed347b6 Merge pull request #581 from Edresson/dev
Compute speaker embeddings in batch for the LSTM Speaker Encoder and compute embeddings/find characters using config file.
2021-07-23 17:22:51 +02:00
Edresson Casanova d5adc35fdf Add docstring to compute_embeddings script 2021-07-21 07:16:10 -03:00
Eren Gölge 05c75aa9d5 Fix linter issues 2021-07-16 13:37:38 +02:00
Eren Gölge 58cc414477 Fix WaveGrad `test_run` 2021-07-16 13:02:25 +02:00
WeberJulian 25832eb97b Changes for review 2021-07-15 11:38:45 +02:00
Edresson b1620d1f3f remove ignore generate eval flag 2021-07-15 03:34:28 -03:00
WeberJulian c79a82ed07 refix linter 2021-07-13 23:12:18 +02:00
WeberJulian 7d92b30946 Fix tests 2021-07-13 23:00:34 +02:00
WeberJulian 32974dd6a9 Fix test sentences synthesis 2021-07-13 16:07:13 +02:00
Edresson d906fea08c lint fix and eval as argparse in extract tts spectrograms 2021-07-13 02:15:31 -03:00
Edresson 2e5baffa9c Merge fix and eval split as argparse 2021-07-13 01:47:32 -03:00
Eren Gölge 93a74cbb71 Merge pull request #628 from Aloento/patch-2
Change to _get_preprocessor_by_name
2021-07-11 22:17:50 +02:00
Edresson 4eac1c4651 bug fix on train_encoder and unit tests 2021-07-11 12:00:39 -03:00
Aloento 6e3e6d5756 Change to _get_preprocessor_by_name 2021-07-08 09:53:13 +02:00
Eren Gölge 8fbadad68e Bump up to v0.1.2 2021-07-06 14:44:59 +02:00
eren golge 3c0454490f Fix #616 2021-07-06 14:44:03 +02:00
Eren Gölge 0c347624e7 Bump up version to v0.1.1 2021-07-04 11:46:36 +02:00
Eren Gölge a05b234080 Raise an error when multiple GPUs are in use
User must define the target GPU by `CUDA_VISIBLE_DEVICES` and
use `distribute.py` for multi-gpu training.
2021-07-04 11:25:49 +02:00
Eren Gölge 270c3823eb Fix #608 2021-07-04 11:19:31 +02:00
Eren Gölge c25a2184e7 Add docs for `SpeakerManager` 2021-07-03 13:55:27 +02:00
Eren Gölge f382e4c700 Fix linter warnings 2021-07-03 13:30:24 +02:00
Eren Gölge 9e7824fe35 Fix UnivNet inference code 2021-07-02 10:48:34 +02:00
Eren Gölge 168f97cbe9 Let `Synthesizer` use the speaker manager out of the model 2021-07-02 10:47:55 +02:00
Eren Gölge 196876feb1 Fix `ModelManager` model download 2021-07-02 10:47:05 +02:00
Eren Gölge 9352cb4136 Format Align TTS docstrings 2021-07-02 10:45:58 +02:00
Eren Gölge 95ad72f38f Fix glow tts initialization 2021-07-02 10:45:37 +02:00
Eren Gölge 40b0b5365e Let `get_characters` return `num_chars` 2021-07-02 10:45:00 +02:00
Eren Gölge 0fa6a8c9b8 Fix glow tts default parameters 2021-07-02 10:44:23 +02:00
Eren Gölge a4c658f5ef Fix for using the `Synthesizer` out of the model 2021-07-02 10:43:38 +02:00
Eren Gölge db47f4f105 Update `.models.json` 2021-07-02 10:43:00 +02:00
Eren Gölge 2e1a428b83 Update glowtts docstrings and docs 2021-06-30 14:30:55 +02:00
Eren Gölge 5723eb4738 Fix config init in `process_args` 2021-06-29 16:41:08 +02:00
Eren Gölge 4b5421b42f Remove FAQ link from README.md 2021-06-29 13:20:40 +02:00
Eren Gölge 47b3b10d6d Bump up to v0.1.0 🚀 2021-06-29 13:07:59 +02:00
Eren Gölge 7ec5c31898 Merge branch 'univnet' into trainer-api 2021-06-29 10:27:12 +02:00
Eren Gölge 51398cd15b Add docstrings and typing for `audio.py` 2021-06-28 17:03:47 +02:00
Eren Gölge ae6405bb76 Docstrings for `Trainer` 2021-06-28 17:03:47 +02:00
Eren Gölge 6b265ae8e3 Docstring update 2021-06-28 17:03:47 +02:00
Eren Gölge ab563ce7cd Start training by config.json using `register_config` 2021-06-28 17:03:47 +02:00
Eren Gölge b3c073c99b Allow running full path scripts with `distribute.py` 2021-06-28 17:03:47 +02:00
Eren Gölge d42d1c02ea Use `torch.linalg.qr` for pytorch > `v1.9.0` 2021-06-28 17:03:47 +02:00
Eren Gölge fbba37e01e Fix loading the `amp` scaler from a checkpoint 🛠️ 2021-06-28 17:03:47 +02:00
Eren Gölge a7617d8ab6 Add 🐍 python 3.9 to CI 2021-06-28 17:03:47 +02:00
Eren Gölge 9790eddada Fix wrong argument name 🛠️ 2021-06-28 17:03:47 +02:00
Eren Gölge 932ab107ae Docstring edit in `TTSDataset.py` ✍️ 2021-06-28 17:03:47 +02:00
Eren Gölge cfa5041db7 Fix `eval_log` for `gan.py` 🛠️ 2021-06-28 17:03:47 +02:00
Eren Gölge d700845b10 Move `TorchSTFT` to `utils.audio` 2021-06-28 17:03:47 +02:00
Eren Gölge 5b89cb4fec Fixup `trainer.py` 🛠️ 2021-06-28 17:03:47 +02:00
Eren Gölge 8c74f054f0 Enable support for 🐍 python 3.10
Bump up versions numpy 1.19.5 and TF 2.5.0
2021-06-28 17:03:47 +02:00
Eren Gölge 9455a2b01e Apply small fixes for API compatibility 2021-06-28 17:03:47 +02:00
Eren Gölge a5d5bc9063 Print `max_decoder_steps` when model reaches the limit 2021-06-28 17:03:47 +02:00
Eren Gölge e30f245e06 Update `synthesizer` for speaker and model init 2021-06-28 17:03:47 +02:00
Eren Gölge 15fa31b595 fixup configs 2021-06-28 17:03:47 +02:00
Eren Gölge f23b228e24 Update `speaker_manager` 2021-06-28 17:03:47 +02:00
Eren Gölge e53616078a Fixup `utils` for the trainer 2021-06-28 17:03:47 +02:00
Eren Gölge 106b63d8a9 Update `vocoder` utils 2021-06-28 17:03:47 +02:00
Eren Gölge 45947acb60 Update `TTS.bin` scripts for the new API 2021-06-28 17:03:47 +02:00
Eren Gölge d7225eedb0 Update `vocoder` datasets and `setup_dataset` 2021-06-28 17:03:20 +02:00
Eren Gölge d18198dff8 Implement `setup_model` for vocoder models 2021-06-28 17:03:20 +02:00
Eren Gölge e949e7ad58 Update vocoder models 2021-06-28 17:03:19 +02:00
Eren Gölge 51005cdab4 Update `tts.models.setup_model` 2021-06-28 17:03:19 +02:00
Eren Gölge 7b8c15ac49 Create base 🐸TTS model abstraction for tts models 2021-06-28 17:03:19 +02:00
Eren Gölge a358f74a52 Update vocoder model configs 2021-06-28 17:03:19 +02:00
Eren Gölge 786170fe7d Update tts model configs 2021-06-28 17:03:19 +02:00
Eren Gölge 98298ee671 Implement unified IO utils 2021-06-28 17:03:19 +02:00
Eren Gölge c7aad884cd Implement unified trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 6d7b5fbcde `tts` model abstraction with `TTSModel` 2021-06-28 17:03:19 +02:00
Eren Gölge d4dbd89752 fix calculation of `loader_start_time` 2021-06-28 17:03:19 +02:00
Eren Gölge c754a0e17d `TrainerAbstract` and related updates for `TrainerTTS` 2021-06-28 17:03:19 +02:00
Eren Gölge 00c82c516d rename to 2021-06-28 17:03:19 +02:00
Eren Gölge 166f0aeb9a merge if branches with the same implementation 2021-06-28 17:03:19 +02:00
Eren Gölge 03494ad642 adjust `distribute.py` for the `train_tts.py` 2021-06-28 17:03:19 +02:00
Eren Gölge fdfb18d230 downsize melgan test model size 2021-06-28 17:03:19 +02:00
Eren Gölge 25238e0658 fix glow-tts `inference()` 2021-06-28 17:03:19 +02:00
Eren Gölge 419735f440 refactor and fix multi-speaker training in Trainer and Tacotron models 2021-06-28 17:03:19 +02:00
Eren Gölge 269e5a734e add max_decoder_steps argument to tacotron models 2021-06-28 17:03:19 +02:00
Eren Gölge b3324bd914 fix speaker_manager init 2021-06-28 17:03:19 +02:00
Eren Gölge 2c38ef8441 use get_speaker_manager in Trainer and save speakers.json file when
needed
2021-06-28 17:03:19 +02:00
Eren Gölge d6b2b6add6 make style and linter fixes 2021-06-28 17:03:19 +02:00
Eren Gölge 802d461389 Compute d_vectors and speaker_ids separately in TTSDataset 2021-06-28 17:03:19 +02:00
Eren Gölge db6a97d1a2 rename external speaker embedding arguments as `d_vectors` 2021-06-28 17:03:19 +02:00
Eren Gölge 9042ae9195 use `to_cuda()` for moving data in `format_batch()` 2021-06-28 17:03:19 +02:00
Eren Gölge f82f1970b8 change `to(device)` to `type_as` in models 2021-06-28 17:03:19 +02:00
Eren Gölge 9c94b0c5c0 init `durations = None` 2021-06-28 17:03:19 +02:00
Eren Gölge 1fa15c195a docstring fix 2021-06-28 17:03:19 +02:00
Eren Gölge 1c8a3d7c86 make style 2021-06-28 17:03:19 +02:00
Eren Gölge 8cdd423234 styling formatting.py 2021-06-28 17:03:19 +02:00
Eren Gölge 30211512a4 fix type annotations 2021-06-28 17:03:19 +02:00
Eren Gölge b22b7620c3 update glow-tts output shapes to match [B, T, C] 2021-06-28 17:03:19 +02:00
Eren Gölge 8381379938 formatting `cond_input` with a function in Tacotron models 2021-06-28 17:03:19 +02:00
Eren Gölge ef4ea9e527 update imports for `formatters` 2021-06-28 17:03:19 +02:00
Eren Gölge 6c495c6a6e fix glow-tts inference and forward functions for handling `cond_input`
and refactor its test
2021-06-28 17:03:19 +02:00
Eren Gölge f840268181 refactor `SpeakerManager` 2021-06-28 17:03:19 +02:00
Eren Gölge 421194880d linter fixes 2021-06-28 17:03:19 +02:00
Eren Gölge 8e52a69230 delete separate tts training scripts and pre-commit configuration 2021-06-28 17:03:19 +02:00
Eren Gölge d96ebcd6d3 make style 2021-06-28 17:03:19 +02:00
Eren Gölge b643e8b37c `logging/__init__.py` 2021-06-28 17:03:19 +02:00
Eren Gölge 0cee5042a9 fix logger imports 2021-06-28 17:03:19 +02:00
Eren Gölge 72dceca52c add missing imports 2021-06-28 17:03:19 +02:00
Eren Gölge 0eec238429 remove redundant imports 2021-06-28 17:03:19 +02:00
Eren Gölge b500338faa make style 2021-06-28 17:03:19 +02:00
Eren Gölge 469d2e620a update extract_tts_spectrogram for `cond_input` API of the models 2021-06-28 17:03:19 +02:00
Eren Gölge 5ab28fa618 update `extract_tts_spec...` using `SpeakerManager` 2021-06-28 17:03:19 +02:00
Eren Gölge c392fa4288 update `extract_tts_spectrograms` for the new model API 2021-06-28 17:03:19 +02:00
Eren Gölge 8f47f95998 correct import of `load_meta_data`
remove redundant import
2021-06-28 17:03:19 +02:00
Eren Gölge c680a07a20 fix `Synthesizer` for the new `synthesis()` 2021-06-28 17:03:19 +02:00
Eren Gölge 73bf9673ed revert logging.info to print statements for trainer 2021-06-28 17:03:19 +02:00
Eren Gölge d25f017b42 update `setup_model.py` imports 2021-06-28 17:03:19 +02:00
Eren Gölge bb355b7441 update align_tts.py model for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 9203b863d9 update align_tts_loss for trainer 2021-06-28 17:03:19 +02:00
Eren Gölge fc9a0fb8ce update align_tts_config for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge e298b8e364 update trainer.py for better logging handling, restoring models and
rename init_ functions with get_
2021-06-28 17:03:19 +02:00
Eren Gölge b8a4af4010 update `synthesis.py` for being more generic 2021-06-28 17:03:19 +02:00
Eren Gölge c70d0c9dae update `speedy_speech.py` model for trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 06ee57d816 update `speedy_speech_config.py` for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 4e910993f1 update tacotron model to return `model_outputs` 2021-06-28 17:03:19 +02:00
Eren Gölge bb4deee64c update glow-tts for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 9134c7dfb6 update `sequence_mask` import globally 2021-06-28 17:03:19 +02:00
Eren Gölge b2218e882a update `glow_tts_config.py` for setting the optimizer and the scheduler 2021-06-28 17:03:19 +02:00
Eren Gölge 891631ab47 typing annotation for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 5f07315722 add trainer and train_tts 2021-06-28 17:03:19 +02:00
Eren Gölge 34f8a74e4d remove `truncated` from synthesizer 2021-06-28 17:03:19 +02:00
Eren Gölge 178eccbc16 update console logger 2021-06-28 17:03:19 +02:00
Eren Gölge f4f83b6379 update `synthesis.py` for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge 130781dab6 remove `tts.generic_utils` as all the functions are moved to other files 2021-06-28 17:03:19 +02:00
Eren Gölge 535a458f40 update Tacotron models for the trainer 2021-06-28 17:03:19 +02:00
Eren Gölge bdbfc95618 add `gradual_training` argument to tacotron.py 2021-06-28 17:03:19 +02:00
Eren Gölge 5a2e75f0ee add missing imports for tacotron.py 2021-06-28 17:03:19 +02:00
Eren Gölge da7d10e53c move `setup_model()` to `models/__init__.py` 2021-06-28 17:03:19 +02:00
Eren Gölge ca302db7b0 add sequence_mask to `utils.data` 2021-06-28 17:03:19 +02:00
Eren Gölge 844abb3b1d `setup_loss()` in `layer/__init__.py` 2021-06-28 17:03:19 +02:00
Eren Gölge a20a1c7d06 rename preprocess.py -> formatters.py 2021-06-28 17:03:19 +02:00
Eren Gölge b9bccbb243 move load_meta_data and related functions to `datasets/__init__.py` 2021-06-28 17:03:19 +02:00
Eren Gölge d09385808a set test_sentences in config 2021-06-28 17:03:19 +02:00
Eren Gölge 8def3c87af trainer-API updates 2021-06-28 17:03:19 +02:00
Eren Gölge 42554cc711 rename MyDataset -> TTSDataset 2021-06-28 17:03:19 +02:00
Edresson 1c4e806f54 use speaker manager on compute embeddings script 2021-06-27 03:35:34 -03:00
Edresson Casanova eb84bb2bc8 Merge branch 'dev' into dev 2021-06-26 15:32:19 -03:00
Eren Gölge 987cf1178b Bump up to v0.0.16 2021-06-25 14:44:56 +02:00
Michael Hansen 3f172b84d8 Fix linting issues 2021-06-25 14:41:31 +02:00
Michael Hansen 4d8426fa0a Use eSpeak IPA lexicons by default for phoneme models 2021-06-25 14:41:05 +02:00
Michael Hansen 618b509204 Use combined characters available in TTS phonemes (like ç) 2021-06-25 14:41:05 +02:00
Michael Hansen da6f6a4a01 Update docstring for clean_gruut_phonemes 2021-06-25 14:41:05 +02:00
Michael Hansen 47191f3ecc Add tests for gruut phonemization 2021-06-25 14:41:05 +02:00
Michael Hansen 67869e77f9 Use gruut for phonemization 2021-06-25 14:41:05 +02:00
Eren Gölge 788992093d Add UnivNet vocoder 🚀 2021-06-23 13:51:04 +02:00
Eren Gölge 64fd59204c Use `torch.linalg.qr` for pytorch > `v1.9.0` 2021-06-23 13:49:42 +02:00
Eren Gölge aba840b4e6 Fix loading the `amp` scaler from a checkpoint 🛠️ 2021-06-23 13:49:42 +02:00
Eren Gölge 18e5393f16 Add 🐍 python 3.9 to CI 2021-06-23 13:49:36 +02:00
Eren Gölge 0ff2d2336a Fix wrong argument name 🛠️ 2021-06-22 16:21:11 +02:00
Eren Gölge 61c3cb871f Docstring edit in `TTSDataset.py` ✍️ 2021-06-22 16:21:11 +02:00
Eren Gölge 6f739ea07a Fix `eval_log` for `gan.py` 🛠️ 2021-06-22 16:21:11 +02:00
Eren Gölge ebb91c0fbb Move `TorchSTFT` to `utils.audio` 2021-06-22 16:21:11 +02:00
Eren Gölge 01c4b22a2f Fixup `trainer.py` 🛠️ 2021-06-22 16:21:11 +02:00
Eren Gölge 7de2756fc4 Enable support for 🐍 python 3.10
Bump up versions numpy 1.19.5 and TF 2.5.0
2021-06-22 16:21:11 +02:00
Eren Gölge 220e184f66 Apply small fixes for API compatibility 2021-06-22 16:21:11 +02:00
Eren Gölge 77d57dd301 Print `max_decoder_steps` when model reaches the limit 2021-06-22 16:21:11 +02:00
Eren Gölge 7dc2177df4 Update `synthesizer` for speaker and model init 2021-06-22 16:21:11 +02:00
Eren Gölge c3a0bc702e fixup configs 2021-06-22 16:21:11 +02:00
Eren Gölge 0e01c2594f Update `speaker_manager` 2021-06-22 16:21:11 +02:00
Eren Gölge 8182f5168f Fixup `utils` for the trainer 2021-06-22 16:21:11 +02:00
Eren Gölge b4bb567e04 Update `vocoder` utils 2021-06-22 16:21:11 +02:00
Eren Gölge f3ff5b1971 Update `TTS.bin` scripts for the new API 2021-06-22 16:21:11 +02:00
Eren Gölge aed919cf1c Update `vocoder` datasets and `setup_dataset` 2021-06-22 16:21:11 +02:00
Eren Gölge 59abf490a1 Implement `setup_model` for vocoder models 2021-06-22 16:21:11 +02:00
Eren Gölge 420820caf4 Update vocoder models 2021-06-22 16:21:11 +02:00
Eren Gölge d10f9c5676 Update `tts.models.setup_model` 2021-06-22 16:21:11 +02:00
Eren Gölge cae702980f Create base 🐸TTS model abstraction for tts models 2021-06-22 16:21:11 +02:00
Eren Gölge 70d968b169 Update vocoder model configs 2021-06-22 16:21:11 +02:00
Eren Gölge f8a3460818 Update tts model configs 2021-06-22 16:21:11 +02:00
Eren Gölge acd96a4940 Implement unified IO utils 2021-06-22 16:21:10 +02:00
Eren Gölge 6b907554f8 Implement unified trainer 2021-06-22 16:21:10 +02:00
Eren Gölge 20c4a8c8e1 `tts` model abstraction with `TTSModel` 2021-06-22 16:21:10 +02:00
Eren Gölge b934665fc0 fix calculation of `loader_start_time` 2021-06-22 16:21:10 +02:00
Eren Gölge 64f0f57757 `TrainerAbstract` and related updates for `TrainerTTS` 2021-06-22 16:21:10 +02:00
Eren Gölge f077a356e0 rename to 2021-06-22 16:21:10 +02:00
Eren Gölge 4575b70826 merge if branches with the same implementation 2021-06-22 16:21:10 +02:00
Eren Gölge 59be1b9af1 adjust `distribute.py` for the `train_tts.py` 2021-06-22 16:21:10 +02:00
Eren Gölge 614738cc85 downsize melgan test model size 2021-06-22 13:12:52 +02:00
Eren Gölge 4f29725eb6 fix glow-tts `inference()` 2021-06-22 13:12:52 +02:00
Eren Gölge a87c886497 refactor and fix multi-speaker training in Trainer and Tacotron models 2021-06-22 13:12:52 +02:00
Eren Gölge 0206bb847b add max_decoder_steps argument to tacotron models 2021-06-22 13:12:52 +02:00
Eren Gölge cbb52b3d83 fix speaker_manager init 2021-06-22 13:12:52 +02:00
Eren Gölge d2fd6a34a1 use get_speaker_manager in Trainer and save speakers.json file when
needed
2021-06-22 13:12:52 +02:00
Eren Gölge 147550c65f make style and linter fixes 2021-06-22 13:12:52 +02:00
Eren Gölge a605dd3d08 Compute d_vectors and speaker_ids separately in TTSDataset 2021-06-22 13:12:52 +02:00
Eren Gölge f00ef90ce6 rename external speaker embedding arguments as `d_vectors` 2021-06-22 13:12:52 +02:00
Eren Gölge e7b7268c43 use `to_cuda()` for moving data in `format_batch()` 2021-06-22 13:12:52 +02:00
Eren Gölge 26a3312f0d change `to(device)` to `type_as` in models 2021-06-22 13:12:52 +02:00
Eren Gölge c09622459e init `durations = None` 2021-06-22 13:12:52 +02:00
Eren Gölge 2e31659dd9 docstring fix 2021-06-22 13:12:52 +02:00
Eren Gölge 7a0750a4f5 make style 2021-06-22 13:12:52 +02:00
Eren Gölge 534401377d styling formatting.py 2021-06-22 13:12:52 +02:00
Eren Gölge e229f5c081 fix type annotations 2021-06-22 13:12:52 +02:00
Eren Gölge 506189bdee update glow-tts output shapes to match [B, T, C] 2021-06-22 13:12:52 +02:00
Eren Gölge f568833d28 formatting `cond_input` with a function in Tacotron models 2021-06-22 13:12:52 +02:00
Eren Gölge 254707c610 update imports for `formatters` 2021-06-22 13:12:52 +02:00
Eren Gölge 223502d827 fix glow-tts inference and forward functions for handling `cond_input`
and refactor its test
2021-06-22 13:12:52 +02:00
Eren Gölge d4b1acfa81 refactor `SpeakerManager` 2021-06-22 13:12:52 +02:00
Eren Gölge 26e7c0960c linter fixes 2021-06-22 13:12:52 +02:00
Eren Gölge 79f7c5da1e delete separate tts training scripts and pre-commit configuration 2021-06-22 13:12:52 +02:00
Eren Gölge ca787be193 make style 2021-06-22 13:12:52 +02:00
Eren Gölge d376647ca0 `logging/__init__.py` 2021-06-22 13:12:52 +02:00
Eren Gölge bb58a0588e fix logger imports 2021-06-22 13:12:52 +02:00
Eren Gölge 9bbc924377 add missing imports 2021-06-22 13:12:52 +02:00
Eren Gölge b4d4ce0d7e remove redundant imports 2021-06-22 13:12:52 +02:00
Eren Gölge aefa71155c make style 2021-06-22 13:12:52 +02:00
Eren Gölge 88d8a94a10 update extract_tts_spectrogram for `cond_input` API of the models 2021-06-22 13:12:52 +02:00
Eren Gölge 667bb708b6 update `extract_tts_spec...` using `SpeakerManager` 2021-06-22 13:12:52 +02:00
Eren Gölge 830306d2fd update `extract_tts_spectrograms` for the new model API 2021-06-22 13:12:52 +02:00
Eren Gölge c673eb8ef8 correct import of `load_meta_data`
remove redundant import
2021-06-22 13:12:52 +02:00
Eren Gölge f0a419546b fix `Synthesizer` for the new `synthesis()` 2021-06-22 13:12:52 +02:00
Eren Gölge c7ff175592 revert logging.info to print statements for trainer 2021-06-22 13:12:52 +02:00
Eren Gölge fd6afe5ae5 update `setup_model.py` imports 2021-06-22 13:12:52 +02:00
Eren Gölge c82d91051d update align_tts.py model for the trainer 2021-06-22 13:12:52 +02:00
Eren Gölge 4f66e816d1 update align_tts_loss for trainer 2021-06-22 13:12:52 +02:00
Eren Gölge 8213ad8b5f update align_tts_config for the trainer 2021-06-22 13:12:52 +02:00
Eren Gölge 8dfd4c91ff update trainer.py for better logging handling, restoring models and
rename init_ functions with get_
2021-06-22 13:12:52 +02:00
Eren Gölge fb9289d365 update `synthesis.py` for being more generic 2021-06-22 13:12:52 +02:00
Eren Gölge f121b0ff5d update `speedy_speech.py` model for trainer 2021-06-22 13:12:52 +02:00
Eren Gölge 843b3ba960 update `speedy_speech_config.py` for the trainer 2021-06-22 13:12:52 +02:00
Eren Gölge c9790bee2c update tacotron model to return `model_outputs` 2021-06-22 13:12:52 +02:00
Eren Gölge f09ec7e3a7 update glow-tts for the trainer 2021-06-22 13:12:52 +02:00
Eren Gölge 3346a6d9dc update `sequence_mask` import globally 2021-06-22 13:12:52 +02:00
Eren Gölge 9765b1aa6b update `glow_tts_config.py` for setting the optimizer and the scheduler 2021-06-22 13:12:52 +02:00
Eren Gölge 6bf6543df8 typing annotation for the trainer 2021-06-22 13:12:52 +02:00
Eren Gölge 57cdddef16 add trainer and train_tts 2021-06-22 13:12:52 +02:00
Eren Gölge d769af9e3b remove `truncated` from synthesizer 2021-06-22 13:12:52 +02:00
Eren Gölge 570633ab80 update console logger 2021-06-22 13:12:52 +02:00
Eren Gölge 2ac6b824ca update `synthesis.py` for the trainer 2021-06-22 13:12:52 +02:00
Eren Gölge c9e5527070 remove `tts.generic_utils` as all the functions are moved to other files 2021-06-22 13:12:52 +02:00
Eren Gölge 2ab723cd10 update Tacotron models for the trainer 2021-06-22 13:12:52 +02:00
Eren Gölge d6b6a15b5c add `gradual_training` argument to tacotron.py 2021-06-22 13:12:52 +02:00
Eren Gölge 118a7f2b43 add missing imports for tacotron.py 2021-06-22 13:12:52 +02:00
Eren Gölge c98149d488 move `setup_model()` to `models/__init__.py` 2021-06-22 13:12:52 +02:00
Eren Gölge 86edf6ab0e add sequence_mask to `utils.data` 2021-06-22 13:12:52 +02:00
Eren Gölge c61486b1e3 `setup_loss()` in `layer/__init__.py` 2021-06-22 13:12:52 +02:00
Eren Gölge f07209d2e0 rename preprocess.py -> formatters.py 2021-06-22 13:12:52 +02:00
Eren Gölge facb782851 move load_meta_data and related functions to `datasets/__init__.py` 2021-06-22 13:12:52 +02:00
Eren Gölge b9d4355d20 set test_sentences in config 2021-06-22 13:12:52 +02:00
Eren Gölge 7bdd0eb72f trainer-API updates 2021-06-22 13:12:52 +02:00
Eren Gölge 0f284841d1 rename MyDataset -> TTSDataset 2021-06-22 13:12:52 +02:00
Edresson 99d40e98d9 fix Lint checks 2021-06-18 14:59:01 -03:00
Edresson 28bec238ca fix Lint checks 2021-06-18 14:33:50 -03:00
Edresson 83644056e3 fix Lint checks 2021-06-18 14:32:28 -03:00
Edresson Casanova e78e3cd81e Merge branch 'dev' into dev 2021-06-18 14:10:03 -03:00
Edresson b74b510d3c Compute embeddings and find characters using config file 2021-06-18 14:04:49 -03:00
Adam Froghyar b0aa189348 Forcing do_trim_silence to False in the extract TTS script 2021-06-14 10:44:00 +02:00
Eren Gölge d245b5d48f bump up v0.0.15.1 2021-06-08 09:21:01 +02:00
Edresson 14b209c7e9 Create a batch for faster inference on LSTM Speaker Encoder 2021-06-05 03:12:17 -03:00
Eren Gölge b8b79a5e5a fix `use_cuda` bug in `server.py` 2021-06-04 14:02:53 +02:00
Eren Gölge 203ab855c3 bump up to v0.0.15 2021-06-04 13:52:54 +02:00
Eren Gölge ba9bcf7c6b auto upload to pypi on release 2021-06-04 12:20:06 +02:00
Eren Gölge e66753bd0d fixup! new japanese model placeholder in `.models.json` 2021-06-03 18:04:28 +02:00
Eren Gölge bd434636a9 new japanese model placeholder in `.models.json` 2021-06-02 15:54:37 +02:00
Eren Gölge 401fbd8978 bump up to v0.0.15 2021-06-02 11:48:17 +02:00
Eren Gölge 49c5e5d820 make style for the Japanese PR 2021-06-02 11:44:46 +02:00
Eren Gölge 73b4083c6c Merge pull request #502 from kaiidams/kaiidams/kokoro
Japanese Tacotron 2 model
2021-06-02 10:20:08 +02:00
Katsuya Iida 6d8310d2a9 Set the version to the same as the dev branch. 2021-06-02 07:48:28 +09:00
Alexander Korolev c1eb9bdcca fix speaker dim inference 2021-06-01 15:15:26 +02:00
Katsuya Iida 1cc18d1972 Move unittest of Japanese phonemizer. 2021-06-01 18:51:34 +09:00
Alexander Korolev 5b89ef2c6e
fix speaker-embeddings dimension during inference 2021-06-01 11:06:35 +02:00
Eren Gölge d0ab0382fc linter fixes 2021-06-01 09:15:32 +02:00
Eren Gölge bec85ac58d make style 2021-05-31 16:37:15 +02:00
Eren Gölge d9f1268f99 init tb_logger None for rank > 0 processes 2021-05-31 15:47:07 +02:00
Eren Gölge 301c516abd Merge branch 'dev' of https://github.com/coqui-ai/TTS into dev 2021-05-31 15:46:25 +02:00
Edresson 7448177b72 use SpeakerManager on compute embeddings script 2021-05-29 21:11:53 -03:00
Katsuya Iida c4a5a73f18 update Kokoro config 2021-05-29 19:17:27 +09:00
Katsuya Iida 3a9ac2de4a Merge remote-tracking branch 'coqui-ai/main' into kaiidams/kokoro 2021-05-29 09:39:23 +09:00
Katsuya Iida d0c9c1ca5c Move TTS/tts/utils/japanese 2021-05-29 09:21:47 +09:00
Edresson 099142d4dd bug fix 2021-05-27 21:50:56 -03:00
Edresson 208bb0f0ee add batched speaker encoder inference 2021-05-27 20:01:00 -03:00
Edresson 825734a3a9 remove unused embeddings export 2021-05-27 19:10:24 -03:00
Katsuya Iida c4987e9d4e Move import at the head of the file. 2021-05-28 00:22:57 +09:00
Eren Gölge 925c08cf95 replace unidecode with anyascii 2021-05-27 14:02:44 +02:00
Eren Gölge e08c58db3b bump up version to v0.14.1 2021-05-27 13:11:01 +02:00
Eren Gölge c6f22aaa67 fix #509 2021-05-27 13:09:15 +02:00
Edresson 1496f271dc update Compute embeddings script 2021-05-27 00:45:18 -03:00
Edresson bc5307caa0 add unit tests for SoftmaxAngleProtoLoss and ResnetSpeakerEncoder and bugfix 2021-05-26 20:35:58 -03:00
Edresson c90037c2e9 solve merge problems 2021-05-26 16:01:30 -03:00
Katsuya Iida f921a05bdb Fixed lint errors 2021-05-26 19:02:16 +09:00
Edresson Casanova f89cb6aec2
Merge branch 'dev' into dev 2021-05-25 17:30:25 -03:00
Edresson d570c2d790 pylint fix and data loader bug fix 2021-05-26 01:11:37 -03:00
Katsuya Iida 0536aa6d0f Japanese Tacotron 2 model 2021-05-22 17:12:19 +09:00
Eren Gölge 5482a0f62d type def for gradual_training 2021-05-19 14:03:26 +02:00
Eren Gölge df6a98d0c3 type def for gradual_training 2021-05-19 14:00:44 +02:00
Eren Gölge 16576d6408 bump version number 2021-05-19 12:35:10 +02:00
Eren Gölge 8a7c40736c set use_phonemes false 2021-05-19 01:27:26 +02:00
Eren Gölge ccfaa6b1d5 add `needs_phonemizer` field to models.json. If set true these models
are only compatible with v0.0.13 or below.
2021-05-18 17:57:28 +02:00
Eren Gölge a14fcf2a13 remove text_processing test 2021-05-18 17:57:28 +02:00
Eren Gölge d7fae3f515 remove all espeaker and phonemizer deps 2021-05-18 17:57:28 +02:00
Eren Gölge ced05e812a move chinese phonemizer 2021-05-18 17:57:28 +02:00
Eren Gölge 218af1d9a2 change `list` to `List` in config 2021-05-18 17:30:27 +02:00
Eren Gölge 4df31f7fbd unused_speakers argument for ignoring speaker ids in multi-speaker
training
2021-05-18 14:50:03 +02:00
Eren Gölge c2c7dff805 use relaxed coqpit parser 2021-05-18 14:49:47 +02:00
Edresson 856ea19758 bug fix in dataloader and update inference 2021-05-18 03:43:16 -03:00
Eren Gölge d1b469935d tacotron DDC LJSpeech recipe 2021-05-17 11:42:14 +02:00
Eren Gölge 34a42d379f update tacotron_config.py for checking `r` and the docstring 2021-05-17 11:35:30 +02:00
Eren Gölge 12722501bb styling 2021-05-15 23:48:31 +02:00
Eren Gölge 8b1014d188 add docstrings with default value fixes 2021-05-15 23:45:10 +02:00
Eren Gölge da49089a72 update melgan training test batch size 2021-05-12 10:12:11 +02:00
Edresson 3433c2f348 add compute embedding for the new speaker encoder 2021-05-12 03:06:46 -03:00
Eren Gölge 0213e1cbf4 update configs for tts models to match the field typed with the expected
values
2021-05-12 00:57:38 +02:00
Eren Gölge 715b0a65a0 update main.yml for python x64
fix test
2021-05-12 00:57:29 +02:00
Edresson 3fcc748b2e implement the Speaker Encoder H/ASP 2021-05-11 16:27:05 -03:00
Eren Gölge 843d1b3d98 linter fixes 2021-05-11 11:30:00 +02:00
Eren Gölge 19fb1d743d style update 2021-05-11 11:30:00 +02:00
Eren Gölge 6e980b49c4 fix synthesizer.py for Coqpit 2021-05-11 11:29:18 +02:00
Eren Gölge db14dcd95a remove old load_config 2021-05-11 11:29:18 +02:00
Eren Gölge a21ac883dd add get_cuda() 2021-05-11 11:29:18 +02:00
Eren Gölge 21dd4d7960 fix load_config imports for Coqpit 2021-05-11 11:29:18 +02:00
Eren Gölge c57f0b46bb reintro use_gst for backward compat 2021-05-11 11:29:18 +02:00
Eren Gölge 18e76a2309 fix speaker encoder model initialization 2021-05-11 11:29:18 +02:00
Eren Gölge 10de40bba1 make num_workers mandatory config field 2021-05-11 11:29:18 +02:00
Eren Gölge df1ddd3539 allow read_json_with_comments for backward compat 2021-05-11 11:29:18 +02:00
Eren Gölge 9f7599e3c3 fix train_encoder for coqpit 2021-05-11 11:29:18 +02:00
Eren Gölge f8e52965dd add speaker encoder coqpit 2021-05-11 11:29:18 +02:00
Eren Gölge ce2bba543e remove extra from utils and move funcs to io.py 2021-05-11 11:29:18 +02:00
Eren Gölge 812dbc2b06 rm config.json 2021-05-11 11:29:18 +02:00
Eren Gölge 3fde2001b1 train_encoder refactoring for coqpit 2021-05-11 11:29:18 +02:00
Eren Gölge 9ee70af9bb code styling 2021-05-11 11:29:18 +02:00
Eren Gölge 10db2baa06 global shared Coqpit configs 2021-05-11 11:29:18 +02:00
Eren Gölge 3dec62b183 add Coqpits for the vocoder models 2021-05-11 11:29:18 +02:00
Eren Gölge 6f4eed94f5 remove *.json vocoder configs 2021-05-11 11:29:18 +02:00
Eren Gölge 78b3825d0b update train scripts for coqpit 2021-05-11 11:29:18 +02:00
Eren Gölge 757e90b1cc load_config function to initialize the right Coqpit for the given model 2021-05-11 11:29:18 +02:00
Eren Gölge e6f45b9eb7 update train_vocoder_gan.py for coqpit 2021-05-11 11:29:18 +02:00
Eren Gölge bcebd69d09 remove bash tts training tests 2021-05-11 11:29:17 +02:00
Eren Gölge 7663bc63c1 add Coqpit configs for the TTS models 2021-05-11 11:29:17 +02:00
Eren Gölge 7227e8f1d2 update train_align_tts.py for coqpit 2021-05-11 11:29:17 +02:00
Eren Gölge 51a7e06945 glow_tts_config.py and train test on python 2021-05-11 11:29:17 +02:00
Eren Gölge 720fe13056 update glow_tts modules and training script for coqpit use 2021-05-11 11:29:17 +02:00
Eren Gölge 816e7ee698 remove default configs.json as replacing with Coqpit configs 2021-05-11 11:29:17 +02:00
Eren Gölge 35341d5482 move bash script based tests to python with coqpit 2021-05-11 11:29:17 +02:00
Eren Gölge 647163397d coqpit refactoring 2021-05-11 11:29:17 +02:00
Eren Gölge eaa130e813 fix tacotron for coqpit 2021-05-11 11:29:17 +02:00
Eren Gölge 65d7ad4250 refactor train_speedy_speech.py for coqpit 2021-05-11 11:29:17 +02:00
Eren Gölge 4a58fdfd59 comment out check-arguments before copying fields to the configs 2021-05-11 11:29:17 +02:00
Eren Gölge 05d9543ed8 init GST module using gst config in Tacotron models 2021-05-11 11:29:17 +02:00
Eren Gölge 93a00373f6 move split_dataset 2021-05-11 11:29:17 +02:00
Eren Gölge 9c18e40f64 black formatting 2021-05-11 11:29:17 +02:00
Eren Gölge c34c8137d7 update compute_statistics for coqpit 2021-05-11 11:29:17 +02:00
Eren Gölge 79d7215142 config refactor #5 WIP 2021-05-11 11:29:17 +02:00
Eren Gölge dc50f5f0b0 config refactor #4 WIP 2021-05-11 11:28:35 +02:00
Eren Gölge 97bd5f9734 [ci skip] config update #3 WIP 2021-05-11 11:28:35 +02:00
Eren Gölge a21c0b5585 config update 2 WIP 2021-05-11 11:28:35 +02:00
Eren Gölge e092ae40dc config update WIP 2021-05-11 11:28:35 +02:00
Eren Gölge 06f80a4806 update check argument 2021-05-11 11:28:35 +02:00
Eren Gölge bf7ddfa542
Merge pull request #481 from chmodsss/main
Accessing __version__ command
2021-05-11 10:20:48 +02:00
Edresson 85ccad7e0a add Audio data augmentation Additive and RIR 2021-05-11 00:59:57 -03:00
Edresson 77d85c6cc5 add softmaxproto loss and bug fix in data loader 2021-05-10 17:08:38 -03:00
chmodsss 607d5cf377 [#480] Adding version variable 2021-05-10 19:46:34 +02:00
Adam Froghyar 7ddc885f37 deleted a line that broke GravesAttention 2021-05-10 15:42:59 +02:00
Edresson 78bad25f2b update voxceleb download link 2021-05-07 23:45:15 -03:00
Eren Gölge f7582107da
Merge pull request #453 from Edresson/dev
Script for spectrogram extraction using teacher forcing and Glow-TTS inference with MAS.
2021-05-06 17:53:28 +02:00
Edresson 501c8e0302 remove unused vars on extract tts spectrograms script 2021-05-04 19:04:13 -03:00
Eren Gölge 0325c58862
Merge pull request #468 from shaun95/patch-1
Update losses.py
2021-05-03 14:45:24 +02:00
Eren Gölge 8cb27267a4 formatting 2021-05-03 14:26:35 +02:00
Eren Gölge 87d674a038 bumpup librosa version to 0.8.0 2021-05-03 14:25:09 +02:00
shaun 7d0ec62bf1
Update losses.py
The block of code for use_l1_spec_loss is repeated, which doubles the amount of L1 loss when enabled.
The weight for L1 loss in the hifigan_ljspeech configuration will likely need to be doubled to compensate (l1_spec_loss_weight).
2021-05-02 14:14:24 +02:00
Edresson 3ecd556bbe add unit test for extract tts spectrograms script 2021-05-01 13:41:56 -03:00
Edresson 446b1da936 create inference function 2021-04-29 18:18:37 -03:00
Eren Gölge f02f0338c2 fix .models.json and add testing to check released models availability 2021-04-29 09:32:36 +02:00
Eren Gölge fd95e9b8a4 [ci skip] Add sam models 2021-04-28 21:57:31 +02:00
Agrin Hilmkil 351d0ed6ae Remove unnecessary fsspec usage 2021-04-28 11:21:08 +02:00
Agrin Hilmkil 167f86417e Move dev, tf, notebook dependencies to extras 2021-04-28 11:20:06 +02:00
Eren Gölge 1235e54738 test for synthesize.py 2021-04-27 14:17:38 +02:00
Eren Gölge 4719414f2e remove imports 2021-04-27 11:25:17 +02:00
Eren Gölge add97cddc1 move function and remove import 2021-04-27 11:22:56 +02:00
Eren Gölge 734e6a515c bug fix 2021-04-27 10:27:45 +02:00
Eren Gölge 6bdd81667e place holders for sc-glow and hifigan models 2021-04-26 19:53:12 +02:00
Eren Gölge 2f0716073e enable multi-speaker CoquiTTS models for synthesize.py 2021-04-26 19:36:53 +02:00
Eren Gölge b531fa699c remove conflict noise 2021-04-26 15:27:52 +02:00
Eren Gölge f37b488876 Merge branch 'speaker-manager' of https://github.com/coqui-ai/TTS into speaker-manager 2021-04-26 15:25:25 +02:00
Eren Gölge b82daa5e86 style and linter fixes 2021-04-26 15:22:24 +02:00
Edresson 20e42a3381 add save audio option 2021-04-23 15:00:00 -03:00
Edresson 8228091f92 add script for extraction of tts spectrograms 2021-04-23 14:17:46 -03:00
Eren Gölge 4cf211348d styling and linting 2021-04-23 18:04:37 +02:00
Eren Gölge 7eb0c60d2e let synthesizer pass speaker encoder file paths to speaker manager 2021-04-23 18:04:37 +02:00
Eren Gölge f69195739e let speaker manager compute mean x_vector from multiple wav files 2021-04-23 18:04:37 +02:00
Eren Gölge 179722e3a7 new arguments to synthesize.py for loading speaker encoder and speaker wavs 2021-04-23 18:04:37 +02:00
Eren Gölge dfa415a8b8 small refactor in server.py 2021-04-23 18:04:37 +02:00
Eren Gölge c80d21f311 load speaker_encoder_ap and compute x_vector directly from the input file in speaker manager 2021-04-23 18:04:37 +02:00
Eren Gölge ad047c8195 html formatting, enable multi-speaker model on the server with a dropdown menu to select the speaker 2021-04-23 18:04:37 +02:00
Eren Gölge f9f3d04d14 remove moved function 2021-04-23 18:04:37 +02:00
Eren Gölge 10c988ac8c update server.py 2021-04-23 18:04:37 +02:00
Eren Gölge 6d0f5e0459 use SpeakerManager in Synthesizer 2021-04-23 18:04:37 +02:00
Eren Gölge e97126314c add `unique` argument to make_symbols to fix the incompat. issue of the
SC-Glow models
2021-04-23 18:04:37 +02:00
Eren Gölge d08888e603 formatting speakers.py 2021-04-23 18:04:37 +02:00
Eren Gölge df422223a3 initial SpeakerManager implementation 2021-04-23 18:04:37 +02:00
Eren Gölge 7a7aeb35f5 fix the glow-tts in setup_model 2021-04-23 18:04:37 +02:00
Eren Gölge d42748082a update argument name external_speaker_embedding_dim -> speaker_embedding_dim
add inference_noise_scale argument to glow-tts
2021-04-23 18:04:37 +02:00
Eren Gölge 2da81f5bb6 add load_checkpoint to speaker encoder 2021-04-23 18:04:37 +02:00
Eren Gölge 1229ccbf07 update argument name in server.py 2021-04-23 18:04:37 +02:00
Eren Gölge af2d36faeb update synthesize.py for multi-speaker setting 2021-04-23 18:04:37 +02:00
Eren Gölge 99dc07a7dd add `unique` param to keep scglow models compatible (there are duplicate symbols in the character set) 2021-04-23 18:04:37 +02:00
Eren Gölge c955a12428 set the default layer size compatible with scglow 2021-04-23 18:04:37 +02:00
Eren Gölge 3ace2440fa fix a mistake from rebase 2021-04-23 18:04:37 +02:00
Eren Gölge aadb2106ec code styling 2021-04-23 18:04:37 +02:00
Eren Gölge af7baa3387 refactoring to allow defining the speaker file externally 2021-04-23 18:04:37 +02:00
kirianguiller 7dccbfdcd5 handle multi speaker and gst in Synthesizer class 2021-04-23 18:04:37 +02:00
Edresson d2b6326b8b change optimizer initialization for compatibility with Hifi-GAN official implementation 2021-04-23 07:54:39 -03:00
WeberJulian 4205284f92
Change name of the functions 2021-04-23 10:09:55 +02:00
WeberJulian a26498181b Change back the default value 2021-04-22 16:10:17 +02:00
Julian Weber 355e1f47ab fix dumb mistake 2021-04-22 15:50:29 +02:00
Julian Weber c125b71f36 fix windows support 2021-04-22 15:14:24 +02:00
Jörg Thalheim f5fd7f78d4 server: also listen to ipv6
The [::] address will listen to both ipv4/ipv6 addresses.
2021-04-22 12:38:55 +02:00
Eren Gölge ef37633cb3 [ci skip] use prenet_dropout by default with Tacotron models 2021-04-22 12:38:55 +02:00
Eren Gölge e1d960da9e use SpeakerManager in Synthesizer 2021-04-21 13:13:27 +02:00
Eren Gölge 04b6881b66 add `unique` argument to make_symbols to fix the incompat. issue of the
SC-Glow models
2021-04-21 13:12:35 +02:00
Eren Gölge 790946faec formatting speakers.py 2021-04-21 13:12:11 +02:00
Eren Gölge ab313814de initial SpeakerManager implementation 2021-04-21 13:11:46 +02:00
Eren Gölge 09890c7421 fix the glow-tts in setup_model 2021-04-21 13:10:40 +02:00
Eren Gölge 8764d02eb2 update argument name external_speaker_embedding_dim -> speaker_embedding_dim
add inference_noise_scale argument to glow-tts
2021-04-21 13:09:44 +02:00