Commit Graph

373 Commits

Author SHA1 Message Date
Eren Gölge d309f50e53
Implement FreeVC (#2451)
* Update .gitignore

* Draft FreeVC implementation

* Tests and relevant updates

* Update API tests

* Add missings

* Update requirements

* :(

* Lazy handle for vc

* Update docs for voice conversion

* Make style
2023-03-25 18:33:23 +01:00
Khalid Bashir 14c80dd1fd
vits.py training fixed due to return_complex (#2418)
Torch set default value for `return_complex=True` for `torch.stft` method
This turned warning into error:-
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1591, in fit
    self._fit()
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1544, in _fit
    self.train_epoch()
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1309, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1162, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1023, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion, optimizer_idx=optimizer_idx)
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 970, in _model_train_step
    return model.train_step(*input_args)
  File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 1293, in train_step
    mel_slice_hat = wav_to_mel(
  File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 191, in wav_to_mel
    spec = torch.stft(
  File "/usr/local/lib/python3.10/dist-packages/torch/functional.py", line 641, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.
```
2023-03-19 00:22:04 +01:00
Roee Shenberg 3c15f0619a
Bug fixes in OverFlow audio generation (#2380) 2023-03-15 12:02:11 +01:00
Daniel Vera Nieto dfb48737fb Style fixed 2023-03-13 16:11:15 +01:00
Dani Vera 0d12229b64
Update vits.py
This should fix the issue https://github.com/coqui-ai/TTS/issues/1986 without breaking batch data sampling.
2023-03-10 18:35:16 +01:00
manmay nakhashi 624513018d
add energy by default to Fastspeech2 config (#2326)
* add energy by default

* added energy to base tts

* fix energy dataset

* fix styles

* fix test
2023-03-06 10:20:25 +01:00
thennal10 d39bc74f57
OverFlow with test sentences (#2253)
* Fix typo in function definiton

* Swap hasattr out

hasattr(self, "speaker_manager")  and hasattr(self, "language_manager") seems to be redundant since BaseTTS defines both.
2023-03-01 09:11:30 +01:00
Eren Gölge 914280a556
Bump up to v0.11.0 (#2329)
* Make style

* Bump up to v0.11.0
2023-02-08 13:58:49 +01:00
Shivam Mehta d83ee8fe45
Adding neural HMM TTS Model (#2272)
* Adding neural HMM TTS

* Adding tests

* Adding neural hmm on readme

* renaming training recipe

* Removing overflow\s decoder parameters from the config

* Update the Trainer requirement version for a compatible one (#2276)

* Bump up to v0.10.2

* Adding neural HMM TTS

* Adding tests

* Adding neural hmm on readme

* renaming training recipe

* Removing overflow\s decoder parameters from the config

* fixing documentation

Co-authored-by: Edresson Casanova <edresson1@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-01-23 11:53:04 +01:00
Eren G??lge 6e3f74fc29 Fix #2191 2023-01-15 23:11:57 +01:00
manmay nakhashi bc422f2f3c
Fastspeech2 (#2073)
* added EnergyDataset

* add energy to Dataset

* add comupte_energy

* added energy params

* added energy to forward_tts

* added plot_avg_energy for visualisation

* Update forward_tts.py

* create file

* added fastspeech2 recipe

* add fastspeech2 config

* removed energy from fast pitch

* add energy loss to forward tts

* Update fastspeech2_config.py

* change run_name

* Update numpy_transforms.py

* fix typo

* fix typo

* fix typo

* linting issues

* use_energy default value --> False

* Update numpy_transforms.py

* linting fixes

* fix typo

* liniting_fix

* liniting_fix

* fix

* fixes

* fixes

* lint fix

* lint fixws

* added training test

* wrong import

* wrong import

* trailing whitespace

* style fix

* changed class name because of error

* class name change

* class name change

* change class name

* fixed styles
2023-01-15 22:39:22 +01:00
Khalid Bashir 42afad5e79
Fixed bug related to yourtts speaker embeddings issue (#2234)
* Fixed bug related to yourtts speaker embeddings issue

* Reverted code for base_tts

* Bug fix on VITS d_vector_file type

* Ignore the test speakers on YourTTS recipe

* Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference

* Update YourTTS config file

* Update ModelManager._update_path to deal with list attributes

* Fix lint checks

* Remove unused code

* Fix unit tests

* Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files

* Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes

Co-authored-by: Edresson Casanova <edresson1@gmail.com>
2023-01-02 14:20:02 +01:00
Eren Gölge a9167cf239
Fixup overflow (#2218)
* Update overflow config

* Pulling shuffle and drop_last  from config

* Print training stats for overflow
2022-12-15 00:56:48 +01:00
Eren Gölge ecea43ec81
Adding pre-trained Overflow model (#2211)
* Adding pretrained Overflow model

* Stabilize HMM

* Fixup model manager

* Return `audio_unique_name` by default

* Distribute max split size over datasets

* Fixup eval_split_size

* Make style
2022-12-14 16:55:48 +01:00
Shivam Mehta 3b8b105b0d
Adding OverFlow (#2183)
* Adding encoder

* currently modifying hmm

* Adding hmm

* Adding overflow

* Adding overflow setting up flat start

* Removing runs

* adding normalization parameters

* Fixing models on same device

* Training overflow and plotting evaluations

* Adding inference

* At the end of epoch the test sentences are coming on cpu instead of gpu

* Adding figures from model during training to monitor

* reverting tacotron2 training recipe

* fixing inference on gpu for test sentences on config

* moving helpers and texts within overflows source code

* renaming to overflow

* moving loss to the model file

* Fixing the rename

* Model training but not plotting the test config sentences's audios

* Formatting logs

* Changing model name to camelcase

* Fixing test log

* Fixing plotting bug

* Adding some tests

* Adding more tests to overflow

* Adding all tests for overflow

* making changes to camel case in config

* Adding information about parameters and docstring

* removing compute_mel_statistics moved statistic computation to the model instead

* Added overflow in readme

* Adding more test cases, now it doesn't saves transition_p like tensor and can be dumped as json
2022-12-12 12:44:15 +01:00
Edresson Casanova ee20e30958 Fix VITS multi-speaker voice conversion inference 2022-12-05 09:15:01 -03:00
Eren Gölge 9321b22203
Fix scheduler order 2022-12-05 12:26:15 +01:00
Eren Gölge 8cb1433e6e
Cache fsspec downloads (#2132)
* Cache fsspec downloaded files

* Use diff paths for test

* Make fsspec caching optional

* Decom GPU docker tests

* Make progress bar optional for better CI log

* Check path local
2022-11-09 22:12:48 +01:00
Victor Shepardson 5307a2229b
Fix Capacitron training (#2086) 2022-11-01 12:52:06 +01:00
Edresson Casanova f3b947e706
Minors bug fixes on VITS/YourTTS and inference (#2054)
* Set the right device to the speaker encoder

* Bug fix on inference list_language_idxs parameter

* Bug fix on speaker encoder resample audio transform
2022-10-06 22:23:54 +02:00
Edresson Casanova 3faccbda97
Fix dataset handling with the new embedding file keys (#1991) 2022-09-19 23:44:14 +02:00
Eren Gölge 9e5a469c64
d-vector handling (#1945)
* Update BaseDatasetConfig

- Add dataset_name
- Chane name to formatter_name

* Update compute_embedding

- Allow entering dataset by args
- Use released model by default
- Use the new key format

* Update loading

* Update recipes

* Update other dep code

* Update tests

* Fixup

* Load multiple embedding files

* Fix argument names in dep code

* Update docs

* Fix argument name

* Fix linter
2022-09-13 14:10:33 +02:00
Eren Gölge fcb0bb58ae
Handle when no batch sampler (#1882) 2022-08-18 11:26:04 +02:00
Eren Gölge bfc63829ac
Implement bucketed weighted sampling for VITS (#1871) 2022-08-15 11:08:11 +02:00
manmay nakhashi 7fd9b89ebf
fix get_random_embeddings --> get_random_embedding (#1726)
* fix get_random_embeddings --> get_random_embedding

function typo leads to training crash, no such function

* fix typo

get_random_embedding
2022-08-07 14:06:03 +02:00
ivan provalov 903d9c791a
Fix for FloorDiv Function Warning (#1760)
* Fix for Floor Function Warning

Fix for Floor Function Warning

* Adding double quotes to fix formatting

Adding double quotes to fix formatting

* Update glow_tts.py

* Update glow_tts.py
2022-07-20 11:31:22 +02:00
Eren Gölge 49bac724c0
Implement VitsAudioConfig (#1556)
* Implement VitsAudioConfig

* Update VITS LJSpeech recipe

* Update VITS VCTK recipe

* Make style

* Add missing decorator

* Add missing param

* Make style

* Update recipes

* Fix test

* Bug fix

* Exclude tests folder

* Make linter

* Make style
2022-07-12 18:49:58 +02:00
WeberJulian c614f21982
Add durations as aux input for VITS (#1694)
* Add durations as aux input for VITS

* Make style

* Fix tts_tests

* Fix test_get_aux_input
2022-07-12 14:25:21 +02:00
Eren Gölge f70e82cd19
Use fsspec and torch for embedding file IO (#1581)
* Use fsspec and torch for embedding file

* Fixup

* Fix load and save files

* Fix compute embedding script

* Set use_cuda to true if available

* Add dummy speakers.pth file

* Make style

* Change default speakers file extension

Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
2022-06-01 13:49:42 +02:00
a-froghyar 8be21ec387
Capacitron (#977)
* new CI config

* initial Capacitron implementation

* delete old unused file

* fix empty formatting changes

* update losses and training script

* fix previous commit

* fix commit

* Add Capacitron test and first round of test fixes

* revert formatter change

* add changes to the synthesizer

* add stepwise gradual lr scheduler and changes to the recipe

* add inference script for dev use

* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting

* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak is now able to be utilised for phoneme generation
- necessary phonemizer for the Capacitron model

* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style

* chore: fix linting

* feat: add Tacotron 2 support

* leftover from dev

* chore:rename parser args

* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers

* chore: revert arbitrary trainer changes

* fmt: revert formatting bug

* formatting again

* formatting fixed

* fix: log func

* fix: update optimizer
- Implemented load_state_dict for continuing training

* fix: clean optimizer init for standard models

* improvement: purge espeak flags and add training scripts

* Delete capacitronT2.py

delete old training script, new one is pushed

* feat: capacitron trainer methods
- extracted capacitron specific training  operations from the trainer into custom
methods in taco1 and taco2 models

* chore: renaming and merging capacitron and gst style args

* fix: bug fixes from the previous commit

* fix: implement state_dict method on CapacitronOptimizer

* fix: call method

* fix: inference naming

* Delete train_capacitron.py

* fix: synthesize

* feat: update tests

* chore: fix style

* Delete capacitron_inference.py

* fix: fix train tts t2 capacitron tests

* fix: double forward in T2 train step

* fix: double forward in T1 train step

* fix: run make style

* fix: remove unused import

* fix: test for T1 capacitron

* fix: make lint

* feat: add blizzard2013 recipes

* make style

* fix: update recipes

* chore: make style

* Plot test sentences in Tacotron

* chore: make style and fix import

* fix: call forward first before problematic floordiv op

* fix: update recipes

* feat: add min_audio_len to recipes

* aux_input["style_mel"]

* chore: make style

* Make capacitron T2 recipe more stable

* Remove T1 capacitron Ljspeech

* feat: implement new grad clipping routine and update configs

* make style

* Add pretrained checkpoints

* Add default vocoder

* Change trainer package

* Fix grad clip issue for tacotron

* Fix scheduler issue with tacotron

Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-05-20 16:17:11 +02:00
Edresson Casanova ee99a6c1e2 Fix voice conversion inference (#1583)
* Add voice conversion zoo test

* Fix style

* Fix unit test
2022-05-20 15:50:25 +02:00
Edresson Casanova e5d8ec2402
Change the VITS upsampling interpolation trick to linear (#1564) 2022-05-13 10:52:39 +02:00
Edresson Casanova c6008e5235
Add audio length sampler balancer (#1561)
* Add audio length sampler balancer

* Add unit tests
2022-05-12 19:59:19 +02:00
Eren Gölge 6e460b7e42
Add an assert for the upsampling trick (#1538) 2022-05-12 19:55:24 +02:00
Eren Gölge e45ae57aef
Merge pull request #1550 from coqui-ai/fix-upsampling-asserts
Fix VITS upsampling asserts
2022-05-12 14:51:41 +02:00
Edresson Casanova 175ca06388 Add reinit text encoder and duration predictor parameter (#1562)
* Add reinit encoder and duration predictor option

* Add .data to prevent any overlooked autograd hook
2022-05-12 09:08:36 -03:00
Edresson Casanova 182711043c Fix the VITS upsampling asserts
Fix style
2022-05-12 09:08:29 -03:00
Eren Gölge c18bd21b3f Return durations at VITS inference 2022-05-11 11:30:05 +02:00
Eren Gölge 5021a03de0 Use torch.no_grad for VITS inference 2022-05-11 11:29:36 +02:00
Eren Gölge 3f03e3012c Fix batch_group_size in VITS 2022-05-07 13:44:44 +02:00
WeberJulian fbdf76b2fc returns y_mask in VITS inference (#1540)
* returns y_mask

* make style
2022-05-03 13:49:24 +02:00
Edresson Casanova 8d228ab22a
Trick to Upsampling to High sampling rates using VITS model (#1456)
* Add upsample VITS support

* Fix the bug in inference

* Fix lint checks

* Add RMS based norm in save_wav method

* Style fix

* Add the period for VITS multi-period discriminator in model_args

* Bug fix in speaker encoder load in inference time

* Add unit tests

* Remove useless detach_z_vocoder parameter

* Add docs for VITS upsampling

* Fix the docs

* Rename TTS_part_sample_rate to encoder_sample_rate

* Add upsampling_init and upsampling_z methods

* Add asserts for encoder_sample_rate part

* Move upsampling tests to test_vits.py
2022-04-26 11:47:46 +02:00
Edresson Casanova 060e0f9368
Add EmbeddingManager and BaseIDManager (#1374) 2022-03-31 13:41:16 +02:00
Edresson Casanova 37896e1743
Bug fix in freeze encoder (#1391)
* Fix the bug in freeze encoder

* Remove emb_l definition for non-multilingual training

* Fix unit tests
2022-03-24 18:16:04 +01:00
Eren Gölge 0870a4faa2
Make style (#1405) 2022-03-16 12:13:55 +01:00
Edresson Casanova dbe9da7f15
Add Voice conversion inference support (#1337)
* Add support for voice conversion inference

* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json

* Rebase bug fix

* Use the average d-vector for inference
2022-03-10 14:57:12 +01:00
Edresson Casanova 917f417ac4
Add alphas to control language and speaker balancer (#1216)
* Add alphas to control language and speaker balancer

* Add docs for speaker and language samplers

* Change the Samplers weights to float for save memory

* Change the test_samplers to unittest format

* Add get_sampler method in BaseTTS

* Fix rebase issues

* Add language and speaker samplers support for DDP training

* Rename distributed sampler wrapper

* Remove the DistributedSamplerWrapper and use the one from Trainer

* Bugfix after rebase

* Move the samplers config to tts config
2022-03-10 14:56:09 +01:00
Eren Gölge dd4287de1f Update models 2022-03-03 20:23:00 +01:00
Eren Gölge 1425a023fe Make style and lint 2022-03-02 13:25:35 +01:00
Eren Gölge c68885b3fd Update Vits speaker encoder init 2022-03-02 13:20:23 +01:00
Eren Gölge 27b67b7945 Fix import 2022-03-02 09:15:20 +01:00
Eren Gölge 942df0fb05 Update vits dataset 2022-03-02 09:14:32 +01:00
Eren Gölge 1e414b3a09 Make stlye 2022-02-25 11:31:56 +01:00
Eren Gölge acc83cd3e6 Update Vits model API 2022-02-25 11:31:56 +01:00
Eren Gölge fe656659be Implement BaseTTS 2022-02-25 11:31:56 +01:00
Eren Gölge 83c5ddc5b7 Update imports 2022-02-25 11:31:56 +01:00
Eren Gölge 14c117978d Fix return outputs 2022-02-25 11:31:56 +01:00
Eren Gölge 424d04e4f6 Make stlye 2022-02-25 11:31:56 +01:00
Eren Gölge c68962c574 Update forward tts binary loss 2022-02-25 11:30:24 +01:00
Eren Gölge 00c7600103 Update Vits model API 2022-02-25 11:30:24 +01:00
Eren Gölge 35fc7270ff Implement BaseTTS 2022-02-25 11:28:47 +01:00
Eren Gölge 1e219fef0a Revert drop_last 2022-02-25 11:26:59 +01:00
Eren Gölge 4b96bfe925 Fix train logging 2022-02-25 11:26:59 +01:00
Eren Gölge ab8a4ca2c3 Revert random segment 2022-02-25 11:26:59 +01:00
Eren Gölge 8622226f3f Make style 2022-02-25 11:26:59 +01:00
Eren Gölge 54c6bb2a8c Fix add speaker VITS 2022-02-25 11:26:59 +01:00
Eren Gölge 38314194e7 Set `drop_last` 2022-02-25 11:26:59 +01:00
Eren Gölge f70e4bb8c6 Add new speakers to the vits model 2022-02-25 11:26:59 +01:00
Eren Gölge 1f0c8179da Make style 2022-02-25 11:26:59 +01:00
Eren Gölge 34c4be5e49 Update forwardtts 2022-02-25 11:26:59 +01:00
Eren Gölge 2829027d8b Refactor VITS model 2022-02-25 11:26:59 +01:00
Eren Gölge ef63c99524 Implement `start_by_longest` option for TTSDatase 2022-02-25 11:26:18 +01:00
Eren Gölge 146fbfd7c9 Extend unittests 2022-02-25 11:25:00 +01:00
Eren Gölge 2fe16de8e3 Make lint 2022-02-25 11:25:00 +01:00
Eren Gölge 235f7d9b02 Extend glow_tts model tests 2022-02-25 11:24:13 +01:00
Eren Gölge 001da8afc8 Update Vits for the new model API 2022-02-25 11:21:19 +01:00
Eren Gölge 5176ae9e53 Fixes small compat. issues 2022-02-25 11:21:19 +01:00
Eren Gölge 4c5cb44eeb Update setup_model 2022-02-25 11:11:35 +01:00
Eren Gölge 7c4243fba7 Update GlowTTS 2022-02-25 11:11:35 +01:00
Eren Gölge bacf79f4fb Update AlignTTS 2022-02-25 11:11:35 +01:00
Eren Gölge 18f726af65 Update ForwardTTS 2022-02-25 11:11:35 +01:00
Eren Gölge d0ec4b91e5 Update Tacotron models 2022-02-25 11:11:35 +01:00
Eren Gölge ea965a5683 Update VITS for the new API 2022-02-25 11:11:35 +01:00
Eren Gölge 93957d58a1 Refactorin VITS for the tokenizer API 2022-02-25 11:05:06 +01:00
Eren Gölge 452dbc43d8 Update imports for symbols -> characters 2022-02-25 11:05:06 +01:00
Eren Gölge 8071fa0020 Refactor GlowTTS model and recipe for TTSTokenizer 2022-02-25 11:05:06 +01:00
Eren Gölge 7575367b9f Refactorin VITS for the tokenizer API 2022-02-25 10:57:35 +01:00
Eren Gölge 4cd690e4c1 Updates BaseTTS and configs 2022-02-25 10:57:35 +01:00
Eren Gölge 4597d4e5b6 Remove get_characters from BaseTTS 2022-02-25 10:48:03 +01:00
Eren Gölge 2d8ce98d2a Update imports for symbols -> characters 2022-02-25 10:48:03 +01:00
Eren Gölge 9a95e15483 Refactor GlowTTS model and recipe for TTSTokenizer 2022-02-25 10:48:03 +01:00
Eren Gölge 04202da1ac Make style 2022-02-25 10:48:03 +01:00
Eren Gölge bb389479a4 Update setup_model for TTS.tts models 2022-02-25 10:48:03 +01:00
Eren Gölge d2525abe8c Remove get_characters from BaseTTS 2022-02-25 10:48:03 +01:00
Eren Gölge 73d27ebd45 Fix GlowTTS 2022-02-25 10:48:03 +01:00
Eren Gölge fbad17e084 Update imports for symbols -> characters 2022-02-25 10:48:02 +01:00
Eren Gölge bd461ace33 Refactor GlowTTS model and recipe for TTSTokenizer 2022-02-25 10:45:24 +01:00
Edresson Casanova ba6e56e01c Fix Glow-TTS multi-speaker inference 2022-02-18 19:25:29 +00:00
Eren Gölge 127118c637
Update TTS.tts formatters (#1228)
* Return Dict from tts formatters

* Make style
2022-02-11 23:03:43 +01:00
WeberJulian e778bad626 Add argument to enable dp speaker conditioning 2022-01-06 15:07:27 +01:00