coqui-tts

Commit Graph

Author	SHA1	Message	Date
Eren Gölge	d309f50e53	Implement FreeVC (#2451 ) * Update .gitignore * Draft FreeVC implementation * Tests and relevant updates * Update API tests * Add missings * Update requirements * :( * Lazy handle for vc * Update docs for voice conversion * Make style	2023-03-25 18:33:23 +01:00
Khalid Bashir	14c80dd1fd	vits.py training fixed due to return_complex (#2418 ) Torch set default value for `return_complex=True` for `torch.stft` method This turned warning into error:- ``` Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1591, in fit self._fit() File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1544, in _fit self.train_epoch() File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1309, in train_epoch _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time) File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1162, in train_step outputs, loss_dict_new, step_time = self._optimize( File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1023, in _optimize outputs, loss_dict = self._model_train_step(batch, model, criterion, optimizer_idx=optimizer_idx) File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 970, in _model_train_step return model.train_step(*input_args) File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 1293, in train_step mel_slice_hat = wav_to_mel( File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 191, in wav_to_mel spec = torch.stft( File "/usr/local/lib/python3.10/dist-packages/torch/functional.py", line 641, in stft return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined] RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. ```	2023-03-19 00:22:04 +01:00
Roee Shenberg	3c15f0619a	Bug fixes in OverFlow audio generation (#2380 )	2023-03-15 12:02:11 +01:00
Daniel Vera Nieto	dfb48737fb	Style fixed	2023-03-13 16:11:15 +01:00
Dani Vera	0d12229b64	Update vits.py This should fix the issue https://github.com/coqui-ai/TTS/issues/1986 without breaking batch data sampling.	2023-03-10 18:35:16 +01:00
manmay nakhashi	624513018d	add energy by default to Fastspeech2 config (#2326 ) * add energy by default * added energy to base tts * fix energy dataset * fix styles * fix test	2023-03-06 10:20:25 +01:00
thennal10	d39bc74f57	OverFlow with test sentences (#2253 ) * Fix typo in function definiton * Swap hasattr out hasattr(self, "speaker_manager") and hasattr(self, "language_manager") seems to be redundant since BaseTTS defines both.	2023-03-01 09:11:30 +01:00
Eren Gölge	914280a556	Bump up to v0.11.0 (#2329 ) * Make style * Bump up to v0.11.0	2023-02-08 13:58:49 +01:00
Shivam Mehta	d83ee8fe45	Adding neural HMM TTS Model (#2272 ) * Adding neural HMM TTS * Adding tests * Adding neural hmm on readme * renaming training recipe * Removing overflow\s decoder parameters from the config * Update the Trainer requirement version for a compatible one (#2276) * Bump up to v0.10.2 * Adding neural HMM TTS * Adding tests * Adding neural hmm on readme * renaming training recipe * Removing overflow\s decoder parameters from the config * fixing documentation Co-authored-by: Edresson Casanova <edresson1@gmail.com> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-01-23 11:53:04 +01:00
Eren Gölge	6e3f74fc29	Fix #2191	2023-01-15 23:11:57 +01:00
manmay nakhashi	bc422f2f3c	Fastspeech2 (#2073 ) * added EnergyDataset * add energy to Dataset * add comupte_energy * added energy params * added energy to forward_tts * added plot_avg_energy for visualisation * Update forward_tts.py * create file * added fastspeech2 recipe * add fastspeech2 config * removed energy from fast pitch * add energy loss to forward tts * Update fastspeech2_config.py * change run_name * Update numpy_transforms.py * fix typo * fix typo * fix typo * linting issues * use_energy default value --> False * Update numpy_transforms.py * linting fixes * fix typo * liniting_fix * liniting_fix * fix * fixes * fixes * lint fix * lint fixws * added training test * wrong import * wrong import * trailing whitespace * style fix * changed class name because of error * class name change * class name change * change class name * fixed styles	2023-01-15 22:39:22 +01:00
Khalid Bashir	42afad5e79	Fixed bug related to yourtts speaker embeddings issue (#2234 ) * Fixed bug related to yourtts speaker embeddings issue * Reverted code for base_tts * Bug fix on VITS d_vector_file type * Ignore the test speakers on YourTTS recipe * Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference * Update YourTTS config file * Update ModelManager._update_path to deal with list attributes * Fix lint checks * Remove unused code * Fix unit tests * Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files * Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes Co-authored-by: Edresson Casanova <edresson1@gmail.com>	2023-01-02 14:20:02 +01:00
Eren Gölge	a9167cf239	Fixup overflow (#2218 ) * Update overflow config * Pulling shuffle and drop_last from config * Print training stats for overflow	2022-12-15 00:56:48 +01:00
Eren Gölge	ecea43ec81	Adding pre-trained Overflow model (#2211 ) * Adding pretrained Overflow model * Stabilize HMM * Fixup model manager * Return `audio_unique_name` by default * Distribute max split size over datasets * Fixup eval_split_size * Make style	2022-12-14 16:55:48 +01:00
Shivam Mehta	3b8b105b0d	Adding OverFlow (#2183 ) * Adding encoder * currently modifying hmm * Adding hmm * Adding overflow * Adding overflow setting up flat start * Removing runs * adding normalization parameters * Fixing models on same device * Training overflow and plotting evaluations * Adding inference * At the end of epoch the test sentences are coming on cpu instead of gpu * Adding figures from model during training to monitor * reverting tacotron2 training recipe * fixing inference on gpu for test sentences on config * moving helpers and texts within overflows source code * renaming to overflow * moving loss to the model file * Fixing the rename * Model training but not plotting the test config sentences's audios * Formatting logs * Changing model name to camelcase * Fixing test log * Fixing plotting bug * Adding some tests * Adding more tests to overflow * Adding all tests for overflow * making changes to camel case in config * Adding information about parameters and docstring * removing compute_mel_statistics moved statistic computation to the model instead * Added overflow in readme * Adding more test cases, now it doesn't saves transition_p like tensor and can be dumped as json	2022-12-12 12:44:15 +01:00
Edresson Casanova	ee20e30958	Fix VITS multi-speaker voice conversion inference	2022-12-05 09:15:01 -03:00
Eren Gölge	9321b22203	Fix scheduler order	2022-12-05 12:26:15 +01:00
Eren Gölge	8cb1433e6e	Cache fsspec downloads (#2132 ) * Cache fsspec downloaded files * Use diff paths for test * Make fsspec caching optional * Decom GPU docker tests * Make progress bar optional for better CI log * Check path local	2022-11-09 22:12:48 +01:00
Victor Shepardson	5307a2229b	Fix Capacitron training (#2086 )	2022-11-01 12:52:06 +01:00
Edresson Casanova	f3b947e706	Minors bug fixes on VITS/YourTTS and inference (#2054 ) * Set the right device to the speaker encoder * Bug fix on inference list_language_idxs parameter * Bug fix on speaker encoder resample audio transform	2022-10-06 22:23:54 +02:00
Edresson Casanova	3faccbda97	Fix dataset handling with the new embedding file keys (#1991 )	2022-09-19 23:44:14 +02:00
Eren Gölge	9e5a469c64	d-vector handling (#1945 ) * Update BaseDatasetConfig - Add dataset_name - Chane name to formatter_name * Update compute_embedding - Allow entering dataset by args - Use released model by default - Use the new key format * Update loading * Update recipes * Update other dep code * Update tests * Fixup * Load multiple embedding files * Fix argument names in dep code * Update docs * Fix argument name * Fix linter	2022-09-13 14:10:33 +02:00
Eren Gölge	fcb0bb58ae	Handle when no batch sampler (#1882 )	2022-08-18 11:26:04 +02:00
Eren Gölge	bfc63829ac	Implement bucketed weighted sampling for VITS (#1871 )	2022-08-15 11:08:11 +02:00
manmay nakhashi	7fd9b89ebf	fix get_random_embeddings --> get_random_embedding (#1726 ) * fix get_random_embeddings --> get_random_embedding function typo leads to training crash, no such function * fix typo get_random_embedding	2022-08-07 14:06:03 +02:00
ivan provalov	903d9c791a	Fix for FloorDiv Function Warning (#1760 ) * Fix for Floor Function Warning Fix for Floor Function Warning * Adding double quotes to fix formatting Adding double quotes to fix formatting * Update glow_tts.py * Update glow_tts.py	2022-07-20 11:31:22 +02:00
Eren Gölge	49bac724c0	Implement VitsAudioConfig (#1556 ) * Implement VitsAudioConfig * Update VITS LJSpeech recipe * Update VITS VCTK recipe * Make style * Add missing decorator * Add missing param * Make style * Update recipes * Fix test * Bug fix * Exclude tests folder * Make linter * Make style	2022-07-12 18:49:58 +02:00
WeberJulian	c614f21982	Add durations as aux input for VITS (#1694 ) * Add durations as aux input for VITS * Make style * Fix tts_tests * Fix test_get_aux_input	2022-07-12 14:25:21 +02:00
Eren Gölge	f70e82cd19	Use fsspec and torch for embedding file IO (#1581 ) * Use fsspec and torch for embedding file * Fixup * Fix load and save files * Fix compute embedding script * Set use_cuda to true if available * Add dummy speakers.pth file * Make style * Change default speakers file extension Co-authored-by: WeberJulian <julian.weber@hotmail.fr>	2022-06-01 13:49:42 +02:00
a-froghyar	8be21ec387	Capacitron (#977 ) * new CI config * initial Capacitron implementation * delete old unused file * fix empty formatting changes * update losses and training script * fix previous commit * fix commit * Add Capacitron test and first round of test fixes * revert formatter change * add changes to the synthesizer * add stepwise gradual lr scheduler and changes to the recipe * add inference script for dev use * feat: add posterior inference arguments to synth methods - added reference wav and text args for posterior inference - some formatting * fix: add espeak flag to base_tts and dataset APIs - use_espeak_phonemes flag was not implemented in those APIs - espeak is now able to be utilised for phoneme generation - necessary phonemizer for the Capacitron model * chore: update training script and style - training script includes the espeak flag and other hyperparams - made style * chore: fix linting * feat: add Tacotron 2 support * leftover from dev * chore:rename parser args * feat: extract optimizers - created a separate optimizer class to merge the two optimizers * chore: revert arbitrary trainer changes * fmt: revert formatting bug * formatting again * formatting fixed * fix: log func * fix: update optimizer - Implemented load_state_dict for continuing training * fix: clean optimizer init for standard models * improvement: purge espeak flags and add training scripts * Delete capacitronT2.py delete old training script, new one is pushed * feat: capacitron trainer methods - extracted capacitron specific training operations from the trainer into custom methods in taco1 and taco2 models * chore: renaming and merging capacitron and gst style args * fix: bug fixes from the previous commit * fix: implement state_dict method on CapacitronOptimizer * fix: call method * fix: inference naming * Delete train_capacitron.py * fix: synthesize * feat: update tests * chore: fix style * Delete capacitron_inference.py * fix: fix train tts t2 capacitron tests * fix: 
double forward in T2 train step * fix: double forward in T1 train step * fix: run make style * fix: remove unused import * fix: test for T1 capacitron * fix: make lint * feat: add blizzard2013 recipes * make style * fix: update recipes * chore: make style * Plot test sentences in Tacotron * chore: make style and fix import * fix: call forward first before problematic floordiv op * fix: update recipes * feat: add min_audio_len to recipes * aux_input["style_mel"] * chore: make style * Make capacitron T2 recipe more stable * Remove T1 capacitron Ljspeech * feat: implement new grad clipping routine and update configs * make style * Add pretrained checkpoints * Add default vocoder * Change trainer package * Fix grad clip issue for tacotron * Fix scheduler issue with tacotron Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: WeberJulian <julian.weber@hotmail.fr> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-05-20 16:17:11 +02:00
Edresson Casanova	ee99a6c1e2	Fix voice conversion inference (#1583 ) * Add voice conversion zoo test * Fix style * Fix unit test	2022-05-20 15:50:25 +02:00
Edresson Casanova	e5d8ec2402	Change the VITS upsampling interpolation trick to linear (#1564 )	2022-05-13 10:52:39 +02:00
Edresson Casanova	c6008e5235	Add audio length sampler balancer (#1561 ) * Add audio length sampler balancer * Add unit tests	2022-05-12 19:59:19 +02:00
Eren Gölge	6e460b7e42	Add an assert for the upsampling trick (#1538 )	2022-05-12 19:55:24 +02:00
Eren Gölge	e45ae57aef	Merge pull request #1550 from coqui-ai/fix-upsampling-asserts Fix VITS upsampling asserts	2022-05-12 14:51:41 +02:00
Edresson Casanova	175ca06388	Add reinit text encoder and duration predictor parameter (#1562 ) * Add reinit encoder and duration predictor option * Add .data to prevent any overlooked autograd hook	2022-05-12 09:08:36 -03:00
Edresson Casanova	182711043c	Fix the VITS upsampling asserts Fix style	2022-05-12 09:08:29 -03:00
Eren Gölge	c18bd21b3f	Return durations at VITS inference	2022-05-11 11:30:05 +02:00
Eren Gölge	5021a03de0	Use torch.no_grad for VITS inference	2022-05-11 11:29:36 +02:00
Eren Gölge	3f03e3012c	Fix batch_group_size in VITS	2022-05-07 13:44:44 +02:00
WeberJulian	fbdf76b2fc	returns y_mask in VITS inference (#1540 ) * returns y_mask * make style	2022-05-03 13:49:24 +02:00
Edresson Casanova	8d228ab22a	Trick to Upsampling to High sampling rates using VITS model (#1456 ) * Add upsample VITS support * Fix the bug in inference * Fix lint checks * Add RMS based norm in save_wav method * Style fix * Add the period for VITS multi-period discriminator in model_args * Bug fix in speaker encoder load in inference time * Add unit tests * Remove useless detach_z_vocoder parameter * Add docs for VITS upsampling * Fix the docs * Rename TTS_part_sample_rate to encoder_sample_rate * Add upsampling_init and upsampling_z methods * Add asserts for encoder_sample_rate part * Move upsampling tests to test_vits.py	2022-04-26 11:47:46 +02:00
Edresson Casanova	060e0f9368	Add EmbeddingManager and BaseIDManager (#1374 )	2022-03-31 13:41:16 +02:00
Edresson Casanova	37896e1743	Bug fix in freeze encoder (#1391 ) * Fix the bug in freeze encoder * Remove emb_l definition for non-multilingual training * Fix unit tests	2022-03-24 18:16:04 +01:00
Eren Gölge	0870a4faa2	Make style (#1405 )	2022-03-16 12:13:55 +01:00
Edresson Casanova	dbe9da7f15	Add Voice conversion inference support (#1337 ) * Add support for voice conversion inference * Cache d_vectors_by_speaker for fast inference using a bigger speakers.json * Rebase bug fix * Use the average d-vector for inference	2022-03-10 14:57:12 +01:00
Edresson Casanova	917f417ac4	Add alphas to control language and speaker balancer (#1216 ) * Add alphas to control language and speaker balancer * Add docs for speaker and language samplers * Change the Samplers weights to float for save memory * Change the test_samplers to unittest format * Add get_sampler method in BaseTTS * Fix rebase issues * Add language and speaker samplers support for DDP training * Rename distributed sampler wrapper * Remove the DistributedSamplerWrapper and use the one from Trainer * Bugfix after rebase * Move the samplers config to tts config	2022-03-10 14:56:09 +01:00
Eren Gölge	dd4287de1f	Update models	2022-03-03 20:23:00 +01:00
Eren Gölge	1425a023fe	Make style and lint	2022-03-02 13:25:35 +01:00
Eren Gölge	c68885b3fd	Update Vits speaker encoder init	2022-03-02 13:20:23 +01:00
Eren Gölge	27b67b7945	Fix import	2022-03-02 09:15:20 +01:00
Eren Gölge	942df0fb05	Update vits dataset	2022-03-02 09:14:32 +01:00
Eren Gölge	1e414b3a09	Make stlye	2022-02-25 11:31:56 +01:00
Eren Gölge	acc83cd3e6	Update Vits model API	2022-02-25 11:31:56 +01:00
Eren Gölge	fe656659be	Implement BaseTTS	2022-02-25 11:31:56 +01:00
Eren Gölge	83c5ddc5b7	Update imports	2022-02-25 11:31:56 +01:00
Eren Gölge	14c117978d	Fix return outputs	2022-02-25 11:31:56 +01:00
Eren Gölge	424d04e4f6	Make stlye	2022-02-25 11:31:56 +01:00
Eren Gölge	c68962c574	Update forward tts binary loss	2022-02-25 11:30:24 +01:00
Eren Gölge	00c7600103	Update Vits model API	2022-02-25 11:30:24 +01:00
Eren Gölge	35fc7270ff	Implement BaseTTS	2022-02-25 11:28:47 +01:00
Eren Gölge	1e219fef0a	Revert drop_last	2022-02-25 11:26:59 +01:00
Eren Gölge	4b96bfe925	Fix train logging	2022-02-25 11:26:59 +01:00
Eren Gölge	ab8a4ca2c3	Revert random segment	2022-02-25 11:26:59 +01:00
Eren Gölge	8622226f3f	Make style	2022-02-25 11:26:59 +01:00
Eren Gölge	54c6bb2a8c	Fix add speaker VITS	2022-02-25 11:26:59 +01:00
Eren Gölge	38314194e7	Set `drop_last`	2022-02-25 11:26:59 +01:00
Eren Gölge	f70e4bb8c6	Add new speakers to the vits model	2022-02-25 11:26:59 +01:00
Eren Gölge	1f0c8179da	Make style	2022-02-25 11:26:59 +01:00
Eren Gölge	34c4be5e49	Update forwardtts	2022-02-25 11:26:59 +01:00
Eren Gölge	2829027d8b	Refactor VITS model	2022-02-25 11:26:59 +01:00
Eren Gölge	ef63c99524	Implement `start_by_longest` option for TTSDatase	2022-02-25 11:26:18 +01:00
Eren Gölge	146fbfd7c9	Extend unittests	2022-02-25 11:25:00 +01:00
Eren Gölge	2fe16de8e3	Make lint	2022-02-25 11:25:00 +01:00
Eren Gölge	235f7d9b02	Extend glow_tts model tests	2022-02-25 11:24:13 +01:00
Eren Gölge	001da8afc8	Update Vits for the new model API	2022-02-25 11:21:19 +01:00
Eren Gölge	5176ae9e53	Fixes small compat. issues	2022-02-25 11:21:19 +01:00
Eren Gölge	4c5cb44eeb	Update setup_model	2022-02-25 11:11:35 +01:00
Eren Gölge	7c4243fba7	Update GlowTTS	2022-02-25 11:11:35 +01:00
Eren Gölge	bacf79f4fb	Update AlignTTS	2022-02-25 11:11:35 +01:00
Eren Gölge	18f726af65	Update ForwardTTS	2022-02-25 11:11:35 +01:00
Eren Gölge	d0ec4b91e5	Update Tacotron models	2022-02-25 11:11:35 +01:00
Eren Gölge	ea965a5683	Update VITS for the new API	2022-02-25 11:11:35 +01:00
Eren Gölge	93957d58a1	Refactorin VITS for the tokenizer API	2022-02-25 11:05:06 +01:00
Eren Gölge	452dbc43d8	Update imports for symbols -> characters	2022-02-25 11:05:06 +01:00
Eren Gölge	8071fa0020	Refactor GlowTTS model and recipe for TTSTokenizer	2022-02-25 11:05:06 +01:00
Eren Gölge	7575367b9f	Refactorin VITS for the tokenizer API	2022-02-25 10:57:35 +01:00
Eren Gölge	4cd690e4c1	Updates BaseTTS and configs	2022-02-25 10:57:35 +01:00
Eren Gölge	4597d4e5b6	Remove get_characters from BaseTTS	2022-02-25 10:48:03 +01:00
Eren Gölge	2d8ce98d2a	Update imports for symbols -> characters	2022-02-25 10:48:03 +01:00
Eren Gölge	9a95e15483	Refactor GlowTTS model and recipe for TTSTokenizer	2022-02-25 10:48:03 +01:00
Eren Gölge	04202da1ac	Make style	2022-02-25 10:48:03 +01:00
Eren Gölge	bb389479a4	Update setup_model for TTS.tts models	2022-02-25 10:48:03 +01:00
Eren Gölge	d2525abe8c	Remove get_characters from BaseTTS	2022-02-25 10:48:03 +01:00
Eren Gölge	73d27ebd45	Fix GlowTTS	2022-02-25 10:48:03 +01:00
Eren Gölge	fbad17e084	Update imports for symbols -> characters	2022-02-25 10:48:02 +01:00
Eren Gölge	bd461ace33	Refactor GlowTTS model and recipe for TTSTokenizer	2022-02-25 10:45:24 +01:00
Edresson Casanova	ba6e56e01c	Fix Glow-TTS multi-speaker inference	2022-02-18 19:25:29 +00:00
Eren Gölge	127118c637	Update TTS.tts formatters (#1228 ) * Return Dict from tts formatters * Make style	2022-02-11 23:03:43 +01:00
WeberJulian	e778bad626	Add argument to enable dp speaker conditioning	2022-01-06 15:07:27 +01:00


373 Commits