* Cache fsspec downloaded files
* Use different paths for tests
* Make fsspec caching optional
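A minimal sketch of what optional caching can look like with fsspec's `filecache` protocol chaining (the wrapper name and `cache_dir` parameter are illustrative, not the repo's actual API):

```python
import fsspec

def open_remote(url, cache_dir=None):
    """Open a remote file, optionally caching the download locally."""
    if cache_dir is not None:
        # Chaining "filecache::" in front of the URL makes fsspec store the
        # download under cache_dir and reuse it on subsequent opens.
        return fsspec.open(f"filecache::{url}", filecache={"cache_storage": cache_dir})
    return fsspec.open(url)
```

Passing `cache_dir=None` falls through to a plain download, which is what makes the caching optional.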
* Decommission GPU Docker tests
* Make progress bar optional for better CI log
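The usual way to make a bar optional is tqdm's `disable` flag; a sketch (the `progress_bar` parameter name is an assumption):

```python
from tqdm import tqdm

def download_chunks(chunks, progress_bar=True):
    # disable=True suppresses the bar entirely, keeping CI logs free of
    # carriage-return redraw noise while leaving local runs unchanged.
    for chunk in tqdm(chunks, disable=not progress_bar):
        ...
```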
* Check if the path is local
* Set the right device to the speaker encoder
* Bug fix on inference list_language_idxs parameter
* Bug fix on speaker encoder resample audio transform
* Update BaseDatasetConfig
- Add dataset_name
- Change name to formatter_name (see the sketch after this list)
* Update compute_embedding
- Allow passing the dataset via args
- Use released model by default
- Use the new key format
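Roughly, the two items above amount to a config like the following, which compute_embedding can then build from CLI args instead of a full training config (field names beyond formatter_name/dataset_name are assumptions):

```python
from dataclasses import dataclass

@dataclass
class BaseDatasetConfig:
    formatter_name: str = ""  # renamed from `name`; selects the metadata formatter
    dataset_name: str = ""    # new field identifying the dataset itself
    path: str = ""            # dataset root on disk
    meta_file_train: str = ""
    meta_file_val: str = ""
```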
* Update loading
* Update recipes
* Update other dep code
* Update tests
* Fixup
* Load multiple embedding files
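Loading several embedding files reduces to merging their dicts; a sketch assuming each file maps utterance ids to `{"name": ..., "embedding": ...}` entries:

```python
import torch

def load_embedding_files(paths):
    """Merge several embedding files (e.g. speakers.pth) into one mapping.

    Later files win on duplicate keys.
    """
    merged = {}
    for path in paths:
        merged.update(torch.load(path, map_location="cpu"))
    return merged
```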
* Fix argument names in dep code
* Update docs
* Fix argument name
* Fix linter
* fix imports in tune_wavegrad
* load_config returns a Coqpit object instead of None
* set action "store_true" for the "--use_cuda" flag; start tuning when the module runs as the main program
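A sketch of that pattern: a boolean flag plus a `__main__` guard so importing the module never kicks off tuning (the `tune` stub stands in for the wavegrad tuning loop):

```python
import argparse

def tune(use_cuda):
    # Stand-in for the wavegrad tuning loop.
    print(f"tuning (use_cuda={use_cuda})")

def main():
    parser = argparse.ArgumentParser()
    # action="store_true" turns --use_cuda into a boolean flag, False by default.
    parser.add_argument("--use_cuda", action="store_true")
    args = parser.parse_args()
    tune(args.use_cuda)

# Tuning starts only when the module runs as the main program, so importing
# it for its helpers has no side effects.
if __name__ == "__main__":
    main()
```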
* fix variable order in the batch collation result
* make style
* make style with black and isort
* Use fsspec and torch for embedding file
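In the spirit of that change, saving and loading through fsspec keeps the code storage-agnostic; a sketch (function names are illustrative):

```python
import fsspec
import torch

def save_file(obj, path):
    # fsspec resolves the path's protocol (local, s3://, gs://, ...), so the
    # same call writes embeddings to local disk or remote storage.
    with fsspec.open(path, "wb") as f:
        torch.save(obj, f)

def load_file(path):
    with fsspec.open(path, "rb") as f:
        return torch.load(f, map_location="cpu")
```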
* Fixup
* Fix load and save files
* Fix compute embedding script
* Set use_cuda to true if available
* Add dummy speakers.pth file
* Make style
* Change default speakers file extension
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
* new CI config
* initial Capacitron implementation
* delete old unused file
* fix empty formatting changes
* update losses and training script
* fix previous commit
* fix commit
* Add Capacitron test and first round of test fixes
* revert formatter change
* add changes to the synthesizer
* add stepwise gradual lr scheduler and changes to the recipe
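One way to express a stepwise gradual schedule with stock PyTorch is a LambdaLR over (step, factor) boundaries; a sketch of the idea (the actual scheduler in the repo may set absolute learning rates rather than factors):

```python
from torch.optim.lr_scheduler import LambdaLR

def stepwise_gradual_lr(optimizer, boundaries):
    """Piecewise-constant LR: `boundaries` is a list of (step, factor) pairs,
    sorted by step; the factor of the last boundary passed applies."""
    def factor_at(step):
        factor = 1.0
        for boundary, f in boundaries:
            if step >= boundary:
                factor = f
        return factor
    return LambdaLR(optimizer, lr_lambda=factor_at)

# e.g. scheduler = stepwise_gradual_lr(opt, [(0, 1.0), (10_000, 0.5), (50_000, 0.1)])
```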
* add inference script for dev use
* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting
* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak can now be used for phoneme generation
- necessary phonemizer for the Capacitron model
* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style
* chore: fix linting
* feat: add Tacotron 2 support
* leftover from dev
* chore: rename parser args
* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers
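The idea, roughly: wrap both optimizers behind the single-optimizer interface the trainer expects. A sketch (class and attribute names are illustrative; the actual class is CapacitronOptimizer):

```python
class TwoInOneOptimizer:
    """Drive two optimizers (e.g. main model + Capacitron params) as one."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def zero_grad(self):
        self.primary.zero_grad()
        self.secondary.zero_grad()

    def step(self):
        self.primary.step()
        self.secondary.step()

    # Checkpointing hooks; cf. the later "implement state_dict" fix that
    # made continued training work.
    def state_dict(self):
        return {"primary": self.primary.state_dict(),
                "secondary": self.secondary.state_dict()}

    def load_state_dict(self, state):
        self.primary.load_state_dict(state["primary"])
        self.secondary.load_state_dict(state["secondary"])
```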
* chore: revert arbitrary trainer changes
* fmt: revert formatting bug
* formatting again
* formatting fixed
* fix: log func
* fix: update optimizer
- Implemented load_state_dict for continuing training
* fix: clean optimizer init for standard models
* improvement: purge espeak flags and add training scripts
* Delete capacitronT2.py
delete old training script, new one is pushed
* feat: capacitron trainer methods
- extracted Capacitron-specific training operations from the trainer into custom methods in the taco1 and taco2 models
* chore: renaming and merging capacitron and gst style args
* fix: bug fixes from the previous commit
* fix: implement state_dict method on CapacitronOptimizer
* fix: call method
* fix: inference naming
* Delete train_capacitron.py
* fix: synthesize
* feat: update tests
* chore: fix style
* Delete capacitron_inference.py
* fix: train tts T2 Capacitron tests
* fix: double forward in T2 train step
* fix: double forward in T1 train step
* fix: run make style
* fix: remove unused import
* fix: test for T1 capacitron
* fix: make lint
* feat: add blizzard2013 recipes
* make style
* fix: update recipes
* chore: make style
* Plot test sentences in Tacotron
* chore: make style and fix import
* fix: call forward first before problematic floordiv op
* fix: update recipes
* feat: add min_audio_len to recipes
* aux_input["style_mel"]
* chore: make style
* Make capacitron T2 recipe more stable
* Remove T1 capacitron Ljspeech
* feat: implement new grad clipping routine and update configs
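Details of the routine aside, the core of per-group clipping with PyTorch looks like this (the group split and shared `max_norm` are assumptions):

```python
import torch

def clip_and_step(optimizer, model_params, capacitron_params, max_norm=1.0):
    # Clip each parameter group separately before stepping, so a gradient
    # spike in one group cannot distort the norm seen by the other.
    torch.nn.utils.clip_grad_norm_(model_params, max_norm)
    torch.nn.utils.clip_grad_norm_(capacitron_params, max_norm)
    optimizer.step()
```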
* make style
* Add pretrained checkpoints
* Add default vocoder
* Change trainer package
* Fix grad clip issue for tacotron
* Fix scheduler issue with tacotron
Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Enforce phonemizer definition for synthesis
* Fix train_tts, tokenizer init can now edit config
* Add small change to trigger CI pipeline
* fix wrong output path for one tts_test
* Fix style
* Test config overrides by args and tokenizer
* Fix style
* Rename Speaker encoder module to encoder
* Add a generic emotion dataset formatter
* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config
* Add class map in emotion config
* Add Base encoder config
* Add evaluation encoder script
* Fix the bug in plot_embeddings
* Enable Weight decay for encoder training
* Add argument to disable storage
* Add Perfect Sampler and remove storage
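The invariant a "perfect" sampler guarantees: every batch holds exactly `num_utter_per_class` utterances from each of `num_classes_in_batch` classes, which is what GE2E-style encoder losses expect. A simplified sketch of the idea (the real PerfectBatchSampler is a proper torch sampler class):

```python
import random

def perfect_batches(indices_by_class, num_classes_in_batch, num_utter_per_class):
    classes = list(indices_by_class)
    while True:
        batch = []
        # Draw a fixed number of classes, then a fixed number of samples each.
        for cls in random.sample(classes, num_classes_in_batch):
            batch.extend(random.sample(indices_by_class[cls], num_utter_per_class))
        yield batch
```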
* Add evaluation during encoder training
* Fix lint checks
* Remove useless config parameter
* Activate evaluation in the speaker encoder test and use a multi-speaker dataset for this test
* Unit test fixes
* Remove useless tests to speed up the aux_tests
* Use get_optimizer in Encoder
* Add BaseEncoder Class
* Fix the unit tests
* Add Perfect Batch Sampler unit test
* Add compute encoder accuracy in a function
* Add support for voice conversion inference
* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json
* Rebase bug fix
* Use the average d-vector for inference
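Grouping and averaging once per speaker means a large speakers.json is scanned a single time rather than on every synthesis call. A sketch assuming the file maps utterance ids to `{"name": speaker, "embedding": [...]}` entries:

```python
import numpy as np

def d_vectors_by_speaker(embeddings):
    """Group utterance-level d-vectors by speaker and precompute the mean."""
    grouped = {}
    for entry in embeddings.values():
        grouped.setdefault(entry["name"], []).append(np.asarray(entry["embedding"]))
    return {spk: np.stack(vecs).mean(axis=0) for spk, vecs in grouped.items()}
```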
* Fix the bug in find unique chars script
* Add OpenBible formatter
Co-authored-by: Eren Gölge <erogol@hotmail.com>
* Fix the bug in split_dataset
* Make eval_split_size configurable
* Change test_loader to use load_tts_samples function
* Change eval_split_portion to eval_split_size and allow setting the absolute number of eval samples
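A sketch of the dual semantics (exact thresholds and shuffling in the repo may differ): values below 1 read as a fraction of the dataset, values of 1 or more as an absolute eval-set size.

```python
import random

def split_dataset(samples, eval_split_size=0.01, seed=0):
    samples = list(samples)  # avoid mutating the caller's list
    n_eval = (int(eval_split_size) if eval_split_size >= 1
              else int(len(samples) * eval_split_size))
    random.Random(seed).shuffle(samples)
    return samples[:n_eval], samples[n_eval:]  # (eval, train)
```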
* Fix samplers unit test
* Add data unit test on GitHub workflow