coqui-tts

Commit Graph

Author	SHA1	Message	Date
Aleś Bułojčyk	d124f78430	Recipe for Belarusian TTS (#2756 ) * Changes from jhlfrfufyfn <jhlfrfufyfn@gmail.com> * Recipe for Belarusian TTS --------- Co-authored-by: jhlfrfufyfn <jhlfrfufyfn@gmail.com>	2023-07-31 10:26:21 +02:00
Javier	c140df5a58	Adds multi-language support for VITS onnx, fixes onnx inference error when speaker_id is None or not passed, fixes onnx exporting for models with init_discriminator=false (#2816 )	2023-07-31 10:19:49 +02:00
Eren Gölge	b739326503	Bump up to v0.16.0	2023-07-24 16:04:10 +02:00
Eren Gölge	8aacb81849	Fix Tortoise load (#2791 ) * Remove key prunning in tortoise * Make lint	2023-07-24 13:42:47 +02:00
logan hart	6fdb88f8e2	Add Delightful-TTS implementation (#2095 ) * add configs * Update config file * Add model configs * Add model layers * Add layer files * Add layer modules * change config names * Add emotion manager * fIX missing ap bug * Fix missing ap bug * Add base TTS e2e class * Fix wrong variable name in load_tts_samples * Add training script * Remove range predictor and gaussian upsampling * Add helper function * Add vctk recipe * Add conformer docs * Fix linting in conformer.py * Add Docs * remove duplicate import * refactor args * Fix bugs * Removew emotion embedding * remove unused arg * Remove emotion embedding arg * Remove emotion embedding arg * fix style issues * Fix bugs * Fix bugs * Add unittests * make style * fix formatter bug * fix test * Add pyworld compute pitch func * Update requirments.txt * Fix dataset Bug * Chnge layer norm to instance norm * Add missing import * Remove emotions.py * remove ssim loss * Add init layers func to aligner * refactor model layers * remove audio_config arg * Rename loss func * Rename to delightful-tts * Rename loss func * Remove unused modules * refactor imports * replace audio config with audio processor * Add change sample rate option * remove broken resample func * update recipe * fix style, add config docs * fix tests and multispeaker embd dim * remove pyworld * Make style and fix inference * Split tts tests * Fixup * Fixup * Fixup * Add argument names * Set "random" speaker in the model Tortoise/Bark * Use a diff f0_cache path for delightfull tts * Fix delightful speaker handling * Fix lint * Make style --------- Co-authored-by: loganhart420 <loganartpersonal@gmail.com> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-07-24 13:41:26 +02:00
Eren Gölge	0de12ec5aa	API tests (#2790 ) * Separate API tests and only run when uplifted * Make style	2023-07-24 12:14:21 +02:00
Paul O'Leary McCann	c0aabb8596	Make Japanese-specific dependencies optional (#2776 ) * Don't install MeCab by default * Add optional [ja] deps, like [dev] etc * Add JA requirements file * Add JA requirements to requirements_all This should help the tests run.	2023-07-24 11:28:27 +02:00
Eren Gölge	672ec3b35e	Fix #2749 (#2750 )	2023-07-08 11:40:44 +02:00
Eren Gölge	b5cd644132	Bump up to v0.15.6	2023-07-08 10:33:09 +02:00
Eren Gölge	a2984fb435	Fix #2745 (#2748 )	2023-07-07 20:23:27 +02:00
Eren Gölge	7b5c8422c8	Export multispeaker onnx (#2743 )	2023-07-06 13:36:50 +02:00
JiangCheng	53938e2d32	Squashed commit of the following: commit `dd612fd72e` Author: JiangCheng <jiangcheng@kezaihui.com> Date: Mon Jun 5 16:04:54 2023 +0800 Failed to download the file and need to delete the created file path	2023-07-05 12:08:05 +02:00
ZhouGongZaiShi	d5f16d77c2	delete meaningless print() (#2662 )	2023-07-04 11:38:17 +02:00
PiaoYang	630327c4e6	Update compute_embeddings.py (#2668 ) * [Typo] Fix variable name. More readable description. Update train_yourtts.py Reformat. Reformat using black again. * Add `old_append`. Fix bool argparse. * Reformat.	2023-07-04 11:37:47 +02:00
ChaseC	8957799e45	fix loading of model and vocoder configs (#2698 )	2023-07-04 11:32:00 +02:00
Eren Gölge	505ac1aa8f	Bump up to v0.15.5	2023-07-03 11:18:06 +02:00
Eren Gölge	21a3f280de	Bump up to v0.15.4	2023-06-30 15:05:00 +02:00
Eren Gölge	f9cde7bb1b	Bump up to v0.15.3	2023-06-30 14:30:18 +02:00
Eren Gölge	413a345d66	Bump up to v0.15.2	2023-06-30 14:16:47 +02:00
Eren Gölge	cb9c320691	Fixup	2023-06-30 14:13:11 +02:00
Eren Gölge	dfd8d313a2	Bump up to v1.5.1	2023-06-29 17:53:09 +02:00
Eren Gölge	a035b25340	Bump up to v0.15.0	2023-06-28 15:24:20 +02:00
Eren Gölge	34b9a18c47	Fixup	2023-06-28 12:26:04 +02:00
Eren Gölge	91cc11d636	Remove commented codes	2023-06-28 12:14:37 +02:00
Eren Gölge	6b9ebf5aab	Merge branch 'p3_11' into dev	2023-06-28 12:13:04 +02:00
Eren Gölge	c844b6570a	Inference API for 🐶Bark (#2685 ) * Add bark requirements * Draft Bark implementation * Download HF models * Update synthesizer * Add bark model * Make style * Update pylintrc * Update model URLs * Update Bark Config * Fix here and ther * Make style * Make lint * Update requirements * Update requirements	2023-06-28 11:55:27 +02:00
Eren Gölge	a13b1352a4	Fixup	2023-06-26 19:30:26 +02:00
Eren Gölge	17ac188958	Drop fairseq for Hubert	2023-06-26 19:27:48 +02:00
Eren Gölge	c03768bb53	Make style	2023-06-26 17:16:26 +02:00
Eren Gölge	a1c431e6a9	Fixups	2023-06-26 12:55:18 +02:00
Eren Gölge	a58fb6c01b	Update requirements	2023-06-22 13:53:19 +02:00
Eren Gölge	e888e8a56d	Fix manage	2023-06-22 10:13:20 +02:00
Eren Gölge	fff8b762bc	Merge branch 'dev' into bark	2023-06-21 15:49:05 +02:00
Eren Gölge	4cf8652392	Fix Tortoise load (#2697 ) * Handle missing gpt weights * Make style * Fix lint	2023-06-21 15:42:01 +02:00
Eren Gölge	cf98ae04df	Make lint	2023-06-21 12:05:08 +02:00
Eren Gölge	3b9fca2398	Make style	2023-06-21 12:02:06 +02:00
Eren Gölge	0f8932a6a9	Fix here and ther	2023-06-21 11:59:27 +02:00
Eren Gölge	03c347b7f3	Update Bark Config	2023-06-21 11:58:18 +02:00
Eren Gölge	695e862aad	Update model URLs	2023-06-21 11:57:46 +02:00
Eren Gölge	f4c88ed677	Make style	2023-06-19 14:22:32 +02:00
Eren Gölge	37b708dac7	Add bark model	2023-06-19 14:16:06 +02:00
Eren Gölge	2364c38d16	Update synthesizer	2023-06-19 14:15:21 +02:00
Eren Gölge	5a31fad502	Download HF models	2023-06-19 14:14:04 +02:00
Eren Gölge	f59da4dba5	Draft Bark implementation	2023-06-12 14:32:39 +02:00
Tsai Meng-Ting	d65819422b	Update stochastic_duration_predictor.py (#2663 ) fix a typo	2023-06-12 11:10:54 +02:00
Eren Gölge	49cf6a5d62	Bump up to v0.14.3	2023-06-06 09:41:59 +02:00
Eren Gölge	8e415732dd	Fixup	2023-06-06 09:41:46 +02:00
Eren Gölge	547a72c97d	Fixup	2023-06-05 22:38:56 +02:00
Eren Gölge	a494f0c92a	Bump up to v0.14.1	2023-06-05 11:29:10 +02:00
Eren Gölge	50b1074779	Make `tts` ready	2023-06-05 11:29:10 +02:00
Eren Gölge	e785d101a1	Port Fairseq TTS models (#2628 ) * Load fairseq models * Add docs and missing files * Managing fairseq models and docs for API * Make style * Use scarf URL * Add tests * Fix URL * Pass cpu * Make lint * Fixup * Make lint * fixup * Fixup * Change tokenization order * Update README * Fixup * Fixup	2023-06-05 11:15:13 +02:00
Shukrullo Turgunov	0d5e68a09f	fix typo (#2647 ) * fix typo * typo fix	2023-06-05 09:58:16 +02:00
Reuben Morais	23a7a9a363	Fetch all built-in speakers (#2626 )	2023-05-22 17:28:08 +02:00
Eren Gölge	aef7f6d980	Bump up to v0.14.1	2023-05-18 11:13:09 +02:00
Eren Gölge	9e99e0f42d	Disable reduction	2023-05-18 11:12:51 +02:00
Eren Gölge	bc0a532c7a	Bump up to v0.14.0	2023-05-16 10:08:41 +02:00
Eren Gölge	4de797bb11	Draft ONNX export for VITS (#2563 ) * Draft ONNX export for VITS Could not get it work to output variable length sequence * Fixup for onnx constant output * Make style * Remove commented code	2023-05-16 01:07:56 +02:00
manmay nakhashi	a3d5801c44	Tortoise TTS inference (#2547 ) * initial commit * Tortoise inference * revert path change * style fix * remove accidental remove * style fixes * style fixes * removed unwanted assests and deps * remove changes * remove cvvp * style fix black * added tortoise config and updated config and args, refactoring the code * added tortoise to api * Pull mel_norm from url * Use TTS cleaners * Let download model files * add ability to pass tortoise presets through coqui api * fix tests * fix style and tests * fix tts commandline for tortoise * Add config.json to tortoise * Use kwargs * Use regular model api for loading tortoise * Add load from dir to synthesizer * Fix Tortoise floats * Use model_dir when there are multiple urls * Use `synthesize` when exists * lint fixes and resolve preset bug * resolve a download bug and update model link * fix json * do tortoise inference from voice dir * fix * fix test * fix speaker id and remove assests * update inference_tests.yml * replace inference_test.yml * fix extra dir as None * fix tests * remove space * Reformat docstring * Add docs * Update docs * lint fixes --------- Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-05-16 00:58:21 +02:00
Eren Gölge	9b5822d625	Update VAD for silence trimming. (#2604 ) * Update vad for mp3 and fault tolerance * Make style * Remove importt * Remove stupid defaults	2023-05-11 11:09:23 +02:00
Eren Gölge	dfb51e06b2	Add jenny model (#2603 )	2023-05-08 12:05:40 +02:00
Michael Görner	27e237ed08	use default_factory for audio parameter (#2576 ) Python 3.11 complains about the mutable default and other members were already adapted to use the factory, so I expect this line just went unnoticed until now.	2023-05-08 11:17:36 +02:00
prakharpbuf	c1875f68df	typos and minor fixes (#2508 ) * Update tacotron1-2.md * Update README.md * Update Tutorial_2_train_your_first_TTS_model.ipynb * Update synthesizer.py There is no arg called --speaker_name * Update formatting_your_dataset.md * Update AnalyzeDataset.ipynb * Update AnalyzeDataset.ipynb * Update AnalyzeDataset.ipynb * Update finetuning.md * Update train_yourtts.py * Update train_yourtts.py * Update train_yourtts.py * Update finetuning.md	2023-04-26 15:22:57 +02:00
Eren Gölge	2071088bab	Bump up to v0.13.3	2023-04-17 16:13:35 +02:00
Eren Gölge	1a6a5710fd	Make lint	2023-04-17 15:02:56 +02:00
Eren Gölge	a44a0e1fd2	Update model urls	2023-04-17 14:53:27 +02:00
Eren Gölge	2533a18d62	Add BN tests	2023-04-17 13:37:10 +02:00
Eren Gölge	2d49c05259	Remove import	2023-04-17 13:05:29 +02:00
Eren Gölge	5e5768d784	Fix API	2023-04-17 13:05:19 +02:00
Eren Gölge	cd83991067	Add BN phonemizer	2023-04-17 12:54:00 +02:00
Eren Gölge	36be05290d	Add models	2023-04-17 12:52:32 +02:00
Eren Gölge	e4c5c27854	Bump up to v0.13.2	2023-04-14 10:23:39 +02:00
Eren Gölge	dba5cec497	Merge pull request #2509 from coqui-ai/update_vad Update VAD	2023-04-13 19:35:17 +02:00
Eren Gölge	5a9bda13f3	Make style	2023-04-13 14:19:06 +02:00
Eren Gölge	c9375e4b8b	Make style	2023-04-13 14:17:06 +02:00
Eren Gölge	758ef84cc2	Using 🐸Studio models with `tts` command	2023-04-13 14:14:41 +02:00
Eren Gölge	537dc0e933	Update VAD	2023-04-13 00:39:46 +02:00
Eren Gölge	e33e7170ed	Bump up to v0.13.1	2023-04-12 16:20:53 +02:00
Eren Gölge	8da3342676	Ping API	2023-04-12 16:20:53 +02:00
Eren Gölge	cbb592b295	Fixup	2023-04-10 14:50:11 +02:00
Eren Gölge	b8b9f09de5	Fixup	2023-04-10 14:06:31 +02:00
Eren Gölge	a49c1931d9	Fixup	2023-04-10 13:33:42 +02:00
Eren Gölge	5bd1fb6b2c	Fix API for voice conversion	2023-04-10 13:32:16 +02:00
Eren Gölge	30109af2a0	Merge pull request #2480 from MattyB95/librosa_v0.10.0 Update Librosa Version To V0.10.0	2023-04-07 12:32:33 +02:00
Eren Gölge	1233365cf4	Bump up to v0.13.0	2023-04-05 15:09:31 +02:00
Eren Gölge	ad8b9bf2be	🐸 Coqui Studio API integration (#2484 ) * Warn when lang is not avail * Make style * Implement Coqui Studio API * Test * Update docs * Set action * Make style * Make lint * Update README * Make style * Fix action * Run actions	2023-04-05 15:06:50 +02:00
Matthew Boakes	4c829e74a1	Update Librosa Version To V0.10.0	2023-04-05 00:59:20 +01:00
Yingzhi WANG	95fa2c9fd6	fix typo (#2475 )	2023-04-03 23:31:09 +02:00
p0p	91cf1b2da9	[minor] batch["speaker_ids"] getting set two times (#2470 ) * [minor] batch["speaker_ids"] getting set two times just to make it consistent with language_ids * Update vits.py style.	2023-04-03 11:35:21 +02:00
Rajiv P	c2d15cd413	[minor] hifigan_generator.py typo (#2462 ) resblock2 description updated.	2023-03-28 12:43:36 +02:00
Eren Gölge	d309f50e53	Implement FreeVC (#2451 ) * Update .gitignore * Draft FreeVC implementation * Tests and relevant updates * Update API tests * Add missings * Update requirements * :( * Lazy handle for vc * Update docs for voice conversion * Make style	2023-03-25 18:33:23 +01:00
Khalid Bashir	14c80dd1fd	vits.py training fixed due to return_complex (#2418 ) Torch set default value for `return_complex=True` for `torch.stft` method This turned warning into error:- ``` Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1591, in fit self._fit() File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1544, in _fit self.train_epoch() File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1309, in train_epoch _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time) File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1162, in train_step outputs, loss_dict_new, step_time = self._optimize( File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1023, in _optimize outputs, loss_dict = self._model_train_step(batch, model, criterion, optimizer_idx=optimizer_idx) File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 970, in _model_train_step return model.train_step(*input_args) File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 1293, in train_step mel_slice_hat = wav_to_mel( File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 191, in wav_to_mel spec = torch.stft( File "/usr/local/lib/python3.10/dist-packages/torch/functional.py", line 641, in stft return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined] RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. ```	2023-03-19 00:22:04 +01:00
Eren Gölge	2db262747e	Bump up to v0.12.0	2023-03-17 13:21:03 +01:00
Roee Shenberg	3c15f0619a	Bug fixes in OverFlow audio generation (#2380 )	2023-03-15 12:02:11 +01:00
Daniel Vera Nieto	dfb48737fb	Style fixed	2023-03-13 16:11:15 +01:00
Dani Vera	0d12229b64	Update vits.py This should fix the issue https://github.com/coqui-ai/TTS/issues/1986 without breaking batch data sampling.	2023-03-10 18:35:16 +01:00
manmay nakhashi	624513018d	add energy by default to Fastspeech2 config (#2326 ) * add energy by default * added energy to base tts * fix energy dataset * fix styles * fix test	2023-03-06 10:20:25 +01:00
Florian Quirin	478c8178b8	Basic Mary-TTS API compatibility (#2352 ) * added basic Mary-TTS API endpoints to server - imported `parse_qs` from `urllib.parse` to parse HTTP POST parameters - imported `render_template_string` from `flask` to return text as endpoint result - added new routes: - `/locales` - returns list of locales (currently locale of active model) - `/voices` - returns list of voices (currently locale and name of active model) - `/process` - accepts synth. request (GET and POST) with parameter `INPUT_TEXT` (other parameters ignored since we have only one active model) * better log messages for Mary-TTS API - smaller tweaks to log output * use f-string in log print to please linter * updated server.py to match 'make style' result	2023-03-06 10:08:21 +01:00
thennal10	d39bc74f57	OverFlow with test sentences (#2253 ) * Fix typo in function definiton * Swap hasattr out hasattr(self, "speaker_manager") and hasattr(self, "language_manager") seems to be redundant since BaseTTS defines both.	2023-03-01 09:11:30 +01:00
Edresson Casanova	16b9862252	Fix Speaker Consistency Loss (SCL) (#2364 )	2023-02-27 09:14:00 +03:00
Eren Gölge	661725b95e	Bump up to v0.11.1	2023-02-10 15:59:05 +01:00
Eren Gölge	0196b4dfbf	Merge branch 'add_neural_hmm_model' into dev	2023-02-10 15:23:56 +01:00
Eren Gölge	914280a556	Bump up to v0.11.0 (#2329 ) * Make style * Bump up to v0.11.0	2023-02-08 13:58:49 +01:00
Eren Gölge	85b3a04b37	Merge branch 'api_model_path' into dev	2023-02-06 11:18:00 +01:00
marius851000	1f4d8bf0f1	Fix tts-server for multi-lingual models (#2257 )	2023-02-06 10:54:34 +01:00
Eren Gölge	6ee94f8bad	Fixup	2023-01-30 14:02:25 +01:00
Eren Gölge	713e8c8d04	Add pretrained model	2023-01-30 13:55:17 +01:00
Eren Gölge	7fddabc8ac	Implement cloning in API	2023-01-30 13:35:48 +01:00
Eren Gölge	335b8ed44e	Add vocoder path	2023-01-30 12:59:29 +01:00
Martin Weinelt	994be163e1	Use packaging.version for version comparisons (#2310 ) * Use packaging.version for version comparisons The distutils package is deprecated¹ and relies on PEP 386² version comparisons, which have been superseded by PEP 440³ which is implemented through the packaging module. With more recent distutils versions, provided through setuptools vendoring, we are seeing the following exception during version comparisons: > TypeError: '<' not supported between instances of 'str' and 'int' This is fixed by this migration. [1] https://docs.python.org/3/library/distutils.html [2] https://peps.python.org/pep-0386/ [3] https://peps.python.org/pep-0440/ * Improve espeak version detection robustness On many modern systems espeak is just a symlink to espeak-ng. In that case looking for the 3rd word in the version output will break the version comparison, when it finds `text-to-speech:`, instead of a proper version. This will not break during runtime, where espeak-ng would be prioritized, but the phonemizer and tokenizer tests force the backend to `espeak`, which exhibits this breakage. This improves the version detection by simply looking for the version after the "text-to-speech:" token. * Replace distuils.copy_tree with shutil.copytree The distutils module is deprecated and slated for removal in Python 3.12. Its usage should be replaced, in this case by a compatible method from shutil.	2023-01-29 23:47:00 +01:00
Eren Gölge	cf076345e7	Make style	2023-01-23 13:49:51 +01:00
Eren Gölge	13334d507c	Load model from path	2023-01-23 13:45:45 +01:00
Gerard Sant Muniesa	c59b3f75b8	Add Catalan text cleaners for Catalan support (#2295 )	2023-01-23 11:56:30 +01:00
Shivam Mehta	d83ee8fe45	Adding neural HMM TTS Model (#2272 ) * Adding neural HMM TTS * Adding tests * Adding neural hmm on readme * renaming training recipe * Removing overflow\s decoder parameters from the config * Update the Trainer requirement version for a compatible one (#2276) * Bump up to v0.10.2 * Adding neural HMM TTS * Adding tests * Adding neural hmm on readme * renaming training recipe * Removing overflow\s decoder parameters from the config * fixing documentation Co-authored-by: Edresson Casanova <edresson1@gmail.com> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-01-23 11:53:04 +01:00
Eren Gölge	497f22b20b	Cache speaker encoder model (#2284 )	2023-01-23 11:49:51 +01:00
Eren Gölge	6e3f74fc29	Fix #2191	2023-01-15 23:11:57 +01:00
manmay nakhashi	bc422f2f3c	Fastspeech2 (#2073 ) * added EnergyDataset * add energy to Dataset * add comupte_energy * added energy params * added energy to forward_tts * added plot_avg_energy for visualisation * Update forward_tts.py * create file * added fastspeech2 recipe * add fastspeech2 config * removed energy from fast pitch * add energy loss to forward tts * Update fastspeech2_config.py * change run_name * Update numpy_transforms.py * fix typo * fix typo * fix typo * linting issues * use_energy default value --> False * Update numpy_transforms.py * linting fixes * fix typo * liniting_fix * liniting_fix * fix * fixes * fixes * lint fix * lint fixws * added training test * wrong import * wrong import * trailing whitespace * style fix * changed class name because of error * class name change * class name change * change class name * fixed styles	2023-01-15 22:39:22 +01:00
Eren Gölge	14d45b5347	Bump up to v0.10.2	2023-01-11 01:06:02 +01:00
Khalid Bashir	42afad5e79	Fixed bug related to yourtts speaker embeddings issue (#2234 ) * Fixed bug related to yourtts speaker embeddings issue * Reverted code for base_tts * Bug fix on VITS d_vector_file type * Ignore the test speakers on YourTTS recipe * Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference * Update YourTTS config file * Update ModelManager._update_path to deal with list attributes * Fix lint checks * Remove unused code * Fix unit tests * Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files * Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes Co-authored-by: Edresson Casanova <edresson1@gmail.com>	2023-01-02 14:20:02 +01:00
Julian Weber	a07397733b	Multilingual tokenizer (#2229 ) * Implement multilingual tokenizer * Add multi_phonemizer receipe * Fix lint * Add TestMultiPhonemizer * Fix lint * make style	2023-01-02 10:03:19 +01:00
Eren Gölge	f814d52394	Bump up to v0.10.1	2022-12-26 14:29:46 +01:00
Eren Gölge	8c32a6998a	Add pth files to manager	2022-12-26 14:29:25 +01:00
Eren Gölge	cf765cb3f2	Add ca and fa models	2022-12-26 14:29:10 +01:00
Eren Gölge	46b0ad37e7	Bump up to v0.10.0	2022-12-15 11:19:23 +01:00
Eren Gölge	a9167cf239	Fixup overflow (#2218 ) * Update overflow config * Pulling shuffle and drop_last from config * Print training stats for overflow	2022-12-15 00:56:48 +01:00
Eren Gölge	ecea43ec81	Adding pre-trained Overflow model (#2211 ) * Adding pretrained Overflow model * Stabilize HMM * Fixup model manager * Return `audio_unique_name` by default * Distribute max split size over datasets * Fixup eval_split_size * Make style	2022-12-14 16:55:48 +01:00
Edresson Casanova	3b1a28fa95	Add YourTTS VCTK recipe (#2198 ) * Add YourTTS VCTK recipe * Fix lint * Add compute_embeddings and resample_files functions to be able to reuse it * Add automatic download and speaker embedding computation for YourTTS VCTK recipe * Add parameter for eval metadata file on compute embeddings function	2022-12-12 16:14:25 +01:00
Shivam Mehta	3b8b105b0d	Adding OverFlow (#2183 ) * Adding encoder * currently modifying hmm * Adding hmm * Adding overflow * Adding overflow setting up flat start * Removing runs * adding normalization parameters * Fixing models on same device * Training overflow and plotting evaluations * Adding inference * At the end of epoch the test sentences are coming on cpu instead of gpu * Adding figures from model during training to monitor * reverting tacotron2 training recipe * fixing inference on gpu for test sentences on config * moving helpers and texts within overflows source code * renaming to overflow * moving loss to the model file * Fixing the rename * Model training but not plotting the test config sentences's audios * Formatting logs * Changing model name to camelcase * Fixing test log * Fixing plotting bug * Adding some tests * Adding more tests to overflow * Adding all tests for overflow * making changes to camel case in config * Adding information about parameters and docstring * removing compute_mel_statistics moved statistic computation to the model instead * Added overflow in readme * Adding more test cases, now it doesn't saves transition_p like tensor and can be dumped as json	2022-12-12 12:44:15 +01:00
p0p4k	2e153d54a8	Adding missing key to formatter (#2194 ) quick fix for #2156. added 'root_path' key.	2022-12-12 12:25:37 +01:00
Eren Gölge	1ddc484b49	Python API implementation (#2195 ) * Draft implementation * Fix style * Add api tests * Fix lint * Update docs * Update tests * Set env * Fixup * Fixup * Fix lint * Revert	2022-12-12 12:04:20 +01:00
Eren Gölge	fdeefcc612	Handle espeak 1.48.15 (#2203 )	2022-12-12 11:23:45 +01:00
Edresson Casanova	ee20e30958	Fix VITS multi-speaker voice conversion inference	2022-12-05 09:15:01 -03:00
Eren Gölge	9321b22203	Fix scheduler order	2022-12-05 12:26:15 +01:00
Eren Gölge	bc6120c330	[ci skip]Bump up to v0.9.0	2022-11-16 16:45:02 +01:00
logan hart	ff9b63d02a	Add neon models (#2140 ) * Add neon ljspeech vits model * Add neon german model * Update .models.json * Add neon spanish model * Add french model * Add Dutch model * Add Hungarian model * Add Greek model * Remove uneeded description * Update .models.json * Update .models.json * Handling neon models * Add all neon models * Update .models.json * Split zoo_tests * Update test names * Update model testing Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-11-16 16:12:39 +01:00
Eren Gölge	8cb1433e6e	Cache fsspec downloads (#2132 ) * Cache fsspec downloaded files * Use diff paths for test * Make fsspec caching optional * Decom GPU docker tests * Make progress bar optional for better CI log * Check path local	2022-11-09 22:12:48 +01:00
Eren Gölge	b686c09704	Fix #2062	2022-11-07 09:22:43 +01:00
freezerain	fcbfca869f	Fix back/forward slash in file path in mailabs formatter (#1938 ) * mailabs formatter: back/forward slash in file path fix * formatters.mailabs() path rework for Windows os * new formatter added "mailabs_win" * lint test fix commit * mailabs_win: removed, mailabs: "/" replaced with os.sep for windows compatibility * Black small style fix	2022-11-01 12:54:40 +01:00
Victor Shepardson	5307a2229b	Fix Capacitron training (#2086 )	2022-11-01 12:52:06 +01:00
Eren Gölge	dae79b0acd	Remove `/` prefix from the relative path (#2065 )	2022-10-10 13:32:27 +02:00
Eren Gölge	843fa6f3fa	Check num of columns in coqui format (#2066 ) * Check 4 colums in coqui format * Fix encoding * Fixup	2022-10-10 12:13:32 +02:00
Edresson Casanova	f3b947e706	Minors bug fixes on VITS/YourTTS and inference (#2054 ) * Set the right device to the speaker encoder * Bug fix on inference list_language_idxs parameter * Bug fix on speaker encoder resample audio transform	2022-10-06 22:23:54 +02:00
Eren Gölge	5f5d441ee5	Write non-speech files in a TXT (#2048 ) * Write non-speech files in a txt * Save 16-bit wav out of vad	2022-10-06 13:25:54 +02:00
Edresson Casanova	d6ad9a05b4	Fix colliding dataset cache file names (#1994 ) * Fix colliding dataset cache file names * Remove unused code	2022-09-21 12:54:07 +02:00
Edresson Casanova	3faccbda97	Fix dataset handling with the new embedding file keys (#1991 )	2022-09-19 23:44:14 +02:00
Eren Gölge	0a112f7841	Add metafile arg (#1977 )	2022-09-16 14:41:49 +02:00
Julian Weber	896e46d0e5	Fix vc (#1971 )	2022-09-16 12:01:26 +02:00
Eren Gölge	b95cf3363c	Prevent installing mecab-ko (#1967 )	2022-09-14 10:28:07 +02:00
Eren Gölge	9e5a469c64	d-vector handling (#1945 ) * Update BaseDatasetConfig - Add dataset_name - Chane name to formatter_name * Update compute_embedding - Allow entering dataset by args - Use released model by default - Use the new key format * Update loading * Update recipes * Update other dep code * Update tests * Fixup * Load multiple embedding files * Fix argument names in dep code * Update docs * Fix argument name * Fix linter	2022-09-13 14:10:33 +02:00
Edresson Casanova	371772c355	Replace pyworld by pyin (#1946 ) * Replace pyworld by pyin * Fix unit tests	2022-09-09 10:43:14 +02:00
happylittlecat	4546b4cbd8	Add espeak support for Chinese (#1905 ) * fix description * add espeak support for chinese * add espeak support for chinese	2022-09-08 12:32:41 +02:00
harmlessman	5abbe56642	Korean Phonemizer (#1822 ) * Update requirements.txt install jamo for korean * Update formatters.py add KSS formatter KSS is a korean single speech dataset (12hours) * Add files via upload add phonemizer for korean * Add files via upload add korean phonemizer * Update requirements.txt * change code style with `black` and `pylint` * reflecting pylint's Evaluation * reflecting pylint's Evaluation * reflecting pylint's Evaluation-2 * isort * edit about separator write test case and add 'nltk' for requirements.txt * add korean g2p (g2pkk) * isort * TTS/tts/utils/text/phonemizers/ko_kr_phonemizer.py:43:24: W0621: Redefining name 'text' from outer scope (line 58) (redefined-outer-name) TTS/tts/utils/text/korean/korean.py:28:8: R1705: Unnecessary "else" after "return" (no-else-return) * black	2022-09-08 12:06:07 +02:00
Edresson Casanova	159eeeef64	Fix find unique phonemes script (#1928 ) * Fix find unique phonemes script * Fix unit tests	2022-09-08 10:17:35 +02:00
KyuubiYoru	3b7dff568a	Fixes a race condition with multiple simultaneous get requests. (#1807 ) * Fixes a race condition with multiple simultaneous get requests. * Removed unused import * Removed unused threading import * Changed lock style to notation * make style Co-authored-by: WeberJulian <julian.weber@hotmail.fr>	2022-09-08 10:16:16 +02:00
Julian Weber	bb59718c03	Add capacitron v2 model (#1768 ) * Add capacitron v2 in .models.json * Put right commit hash	2022-09-08 09:43:56 +02:00
Edresson Casanova	096b35f639	Add VCTK speaker encoder recipe (#1912 )	2022-08-26 16:19:03 +02:00
Eren Gölge	e5430a6519	Add new DE Thorsten models (#1898 ) - Tacotron2-DDC - HifiGAN vocoder	2022-08-22 11:27:39 +02:00
Eren Gölge	8845f06fd9	Bump up to v0.8.0	2022-08-22 11:26:47 +02:00
Stanislav Kachnov	2c9f00a808	Fix tune wavegrad (#1844 ) * fix imports in tune_wavegrad * load_config returns Coqpit object instead None * set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program * fix var order in the result of batch collating * make style * make style with black and isort	2022-08-22 09:55:32 +02:00
Eren Gölge	fcb0bb58ae	Handle when no batch sampler (#1882 )	2022-08-18 11:26:04 +02:00
Eren Gölge	7442bcefa5	Remove deprecated files (#1873 ) - samplers.py is moved - distribute.py is replaces by the 👟Trainer	2022-08-15 12:16:37 +02:00
Eren Gölge	4333492341	Fix BCE loss issue (#1872 ) * Fix BCE loss issue * Remove import	2022-08-15 11:27:21 +02:00
manmay nakhashi	e4db7c51b5	Update capacitron_layers.py (#1664 ) crashing because of dimension miss match at line no. 57 [batch, 256] vs [batch , 1, 512] enc_out = torch.cat([enc_out, speaker_embedding], dim=-1)	2022-08-15 11:08:50 +02:00
Eren Gölge	bfc63829ac	Implement bucketed weighted sampling for VITS (#1871 )	2022-08-15 11:08:11 +02:00
Eren Gölge	d46fbc240c	Introduce numpy and torch transforms (#1705 ) * Refactor audio processing functions * Add tests for numpy transforms * Fix imports * Fix imports2	2022-08-08 11:57:50 +02:00
manmay nakhashi	7fd9b89ebf	fix get_random_embeddings --> get_random_embedding (#1726 ) * fix get_random_embeddings --> get_random_embedding function typo leads to training crash, no such function * fix typo get_random_embedding	2022-08-07 14:06:03 +02:00
rbaraglia	75ac9e3f0c	Fix language flags generated by espeak-ng phonemizer (#1801 ) * fix language flags generated by espeak-ng phonemizer * Style * Updated language flag regex to consider all language codes alike	2022-08-07 13:57:40 +02:00
Lars Kiesow	8c645080ac	Adjust default to be able to process longer sentences (#1835 ) Running `tts --text "$text" --out_path …` with a somewhat longer sentences in the text will lead to warnings like “Decoder stopped with max_decoder_steps 500” and the sentences just being cut off in the resulting WAV file. This happens quite frequently when feeding longer texts (e.g. a blog post) to `tts`. It's particular frustrating since the error is not always obvious in the output. You have to notice that there are missing parts. This is something other users seem to have run into as well [1]. This patch simply increases the maximum number of steps allowed for the tacotron decoder to fix this issue, resulting in a smoother default behavior. [1] https://github.com/mozilla/TTS/issues/734	2022-08-07 13:51:29 +02:00
p0p4k	903a77c197	Update wavenet.py (#1796 ) * Update wavenet.py Current version does not use "in_channels" argument. In glowTTS, we use normalizing flows and so "input dim" == "ouput dim" (channels and length). So, the existing code just uses hidden_channel sized tensor as input to first layer as well as outputs hidden_channel sized tensor. However, since it is a generic implementation, I believe it is better to update it for a more general use. * "in_channels -> hidden_channels"	2022-08-01 12:20:37 +02:00
p0p4k	4fe50801b5	Update README.md; download progress bar in CLI. (#1797 ) * Update README.md - minor PR - added model_info usage guide based on #1623 in README.md . * "added tqdm bar for model download" * Update manage.py * fixed style * fixed style * sort imports	2022-08-01 12:17:47 +02:00
Eren Gölge	7d8b1665c8	Fix rand_segment edge case (input_len == seg_len - 1)	2022-08-01 11:37:45 +02:00
vanIvan	5094499eba	Fix & update WaveRNN vocoder model (#1749 ) * Fixes KeyError bug. Adding logging to dashboard. * Make pep8 compliant * Make style compliant * Still fixing style	2022-07-26 15:05:11 +02:00
p0p4k	10195c4eba	Update decoder.py (#1792 ) Minor comment correction.	2022-07-26 13:06:06 +02:00
ivan provalov	903d9c791a	Fix for FloorDiv Function Warning (#1760 ) * Fix for Floor Function Warning Fix for Floor Function Warning * Adding double quotes to fix formatting Adding double quotes to fix formatting * Update glow_tts.py * Update glow_tts.py	2022-07-20 11:31:22 +02:00
Eren Gölge	f7587fc134	Fix SSIM loss correction	2022-07-13 10:47:12 +02:00
Eren Gölge	bc1f93c299	Fix device allocation	2022-07-12 19:05:25 +02:00
Eren Gölge	49bac724c0	Implement VitsAudioConfig (#1556 ) * Implement VitsAudioConfig * Update VITS LJSpeech recipe * Update VITS VCTK recipe * Make style * Add missing decorator * Add missing param * Make style * Update recipes * Fix test * Bug fix * Exclude tests folder * Make linter * Make style	2022-07-12 18:49:58 +02:00
a-froghyar	34b80e0280	feat: updated recipes and lr fix (#1718 ) - updated the recipes activating more losses for more stable training - re-enabling guided attention loss - fixed a bug about not the correct lr fetched for logging	2022-07-12 15:00:53 +02:00
Eren Gölge	48a4f3647f	Make lint	2022-07-12 14:58:26 +02:00
WeberJulian	c614f21982	Add durations as aux input for VITS (#1694 ) * Add durations as aux input for VITS * Make style * Fix tts_tests * Fix test_get_aux_input	2022-07-12 14:25:21 +02:00
Eren Gölge	2cf89b88c9	Make style	2022-07-12 14:12:57 +02:00
Eren Gölge	a6f73a18cb	Fix BCELoss adressing #1192	2022-07-12 14:11:34 +02:00
Eren Gölge	c17ff17a18	Fix SSIM loss	2022-07-12 12:35:24 +02:00
Eren Gölge	f1e35596e8	Remove redundant config field	2022-07-11 13:39:41 +02:00
WeberJulian	5cef6facb0	Fix tokenizer for punc only (#1717 )	2022-07-06 22:59:41 +02:00
camillem	5c821d9fa1	Fix the --model_name and --vocoder_name arguments need a <model_type> element (#1469 ) Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-06-27 10:32:43 +02:00
manmay nakhashi	577ec406f4	Fix checkpointing GAN models (#1641 ) * checkpoint sae step crash fix * checkpoint save step crash fix * Update gan.py updated requested changes * crash fix	2022-06-22 12:07:46 +02:00
Eren Gölge	00e67092d8	Bump up to v0.7.1	2022-06-21 14:12:55 +02:00
Eren Gölge	3328be7a8e	Remove GL message	2022-06-21 12:39:31 +02:00
WeberJulian	30c72e0d05	Add Thorsten VITS model (#1675 ) Co-authored-by: Eren Gölge <egolge@coqui.ai>	2022-06-21 11:39:49 +02:00
p0p4k	71281ff1e4	Add support for model_info in CLI (#1623 ) * model_info * model_info * model_info_by_idx and name * model_info_by_idx and name * model_info * Update manage.py * fixed linter * fixed linter * fixed linter * fixed linter * fixed return style checks * fixed linter * fixed linter * fixed idx always positive * added comments * fix parser.args check * fix parser.args check * Make style Co-authored-by: Eren Gölge <egolge@coqui.ai>	2022-06-20 23:28:17 +02:00
Eren Gölge	8b75e8be9c	Bump up to v0.7.0	2022-06-20 13:50:09 +02:00
WeberJulian	6126c23498	Add synpaflex formatter (#1616 ) * Add synpaflex formatter * Fix formatter * Make style	2022-06-20 13:36:26 +02:00
WeberJulian	f09ea11c71	Internal formatter (#1629 ) * Add coqui formatter * Make style	2022-06-08 14:31:03 +02:00
Eren Gölge	f70e82cd19	Use fsspec and torch for embedding file IO (#1581 ) * Use fsspec and torch for embedding file * Fixup * Fix load and save files * Fix compute embedding script * Set use_cuda to true if available * Add dummy speakers.pth file * Make style * Change default speakers file extension Co-authored-by: WeberJulian <julian.weber@hotmail.fr>	2022-06-01 13:49:42 +02:00
Noran Raskin	a790df4e94	Training recipes for thorsten dataset (#1020 ) * Fix style * Fix isort * Remove tensorboardX from requirements Co-authored-by: logan hart <72301874+loganhart420@users.noreply.github.com> Co-authored-by: Eren Gölge <egolge@coqui.ai>	2022-05-30 12:07:31 +02:00
André R. de Miranda	3b84ef9524	Fixed use_cuda issue in compute_embeddings.py Added use_cuda argument in self.init_encoder method	2022-05-20 12:46:46 -03:00
a-froghyar	8be21ec387	Capacitron (#977 ) * new CI config * initial Capacitron implementation * delete old unused file * fix empty formatting changes * update losses and training script * fix previous commit * fix commit * Add Capacitron test and first round of test fixes * revert formatter change * add changes to the synthesizer * add stepwise gradual lr scheduler and changes to the recipe * add inference script for dev use * feat: add posterior inference arguments to synth methods - added reference wav and text args for posterior inference - some formatting * fix: add espeak flag to base_tts and dataset APIs - use_espeak_phonemes flag was not implemented in those APIs - espeak is now able to be utilised for phoneme generation - necessary phonemizer for the Capacitron model * chore: update training script and style - training script includes the espeak flag and other hyperparams - made style * chore: fix linting * feat: add Tacotron 2 support * leftover from dev * chore:rename parser args * feat: extract optimizers - created a separate optimizer class to merge the two optimizers * chore: revert arbitrary trainer changes * fmt: revert formatting bug * formatting again * formatting fixed * fix: log func * fix: update optimizer - Implemented load_state_dict for continuing training * fix: clean optimizer init for standard models * improvement: purge espeak flags and add training scripts * Delete capacitronT2.py delete old training script, new one is pushed * feat: capacitron trainer methods - extracted capacitron specific training operations from the trainer into custom methods in taco1 and taco2 models * chore: renaming and merging capacitron and gst style args * fix: bug fixes from the previous commit * fix: implement state_dict method on CapacitronOptimizer * fix: call method * fix: inference naming * Delete train_capacitron.py * fix: synthesize * feat: update tests * chore: fix style * Delete capacitron_inference.py * fix: fix train tts t2 capacitron tests * fix: double forward in T2 train step * fix: double forward in T1 train step * fix: run make style * fix: remove unused import * fix: test for T1 capacitron * fix: make lint * feat: add blizzard2013 recipes * make style * fix: update recipes * chore: make style * Plot test sentences in Tacotron * chore: make style and fix import * fix: call forward first before problematic floordiv op * fix: update recipes * feat: add min_audio_len to recipes * aux_input["style_mel"] * chore: make style * Make capacitron T2 recipe more stable * Remove T1 capacitron Ljspeech * feat: implement new grad clipping routine and update configs * make style * Add pretrained checkpoints * Add default vocoder * Change trainer package * Fix grad clip issue for tacotron * Fix scheduler issue with tacotron Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: WeberJulian <julian.weber@hotmail.fr> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-05-20 16:17:11 +02:00
Edresson Casanova	ee99a6c1e2	Fix voice conversion inference (#1583 ) * Add voice conversion zoo test * Fix style * Fix unit test	2022-05-20 15:50:25 +02:00
Edresson Casanova	e5d8ec2402	Change the VITS upsampling interpolation trick to linear (#1564 )	2022-05-13 10:52:39 +02:00
Edresson Casanova	c6008e5235	Add audio length sampler balancer (#1561 ) * Add audio length sampler balancer * Add unit tests	2022-05-12 19:59:19 +02:00

1918 Commits