coqui-tts

Commit Graph

Author	SHA1	Message	Date
WeberJulian	a5c0d9780f	rename manager	2023-12-11 18:48:31 +01:00
WeberJulian	36143fee26	Add basic speaker manager	2023-12-11 15:25:46 +01:00
Frederico S. Oliveira	163f9a3fdf	Merge branch 'coqui-ai:dev' into dev	2023-12-11 10:04:07 -03:00
Aaron-Li	b6e929696a	support multiple GPU training	2023-12-08 16:55:32 +08:00
Eren Gölge	e49c512d99	Merge pull request #3351 from aaron-lii/chinese-puncs fix pause problem of Chinese speech	2023-12-04 15:57:42 +01:00
Edresson Casanova	5f900f156a	Add XTTS Fine tuning gradio demo (#3296 ) * Add XTTS FT demo data processing pipeline * Add training and inference columns * Uses tabs instead of columns * Fix demo freezing issue * Update demo * Convert stereo to mono * Bug fix on XTTS inference * Update gradio demo * Update gradio demo * Update gradio demo * Update gradio demo * Add parameters to be able to set then on colab demo * Add erros messages * Add intuitive error messages * Update * Add max_audio_length parameter * Add XTTS fine-tuner docs * Update XTTS finetuner docs * Delete trainer to freeze memory * Delete unused variables * Add gc.collect() * Update xtts.md --------- Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-12-01 23:52:23 +01:00
Aaron-Li	7b8808186a	fix pause problem of Chinese speech	2023-12-01 23:30:03 +08:00
Frederico S. Oliveira	bcd500fa7b	Fixing bug Correction in training the Fastspeech/Fastspeech2/FastPitch/SpeedySpeech model using external speaker embedding.	2023-11-30 17:27:05 -03:00
Enno Hermann	39321d02be	fix: correctly strip/restore initial punctuation (#3336 ) * refactor(punctuation): remove orphan code for handling lone punctuation The case of lone punctuation is already handled at the top of restore(). The removed if statement would never be called and would in fact raise an AttributeError because the _punc_index named tuple doesn't have the attribute `mark`. * refactor(punctuation): remove unused argument * fix(punctuation): correctly handle initial punctuation Stripping and restoring initial punctuation didn't work correctly because the string-splitting caused an additional empty string to be inserted in the text list (because `".A".split(".")` => `["", "A"]`). Now, an initial empty string is skipped and relevant test cases are added. Fixes #3333	2023-11-30 13:03:16 +01:00
Eren G??lge	3b8894a3dd	Make style	2023-11-27 14:15:50 +01:00
Eren G??lge	11ec9f7471	Add hi in config defaults	2023-11-24 15:38:36 +01:00
Eren G??lge	32065139e7	Simple text cleaner for "hi"	2023-11-24 15:14:34 +01:00
Enno Hermann	2af0220996	fix: don't pass quotes to espeak (#3286 ) Previously, the text was wrapped in an additional set of quotes that was passed to Espeak. This could result in different phonemization in certain edges and caused the insertion of an initial separator "_" that had to be removed. Compare: $ espeak-ng -q -b 1 -v en-us --ipa=1 '"A"' _ˈɐ $ espeak-ng -q -b 1 -v en-us --ipa=1 'A' ˈeɪ Fixes #2619	2023-11-24 12:25:37 +01:00
Edresson Casanova	11283fce07	Ensures that only GPT model is in training mode during XTTS GPT training (#3241 ) * Ensures that only GPT model is in training mode during training * Fix parallel wavegan unit test	2023-11-17 15:13:46 +01:00
Eren G??lge	44880f09ed	Make style	2023-11-17 13:43:34 +01:00
Eren G??lge	26efdf6ee7	Make k_diffusion optional	2023-11-17 13:42:33 +01:00
Julian Weber	fbc18b8c34	Fix zh bug (#3238 )	2023-11-16 17:51:37 +01:00
Julian Weber	675f983550	Add sentence splitting (#3227 ) * Add sentence spliting * update requirements * update default args v2 * Add spanish * Fix return gpt_latents * Update requirements * Fix requirements	2023-11-16 11:01:11 +01:00
Edresson Casanova	73a5bd08c0	Fix XTTS GPT padding and inference issues (#3216 ) * Fix end artifact for fine tuning models * Bug fix on zh-cn inference * Remove ununsed code	2023-11-15 14:02:05 +01:00
Julian Weber	04901fb2e4	Add speed control for inference (#3214 ) * Add speed control for inference * Fix XTTS tests * Add speed control tests	2023-11-14 16:07:17 +01:00
Eren Gölge	ac3df409a6	Merge pull request #3208 from coqui-ai/fix_max_mel_len fix max generation length for XTTS	2023-11-13 14:32:56 +01:00
Eren G??lge	92fa988aec	Fixup	2023-11-13 13:44:06 +01:00
WeberJulian	b85536b23f	fix max generation length	2023-11-13 13:18:45 +01:00
Eren G??lge	b2682d39c5	Make style	2023-11-13 13:01:01 +01:00
Eren G??lge	a16360af85	Implement chunking gpt_cond	2023-11-13 13:00:08 +01:00
Enno Hermann	3b1e7038bc	fix(formatters): set missing root_path attribute (#3182 ) Fixes #2778	2023-11-09 16:49:52 +01:00
Aarni Koskela	a8e9163fb3	xtts/tokenizer: merge duplicate implementations of preprocess_text (#3170 ) This was found via ruff: > F811 Redefinition of unused `preprocess_text` from line 570	2023-11-09 16:32:12 +01:00
Matthew Boakes	1b9c400bca	PyTorch 2.1 Updates (Weight Norm and TorchAudio I/O) (#3176 ) * Replaced PyTorch weight_norm With parametrizations.weight_norm * TorchAudio: Migrating The I/O Functions To Use The Dispatcher Mechanism * Corrected Code Style --------- Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-11-09 16:31:03 +01:00
Gorkem	66a1e248d0	torchaudio should use proper backend to load audio (#3179 )	2023-11-09 16:28:39 +01:00
Julian Weber	03ad90135b	Add lang code in XTTS doc (#3158 ) * Add lang code in XTTS doc * Remove ununsed config and args * update docs * woops	2023-11-08 13:47:33 +01:00
Gorkem	78a596618a	Fix for exception on streaming if last chunk empty (#3160 )	2023-11-08 11:32:02 +01:00
Julian Weber	ce1a39a9a4	Add char limit warn (#3130 ) * Add char limit warning * Adding v2 langs * cached_property for cutlet * Fix import	2023-11-08 10:24:23 +01:00
Edresson Casanova	5f9ab6cfaa	Fix style Co-authored-by: Aarni Koskela <akx@iki.fi>	2023-11-06 19:22:34 -03:00
Edresson Casanova	09fb317e6d	Remove unused code	2023-11-06 17:36:32 -03:00
Edresson Casanova	b146de4ce8	Bug fix on XTTS v2.0 Trainer	2023-11-06 20:26:01 +01:00
Edresson Casanova	1b6f8d0e46	Update unit tests and recipes	2023-11-06 20:25:06 +01:00
Edresson Casanova	72b2bac0f8	Load reference in 24khz to avoid issued with multiple sr references	2023-11-06 20:25:06 +01:00
Edresson Casanova	00294ffdf6	Update XTTS docs	2023-11-06 20:24:06 +01:00
Edresson Casanova	459ad70dc8	Add support for multiples speaker references on XTTS inference	2023-11-06 20:22:35 +01:00
Eren Gölge	f0cb19ecca	Drop diffusion from XTTS (#3150 ) * Drop diffusion for XTTS * Make style * Drop diffusion deps in code * Restore thrashed	2023-11-06 20:15:49 +01:00
Eren G??lge	5d418bb84a	Update docs	2023-11-06 18:48:41 +01:00
Eren G??lge	9bbf6eb8dd	Drop use_ne_hifigan	2023-11-06 18:43:38 +01:00
Eren G??lge	9d54bd7655	Fixup XTTS	2023-11-06 18:13:58 +01:00
Edresson Casanova	e45227d9ff	XTTS v2.0 (#3137 ) * Implement most similar ref training approach * Use non-enhanced hifigan for test samples * Add Perceiver * Update GPT Trainer for perceiver support * Update XTTS docs * Bug fix masking with XTTS perceiver * Bug fix on gpt forward * Bug Fix on XTTS v2.0 training * Add XTTS v2.0 unit tests * Add XTTS v2.0 inference unit tests * Bug Fix on diffusion inference * Add XTTS v2.0 training recipe * Placeholder model entry * Add cloning params to config * Make prompt embedding configurable * Make cloning configurable * Cheap fix for a cheaper fix * Prevent resampling * Update model entry * Update docs * Update requirements * Code linting * Add xtts v2 to sep tests * Bug fix on XTTS get_gpt_cond_latents * Bug fix on rebase * Make style * Bug fix in Japenese tokenizer * Add num2words to deps * Remove unused kwarg and added num_beams=1 as default --------- Co-authored-by: Eren G??lge <egolge@coqui.ai>	2023-11-06 14:58:18 +01:00
Aarni Koskela	38f6f8f0bb	Run `make style` & re-enable it in CI (#3127 )	2023-11-06 11:36:37 +01:00
WeberJulian	1c98821359	Remove unused load_audio function	2023-10-27 22:27:18 +02:00
WeberJulian	d4e08c8d6c	Add features to get_conditioning_latents	2023-10-26 14:57:33 +02:00
WeberJulian	c1133724a1	Move lang token add to tokenizer	2023-10-26 14:52:13 +02:00
WeberJulian	6fa46d197d	Fix get_conditioning_latents when using only ne	2023-10-26 14:51:35 +02:00
Edresson Casanova	01839af926	Bug fix on XTTS masking training	2023-10-24 18:30:14 -03:00
Edresson Casanova	0f96abb5ec	Add FT inference example on XTTS docs	2023-10-23 13:23:30 -03:00
Edresson Casanova	37b7945474	Update XTTS train not implemented error to point to the XTTS docs	2023-10-23 11:39:17 -03:00
Edresson Casanova	ec7f54768a	Rebase bug fix and update recipe	2023-10-21 17:37:51 -03:00
Edresson Casanova	affaf11148	Add XTTS training unit test	2023-10-21 13:41:12 -03:00
Edresson Casanova	1f92741d6a	Fix issue #2971	2023-10-21 13:37:21 -03:00
Edresson Casanova	5f98dbeec9	Update Ljspeech XTTS recipe	2023-10-21 13:37:21 -03:00
Edresson Casanova	9e3598c3b7	Bug Fix on inference using XTTS trainer checkpoint	2023-10-21 13:37:21 -03:00
Edresson Casanova	c4ceaabe2c	Add test sentences during the training	2023-10-21 13:33:56 -03:00
Edresson Casanova	2f868dd5c2	Bug fix on reproducible evaluation	2023-10-21 13:33:56 -03:00
Edresson Casanova	bafab049c2	Add prompting masking	2023-10-21 13:33:56 -03:00
Edresson Casanova	47d613df3a	Add reproducible evaluation	2023-10-21 13:33:56 -03:00
Edresson Casanova	40a4e631ea	Update mel spectrogram for the style encoder	2023-10-21 13:33:56 -03:00
Edresson Casanova	a32961bcb4	Add XTTS base training code	2023-10-21 13:33:56 -03:00
Julian Weber	dad6a7b0b6	Preserve [ja] token of the text processing	2023-10-21 11:26:03 +02:00
Julian Weber	c7a16042e3	Remove global cutlet import	2023-10-21 11:18:58 +02:00
Edresson Casanova	59576fc0ec	Bug fix on XTTS v1.1 inference (#3093 ) * Bug fix on XTTS v1.1 inference * Update .models.json --------- Co-authored-by: Julian Weber <julian.weber@hotmail.fr>	2023-10-20 17:29:43 -03:00
Julian Weber	cf97116185	XTTS v1.1 (#3089 ) * Add support for ne_hifigan * Update model.json * Update hash * Fix model loading * Enhance text_normalization * Add xtts to zoo test exception * Add model hash check * Add get_number_tokens	2023-10-20 16:02:08 +02:00
Aya Jafari	ffddf10458	unit test fix	2023-10-13 10:56:47 -03:00
Aya Jafari	6eaecab0ca	fixed bugs in fastpitch tts synthesis	2023-10-10 23:02:31 -03:00
Julian Weber	e5e0cbffc9	Streaming inference for XTTS 🚀 (#3035 )	2023-10-06 18:34:06 +02:00
Edresson Casanova	4c3c11c958	Tortoise inference fix and fix zoo unit tests (#3010 )	2023-09-29 13:40:57 +02:00
Aarni Koskela	33a7c722f6	Merge duplicate on_train_step_start functions in delightful_tts	2023-09-27 01:10:44 +03:00
Aarni Koskela	861c68b0b8	Rename misnamed setter	2023-09-27 01:09:59 +03:00
Aarni Koskela	09e14e68db	Remove duplicate get_named_beta_schedules	2023-09-27 01:09:59 +03:00
Aarni Koskela	59f85a7122	Remove duplicate code from xtts.tokenizer	2023-09-27 01:09:59 +03:00
loupzeur	da8b6bbce1	fix: xtts not taking into account device flag (#2951 ) * fix: xtts not taking into account device flag * Style changes --------- Co-authored-by: Julian Weber <julian.weber@hotmail.fr>	2023-09-20 09:57:02 +02:00
Eren Gölge	4033db5f4b	🔥 XTTS implementation	2023-09-13 17:51:24 +02:00
Edresson Casanova	4d3f23b5d3	Add CML-TTS dataset YourTTS training recipe (#2934 )	2023-09-12 11:49:14 +02:00
Aleś Bułojčyk	fead04f779	Add phonemizer for Belarusian language (#2856 )	2023-08-28 11:20:45 +02:00
Eren Gölge	a7a96d08dd	Fix loading Bark (#2893 ) * Fixup hubert path * Make style	2023-08-26 11:59:00 +02:00
Jake Tae	409db505d2	Add device support in TTS and Synthesizer (#2855 ) * fix: resolve merge conflicts * fix: retain backwards compatability in functions * feature: utilize device for voice transfer * feature: use device for vocoder * chore: cleanup vocoder cpu logic * fix: add necessary vocoder output device check * fix: add necessary vocoder output device check * fix: indentation * fix: check if waveform is pt tensor before cpu conversion --------- Co-authored-by: Jake Tae <jaketae@Jakes-MacBook-Pro-2.local>	2023-08-14 21:04:44 +02:00
Eren Gölge	3a104d5c49	Update Studio API for XTTS (#2861 ) * Update Studio API for XTTS * Update the docs * Update README.md * Update README.md Update README	2023-08-13 12:04:12 +02:00
Eren G??lge	37b558ccb9	Make style	2023-08-11 12:55:23 +02:00
Eren G??lge	9a8352b8da	Fix import error with Bark	2023-08-11 03:33:59 +02:00
Eren Gölge	4186f42b21	Handle missing JA phonemizer (#2843 ) * Handle missing JA phonemizer * Make style	2023-08-07 13:19:38 +02:00
Javier	4e7f8cd021	Add fairseq onnx support and strict configuration, fixes some onnx errors (#2831 )	2023-08-04 11:02:59 +02:00
Eren Gölge	69f080eb47	Fix DelightfulTTS (#2823 ) * Fix tests * Make style	2023-07-31 13:52:45 +02:00
Eren Gölge	483888b9d8	Add kwargs to ignore extra arguments w/o error (#2822 )	2023-07-31 11:37:35 +02:00
Aleś Bułojčyk	d124f78430	Recipe for Belarusian TTS (#2756 ) * Changes from jhlfrfufyfn <jhlfrfufyfn@gmail.com> * Recipe for Belarusian TTS --------- Co-authored-by: jhlfrfufyfn <jhlfrfufyfn@gmail.com>	2023-07-31 10:26:21 +02:00
Javier	c140df5a58	Adds multi-language support for VITS onnx, fixes onnx inference error when speaker_id is None or not passed, fixes onnx exporting for models with init_discriminator=false (#2816 )	2023-07-31 10:19:49 +02:00
Eren Gölge	8aacb81849	Fix Tortoise load (#2791 ) * Remove key prunning in tortoise * Make lint	2023-07-24 13:42:47 +02:00
logan hart	6fdb88f8e2	Add Delightful-TTS implementation (#2095 ) * add configs * Update config file * Add model configs * Add model layers * Add layer files * Add layer modules * change config names * Add emotion manager * fIX missing ap bug * Fix missing ap bug * Add base TTS e2e class * Fix wrong variable name in load_tts_samples * Add training script * Remove range predictor and gaussian upsampling * Add helper function * Add vctk recipe * Add conformer docs * Fix linting in conformer.py * Add Docs * remove duplicate import * refactor args * Fix bugs * Removew emotion embedding * remove unused arg * Remove emotion embedding arg * Remove emotion embedding arg * fix style issues * Fix bugs * Fix bugs * Add unittests * make style * fix formatter bug * fix test * Add pyworld compute pitch func * Update requirments.txt * Fix dataset Bug * Chnge layer norm to instance norm * Add missing import * Remove emotions.py * remove ssim loss * Add init layers func to aligner * refactor model layers * remove audio_config arg * Rename loss func * Rename to delightful-tts * Rename loss func * Remove unused modules * refactor imports * replace audio config with audio processor * Add change sample rate option * remove broken resample func * update recipe * fix style, add config docs * fix tests and multispeaker embd dim * remove pyworld * Make style and fix inference * Split tts tests * Fixup * Fixup * Fixup * Add argument names * Set "random" speaker in the model Tortoise/Bark * Use a diff f0_cache path for delightfull tts * Fix delightful speaker handling * Fix lint * Make style --------- Co-authored-by: loganhart420 <loganartpersonal@gmail.com> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-07-24 13:41:26 +02:00
Eren Gölge	0de12ec5aa	API tests (#2790 ) * Separate API tests and only run when uplifted * Make style	2023-07-24 12:14:21 +02:00
Paul O'Leary McCann	c0aabb8596	Make Japanese-specific dependencies optional (#2776 ) * Don't install MeCab by default * Add optional [ja] deps, like [dev] etc * Add JA requirements file * Add JA requirements to requirements_all This should help the tests run.	2023-07-24 11:28:27 +02:00
Eren Gölge	672ec3b35e	Fix #2749 (#2750 )	2023-07-08 11:40:44 +02:00
Eren Gölge	a2984fb435	Fix #2745 (#2748 )	2023-07-07 20:23:27 +02:00
Eren Gölge	7b5c8422c8	Export multispeaker onnx (#2743 )	2023-07-06 13:36:50 +02:00
ZhouGongZaiShi	d5f16d77c2	delete meaningless print() (#2662 )	2023-07-04 11:38:17 +02:00
Eren G??lge	cb9c320691	Fixup	2023-06-30 14:13:11 +02:00
Eren G??lge	91cc11d636	Remove commented codes	2023-06-28 12:14:37 +02:00
Eren G??lge	6b9ebf5aab	Merge branch 'p3_11' into dev	2023-06-28 12:13:04 +02:00
Eren Gölge	c844b6570a	Inference API for 🐶Bark (#2685 ) * Add bark requirements * Draft Bark implementation * Download HF models * Update synthesizer * Add bark model * Make style * Update pylintrc * Update model URLs * Update Bark Config * Fix here and ther * Make style * Make lint * Update requirements * Update requirements	2023-06-28 11:55:27 +02:00
Eren G??lge	a13b1352a4	Fixup	2023-06-26 19:30:26 +02:00
Eren G??lge	17ac188958	Drop fairseq for Hubert	2023-06-26 19:27:48 +02:00
Eren G??lge	c03768bb53	Make style	2023-06-26 17:16:26 +02:00
Eren G??lge	a1c431e6a9	Fixups	2023-06-26 12:55:18 +02:00
Eren Gölge	fff8b762bc	Merge branch 'dev' into bark	2023-06-21 15:49:05 +02:00
Eren Gölge	4cf8652392	Fix Tortoise load (#2697 ) * Handle missing gpt weights * Make style * Fix lint	2023-06-21 15:42:01 +02:00
Eren G??lge	cf98ae04df	Make lint	2023-06-21 12:05:08 +02:00
Eren G??lge	3b9fca2398	Make style	2023-06-21 12:02:06 +02:00
Eren G??lge	0f8932a6a9	Fix here and ther	2023-06-21 11:59:27 +02:00
Eren G??lge	03c347b7f3	Update Bark Config	2023-06-21 11:58:18 +02:00
Eren G??lge	f4c88ed677	Make style	2023-06-19 14:22:32 +02:00
Eren G??lge	37b708dac7	Add bark model	2023-06-19 14:16:06 +02:00
Eren G??lge	f59da4dba5	Draft Bark implementation	2023-06-12 14:32:39 +02:00
Tsai Meng-Ting	d65819422b	Update stochastic_duration_predictor.py (#2663 ) fix a typo	2023-06-12 11:10:54 +02:00
Eren Gölge	e785d101a1	Port Fairseq TTS models (#2628 ) * Load fairseq models * Add docs and missing files * Managing fairseq models and docs for API * Make style * Use scarf URL * Add tests * Fix URL * Pass cpu * Make lint * Fixup * Make lint * fixup * Fixup * Change tokenization order * Update README * Fixup * Fixup	2023-06-05 11:15:13 +02:00
Eren Gölge	9e99e0f42d	Disable reduction	2023-05-18 11:12:51 +02:00
Eren Gölge	4de797bb11	Draft ONNX export for VITS (#2563 ) * Draft ONNX export for VITS Could not get it work to output variable length sequence * Fixup for onnx constant output * Make style * Remove commented code	2023-05-16 01:07:56 +02:00
manmay nakhashi	a3d5801c44	Tortoise TTS inference (#2547 ) * initial commit * Tortoise inference * revert path change * style fix * remove accidental remove * style fixes * style fixes * removed unwanted assests and deps * remove changes * remove cvvp * style fix black * added tortoise config and updated config and args, refactoring the code * added tortoise to api * Pull mel_norm from url * Use TTS cleaners * Let download model files * add ability to pass tortoise presets through coqui api * fix tests * fix style and tests * fix tts commandline for tortoise * Add config.json to tortoise * Use kwargs * Use regular model api for loading tortoise * Add load from dir to synthesizer * Fix Tortoise floats * Use model_dir when there are multiple urls * Use `synthesize` when exists * lint fixes and resolve preset bug * resolve a download bug and update model link * fix json * do tortoise inference from voice dir * fix * fix test * fix speaker id and remove assests * update inference_tests.yml * replace inference_test.yml * fix extra dir as None * fix tests * remove space * Reformat docstring * Add docs * Update docs * lint fixes --------- Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-05-16 00:58:21 +02:00
Michael Görner	27e237ed08	use default_factory for audio parameter (#2576 ) Python 3.11 complains about the mutable default and other members were already adapted to use the factory, so I expect this line just went unnoticed until now.	2023-05-08 11:17:36 +02:00
Eren Gölge	1a6a5710fd	Make lint	2023-04-17 15:02:56 +02:00
Eren Gölge	2533a18d62	Add BN tests	2023-04-17 13:37:10 +02:00
Eren Gölge	2d49c05259	Remove import	2023-04-17 13:05:29 +02:00
Eren Gölge	cd83991067	Add BN phonemizer	2023-04-17 12:54:00 +02:00
Matthew Boakes	4c829e74a1	Update Librosa Version To V0.10.0	2023-04-05 00:59:20 +01:00
Yingzhi WANG	95fa2c9fd6	fix typo (#2475 )	2023-04-03 23:31:09 +02:00
p0p	91cf1b2da9	[minor] batch["speaker_ids"] getting set two times (#2470 ) * [minor] batch["speaker_ids"] getting set two times just to make it consistent with language_ids * Update vits.py style.	2023-04-03 11:35:21 +02:00
Eren Gölge	d309f50e53	Implement FreeVC (#2451 ) * Update .gitignore * Draft FreeVC implementation * Tests and relevant updates * Update API tests * Add missings * Update requirements * :( * Lazy handle for vc * Update docs for voice conversion * Make style	2023-03-25 18:33:23 +01:00
Khalid Bashir	14c80dd1fd	vits.py training fixed due to return_complex (#2418 ) Torch set default value for `return_complex=True` for `torch.stft` method This turned warning into error:- ``` Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1591, in fit self._fit() File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1544, in _fit self.train_epoch() File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1309, in train_epoch _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time) File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1162, in train_step outputs, loss_dict_new, step_time = self._optimize( File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 1023, in _optimize outputs, loss_dict = self._model_train_step(batch, model, criterion, optimizer_idx=optimizer_idx) File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 970, in _model_train_step return model.train_step(*input_args) File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 1293, in train_step mel_slice_hat = wav_to_mel( File "/workspace/coqui-tts/TTS/tts/models/vits.py", line 191, in wav_to_mel spec = torch.stft( File "/usr/local/lib/python3.10/dist-packages/torch/functional.py", line 641, in stft return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined] RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. ```	2023-03-19 00:22:04 +01:00
Roee Shenberg	3c15f0619a	Bug fixes in OverFlow audio generation (#2380 )	2023-03-15 12:02:11 +01:00
Daniel Vera Nieto	dfb48737fb	Style fixed	2023-03-13 16:11:15 +01:00
Dani Vera	0d12229b64	Update vits.py This should fix the issue https://github.com/coqui-ai/TTS/issues/1986 without breaking batch data sampling.	2023-03-10 18:35:16 +01:00
manmay nakhashi	624513018d	add energy by default to Fastspeech2 config (#2326 ) * add energy by default * added energy to base tts * fix energy dataset * fix styles * fix test	2023-03-06 10:20:25 +01:00
thennal10	d39bc74f57	OverFlow with test sentences (#2253 ) * Fix typo in function definiton * Swap hasattr out hasattr(self, "speaker_manager") and hasattr(self, "language_manager") seems to be redundant since BaseTTS defines both.	2023-03-01 09:11:30 +01:00
Eren Gölge	914280a556	Bump up to v0.11.0 (#2329 ) * Make style * Bump up to v0.11.0	2023-02-08 13:58:49 +01:00
Martin Weinelt	994be163e1	Use packaging.version for version comparisons (#2310 ) * Use packaging.version for version comparisons The distutils package is deprecated¹ and relies on PEP 386² version comparisons, which have been superseded by PEP 440³ which is implemented through the packaging module. With more recent distutils versions, provided through setuptools vendoring, we are seeing the following exception during version comparisons: > TypeError: '<' not supported between instances of 'str' and 'int' This is fixed by this migration. [1] https://docs.python.org/3/library/distutils.html [2] https://peps.python.org/pep-0386/ [3] https://peps.python.org/pep-0440/ * Improve espeak version detection robustness On many modern systems espeak is just a symlink to espeak-ng. In that case looking for the 3rd word in the version output will break the version comparison, when it finds `text-to-speech:`, instead of a proper version. This will not break during runtime, where espeak-ng would be prioritized, but the phonemizer and tokenizer tests force the backend to `espeak`, which exhibits this breakage. This improves the version detection by simply looking for the version after the "text-to-speech:" token. * Replace distuils.copy_tree with shutil.copytree The distutils module is deprecated and slated for removal in Python 3.12. Its usage should be replaced, in this case by a compatible method from shutil.	2023-01-29 23:47:00 +01:00
Gerard Sant Muniesa	c59b3f75b8	Add Catalan text cleaners for Catalan support (#2295 )	2023-01-23 11:56:30 +01:00
Shivam Mehta	d83ee8fe45	Adding neural HMM TTS Model (#2272 ) * Adding neural HMM TTS * Adding tests * Adding neural hmm on readme * renaming training recipe * Removing overflow\s decoder parameters from the config * Update the Trainer requirement version for a compatible one (#2276) * Bump up to v0.10.2 * Adding neural HMM TTS * Adding tests * Adding neural hmm on readme * renaming training recipe * Removing overflow\s decoder parameters from the config * fixing documentation Co-authored-by: Edresson Casanova <edresson1@gmail.com> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-01-23 11:53:04 +01:00
Eren Gölge	497f22b20b	Cache speaker encoder model (#2284 )	2023-01-23 11:49:51 +01:00
Eren G??lge	6e3f74fc29	Fix #2191	2023-01-15 23:11:57 +01:00
manmay nakhashi	bc422f2f3c	Fastspeech2 (#2073 ) * added EnergyDataset * add energy to Dataset * add comupte_energy * added energy params * added energy to forward_tts * added plot_avg_energy for visualisation * Update forward_tts.py * create file * added fastspeech2 recipe * add fastspeech2 config * removed energy from fast pitch * add energy loss to forward tts * Update fastspeech2_config.py * change run_name * Update numpy_transforms.py * fix typo * fix typo * fix typo * linting issues * use_energy default value --> False * Update numpy_transforms.py * linting fixes * fix typo * liniting_fix * liniting_fix * fix * fixes * fixes * lint fix * lint fixws * added training test * wrong import * wrong import * trailing whitespace * style fix * changed class name because of error * class name change * class name change * change class name * fixed styles	2023-01-15 22:39:22 +01:00
Khalid Bashir	42afad5e79	Fixed bug related to yourtts speaker embeddings issue (#2234 ) * Fixed bug related to yourtts speaker embeddings issue * Reverted code for base_tts * Bug fix on VITS d_vector_file type * Ignore the test speakers on YourTTS recipe * Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference * Update YourTTS config file * Update ModelManager._update_path to deal with list attributes * Fix lint checks * Remove unused code * Fix unit tests * Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files * Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes Co-authored-by: Edresson Casanova <edresson1@gmail.com>	2023-01-02 14:20:02 +01:00
Julian Weber	a07397733b	Multilingual tokenizer (#2229 ) * Implement multilingual tokenizer * Add multi_phonemizer receipe * Fix lint * Add TestMultiPhonemizer * Fix lint * make style	2023-01-02 10:03:19 +01:00
Eren Gölge	a9167cf239	Fixup overflow (#2218 ) * Update overflow config * Pulling shuffle and drop_last from config * Print training stats for overflow	2022-12-15 00:56:48 +01:00
Eren Gölge	ecea43ec81	Adding pre-trained Overflow model (#2211 ) * Adding pretrained Overflow model * Stabilize HMM * Fixup model manager * Return `audio_unique_name` by default * Distribute max split size over datasets * Fixup eval_split_size * Make style	2022-12-14 16:55:48 +01:00
Shivam Mehta	3b8b105b0d	Adding OverFlow (#2183 ) * Adding encoder * currently modifying hmm * Adding hmm * Adding overflow * Adding overflow setting up flat start * Removing runs * adding normalization parameters * Fixing models on same device * Training overflow and plotting evaluations * Adding inference * At the end of epoch the test sentences are coming on cpu instead of gpu * Adding figures from model during training to monitor * reverting tacotron2 training recipe * fixing inference on gpu for test sentences on config * moving helpers and texts within overflows source code * renaming to overflow * moving loss to the model file * Fixing the rename * Model training but not plotting the test config sentences's audios * Formatting logs * Changing model name to camelcase * Fixing test log * Fixing plotting bug * Adding some tests * Adding more tests to overflow * Adding all tests for overflow * making changes to camel case in config * Adding information about parameters and docstring * removing compute_mel_statistics moved statistic computation to the model instead * Added overflow in readme * Adding more test cases, now it doesn't saves transition_p like tensor and can be dumped as json	2022-12-12 12:44:15 +01:00
p0p4k	2e153d54a8	Adding missing key to formatter (#2194 ) quick fix for #2156. added 'root_path' key.	2022-12-12 12:25:37 +01:00
Eren Gölge	fdeefcc612	Handle espeak 1.48.15 (#2203 )	2022-12-12 11:23:45 +01:00
Edresson Casanova	ee20e30958	Fix VITS multi-speaker voice conversion inference	2022-12-05 09:15:01 -03:00

1 2 3 4 5 ...

1055 Commits