coqui-tts

Commit Graph

Author	SHA1	Message	Date
Enno Hermann	e5c208d254	feat(cleaners): add multilingual phoneme cleaner This doesn't convert numbers into English words.	2024-06-14 15:06:03 +02:00
Enno Hermann	03de4b889e	docs: fix readthedocs links [ci skip]	2024-06-13 22:48:34 +02:00
Enno Hermann	07cbcf825c	fix(espeak_wrapper): read phonemize() input from file Avoids utf8 encoding issues on Windows when passing the text directly. Fixes https://github.com/coqui-ai/TTS/discussions/3761	2024-05-29 10:10:05 +02:00
Enno Hermann	49fcbd908b	fix(espeak_wrapper): avoid stuck process on windows Fixes #24	2024-05-29 07:39:03 +02:00
Enno Hermann	203f60f1e1	refactor(espeak_wrapper): remove sync argument _espeak_exe is always called with sync=True, so remove code for sync==False	2024-05-28 21:30:55 +02:00
Enno Hermann	df088e99df	Merge pull request #19 from idiap/toml Move from setup.py to pyproject.toml, simplify requirements	2024-05-27 08:59:09 +01:00
Enno Hermann	018f1e6453	docs(bark): update docstrings and type hints	2024-05-15 22:56:55 +02:00
Enno Hermann	59a6c9fdf2	fix(bark): add missing argument for load_voice() Fixes https://github.com/coqui-ai/TTS/issues/2795	2024-05-15 22:56:28 +02:00
Enno Hermann	6d563af623	chore: remove obsolete code for torch<2 Minimum torch version is 2.1 now.	2024-05-08 18:08:40 +02:00
Enno Hermann	865a48156d	fix: make korean g2p deps optional	2024-05-08 18:08:40 +02:00
Enno Hermann	55ed162f2a	fix: make chinese g2p deps optional	2024-05-08 18:08:40 +02:00
Enno Hermann	ea893c3795	fix: make bangla g2p deps optional	2024-05-08 18:08:40 +02:00
Enno Hermann	ec50006855	style: run pre-commit Automatic changes from: pre-commit run --all-files	2024-05-08 12:17:47 +02:00
Enno Hermann	fb92e13ebb	build: remove unused/obsolete code	2024-05-08 12:13:41 +02:00
Enno Hermann	962f9bbbcf	refactor(espeak_wrapper): fix ruff lint suggestions	2024-05-01 13:31:39 +02:00
Enno Hermann	7b2289a454	fix(espeak_wrapper): capture stderr separately Fixes https://github.com/coqui-ai/TTS/issues/2728 Previously, error messages from espeak were treated as normal output and also converted to phonemes. This captures and logs them separately.	2024-05-01 12:31:49 +02:00
Enno Hermann	52a52b5e21	fix(LanguageManager): allow initialisation from config with language ids file Previously, running `LanguageManager.init_from_config(config)` would never use the `language_ids_file` if that field is present because it was overwritten in the next line with a new manager that manually parses languages from the datasets in the config. Now that is only used as a fallback.	2024-04-19 11:57:27 +02:00
Enno Hermann	b3c9685aee	fix(tokenizer): add debug logging	2024-04-11 16:58:12 +02:00
Enno Hermann	2ad790d169	Merge pull request #4 from idiap/hindi feat(xtts): support Hindi for sentence-splitting and fine-tuning	2024-04-11 16:49:44 +02:00
Enno Hermann	d41686502e	feat(xtts): support hindi for sentence-splitting and fine-tuning The XTTS model itself already supports Hindi, it was just in these components.	2024-04-08 15:57:56 +02:00
Enno Hermann	b711e19cb6	refactor: remove verbose arguments Can be handled by adjusting logging levels instead.	2024-04-03 15:19:45 +02:00
Enno Hermann	b6ab85a050	fix: use logging instead of print statements Fixes #1691	2024-04-03 15:19:45 +02:00
Enno Hermann	d772724125	fix: update repository links, package names, metadata	2024-04-03 12:02:44 +02:00
Enno Hermann	7630abb43f	refactor(bin.find_unique_chars): use existing function	2024-03-30 22:22:40 +01:00
Enno Hermann	adbcba06da	refactor(dataset): get audio length with torchaudio Removes a (GPL) dependency	2024-03-14 20:48:29 +01:00
Enno Hermann	e5c6da1c98	Merge pull request #20 from eginhard/return-complex fix: torch.stft will soon require return_complex=True	2024-03-13 13:50:21 +01:00
Enno Hermann	e95f8950eb	fix: torch.stft will soon require return_complex=True Refactor that removes the deprecation warning: torch.view_as_real(torch.stft(, return_complex=True)) is equal to torch.stft(, return_complex=False) https://pytorch.org/docs/stable/generated/torch.stft.html	2024-03-13 12:06:27 +01:00
Enno Hermann	89a061f1d1	docs(tts.models.vits): clarify use of discriminator/generator [ci skip]	2024-03-12 18:59:05 +01:00
Enno Hermann	2e8f47a33d	Merge pull request #10 from eginhard/fix-pinyin fix chinese pinyin phonemes	2024-03-09 16:23:28 +01:00
Enno Hermann	309f39a45f	fix(xtts_manager): name_to_id() should return dict This is how the other embedding managers work	2024-03-08 14:47:00 +01:00
Enno Hermann	efdafd5a7f	style: run black	2024-03-07 11:46:51 +01:00
Enno Hermann	017c84d005	style: make style && make lint	2024-03-06 22:45:35 +01:00
Enno Hermann	e05243c4c8	refactor: read/write csv files with standard library	2024-03-06 16:18:09 +01:00
Enno Hermann	24298da5fc	Merge pull request #1 from eginhard/lint-overhaul Lint overhaul (pylint to ruff)	2024-03-06 16:10:26 +01:00
wangjie	b184e9f0fe	fix chinese pinyin phonemes	2024-01-12 09:11:56 +08:00
Edresson Casanova	5dcc16d193	Bug fix in MP3 and FLAC compute length on TTSDataset (#3092 ) * Bug Fix on XTTS load * Bug fix in MP3 length on TTSDataset * Update TTS/tts/datasets/dataset.py Co-authored-by: Aarni Koskela <akx@iki.fi> * Uses mutagen for all audio formats * Add dataloader test wit hall supported audio formats * Use mutagen.File * Update * Fix aux unit tests * Bug fixe on unit tests --------- Co-authored-by: Aarni Koskela <akx@iki.fi>	2023-12-27 13:23:43 -03:00
Aarni Koskela	d6ea806469	Run `make style`	2023-12-13 14:56:41 +02:00
Aarni Koskela	bd172dabbf	xtts/stream_generator: remove duplicate import + code	2023-12-13 14:56:41 +02:00
Aarni Koskela	32abb1a7c4	xtts/perceiver_encoder: Delete duplicate exists()	2023-12-13 14:56:41 +02:00
Aarni Koskela	aa549e9028	Fix trailing whitespace	2023-12-13 14:56:41 +02:00
Aarni Koskela	33b69c6c09	Add some noqa directives (for now)	2023-12-13 14:56:41 +02:00
Aarni Koskela	00f8f4892a	Ruff autofix unnecessary passes	2023-12-13 14:56:41 +02:00
Aarni Koskela	bc2cf296a3	Ruff autofix PLW3301	2023-12-13 14:56:41 +02:00
Aarni Koskela	64bb41f4fa	Ruff autofix C41	2023-12-13 14:56:41 +02:00
Aarni Koskela	449820ec7d	Ruff autofix E71*	2023-12-13 14:56:41 +02:00
Aarni Koskela	90991e89b4	Ruff autofix unused imports and import order	2023-12-13 14:56:41 +02:00
Eren Gölge	8c1a8b522b	Merge pull request #3405 from coqui-ai/studio_speakers Add studio speakers to open source XTTS!	2023-12-12 16:10:09 +01:00
Eren Gölge	934b87bbd1	Merge pull request #3391 from aaron-lii/multi-gpu support multiple GPU training for XTTS	2023-12-12 13:51:26 +01:00
WeberJulian	5cd750ac7e	Fix API and CI	2023-12-11 20:21:53 +01:00
WeberJulian	e3c9dab7a3	Make CLI work	2023-12-11 18:49:18 +01:00
WeberJulian	a5c0d9780f	rename manager	2023-12-11 18:48:31 +01:00
WeberJulian	36143fee26	Add basic speaker manager	2023-12-11 15:25:46 +01:00
Frederico S. Oliveira	163f9a3fdf	Merge branch 'coqui-ai:dev' into dev	2023-12-11 10:04:07 -03:00
Aaron-Li	b6e929696a	support multiple GPU training	2023-12-08 16:55:32 +08:00
Eren Gölge	e49c512d99	Merge pull request #3351 from aaron-lii/chinese-puncs fix pause problem of Chinese speech	2023-12-04 15:57:42 +01:00
Edresson Casanova	5f900f156a	Add XTTS Fine tuning gradio demo (#3296 ) * Add XTTS FT demo data processing pipeline * Add training and inference columns * Uses tabs instead of columns * Fix demo freezing issue * Update demo * Convert stereo to mono * Bug fix on XTTS inference * Update gradio demo * Update gradio demo * Update gradio demo * Update gradio demo * Add parameters to be able to set then on colab demo * Add erros messages * Add intuitive error messages * Update * Add max_audio_length parameter * Add XTTS fine-tuner docs * Update XTTS finetuner docs * Delete trainer to freeze memory * Delete unused variables * Add gc.collect() * Update xtts.md --------- Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-12-01 23:52:23 +01:00
Aaron-Li	7b8808186a	fix pause problem of Chinese speech	2023-12-01 23:30:03 +08:00
Frederico S. Oliveira	bcd500fa7b	Fixing bug Correction in training the Fastspeech/Fastspeech2/FastPitch/SpeedySpeech model using external speaker embedding.	2023-11-30 17:27:05 -03:00
Enno Hermann	39321d02be	fix: correctly strip/restore initial punctuation (#3336 ) * refactor(punctuation): remove orphan code for handling lone punctuation The case of lone punctuation is already handled at the top of restore(). The removed if statement would never be called and would in fact raise an AttributeError because the _punc_index named tuple doesn't have the attribute `mark`. * refactor(punctuation): remove unused argument * fix(punctuation): correctly handle initial punctuation Stripping and restoring initial punctuation didn't work correctly because the string-splitting caused an additional empty string to be inserted in the text list (because `".A".split(".")` => `["", "A"]`). Now, an initial empty string is skipped and relevant test cases are added. Fixes #3333	2023-11-30 13:03:16 +01:00
Eren Gölge	3b8894a3dd	Make style	2023-11-27 14:15:50 +01:00
Eren Gölge	11ec9f7471	Add hi in config defaults	2023-11-24 15:38:36 +01:00
Eren Gölge	32065139e7	Simple text cleaner for "hi"	2023-11-24 15:14:34 +01:00
Enno Hermann	2af0220996	fix: don't pass quotes to espeak (#3286 ) Previously, the text was wrapped in an additional set of quotes that was passed to Espeak. This could result in different phonemization in certain edges and caused the insertion of an initial separator "_" that had to be removed. Compare: $ espeak-ng -q -b 1 -v en-us --ipa=1 '"A"' _ˈɐ $ espeak-ng -q -b 1 -v en-us --ipa=1 'A' ˈeɪ Fixes #2619	2023-11-24 12:25:37 +01:00
Edresson Casanova	11283fce07	Ensures that only GPT model is in training mode during XTTS GPT training (#3241 ) * Ensures that only GPT model is in training mode during training * Fix parallel wavegan unit test	2023-11-17 15:13:46 +01:00
Eren Gölge	44880f09ed	Make style	2023-11-17 13:43:34 +01:00
Eren Gölge	26efdf6ee7	Make k_diffusion optional	2023-11-17 13:42:33 +01:00
Julian Weber	fbc18b8c34	Fix zh bug (#3238 )	2023-11-16 17:51:37 +01:00
Julian Weber	675f983550	Add sentence splitting (#3227 ) * Add sentence spliting * update requirements * update default args v2 * Add spanish * Fix return gpt_latents * Update requirements * Fix requirements	2023-11-16 11:01:11 +01:00
Edresson Casanova	73a5bd08c0	Fix XTTS GPT padding and inference issues (#3216 ) * Fix end artifact for fine tuning models * Bug fix on zh-cn inference * Remove ununsed code	2023-11-15 14:02:05 +01:00
Julian Weber	04901fb2e4	Add speed control for inference (#3214 ) * Add speed control for inference * Fix XTTS tests * Add speed control tests	2023-11-14 16:07:17 +01:00
Eren Gölge	ac3df409a6	Merge pull request #3208 from coqui-ai/fix_max_mel_len fix max generation length for XTTS	2023-11-13 14:32:56 +01:00
Eren Gölge	92fa988aec	Fixup	2023-11-13 13:44:06 +01:00
WeberJulian	b85536b23f	fix max generation length	2023-11-13 13:18:45 +01:00
Eren Gölge	b2682d39c5	Make style	2023-11-13 13:01:01 +01:00
Eren Gölge	a16360af85	Implement chunking gpt_cond	2023-11-13 13:00:08 +01:00
Enno Hermann	3b1e7038bc	fix(formatters): set missing root_path attribute (#3182 ) Fixes #2778	2023-11-09 16:49:52 +01:00
Aarni Koskela	a8e9163fb3	xtts/tokenizer: merge duplicate implementations of preprocess_text (#3170 ) This was found via ruff: > F811 Redefinition of unused `preprocess_text` from line 570	2023-11-09 16:32:12 +01:00
Matthew Boakes	1b9c400bca	PyTorch 2.1 Updates (Weight Norm and TorchAudio I/O) (#3176 ) * Replaced PyTorch weight_norm With parametrizations.weight_norm * TorchAudio: Migrating The I/O Functions To Use The Dispatcher Mechanism * Corrected Code Style --------- Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-11-09 16:31:03 +01:00
Gorkem	66a1e248d0	torchaudio should use proper backend to load audio (#3179 )	2023-11-09 16:28:39 +01:00
Julian Weber	03ad90135b	Add lang code in XTTS doc (#3158 ) * Add lang code in XTTS doc * Remove ununsed config and args * update docs * woops	2023-11-08 13:47:33 +01:00
Gorkem	78a596618a	Fix for exception on streaming if last chunk empty (#3160 )	2023-11-08 11:32:02 +01:00
Julian Weber	ce1a39a9a4	Add char limit warn (#3130 ) * Add char limit warning * Adding v2 langs * cached_property for cutlet * Fix import	2023-11-08 10:24:23 +01:00
Edresson Casanova	5f9ab6cfaa	Fix style Co-authored-by: Aarni Koskela <akx@iki.fi>	2023-11-06 19:22:34 -03:00
Edresson Casanova	09fb317e6d	Remove unused code	2023-11-06 17:36:32 -03:00
Edresson Casanova	b146de4ce8	Bug fix on XTTS v2.0 Trainer	2023-11-06 20:26:01 +01:00
Edresson Casanova	1b6f8d0e46	Update unit tests and recipes	2023-11-06 20:25:06 +01:00
Edresson Casanova	72b2bac0f8	Load reference in 24khz to avoid issued with multiple sr references	2023-11-06 20:25:06 +01:00
Edresson Casanova	00294ffdf6	Update XTTS docs	2023-11-06 20:24:06 +01:00
Edresson Casanova	459ad70dc8	Add support for multiples speaker references on XTTS inference	2023-11-06 20:22:35 +01:00
Eren Gölge	f0cb19ecca	Drop diffusion from XTTS (#3150 ) * Drop diffusion for XTTS * Make style * Drop diffusion deps in code * Restore thrashed	2023-11-06 20:15:49 +01:00
Eren Gölge	5d418bb84a	Update docs	2023-11-06 18:48:41 +01:00
Eren Gölge	9bbf6eb8dd	Drop use_ne_hifigan	2023-11-06 18:43:38 +01:00
Eren Gölge	9d54bd7655	Fixup XTTS	2023-11-06 18:13:58 +01:00
Edresson Casanova	e45227d9ff	XTTS v2.0 (#3137 ) * Implement most similar ref training approach * Use non-enhanced hifigan for test samples * Add Perceiver * Update GPT Trainer for perceiver support * Update XTTS docs * Bug fix masking with XTTS perceiver * Bug fix on gpt forward * Bug Fix on XTTS v2.0 training * Add XTTS v2.0 unit tests * Add XTTS v2.0 inference unit tests * Bug Fix on diffusion inference * Add XTTS v2.0 training recipe * Placeholder model entry * Add cloning params to config * Make prompt embedding configurable * Make cloning configurable * Cheap fix for a cheaper fix * Prevent resampling * Update model entry * Update docs * Update requirements * Code linting * Add xtts v2 to sep tests * Bug fix on XTTS get_gpt_cond_latents * Bug fix on rebase * Make style * Bug fix in Japenese tokenizer * Add num2words to deps * Remove unused kwarg and added num_beams=1 as default --------- Co-authored-by: Eren Gölge <egolge@coqui.ai>	2023-11-06 14:58:18 +01:00
Aarni Koskela	38f6f8f0bb	Run `make style` & re-enable it in CI (#3127 )	2023-11-06 11:36:37 +01:00
WeberJulian	1c98821359	Remove unused load_audio function	2023-10-27 22:27:18 +02:00
WeberJulian	d4e08c8d6c	Add features to get_conditioning_latents	2023-10-26 14:57:33 +02:00
WeberJulian	c1133724a1	Move lang token add to tokenizer	2023-10-26 14:52:13 +02:00
WeberJulian	6fa46d197d	Fix get_conditioning_latents when using only ne	2023-10-26 14:51:35 +02:00
Edresson Casanova	01839af926	Bug fix on XTTS masking training	2023-10-24 18:30:14 -03:00

1055 Commits