coqui-tts

Commit Graph

Author	SHA1	Message	Date
logan hart	6fdb88f8e2	Add Delightful-TTS implementation (#2095) * add configs * Update config file * Add model configs * Add model layers * Add layer files * Add layer modules * change config names * Add emotion manager * fIX missing ap bug * Fix missing ap bug * Add base TTS e2e class * Fix wrong variable name in load_tts_samples * Add training script * Remove range predictor and gaussian upsampling * Add helper function * Add vctk recipe * Add conformer docs * Fix linting in conformer.py * Add Docs * remove duplicate import * refactor args * Fix bugs * Removew emotion embedding * remove unused arg * Remove emotion embedding arg * Remove emotion embedding arg * fix style issues * Fix bugs * Fix bugs * Add unittests * make style * fix formatter bug * fix test * Add pyworld compute pitch func * Update requirments.txt * Fix dataset Bug * Chnge layer norm to instance norm * Add missing import * Remove emotions.py * remove ssim loss * Add init layers func to aligner * refactor model layers * remove audio_config arg * Rename loss func * Rename to delightful-tts * Rename loss func * Remove unused modules * refactor imports * replace audio config with audio processor * Add change sample rate option * remove broken resample func * update recipe * fix style, add config docs * fix tests and multispeaker embd dim * remove pyworld * Make style and fix inference * Split tts tests * Fixup * Fixup * Fixup * Add argument names * Set "random" speaker in the model Tortoise/Bark * Use a diff f0_cache path for delightfull tts * Fix delightful speaker handling * Fix lint * Make style --------- Co-authored-by: loganhart420 <loganartpersonal@gmail.com> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-07-24 13:41:26 +02:00
Eren Gölge	d309f50e53	Implement FreeVC (#2451) * Update .gitignore * Draft FreeVC implementation * Tests and relevant updates * Update API tests * Add missings * Update requirements * :( * Lazy handle for vc * Update docs for voice conversion * Make style	2023-03-25 18:33:23 +01:00
manmay nakhashi	624513018d	add energy by default to Fastspeech2 config (#2326) * add energy by default * added energy to base tts * fix energy dataset * fix styles * fix test	2023-03-06 10:20:25 +01:00
thennal10	d39bc74f57	OverFlow with test sentences (#2253) * Fix typo in function definiton * Swap hasattr out hasattr(self, "speaker_manager") and hasattr(self, "language_manager") seems to be redundant since BaseTTS defines both.	2023-03-01 09:11:30 +01:00
Eren Gölge	914280a556	Bump up to v0.11.0 (#2329) * Make style * Bump up to v0.11.0	2023-02-08 13:58:49 +01:00
Eren Gölge	6e3f74fc29	Fix #2191	2023-01-15 23:11:57 +01:00
Eren Gölge	a9167cf239	Fixup overflow (#2218) * Update overflow config * Pulling shuffle and drop_last from config * Print training stats for overflow	2022-12-15 00:56:48 +01:00
Eren Gölge	ecea43ec81	Adding pre-trained Overflow model (#2211) * Adding pretrained Overflow model * Stabilize HMM * Fixup model manager * Return `audio_unique_name` by default * Distribute max split size over datasets * Fixup eval_split_size * Make style	2022-12-14 16:55:48 +01:00
Victor Shepardson	5307a2229b	Fix Capacitron training (#2086)	2022-11-01 12:52:06 +01:00
Eren Gölge	9e5a469c64	d-vector handling (#1945) * Update BaseDatasetConfig - Add dataset_name - Chane name to formatter_name * Update compute_embedding - Allow entering dataset by args - Use released model by default - Use the new key format * Update loading * Update recipes * Update other dep code * Update tests * Fixup * Load multiple embedding files * Fix argument names in dep code * Update docs * Fix argument name * Fix linter	2022-09-13 14:10:33 +02:00
manmay nakhashi	7fd9b89ebf	fix get_random_embeddings --> get_random_embedding (#1726) * fix get_random_embeddings --> get_random_embedding function typo leads to training crash, no such function * fix typo get_random_embedding	2022-08-07 14:06:03 +02:00
Eren Gölge	f70e82cd19	Use fsspec and torch for embedding file IO (#1581) * Use fsspec and torch for embedding file * Fixup * Fix load and save files * Fix compute embedding script * Set use_cuda to true if available * Add dummy speakers.pth file * Make style * Change default speakers file extension Co-authored-by: WeberJulian <julian.weber@hotmail.fr>	2022-06-01 13:49:42 +02:00
Edresson Casanova	c6008e5235	Add audio length sampler balancer (#1561) * Add audio length sampler balancer * Add unit tests	2022-05-12 19:59:19 +02:00
Edresson Casanova	060e0f9368	Add EmbeddingManager and BaseIDManager (#1374)	2022-03-31 13:41:16 +02:00
Eren Gölge	0870a4faa2	Make style (#1405)	2022-03-16 12:13:55 +01:00
Edresson Casanova	917f417ac4	Add alphas to control language and speaker balancer (#1216) * Add alphas to control language and speaker balancer * Add docs for speaker and language samplers * Change the Samplers weights to float for save memory * Change the test_samplers to unittest format * Add get_sampler method in BaseTTS * Fix rebase issues * Add language and speaker samplers support for DDP training * Rename distributed sampler wrapper * Remove the DistributedSamplerWrapper and use the one from Trainer * Bugfix after rebase * Move the samplers config to tts config	2022-03-10 14:56:09 +01:00
Eren Gölge	fe656659be	Implement BaseTTS	2022-02-25 11:31:56 +01:00
Eren Gölge	424d04e4f6	Make stlye	2022-02-25 11:31:56 +01:00
Eren Gölge	35fc7270ff	Implement BaseTTS	2022-02-25 11:28:47 +01:00
Eren Gölge	1e219fef0a	Revert drop_last	2022-02-25 11:26:59 +01:00
Eren Gölge	8622226f3f	Make style	2022-02-25 11:26:59 +01:00
Eren Gölge	38314194e7	Set `drop_last`	2022-02-25 11:26:59 +01:00
Eren Gölge	ef63c99524	Implement `start_by_longest` option for TTSDatase	2022-02-25 11:26:18 +01:00
Eren Gölge	5176ae9e53	Fixes small compat. issues	2022-02-25 11:21:19 +01:00
Eren Gölge	18f726af65	Update ForwardTTS	2022-02-25 11:11:35 +01:00
Eren Gölge	452dbc43d8	Update imports for symbols -> characters	2022-02-25 11:05:06 +01:00
Eren Gölge	8071fa0020	Refactor GlowTTS model and recipe for TTSTokenizer	2022-02-25 11:05:06 +01:00
Eren Gölge	4cd690e4c1	Updates BaseTTS and configs	2022-02-25 10:57:35 +01:00
Eren Gölge	4597d4e5b6	Remove get_characters from BaseTTS	2022-02-25 10:48:03 +01:00
Eren Gölge	2d8ce98d2a	Update imports for symbols -> characters	2022-02-25 10:48:03 +01:00
Eren Gölge	9a95e15483	Refactor GlowTTS model and recipe for TTSTokenizer	2022-02-25 10:48:03 +01:00
Eren Gölge	d2525abe8c	Remove get_characters from BaseTTS	2022-02-25 10:48:03 +01:00
Eren Gölge	fbad17e084	Update imports for symbols -> characters	2022-02-25 10:48:02 +01:00
Eren Gölge	bd461ace33	Refactor GlowTTS model and recipe for TTSTokenizer	2022-02-25 10:45:24 +01:00
Eren Gölge	704dddcffa	Make style	2021-12-20 11:54:10 +00:00
WeberJulian	2bbcb558dc	Prevent weighted sampler use when num_gpus > 1	2021-12-20 11:54:10 +00:00
WeberJulian	74cedfac38	Revert init multispeaker change	2021-12-20 11:54:10 +00:00
WeberJulian	6b03943526	Move multilingual logic out of the trainer	2021-12-20 11:54:10 +00:00
WeberJulian	e8af6a9f08	Fix use_speaker_embedding logic	2021-12-20 11:54:10 +00:00
WeberJulian	1472b6df49	make style	2021-12-20 11:54:10 +00:00
WeberJulian	3b5592abcf	fix test vits	2021-12-20 11:54:10 +00:00
WeberJulian	005bba60b0	get_speaker_weighted_sampler	2021-12-20 11:54:10 +00:00
Edresson	76251b619a	Fix d-vector multispeaker training bug	2021-12-20 11:54:09 +00:00
Edresson	ac9416fb86	Add multilingual inference support	2021-12-20 11:54:09 +00:00
Edresson	dcb2374bc9	Add multilingual training support to the VITS model	2021-12-20 11:54:09 +00:00
Edresson	f996afedb0	Implement multilingual dataloader support	2021-12-20 11:54:09 +00:00
Eren Gölge	2ed9e3c241	Fix constant use of noise augment	2021-11-08 09:20:34 +01:00
Eren Gölge	2b7d159383	Update BaseTTS for multi-speaker training	2021-10-21 16:29:06 +00:00
Eren Gölge	7c2cb7cc30	Update BaseTTS	2021-10-20 18:18:22 +00:00
Eren Gölge	127571423c	Update multi-speaker init in BaseTTS	2021-10-18 08:54:41 +00:00

71 Commits