coqui-tts

Commit Graph

Author	SHA1	Message	Date
Jindrich Matousek	0938f1cfa1	Merge branch 'coqui-ai:main' into main	2023-09-04 14:27:19 +02:00
Jake Tae	b79b6f0762	feature: add device flag to tts cli (#2875 )	2023-08-28 11:20:12 +02:00
Jindrich Matousek	4085a229fe	Merge remote-tracking branch 'upstream/main'	2023-08-21 17:22:20 +01:00
Eren Gölge	3a104d5c49	Update Studio API for XTTS (#2861 ) * Update Studio API for XTTS * Update the docs * Update README.md * Update README.md Update README	2023-08-13 12:04:12 +02:00
logan hart	6fdb88f8e2	Add Delightful-TTS implementation (#2095 ) * add configs * Update config file * Add model configs * Add model layers * Add layer files * Add layer modules * change config names * Add emotion manager * fIX missing ap bug * Fix missing ap bug * Add base TTS e2e class * Fix wrong variable name in load_tts_samples * Add training script * Remove range predictor and gaussian upsampling * Add helper function * Add vctk recipe * Add conformer docs * Fix linting in conformer.py * Add Docs * remove duplicate import * refactor args * Fix bugs * Removew emotion embedding * remove unused arg * Remove emotion embedding arg * Remove emotion embedding arg * fix style issues * Fix bugs * Fix bugs * Add unittests * make style * fix formatter bug * fix test * Add pyworld compute pitch func * Update requirments.txt * Fix dataset Bug * Chnge layer norm to instance norm * Add missing import * Remove emotions.py * remove ssim loss * Add init layers func to aligner * refactor model layers * remove audio_config arg * Rename loss func * Rename to delightful-tts * Rename loss func * Remove unused modules * refactor imports * replace audio config with audio processor * Add change sample rate option * remove broken resample func * update recipe * fix style, add config docs * fix tests and multispeaker embd dim * remove pyworld * Make style and fix inference * Split tts tests * Fixup * Fixup * Fixup * Add argument names * Set "random" speaker in the model Tortoise/Bark * Use a diff f0_cache path for delightfull tts * Fix delightful speaker handling * Fix lint * Make style --------- Co-authored-by: loganhart420 <loganartpersonal@gmail.com> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-07-24 13:41:26 +02:00
PiaoYang	630327c4e6	Update compute_embeddings.py (#2668 ) * [Typo] Fix variable name. More readable description. Update train_yourtts.py Reformat. Reformat using black again. * Add `old_append`. Fix bool argparse. * Reformat.	2023-07-04 11:37:47 +02:00
Jindrich Matousek	b761d488a7	Merge branch 'coqui-ai:main' into main	2023-07-03 08:45:07 +02:00
Eren Gölge	c844b6570a	Inference API for 🐶Bark (#2685 ) * Add bark requirements * Draft Bark implementation * Download HF models * Update synthesizer * Add bark model * Make style * Update pylintrc * Update model URLs * Update Bark Config * Fix here and ther * Make style * Make lint * Update requirements * Update requirements	2023-06-28 11:55:27 +02:00
Eren Gölge	8e415732dd	Fixup	2023-06-06 09:41:46 +02:00
Eren Gölge	547a72c97d	Fixup	2023-06-05 22:38:56 +02:00
Eren Gölge	50b1074779	Make `tts` ready	2023-06-05 11:29:10 +02:00
Jindrich Matousek	1476c5203f	Merge branch 'coqui-ai:main' into main	2023-05-16 15:47:14 +02:00
manmay nakhashi	a3d5801c44	Tortoise TTS inference (#2547 ) * initial commit * Tortoise inference * revert path change * style fix * remove accidental remove * style fixes * style fixes * removed unwanted assests and deps * remove changes * remove cvvp * style fix black * added tortoise config and updated config and args, refactoring the code * added tortoise to api * Pull mel_norm from url * Use TTS cleaners * Let download model files * add ability to pass tortoise presets through coqui api * fix tests * fix style and tests * fix tts commandline for tortoise * Add config.json to tortoise * Use kwargs * Use regular model api for loading tortoise * Add load from dir to synthesizer * Fix Tortoise floats * Use model_dir when there are multiple urls * Use `synthesize` when exists * lint fixes and resolve preset bug * resolve a download bug and update model link * fix json * do tortoise inference from voice dir * fix * fix test * fix speaker id and remove assests * update inference_tests.yml * replace inference_test.yml * fix extra dir as None * fix tests * remove space * Reformat docstring * Add docs * Update docs * lint fixes --------- Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-05-16 00:58:21 +02:00
Eren Gölge	9b5822d625	Update VAD for silence trimming. (#2604 ) * Update vad for mp3 and fault tolerance * Make style * Remove importt * Remove stupid defaults	2023-05-11 11:09:23 +02:00
Jindrich Matousek	e11392a8e6	Merge branch 'coqui-ai:main' into main	2023-04-14 17:09:57 +02:00
Eren Gölge	dba5cec497	Merge pull request #2509 from coqui-ai/update_vad Update VAD	2023-04-13 19:35:17 +02:00
Eren Gölge	5a9bda13f3	Make style	2023-04-13 14:19:06 +02:00
Eren Gölge	c9375e4b8b	Make style	2023-04-13 14:17:06 +02:00
Eren Gölge	758ef84cc2	Using 🐸Studio models with `tts` command	2023-04-13 14:14:41 +02:00
Jindrich Matousek	a60b423f76	Merge remote-tracking branch 'upstream/main'	2023-04-13 13:21:06 +02:00
Eren G??lge	537dc0e933	Update VAD	2023-04-13 00:39:46 +02:00
Eren Gölge	d309f50e53	Implement FreeVC (#2451 ) * Update .gitignore * Draft FreeVC implementation * Tests and relevant updates * Update API tests * Add missings * Update requirements * :( * Lazy handle for vc * Update docs for voice conversion * Make style	2023-03-25 18:33:23 +01:00
Jindrich Matousek	c04f5046a2	Delete `synthesize_file.py` as it was moved to Coqui-TTS_utils repo	2023-03-16 15:42:52 +01:00
Jindrich Matousek	cbdec704dc	Merge branch 'coqui-ai:main' into main	2023-02-20 08:32:10 +01:00
Jindrich Matousek	7761e5cee0	Merge branch 'coqui-ai:main' into main	2023-02-08 15:17:28 +01:00
Eren Gölge	914280a556	Bump up to v0.11.0 (#2329 ) * Make style * Bump up to v0.11.0	2023-02-08 13:58:49 +01:00
Martin Weinelt	994be163e1	Use packaging.version for version comparisons (#2310 ) * Use packaging.version for version comparisons The distutils package is deprecated¹ and relies on PEP 386² version comparisons, which have been superseded by PEP 440³ which is implemented through the packaging module. With more recent distutils versions, provided through setuptools vendoring, we are seeing the following exception during version comparisons: > TypeError: '<' not supported between instances of 'str' and 'int' This is fixed by this migration. [1] https://docs.python.org/3/library/distutils.html [2] https://peps.python.org/pep-0386/ [3] https://peps.python.org/pep-0440/ * Improve espeak version detection robustness On many modern systems espeak is just a symlink to espeak-ng. In that case looking for the 3rd word in the version output will break the version comparison, when it finds `text-to-speech:`, instead of a proper version. This will not break during runtime, where espeak-ng would be prioritized, but the phonemizer and tokenizer tests force the backend to `espeak`, which exhibits this breakage. This improves the version detection by simply looking for the version after the "text-to-speech:" token. * Replace distuils.copy_tree with shutil.copytree The distutils module is deprecated and slated for removal in Python 3.12. Its usage should be replaced, in this case by a compatible method from shutil.	2023-01-29 23:47:00 +01:00
Jindrich Matousek	f278da4fc9	Merge branch 'coqui-ai:main' into main	2022-12-28 14:12:58 +01:00
Edresson Casanova	3b1a28fa95	Add YourTTS VCTK recipe (#2198 ) * Add YourTTS VCTK recipe * Fix lint * Add compute_embeddings and resample_files functions to be able to reuse it * Add automatic download and speaker embedding computation for YourTTS VCTK recipe * Add parameter for eval metadata file on compute embeddings function	2022-12-12 16:14:25 +01:00
Jindrich Matousek	5c0d71c746	Merge remote-tracking branch 'upstream/main'	2022-12-03 17:29:38 +01:00
Eren Gölge	8cb1433e6e	Cache fsspec downloads (#2132 ) * Cache fsspec downloaded files * Use diff paths for test * Make fsspec caching optional * Decom GPU docker tests * Make progress bar optional for better CI log * Check path local	2022-11-09 22:12:48 +01:00
Edresson Casanova	f3b947e706	Minors bug fixes on VITS/YourTTS and inference (#2054 ) * Set the right device to the speaker encoder * Bug fix on inference list_language_idxs parameter * Bug fix on speaker encoder resample audio transform	2022-10-06 22:23:54 +02:00
Eren Gölge	5f5d441ee5	Write non-speech files in a TXT (#2048 ) * Write non-speech files in a txt * Save 16-bit wav out of vad	2022-10-06 13:25:54 +02:00
Edresson Casanova	3faccbda97	Fix dataset handling with the new embedding file keys (#1991 )	2022-09-19 23:44:14 +02:00
Eren Gölge	0a112f7841	Add metafile arg (#1977 )	2022-09-16 14:41:49 +02:00
Eren Gölge	9e5a469c64	d-vector handling (#1945 ) * Update BaseDatasetConfig - Add dataset_name - Chane name to formatter_name * Update compute_embedding - Allow entering dataset by args - Use released model by default - Use the new key format * Update loading * Update recipes * Update other dep code * Update tests * Fixup * Load multiple embedding files * Fix argument names in dep code * Update docs * Fix argument name * Fix linter	2022-09-13 14:10:33 +02:00
Edresson Casanova	159eeeef64	Fix find unique phonemes script (#1928 ) * Fix find unique phonemes script * Fix unit tests	2022-09-08 10:17:35 +02:00
Jindrich Matousek	97de55595f	Merge remote-tracking branch 'upstream/main'	2022-08-24 12:21:18 +02:00
Eren Gölge	946afa8197	v0.8.0 (#1810 ) * Fix checkpointing GAN models (#1641) * checkpoint sae step crash fix * checkpoint save step crash fix * Update gan.py updated requested changes * crash fix * Fix the --model_name and --vocoder_name arguments need a <model_type> element (#1469) Co-authored-by: Eren Gölge <erogol@hotmail.com> * Fix Publish CI (#1597) * Try out manylinux * temporary removal of useless pipeline * remove check and use only manylinux * Try --plat-name * Add install requirements * Add back other actions * Add PR trigger * Remove conditions * Fix sythax * Roll back some changes * Add other python versions * Add test pypi upload * Add username * Add back __token__ as username * Modify name of entry to testpypi * Set it to release only * Fix version checking * Fix tokenizer for punc only (#1717) * Remove redundant config field * Fix SSIM loss * Separate loss tests * Fix BCELoss adressing #1192 * Make style * Add durations as aux input for VITS (#1694) * Add durations as aux input for VITS * Make style * Fix tts_tests * Fix test_get_aux_input * Make lint * feat: updated recipes and lr fix (#1718) - updated the recipes activating more losses for more stable training - re-enabling guided attention loss - fixed a bug about not the correct lr fetched for logging * Implement VitsAudioConfig (#1556) * Implement VitsAudioConfig * Update VITS LJSpeech recipe * Update VITS VCTK recipe * Make style * Add missing decorator * Add missing param * Make style * Update recipes * Fix test * Bug fix * Exclude tests folder * Make linter * Make style * Fix device allocation * Fix SSIM loss correction * Fix aux tests (#1753) * Set n_jobs to 1 for resample script * Delete resample test * Set n_jobs 1 in vad test * delete vad test * Revert "Delete resample test" This reverts commit `bb7c8466af`. * Remove tests with resample * Fix for FloorDiv Function Warning (#1760) * Fix for Floor Function Warning Fix for Floor Function Warning * Adding double quotes to fix formatting Adding double quotes to fix formatting * Update glow_tts.py * Update glow_tts.py * Fix type in download_vctk.sh (#1739) typo in comment * Update decoder.py (#1792) Minor comment correction. * Update requirements.txt (#1791) Support for #1775 * Update README.md (#1776) Fix typo in different and code sample * Fix & update WaveRNN vocoder model (#1749) * Fixes KeyError bug. Adding logging to dashboard. * Make pep8 compliant * Make style compliant * Still fixing style * Fix rand_segment edge case (input_len == seg_len - 1) * Update requirements.txt; inflect==5.6 (#1809) New inflect version (6.0) depends on pydantic which has some issues irrelevant to 🐸 TTS. #1808 Force inflect==5.6 (pydantic free) install to solve dependency issue. * Update README.md; download progress bar in CLI. (#1797) * Update README.md - minor PR - added model_info usage guide based on #1623 in README.md . * "added tqdm bar for model download" * Update manage.py * fixed style * fixed style * sort imports * Update wavenet.py (#1796) * Update wavenet.py Current version does not use "in_channels" argument. In glowTTS, we use normalizing flows and so "input dim" == "ouput dim" (channels and length). So, the existing code just uses hidden_channel sized tensor as input to first layer as well as outputs hidden_channel sized tensor. However, since it is a generic implementation, I believe it is better to update it for a more general use. * "in_channels -> hidden_channels" * Adjust default to be able to process longer sentences (#1835) Running `tts --text "$text" --out_path …` with a somewhat longer sentences in the text will lead to warnings like “Decoder stopped with max_decoder_steps 500” and the sentences just being cut off in the resulting WAV file. This happens quite frequently when feeding longer texts (e.g. a blog post) to `tts`. It's particular frustrating since the error is not always obvious in the output. You have to notice that there are missing parts. This is something other users seem to have run into as well [1]. This patch simply increases the maximum number of steps allowed for the tacotron decoder to fix this issue, resulting in a smoother default behavior. [1] https://github.com/mozilla/TTS/issues/734 * Fix language flags generated by espeak-ng phonemizer (#1801) * fix language flags generated by espeak-ng phonemizer * Style * Updated language flag regex to consider all language codes alike * fix get_random_embeddings --> get_random_embedding (#1726) * fix get_random_embeddings --> get_random_embedding function typo leads to training crash, no such function * fix typo get_random_embedding * Introduce numpy and torch transforms (#1705) * Refactor audio processing functions * Add tests for numpy transforms * Fix imports * Fix imports2 * Implement bucketed weighted sampling for VITS (#1871) * Update capacitron_layers.py (#1664) crashing because of dimension miss match at line no. 57 [batch, 256] vs [batch , 1, 512] enc_out = torch.cat([enc_out, speaker_embedding], dim=-1) * updates to dataset analysis notebooks for compatibility with latest version of TTS (#1853) * Fix BCE loss issue (#1872) * Fix BCE loss issue * Remove import * Remove deprecated files (#1873) - samplers.py is moved - distribute.py is replaces by the 👟Trainer * Handle when no batch sampler (#1882) * Fix tune wavegrad (#1844) * fix imports in tune_wavegrad * load_config returns Coqpit object instead None * set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program * fix var order in the result of batch collating * make style * make style with black and isort * Bump up to v0.8.0 * Add new DE Thorsten models (#1898) - Tacotron2-DDC - HifiGAN vocoder Co-authored-by: manmay nakhashi <manmay.nakhashi@gmail.com> Co-authored-by: camillem <camillem@users.noreply.github.com> Co-authored-by: WeberJulian <julian.weber@hotmail.fr> Co-authored-by: a-froghyar <adamfroghyar@gmail.com> Co-authored-by: ivan provalov <iprovalo@yahoo.com> Co-authored-by: Tsai Meng-Ting <sarah13680@gmail.com> Co-authored-by: p0p4k <rajiv.punmiya@gmail.com> Co-authored-by: Yuri Pourre <yuripourre@users.noreply.github.com> Co-authored-by: vanIvan <alfa1211@gmail.com> Co-authored-by: Lars Kiesow <lkiesow@uos.de> Co-authored-by: rbaraglia <baraglia.r@live.fr> Co-authored-by: jchai.me <jreus@users.noreply.github.com> Co-authored-by: Stanislav Kachnov <42406556+geth-network@users.noreply.github.com>	2022-08-22 14:54:38 +02:00
Stanislav Kachnov	2c9f00a808	Fix tune wavegrad (#1844 ) * fix imports in tune_wavegrad * load_config returns Coqpit object instead None * set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program * fix var order in the result of batch collating * make style * make style with black and isort	2022-08-22 09:55:32 +02:00
Eren Gölge	bfc63829ac	Implement bucketed weighted sampling for VITS (#1871 )	2022-08-15 11:08:11 +02:00
camillem	5c821d9fa1	Fix the --model_name and --vocoder_name arguments need a <model_type> element (#1469 ) Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-06-27 10:32:43 +02:00
Jindrich Matousek	2f81e8701e	Merge branch 'main' of https://github.com/coqui-ai/TTS into coqui-ai-main	2022-06-24 16:06:48 +02:00
p0p4k	71281ff1e4	Add support for model_info in CLI (#1623 ) * model_info * model_info * model_info_by_idx and name * model_info_by_idx and name * model_info * Update manage.py * fixed linter * fixed linter * fixed linter * fixed linter * fixed return style checks * fixed linter * fixed linter * fixed idx always positive * added comments * fix parser.args check * fix parser.args check * Make style Co-authored-by: Eren G??lge <egolge@coqui.ai>	2022-06-20 23:28:17 +02:00
Eren Gölge	f70e82cd19	Use fsspec and torch for embedding file IO (#1581 ) * Use fsspec and torch for embedding file * Fixup * Fix load and save files * Fix compute embedding script * Set use_cuda to true if available * Add dummy speakers.pth file * Make style * Change default speakers file extension Co-authored-by: WeberJulian <julian.weber@hotmail.fr>	2022-06-01 13:49:42 +02:00
Jindrich Matousek	1104c47524	Change rights to executable	2022-05-25 15:46:43 +02:00
Jindrich Matousek	2b4dd71d5d	Fix concatenate audio files	2022-05-22 22:19:25 +02:00
a-froghyar	8be21ec387	Capacitron (#977 ) * new CI config * initial Capacitron implementation * delete old unused file * fix empty formatting changes * update losses and training script * fix previous commit * fix commit * Add Capacitron test and first round of test fixes * revert formatter change * add changes to the synthesizer * add stepwise gradual lr scheduler and changes to the recipe * add inference script for dev use * feat: add posterior inference arguments to synth methods - added reference wav and text args for posterior inference - some formatting * fix: add espeak flag to base_tts and dataset APIs - use_espeak_phonemes flag was not implemented in those APIs - espeak is now able to be utilised for phoneme generation - necessary phonemizer for the Capacitron model * chore: update training script and style - training script includes the espeak flag and other hyperparams - made style * chore: fix linting * feat: add Tacotron 2 support * leftover from dev * chore:rename parser args * feat: extract optimizers - created a separate optimizer class to merge the two optimizers * chore: revert arbitrary trainer changes * fmt: revert formatting bug * formatting again * formatting fixed * fix: log func * fix: update optimizer - Implemented load_state_dict for continuing training * fix: clean optimizer init for standard models * improvement: purge espeak flags and add training scripts * Delete capacitronT2.py delete old training script, new one is pushed * feat: capacitron trainer methods - extracted capacitron specific training operations from the trainer into custom methods in taco1 and taco2 models * chore: renaming and merging capacitron and gst style args * fix: bug fixes from the previous commit * fix: implement state_dict method on CapacitronOptimizer * fix: call method * fix: inference naming * Delete train_capacitron.py * fix: synthesize * feat: update tests * chore: fix style * Delete capacitron_inference.py * fix: fix train tts t2 capacitron tests * fix: double forward in T2 train step * fix: double forward in T1 train step * fix: run make style * fix: remove unused import * fix: test for T1 capacitron * fix: make lint * feat: add blizzard2013 recipes * make style * fix: update recipes * chore: make style * Plot test sentences in Tacotron * chore: make style and fix import * fix: call forward first before problematic floordiv op * fix: update recipes * feat: add min_audio_len to recipes * aux_input["style_mel"] * chore: make style * Make capacitron T2 recipe more stable * Remove T1 capacitron Ljspeech * feat: implement new grad clipping routine and update configs * make style * Add pretrained checkpoints * Add default vocoder * Change trainer package * Fix grad clip issue for tacotron * Fix scheduler issue with tacotron Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: WeberJulian <julian.weber@hotmail.fr> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-05-20 16:17:11 +02:00
Edresson Casanova	ee99a6c1e2	Fix voice conversion inference (#1583 ) * Add voice conversion zoo test * Fix style * Fix unit test	2022-05-20 15:50:25 +02:00
Jindrich Matousek	c273295333	Add the option to concatenate audio to a sinfle output wav file	2022-05-18 14:54:35 +02:00

1 2 3 4 5 ...

424 Commits