coqui-tts

Commit Graph

Author	SHA1	Message	Date
Eren Gölge	4186f42b21	Handle missing JA phonemizer (#2843 ) * Handle missing JA phonemizer * Make style	2023-08-07 13:19:38 +02:00
Eren Gölge	69f080eb47	Fix DelightfulTTS (#2823 ) * Fix tests * Make style	2023-07-31 13:52:45 +02:00
Aleś Bułojčyk	d124f78430	Recipe for Belarusian TTS (#2756 ) * Changes from jhlfrfufyfn <jhlfrfufyfn@gmail.com> * Recipe for Belarusian TTS --------- Co-authored-by: jhlfrfufyfn <jhlfrfufyfn@gmail.com>	2023-07-31 10:26:21 +02:00
Eren Gölge	8aacb81849	Fix Tortoise load (#2791 ) * Remove key prunning in tortoise * Make lint	2023-07-24 13:42:47 +02:00
logan hart	6fdb88f8e2	Add Delightful-TTS implementation (#2095 ) * add configs * Update config file * Add model configs * Add model layers * Add layer files * Add layer modules * change config names * Add emotion manager * fIX missing ap bug * Fix missing ap bug * Add base TTS e2e class * Fix wrong variable name in load_tts_samples * Add training script * Remove range predictor and gaussian upsampling * Add helper function * Add vctk recipe * Add conformer docs * Fix linting in conformer.py * Add Docs * remove duplicate import * refactor args * Fix bugs * Removew emotion embedding * remove unused arg * Remove emotion embedding arg * Remove emotion embedding arg * fix style issues * Fix bugs * Fix bugs * Add unittests * make style * fix formatter bug * fix test * Add pyworld compute pitch func * Update requirments.txt * Fix dataset Bug * Chnge layer norm to instance norm * Add missing import * Remove emotions.py * remove ssim loss * Add init layers func to aligner * refactor model layers * remove audio_config arg * Rename loss func * Rename to delightful-tts * Rename loss func * Remove unused modules * refactor imports * replace audio config with audio processor * Add change sample rate option * remove broken resample func * update recipe * fix style, add config docs * fix tests and multispeaker embd dim * remove pyworld * Make style and fix inference * Split tts tests * Fixup * Fixup * Fixup * Add argument names * Set "random" speaker in the model Tortoise/Bark * Use a diff f0_cache path for delightfull tts * Fix delightful speaker handling * Fix lint * Make style --------- Co-authored-by: loganhart420 <loganartpersonal@gmail.com> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-07-24 13:41:26 +02:00
Paul O'Leary McCann	c0aabb8596	Make Japanese-specific dependencies optional (#2776 ) * Don't install MeCab by default * Add optional [ja] deps, like [dev] etc * Add JA requirements file * Add JA requirements to requirements_all This should help the tests run.	2023-07-24 11:28:27 +02:00
Eren Gölge	17ac188958	Drop fairseq for Hubert	2023-06-26 19:27:48 +02:00
Eren Gölge	e785d101a1	Port Fairseq TTS models (#2628 ) * Load fairseq models * Add docs and missing files * Managing fairseq models and docs for API * Make style * Use scarf URL * Add tests * Fix URL * Pass cpu * Make lint * Fixup * Make lint * fixup * Fixup * Change tokenization order * Update README * Fixup * Fixup	2023-06-05 11:15:13 +02:00
Eren Gölge	4de797bb11	Draft ONNX export for VITS (#2563 ) * Draft ONNX export for VITS Could not get it work to output variable length sequence * Fixup for onnx constant output * Make style * Remove commented code	2023-05-16 01:07:56 +02:00
manmay nakhashi	a3d5801c44	Tortoise TTS inference (#2547 ) * initial commit * Tortoise inference * revert path change * style fix * remove accidental remove * style fixes * style fixes * removed unwanted assests and deps * remove changes * remove cvvp * style fix black * added tortoise config and updated config and args, refactoring the code * added tortoise to api * Pull mel_norm from url * Use TTS cleaners * Let download model files * add ability to pass tortoise presets through coqui api * fix tests * fix style and tests * fix tts commandline for tortoise * Add config.json to tortoise * Use kwargs * Use regular model api for loading tortoise * Add load from dir to synthesizer * Fix Tortoise floats * Use model_dir when there are multiple urls * Use `synthesize` when exists * lint fixes and resolve preset bug * resolve a download bug and update model link * fix json * do tortoise inference from voice dir * fix * fix test * fix speaker id and remove assests * update inference_tests.yml * replace inference_test.yml * fix extra dir as None * fix tests * remove space * Reformat docstring * Add docs * Update docs * lint fixes --------- Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-05-16 00:58:21 +02:00
Eren Gölge	1a6a5710fd	Make lint	2023-04-17 15:02:56 +02:00
Eren Gölge	2533a18d62	Add BN tests	2023-04-17 13:37:10 +02:00
Eren Gölge	2d49c05259	Remove import	2023-04-17 13:05:29 +02:00
Eren Gölge	cd83991067	Add BN phonemizer	2023-04-17 12:54:00 +02:00
Eren Gölge	914280a556	Bump up to v0.11.0 (#2329 ) * Make style * Bump up to v0.11.0	2023-02-08 13:58:49 +01:00
Martin Weinelt	994be163e1	Use packaging.version for version comparisons (#2310 ) * Use packaging.version for version comparisons The distutils package is deprecated¹ and relies on PEP 386² version comparisons, which have been superseded by PEP 440³ which is implemented through the packaging module. With more recent distutils versions, provided through setuptools vendoring, we are seeing the following exception during version comparisons: > TypeError: '<' not supported between instances of 'str' and 'int' This is fixed by this migration. [1] https://docs.python.org/3/library/distutils.html [2] https://peps.python.org/pep-0386/ [3] https://peps.python.org/pep-0440/ * Improve espeak version detection robustness On many modern systems espeak is just a symlink to espeak-ng. In that case looking for the 3rd word in the version output will break the version comparison, when it finds `text-to-speech:`, instead of a proper version. This will not break during runtime, where espeak-ng would be prioritized, but the phonemizer and tokenizer tests force the backend to `espeak`, which exhibits this breakage. This improves the version detection by simply looking for the version after the "text-to-speech:" token. * Replace distuils.copy_tree with shutil.copytree The distutils module is deprecated and slated for removal in Python 3.12. Its usage should be replaced, in this case by a compatible method from shutil.	2023-01-29 23:47:00 +01:00
Gerard Sant Muniesa	c59b3f75b8	Add Catalan text cleaners for Catalan support (#2295 )	2023-01-23 11:56:30 +01:00
Eren Gölge	497f22b20b	Cache speaker encoder model (#2284 )	2023-01-23 11:49:51 +01:00
manmay nakhashi	bc422f2f3c	Fastspeech2 (#2073 ) * added EnergyDataset * add energy to Dataset * add comupte_energy * added energy params * added energy to forward_tts * added plot_avg_energy for visualisation * Update forward_tts.py * create file * added fastspeech2 recipe * add fastspeech2 config * removed energy from fast pitch * add energy loss to forward tts * Update fastspeech2_config.py * change run_name * Update numpy_transforms.py * fix typo * fix typo * fix typo * linting issues * use_energy default value --> False * Update numpy_transforms.py * linting fixes * fix typo * liniting_fix * liniting_fix * fix * fixes * fixes * lint fix * lint fixws * added training test * wrong import * wrong import * trailing whitespace * style fix * changed class name because of error * class name change * class name change * change class name * fixed styles	2023-01-15 22:39:22 +01:00
Khalid Bashir	42afad5e79	Fixed bug related to yourtts speaker embeddings issue (#2234 ) * Fixed bug related to yourtts speaker embeddings issue * Reverted code for base_tts * Bug fix on VITS d_vector_file type * Ignore the test speakers on YourTTS recipe * Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference * Update YourTTS config file * Update ModelManager._update_path to deal with list attributes * Fix lint checks * Remove unused code * Fix unit tests * Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files * Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes Co-authored-by: Edresson Casanova <edresson1@gmail.com>	2023-01-02 14:20:02 +01:00
Julian Weber	a07397733b	Multilingual tokenizer (#2229 ) * Implement multilingual tokenizer * Add multi_phonemizer receipe * Fix lint * Add TestMultiPhonemizer * Fix lint * make style	2023-01-02 10:03:19 +01:00
Shivam Mehta	3b8b105b0d	Adding OverFlow (#2183 ) * Adding encoder * currently modifying hmm * Adding hmm * Adding overflow * Adding overflow setting up flat start * Removing runs * adding normalization parameters * Fixing models on same device * Training overflow and plotting evaluations * Adding inference * At the end of epoch the test sentences are coming on cpu instead of gpu * Adding figures from model during training to monitor * reverting tacotron2 training recipe * fixing inference on gpu for test sentences on config * moving helpers and texts within overflows source code * renaming to overflow * moving loss to the model file * Fixing the rename * Model training but not plotting the test config sentences's audios * Formatting logs * Changing model name to camelcase * Fixing test log * Fixing plotting bug * Adding some tests * Adding more tests to overflow * Adding all tests for overflow * making changes to camel case in config * Adding information about parameters and docstring * removing compute_mel_statistics moved statistic computation to the model instead * Added overflow in readme * Adding more test cases, now it doesn't saves transition_p like tensor and can be dumped as json	2022-12-12 12:44:15 +01:00
Eren Gölge	fdeefcc612	Handle espeak 1.48.15 (#2203 )	2022-12-12 11:23:45 +01:00
Julian Weber	896e46d0e5	Fix vc (#1971 )	2022-09-16 12:01:26 +02:00
Eren Gölge	b95cf3363c	Prevent installing mecab-ko (#1967 )	2022-09-14 10:28:07 +02:00
Eren Gölge	9e5a469c64	d-vector handling (#1945 ) * Update BaseDatasetConfig - Add dataset_name - Chane name to formatter_name * Update compute_embedding - Allow entering dataset by args - Use released model by default - Use the new key format * Update loading * Update recipes * Update other dep code * Update tests * Fixup * Load multiple embedding files * Fix argument names in dep code * Update docs * Fix argument name * Fix linter	2022-09-13 14:10:33 +02:00
happylittlecat	4546b4cbd8	Add espeak support for Chinese (#1905 ) * fix description * add espeak support for chinese * add espeak support for chinese	2022-09-08 12:32:41 +02:00
harmlessman	5abbe56642	Korean Phonemizer (#1822 ) * Update requirements.txt install jamo for korean * Update formatters.py add KSS formatter KSS is a korean single speech dataset (12hours) * Add files via upload add phonemizer for korean * Add files via upload add korean phonemizer * Update requirements.txt * change code style with `black` and `pylint` * reflecting pylint's Evaluation * reflecting pylint's Evaluation * reflecting pylint's Evaluation-2 * isort * edit about separator write test case and add 'nltk' for requirements.txt * add korean g2p (g2pkk) * isort * TTS/tts/utils/text/phonemizers/ko_kr_phonemizer.py:43:24: W0621: Redefining name 'text' from outer scope (line 58) (redefined-outer-name) TTS/tts/utils/text/korean/korean.py:28:8: R1705: Unnecessary "else" after "return" (no-else-return) * black	2022-09-08 12:06:07 +02:00
rbaraglia	75ac9e3f0c	Fix language flags generated by espeak-ng phonemizer (#1801 ) * fix language flags generated by espeak-ng phonemizer * Style * Updated language flag regex to consider all language codes alike	2022-08-07 13:57:40 +02:00
Eren Gölge	7d8b1665c8	Fix rand_segment edge case (input_len == seg_len - 1)	2022-08-01 11:37:45 +02:00
Eren Gölge	49bac724c0	Implement VitsAudioConfig (#1556 ) * Implement VitsAudioConfig * Update VITS LJSpeech recipe * Update VITS VCTK recipe * Make style * Add missing decorator * Add missing param * Make style * Update recipes * Fix test * Bug fix * Exclude tests folder * Make linter * Make style	2022-07-12 18:49:58 +02:00
Eren Gölge	48a4f3647f	Make lint	2022-07-12 14:58:26 +02:00
Eren Gölge	2cf89b88c9	Make style	2022-07-12 14:12:57 +02:00
Eren Gölge	c17ff17a18	Fix SSIM loss	2022-07-12 12:35:24 +02:00
WeberJulian	5cef6facb0	Fix tokenizer for punc only (#1717 )	2022-07-06 22:59:41 +02:00
Eren Gölge	f70e82cd19	Use fsspec and torch for embedding file IO (#1581 ) * Use fsspec and torch for embedding file * Fixup * Fix load and save files * Fix compute embedding script * Set use_cuda to true if available * Add dummy speakers.pth file * Make style * Change default speakers file extension Co-authored-by: WeberJulian <julian.weber@hotmail.fr>	2022-06-01 13:49:42 +02:00
André R. de Miranda	3b84ef9524	Fixed use_cuda issue in compute_embeddings.py Added use_cuda argument in self.init_encoder method	2022-05-20 12:46:46 -03:00
a-froghyar	8be21ec387	Capacitron (#977 ) * new CI config * initial Capacitron implementation * delete old unused file * fix empty formatting changes * update losses and training script * fix previous commit * fix commit * Add Capacitron test and first round of test fixes * revert formatter change * add changes to the synthesizer * add stepwise gradual lr scheduler and changes to the recipe * add inference script for dev use * feat: add posterior inference arguments to synth methods - added reference wav and text args for posterior inference - some formatting * fix: add espeak flag to base_tts and dataset APIs - use_espeak_phonemes flag was not implemented in those APIs - espeak is now able to be utilised for phoneme generation - necessary phonemizer for the Capacitron model * chore: update training script and style - training script includes the espeak flag and other hyperparams - made style * chore: fix linting * feat: add Tacotron 2 support * leftover from dev * chore:rename parser args * feat: extract optimizers - created a separate optimizer class to merge the two optimizers * chore: revert arbitrary trainer changes * fmt: revert formatting bug * formatting again * formatting fixed * fix: log func * fix: update optimizer - Implemented load_state_dict for continuing training * fix: clean optimizer init for standard models * improvement: purge espeak flags and add training scripts * Delete capacitronT2.py delete old training script, new one is pushed * feat: capacitron trainer methods - extracted capacitron specific training operations from the trainer into custom methods in taco1 and taco2 models * chore: renaming and merging capacitron and gst style args * fix: bug fixes from the previous commit * fix: implement state_dict method on CapacitronOptimizer * fix: call method * fix: inference naming * Delete train_capacitron.py * fix: synthesize * feat: update tests * chore: fix style * Delete capacitron_inference.py * fix: fix train tts t2 capacitron tests * fix: double forward in T2 train step * fix: double forward in T1 train step * fix: run make style * fix: remove unused import * fix: test for T1 capacitron * fix: make lint * feat: add blizzard2013 recipes * make style * fix: update recipes * chore: make style * Plot test sentences in Tacotron * chore: make style and fix import * fix: call forward first before problematic floordiv op * fix: update recipes * feat: add min_audio_len to recipes * aux_input["style_mel"] * chore: make style * Make capacitron T2 recipe more stable * Remove T1 capacitron Ljspeech * feat: implement new grad clipping routine and update configs * make style * Add pretrained checkpoints * Add default vocoder * Change trainer package * Fix grad clip issue for tacotron * Fix scheduler issue with tacotron Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: WeberJulian <julian.weber@hotmail.fr> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2022-05-20 16:17:11 +02:00
Edresson Casanova	c6008e5235	Add audio length sampler balancer (#1561 ) * Add audio length sampler balancer * Add unit tests	2022-05-12 19:59:19 +02:00
Edresson Casanova	a97eed696a	Fix the bug in eSpeak wrapper for eSpeak version 1.48.15 (#1560 )	2022-05-12 15:15:18 +02:00
Eren Gölge	c3f8c4d5eb	Return default SpeakerManager if no d_vector_file	2022-05-11 11:31:45 +02:00
Eren Gölge	121e9ed685	Pass use_cuda to init_encoder	2022-05-11 11:31:17 +02:00
Edresson Casanova	060e0f9368	Add EmbeddingManager and BaseIDManager (#1374 )	2022-03-31 13:41:16 +02:00
WeberJulian	c66a6241fd	Enforce phonemizer definition for synthesis (#1441 ) * Enforce phonemizer definition for synthesis * Fix train_tts, tokenizer init can now edit config * Add small change to trigger CI pipeline * fix wrong output path for one tts_test * Fix style * Test config overides by args and tokenizer * Fix style	2022-03-25 23:15:33 +01:00
Eren Gölge	1c3623af33	Fix model manager (#1436 ) * Fix manager * Make style	2022-03-23 12:57:14 +01:00
Eren Gölge	0870a4faa2	Make style (#1405 )	2022-03-16 12:13:55 +01:00
WeberJulian	690c96ed28	Fix default phonemizer for ja and zh (#1399 )	2022-03-16 12:13:22 +01:00
Edresson Casanova	f81892483d	REBASED: Transform Speaker Encoder in a Generic Encoder and Implement Emotion Encoder training support (#1349 ) * Rename Speaker encoder module to encoder * Add a generic emotion dataset formatter * Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config * Add class map in emotion config * Add Base encoder config * Add evaluation encoder script * Fix the bug in plot_embeddings * Enable Weight decay for encoder training * Add argumnet to disable storage * Add Perfect Sampler and remove storage * Add evaluation during encoder training * Fix lint checks * Remove useless config parameter * Active evaluation in speaker encoder test and use multispeaker dataset for this test * Unit tests fixs * Remove useless tests for speedup the aux_tests * Use get_optimizer in Encoder * Add BaseEncoder Class * Fix the unitests * Add Perfect Batch Sampler unit test * Add compute encoder accuracy in a function	2022-03-11 14:43:40 +01:00
Edresson Casanova	dbe9da7f15	Add Voice conversion inference support (#1337 ) * Add support for voice conversion inference * Cache d_vectors_by_speaker for fast inference using a bigger speakers.json * Rebase bug fix * Use the average d-vector for inference	2022-03-10 14:57:12 +01:00
Edresson Casanova	917f417ac4	Add alphas to control language and speaker balancer (#1216 ) * Add alphas to control language and speaker balancer * Add docs for speaker and language samplers * Change the Samplers weights to float for save memory * Change the test_samplers to unittest format * Add get_sampler method in BaseTTS * Fix rebase issues * Add language and speaker samplers support for DDP training * Rename distributed sampler wrapper * Remove the DistributedSamplerWrapper and use the one from Trainer * Bugfix after rebase * Move the samplers config to tts config	2022-03-10 14:56:09 +01:00

358 Commits