coqui-tts

Commit Graph

Author	SHA1	Message	Date
Eren Gölge	b75e90ba85	Make text splitting optional	2023-11-27 14:53:11 +01:00
Eren Gölge	3b8894a3dd	Make style	2023-11-27 14:15:50 +01:00
Eren Gölge	2fd8cf3d94	Make xtts runnable by version names	2023-11-27 14:15:16 +01:00
Enno Hermann	8c5227ed84	Fix tts_with_vc (#3275 ) * Revert "fix for issue 3067" This reverts commit `041b4b6723`. Fixes #3143. The original issue (#3067) was people trying to use tts.tts_with_vc_to_file() with XTTS and was "fixed" in #3109. But XTTS has integrated VC and you can just do tts.tts_to_file(..., speaker_wav="..."), there is no point in passing it through FreeVC afterwards. So, reverting this commit because it breaks tts.tts_with_vc_to_file() for any model that doesn't have integrated VC, i.e. all models this method is meant for. * fix: support multi-speaker models in tts_with_vc/tts_with_vc_to_file * fix: only compute spk embeddings for models that support it Fixes #1440. Passing a `speaker_wav` argument to regular Vits models failed because they don't support voice cloning. Now that argument is simply ignored.	2023-11-24 12:26:37 +01:00
Tessa Painter	64f391b583	Made the tqdm `progress_bar` objects of static download methods a static class variable (#3297 )	2023-11-24 12:23:59 +01:00
Enno Hermann	96678c7ba2	refactor: use copy_model_files() from Trainer	2023-11-17 01:18:23 +01:00
Enno Hermann	5119e651a1	chore(utils.io): remove unused code These are all available in Trainer.	2023-11-17 01:18:23 +01:00
Enno Hermann	39fe38bda4	refactor: use save_fsspec() from Trainer	2023-11-17 01:18:23 +01:00
Enno Hermann	3c2d5a9e03	Remove duplicate AudioProcessor code and fix ExtractTTSpectrogram.ipynb (#3230 ) * chore: remove unused argument * refactor(audio.processor): remove duplicate stft+griffin_lim * chore(audio.processor): remove unused compute_stft_paddings Same function available in numpy_transforms * refactor(audio.processor): remove duplicate db_to_amp * refactor(audio.processor): remove duplicate amp_to_db * refactor(audio.processor): remove duplicate linear_to_mel * refactor(audio.processor): remove duplicate mel_to_linear * refactor(audio.processor): remove duplicate build_mel_basis * refactor(audio.processor): remove duplicate stft_parameters * refactor(audio.processor): use pre-/deemphasis from numpy_transforms * refactor(audio.processor): use rms_volume_norm from numpy_transforms * chore(audio.processor): remove duplicate assert Already checked in numpy_transforms.compute_f0 * refactor(audio.processor): use find_endpoint from numpy_transforms * refactor(audio.processor): use trim_silence from numpy_transforms * refactor(audio.processor): use volume_norm from numpy_transforms * refactor(audio.processor): use load_wav from numpy_transforms * fix(bin.extract_tts_spectrograms): set quantization bits * fix(ExtractTTSpectrogram.ipynb): adapt to current TTS code Fixes #2447, #2574 * refactor(audio.processor): remove duplicate quantization methods	2023-11-16 10:57:06 +01:00
Enno Hermann	99edd6daa3	Fix ModelManager.list_models() (#3128 ) * fix(utils.manage): remove hard-coded model_type variable * refactor(utils.manage): address lint issues, fix typos Addressed the following: TTS/utils/manage.py:307:12: R1705: Unnecessary "else" after "return" (no-else-return) TTS/utils/manage.py:308:21: W1514: Using open without explicitly specifying an encoding (unspecified-encoding) TTS/utils/manage.py:299:4: R1710: Either all return statements in a function should return an expression, or none of them should. (inconsistent-return-statements) TTS/utils/manage.py:299:4: R0201: Method could be a function (no-self-use) TTS/utils/manage.py:314:4: R0201: Method could be a function (no-self-use)	2023-11-08 11:29:01 +01:00
Edresson Casanova	e45227d9ff	XTTS v2.0 (#3137 ) * Implement most similar ref training approach * Use non-enhanced hifigan for test samples * Add Perceiver * Update GPT Trainer for perceiver support * Update XTTS docs * Bug fix masking with XTTS perceiver * Bug fix on gpt forward * Bug Fix on XTTS v2.0 training * Add XTTS v2.0 unit tests * Add XTTS v2.0 inference unit tests * Bug Fix on diffusion inference * Add XTTS v2.0 training recipe * Placeholder model entry * Add cloning params to config * Make prompt embedding configurable * Make cloning configurable * Cheap fix for a cheaper fix * Prevent resampling * Update model entry * Update docs * Update requirements * Code linting * Add xtts v2 to sep tests * Bug fix on XTTS get_gpt_cond_latents * Bug fix on rebase * Make style * Bug fix in Japenese tokenizer * Add num2words to deps * Remove unused kwarg and added num_beams=1 as default --------- Co-authored-by: Eren Gölge <egolge@coqui.ai>	2023-11-06 14:58:18 +01:00
Aarni Koskela	38f6f8f0bb	Run `make style` & re-enable it in CI (#3127 )	2023-11-06 11:36:37 +01:00
Julian Weber	cf97116185	XTTS v1.1 (#3089 ) * Add support for ne_hifigan * Update model.json * Update hash * Fix model loading * Enhance text_normalization * Add xtts to zoo test exception * Add model hash check * Add get_number_tokens	2023-10-20 16:02:08 +02:00
David Garvey	a151d70242	Add stdout option (#3027 ) * add add cli options for play and speed --play argument uses simpleaudio to play the tts wav --speed <float 0.0-2.0> passes speed argument to Coqui Studio models * remove simpleaudio not referenced in file * fix simpleaudio dependency version * add ALSA headers for simpleaudio compilation * Dockerfile ALSA headers for simpleaudio * base changes to use stdout instead of play audio Considering conversion to pipe wav data for audio playback with ohter program like aplay. This is incomplete code. Using to get feedback before proceeding with implementation. * remove play for pipe_out arg that suppresses stdout removed play and simpleaudio dependency in place of pipe fuctionality to allow passing wav file data to a program dedicated to playing audio. * scipy.io.wavfile.write fails with /dev/null target * Streaming inference for XTTS 🚀 (#3035) * v0.17.7 * Redownload XTTS with the local and remote config do not match * Remove unused method * Print a message when it is already donwloaded * Try-except to present error when the user dont have connection * Fix style * 0.17.8 * v0.17.8 --------- Co-authored-by: Julian Weber <julian.weber@hotmail.fr> Co-authored-by: Eren Gölge <erogol@hotmail.com> Co-authored-by: Edresson Casanova <edresson1@gmail.com> Co-authored-by: ggoknar <ggoknar@coqui.ai>	2023-10-16 12:07:21 +02:00
Dusty Hagstrom	13cd076a7f	Synthesizer skips over embeddings file if model only has one speaker (#2587 ) * It looks like the Neon model is special in that t does not have a speaker_name and it wants to get the only item available. This was blocking a valid model with one speaker and a d_vector_file from being executed to get the embedding. * Update synthesizer.py oh my how embarrassing	2023-10-16 11:55:45 +02:00
Edresson Casanova	2852404bdf	Fix style	2023-10-06 17:42:46 -03:00
Edresson Casanova	99650044a4	Try-except to present error when the user dont have connection	2023-10-06 17:37:05 -03:00
Edresson Casanova	529ea3f67f	Print a message when it is already donwloaded	2023-10-06 17:26:40 -03:00
Edresson Casanova	ee1ef1c51e	Remove unused method	2023-10-06 17:21:22 -03:00
Edresson Casanova	4a6103fec9	Redownload XTTS with the local and remote config do not match	2023-10-06 17:16:30 -03:00
Eren Gölge	bb05dcb9b4	Merge pull request #2922 from coqui-ai/be_tts Adding Belarusian TTS model	2023-09-27 09:48:28 +02:00
Eren Gölge	9d0b76ce23	Check env var for COQUI_TOS_AGREED	2023-09-14 17:51:40 +02:00
Eren Gölge	ded7fd4fb2	Make style	2023-09-14 15:23:37 +02:00
Eren Gölge	44b61d2b92	Fixup	2023-09-14 15:22:54 +02:00
Eren Gölge	623ea41634	Fix model tests (#2943 )	2023-09-14 15:21:48 +02:00
Eren Gölge	4033db5f4b	🔥 XTTS implementation	2023-09-13 17:51:24 +02:00
Eren Gölge	562a9509f2	Add BE model	2023-09-04 13:57:03 +02:00
Cohee	b3b1555d82	Fix exception handling in manage.py (#2912 )	2023-09-04 12:54:30 +02:00
Jake Tae	409db505d2	Add device support in TTS and Synthesizer (#2855 ) * fix: resolve merge conflicts * fix: retain backwards compatability in functions * feature: utilize device for voice transfer * feature: use device for vocoder * chore: cleanup vocoder cpu logic * fix: add necessary vocoder output device check * fix: add necessary vocoder output device check * fix: indentation * fix: check if waveform is pt tensor before cpu conversion --------- Co-authored-by: Jake Tae <jaketae@Jakes-MacBook-Pro-2.local>	2023-08-14 21:04:44 +02:00
Julian Weber	febcaf710a	Add customizable data home path (#2871 ) * Add customizable data home path * Add TTS_HOME as an option	2023-08-14 21:02:48 +02:00
Eren Gölge	3a104d5c49	Update Studio API for XTTS (#2861 ) * Update Studio API for XTTS * Update the docs * Update README.md * Update README.md Update README	2023-08-13 12:04:12 +02:00
Eren Gölge	17ddd65741	Please p3.11	2023-07-31 15:53:19 +02:00
Aleś Bułojčyk	d124f78430	Recipe for Belarusian TTS (#2756 ) * Changes from jhlfrfufyfn <jhlfrfufyfn@gmail.com> * Recipe for Belarusian TTS --------- Co-authored-by: jhlfrfufyfn <jhlfrfufyfn@gmail.com>	2023-07-31 10:26:21 +02:00
logan hart	6fdb88f8e2	Add Delightful-TTS implementation (#2095 ) * add configs * Update config file * Add model configs * Add model layers * Add layer files * Add layer modules * change config names * Add emotion manager * fIX missing ap bug * Fix missing ap bug * Add base TTS e2e class * Fix wrong variable name in load_tts_samples * Add training script * Remove range predictor and gaussian upsampling * Add helper function * Add vctk recipe * Add conformer docs * Fix linting in conformer.py * Add Docs * remove duplicate import * refactor args * Fix bugs * Removew emotion embedding * remove unused arg * Remove emotion embedding arg * Remove emotion embedding arg * fix style issues * Fix bugs * Fix bugs * Add unittests * make style * fix formatter bug * fix test * Add pyworld compute pitch func * Update requirments.txt * Fix dataset Bug * Chnge layer norm to instance norm * Add missing import * Remove emotions.py * remove ssim loss * Add init layers func to aligner * refactor model layers * remove audio_config arg * Rename loss func * Rename to delightful-tts * Rename loss func * Remove unused modules * refactor imports * replace audio config with audio processor * Add change sample rate option * remove broken resample func * update recipe * fix style, add config docs * fix tests and multispeaker embd dim * remove pyworld * Make style and fix inference * Split tts tests * Fixup * Fixup * Fixup * Add argument names * Set "random" speaker in the model Tortoise/Bark * Use a diff f0_cache path for delightfull tts * Fix delightful speaker handling * Fix lint * Make style --------- Co-authored-by: loganhart420 <loganartpersonal@gmail.com> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-07-24 13:41:26 +02:00
JiangCheng	53938e2d32	Squashed commit of the following: commit `dd612fd72e` Author: JiangCheng <jiangcheng@kezaihui.com> Date: Mon Jun 5 16:04:54 2023 +0800 Failed to download the file and need to delete the created file path	2023-07-05 12:08:05 +02:00
Eren Gölge	34b9a18c47	Fixup	2023-06-28 12:26:04 +02:00
Eren Gölge	6b9ebf5aab	Merge branch 'p3_11' into dev	2023-06-28 12:13:04 +02:00
Eren Gölge	c844b6570a	Inference API for 🐶Bark (#2685 ) * Add bark requirements * Draft Bark implementation * Download HF models * Update synthesizer * Add bark model * Make style * Update pylintrc * Update model URLs * Update Bark Config * Fix here and ther * Make style * Make lint * Update requirements * Update requirements	2023-06-28 11:55:27 +02:00
Eren Gölge	a1c431e6a9	Fixups	2023-06-26 12:55:18 +02:00
Eren Gölge	a58fb6c01b	Update requirements	2023-06-22 13:53:19 +02:00
Eren Gölge	e888e8a56d	Fix manage	2023-06-22 10:13:20 +02:00
Eren Gölge	fff8b762bc	Merge branch 'dev' into bark	2023-06-21 15:49:05 +02:00
Eren Gölge	0f8932a6a9	Fix here and ther	2023-06-21 11:59:27 +02:00
Eren Gölge	f4c88ed677	Make style	2023-06-19 14:22:32 +02:00
Eren Gölge	2364c38d16	Update synthesizer	2023-06-19 14:15:21 +02:00
Eren Gölge	5a31fad502	Download HF models	2023-06-19 14:14:04 +02:00
Eren Gölge	e785d101a1	Port Fairseq TTS models (#2628 ) * Load fairseq models * Add docs and missing files * Managing fairseq models and docs for API * Make style * Use scarf URL * Add tests * Fix URL * Pass cpu * Make lint * Fixup * Make lint * fixup * Fixup * Change tokenization order * Update README * Fixup * Fixup	2023-06-05 11:15:13 +02:00
Shukrullo Turgunov	0d5e68a09f	fix typo (#2647 ) * fix typo * typo fix	2023-06-05 09:58:16 +02:00
manmay nakhashi	a3d5801c44	Tortoise TTS inference (#2547 ) * initial commit * Tortoise inference * revert path change * style fix * remove accidental remove * style fixes * style fixes * removed unwanted assests and deps * remove changes * remove cvvp * style fix black * added tortoise config and updated config and args, refactoring the code * added tortoise to api * Pull mel_norm from url * Use TTS cleaners * Let download model files * add ability to pass tortoise presets through coqui api * fix tests * fix style and tests * fix tts commandline for tortoise * Add config.json to tortoise * Use kwargs * Use regular model api for loading tortoise * Add load from dir to synthesizer * Fix Tortoise floats * Use model_dir when there are multiple urls * Use `synthesize` when exists * lint fixes and resolve preset bug * resolve a download bug and update model link * fix json * do tortoise inference from voice dir * fix * fix test * fix speaker id and remove assests * update inference_tests.yml * replace inference_test.yml * fix extra dir as None * fix tests * remove space * Reformat docstring * Add docs * Update docs * lint fixes --------- Co-authored-by: Eren Gölge <egolge@coqui.ai> Co-authored-by: Eren Gölge <erogol@hotmail.com>	2023-05-16 00:58:21 +02:00
Eren Gölge	9b5822d625	Update VAD for silence trimming. (#2604 ) * Update vad for mp3 and fault tolerance * Make style * Remove importt * Remove stupid defaults	2023-05-11 11:09:23 +02:00


358 Commits