Commit Graph

954 Commits

Author SHA1 Message Date
WeberJulian 5cd750ac7e Fix API and CI 2023-12-11 20:21:53 +01:00
WeberJulian e3c9dab7a3 Make CLI work 2023-12-11 18:49:18 +01:00
WeberJulian a5c0d9780f rename manager 2023-12-11 18:48:31 +01:00
WeberJulian 36143fee26 Add basic speaker manager 2023-12-11 15:25:46 +01:00
Eren Gölge e49c512d99
Merge pull request #3351 from aaron-lii/chinese-puncs
fix pause problem of Chinese speech
2023-12-04 15:57:42 +01:00
Edresson Casanova 5f900f156a
Add XTTS Fine tuning gradio demo (#3296)
* Add XTTS FT demo data processing pipeline

* Add training and inference columns

* Uses tabs instead of columns

* Fix demo freezing issue

* Update demo

* Convert stereo to mono

* Bug fix on XTTS inference

* Update gradio demo

* Update gradio demo

* Update gradio demo

* Update gradio demo

* Add parameters to be able to set then on colab demo

* Add erros messages

* Add intuitive error messages

* Update

* Add max_audio_length parameter

* Add XTTS fine-tuner docs

* Update XTTS finetuner docs

* Delete trainer to freeze memory

* Delete unused variables

* Add gc.collect()

* Update xtts.md

---------

Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-12-01 23:52:23 +01:00
Aaron-Li 7b8808186a fix pause problem of Chinese speech 2023-12-01 23:30:03 +08:00
Enno Hermann 39321d02be
fix: correctly strip/restore initial punctuation (#3336)
* refactor(punctuation): remove orphan code for handling lone punctuation

The case of lone punctuation is already handled at the top of restore(). The
removed if statement would never be called and would in fact raise an
AttributeError because the _punc_index named tuple doesn't have the attribute
`mark`.

* refactor(punctuation): remove unused argument

* fix(punctuation): correctly handle initial punctuation

Stripping and restoring initial punctuation didn't work correctly because the
string-splitting caused an additional empty string to be inserted in the text
list (because `".A".split(".")` => `["", "A"]`). Now, an initial empty string is
skipped and relevant test cases are added.

Fixes #3333
2023-11-30 13:03:16 +01:00
Eren G??lge 3b8894a3dd Make style 2023-11-27 14:15:50 +01:00
Eren G??lge 11ec9f7471 Add hi in config defaults 2023-11-24 15:38:36 +01:00
Eren G??lge 32065139e7 Simple text cleaner for "hi" 2023-11-24 15:14:34 +01:00
Enno Hermann 2af0220996
fix: don't pass quotes to espeak (#3286)
Previously, the text was wrapped in an additional set of quotes that was passed
to Espeak. This could result in different phonemization in certain edges and
caused the insertion of an initial separator "_" that had to be removed.
Compare:
$ espeak-ng -q -b 1 -v en-us --ipa=1 '"A"'
_ˈɐ
$ espeak-ng -q -b 1 -v en-us --ipa=1 'A'
ˈeɪ

Fixes #2619
2023-11-24 12:25:37 +01:00
Edresson Casanova 11283fce07
Ensures that only GPT model is in training mode during XTTS GPT training (#3241)
* Ensures that only GPT model is in training mode during training

* Fix parallel wavegan unit test
2023-11-17 15:13:46 +01:00
Eren G??lge 44880f09ed Make style 2023-11-17 13:43:34 +01:00
Eren G??lge 26efdf6ee7 Make k_diffusion optional 2023-11-17 13:42:33 +01:00
Julian Weber fbc18b8c34
Fix zh bug (#3238) 2023-11-16 17:51:37 +01:00
Julian Weber 675f983550
Add sentence splitting (#3227)
* Add sentence spliting

* update requirements

* update default args v2

* Add spanish

* Fix return gpt_latents

* Update requirements

* Fix requirements
2023-11-16 11:01:11 +01:00
Edresson Casanova 73a5bd08c0
Fix XTTS GPT padding and inference issues (#3216)
* Fix end artifact for fine tuning models

* Bug fix on zh-cn inference

* Remove ununsed code
2023-11-15 14:02:05 +01:00
Julian Weber 04901fb2e4
Add speed control for inference (#3214)
* Add speed control for inference

* Fix XTTS tests

* Add speed control tests
2023-11-14 16:07:17 +01:00
Eren Gölge ac3df409a6
Merge pull request #3208 from coqui-ai/fix_max_mel_len
fix max generation length for XTTS
2023-11-13 14:32:56 +01:00
Eren G??lge 92fa988aec Fixup 2023-11-13 13:44:06 +01:00
WeberJulian b85536b23f fix max generation length 2023-11-13 13:18:45 +01:00
Eren G??lge b2682d39c5 Make style 2023-11-13 13:01:01 +01:00
Eren G??lge a16360af85 Implement chunking gpt_cond 2023-11-13 13:00:08 +01:00
Enno Hermann 3b1e7038bc
fix(formatters): set missing root_path attribute (#3182)
Fixes #2778
2023-11-09 16:49:52 +01:00
Aarni Koskela a8e9163fb3
xtts/tokenizer: merge duplicate implementations of preprocess_text (#3170)
This was found via ruff:

> F811 Redefinition of unused `preprocess_text` from line 570
2023-11-09 16:32:12 +01:00
Matthew Boakes 1b9c400bca
PyTorch 2.1 Updates (Weight Norm and TorchAudio I/O) (#3176)
* Replaced PyTorch weight_norm With parametrizations.weight_norm

* TorchAudio: Migrating The I/O Functions To Use The Dispatcher Mechanism

* Corrected Code Style

---------

Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-11-09 16:31:03 +01:00
Gorkem 66a1e248d0
torchaudio should use proper backend to load audio (#3179) 2023-11-09 16:28:39 +01:00
Julian Weber 03ad90135b
Add lang code in XTTS doc (#3158)
* Add lang code in XTTS doc

* Remove ununsed config and args

* update docs

* woops
2023-11-08 13:47:33 +01:00
Gorkem 78a596618a
Fix for exception on streaming if last chunk empty (#3160) 2023-11-08 11:32:02 +01:00
Julian Weber ce1a39a9a4
Add char limit warn (#3130)
* Add char limit warning

* Adding v2 langs

* cached_property for cutlet

* Fix import
2023-11-08 10:24:23 +01:00
Edresson Casanova 5f9ab6cfaa
Fix style
Co-authored-by: Aarni Koskela <akx@iki.fi>
2023-11-06 19:22:34 -03:00
Edresson Casanova 09fb317e6d Remove unused code 2023-11-06 17:36:32 -03:00
Edresson Casanova b146de4ce8 Bug fix on XTTS v2.0 Trainer 2023-11-06 20:26:01 +01:00
Edresson Casanova 1b6f8d0e46 Update unit tests and recipes 2023-11-06 20:25:06 +01:00
Edresson Casanova 72b2bac0f8 Load reference in 24khz to avoid issued with multiple sr references 2023-11-06 20:25:06 +01:00
Edresson Casanova 00294ffdf6 Update XTTS docs 2023-11-06 20:24:06 +01:00
Edresson Casanova 459ad70dc8 Add support for multiples speaker references on XTTS inference 2023-11-06 20:22:35 +01:00
Eren Gölge f0cb19ecca
Drop diffusion from XTTS (#3150)
* Drop diffusion for XTTS

* Make style

* Drop diffusion deps in code

* Restore thrashed
2023-11-06 20:15:49 +01:00
Eren G??lge 5d418bb84a Update docs 2023-11-06 18:48:41 +01:00
Eren G??lge 9bbf6eb8dd Drop use_ne_hifigan 2023-11-06 18:43:38 +01:00
Eren G??lge 9d54bd7655 Fixup XTTS 2023-11-06 18:13:58 +01:00
Edresson Casanova e45227d9ff
XTTS v2.0 (#3137)
* Implement most similar ref training approach

* Use non-enhanced hifigan for test samples

* Add Perceiver

* Update GPT Trainer for perceiver support

* Update XTTS docs

* Bug fix masking with XTTS perceiver

* Bug fix on gpt forward

* Bug Fix on XTTS v2.0 training

* Add XTTS v2.0 unit tests

* Add XTTS v2.0 inference unit tests

* Bug Fix on diffusion inference

* Add XTTS v2.0 training recipe

* Placeholder model entry

* Add cloning params to config

* Make prompt embedding configurable

* Make cloning configurable

* Cheap fix for a cheaper fix

* Prevent resampling

* Update model entry

* Update docs

* Update requirements

* Code linting

* Add xtts v2 to sep tests

* Bug fix on XTTS get_gpt_cond_latents

* Bug fix on rebase

* Make style

* Bug fix in Japenese tokenizer

* Add num2words to deps

* Remove unused kwarg and added num_beams=1 as default

---------

Co-authored-by: Eren G??lge <egolge@coqui.ai>
2023-11-06 14:58:18 +01:00
Aarni Koskela 38f6f8f0bb
Run `make style` & re-enable it in CI (#3127) 2023-11-06 11:36:37 +01:00
WeberJulian 1c98821359 Remove unused load_audio function 2023-10-27 22:27:18 +02:00
WeberJulian d4e08c8d6c Add features to get_conditioning_latents 2023-10-26 14:57:33 +02:00
WeberJulian c1133724a1 Move lang token add to tokenizer 2023-10-26 14:52:13 +02:00
WeberJulian 6fa46d197d Fix get_conditioning_latents when using only ne 2023-10-26 14:51:35 +02:00
Edresson Casanova 01839af926 Bug fix on XTTS masking training 2023-10-24 18:30:14 -03:00
Edresson Casanova 0f96abb5ec Add FT inference example on XTTS docs 2023-10-23 13:23:30 -03:00