Commit Graph

4598 Commits

Author SHA1 Message Date
Eren Gölge 32065139e7 Simple text cleaner for "hi" 2023-11-24 15:14:34 +01:00
Eren Gölge 6dd43b0ce2 Update to XTTS v2.0.3 2023-11-24 14:36:04 +01:00
Julian Weber a55755c8df
update deepspeed version (#3281) 2023-11-24 12:35:49 +01:00
Kaszanas 1bf5926196
Introducing Development Dockerfile (#3263)
* Moved Dockerfile, COPY at the end

This change should prevent re-installation of the dependencies upon
every change of the repository's contents. Typically if Docker detects
that something changed in a layer, all downstream layers are invalidated
and rebuilt.

* Moved Dockerfile back to main directory

Main dockerfile in a separate directory can cause issues with the
current CI/CD setup. This can be a good change for later.

* Introduced Dockerfile.dev, updated CONTRIBUTING

Dockerfile.dev can be used as a separate development environment for
anyone that does not wish to install the dependencies locally.
2023-11-24 12:30:15 +01:00
TITC 4d0f53d2ee
Misjudgment of `is_multi_lingual` When Loading Multilingual Model via `model_path` (#3273)
* load multilingual model by path

* use config to assert multi lingual or not
2023-11-24 12:28:31 +01:00
Enno Hermann 8c5227ed84
Fix tts_with_vc (#3275)
* Revert "fix for issue 3067"

This reverts commit 041b4b6723.

Fixes #3143. The original issue (#3067) was people trying to use
tts.tts_with_vc_to_file() with XTTS and was "fixed" in #3109. But XTTS has
integrated VC and you can just do tts.tts_to_file(..., speaker_wav="..."), there
is no point in passing it through FreeVC afterwards. So, reverting this commit
because it breaks tts.tts_with_vc_to_file() for any model that doesn't have
integrated VC, i.e. all models this method is meant for.

* fix: support multi-speaker models in tts_with_vc/tts_with_vc_to_file

* fix: only compute spk embeddings for models that support it

Fixes #1440. Passing a `speaker_wav` argument to regular Vits models failed
because they don't support voice cloning. Now that argument is simply ignored.
2023-11-24 12:26:37 +01:00
Enno Hermann 2af0220996
fix: don't pass quotes to espeak (#3286)
Previously, the text was wrapped in an additional set of quotes that was passed
to Espeak. This could result in different phonemization in certain edge cases and
caused the insertion of an initial separator "_" that had to be removed.
Compare:
$ espeak-ng -q -b 1 -v en-us --ipa=1 '"A"'
_ˈɐ
$ espeak-ng -q -b 1 -v en-us --ipa=1 'A'
ˈeɪ

Fixes #2619
2023-11-24 12:25:37 +01:00
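
The quoting issue described in the commit above can be illustrated with a minimal sketch. The helper names (`build_espeak_args`, `build_espeak_args_quoted`, `phonemize`) are hypothetical and are not the project's actual code; only the `espeak-ng` flags are taken from the commit message. The assumption is that the text is passed as a separate argv element, so no shell is involved and no extra quoting is needed.

```python
import subprocess
from typing import List


def build_espeak_args(text: str, voice: str = "en-us") -> List[str]:
    """Hypothetical helper: pass the raw text as its own argv element."""
    # The flags mirror the command shown in the commit message.
    return ["espeak-ng", "-q", "-b", "1", "-v", voice, "--ipa=1", text]


def build_espeak_args_quoted(text: str, voice: str = "en-us") -> List[str]:
    """Hypothetical reproduction of the old behaviour, for comparison only."""
    # The literal quote characters become part of the text espeak-ng receives.
    return ["espeak-ng", "-q", "-b", "1", "-v", voice, "--ipa=1", f'"{text}"']


def phonemize(text: str, voice: str = "en-us") -> str:
    """Run espeak-ng (must be installed) and return its IPA output."""
    result = subprocess.run(
        build_espeak_args(text, voice),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

With the argument-list form, `build_espeak_args("A")[-1]` is the bare string `A`, whereas the quoted variant hands espeak-ng the three characters `"A"`, which corresponds to the two shell invocations compared in the commit message.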
Enno Hermann 4a2684be34
fix(bin.synthesize): more informative error for wrong --language argument (#3294)
In multilingual models, the target language is specified via the
`--language_idx` argument. However, the `tts` CLI also accepts a `--language`
argument for use with Coqui Studio, so it is easy to choose the wrong one,
resulting in the following confusing error at synthesis time:

```
AssertionError:   Language None is not supported. Supported languages are
['en', 'es', 'fr', 'de', 'it', 'pt', 'pl', 'tr', 'ru', 'nl', 'cs', 'ar',
'zh-cn', 'hu', 'ko', 'ja']
```

This commit adds a better error message when `--language` is passed for a
non-studio model.

Fixes #3270, fixes #3291
2023-11-24 12:24:42 +01:00
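
The kind of early check described in the commit above might look roughly like the following sketch. Only the `--language` and `--language_idx` argument names come from the commit message; the function names, the `is_studio_model` flag, and the error wording are assumptions made for illustration and do not reproduce the actual change.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Hypothetical, reduced version of a synthesis CLI parser."""
    parser = argparse.ArgumentParser(prog="tts-sketch")
    parser.add_argument("--language", default=None,
                        help="Target language for Coqui Studio models.")
    parser.add_argument("--language_idx", default=None,
                        help="Target language for multilingual models.")
    return parser.parse_args(argv)


def check_language_args(args: argparse.Namespace, is_studio_model: bool) -> None:
    """Fail early with a clear message instead of a late AssertionError."""
    # Assumption: the caller already knows whether a Studio model was selected.
    if args.language is not None and not is_studio_model:
        raise ValueError(
            "--language is only supported for Coqui Studio models. "
            "Use --language_idx to specify the target language "
            "for multilingual models."
        )
```

Calling `check_language_args(parse_args(["--language", "en"]), is_studio_model=False)` raises a `ValueError` that points to `--language_idx`, while passing `--language_idx en` goes through without error.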
Tessa Painter 64f391b583
Made the tqdm `progress_bar` objects of static download methods a static class variable (#3297) 2023-11-24 12:23:59 +01:00
Eren Gölge b47d9c6e36
Merge pull request #3243 from idiap/checkpoints
Remove duplicate/unused code
2023-11-22 23:52:06 +01:00
Eren Gölge 29dede20d3
Merge pull request #3249 from coqui-ai/run_ci_for_v0.20.6
Run CI for v0.20.6
2023-11-17 15:45:26 +01:00
Eren Gölge c011ab7455 Update to v0.20.6 2023-11-17 15:16:32 +01:00
Eren Gölge 52cb1e2f68 Update model hash for v2.0.2 2023-11-17 15:16:32 +01:00
Edresson Casanova 6075fa208c Ensures that only GPT model is in training mode during XTTS GPT training (#3241)
* Ensures that only GPT model is in training mode during training

* Fix parallel wavegan unit test
2023-11-17 15:15:22 +01:00
Eren Gölge a3279f9294 Make style 2023-11-17 15:15:22 +01:00
Eren Gölge f21067a84a Make k_diffusion optional 2023-11-17 15:15:21 +01:00
Eren Gölge 44494daa27 Update CI version 2023-11-17 15:15:21 +01:00
Eren Gölge c864acf2b7 Update versions 2023-11-17 15:15:21 +01:00
Edresson Casanova 11283fce07
Ensures that only GPT model is in training mode during XTTS GPT training (#3241)
* Ensures that only GPT model is in training mode during training

* Fix parallel wavegan unit test
2023-11-17 15:13:46 +01:00
Eren Gölge 14579a4607
Merge pull request #3248 from coqui-ai/slacker_deps
Update versions
2023-11-17 15:13:19 +01:00
Eren Gölge 44880f09ed Make style 2023-11-17 13:43:34 +01:00
Eren Gölge 26efdf6ee7 Make k_diffusion optional 2023-11-17 13:42:33 +01:00
Eren Gölge 08d11e9198 Update CI version 2023-11-17 13:01:32 +01:00
Eren Gölge 63d7145647 Update versions 2023-11-17 12:10:46 +01:00
Enno Hermann 0fb0d67de7 refactor: use save_checkpoint()/save_best_model() from Trainer 2023-11-17 01:18:23 +01:00
Enno Hermann 96678c7ba2 refactor: use copy_model_files() from Trainer 2023-11-17 01:18:23 +01:00
Enno Hermann 5119e651a1 chore(utils.io): remove unused code
These are all available in Trainer.
2023-11-17 01:18:23 +01:00
Enno Hermann 39fe38bda4 refactor: use save_fsspec() from Trainer 2023-11-17 01:18:23 +01:00
Enno Hermann fdf0c8b10a chore(encoder): remove unused code 2023-11-17 01:18:23 +01:00
Eren Gölge 7e4375da2b
Update to v0.20.6 2023-11-16 17:52:13 +01:00
Julian Weber fbc18b8c34
Fix zh bug (#3238) 2023-11-16 17:51:37 +01:00
Julian Weber 675f983550
Add sentence splitting (#3227)
* Add sentence splitting

* update requirements

* update default args v2

* Add spanish

* Fix return gpt_latents

* Update requirements

* Fix requirements
2023-11-16 11:01:11 +01:00
Enno Hermann 3c2d5a9e03
Remove duplicate AudioProcessor code and fix ExtractTTSpectrogram.ipynb (#3230)
* chore: remove unused argument

* refactor(audio.processor): remove duplicate stft+griffin_lim

* chore(audio.processor): remove unused compute_stft_paddings

Same function available in numpy_transforms

* refactor(audio.processor): remove duplicate db_to_amp

* refactor(audio.processor): remove duplicate amp_to_db

* refactor(audio.processor): remove duplicate linear_to_mel

* refactor(audio.processor): remove duplicate mel_to_linear

* refactor(audio.processor): remove duplicate build_mel_basis

* refactor(audio.processor): remove duplicate stft_parameters

* refactor(audio.processor): use pre-/deemphasis from numpy_transforms

* refactor(audio.processor): use rms_volume_norm from numpy_transforms

* chore(audio.processor): remove duplicate assert

Already checked in numpy_transforms.compute_f0

* refactor(audio.processor): use find_endpoint from numpy_transforms

* refactor(audio.processor): use trim_silence from numpy_transforms

* refactor(audio.processor): use volume_norm from numpy_transforms

* refactor(audio.processor): use load_wav from numpy_transforms

* fix(bin.extract_tts_spectrograms): set quantization bits

* fix(ExtractTTSpectrogram.ipynb): adapt to current TTS code

Fixes #2447, #2574

* refactor(audio.processor): remove duplicate quantization methods
2023-11-16 10:57:06 +01:00
Eren Gölge 88630c60e5
Update to v0.20.5 2023-11-15 14:02:51 +01:00
Edresson Casanova 73a5bd08c0
Fix XTTS GPT padding and inference issues (#3216)
* Fix end artifact for fine tuning models

* Bug fix on zh-cn inference

* Remove unused code
2023-11-15 14:02:05 +01:00
Ikko Eltociear Ashimine 15f0ac57d6
Update README.md (#3215)
Dicord -> Discord
2023-11-15 13:59:56 +01:00
Julian Weber 04901fb2e4
Add speed control for inference (#3214)
* Add speed control for inference

* Fix XTTS tests

* Add speed control tests
2023-11-14 16:07:17 +01:00
Eren Gölge d96f3885d5
Update to v0.20.4 2023-11-13 17:07:25 +01:00
Eren Gölge ac3df409a6
Merge pull request #3208 from coqui-ai/fix_max_mel_len
fix max generation length for XTTS
2023-11-13 14:32:56 +01:00
Eren Gölge f32a465711
Merge pull request #3207 from coqui-ai/update_xtts_cloning
Update XTTS cloning
2023-11-13 14:32:43 +01:00
Eren Gölge 92fa988aec Fixup 2023-11-13 13:44:06 +01:00
WeberJulian b85536b23f fix max generation length 2023-11-13 13:18:45 +01:00
Eren Gölge b2682d39c5 Make style 2023-11-13 13:01:01 +01:00
Eren Gölge a16360af85 Implement chunking gpt_cond 2023-11-13 13:00:08 +01:00
Eren Gölge 6f1cba2f81
Update to v0.20.3 2023-11-09 17:41:37 +01:00
Enno Hermann 3b1e7038bc
fix(formatters): set missing root_path attribute (#3182)
Fixes #2778
2023-11-09 16:49:52 +01:00
Aarni Koskela a8e9163fb3
xtts/tokenizer: merge duplicate implementations of preprocess_text (#3170)
This was found via ruff:

> F811 Redefinition of unused `preprocess_text` from line 570
2023-11-09 16:32:12 +01:00
Matthew Boakes 1b9c400bca
PyTorch 2.1 Updates (Weight Norm and TorchAudio I/O) (#3176)
* Replaced PyTorch weight_norm With parametrizations.weight_norm

* TorchAudio: Migrating The I/O Functions To Use The Dispatcher Mechanism

* Corrected Code Style

---------

Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-11-09 16:31:03 +01:00
Gorkem 66a1e248d0
torchaudio should use proper backend to load audio (#3179) 2023-11-09 16:28:39 +01:00
Eren Gölge 46d9c27212
Update to v0.20.2 2023-11-08 16:07:56 +01:00