Edresson Casanova
8d228ab22a
Trick for upsampling to high sampling rates using the VITS model ( #1456 )
...
* Add upsample VITS support
* Fix the bug in inference
* Fix lint checks
* Add RMS based norm in save_wav method
* Style fix
* Add the period for VITS multi-period discriminator in model_args
* Bug fix in speaker encoder load in inference time
* Add unit tests
* Remove useless detach_z_vocoder parameter
* Add docs for VITS upsampling
* Fix the docs
* Rename TTS_part_sample_rate to encoder_sample_rate
* Add upsampling_init and upsampling_z methods
* Add asserts for encoder_sample_rate part
* Move upsampling tests to test_vits.py
2022-04-26 11:47:46 +02:00
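The "RMS based norm in save_wav" item above names the technique but not its details. A minimal sketch of RMS-based level normalization, assuming float samples in [-1.0, 1.0] and a target level in dB relative to full scale, might look like this. The function name `rms_normalize` and the default of -27 dB are illustrative assumptions, not taken from the repository.

```python
import math


def rms_normalize(samples, target_db=-27.0):
    """Hypothetical helper: scale samples so their RMS equals target_db.

    target_db is in dB relative to a full-scale amplitude of 1.0;
    the default value is an arbitrary illustrative choice.
    """
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        # Pure silence cannot be scaled to a target level; return it unchanged.
        return list(samples)
    target_rms = 10.0 ** (target_db / 20.0)
    gain = target_rms / rms
    return [s * gain for s in samples]
```

Applying a single gain to the whole waveform keeps its shape intact and only changes its overall loudness, which is why an RMS target is a common choice before writing audio to disk.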
Edresson Casanova
060e0f9368
Add EmbeddingManager and BaseIDManager ( #1374 )
2022-03-31 13:41:16 +02:00
WeberJulian
c66a6241fd
Enforce phonemizer definition for synthesis ( #1441 )
...
* Enforce phonemizer definition for synthesis
* Fix train_tts, tokenizer init can now edit config
* Add small change to trigger CI pipeline
* fix wrong output path for one tts_test
* Fix style
* Test config overrides by args and tokenizer
* Fix style
2022-03-25 23:15:33 +01:00
Edresson Casanova
37896e1743
Bug fix in freeze encoder ( #1391 )
...
* Fix the bug in freeze encoder
* Remove emb_l definition for non-multilingual training
* Fix unit tests
2022-03-24 18:16:04 +01:00
Eren Gölge
72d85e53c9
Update model file extension ( #1422 )
...
* Update model file ext to `.pth`
* Update docs
* Rename more
* Find model files
2022-03-22 17:55:00 +01:00
Eren Gölge
0870a4faa2
Make style ( #1405 )
2022-03-16 12:13:55 +01:00
Edresson Casanova
f81892483d
REBASED: Transform Speaker Encoder in a Generic Encoder and Implement Emotion Encoder training support ( #1349 )
...
* Rename Speaker encoder module to encoder
* Add a generic emotion dataset formatter
* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config
* Add class map in emotion config
* Add Base encoder config
* Add evaluation encoder script
* Fix the bug in plot_embeddings
* Enable Weight decay for encoder training
* Add argument to disable storage
* Add Perfect Sampler and remove storage
* Add evaluation during encoder training
* Fix lint checks
* Remove useless config parameter
* Activate evaluation in speaker encoder test and use multispeaker dataset for this test
* Unit test fixes
* Remove useless tests to speed up the aux_tests
* Use get_optimizer in Encoder
* Add BaseEncoder Class
* Fix the unit tests
* Add Perfect Batch Sampler unit test
* Add compute encoder accuracy in a function
2022-03-11 14:43:40 +01:00
Edresson Casanova
917f417ac4
Add alphas to control language and speaker balancer ( #1216 )
...
* Add alphas to control language and speaker balancer
* Add docs for speaker and language samplers
* Change the sampler weights to float to save memory
* Change the test_samplers to unittest format
* Add get_sampler method in BaseTTS
* Fix rebase issues
* Add language and speaker samplers support for DDP training
* Rename distributed sampler wrapper
* Remove the DistributedSamplerWrapper and use the one from Trainer
* Bugfix after rebase
* Move the samplers config to tts config
2022-03-10 14:56:09 +01:00
Eren Gölge
1425a023fe
Make style and lint
2022-03-02 13:25:35 +01:00
Eren Gölge
27b67b7945
Fix import
2022-03-02 09:15:20 +01:00
Eren Gölge
690de1ab06
Update Characters and add more tests
2022-02-25 11:32:44 +01:00
Eren Gölge
14c117978d
Fix return outputs
2022-02-25 11:31:56 +01:00
Eren Gölge
424d04e4f6
Make style
2022-02-25 11:31:56 +01:00
Eren Gölge
c0b40a0cb7
Update VITS tests
2022-02-25 11:31:20 +01:00
Eren Gölge
b0cff949f5
Update tests
2022-02-25 11:28:14 +01:00
Eren Gölge
1f0c8179da
Make style
2022-02-25 11:26:59 +01:00
Eren Gölge
ef63c99524
Implement `start_by_longest` option for TTSDataset
2022-02-25 11:26:18 +01:00
Eren Gölge
c4c471d61d
Allow padding for shorter segments
2022-02-25 11:25:48 +01:00
Eren Gölge
bc2243bac4
Fix tests
2022-02-25 11:25:00 +01:00
Eren Gölge
21940952bf
Make lint
2022-02-25 11:25:00 +01:00
Eren Gölge
146fbfd7c9
Extend unittests
2022-02-25 11:25:00 +01:00
Eren Gölge
2fe16de8e3
Make lint
2022-02-25 11:25:00 +01:00
Eren Gölge
d0eb3e4ef2
Add get_tests_data_path
2022-02-25 11:24:13 +01:00
Eren Gölge
235f7d9b02
Extend glow_tts model tests
2022-02-25 11:24:13 +01:00
Eren Gölge
5176ae9e53
Fixes small compat. issues
2022-02-25 11:21:19 +01:00
Eren Gölge
edec27738b
Delete `use_espeak_phonemes` from tests
2022-02-25 11:18:00 +01:00
Eren Gölge
0a47a7eac0
Update tests
2022-02-25 11:12:44 +01:00
Eren Gölge
b341951b78
Update loader tests
2022-02-25 11:12:44 +01:00
Eren Gölge
196ae74273
Update data loader tests
2022-02-25 11:05:06 +01:00
Eren Gölge
75c507c36a
Update VITS LJspeech recipe
2022-02-25 10:57:35 +01:00
Eren Gölge
04202da1ac
Make style
2022-02-25 10:48:03 +01:00
Eren Gölge
961e98a461
Add OOV case to tokenizer tests
2022-02-25 10:48:03 +01:00
Eren Gölge
8c8093ce23
Make style
2022-02-25 10:48:03 +01:00
Eren Gölge
f1ea3ad182
Remove old text processing tests
2022-02-25 10:48:02 +01:00
Eren Gölge
ba3b60c90f
Test TTSTokenizer
2022-02-25 10:48:02 +01:00
Eren Gölge
79a84410f2
Test punctuations
2022-02-25 10:48:02 +01:00
Eren Gölge
99d9bb7a17
Test Phonemizers
2022-02-25 10:48:02 +01:00
Eren Gölge
a1df4f9887
Test character classes
2022-02-25 10:45:24 +01:00
Eren Gölge
a51b031bff
Merge branch 'dev' into dev-fix-glowtts-infer
2022-02-21 12:01:40 +03:00
Edresson Casanova
28a7464975
Fix the bug in split dataset function ( #1251 )
...
* Fix the bug in split_dataset
* Make eval_split_size configurable
* Change test_loader to use load_tts_samples function
* Rename eval_split_portion to eval_split_size and allow setting an absolute number of eval samples
* Fix samplers unit test
* Add data unit test on GitHub workflow
2022-02-21 11:59:36 +03:00
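The `eval_split_size` change above can be illustrated with a small sketch. It assumes that a value greater than 1 is read as an absolute number of evaluation samples and that a value of 1 or less is read as a fraction of the dataset. The function name `split_samples`, its arguments, and its defaults are hypothetical and not taken from the repository.

```python
import random


def split_samples(items, eval_split_size=0.01, seed=0):
    """Hypothetical helper: split items into (eval, train) lists.

    Assumption: eval_split_size > 1 is an absolute number of eval samples,
    otherwise it is a fraction of len(items).
    """
    if eval_split_size > 1:
        n_eval = int(eval_split_size)
    else:
        n_eval = int(len(items) * eval_split_size)
    if not 0 < n_eval < len(items):
        raise ValueError(
            f"eval_split_size={eval_split_size} gives {n_eval} eval samples "
            f"for {len(items)} items; need at least 1 and fewer than all."
        )
    shuffled = list(items)
    # A seeded local generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_eval], shuffled[n_eval:]
```

For example, `split_samples(samples, 500)` would hold out 500 samples, while `split_samples(samples, 0.25)` would hold out a quarter of them. Raising an error on an empty or all-consuming eval split surfaces a misconfigured value early instead of failing later during training.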
Edresson Casanova
531821545e
Fix inference test issue
2022-02-19 12:21:32 +00:00
Edresson Casanova
5218d6b7a4
Fix unit tests issue
2022-02-19 12:15:03 +00:00
Edresson Casanova
fc7081fc5e
Add Inference test using TTS API in all models unit tests
2022-02-18 21:06:08 +00:00
Edresson Casanova
5cca4aa8ae
Add FastPitch Speaker embedding train unit test
2022-02-18 20:16:52 +00:00
Edresson Casanova
759f9ac76a
Add Glow-TTS d-vectors training unit test
2022-02-18 20:03:36 +00:00
Edresson Casanova
06cad27e31
Add Glow-TTS multi-speaker unit test
2022-02-18 18:20:47 +00:00
Eren Gölge
127118c637
Update TTS.tts formatters ( #1228 )
...
* Return Dict from tts formatters
* Make style
2022-02-11 23:03:43 +01:00
Edresson Casanova
0860d73cf8
Remove TensorFlow requirement ( #1225 )
...
* Remove TF modules
* Remove TF unit tests
* Remove TF vocoder modules
* Remove TF convert scripts
* Remove TF requirement
* Remove the Docs TF instructions
* Remove TF inference support
2022-02-10 16:14:54 +01:00
Eren Gölge
8fd1ee1926
Print urls when BadZipError
2022-01-01 15:26:35 +00:00
Eren Gölge
254c110ec1
Print testing model
2022-01-01 13:57:01 +00:00
Eren Gölge
61874bc0a0
Fix your_tts inference from the listed models
2021-12-31 13:45:05 +00:00
Eren Gölge
36cef5966b
Fix resnet speaker encoder
2021-12-30 15:36:35 +00:00
Eren Gölge
348b5c96a2
Fix speaker encoder test
2021-12-30 15:36:35 +00:00
Eren Gölge
497332bd46
Add custom asserts to tests
2021-12-30 14:08:17 +00:00
Eren Gölge
2033e17c44
Add VITS model tests
2021-12-29 16:51:40 +00:00
Eren Gölge
56378b12f7
Fix speaker encoder init
2021-12-21 12:26:25 +00:00
Eren Gölge
704dddcffa
Make style
2021-12-20 11:54:10 +00:00
WeberJulian
8b3769c957
Fix seed in test_samplers to avoid random fails
2021-12-20 11:54:10 +00:00
WeberJulian
6f01eed672
Add test for language_weighted_sampler
2021-12-20 11:54:10 +00:00
Edresson
a57ddfb4ec
Add remove silence vad script Unit test
2021-12-20 11:54:10 +00:00
Edresson
e068fab6b2
Add find unique phonemes unit tests
2021-12-20 11:54:10 +00:00
WeberJulian
54e33bff61
Make a multilingual test use chars
2021-12-20 11:54:10 +00:00
WeberJulian
09eda31a3f
Fix tests
2021-12-20 11:54:10 +00:00
Edresson
06d89f93a8
Add VITS multilingual d-vectors unit test
2021-12-20 11:54:10 +00:00
Edresson
f394d60695
Fix the bug in multispeaker vits
2021-12-20 11:54:10 +00:00
WeberJulian
1472b6df49
make style
2021-12-20 11:54:10 +00:00
WeberJulian
3b5592abcf
fix test vits
2021-12-20 11:54:10 +00:00
Edresson
bbdb5c38e6
Add VITS multispeaker train unit test
2021-12-20 11:54:09 +00:00
Edresson
92f7f4f400
Activate the multispeaker mode in multilingual training
2021-12-20 11:54:09 +00:00
Edresson
e68b042493
Add VITS d-vector unit test
2021-12-20 11:54:09 +00:00
Edresson
959cc8f03c
Add VITS multilingual unit test
2021-12-20 11:54:09 +00:00
Edresson
3fbbebd74d
Fix pylint issues
2021-12-20 11:54:09 +00:00
Michael Hansen
3bc043faeb
Upgrade to gruut 2.0 ( #882 )
2021-10-31 11:41:55 +01:00
Eren Gölge
2df0752e73
Model zoo tests ( #900 )
...
* Fix VITS model multi-speaker init
* Remove gdrive support in model manager
* Add model zoo tests
2021-10-29 17:54:16 +02:00
Eren Gölge
25759d6a61
Split tests
2021-10-21 17:30:15 +00:00
Eren Gölge
e62d3c5cf7
Use absolute imports for tts configs and models
2021-10-21 16:29:06 +00:00
Eren Gölge
4dbe7ed0de
Fix all-zero duration case for GlowTTS
2021-10-01 09:24:26 +00:00
Eren Gölge
7edbe04fe0
Fix WaveRNN config and test
2021-09-30 16:20:12 +00:00
Eren Gölge
4cacbf0d45
Fix WaveRNN test
2021-09-30 14:47:56 +00:00
Eren Gölge
2766dd1d6e
Fix #813 - GlowTTS training ( #814 )
...
* Fix #813
* Update glow_tts recipe
* Fix glow-tts test
* Linter fix
* Run data dep init only in training
2021-09-17 20:06:55 +02:00
Eren Gölge
1e7db32e90
Test FastPitch train
2021-09-11 10:19:47 +00:00
Eren Gölge
26f76fce22
Remove SpeedySpeech from .models.json
2021-09-10 17:47:27 +00:00
Eren Gölge
7ec23e69d4
Skip TF tests on GPU
2021-09-10 17:28:58 +00:00
Eren Gölge
1ebf9ec6bf
Remove speedy_speech implementation
2021-09-10 17:28:20 +00:00
Eren Gölge
7d8f77385a
Use `glow-tts` in synthesis tests
2021-09-10 17:27:33 +00:00
Eren Gölge
d6e29ef98a
Style update
2021-09-10 08:30:33 +00:00
Eren Gölge
3abc3a1d32
Fix GPU init in tests
2021-09-10 08:28:10 +00:00
Eren Gölge
ed4b1d8514
Test `TTS.tts.utils.helpers`
2021-09-10 08:25:21 +00:00
Eren Gölge
8b7e094bde
Implement `forward_tts`
...
- Generic API for feed-forward TTS models (FastPitch, SpeedySpeech)
- Tests for `forward-tts`
- Edit FastPitchConfig and SpeedySpeechConfig to use `forward_tts`
2021-09-10 08:24:33 +00:00
Eren Gölge
4761853c5c
Fix imports
2021-09-08 13:34:40 +00:00
Eren Gölge
e72c265cd4
Fix linter issues
2021-09-06 15:16:58 +00:00
Eren Gölge
fd287aa438
Update loader tests for dict return
2021-09-06 15:16:58 +00:00
Eren Gölge
2c4bbbf9b9
Use pyworld for pitch
2021-09-06 15:16:58 +00:00
Eren Gölge
076d0cb258
Add tests for certain FastPitch functions
2021-09-06 15:16:58 +00:00
Eren Gölge
d63a6bb690
Set BaseDatasetConfig for tests
2021-09-06 15:16:58 +00:00
Eren Gölge
fba257104d
Compute F0 using librosa
2021-09-06 15:16:58 +00:00
Katsuya Iida
165e5814af
Update Japanese phonemizer ( #758 )
...
* Update default ja vocoder
* update
* Japanese phonemizer test
* Run make style
Co-authored-by: Eren Gölge <egolge@coqui.ai>
2021-09-01 09:33:15 +02:00
Eren Gölge
f186856e5d
Add option to sort input sequence by audio len
2021-08-30 08:10:35 +00:00
Eren Gölge
c312acac7d
Implement VITS model 🚀
...
VITS model implementation built on Glow TTS and HiFiGAN layers.
2021-08-09 18:02:36 +00:00
Eren Gölge
003e5579e8
Enable `custom_symbols` in text processing
...
Models can define their own custom symbols lists with custom
`make_symbols()`
2021-08-09 18:02:36 +00:00
Eren Gölge
e4648ffef1
Fix multi-speaker init of Tacotron models & tests
2021-08-09 18:02:36 +00:00
Agrin Hilmkil
ced4cfdbbf
Allow saving / loading checkpoints from cloud paths ( #683 )
...
* Allow saving / loading checkpoints from cloud paths
Allows saving and loading checkpoints directly from cloud paths like
Amazon S3 (s3://) and Google Cloud Storage (gs://) by using fsspec.
Note: The user will have to install the relevant dependency for each
protocol. Otherwise fsspec will fail and specify which dependency is
missing.
* Append suffix _fsspec to save/load function names
* Add a lower bound to the fsspec dependency
Skips the 0 major version.
* Add missing changes from refactor
* Use fsspec for remaining artifacts
* Add test case with path requiring fsspec
* Avoid writing logs to file unless output_path is local
* Document the possibility of using paths supported by fsspec
* Fix style and lint
* Add missing lint fixes
* Add type annotations to new functions
* Use Coqpit method for converting config to dict
* Fix type annotation in semi-new function
* Add return type for load_fsspec
* Fix bug where fs not always created
* Restore the experiment removal functionality
2021-08-09 18:02:36 +00:00
Eren Gölge
75b201c6c1
Merge pull request #673 from coqui-ai/fix_stopnet
...
Fix stopnet training for Tacotron models
2021-07-24 12:25:38 +02:00
Eren Gölge
fc0c4600bd
Fix stopnet training
2021-07-24 11:39:54 +02:00
Eren Gölge
30eed347b6
Merge pull request #581 from Edresson/dev
...
Compute speaker embeddings in batch for the LSTM Speaker Encoder, and compute embeddings / find chars using the config file.
2021-07-23 17:22:51 +02:00
WeberJulian
c79a82ed07
refix linter
2021-07-13 23:12:18 +02:00
Edresson
2e5baffa9c
Merge fix and eval split as argparse
2021-07-13 01:47:32 -03:00
Edresson
4eac1c4651
bug fix on train_encoder and unit tests
2021-07-11 12:00:39 -03:00
Eren Gölge
1e9538aaef
Add more model tests to `test_synthesize`
2021-07-04 11:45:49 +02:00
Eren Gölge
47b3b10d6d
Bump up to v0.1.0 🚀
2021-06-29 13:07:59 +02:00
Eren Gölge
9790eddada
Fix wrong argument name 🛠️
2021-06-28 17:03:47 +02:00
Eren Gölge
626c9d41e6
Update tests for the new trainer API
2021-06-28 17:03:19 +02:00
Eren Gölge
fcfd95669a
Update model test configs
2021-06-28 17:03:19 +02:00
Eren Gölge
00c82c516d
rename to
2021-06-28 17:03:19 +02:00
Eren Gölge
fdfb18d230
downsize melgan test model size
2021-06-28 17:03:19 +02:00
Eren Gölge
25238e0658
fix glow-tts `inference()`
2021-06-28 17:03:19 +02:00
Eren Gölge
82582993cc
use one testing sentence in tts tests
2021-06-28 17:03:19 +02:00
Eren Gölge
419735f440
refactor and fix multi-speaker training in Trainer and Tacotron models
2021-06-28 17:03:19 +02:00
Eren Gölge
269e5a734e
add max_decoder_steps argument to tacotron models
2021-06-28 17:03:19 +02:00
Eren Gölge
304d60197b
reduce multiband melgan test model size
2021-06-28 17:03:19 +02:00
Eren Gölge
db6a97d1a2
rename external speaker embedding arguments as `d_vectors`
2021-06-28 17:03:19 +02:00
Eren Gölge
877bf66b61
reduce size of the metadata.csv used at testing
2021-06-28 17:03:19 +02:00
Eren Gölge
87c61d210a
update test to be less demanding
2021-06-28 17:03:19 +02:00
Eren Gölge
6d6896fd99
reduce fullband-melgan test model size
2021-06-28 17:03:19 +02:00
Eren Gölge
1443d03af1
update test for the new input output API of the tts models
2021-06-28 17:03:19 +02:00
Eren Gölge
ef4ea9e527
update imports for `formatters`
2021-06-28 17:03:19 +02:00
Eren Gölge
6c495c6a6e
fix glow-tts inference and forward functions for handling `cond_input`
...
and refactor its test
2021-06-28 17:03:19 +02:00
Eren Gölge
b500338faa
make style
2021-06-28 17:03:19 +02:00
Eren Gölge
d25f017b42
update `setup_model.py` imports
2021-06-28 17:03:19 +02:00
Eren Gölge
7dff6be871
update tts training tests to use the trainer
2021-06-28 17:03:19 +02:00
Eren Gölge
9134c7dfb6
update `sequence_mask` import globally
2021-06-28 17:03:19 +02:00
Eren Gölge
8def3c87af
trainer-API updates
2021-06-28 17:03:19 +02:00
Eren Gölge
42554cc711
rename MyDataset -> TTSDataset
2021-06-28 17:03:19 +02:00
Edresson Casanova
eb84bb2bc8
Merge branch 'dev' into dev
2021-06-26 15:32:19 -03:00
Eren Gölge
6c7bbcaef0
Use `en-us` for testing phoneme models
2021-06-25 16:52:17 +02:00
Michael Hansen
a41f53fe72
Fix silly error in tests
2021-06-25 14:41:35 +02:00
Michael Hansen
3f172b84d8
Fix linting issues
2021-06-25 14:41:31 +02:00
Michael Hansen
4d8426fa0a
Use eSpeak IPA lexicons by default for phoneme models
2021-06-25 14:41:05 +02:00
Michael Hansen
47191f3ecc
Add tests for gruut phonemization
2021-06-25 14:41:05 +02:00
Edresson
28bec238ca
fix Lint checks
2021-06-18 14:33:50 -03:00
Edresson
83644056e3
fix Lint checks
2021-06-18 14:32:28 -03:00
Eren Gölge
db48c69f0f
reduce fullband melgan model size for testing
2021-06-02 11:44:53 +02:00
Eren Gölge
49c5e5d820
make style japanese PR
2021-06-02 11:44:46 +02:00
Eren Gölge
0c14278c30
reorg test files
2021-06-02 11:40:26 +02:00
Eren Gölge
73b4083c6c
Merge pull request #502 from kaiidams/kaiidams/kokoro
...
Japanese Tacotron 2 model
2021-06-02 10:20:08 +02:00
Katsuya Iida
1cc18d1972
Move unittest of Japanese phonemizer.
2021-06-01 18:51:34 +09:00
Eren Gölge
bec85ac58d
make style
2021-05-31 16:37:15 +02:00
Eren Gölge
301c516abd
Merge branch 'dev' of https://github.com/coqui-ai/TTS into dev
2021-05-31 15:46:25 +02:00
Edresson
cc192b6843
add resnet speaker encoder train unit test
2021-05-29 22:43:41 -03:00
Eren Gölge
925c08cf95
replace unidecode with anyascii
2021-05-27 14:02:44 +02:00