Eren Gölge
9e5a469c64
d-vector handling ( #1945 )
...
* Update BaseDatasetConfig
- Add dataset_name
- Change name to formatter_name
* Update compute_embedding
- Allow entering dataset by args
- Use released model by default
- Use the new key format
* Update loading
* Update recipes
* Update other dep code
* Update tests
* Fixup
* Load multiple embedding files
* Fix argument names in dep code
* Update docs
* Fix argument name
* Fix linter
2022-09-13 14:10:33 +02:00
Edresson Casanova
371772c355
Replace pyworld by pyin ( #1946 )
...
* Replace pyworld by pyin
* Fix unit tests
2022-09-09 10:43:14 +02:00
happylittlecat
4546b4cbd8
Add espeak support for Chinese ( #1905 )
...
* fix description
* add espeak support for chinese
* add espeak support for chinese
2022-09-08 12:32:41 +02:00
harmlessman
5abbe56642
Korean Phonemizer ( #1822 )
...
* Update requirements.txt
install jamo for korean
* Update formatters.py
add KSS formatter
KSS is a Korean single speaker speech dataset (12 hours)
* Add files via upload
add phonemizer for korean
* Add files via upload
add korean phonemizer
* Update requirements.txt
* change code style with `black` and `pylint`
* reflecting pylint's Evaluation
* reflecting pylint's Evaluation
* reflecting pylint's Evaluation-2
* isort
* edit about separator
write test case and add 'nltk' for requirements.txt
* add korean g2p (g2pkk)
* isort
* TTS/tts/utils/text/phonemizers/ko_kr_phonemizer.py:43:24: W0621: Redefining name 'text' from outer scope (line 58) (redefined-outer-name)
TTS/tts/utils/text/korean/korean.py:28:8: R1705: Unnecessary "else" after "return" (no-else-return)
* black
2022-09-08 12:06:07 +02:00
Edresson Casanova
159eeeef64
Fix find unique phonemes script ( #1928 )
...
* Fix find unique phonemes script
* Fix unit tests
2022-09-08 10:17:35 +02:00
KyuubiYoru
3b7dff568a
Fixes a race condition with multiple simultaneous get requests. ( #1807 )
...
* Fixes a race condition with multiple simultaneous get requests.
* Removed unused import
* Removed unused threading import
* Changed lock style to notation
* make style
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
2022-09-08 10:16:16 +02:00
Julian Weber
bb59718c03
Add capacitron v2 model ( #1768 )
...
* Add capacitron v2 in .models.json
* Put right commit hash
2022-09-08 09:43:56 +02:00
Edresson Casanova
096b35f639
Add VCTK speaker encoder recipe ( #1912 )
2022-08-26 16:19:03 +02:00
Eren Gölge
e5430a6519
Add new DE Thorsten models ( #1898 )
...
- Tacotron2-DDC
- HifiGAN vocoder
2022-08-22 11:27:39 +02:00
Eren Gölge
8845f06fd9
Bump up to v0.8.0
2022-08-22 11:26:47 +02:00
Stanislav Kachnov
2c9f00a808
Fix tune wavegrad ( #1844 )
...
* fix imports in tune_wavegrad
* load_config returns Coqpit object instead of None
* set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program
* fix var order in the result of batch collating
* make style
* make style with black and isort
2022-08-22 09:55:32 +02:00
Eren Gölge
fcb0bb58ae
Handle when no batch sampler ( #1882 )
2022-08-18 11:26:04 +02:00
Eren Gölge
7442bcefa5
Remove deprecated files ( #1873 )
...
- samplers.py is moved
- distribute.py is replaced by the 👟 Trainer
2022-08-15 12:16:37 +02:00
Eren Gölge
4333492341
Fix BCE loss issue ( #1872 )
...
* Fix BCE loss issue
* Remove import
2022-08-15 11:27:21 +02:00
manmay nakhashi
e4db7c51b5
Update capacitron_layers.py ( #1664 )
...
crashing because of a dimension mismatch at line no. 57
[batch, 256] vs [batch , 1, 512]
enc_out = torch.cat([enc_out, speaker_embedding], dim=-1)
2022-08-15 11:08:50 +02:00
Eren Gölge
bfc63829ac
Implement bucketed weighted sampling for VITS ( #1871 )
2022-08-15 11:08:11 +02:00
Eren Gölge
d46fbc240c
Introduce numpy and torch transforms ( #1705 )
...
* Refactor audio processing functions
* Add tests for numpy transforms
* Fix imports
* Fix imports2
2022-08-08 11:57:50 +02:00
manmay nakhashi
7fd9b89ebf
fix get_random_embeddings --> get_random_embedding ( #1726 )
...
* fix get_random_embeddings --> get_random_embedding
function typo leads to training crash, no such function
* fix typo
get_random_embedding
2022-08-07 14:06:03 +02:00
rbaraglia
75ac9e3f0c
Fix language flags generated by espeak-ng phonemizer ( #1801 )
...
* fix language flags generated by espeak-ng phonemizer
* Style
* Updated language flag regex to consider all language codes alike
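As an illustration of the idea only: espeak-ng can emit language-switch flags such as `(en)` or `(fr)` inside its phoneme output. The sketch below strips them with a single pattern that treats every language code the same way. The pattern and the function name `strip_language_flags` are assumptions for illustration, not the regex used in this commit:

```python
import re

# Assumed flag shape: a 2-3 letter code with optional "-suffix" parts, in parentheses.
_LANG_FLAG = re.compile(r"\([a-z]{2,3}(?:-[a-z0-9]+)*\)")


def strip_language_flags(phonemes):
    # Remove every language flag, regardless of which language code it carries.
    return _LANG_FLAG.sub("", phonemes)
```

For example, `strip_language_flags("(en)hello(fr) bonjour")` leaves only the text between the flags.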
2022-08-07 13:57:40 +02:00
Lars Kiesow
8c645080ac
Adjust default to be able to process longer sentences ( #1835 )
...
Running `tts --text "$text" --out_path …` with somewhat longer
sentences in the text will lead to warnings like “Decoder stopped with
max_decoder_steps 500” and the sentences just being cut off in the
resulting WAV file.
This happens quite frequently when feeding longer texts (e.g. a blog
post) to `tts`. It's particularly frustrating since the error is not
always obvious in the output. You have to notice that there are missing
parts. This is something other users seem to have run into as well [1].
This patch simply increases the maximum number of steps allowed for the
tacotron decoder to fix this issue, resulting in a smoother default
behavior.
[1] https://github.com/mozilla/TTS/issues/734
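A toy sketch of why a step cap cuts sentences off, assuming one output frame per decoder step. The function `decode` and its stop condition are hypothetical stand-ins, not the Tacotron decoder itself:

```python
def decode(num_frames_needed, max_decoder_steps):
    """Emit one frame per step until the stop condition fires or the cap is hit."""
    frames = []
    for step in range(max_decoder_steps):
        # Stand-in for the model's stop-token prediction.
        if step >= num_frames_needed:
            return frames, False
        frames.append(step)
    # The loop ended because of the cap; output is truncated if frames are missing.
    truncated = len(frames) < num_frames_needed
    return frames, truncated
```

With a cap of 500, an utterance that needs 800 frames comes back with only 500 and is flagged as truncated; with a larger cap the same utterance completes.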
2022-08-07 13:51:29 +02:00
p0p4k
903a77c197
Update wavenet.py ( #1796 )
...
* Update wavenet.py
Current version does not use "in_channels" argument.
In glowTTS, we use normalizing flows and so "input dim" == "output dim" (channels and length). So, the existing code just uses hidden_channel sized tensor as input to first layer as well as outputs hidden_channel sized tensor.
However, since it is a generic implementation, I believe it is better to update it for a more general use.
* "in_channels -> hidden_channels"
2022-08-01 12:20:37 +02:00
p0p4k
4fe50801b5
Update README.md; download progress bar in CLI. ( #1797 )
...
* Update README.md
- minor PR
- added model_info usage guide based on #1623 in README.md .
* "added tqdm bar for model download"
* Update manage.py
* fixed style
* fixed style
* sort imports
2022-08-01 12:17:47 +02:00
Eren Gölge
7d8b1665c8
Fix rand_segment edge case (input_len == seg_len - 1)
2022-08-01 11:37:45 +02:00
vanIvan
5094499eba
Fix & update WaveRNN vocoder model ( #1749 )
...
* Fixes KeyError bug. Adding logging to dashboard.
* Make pep8 compliant
* Make style compliant
* Still fixing style
2022-07-26 15:05:11 +02:00
p0p4k
10195c4eba
Update decoder.py ( #1792 )
...
Minor comment correction.
2022-07-26 13:06:06 +02:00
ivan provalov
903d9c791a
Fix for FloorDiv Function Warning ( #1760 )
...
* Fix for Floor Function Warning
Fix for Floor Function Warning
* Adding double quotes to fix formatting
Adding double quotes to fix formatting
* Update glow_tts.py
* Update glow_tts.py
2022-07-20 11:31:22 +02:00
Eren Gölge
f7587fc134
Fix SSIM loss correction
2022-07-13 10:47:12 +02:00
Eren Gölge
bc1f93c299
Fix device allocation
2022-07-12 19:05:25 +02:00
Eren Gölge
49bac724c0
Implement VitsAudioConfig ( #1556 )
...
* Implement VitsAudioConfig
* Update VITS LJSpeech recipe
* Update VITS VCTK recipe
* Make style
* Add missing decorator
* Add missing param
* Make style
* Update recipes
* Fix test
* Bug fix
* Exclude tests folder
* Make linter
* Make style
2022-07-12 18:49:58 +02:00
a-froghyar
34b80e0280
feat: updated recipes and lr fix ( #1718 )
...
- updated the recipes activating more losses for more stable training
- re-enabling guided attention loss
- fixed a bug where the wrong lr was fetched for logging
2022-07-12 15:00:53 +02:00
Eren Gölge
48a4f3647f
Make lint
2022-07-12 14:58:26 +02:00
WeberJulian
c614f21982
Add durations as aux input for VITS ( #1694 )
...
* Add durations as aux input for VITS
* Make style
* Fix tts_tests
* Fix test_get_aux_input
2022-07-12 14:25:21 +02:00
Eren Gölge
2cf89b88c9
Make style
2022-07-12 14:12:57 +02:00
Eren Gölge
a6f73a18cb
Fix BCELoss addressing #1192
2022-07-12 14:11:34 +02:00
Eren Gölge
c17ff17a18
Fix SSIM loss
2022-07-12 12:35:24 +02:00
Eren Gölge
f1e35596e8
Remove redundant config field
2022-07-11 13:39:41 +02:00
WeberJulian
5cef6facb0
Fix tokenizer for punc only ( #1717 )
2022-07-06 22:59:41 +02:00
camillem
5c821d9fa1
Fix the --model_name and --vocoder_name arguments need a <model_type> element ( #1469 )
...
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-06-27 10:32:43 +02:00
manmay nakhashi
577ec406f4
Fix checkpointing GAN models ( #1641 )
...
* checkpoint save step crash fix
* checkpoint save step crash fix
* Update gan.py
updated requested changes
* crash fix
2022-06-22 12:07:46 +02:00
Eren Gölge
00e67092d8
Bump up to v0.7.1
2022-06-21 14:12:55 +02:00
Eren Gölge
3328be7a8e
Remove GL message
2022-06-21 12:39:31 +02:00
WeberJulian
30c72e0d05
Add Thorsten VITS model ( #1675 )
...
Co-authored-by: Eren Gölge <egolge@coqui.ai>
2022-06-21 11:39:49 +02:00
p0p4k
71281ff1e4
Add support for model_info in CLI ( #1623 )
...
* model_info
* model_info
* model_info_by_idx and name
* model_info_by_idx and name
* model_info
* Update manage.py
* fixed linter
* fixed linter
* fixed linter
* fixed linter
* fixed return style checks
* fixed linter
* fixed linter
* fixed idx always positive
* added comments
* fix parser.args check
* fix parser.args check
* Make style
Co-authored-by: Eren Gölge <egolge@coqui.ai>
2022-06-20 23:28:17 +02:00
Eren Gölge
8b75e8be9c
Bump up to v0.7.0
2022-06-20 13:50:09 +02:00
WeberJulian
6126c23498
Add synpaflex formatter ( #1616 )
...
* Add synpaflex formatter
* Fix formatter
* Make style
2022-06-20 13:36:26 +02:00
WeberJulian
f09ea11c71
Internal formatter ( #1629 )
...
* Add coqui formatter
* Make style
2022-06-08 14:31:03 +02:00
Eren Gölge
f70e82cd19
Use fsspec and torch for embedding file IO ( #1581 )
...
* Use fsspec and torch for embedding file
* Fixup
* Fix load and save files
* Fix compute embedding script
* Set use_cuda to true if available
* Add dummy speakers.pth file
* Make style
* Change default speakers file extension
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
2022-06-01 13:49:42 +02:00
Noran Raskin
a790df4e94
Training recipes for thorsten dataset ( #1020 )
...
* Fix style
* Fix isort
* Remove tensorboardX from requirements
Co-authored-by: logan hart <72301874+loganhart420@users.noreply.github.com>
Co-authored-by: Eren Gölge <egolge@coqui.ai>
2022-05-30 12:07:31 +02:00
André R. de Miranda
3b84ef9524
Fixed use_cuda issue in compute_embeddings.py
...
Added use_cuda argument in self.init_encoder method
2022-05-20 12:46:46 -03:00
a-froghyar
8be21ec387
Capacitron ( #977 )
...
* new CI config
* initial Capacitron implementation
* delete old unused file
* fix empty formatting changes
* update losses and training script
* fix previous commit
* fix commit
* Add Capacitron test and first round of test fixes
* revert formatter change
* add changes to the synthesizer
* add stepwise gradual lr scheduler and changes to the recipe
* add inference script for dev use
* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting
* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak is now able to be utilised for phoneme generation
- necessary phonemizer for the Capacitron model
* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style
* chore: fix linting
* feat: add Tacotron 2 support
* leftover from dev
* chore:rename parser args
* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers
* chore: revert arbitrary trainer changes
* fmt: revert formatting bug
* formatting again
* formatting fixed
* fix: log func
* fix: update optimizer
- Implemented load_state_dict for continuing training
* fix: clean optimizer init for standard models
* improvement: purge espeak flags and add training scripts
* Delete capacitronT2.py
delete old training script, new one is pushed
* feat: capacitron trainer methods
- extracted capacitron specific training operations from the trainer into custom
methods in taco1 and taco2 models
* chore: renaming and merging capacitron and gst style args
* fix: bug fixes from the previous commit
* fix: implement state_dict method on CapacitronOptimizer
* fix: call method
* fix: inference naming
* Delete train_capacitron.py
* fix: synthesize
* feat: update tests
* chore: fix style
* Delete capacitron_inference.py
* fix: fix train tts t2 capacitron tests
* fix: double forward in T2 train step
* fix: double forward in T1 train step
* fix: run make style
* fix: remove unused import
* fix: test for T1 capacitron
* fix: make lint
* feat: add blizzard2013 recipes
* make style
* fix: update recipes
* chore: make style
* Plot test sentences in Tacotron
* chore: make style and fix import
* fix: call forward first before problematic floordiv op
* fix: update recipes
* feat: add min_audio_len to recipes
* aux_input["style_mel"]
* chore: make style
* Make capacitron T2 recipe more stable
* Remove T1 capacitron Ljspeech
* feat: implement new grad clipping routine and update configs
* make style
* Add pretrained checkpoints
* Add default vocoder
* Change trainer package
* Fix grad clip issue for tacotron
* Fix scheduler issue with tacotron
Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-05-20 16:17:11 +02:00
Edresson Casanova
ee99a6c1e2
Fix voice conversion inference ( #1583 )
...
* Add voice conversion zoo test
* Fix style
* Fix unit test
2022-05-20 15:50:25 +02:00
Edresson Casanova
e5d8ec2402
Change the VITS upsampling interpolation trick to linear ( #1564 )
2022-05-13 10:52:39 +02:00
Edresson Casanova
c6008e5235
Add audio length sampler balancer ( #1561 )
...
* Add audio length sampler balancer
* Add unit tests
2022-05-12 19:59:19 +02:00
Eren Gölge
6e460b7e42
Add an assert for the upsampling trick ( #1538 )
2022-05-12 19:55:24 +02:00
Eren Gölge
4857967063
🐍 Python 3.10.x support and drop Python 3.6 support ( #1565 )
...
* Update requirements
* Update CI for p3.10
* Update numpy requirement
* Drop 🐍 p3.6 support
Numpy also dropped support for p3.6
* Bind cython v0.29.28
* Bind pyworld to v0.2.10
> 0.2.10 is not p3.10.x compatible
* Update Dockerfile
2022-05-12 15:50:25 +02:00
Edresson Casanova
a97eed696a
Fix the bug in eSpeak wrapper for eSpeak version 1.48.15 ( #1560 )
2022-05-12 15:15:18 +02:00
Eren Gölge
e45ae57aef
Merge pull request #1550 from coqui-ai/fix-upsampling-asserts
...
Fix VITS upsampling asserts
2022-05-12 14:51:41 +02:00
Edresson Casanova
175ca06388
Add reinit text encoder and duration predictor parameter ( #1562 )
...
* Add reinit encoder and duration predictor option
* Add .data to prevent any overlooked autograd hook
2022-05-12 09:08:36 -03:00
Edresson Casanova
182711043c
Fix the VITS upsampling asserts
...
Fix style
2022-05-12 09:08:29 -03:00
Eren Gölge
2fc38f67d2
Update SpeakerManager init in Synthesizer
2022-05-11 11:32:27 +02:00
Eren Gölge
c3f8c4d5eb
Return default SpeakerManager if no d_vector_file
2022-05-11 11:31:45 +02:00
Eren Gölge
121e9ed685
Pass use_cuda to init_encoder
2022-05-11 11:31:17 +02:00
Eren Gölge
c18bd21b3f
Return durations at VITS inference
2022-05-11 11:30:05 +02:00
Eren Gölge
5021a03de0
Use torch.no_grad for VITS inference
2022-05-11 11:29:36 +02:00
Eren Gölge
3f03e3012c
Fix batch_group_size in VITS
2022-05-07 13:44:44 +02:00
code-review-doctor
fa887ef5f9
Fix issue probably-meant-fstring found at https://codereview.doctor ( #1532 )
2022-05-07 13:33:40 +02:00
Eren Gölge
a0a9279e4b
Fix GAN optimizer order
...
commit 212d330929
Author: Edresson Casanova <edresson1@gmail.com>
Date: Fri Apr 29 16:29:44 2022 -0300
Fix unit test
commit 44456b0483
Author: Edresson Casanova <edresson1@gmail.com>
Date: Fri Apr 29 07:28:39 2022 -0300
Fix style
commit d545beadb9
Author: Edresson Casanova <edresson1@gmail.com>
Date: Thu Apr 28 17:08:04 2022 -0300
Change order of HIFI-GAN optimizers to match the original repository
commit 657c5442e5
Author: Edresson Casanova <edresson1@gmail.com>
Date: Thu Apr 28 15:40:16 2022 -0300
Remove audio padding before mel spec extraction
commit 76b274e690
Merge: 379ccd7b 6233f4fc
Author: Edresson Casanova <edresson1@gmail.com>
Date: Wed Apr 27 07:28:48 2022 -0300
Merge pull request #1541 from coqui-ai/comp_emb_fix
Bug fix in compute embedding without eval partition
commit 379ccd7ba6
Author: WeberJulian <julian.weber@hotmail.fr>
Date: Wed Apr 27 10:42:26 2022 +0200
returns y_mask in VITS inference (#1540 )
* returns y_mask
* make style
2022-05-07 13:29:11 +02:00
Edresson Casanova
60034674f9
Remove audio padding before mel spec extraction
2022-05-07 13:12:09 +02:00
WeberJulian
fbdf76b2fc
returns y_mask in VITS inference ( #1540 )
...
* returns y_mask
* make style
2022-05-03 13:49:24 +02:00
Edresson Casanova
6233f4fcd7
Bug fix in compute embedding without eval partition
2022-04-26 13:58:03 -03:00
Edresson Casanova
8d228ab22a
Trick for upsampling to high sampling rates using the VITS model ( #1456 )
...
* Add upsample VITS support
* Fix the bug in inference
* Fix lint checks
* Add RMS based norm in save_wav method
* Style fix
* Add the period for VITS multi-period discriminator in model_args
* Bug fix in speaker encoder load in inference time
* Add unit tests
* Remove useless detach_z_vocoder parameter
* Add docs for VITS upsampling
* Fix the docs
* Rename TTS_part_sample_rate to encoder_sample_rate
* Add upsampling_init and upsampling_z methods
* Add asserts for encoder_sample_rate part
* Move upsampling tests to test_vits.py
2022-04-26 11:47:46 +02:00
Eren Gölge
c410bc58ef
Bump to v0.6.2
2022-04-20 11:46:26 +02:00
WeberJulian
30bea7d53c
Update manage.py ( #1514 )
2022-04-19 14:27:32 +02:00
Eren Gölge
7133f8f47d
Print Model's license when downloading ( #1512 )
...
* Print model license while downloading
* Make style
* Add a new license link
* Make style
2022-04-19 14:18:49 +02:00
WeberJulian
4953636b14
Add African models ( #1511 )
...
* Add african models
* Set default license for all models
2022-04-19 14:18:30 +02:00
Edresson Casanova
060e0f9368
Add EmbeddingManager and BaseIDManager ( #1374 )
2022-03-31 13:41:16 +02:00
WeberJulian
1b22f03e98
Fix G2P backend of the released models ( #1461 )
...
* Fix enforce phonemizer
* Add new models
* Fix .model.json
2022-03-30 12:47:11 +02:00
WeberJulian
c66a6241fd
Enforce phonemizer definition for synthesis ( #1441 )
...
* Enforce phonemizer definition for synthesis
* Fix train_tts, tokenizer init can now edit config
* Add small change to trigger CI pipeline
* fix wrong output path for one tts_test
* Fix style
* Test config overrides by args and tokenizer
* Fix style
2022-03-25 23:15:33 +01:00
Edresson Casanova
37896e1743
Bug fix in freeze encoder ( #1391 )
...
* Fix the bug in freeze encoder
* Remove emb_l definition for non-multilingual training
* Fix unit tests
2022-03-24 18:16:04 +01:00
Edresson Casanova
3435bc8fca
Fix style tests
2022-03-23 15:05:32 -03:00
Edresson Casanova
0ae1e0248c
Fix the bug for empty audio files
2022-03-23 14:39:31 -03:00
Edresson Casanova
ea53d6feb3
Replace webrtcvad by silero-vad
2022-03-23 14:39:31 -03:00
Eren Gölge
3af01cfe3b
Update base model wrt 👟 ( #1406 )
2022-03-23 17:24:20 +01:00
Eren Gölge
1c3623af33
Fix model manager ( #1436 )
...
* Fix manager
* Make style
2022-03-23 12:57:14 +01:00
Eren Gölge
72d85e53c9
Update model file extension ( #1422 )
...
* Update model file ext to ```.pth```
* Update docs
* Rename more
* Find model files
2022-03-22 17:55:00 +01:00
Eren Gölge
fd56fabb21
Fix #1380 ( #1409 )
2022-03-16 12:38:27 +01:00
Eren Gölge
0870a4faa2
Make style ( #1405 )
2022-03-16 12:13:55 +01:00
WeberJulian
690c96ed28
Fix default phonemizer for ja and zh ( #1399 )
2022-03-16 12:13:22 +01:00
Edresson Casanova
f81892483d
REBASED: Transform Speaker Encoder in a Generic Encoder and Implement Emotion Encoder training support ( #1349 )
...
* Rename Speaker encoder module to encoder
* Add a generic emotion dataset formatter
* Transform the Speaker Encoder dataset to a generic dataset and create emotion encoder config
* Add class map in emotion config
* Add Base encoder config
* Add evaluation encoder script
* Fix the bug in plot_embeddings
* Enable Weight decay for encoder training
* Add argument to disable storage
* Add Perfect Sampler and remove storage
* Add evaluation during encoder training
* Fix lint checks
* Remove useless config parameter
* Activate evaluation in speaker encoder test and use multispeaker dataset for this test
* Unit test fixes
* Remove useless tests to speed up the aux_tests
* Use get_optimizer in Encoder
* Add BaseEncoder Class
* Fix the unit tests
* Add Perfect Batch Sampler unit test
* Add compute encoder accuracy in a function
2022-03-11 14:43:40 +01:00
Edresson Casanova
36e9ea2f97
Open bible dataset formatter ( #1365 )
...
* Add support for voice conversion inference
* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json
* Rebase bug fix
* Use the average d-vector for inference
* Fix the bug in find unique chars script
* Add OpenBible formatter
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-03-11 10:43:31 +01:00
Edresson Casanova
dbe9da7f15
Add Voice conversion inference support ( #1337 )
...
* Add support for voice conversion inference
* Cache d_vectors_by_speaker for fast inference using a bigger speakers.json
* Rebase bug fix
* Use the average d-vector for inference
2022-03-10 14:57:12 +01:00
Edresson Casanova
917f417ac4
Add alphas to control language and speaker balancer ( #1216 )
...
* Add alphas to control language and speaker balancer
* Add docs for speaker and language samplers
* Change the Samplers weights to float to save memory
* Change the test_samplers to unittest format
* Add get_sampler method in BaseTTS
* Fix rebase issues
* Add language and speaker samplers support for DDP training
* Rename distributed sampler wrapper
* Remove the DistributedSamplerWrapper and use the one from Trainer
* Bugfix after rebase
* Move the samplers config to tts config
2022-03-10 14:56:09 +01:00
Edresson Casanova
f381e29b91
REBASED: Add support for the speaker encoder training using torch spectrograms ( #1348 )
...
* Add support for the speaker encoder training using torch spectrograms
* Remove useless function in speaker encoder dataset class
2022-03-10 14:54:51 +01:00
Eren Gölge
c670365507
Fix VCTK recipe and formatter
2022-03-08 14:20:34 +01:00
Eren Gölge
8feb41d361
Bump up to v0.6.1
2022-03-07 15:57:44 +01:00
Eren Gölge
ee02bc3823
Bump up to v0.6.0
2022-03-07 12:08:22 +01:00
Eren Gölge
dc280819be
Add new models
2022-03-07 12:08:09 +01:00
Eren Gölge
e9d9028b4d
Revert cleaner name
2022-03-06 12:57:06 +01:00
Eren Gölge
764c7fa4a4
Rename phoneme_cleaners
2022-03-06 12:09:54 +01:00
Eren Gölge
dd4287de1f
Update models
2022-03-03 20:23:00 +01:00
Eren Gölge
6cb00be795
Update your_tts model URL
2022-03-02 18:04:49 +01:00
Eren Gölge
1425a023fe
Make style and lint
2022-03-02 13:25:35 +01:00
Eren Gölge
c68885b3fd
Update Vits speaker encoder init
2022-03-02 13:20:23 +01:00
Eren Gölge
27b67b7945
Fix import
2022-03-02 09:15:20 +01:00
Eren Gölge
942df0fb05
Update vits dataset
2022-03-02 09:14:32 +01:00
Eren Gölge
6a9f8074f0
Fix TTSDataset
2022-03-01 07:57:48 +01:00
Eren Gölge
690de1ab06
Update Characters and add more tests
2022-02-25 11:32:44 +01:00
Eren Gölge
9063397892
Fix FastSpeech config
2022-02-25 11:31:56 +01:00
Eren Gölge
1e414b3a09
Make style
2022-02-25 11:31:56 +01:00
Eren Gölge
acc83cd3e6
Update Vits model API
2022-02-25 11:31:56 +01:00
Eren Gölge
fe656659be
Implement BaseTTS
2022-02-25 11:31:56 +01:00
Eren Gölge
bed4afd4ee
Implement BaseVocabulary
2022-02-25 11:31:56 +01:00
Eren Gölge
e0f9be76c0
Update test_run in wavernn and wavegrad
2022-02-25 11:31:56 +01:00
Eren Gölge
bf540f4323
Update imports for trainer
2022-02-25 11:31:56 +01:00
Eren Gölge
4c43eda414
Update BaseTrainerModel
2022-02-25 11:31:56 +01:00
Eren Gölge
83c5ddc5b7
Update imports
2022-02-25 11:31:56 +01:00
Eren Gölge
14c117978d
Fix return outputs
2022-02-25 11:31:56 +01:00
Eren Gölge
424d04e4f6
Make style
2022-02-25 11:31:56 +01:00
Eren Gölge
8b3ba02c95
Add vocab_dict to model config
2022-02-25 11:31:20 +01:00
Eren Gölge
ff23dce081
Update TTSDataset
2022-02-25 11:31:20 +01:00
Eren Gölge
750903d2ba
Add VCTK formatter docstring
2022-02-25 11:30:24 +01:00
Eren Gölge
52a7896668
Update VITS loss
2022-02-25 11:30:24 +01:00
Eren Gölge
c68962c574
Update forward tts binary loss
2022-02-25 11:30:24 +01:00
Eren Gölge
c11944022d
Revert back again rand_segment
2022-02-25 11:30:24 +01:00
Eren Gölge
00c7600103
Update Vits model API
2022-02-25 11:30:24 +01:00
Eren Gölge
935a604046
Delete trainer_utils
2022-02-25 11:29:41 +01:00
Eren Gölge
d0c27a9661
Update synthesis.py
2022-02-25 11:29:41 +01:00
Eren Gölge
35fc7270ff
Implement BaseTTS
2022-02-25 11:28:47 +01:00
Eren Gölge
2bad098625
Implement BaseVocabulary
2022-02-25 11:28:47 +01:00
Eren Gölge
833de62e30
Update base_vocoder
2022-02-25 11:28:14 +01:00
Eren Gölge
fc3b6d2861
Update gan
2022-02-25 11:28:14 +01:00
Eren Gölge
20a677c623
Update test_run in wavernn and wavegrad
2022-02-25 11:28:14 +01:00
Eren Gölge
be3a03126a
Update imports for trainer
2022-02-25 11:28:14 +01:00
Eren Gölge
c911729896
Update BaseTrainerModel
2022-02-25 11:28:14 +01:00
Eren Gölge
1e219fef0a
Revert drop_last
2022-02-25 11:26:59 +01:00
Eren Gölge
7dfd753d91
Add a cheap trick to avoid short audio clips
2022-02-25 11:26:59 +01:00
Eren Gölge
1a43e05460
Fix VITS loss bug
...
Fake and real features were given in the wrong args order to
the loss function
2022-02-25 11:26:59 +01:00
Eren Gölge
4b96bfe925
Fix train logging
2022-02-25 11:26:59 +01:00
Eren Gölge
ab8a4ca2c3
Revert random segment
2022-02-25 11:26:59 +01:00
Eren Gölge
8622226f3f
Make style
2022-02-25 11:26:59 +01:00
Eren Gölge
27db089d6c
Change TrainingArgs -> TrainerArgs
2022-02-25 11:26:59 +01:00
Eren Gölge
aa81454721
Update BaseTrainingConfig
2022-02-25 11:26:59 +01:00
Eren Gölge
d3a58ed07a
Fix default values
2022-02-25 11:26:59 +01:00
Eren Gölge
54c6bb2a8c
Fix add speaker VITS
2022-02-25 11:26:59 +01:00
Eren Gölge
590b04fb89
Fix espeak_wrapper
2022-02-25 11:26:59 +01:00
Eren Gölge
a013566d15
Delete trainer related code
2022-02-25 11:26:59 +01:00
Eren Gölge
38314194e7
Set `drop_last`
2022-02-25 11:26:59 +01:00
Eren Gölge
f70e4bb8c6
Add new speakers to the vits model
2022-02-25 11:26:59 +01:00
Eren Gölge
d5c0e17548
Load right char class dynamically
2022-02-25 11:26:59 +01:00
Eren Gölge
1f0c8179da
Make style
2022-02-25 11:26:59 +01:00
Eren Gölge
b3ed6ff6b7
Update FastPitchConfig
2022-02-25 11:26:59 +01:00
Eren Gölge
1932401e8d
Fix dataset preprocessing
2022-02-25 11:26:59 +01:00
Eren Gölge
34c4be5e49
Update forwardtts
2022-02-25 11:26:59 +01:00
Eren Gölge
bb37462794
Update language manager
2022-02-25 11:26:59 +01:00
Eren Gölge
5169d4eb32
Plot pitch over input characters
2022-02-25 11:26:59 +01:00
Eren Gölge
cd5d1497cf
Add pitch_fmin pitch_fmax args to the audio
2022-02-25 11:26:59 +01:00
Eren Gölge
1445a46e9e
Update synthesizer to use init_from_config
2022-02-25 11:26:59 +01:00
Eren Gölge
7058fcc3ff
Take file extension as an argument
2022-02-25 11:26:59 +01:00
Eren Gölge
13482dde1f
Update GAN model
2022-02-25 11:26:59 +01:00
Eren Gölge
2829027d8b
Refactor VITS model
2022-02-25 11:26:59 +01:00
Eren Gölge
ef63c99524
Implement `start_by_longest` option for TTSDataset
2022-02-25 11:26:18 +01:00
Eren Gölge
c4c471d61d
Allow padding for shorter segments
2022-02-25 11:25:48 +01:00
Eren Gölge
47fbddc8d4
Fix docstring
2022-02-25 11:25:48 +01:00
Eren Gölge
bc2243bac4
Fix tests
2022-02-25 11:25:00 +01:00
Eren Gölge
146fbfd7c9
Extend unittests
2022-02-25 11:25:00 +01:00
Eren Gölge
2fe16de8e3
Make lint
2022-02-25 11:25:00 +01:00
Eren Gölge
7b49a4aa2b
Fix glow_tts_config missing field
2022-02-25 11:24:13 +01:00
Eren Gölge
07b0a80d57
Fix tokenizer init_from_config
2022-02-25 11:24:13 +01:00
Eren Gölge
50e17097a7
Add verbose option to AudioProcessor
2022-02-25 11:24:13 +01:00
Eren Gölge
235f7d9b02
Extend glow_tts model tests
2022-02-25 11:24:13 +01:00
Eren Gölge
8e248913d6
Update train_tts for the new API
2022-02-25 11:24:13 +01:00
Eren Gölge
001da8afc8
Update Vits for the new model API
2022-02-25 11:21:19 +01:00
Eren Gölge
5176ae9e53
Fixes small compat. issues
2022-02-25 11:21:19 +01:00
Eren Gölge
131bc0cfc0
Fix synthesis.py 🔧
2022-02-25 11:18:00 +01:00
Eren Gölge
c0746f23df
Fix `too many open files`
2022-02-25 11:16:30 +01:00
Eren Gölge
df0d58bf09
Update VCTK recipes
2022-02-25 11:16:30 +01:00
Eren Gölge
730f7c0df4
Add file_ext args to resample.py
2022-02-25 11:15:46 +01:00
Eren Gölge
28d98da422
Update VCTK formatter
2022-02-25 11:15:46 +01:00
Eren Gölge
4d99fee3e2
Update spec extractor
2022-02-25 11:12:44 +01:00
Eren Gölge
38a0b3b6c7
Update train_tts.py
2022-02-25 11:11:35 +01:00
Eren Gölge
cfaa51fddc
Update BaseTTS config
2022-02-25 11:11:35 +01:00
Eren Gölge
4c5cb44eeb
Update setup_model
2022-02-25 11:11:35 +01:00
Eren Gölge
7c4243fba7
Update GlowTTS
2022-02-25 11:11:35 +01:00
Eren Gölge
bacf79f4fb
Update AlignTTS
2022-02-25 11:11:35 +01:00
Eren Gölge
18f726af65
Update ForwardTTS
2022-02-25 11:11:35 +01:00
Eren Gölge
d0ec4b91e5
Update Tacotron models
2022-02-25 11:11:35 +01:00
Eren Gölge
ea965a5683
Update VITS for the new API
2022-02-25 11:11:35 +01:00
Eren Gölge
f802a931a3
Pass samples to init_from_config in SpeakerManager
2022-02-25 11:07:34 +01:00
Eren Gölge
bde68d9f25
Use the same phonemizer for `en` to `en-us`
2022-02-25 11:07:34 +01:00
Eren Gölge
8649d4fd36
Allow None pad and blank tokens
2022-02-25 11:07:34 +01:00
Eren Gölge
c9972e6f14
Make lint
2022-02-25 11:07:34 +01:00
Eren Gölge
30cfafce56
Add init_from_config
2022-02-25 11:05:54 +01:00
Eren Gölge
90cc45dd4e
Update data loader tests
2022-02-25 11:05:54 +01:00
Eren Gölge
93957d58a1
Refactoring VITS for the tokenizer API
2022-02-25 11:05:06 +01:00
Eren Gölge
04df0a3d9f
Refactor TTSDataset ⚡️
2022-02-25 11:05:06 +01:00
Eren Gölge
9bb347a52b
Update for tokenizer API
2022-02-25 11:05:06 +01:00
Eren Gölge
452dbc43d8
Update imports for symbols -> characters
2022-02-25 11:05:06 +01:00
Eren Gölge
8071fa0020
Refactor GlowTTS model and recipe for TTSTokenizer
2022-02-25 11:05:06 +01:00
Eren Gölge
b6c2bfdf08
Refactor synthesis.py for TTSTokenizer
2022-02-25 11:05:06 +01:00
Eren Gölge
b2bb954a51
Refactor TTSDataset to use TTSTokenizer
2022-02-25 11:05:06 +01:00
Eren Gölge
84091096a6
Refactor Synthesizer class for TTSTokenizer
2022-02-25 11:05:06 +01:00
Eren Gölge
196ae74273
Update data loader tests
2022-02-25 11:05:06 +01:00
Eren Gölge
98057a00ae
Make style
2022-02-25 10:57:35 +01:00
Eren Gölge
7575367b9f
Refactoring VITS for the tokenizer API
2022-02-25 10:57:35 +01:00
Eren Gölge
4cd690e4c1
Updates BaseTTS and configs
2022-02-25 10:57:35 +01:00
Eren Gölge
176b712c1a
Refactor TTSDataset ⚡️
2022-02-25 10:57:35 +01:00
Eren Gölge
4597d4e5b6
Remove get_characters from BaseTTS
2022-02-25 10:48:03 +01:00
Eren Gölge
1df1d6c4a9
Update for tokenizer API
2022-02-25 10:48:03 +01:00
Eren Gölge
2d8ce98d2a
Update imports for symbols -> characters
2022-02-25 10:48:03 +01:00
Eren Gölge
9a95e15483
Refactor GlowTTS model and recipe for TTSTokenizer
2022-02-25 10:48:03 +01:00
Eren Gölge
d0eb642d88
Refactor synthesis.py for TTSTokenizer
2022-02-25 10:48:03 +01:00
Eren Gölge
3476be30d7
Refactor Synthesizer class for TTSTokenizer
2022-02-25 10:48:03 +01:00
Eren Gölge
9397a56b13
Allow init_from_config from model or audio config
2022-02-25 10:48:03 +01:00
Eren Gölge
a71a013276
Fix the wrong default loss name for GAN models
2022-02-25 10:48:03 +01:00
Eren Gölge
04202da1ac
Make style
2022-02-25 10:48:03 +01:00
Eren Gölge
3b63d713b9
Fix espeak wrapper cmd call
2022-02-25 10:48:03 +01:00
Eren Gölge
4894998e6b
Fix print_logs
2022-02-25 10:48:03 +01:00
Eren Gölge
4e8f9d6f10
Fix IPAPhonemes init_from_config
2022-02-25 10:48:03 +01:00
Eren Gölge
0fe39166fe
Discard OOV chars in tokenizer
...
Discard but store OOV chars, with a warning message
when an OOV char is first encountered
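A minimal sketch of the behaviour described above, assuming a simple character-to-ID vocabulary. `ToyTokenizer` and its attribute names are hypothetical and do not mirror the actual tokenizer class:

```python
import warnings


class ToyTokenizer:
    def __init__(self, vocab):
        self.char_to_id = {c: i for i, c in enumerate(vocab)}
        # Out-of-vocabulary characters seen so far, in order of first appearance.
        self.not_found_characters = []

    def encode(self, text):
        ids = []
        for ch in text:
            if ch in self.char_to_id:
                ids.append(self.char_to_id[ch])
            elif ch not in self.not_found_characters:
                # First time this OOV char is seen: store it and warn once.
                self.not_found_characters.append(ch)
                warnings.warn(f"Character {ch!r} not found in the vocabulary. Discarding it.")
            # Later occurrences of a known OOV char are dropped silently.
        return ids
```

Storing the discarded characters keeps the warning output short while still letting a user inspect afterwards which characters were missing from the vocabulary.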
2022-02-25 10:48:03 +01:00
Eren Gölge
c39aaafbfc
Update EspeakWrapper for espeak-ng
2022-02-25 10:48:03 +01:00
Eren Gölge
bb389479a4
Update setup_model for TTS.tts models
2022-02-25 10:48:03 +01:00
Eren Gölge
9b83e665fc
Add init_from_config as an abstract class
2022-02-25 10:48:03 +01:00
Eren Gölge
3eca5ad060
Update config fields for phonemizer
2022-02-25 10:48:03 +01:00
Eren Gölge
d2525abe8c
Remove get_characters from BaseTTS
2022-02-25 10:48:03 +01:00
Eren Gölge
73d27ebd45
Fix GlowTTS
2022-02-25 10:48:03 +01:00
Eren Gölge
87bf940676
Print duplicate characters
2022-02-25 10:48:03 +01:00
Eren Gölge
3de9f38d16
Add init_from_config to SpeakerManager
2022-02-25 10:48:03 +01:00
Eren Gölge
d8ec7086b6
Update `synthesis` for the new API
2022-02-25 10:48:03 +01:00
Eren Gölge
4e83bf3968
Allow choosing phonemizer
2022-02-25 10:48:02 +01:00
Eren Gölge
22f0c58fe1
Print language codes
2022-02-25 10:48:02 +01:00
Eren Gölge
693fb4dd39
Modify init_from_config for IPAPhonemes
2022-02-25 10:48:02 +01:00
Eren Gölge
acc6eef625
Update for tokenizer API
2022-02-25 10:48:02 +01:00
Eren Gölge
e1b4c4ca43
Add init_from_config to GAN
2022-02-25 10:48:02 +01:00
Eren Gölge
353f913efc
Fix #985
2022-02-25 10:48:02 +01:00
Eren Gölge
ba3b60c90f
Test TTSTokenizer
2022-02-25 10:48:02 +01:00
Eren Gölge
79a84410f2
Test punctuations
2022-02-25 10:48:02 +01:00
Eren Gölge
d8bdeb8b8f
Fix Punctuation
2022-02-25 10:48:02 +01:00
Eren Gölge
ff7c385838
Fix BasePhonemizer
2022-02-25 10:48:02 +01:00
Eren Gölge
10d435ce77
Fixup
2022-02-25 10:48:02 +01:00
Eren Gölge
f0655bfffc
Fix ja_jp_phonemizer
2022-02-25 10:48:02 +01:00
Eren Gölge
20e5dd3678
Add doc examples
2022-02-25 10:48:02 +01:00
Eren Gölge
fbad17e084
Update imports for symbols -> characters
2022-02-25 10:48:02 +01:00
Eren Gölge
a1df4f9887
Test character classes
2022-02-25 10:45:24 +01:00
Eren Gölge
bd461ace33
Refactor GlowTTS model and recipe for TTSTokenizer
2022-02-25 10:45:24 +01:00
Eren Gölge
5a9653978a
Refactor synthesis.py for TTSTokenizer
2022-02-25 10:45:24 +01:00
Eren Gölge
e5785b34b0
Style fix
2022-02-25 10:27:46 +01:00
Eren Gölge
e4049aa31a
Refactor TTSDataset to use TTSTokenizer
2022-02-25 10:27:46 +01:00
Eren Gölge
2480bbe937
Remove OLD TOKENIZATION ROUTINES
2022-02-25 09:32:54 +01:00
Eren Gölge
53f696615b
Add init_from_config to AudioProcessor
2022-02-25 09:32:54 +01:00
Eren Gölge
3d86edfc81
Refactor Synthesizer class for TTSTokenizer
2022-02-25 09:32:54 +01:00
Eren Gölge
8d85af84cd
Implement Punctuation class
2022-02-25 09:32:54 +01:00
Eren Gölge
1aca58afaf
Fix imports in cleaners.py
2022-02-25 09:32:54 +01:00
Eren Gölge
0344645e90
Implement TTSTokenizer
2022-02-25 09:32:54 +01:00
Eren Gölge
2fb1f70503
Implement BaseCharacters, IPAPhonemes, Graphemes
2022-02-25 09:32:54 +01:00
Eren Gölge
1bee40af40
Create language folders under `TTS.tts.utils.text`
2022-02-25 09:32:54 +01:00
Eren Gölge
c1119bc291
Implement BasePhonemizer
2022-02-25 09:32:54 +01:00
Eren Gölge
dcd01356e0
Create `text/english` folder
2022-02-25 09:32:54 +01:00
Eren Gölge
80867c8e8c
Implement multi-phonemizer
2022-02-25 09:32:54 +01:00
Eren Gölge
5e4f78add3
Implement espeak wrapper
2022-02-25 09:32:54 +01:00
Eren Gölge
e03a05c816
Implement gruut wrapper
2022-02-25 09:32:54 +01:00
Eren Gölge
172ba0c5e7
Implement JA_JP phonemizer
2022-02-25 09:32:54 +01:00
Eren Gölge
ca02b82218
Implement ZH_CH phonemizer
2022-02-25 09:32:54 +01:00
Eren Gölge
a51b031bff
Merge branch 'dev' into dev-fix-glowtts-infer
2022-02-21 12:01:40 +03:00
Edresson Casanova
28a7464975
Fix the bug in split dataset function ( #1251 )
...
* Fix the bug in split_dataset
* Make eval_split_size configurable
* Change test_loader to use load_tts_samples function
* Change eval_split_portion to eval_split_size and allow setting the absolute number of samples in eval
* Fix samplers unit test
* Add data unit test on GitHub workflow
2022-02-21 11:59:36 +03:00
Edresson Casanova
bc5db13d06
Fix the bug in extract tts spectrogram script
2022-02-19 19:24:00 +00:00
Edresson Casanova
ba6e56e01c
Fix Glow-TTS multi-speaker inference
2022-02-18 19:25:29 +00:00
Eren Gölge
127118c637
Update TTS.tts formatters ( #1228 )
...
* Return Dict from tts formatters
* Make style
2022-02-11 23:03:43 +01:00
Eren Gölge
5e3f499a69
Fix #1187 ( #1227 )
2022-02-11 13:27:59 +01:00
Edresson Casanova
0860d73cf8
Remove TensorFlow requirement ( #1225 )
...
* Remove TF modules
* Remove TF unit tests
* Remove TF vocoder modules
* Remove TF convert scripts
* Remove TF requirement
* Remove the Docs TF instructions
* Remove TF inference support
2022-02-10 16:14:54 +01:00
Eren Gölge
44c7d1a826
Merge pull request #1054 from WeberJulian/partial_embedding_compute
...
Partial embedding compute
2022-02-06 20:13:55 +01:00
WeberJulian
c7f5e005e1
Compute embedding for new audios only
2022-01-06 15:41:38 +01:00
WeberJulian
e778bad626
Add argument to enable dp speaker conditioning
2022-01-06 15:07:27 +01:00
WeberJulian
e1accb6e28
Fix train_tts.py and uncomment code ( #1051 )
...
* Fix SE loading and language embedding logic
* remove trailing white space
* Uncomment resampling code for SCL
2022-01-03 17:44:57 +01:00
Eren Gölge
58c38de58d
Bump up to v0.5.0
2022-01-03 15:04:03 +00:00
Eren Gölge
5840d89802
Keep proj_dim in speaker encoder models
2022-01-03 15:03:34 +00:00
Eren Gölge
03bcae1ba5
Merge pull request #1050 from coqui-ai/fix_synthesizer_init
...
Fix if else statement
2022-01-03 15:59:29 +01:00
Eren Gölge
fc09e319d4
Prioritize the given encoder path over config
2022-01-03 14:24:19 +00:00
Eren Gölge
7fad969a1f
Fix if else statement
2022-01-03 14:16:11 +00:00
Eren Gölge
d724984be1
Fix language assignment
2022-01-02 11:11:24 +00:00
WeberJulian
a63998c048
Fix phoneme language
2022-01-01 21:08:13 +01:00
Eren Gölge
7ef458a59c
Update default vocoder for uk model
2022-01-01 16:09:42 +00:00
Eren Gölge
e55f5ee59e
Make linter
2022-01-01 15:50:04 +00:00
Eren Gölge
38f5a11125
Merge branch 'dev' of https://github.com/coqui-ai/TTS into dev
2022-01-01 15:38:46 +00:00
Eren Gölge
c5512af82b
Update uk vocoder url
2022-01-01 15:38:21 +00:00
Eren Gölge
d37cfe474a
Merge branch 'pr/Edresson/731-rebased' into dev
2022-01-01 15:37:35 +00:00
Eren Gölge
33711afa01
Update yourTTS url
2022-01-01 15:37:08 +00:00
Eren Gölge
8fd1ee1926
Print urls when BadZipError
2022-01-01 15:26:35 +00:00
Eren Gölge
61874bc0a0
Fix your_tts inference from the listed models
2021-12-31 13:45:05 +00:00
Eren Gölge
8100135a7e
Add the YourTTS entry to the models
2021-12-31 12:22:08 +00:00
Eren Gölge
36cef5966b
Fix resnet speaker encoder
2021-12-30 15:36:35 +00:00
Eren Gölge
348b5c96a2
Fix speaker encoder test
2021-12-30 15:36:35 +00:00
Eren Gölge
7129b04d46
Update VITS model
2021-12-30 14:08:17 +00:00
Eren Gölge
638091f41d
Update Speaker Encoder models
2021-12-30 12:02:06 +00:00
Eren Gölge
6189fdfaea
Fix Training HiFiGan -- avg loss not decreasing #1003
2021-12-30 10:48:55 +00:00
Eren Gölge
275c759993
Fix #1037
2021-12-23 15:57:10 +00:00
Eren Gölge
5c5ddd2ba7
Init speaker manager for speaker encoder
2021-12-22 15:51:53 +00:00
Eren Gölge
633dcc9c56
Implement RMS volume normalization
2021-12-22 15:51:14 +00:00
Eren Gölge
8d2bb284ac
Add UK vocoder models
2021-12-21 13:13:35 +00:00
Eren Gölge
56378b12f7
Fix speaker encoder init
2021-12-21 12:26:25 +00:00
Eren Gölge
c9c1fa0548
Fix multi-speaker init in Synthesizer
2021-12-21 09:44:07 +00:00