Merge branch 'reformat' into hifigan-reformat

Eren Gölge 2021-04-12 12:00:17 +02:00
commit a7f6045644
65 changed files with 315 additions and 311 deletions


@@ -1,51 +0,0 @@
-version: 2
-workflows:
-  version: 2
-  test:
-    jobs:
-      - test-3.6
-      - test-3.7
-      - test-3.8
-    executor: ubuntu-latest
-on:
-  push:
-  pull_request:
-    types: [opened, synchronize, reopened]
-jobs:
-  test-3.6: &test-template
-    docker:
-      - image: circleci/python:3.6
-    resource_class: large
-    working_directory: ~/repo
-    steps:
-      - checkout
-      - run: |
-          sudo apt update
-          sudo apt install espeak-ng git
-      - run: sudo pip install --upgrade pip
-      - run: sudo pip install -e .
-      - run: |
-          sudo pip install --quiet --upgrade cardboardlint pylint
-          cardboardlinter --refspec ${CIRCLE_BRANCH} -n auto
-      - run: nosetests tests --nocapture --processes=0 --process-timeout=20 --process-restartworker
-      - run: |
-          sudo ./tests/test_glow-tts_train.sh
-          sudo ./tests/test_tacotron_train.sh
-          sudo ./tests/test_vocoder_gan_train.sh
-          sudo ./tests/test_vocoder_wavegrad_train.sh
-          sudo ./tests/test_vocoder_wavernn_train.sh
-          sudo ./tests/test_speedy_speech_train.sh
-  test-3.7:
-    <<: *test-template
-    docker:
-      - image: circleci/python:3.7
-  test-3.8:
-    <<: *test-template
-    docker:
-      - image: circleci/python:3.8


@@ -35,8 +35,9 @@ jobs:
     - name: Install dependencies
      run: |
        sudo apt update
-       sudo apt install -y espeak-ng git
+       sudo apt install -y git make
        sudo apt install -y python3-wheel gcc
+       make system-deps
    - name: Upgrade pip
      # so we can take advantage of pyproject.toml build-dependency support
      run: python3 -m pip install --upgrade pip
@@ -46,17 +47,6 @@ jobs:
        python3 setup.py egg_info
    - name: Lint check
      run: |
-       cardboardlinter
+       make lint
    - name: Unit tests
-     run: nosetests tests --nocapture --processes=0 --process-timeout=20 --process-restartworker
+     run: make test
-   - name: Test scripts
-     run: |
-       ./tests/test_demo_server.sh
-       ./tests/test_glow-tts_train.sh
-       ./tests/test_tacotron_train.sh
-       ./tests/test_vocoder_gan_train.sh
-       ./tests/test_vocoder_wavegrad_train.sh
-       ./tests/test_vocoder_wavernn_train.sh
-       ./tests/test_speedy_speech_train.sh
-       ./tests/test_resample.sh
-       ./tests/test_compute_statistics.sh

.pre-commit-config.yaml (new file, 12 lines)

@@ -0,0 +1,12 @@
+repos:
+  - repo: 'https://github.com/pre-commit/pre-commit-hooks'
+    rev: v2.3.0
+    hooks:
+      - id: check-yaml
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
+  - repo: 'https://github.com/psf/black'
+    rev: 20.8b1
+    hooks:
+      - id: black
+        language_version: python3
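Once this configuration is committed, the hooks are typically activated per clone with `pre-commit install` (standard `pre-commit` usage, not something this diff itself sets up); from then on `check-yaml`, `end-of-file-fixer`, `trailing-whitespace`, and `black` run against the staged files on every `git commit`.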


@@ -1,51 +1,118 @@
 # Contribution guidelines
-
-This repository is governed by Mozilla's code of conduct and etiquette guidelines. For more details, please read the [Mozilla Community Participation Guidelines](https://www.mozilla.org/about/governance/policies/participation/).
-
-Before making a Pull Request, check your changes for basic mistakes and style problems by using a linter. We have cardboardlinter setup in this repository, so for example, if you've made some changes and would like to run the linter on just the differences between your work and master, you can use the follow command:
-
-```bash
-pip install pylint cardboardlint
-cardboardlinter --refspec master
-```
-
-This will compare the code against master and run the linter on all the changes. To run it automatically as a git pre-commit hook, you can do do the following:
-
-```bash
-cat <<\EOF > .git/hooks/pre-commit
-#!/bin/bash
-if [ ! -x "$(command -v cardboardlinter)" ]; then
-    exit 0
-fi
-
-# First, stash index and work dir, keeping only the
-# to-be-committed changes in the working directory.
-echo "Stashing working tree changes..." 1>&2
-old_stash=$(git rev-parse -q --verify refs/stash)
-git stash save -q --keep-index
-new_stash=$(git rev-parse -q --verify refs/stash)
-
-# If there were no changes (e.g., `--amend` or `--allow-empty`)
-# then nothing was stashed, and we should skip everything,
-# including the tests themselves. (Presumably the tests passed
-# on the previous commit, so there is no need to re-run them.)
-if [ "$old_stash" = "$new_stash" ]; then
-    echo "No changes, skipping lint." 1>&2
-    exit 0
-fi
-
-# Run tests
-cardboardlinter --refspec HEAD -n auto
-status=$?
-
-# Restore changes
-echo "Restoring working tree changes..." 1>&2
-git reset --hard -q && git stash apply --index -q && git stash drop -q
-
-# Exit with status from test-run: nonzero prevents commit
-exit $status
-EOF
-chmod +x .git/hooks/pre-commit
-```
-
-This will run the linters on just the changes made in your commit.
+
+Welcome to 🐸TTS!
+
+This repository is governed by the Contributor Covenant Code of Conduct - [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md).
+
+## Where to start
+
+We welcome everyone who likes to contribute to 🐸TTS.
+You can contribute not only with code but with bug reports, comments, questions, answers, or just a simple tweet to spread the word.
+
+If you'd like to contribute code or squash a bug but don't know where to start, here are some pointers.
+
+- [Github Issues Tracker](https://github.com/coqui-ai/TTS/issues)
+
+    This is the place to find feature requests and bugs.
+
+    Issues with the ```good first issue``` tag are a good place for beginners to take on.
+
+- ✨**PR**✨ [pages](https://github.com/coqui-ai/TTS/pulls) with the ```🚀new version``` tag.
+
+    We list all the target improvements for the next version. You can pick one of them and start contributing.
+
+- Also feel free to suggest new features, ideas and models. We're always open to new things.
+
+## Sending a ✨**PR**✨
+
+If you have a new feature, a model to implement, or a bug to squash, go ahead and send a ✨**PR**✨.
+Please use the following steps to send a ✨**PR**✨.
+Let us know if you encounter a problem along the way.
+
+The following steps are tested on an Ubuntu system.
+
+1. Fork [🐸TTS](https://github.com/coqui-ai/TTS) by clicking the fork button at the top right corner of the project page.
+
+2. Clone 🐸TTS and add the main repo as a new remote named ```upstream```.
+
+    ```bash
+    $ git clone git@github.com:<your Github name>/TTS.git
+    $ cd TTS
+    $ git remote add upstream https://github.com/coqui-ai/TTS.git
+    ```
+
+3. Install 🐸TTS for development.
+
+    ```bash
+    $ make system-deps  # intended to be used on Ubuntu (Debian). Let us know if you have a different OS.
+    $ make install
+    ```
+
+4. Create a new branch with an informative name for your goal.
+
+    ```bash
+    $ git checkout -b an_informative_name_for_my_branch
+    ```
+
+5. Implement your changes on your new branch.
+
+6. Explain your code using [Google Style](https://google.github.io/styleguide/pyguide.html#381-docstrings) docstrings (see the sketch after this diff).
+
+7. Add your tests to our test suite under the ```tests``` folder. It is important to show that your code works, that edge cases are considered, and to inform others about the intended use.
+
+8. Run the tests to see how your updates work with the rest of the project. You can repeat this step multiple times as you implement your changes to make sure you are going in the right direction.
+
+    ```bash
+    $ make tests
+    ```
+
+9. Format your code. We use ```black``` for code and ```isort``` for ```import``` formatting.
+
+    ```bash
+    $ make style
+    ```
+
+10. Run the linter and correct the issues raised. We use ```pylint``` for linting. It helps to enforce a coding standard and offers simple refactoring suggestions.
+
+    ```bash
+    $ make lint
+    ```
+
+11. When things are good, add the new files and commit your changes.
+
+    ```bash
+    $ git add my_file1.py my_file2.py ...
+    $ git commit
+    ```
+
+    It's a good practice to regularly sync your local copy of the project with the upstream code to keep up with the recent updates.
+
+    ```bash
+    $ git fetch upstream
+    $ git rebase upstream/master
+    # or for the development version
+    $ git rebase upstream/dev
+    ```
+
+12. Send a PR to the ```dev``` branch.
+
+    Push your branch to your fork.
+
+    ```bash
+    $ git push -u origin an_informative_name_for_my_branch
+    ```
+
+    Then go to your fork's Github page and click on 'Pull request' to send your ✨**PR**✨.
+
+    Please set the ✨**PR**✨'s target branch to ```dev``` as we use ```dev``` to work on the next version.
+
+13. Let's discuss until it is perfect. 💪
+
+    We might ask you for certain changes that would appear on the ✨**PR**✨'s page under [🐸TTS](https://github.com/coqui-ai/TTS/pulls).
+
+14. Once things look perfect, we merge it to the ```dev``` branch and make it ready for the next version.
+
+Feel free to ping us at any step you need help using our communication channels.
+
+If you are new to Github or open-source contribution, these are good resources.
+
+- [Github Docs](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/proposing-changes-to-your-work-with-pull-requests)
+- [First-Contribution](https://github.com/firstcontributions/first-contributions)
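For reference, a Google-style docstring of the kind step 6 asks for looks roughly like this; the function and its parameters are invented for illustration:

```python
def resample_wav(wav_path: str, target_sr: int = 16000) -> str:
    """Resample a wav file to a target sample rate.

    Args:
        wav_path (str): Path to the input wav file.
        target_sr (int): Desired sample rate in Hz. Defaults to 16000.

    Returns:
        str: Path of the resampled file.

    Raises:
        FileNotFoundError: If ``wav_path`` does not exist.
    """
    ...
```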


@@ -40,6 +40,7 @@ Please use our dedicated channels for questions and discussion. Help is much mor
 | Type                            | Links                                    |
 | ------------------------------- | ---------------------------------------- |
 | 💾 **Installation**             | [TTS/README.md](https://github.com/coqui-ai/TTS/tree/dev#install-tts)|
+| 👩‍💻 **Contributing**            | [CONTRIBUTING.md](https://github.com/coqui-ai/TTS/blob/main/CONTRIBUTING.md)|
 | 👩🏾‍🏫 **Tutorials and Examples**  | [TTS/Wiki](https://github.com/coqui-ai/TTS/wiki/%F0%9F%90%B8-TTS-Notebooks,-Examples-and-Tutorials) |
 | 🚀 **Released Models**          | [TTS Releases](https://github.com/coqui-ai/TTS/releases) and [Experimental Models](https://github.com/coqui-ai/TTS/wiki/Experimental-Released-Models)|
 | 💻 **Docker Image**             | [Repository by @synesthesiam](https://github.com/synesthesiam/docker-coqui-aitts)|

@@ -114,10 +115,19 @@ pip install -e .
 ```
 We use ```espeak-ng``` to convert graphemes to phonemes. You might need to install separately.
 ```bash
 sudo apt-get install espeak-ng
 ```
+
+If you are on Ubuntu (Debian), you can also run the following commands for installation.
+
+```bash
+$ make system-deps  # intended to be used on Ubuntu (Debian). Let us know if you have a different OS.
+$ make install
+```
+
+If you are on Windows, 👑@GuyPaddock wrote installation instructions [here](https://stackoverflow.com/questions/66726331/how-can-i-run-mozilla-tts-coqui-tts-training-with-cuda-on-a-windows-system).

 ## Directory Structure
 ```
 |- notebooks/       (Jupyter Notebooks for model evaluation, parameter selection and data analysis.)

@@ -167,11 +177,13 @@ Some of the public datasets that we successfully applied 🐸TTS:
 After the installation, 🐸TTS provides a CLI interface for synthesizing speech using pre-trained models. You can either use your own model or the release models under 🐸TTS.

 Listing released 🐸TTS models.
+
 ```bash
 tts --list_models
 ```

 Run a tts and a vocoder model from the released model list. (Simply copy and paste the full model names from the list as arguments for the command below.)
+
 ```bash
 tts --text "Text for TTS" \
     --model_name "<type>/<language>/<dataset>/<model_name>" \

@@ -180,6 +192,7 @@ tts --text "Text for TTS" \
 ```

 Run your own TTS model (Using Griffin-Lim Vocoder)
+
 ```bash
 tts --text "Text for TTS" \
     --model_path path/to/model.pth.tar \

@@ -188,6 +201,7 @@ tts --text "Text for TTS" \
 ```

 Run your own TTS and Vocoder models
+
 ```bash
 tts --text "Text for TTS" \
     --model_path path/to/config.json \

@@ -242,29 +256,7 @@ In case of any error or intercepted execution, if there is no checkpoint yet und
 You can also enjoy Tensorboard, if you point the Tensorboard argument ```--logdir``` to the experiment folder.

-## Contribution guidelines
-Please follow the steps below as you send a PR to 🐸. It helps us to keep things organized.
-1. Create a new branch.
-2. Implement your changes.
-3. (if applicable) Add [Google Style](https://google.github.io/styleguide/pyguide.html#381-docstrings) docstrings.
-4. (if applicable) Implement a test case under ```tests``` folder.
-5. (Optional but Prefered) Run tests.
-```bash
-./run_tests.sh
-```
-6. Run the ```pylint``` linter.
-```bash
-pip install pylint cardboardlint
-cardboardlinter --refspec master
-```
-7. Send a PR to ```dev``` branch, explain what the change is about.
-8. Let us discuss until we make it perfect :) 💪.
-9. We merge it to the ```dev``` branch once things look good.
-Feel free to ping us at any step you need help using our communication channels.
-[Here](https://github.com/firstcontributions/first-contributions) is a good resource for complete beginners.
+## [Contribution guidelines](https://github.com/coqui-ai/TTS/blob/main/CONTRIBUTING.md)

 ### Acknowledgement
 - https://github.com/keithito/tacotron (Dataset pre-processing)
 - https://github.com/r9y9/tacotron_pytorch (Initial Tacotron architecture)


@@ -16,7 +16,7 @@ from TTS.utils.audio import AudioProcessor
 from TTS.utils.io import load_config

 if __name__ == "__main__":
-    # pylint: disable=bad-continuation
+    # pylint: disable=bad-option-value
     parser = argparse.ArgumentParser(
         description="""Extract attention masks from trained Tacotron/Tacotron2 models.
 These masks can be used for different purposes including training a TTS model with a Duration Predictor.\n\n"""


@@ -7,7 +7,7 @@ from TTS.tts.datasets.preprocess import get_preprocessor_by_name

 def main():
-    # pylint: disable=bad-continuation
+    # pylint: disable=bad-option-value
     parser = argparse.ArgumentParser(
         description="""Find all the unique characters or phonemes in a dataset.\n\n"""
         """Target dataset must be defined in TTS.tts.datasets.preprocess\n\n"""


@@ -23,7 +23,7 @@ def str2bool(v):

 def main():
-    # pylint: disable=bad-continuation
+    # pylint: disable=bad-option-value
     parser = argparse.ArgumentParser(
         description="""Synthesize speech on command line.\n\n"""
         """You can either use your trained model or choose a model from the provided list.\n\n"""


@@ -24,7 +24,10 @@ from TTS.vocoder.utils.io import save_best_model, save_checkpoint

 # from torch.utils.data.distributed import DistributedSampler

+<<<<<<< HEAD
+=======
+>>>>>>> reformat
 use_cuda, num_gpus = setup_torch_training_env(True, True)


@@ -16,7 +16,7 @@ class GE2ELoss(nn.Module):
         - init_w (float): defines the initial value of w in Equation (5) of [1]
         - init_b (float): definies the initial value of b in Equation (5) of [1]
         """
-        super(GE2ELoss, self).__init__()
+        super().__init__()
         # pylint: disable=E1102
         self.w = nn.Parameter(torch.tensor(init_w))
         # pylint: disable=E1102

@@ -129,7 +129,7 @@ class AngleProtoLoss(nn.Module):
     """

     def __init__(self, init_w=10.0, init_b=-5.0):
-        super(AngleProtoLoss, self).__init__()
+        super().__init__()
         # pylint: disable=E1102
         self.w = nn.Parameter(torch.tensor(init_w))
         # pylint: disable=E1102
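A large share of this merge is one mechanical cleanup, the one pylint's `super-with-arguments` check flags: in Python 3 the zero-argument form of `super()` resolves the class and instance automatically, so the explicit `super(Class, self)` spelling is redundant. A minimal, self-contained sketch of the before and after (class names here are illustrative, not from the repo):

```python
class Base:
    def __init__(self):
        self.ready = True


class OldStyle(Base):
    def __init__(self):
        # Explicit form, needed for Python 2 compatibility.
        super(OldStyle, self).__init__()


class NewStyle(Base):
    def __init__(self):
        # Zero-argument form, Python 3 only; equivalent behavior here.
        super().__init__()


assert OldStyle().ready and NewStyle().ready
```

The same one-line substitution repeats across the PyTorch and TensorFlow modules below.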


@@ -54,7 +54,7 @@ class MyDataset(Dataset):
             use_noise_augment (bool): enable adding random noise to wav for augmentation.
             verbose (bool): print diagnostic information.
         """
-        super(MyDataset, self).__init__()
+        super().__init__()
         self.batch_group_size = batch_group_size
         self.items = meta_data
         self.outputs_per_step = outputs_per_step


@@ -136,18 +136,18 @@ class Encoder(nn.Module):
         # init encoder
         if encoder_type.lower() == "relative_position_transformer":
             # text encoder
-            self.encoder = RelativePositionTransformerEncoder(
-                in_hidden_channels, out_channels, in_hidden_channels, encoder_params
-            )  # pylint: disable=unexpected-keyword-arg
+            # pylint: disable=unexpected-keyword-arg
+            self.encoder = RelativePositionTransformerEncoder(
+                in_hidden_channels, out_channels, in_hidden_channels, encoder_params
+            )
         elif encoder_type.lower() == "residual_conv_bn":
             self.encoder = ResidualConv1dBNEncoder(in_hidden_channels, out_channels, in_hidden_channels, encoder_params)
         elif encoder_type.lower() == "fftransformer":
             assert (
                 in_hidden_channels == out_channels
             ), "[!] must be `in_channels` == `out_channels` when encoder type is 'fftransformer'"
-            self.encoder = FFTransformerBlock(
-                in_hidden_channels, **encoder_params
-            )  # pylint: disable=unexpected-keyword-arg
+            # pylint: disable=unexpected-keyword-arg
+            self.encoder = FFTransformerBlock(in_hidden_channels, **encoder_params)
         else:
             raise NotImplementedError(" [!] unknown encoder type.")
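The pragma move above matters because of how pylint scopes `# pylint: disable=...` comments: a trailing pragma only covers its own physical line, while a standalone pragma covers the rest of the enclosing block. Since `unexpected-keyword-arg` is reported on the first line of a multi-line call, a pragma on the closing parenthesis line misses it. A small illustrative sketch (the `Widget` class is made up):

```python
class Widget:
    def __init__(self, size=1):
        self.size = size


def build_widgets():
    # Trailing pragma: applies only to this physical line, so it does not
    # reach the message reported at the start of the multi-line call.
    first = Widget(
        size=2
    )  # pylint: disable=unexpected-keyword-arg

    # Standalone pragma: applies from here to the end of the enclosing
    # block, which is why the merge places the comment above the call.
    # pylint: disable=unexpected-keyword-arg
    second = Widget(size=3)
    return first, second
```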


@@ -52,7 +52,7 @@ class L1LossMasked(nn.Module):

 class MSELossMasked(nn.Module):
     def __init__(self, seq_len_norm):
-        super(MSELossMasked, self).__init__()
+        super().__init__()
         self.seq_len_norm = seq_len_norm

     def forward(self, x, target, length):

@@ -132,7 +132,7 @@ class AttentionEntropyLoss(nn.Module):

 class BCELossMasked(nn.Module):
     def __init__(self, pos_weight):
-        super(BCELossMasked, self).__init__()
+        super().__init__()
         self.pos_weight = pos_weight

     def forward(self, x, target, length):

@@ -193,7 +193,7 @@ class DifferentailSpectralLoss(nn.Module):

 class GuidedAttentionLoss(torch.nn.Module):
     def __init__(self, sigma=0.4):
-        super(GuidedAttentionLoss, self).__init__()
+        super().__init__()
         self.sigma = sigma

     def _make_ga_masks(self, ilens, olens):

@@ -247,7 +247,7 @@ class TacotronLoss(torch.nn.Module):
     """Collection of Tacotron set-up based on provided config."""

     def __init__(self, c, stopnet_pos_weight=10, ga_sigma=0.4):
-        super(TacotronLoss, self).__init__()
+        super().__init__()
         self.stopnet_pos_weight = stopnet_pos_weight
         self.ga_alpha = c.ga_alpha
         self.decoder_diff_spec_alpha = c.decoder_diff_spec_alpha


@@ -16,7 +16,7 @@ class LocationLayer(nn.Module):
     """

     def __init__(self, attention_dim, attention_n_filters=32, attention_kernel_size=31):
-        super(LocationLayer, self).__init__()
+        super().__init__()
         self.location_conv1d = nn.Conv1d(
             in_channels=2,
             out_channels=attention_n_filters,

@@ -51,7 +51,7 @@ class GravesAttention(nn.Module):

     def __init__(self, query_dim, K):
-        super(GravesAttention, self).__init__()
+        super().__init__()
         self._mask_value = 1e-8
         self.K = K
         # self.attention_alignment = 0.05

@@ -178,7 +178,7 @@ class OriginalAttention(nn.Module):
         trans_agent,
         forward_attn_mask,
     ):
-        super(OriginalAttention, self).__init__()
+        super().__init__()
         self.query_layer = Linear(query_dim, attention_dim, bias=False, init_gain="tanh")
         self.inputs_layer = Linear(embedding_dim, attention_dim, bias=False, init_gain="tanh")
         self.v = Linear(attention_dim, 1, bias=True)


@@ -14,7 +14,7 @@ class Linear(nn.Module):
     """

     def __init__(self, in_features, out_features, bias=True, init_gain="linear"):
-        super(Linear, self).__init__()
+        super().__init__()
         self.linear_layer = torch.nn.Linear(in_features, out_features, bias=bias)
         self._init_w(init_gain)

@@ -38,7 +38,7 @@ class LinearBN(nn.Module):
     """

     def __init__(self, in_features, out_features, bias=True, init_gain="linear"):
-        super(LinearBN, self).__init__()
+        super().__init__()
         self.linear_layer = torch.nn.Linear(in_features, out_features, bias=bias)
         self.batch_normalization = nn.BatchNorm1d(out_features, momentum=0.1, eps=1e-5)
         self._init_w(init_gain)

@@ -87,7 +87,7 @@ class Prenet(nn.Module):
     # pylint: disable=dangerous-default-value
     def __init__(self, in_features, prenet_type="original", prenet_dropout=True, out_features=[256, 256], bias=True):
-        super(Prenet, self).__init__()
+        super().__init__()
         self.prenet_type = prenet_type
         self.prenet_dropout = prenet_dropout
         in_features = [in_features] + out_features[:-1]


@@ -26,7 +26,7 @@ class BatchNormConv1d(nn.Module):

     def __init__(self, in_channels, out_channels, kernel_size, stride, padding, activation=None):
-        super(BatchNormConv1d, self).__init__()
+        super().__init__()
         self.padding = padding
         self.padder = nn.ConstantPad1d(padding, 0)
         self.conv1d = nn.Conv1d(

@@ -71,7 +71,7 @@ class Highway(nn.Module):
     # TODO: Try GLU layer
     def __init__(self, in_features, out_feature):
-        super(Highway, self).__init__()
+        super().__init__()
         self.H = nn.Linear(in_features, out_feature)
         self.H.bias.data.zero_()
         self.T = nn.Linear(in_features, out_feature)

@@ -118,7 +118,7 @@ class CBHG(nn.Module):
         gru_features=128,
         num_highways=4,
     ):
-        super(CBHG, self).__init__()
+        super().__init__()
         self.in_features = in_features
         self.conv_bank_features = conv_bank_features
         self.highway_features = highway_features

@@ -191,7 +191,7 @@ class EncoderCBHG(nn.Module):
     r"""CBHG module with Encoder specific arguments"""

     def __init__(self):
-        super(EncoderCBHG, self).__init__()
+        super().__init__()
         self.cbhg = CBHG(
             128,
             K=16,

@@ -217,7 +217,7 @@ class Encoder(nn.Module):
     """

     def __init__(self, in_features):
-        super(Encoder, self).__init__()
+        super().__init__()
         self.prenet = Prenet(in_features, out_features=[256, 128])
         self.cbhg = EncoderCBHG()

@@ -230,7 +230,7 @@ class Encoder(nn.Module):

 class PostCBHG(nn.Module):
     def __init__(self, mel_dim):
-        super(PostCBHG, self).__init__()
+        super().__init__()
         self.cbhg = CBHG(
             mel_dim,
             K=8,

@@ -290,7 +290,7 @@ class Decoder(nn.Module):
         attn_K,
         separate_stopnet,
     ):
-        super(Decoder, self).__init__()
+        super().__init__()
         self.r_init = r
         self.r = r
         self.in_channels = in_channels

@@ -491,7 +491,7 @@ class StopNet(nn.Module):
     """

     def __init__(self, in_features):
-        super(StopNet, self).__init__()
+        super().__init__()
         self.dropout = nn.Dropout(0.1)
         self.linear = nn.Linear(in_features, 1)
         torch.nn.init.xavier_uniform_(self.linear.weight, gain=torch.nn.init.calculate_gain("linear"))


@@ -24,7 +24,7 @@ class ConvBNBlock(nn.Module):
     """

     def __init__(self, in_channels, out_channels, kernel_size, activation=None):
-        super(ConvBNBlock, self).__init__()
+        super().__init__()
         assert (kernel_size - 1) % 2 == 0
         padding = (kernel_size - 1) // 2
         self.convolution1d = nn.Conv1d(in_channels, out_channels, kernel_size, padding=padding)

@@ -57,7 +57,7 @@ class Postnet(nn.Module):
     """

     def __init__(self, in_out_channels, num_convs=5):
-        super(Postnet, self).__init__()
+        super().__init__()
         self.convolutions = nn.ModuleList()
         self.convolutions.append(ConvBNBlock(in_out_channels, 512, kernel_size=5, activation="tanh"))
         for _ in range(1, num_convs - 1):

@@ -83,7 +83,7 @@ class Encoder(nn.Module):
     """

     def __init__(self, in_out_channels=512):
-        super(Encoder, self).__init__()
+        super().__init__()
         self.convolutions = nn.ModuleList()
         for _ in range(3):
             self.convolutions.append(ConvBNBlock(in_out_channels, in_out_channels, 5, "relu"))

@@ -156,7 +156,7 @@ class Decoder(nn.Module):
         attn_K,
         separate_stopnet,
     ):
-        super(Decoder, self).__init__()
+        super().__init__()
         self.frame_channels = frame_channels
         self.r_init = r
         self.r = r


@@ -79,7 +79,7 @@ class Tacotron(TacotronAbstract):
         memory_size=5,
         gst_use_speaker_embedding=False,
     ):
-        super(Tacotron, self).__init__(
+        super().__init__(
             num_chars,
             num_speakers,
             r,


@@ -11,7 +11,7 @@ from tensorflow.python.ops import math_ops

 class Linear(keras.layers.Layer):
     def __init__(self, units, use_bias, **kwargs):
-        super(Linear, self).__init__(**kwargs)
+        super().__init__(**kwargs)
         self.linear_layer = keras.layers.Dense(units, use_bias=use_bias, name="linear_layer")
         self.activation = keras.layers.ReLU()

@@ -25,7 +25,7 @@ class Linear(keras.layers.Layer):

 class LinearBN(keras.layers.Layer):
     def __init__(self, units, use_bias, **kwargs):
-        super(LinearBN, self).__init__(**kwargs)
+        super().__init__(**kwargs)
         self.linear_layer = keras.layers.Dense(units, use_bias=use_bias, name="linear_layer")
         self.batch_normalization = keras.layers.BatchNormalization(
             axis=-1, momentum=0.90, epsilon=1e-5, name="batch_normalization"

@@ -44,7 +44,7 @@ class LinearBN(keras.layers.Layer):

 class Prenet(keras.layers.Layer):
     def __init__(self, prenet_type, prenet_dropout, units, bias, **kwargs):
-        super(Prenet, self).__init__(**kwargs)
+        super().__init__(**kwargs)
         self.prenet_type = prenet_type
         self.prenet_dropout = prenet_dropout
         self.linear_layers = []

@@ -98,7 +98,7 @@ class Attention(keras.layers.Layer):
         use_forward_attn_mask,
         **kwargs,
     ):
-        super(Attention, self).__init__(**kwargs)
+        super().__init__(**kwargs)
         self.use_loc_attn = use_loc_attn
         self.loc_attn_n_filters = loc_attn_n_filters
         self.loc_attn_kernel_size = loc_attn_kernel_size

@@ -244,8 +244,7 @@ class Attention(keras.layers.Layer):
 #                  location_attention_filters=32,
 #                  location_attention_kernel_size=31):
-#         super(LocationSensitiveAttention,
-#               self).__init__(units=units,
+#         super( self).__init__(units=units,
 #                              memory=memory,
 #                              memory_sequence_length=memory_sequence_length,
 #                              normalize=normalize,


@@ -10,7 +10,7 @@ from TTS.tts.tf.utils.tf_utils import shape_list

 # pylint: disable=unexpected-keyword-arg
 class ConvBNBlock(keras.layers.Layer):
     def __init__(self, filters, kernel_size, activation, **kwargs):
-        super(ConvBNBlock, self).__init__(**kwargs)
+        super().__init__(**kwargs)
         self.convolution1d = keras.layers.Conv1D(filters, kernel_size, padding="same", name="convolution1d")
         self.batch_normalization = keras.layers.BatchNormalization(
             axis=2, momentum=0.90, epsilon=1e-5, name="batch_normalization"

@@ -28,7 +28,7 @@ class ConvBNBlock(keras.layers.Layer):

 class Postnet(keras.layers.Layer):
     def __init__(self, output_filters, num_convs, **kwargs):
-        super(Postnet, self).__init__(**kwargs)
+        super().__init__(**kwargs)
         self.convolutions = []
         self.convolutions.append(ConvBNBlock(512, 5, "tanh", name="convolutions_0"))
         for idx in range(1, num_convs - 1):

@@ -44,7 +44,7 @@ class Postnet(keras.layers.Layer):

 class Encoder(keras.layers.Layer):
     def __init__(self, output_input_dim, **kwargs):
-        super(Encoder, self).__init__(**kwargs)
+        super().__init__(**kwargs)
         self.convolutions = []
         for idx in range(3):
             self.convolutions.append(ConvBNBlock(output_input_dim, 5, "relu", name=f"convolutions_{idx}"))

@@ -81,7 +81,7 @@ class Decoder(keras.layers.Layer):
         enable_tflite,
         **kwargs,
     ):
-        super(Decoder, self).__init__(**kwargs)
+        super().__init__(**kwargs)
         self.frame_dim = frame_dim
         self.r_init = tf.constant(r, dtype=tf.int32)
         self.r = tf.constant(r, dtype=tf.int32)


@@ -28,7 +28,11 @@ class Tacotron2(keras.models.Model):
         bidirectional_decoder=False,
         enable_tflite=False,
     ):
+<<<<<<< HEAD
         super(Tacotron2, self).__init__()
+=======
+        super().__init__()
+>>>>>>> reformat
         self.r = r
         self.decoder_output_dim = decoder_output_dim
         self.postnet_output_dim = postnet_output_dim


@@ -57,7 +57,7 @@ def parse_speakers(c, args, meta_data_train, OUT_PATH):
         prev_out_path = os.path.dirname(args.restore_path)
         speaker_mapping = load_speaker_mapping(prev_out_path)
         speaker_embedding_dim = None
-        assert all([speaker in speaker_mapping for speaker in speakers]), (
+        assert all(speaker in speaker_mapping for speaker in speakers), (
             "As of now you, you cannot " "introduce new speakers to " "a previously trained model."
         )
     elif (
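This hunk (and a similar one in the weight-decay helper below) is the fix that pylint's `use-a-generator` check suggests: `all()` and `any()` accept any iterable, and a generator expression avoids building an intermediate list and lets the check short-circuit on the first failing element. An illustrative sketch with made-up data:

```python
speakers = ["p225", "p226"]
speaker_mapping = {"p225": 0, "p226": 1}

# List comprehension: materializes the whole list before checking.
assert all([speaker in speaker_mapping for speaker in speakers])

# Generator expression: no intermediate list, stops at the first False.
assert all(speaker in speaker_mapping for speaker in speakers)
```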


@@ -1,17 +1,17 @@
 import os
-
-os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

+import numpy as np
 import pkg_resources
+import torch
+
+from .text import phoneme_to_sequence, text_to_sequence
+
+os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

 installed = {pkg.key for pkg in pkg_resources.working_set}  # pylint: disable=not-an-iterable
 if "tensorflow" in installed or "tensorflow-gpu" in installed:
     import tensorflow as tf
-
-import numpy as np
-import torch
-
-from .text import phoneme_to_sequence, text_to_sequence

 def text_to_seqvec(text, CONFIG):
     text_cleaner = [CONFIG.text_cleaner]


@@ -1,13 +1,13 @@
 import librosa
 import matplotlib
+import matplotlib.pyplot as plt
 import numpy as np
 import torch

-matplotlib.use("Agg")
-
-import matplotlib.pyplot as plt
-
 from TTS.tts.utils.text import phoneme_to_sequence, sequence_to_phoneme

+matplotlib.use("Agg")

 def plot_alignment(alignment, info=None, fig_size=(16, 10), title=None, output_fig=False):
     if isinstance(alignment, torch.Tensor):


@@ -179,8 +179,8 @@ class AudioProcessor(object):
             S_norm = ((2 * self.max_norm) * S_norm) - self.max_norm
             if self.clip_norm:
                 S_norm = np.clip(
-                    S_norm, -self.max_norm, self.max_norm
-                )  # pylint: disable=invalid-unary-operand-type
+                    S_norm, -self.max_norm, self.max_norm  # pylint: disable=invalid-unary-operand-type
+                )
             return S_norm
         else:
             S_norm = self.max_norm * S_norm

@@ -206,8 +206,8 @@ class AudioProcessor(object):
         if self.symmetric_norm:
             if self.clip_norm:
                 S_denorm = np.clip(
-                    S_denorm, -self.max_norm, self.max_norm
-                )  # pylint: disable=invalid-unary-operand-type
+                    S_denorm, -self.max_norm, self.max_norm  # pylint: disable=invalid-unary-operand-type
+                )
             S_denorm = ((S_denorm + self.max_norm) * -self.min_level_db / (2 * self.max_norm)) + self.min_level_db
             return S_denorm + self.ref_level_db
         else:

@@ -326,13 +326,7 @@ class AudioProcessor(object):
         )

     def _istft(self, y):
-        return librosa.istft(
-            y,
-            hop_length=self.hop_length,
-            win_length=self.win_length,
-            window="hann",
-            center=True,
-        )
+        return librosa.istft(y, hop_length=self.hop_length, win_length=self.win_length)

     def _griffin_lim(self, S):
         angles = np.exp(2j * np.pi * np.random.rand(*S.shape))


@@ -14,7 +14,7 @@ class DistributedSampler(Sampler):
     """

     def __init__(self, dataset, num_replicas=None, rank=None):
-        super(DistributedSampler, self).__init__(dataset)
+        super().__init__(dataset)
         if num_replicas is None:
             if not dist.is_available():
                 raise RuntimeError("Requires distributed package to be available")


@@ -19,7 +19,7 @@ class AttrDict(dict):
     to class attributes"""

     def __init__(self, *args, **kwargs):
-        super(AttrDict, self).__init__(*args, **kwargs)
+        super().__init__(*args, **kwargs)
         self.__dict__ = self


@@ -25,10 +25,10 @@ class RAdam(Optimizer):
         defaults = dict(
             lr=lr, betas=betas, eps=eps, weight_decay=weight_decay, buffer=[[None, None, None] for _ in range(10)]
         )
-        super(RAdam, self).__init__(params, defaults)
+        super().__init__(params, defaults)

     def __setstate__(self, state):  # pylint: disable=useless-super-delegation
-        super(RAdam, self).__setstate__(state)
+        super().__setstate__(state)

     def step(self, closure=None):


@@ -74,7 +74,7 @@ def set_weight_decay(model, weight_decay, skip_list={"decoder.attention.v", "rnn
         if not param.requires_grad:
             continue
-        if len(param.shape) == 1 or any([skip_name in name for skip_name in skip_list]):
+        if len(param.shape) == 1 or any((skip_name in name for skip_name in skip_list)):
             no_decay.append(param)
         else:
             decay.append(param)

@@ -85,7 +85,7 @@ def set_weight_decay(model, weight_decay, skip_list={"decoder.attention.v", "rnn
 class NoamLR(torch.optim.lr_scheduler._LRScheduler):
     def __init__(self, optimizer, warmup_steps=0.1, last_epoch=-1):
         self.warmup_steps = float(warmup_steps)
-        super(NoamLR, self).__init__(optimizer, last_epoch)
+        super().__init__(optimizer, last_epoch)

     def get_lr(self):
         step = max(self.last_epoch, 1)


@@ -30,7 +30,7 @@ class GANDataset(Dataset):
         use_cache=False,
         verbose=False,
     ):
-        super(GANDataset, self).__init__()
+        super().__init__()
         self.ap = ap
         self.item_list = items
         self.compute_feat = not isinstance(items[0], (tuple, list))


@@ -22,7 +22,7 @@ class WaveRNNDataset(Dataset):
         verbose=False,
     ):
-        super(WaveRNNDataset, self).__init__()
+        super().__init__()
         self.ap = ap
         self.compute_feat = not isinstance(items[0], (tuple, list))
         self.item_list = items


@@ -20,8 +20,7 @@ class TorchSTFT(nn.Module):  # pylint: disable=abstract-method
         n_mels=80,
         use_mel=False,
     ):
-        """ Torch based STFT operation """
-        super(TorchSTFT, self).__init__()
+        super().__init__()
         self.n_fft = n_fft
         self.hop_length = hop_length
         self.win_length = win_length

@@ -91,7 +90,7 @@ class STFTLoss(nn.Module):
     It is from ParallelWaveGAN paper https://arxiv.org/pdf/1910.11480.pdf"""

     def __init__(self, n_fft, hop_length, win_length):
-        super(STFTLoss, self).__init__()
+        super().__init__()
         self.n_fft = n_fft
         self.hop_length = hop_length
         self.win_length = win_length

@@ -113,7 +112,7 @@ class MultiScaleSTFTLoss(torch.nn.Module):
     It is from ParallelWaveGAN paper https://arxiv.org/pdf/1910.11480.pdf"""

     def __init__(self, n_ffts=(1024, 2048, 512), hop_lengths=(120, 240, 50), win_lengths=(600, 1200, 240)):
-        super(MultiScaleSTFTLoss, self).__init__()
+        super().__init__()
         self.loss_funcs = torch.nn.ModuleList()
         for n_fft, hop_length, win_length in zip(n_ffts, hop_lengths, win_lengths):
             self.loss_funcs.append(STFTLoss(n_fft, hop_length, win_length))

@@ -199,7 +198,7 @@ class MSEDLoss(nn.Module):
     def __init__(
         self,
     ):
-        super(MSEDLoss, self).__init__()
+        super().__init__()
         self.loss_func = nn.MSELoss()

     # pylint: disable=no-self-use

@@ -225,7 +224,7 @@ class MelganFeatureLoss(nn.Module):
     def __init__(
         self,
     ):
-        super(MelganFeatureLoss, self).__init__()
+        super().__init__()
         self.loss_func = nn.L1Loss()

     # pylint: disable=no-self-use
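For background on the classes touched here (this context is from the ParallelWaveGAN paper the docstrings cite, not from the diff itself): the single-resolution STFT loss combines a spectral-convergence term and a log STFT magnitude term,

$$
L_{sc}(x,\hat{x}) = \frac{\big\lVert\, |\mathrm{STFT}(x)| - |\mathrm{STFT}(\hat{x})| \,\big\rVert_F}{\big\lVert\, |\mathrm{STFT}(x)| \,\big\rVert_F},
\qquad
L_{mag}(x,\hat{x}) = \frac{1}{N}\big\lVert\, \log|\mathrm{STFT}(x)| - \log|\mathrm{STFT}(\hat{x})| \,\big\rVert_1,
$$

and the multi-resolution variant averages both terms over $M$ STFT configurations (here the three FFT/hop/window sizes in the defaults): $L_{mr\text{-}stft} = \frac{1}{M}\sum_{m=1}^{M}\big(L_{sc}^{(m)} + L_{mag}^{(m)}\big)$.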


@@ -4,7 +4,7 @@ from torch.nn.utils import weight_norm

 class ResidualStack(nn.Module):
     def __init__(self, channels, num_res_blocks, kernel_size):
-        super(ResidualStack, self).__init__()
+        super().__init__()

         assert (kernel_size - 1) % 2 == 0, " [!] kernel_size has to be odd."
         base_padding = (kernel_size - 1) // 2


@@ -17,7 +17,7 @@ class ResidualBlock(torch.nn.Module):
         bias=True,
         use_causal_conv=False,
     ):
-        super(ResidualBlock, self).__init__()
+        super().__init__()
         self.dropout = dropout
         # no future time stamps available
         if use_causal_conv:


@@ -8,7 +8,7 @@ from scipy import signal as sig

 # https://github.com/kan-bayashi/ParallelWaveGAN/tree/master/parallel_wavegan
 class PQMF(torch.nn.Module):
     def __init__(self, N=4, taps=62, cutoff=0.15, beta=9.0):
-        super(PQMF, self).__init__()
+        super().__init__()

         self.N = N
         self.taps = taps


@@ -4,7 +4,7 @@ from torch.nn import functional as F

 class Stretch2d(torch.nn.Module):
     def __init__(self, x_scale, y_scale, mode="nearest"):
-        super(Stretch2d, self).__init__()
+        super().__init__()
         self.x_scale = x_scale
         self.y_scale = y_scale
         self.mode = mode

@@ -28,7 +28,7 @@ class UpsampleNetwork(torch.nn.Module):
         freq_axis_kernel_size=1,
         use_causal_conv=False,
     ):
-        super(UpsampleNetwork, self).__init__()
+        super().__init__()
         self.use_causal_conv = use_causal_conv
         self.up_layers = torch.nn.ModuleList()
         for scale in upsample_factors:

@@ -76,7 +76,7 @@ class ConvUpsample(torch.nn.Module):
         aux_context_window=0,
         use_causal_conv=False,
     ):
-        super(ConvUpsample, self).__init__()
+        super().__init__()
         self.aux_context_window = aux_context_window
         self.use_causal_conv = use_causal_conv and aux_context_window > 0
         # To capture wide-context information in conditional features


@@ -14,7 +14,7 @@ class MelganDiscriminator(nn.Module):
         downsample_factors=(4, 4, 4, 4),
         groups_denominator=4,
     ):
-        super(MelganDiscriminator, self).__init__()
+        super().__init__()
         self.layers = nn.ModuleList()

         layer_kernel_size = np.prod(kernel_sizes)


@@ -16,7 +16,7 @@ class MelganGenerator(nn.Module):
         res_kernel=3,
         num_res_blocks=3,
     ):
-        super(MelganGenerator, self).__init__()
+        super().__init__()

         # assert model parameters
         assert (proj_kernel - 1) % 2 == 0, " [!] proj_kernel should be an odd number."


@@ -18,7 +18,7 @@ class MelganMultiscaleDiscriminator(nn.Module):
         pooling_padding=2,
         groups_denominator=4,
     ):
-        super(MelganMultiscaleDiscriminator, self).__init__()
+        super().__init__()

         self.discriminators = nn.ModuleList(
             [


@@ -15,7 +15,7 @@ class MultibandMelganGenerator(MelganGenerator):
         res_kernel=3,
         num_res_blocks=3,
     ):
-        super(MultibandMelganGenerator, self).__init__(
+        super().__init__(
             in_channels=in_channels,
             out_channels=out_channels,
             proj_kernel=proj_kernel,


@@ -26,7 +26,7 @@ class ParallelWaveganDiscriminator(nn.Module):
         nonlinear_activation_params={"negative_slope": 0.2},
         bias=True,
     ):
-        super(ParallelWaveganDiscriminator, self).__init__()
+        super().__init__()
         assert (kernel_size - 1) % 2 == 0, " [!] does not support even number kernel size."
         assert dilation_factor > 0, " [!] dilation factor must be > 0."
         self.conv_layers = nn.ModuleList()

@@ -100,7 +100,7 @@ class ResidualParallelWaveganDiscriminator(nn.Module):
         nonlinear_activation="LeakyReLU",
         nonlinear_activation_params={"negative_slope": 0.2},
     ):
-        super(ResidualParallelWaveganDiscriminator, self).__init__()
+        super().__init__()
         assert (kernel_size - 1) % 2 == 0, "Not support even number kernel size."
         self.in_channels = in_channels


@@ -33,7 +33,7 @@ class ParallelWaveganGenerator(torch.nn.Module):
         inference_padding=2,
     ):
-        super(ParallelWaveganGenerator, self).__init__()
+        super().__init__()
         self.in_channels = in_channels
         self.out_channels = out_channels
         self.aux_channels = aux_channels


@@ -4,7 +4,7 @@ from torch import nn

 class GBlock(nn.Module):
     def __init__(self, in_channels, cond_channels, downsample_factor):
-        super(GBlock, self).__init__()
+        super().__init__()

         self.in_channels = in_channels
         self.cond_channels = cond_channels

@@ -35,7 +35,7 @@ class GBlock(nn.Module):

 class DBlock(nn.Module):
     def __init__(self, in_channels, out_channels, downsample_factor):
-        super(DBlock, self).__init__()
+        super().__init__()

         self.in_channels = in_channels
         self.downsample_factor = downsample_factor

@@ -62,7 +62,7 @@ class DBlock(nn.Module):

 class ConditionalDiscriminator(nn.Module):
     def __init__(self, in_channels, cond_channels, downsample_factors=(2, 2, 2), out_channels=(128, 256)):
-        super(ConditionalDiscriminator, self).__init__()
+        super().__init__()

         assert len(downsample_factors) == len(out_channels) + 1

@@ -106,7 +106,7 @@ class ConditionalDiscriminator(nn.Module):

 class UnconditionalDiscriminator(nn.Module):
     def __init__(self, in_channels, base_channels=64, downsample_factors=(8, 4), out_channels=(128, 256)):
-        super(UnconditionalDiscriminator, self).__init__()
+        super().__init__()

         self.downsample_factors = downsample_factors
         self.in_channels = in_channels

@@ -148,7 +148,7 @@ class RandomWindowDiscriminator(nn.Module):
         window_sizes=(512, 1024, 2048, 4096, 8192),
     ):
-        super(RandomWindowDiscriminator, self).__init__()
+        super().__init__()
         self.cond_channels = cond_channels
         self.window_sizes = window_sizes
         self.hop_length = hop_length


@@ -3,7 +3,7 @@ import tensorflow as tf

 class ReflectionPad1d(tf.keras.layers.Layer):
     def __init__(self, padding):
-        super(ReflectionPad1d, self).__init__()
+        super().__init__()
         self.padding = padding

     def call(self, x):

@@ -12,7 +12,7 @@ class ReflectionPad1d(tf.keras.layers.Layer):

 class ResidualStack(tf.keras.layers.Layer):
     def __init__(self, channels, num_res_blocks, kernel_size, name):
-        super(ResidualStack, self).__init__(name=name)
+        super().__init__(name=name)
         assert (kernel_size - 1) % 2 == 0, " [!] kernel_size has to be odd."
         base_padding = (kernel_size - 1) // 2


@@ -5,7 +5,7 @@ from scipy import signal as sig

 class PQMF(tf.keras.layers.Layer):
     def __init__(self, N=4, taps=62, cutoff=0.15, beta=9.0):
-        super(PQMF, self).__init__()
+        super().__init__()
         # define filter coefficient
         self.N = N
         self.taps = taps


@@ -1,13 +1,14 @@
 import logging
 import os

-os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"  # FATAL
-logging.getLogger("tensorflow").setLevel(logging.FATAL)
-
 import tensorflow as tf

-from TTS.vocoder.tf.layers.melgan import ReflectionPad1d, ResidualStack
+os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"  # FATAL
+logging.getLogger("tensorflow").setLevel(logging.FATAL)
+
+from TTS.vocoder.tf.layers.melgan import ReflectionPad1d, ResidualStack

 # pylint: disable=too-many-ancestors
 # pylint: disable=abstract-method

@@ -25,7 +26,7 @@ class MelganGenerator(tf.keras.models.Model):
         res_kernel=3,
         num_res_blocks=3,
     ):
-        super(MelganGenerator, self).__init__()
+        super().__init__()
         self.in_channels = in_channels


@@ -17,7 +17,7 @@ class MultibandMelganGenerator(MelganGenerator):
         res_kernel=3,
         num_res_blocks=3,
     ):
-        super(MultibandMelganGenerator, self).__init__(
+        super().__init__(
             in_channels=in_channels,
             out_channels=out_channels,
             proj_kernel=proj_kernel,


@@ -1,8 +1,8 @@
 dependencies = ['torch', 'gdown', 'pysbd', 'phonemizer', 'unidecode', 'pypinyin']  # apt install espeak-ng
 import torch

-from TTS.utils.synthesizer import Synthesizer
 from TTS.utils.manage import ModelManager
+from TTS.utils.synthesizer import Synthesizer

 def tts(model_name='tts_models/en/ljspeech/tacotron2-DCA', vocoder_name=None, use_cuda=False):


@@ -16,12 +16,14 @@ tqdm
 inflect
 bokeh==1.4.0
 pysbd
-# pyworld
 soundfile
-nose==1.3.7
-cardboardlint==1.3.0
-pylint==2.5.3
 gdown
 umap-learn==0.4.6
 cython
 pyyaml
+# quality and style
+nose
+coverage
+black
+isort
+pylint==2.7.4


@@ -1,21 +0,0 @@
-torch>=1.5
-tensorflow==2.3.1
-numpy>=1.16.0
-scipy>=0.19.0
-numba==0.48
-librosa==0.7.2
-phonemizer>=2.2.0
-unidecode==0.4.20
-attrdict
-tensorboardX
-matplotlib
-Pillow
-flask
-tqdm
-inflect
-pysbd
-bokeh==1.4.0
-soundfile
-nose==1.3.7
-cardboardlint==1.3.0
-cython

14
run_bash_tests.sh Executable file
View File

@@ -0,0 +1,14 @@
+set -e
+TF_CPP_MIN_LOG_LEVEL=3
+
+# runtime bash based tests
+./tests/bash_tests/test_demo_server.sh && \
+./tests/bash_tests/test_resample.sh && \
+./tests/bash_tests/test_tacotron_train.sh && \
+./tests/bash_tests/test_glow-tts_train.sh && \
+./tests/bash_tests/test_vocoder_gan_train.sh && \
+./tests/bash_tests/test_vocoder_wavernn_train.sh && \
+./tests/bash_tests/test_vocoder_wavegrad_train.sh && \
+./tests/bash_tests/test_speedy_speech_train.sh && \
+./tests/bash_tests/test_aligntts_train.sh && \
+./tests/bash_tests/test_compute_statistics.sh
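A hedged aside on this new runner script: because `TF_CPP_MIN_LOG_LEVEL=3` is assigned without `export`, it sets only a shell variable that the child test scripts do not inherit; `export TF_CPP_MIN_LOG_LEVEL=3` would be needed for the suppression to reach the Python processes they launch. The `&& \` chaining is also belt-and-braces under `set -e`, which already aborts the script at the first failing test.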


@@ -8,9 +8,8 @@ from distutils.version import LooseVersion
 import numpy
 import setuptools.command.build_py
 import setuptools.command.develop
-from setuptools import setup, Extension, find_packages
 from Cython.Build import cythonize
+from setuptools import Extension, find_packages, setup
 
 if LooseVersion(sys.version) < LooseVersion("3.6") or LooseVersion(sys.version) > LooseVersion("3.9"):
     raise RuntimeError(


@@ -3,11 +3,11 @@ set -xe
 BASEDIR=$(dirname "$0")
 echo "$BASEDIR"
 # run training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_align_tts.py --config_path $BASEDIR/inputs/test_align_tts.json
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_align_tts.py --config_path $BASEDIR/../inputs/test_align_tts.json
 # find the training folder
-LATEST_FOLDER=$(ls $BASEDIR/train_outputs/| sort | tail -1)
+LATEST_FOLDER=$(ls $BASEDIR/../train_outputs/| sort | tail -1)
 echo $LATEST_FOLDER
 # continue the previous training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_align_tts.py --continue_path $BASEDIR/train_outputs/$LATEST_FOLDER
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_align_tts.py --continue_path $BASEDIR/../train_outputs/$LATEST_FOLDER
 # remove all the outputs
-rm -rf $BASEDIR/train_outputs/
+rm -rf $BASEDIR/../train_outputs/


@@ -3,5 +3,5 @@ set -xe
 BASEDIR=$(dirname "$0")
 echo "$BASEDIR"
 # run training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/compute_statistics.py --config_path $BASEDIR/inputs/test_glow_tts.json --out_path $BASEDIR/outputs/scale_stats.npy
+CUDA_VISIBLE_DEVICES="" python TTS/bin/compute_statistics.py --config_path $BASEDIR/../inputs/test_glow_tts.json --out_path $BASEDIR/../outputs/scale_stats.npy


@@ -3,11 +3,11 @@ set -xe
 BASEDIR=$(dirname "$0")
 echo "$BASEDIR"
 # run training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_glow_tts.py --config_path $BASEDIR/inputs/test_glow_tts.json
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_glow_tts.py --config_path $BASEDIR/../inputs/test_glow_tts.json
 # find the training folder
-LATEST_FOLDER=$(ls $BASEDIR/train_outputs/| sort | tail -1)
+LATEST_FOLDER=$(ls $BASEDIR/../train_outputs/| sort | tail -1)
 echo $LATEST_FOLDER
 # continue the previous training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_glow_tts.py --continue_path $BASEDIR/train_outputs/$LATEST_FOLDER
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_glow_tts.py --continue_path $BASEDIR/../train_outputs/$LATEST_FOLDER
 # remove all the outputs
-rm -rf $BASEDIR/train_outputs/
+rm -rf $BASEDIR/../train_outputs/


@@ -4,7 +4,7 @@ BASEDIR=$(dirname "$0")
 TARGET_SR=16000
 echo "$BASEDIR"
 #run the resample script
-python TTS/bin/resample.py --input_dir $BASEDIR/data/ljspeech --output_dir $BASEDIR/outputs/resample_tests --output_sr $TARGET_SR
+python TTS/bin/resample.py --input_dir $BASEDIR/../data/ljspeech --output_dir $BASEDIR/outputs/resample_tests --output_sr $TARGET_SR
 #check samplerate of output
 OUT_SR=$( (echo "import librosa" ; echo "y, sr = librosa.load('"$BASEDIR"/outputs/resample_tests/wavs/LJ001-0012.wav', sr=None)" ; echo "print(sr)") | python )
 OUT_SR=$(($OUT_SR + 0))
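The sample-rate check above pipes a small program into `python` via `echo`; functionally it boils down to loading the file with `sr=None` so librosa reports the file's native rate instead of resampling. An equivalent sketch as a plain script (the path mirrors the one in the test and is illustrative):

```python
import librosa

# sr=None disables librosa's default resampling, so the file's native rate is returned
_, sr = librosa.load("tests/outputs/resample_tests/wavs/LJ001-0012.wav", sr=None)
assert sr == 16000, f"unexpected sample rate: {sr}"
```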


@@ -3,11 +3,11 @@ set -xe
 BASEDIR=$(dirname "$0")
 echo "$BASEDIR"
 # run training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_speedy_speech.py --config_path $BASEDIR/inputs/test_speedy_speech.json
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_speedy_speech.py --config_path $BASEDIR/../inputs/test_speedy_speech.json
 # find the training folder
-LATEST_FOLDER=$(ls $BASEDIR/train_outputs/| sort | tail -1)
+LATEST_FOLDER=$(ls $BASEDIR/../train_outputs/| sort | tail -1)
 echo $LATEST_FOLDER
 # continue the previous training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_speedy_speech.py --continue_path $BASEDIR/train_outputs/$LATEST_FOLDER
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_speedy_speech.py --continue_path $BASEDIR/../train_outputs/$LATEST_FOLDER
 # remove all the outputs
-rm -rf $BASEDIR/train_outputs/
+rm -rf $BASEDIR/../train_outputs/


@@ -4,33 +4,33 @@ BASEDIR=$(dirname "$0")
 echo "$BASEDIR"
 # run training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_tacotron.py --config_path $BASEDIR/inputs/test_tacotron_config.json
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_tacotron.py --config_path $BASEDIR/../inputs/test_tacotron_config.json
 # find the training folder
-LATEST_FOLDER=$(ls $BASEDIR/train_outputs/| sort | tail -1)
+LATEST_FOLDER=$(ls $BASEDIR/../train_outputs/| sort | tail -1)
 echo $LATEST_FOLDER
 # continue the previous training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_tacotron.py --continue_path $BASEDIR/train_outputs/$LATEST_FOLDER
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_tacotron.py --continue_path $BASEDIR/../train_outputs/$LATEST_FOLDER
 # remove all the outputs
-rm -rf $BASEDIR/train_outputs/
+rm -rf $BASEDIR/../train_outputs/
 # run Tacotron bi-directional decoder
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_tacotron.py --config_path $BASEDIR/inputs/test_tacotron_bd_config.json
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_tacotron.py --config_path $BASEDIR/../inputs/test_tacotron_bd_config.json
 # find the training folder
-LATEST_FOLDER=$(ls $BASEDIR/train_outputs/| sort | tail -1)
+LATEST_FOLDER=$(ls $BASEDIR/../train_outputs/| sort | tail -1)
 echo $LATEST_FOLDER
 # continue the previous training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_tacotron.py --continue_path $BASEDIR/train_outputs/$LATEST_FOLDER
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_tacotron.py --continue_path $BASEDIR/../train_outputs/$LATEST_FOLDER
 # remove all the outputs
-rm -rf $BASEDIR/train_outputs/
+rm -rf $BASEDIR/../train_outputs/
 # Tacotron2
 # run training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_tacotron.py --config_path $BASEDIR/inputs/test_tacotron2_config.json
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_tacotron.py --config_path $BASEDIR/../inputs/test_tacotron2_config.json
 # find the training folder
-LATEST_FOLDER=$(ls $BASEDIR/train_outputs/| sort | tail -1)
+LATEST_FOLDER=$(ls $BASEDIR/../train_outputs/| sort | tail -1)
 echo $LATEST_FOLDER
 # continue the previous training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_tacotron.py --continue_path $BASEDIR/train_outputs/$LATEST_FOLDER
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_tacotron.py --continue_path $BASEDIR/../train_outputs/$LATEST_FOLDER
 # remove all the outputs
-rm -rf $BASEDIR/train_outputs/
+rm -rf $BASEDIR/../train_outputs/


@@ -3,13 +3,13 @@ set -xe
 BASEDIR=$(dirname "$0")
 echo "$BASEDIR"
 # create run dir
-mkdir $BASEDIR/train_outputs
+mkdir $BASEDIR/../train_outputs
 # run training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_vocoder_gan.py --config_path $BASEDIR/inputs/test_vocoder_multiband_melgan_config.json
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_vocoder_gan.py --config_path $BASEDIR/../inputs/test_vocoder_multiband_melgan_config.json
 # find the training folder
-LATEST_FOLDER=$(ls $BASEDIR/train_outputs/| sort | tail -1)
+LATEST_FOLDER=$(ls $BASEDIR/../train_outputs/| sort | tail -1)
 echo $LATEST_FOLDER
 # continue the previous training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_vocoder_gan.py --continue_path $BASEDIR/train_outputs/$LATEST_FOLDER
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_vocoder_gan.py --continue_path $BASEDIR/../train_outputs/$LATEST_FOLDER
 # remove all the outputs
-rm -rf $BASEDIR/train_outputs/$LATEST_FOLDER
+rm -rf $BASEDIR/../train_outputs/$LATEST_FOLDER


@@ -3,13 +3,13 @@ set -xe
 BASEDIR=$(dirname "$0")
 echo "$BASEDIR"
 # create run dir
-mkdir -p $BASEDIR/train_outputs
+mkdir -p $BASEDIR/../train_outputs
 # run training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_vocoder_wavegrad.py --config_path $BASEDIR/inputs/test_vocoder_wavegrad.json
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_vocoder_wavegrad.py --config_path $BASEDIR/../inputs/test_vocoder_wavegrad.json
 # find the training folder
-LATEST_FOLDER=$(ls $BASEDIR/train_outputs/| sort | tail -1)
+LATEST_FOLDER=$(ls $BASEDIR/../train_outputs/| sort | tail -1)
 echo $LATEST_FOLDER
 # continue the previous training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_vocoder_wavegrad.py --continue_path $BASEDIR/train_outputs/$LATEST_FOLDER
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_vocoder_wavegrad.py --continue_path $BASEDIR/../train_outputs/$LATEST_FOLDER
 # remove all the outputs
-rm -rf $BASEDIR/train_outputs/$LATEST_FOLDER
+rm -rf $BASEDIR/../train_outputs/$LATEST_FOLDER


@@ -3,13 +3,13 @@ set -xe
 BASEDIR=$(dirname "$0")
 echo "$BASEDIR"
 # create run dir
-mkdir -p $BASEDIR/train_outputs
+mkdir -p $BASEDIR/../train_outputs
 # run training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_vocoder_wavernn.py --config_path $BASEDIR/inputs/test_vocoder_wavernn_config.json
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_vocoder_wavernn.py --config_path $BASEDIR/../inputs/test_vocoder_wavernn_config.json
 # find the training folder
-LATEST_FOLDER=$(ls $BASEDIR/train_outputs/| sort | tail -1)
+LATEST_FOLDER=$(ls $BASEDIR/../train_outputs/| sort | tail -1)
 echo $LATEST_FOLDER
 # continue the previous training
-CUDA_VISIBLE_DEVICES="" python TTS/bin/train_vocoder_wavernn.py --continue_path $BASEDIR/train_outputs/$LATEST_FOLDER
+CUDA_VISIBLE_DEVICES="" python TTS/bin/train_vocoder_wavernn.py --continue_path $BASEDIR/../train_outputs/$LATEST_FOLDER
 # remove all the outputs
-rm -rf $BASEDIR/train_outputs/$LATEST_FOLDER
+rm -rf $BASEDIR/../train_outputs/$LATEST_FOLDER


@@ -65,7 +65,7 @@
     // MODEL PARAMETERS
     "positional_encoding": true,
    "hidden_channels": 256,
-    "hidden_channels_dp": 128,
+    "hidden_channels_dp": 256,
     "encoder_type": "fftransformer",
     "encoder_params":{
         "hidden_channels_ffn": 1024 ,
@@ -87,7 +87,7 @@
     "eval_batch_size":1,
     "r": 1, // Number of decoder frames to predict per iteration. Set the initial values if gradual training is enabled.
     "loss_masking": true, // enable / disable loss masking against the sequence padding.
-    "phase_start_steps": [0, 40000, 80000, 160000, 170000],
+    "phase_start_steps": null,
 
     // LOSS PARAMETERS


@@ -16,7 +16,7 @@ conf = load_config(os.path.join(get_tests_input_path(), "test_config.json"))
 # pylint: disable=protected-access
 class TestAudio(unittest.TestCase):
     def __init__(self, *args, **kwargs):
-        super(TestAudio, self).__init__(*args, **kwargs)
+        super().__init__(*args, **kwargs)
         self.ap = AudioProcessor(**conf.audio)
 
     def test_audio_synthesis(self):


@@ -28,7 +28,7 @@ print(" > Dynamic data loader test: {}".format(DATA_EXIST))
 class TestTTSDataset(unittest.TestCase):
     def __init__(self, *args, **kwargs):
-        super(TestTTSDataset, self).__init__(*args, **kwargs)
+        super().__init__(*args, **kwargs)
         self.max_loader_iter = 4
         self.ap = AudioProcessor(**c.audio)