Merge pull request #452 from nmstoker/dev

Small changes for vocoder
Eren Gölge 2020-07-12 11:04:35 +02:00 committed by GitHub
commit 4e68e3bb23
2 changed files with 10 additions and 9 deletions

@@ -16,23 +16,23 @@ You can see here an example (Soon)[Colab Notebook]() training MelGAN with LJSpeech
 In order to train a new model, you need to collect all your wav files under a common parent folder and give this path to the `data_path` field in `config.json`.
-You need to define other relevant parameters in your ```config.json``` and then start traning with the following command from Mozilla TTS root path.
+You need to define other relevant parameters in your `config.json` and then start training with the following command from the Mozilla TTS root path, where '0' is the ID of the GPU you wish to use.
-```CUDA_VISIBLE_DEVICES='1' python vocoder/train.py --config_path path/to/config.json```
+```CUDA_VISIBLE_DEVICES='0' python vocoder/train.py --config_path path/to/config.json```
 Example config files can be found under the `vocoder/configs/` folder.
 You can continue a previous training run with the following command.
-```CUDA_VISIBLE_DEVICES='1' python vocoder/train.py --continue_path path/to/your/model/folder```
+```CUDA_VISIBLE_DEVICES='0' python vocoder/train.py --continue_path path/to/your/model/folder```
 You can fine-tune a pre-trained model with the following command.
-```CUDA_VISIBLE_DEVICES='1' python vocoder/train.py --restore_path path/to/your/model.pth.tar```
+```CUDA_VISIBLE_DEVICES='0' python vocoder/train.py --restore_path path/to/your/model.pth.tar```
 Restoring a model starts a new training run in a different output folder and only loads the model weights from the given checkpoint file. Continuing a run, by contrast, resumes from exactly where the previous training left off.
 You can also follow your training runs on TensorBoard, as you do with our TTS models.
 ## Acknowledgement
-Thanks to @kan-bayashi for his [repository](https://github.com/kan-bayashi/ParallelWaveGAN) being the start point of our work.
+Thanks to @kan-bayashi for his [repository](https://github.com/kan-bayashi/ParallelWaveGAN), which was the starting point of our work.
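
For reference, here is a minimal sketch of what such a `config.json` might contain. Only `data_path` is named in this README; the other keys are assumptions inferred from the dataset code in the next file and may not match the actual schema of the files under `vocoder/configs/`.

```python
import json

# Illustrative sketch only: "data_path" is the field named in the README;
# "seq_len", "pad_short" and "hop_length" are assumed key names, inferred from
# the GANDataset attributes (self.seq_len, self.pad_short, self.hop_len) below.
config = {
    "data_path": "/data/LJSpeech-1.1/wavs/",  # common parent folder of all wav files
    "seq_len": 16384,    # assumed: training segment length in samples
    "pad_short": 2000,   # assumed: zero-pad margin for clips shorter than seq_len
    "hop_length": 256,   # assumed: STFT hop size linking mel frames to samples
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```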

@@ -87,6 +87,11 @@ class GANDataset(Dataset):
                 audio, mel = self.cache[idx]
             else:
                 audio = self.ap.load_wav(wavpath)
+                if len(audio) < self.seq_len + self.pad_short:
+                    audio = np.pad(audio, (0, self.seq_len + self.pad_short - len(audio)), \
+                                   mode='constant', constant_values=0.0)
                 mel = self.ap.melspectrogram(audio)
         else:
@@ -99,10 +104,6 @@ class GANDataset(Dataset):
                 audio = self.ap.load_wav(wavpath)
                 mel = np.load(feat_path)
-                if len(audio) < self.seq_len + self.pad_short:
-                    audio = np.pad(audio, (0, self.seq_len + self.pad_short - len(audio)), \
-                                   mode='constant', constant_values=0.0)
                 # correct the audio length wrt padding applied in stft
                 audio = np.pad(audio, (0, self.hop_len), mode="edge")
                 audio = audio[:mel.shape[-1] * self.hop_len]
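
To make the padding logic above concrete, here is a small self-contained sketch of the two `np.pad` calls, using hypothetical stand-in values for `self.seq_len`, `self.pad_short`, and `self.hop_len`, and a fake mel frame count in place of `ap.melspectrogram`:

```python
import numpy as np

# Hypothetical stand-ins for self.seq_len, self.pad_short, self.hop_len.
seq_len, pad_short, hop_len = 16384, 2000, 256

# (1) Clips shorter than seq_len + pad_short are zero-padded at the end so
#     that a full training segment can always be sliced out of them.
audio = np.zeros(10000, dtype=np.float32)  # stand-in for ap.load_wav(wavpath)
if len(audio) < seq_len + pad_short:
    audio = np.pad(audio, (0, seq_len + pad_short - len(audio)),
                   mode='constant', constant_values=0.0)
assert len(audio) == seq_len + pad_short

# (2) The STFT behind the mel spectrogram pads the signal, so the mel can hold
#     one more frame than the raw sample count covers. Edge-padding by hop_len
#     and then trimming to mel_frames * hop_len makes the audio length an exact
#     multiple of the hop size, keeping audio samples and mel frames aligned.
mel_frames = len(audio) // hop_len + 1  # stand-in for mel.shape[-1]
audio = np.pad(audio, (0, hop_len), mode="edge")
audio = audio[:mel_frames * hop_len]
assert len(audio) == mel_frames * hop_len
```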