Merge pull request #452 from nmstoker/dev

Small changes for vocoder
Eren Gölge 2020-07-12 11:04:35 +02:00 committed by GitHub
commit 4e68e3bb23
2 changed files with 10 additions and 9 deletions

View File

@@ -16,23 +16,23 @@
You can see an example [Colab Notebook]() (coming soon) for training MelGAN with the LJSpeech dataset.
In order to train a new model, you need to collect all your wav files under a common parent folder and give this path to the `data_path` field in `config.json`.
You need to define the other relevant parameters in your `config.json` and then start training with the following command from the Mozilla TTS root path, where '0' is the id of the GPU you wish to use.
```CUDA_VISIBLE_DEVICES='0' python vocoder/train.py --config_path path/to/config.json```
Example config files can be found under the `vocoder/configs/` folder.
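As an optional sanity check before launching training (this sketch is not part of the Mozilla TTS codebase; it only assumes `config.json` is plain JSON with a top-level `data_path` key, as described above), you can confirm that `data_path` really points at a folder containing wav files:

```python
import json
from pathlib import Path

# Load the training config; replace the path with wherever your config.json lives.
with open("path/to/config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

# `data_path` should be the common parent folder that holds all of your wav files.
data_path = Path(config["data_path"])
wav_files = sorted(data_path.rglob("*.wav"))

print(f"Found {len(wav_files)} wav files under {data_path}")
assert wav_files, "No wav files found - double-check the data_path field in config.json"
```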
You can continue a previous training run with the following command.
```CUDA_VISIBLE_DEVICES='0' python vocoder/train.py --continue_path path/to/your/model/folder```
You can fine-tune a pre-trained model with the following command.
```CUDA_VISIBLE_DEVICES='0' python vocoder/train.py --restore_path path/to/your/model.pth.tar```
Restoring a model starts a new training run in a different output folder; it only restores the model weights from the given checkpoint file. Continuing a training run, on the other hand, picks up from exactly where the previous run left off.
You can also follow your training runs on TensorBoard, as you do with our TTS models.
## Acknowledgement
Thanks to @kan-bayashi for his [repository](https://github.com/kan-bayashi/ParallelWaveGAN), which was the starting point of our work.

View File

@@ -87,6 +87,11 @@ class GANDataset(Dataset):
                 audio, mel = self.cache[idx]
             else:
                 audio = self.ap.load_wav(wavpath)
+                if len(audio) < self.seq_len + self.pad_short:
+                    audio = np.pad(audio, (0, self.seq_len + self.pad_short - len(audio)), \
+                            mode='constant', constant_values=0.0)
                 mel = self.ap.melspectrogram(audio)
         else:
@@ -99,10 +104,6 @@ class GANDataset(Dataset):
                 audio = self.ap.load_wav(wavpath)
                 mel = np.load(feat_path)
-                if len(audio) < self.seq_len + self.pad_short:
-                    audio = np.pad(audio, (0, self.seq_len + self.pad_short - len(audio)), \
-                            mode='constant', constant_values=0.0)
         # correct the audio length wrt padding applied in stft
         audio = np.pad(audio, (0, self.hop_len), mode="edge")
         audio = audio[:mel.shape[-1] * self.hop_len]
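To see what this change does in isolation, here is a minimal standalone sketch of the short-clip padding and length correction, with assumed values for `seq_len`, `pad_short` and `hop_len` (in the repository these are attributes of `GANDataset`); the mel-spectrogram call is replaced by a simple frame count, so this is an illustration rather than the dataset's actual code:

```python
import numpy as np

# Assumed values for illustration; in the repository these are attributes of GANDataset
# (typically taken from the training config).
seq_len = 8192     # number of audio samples per training segment (assumed)
pad_short = 2000   # safety margin for clips shorter than seq_len (assumed)
hop_len = 256      # STFT hop length (assumed)

# A clip that is too short to cut a full training segment from.
audio = np.random.randn(5000).astype(np.float32)

# The added code path: zero-pad short clips up to seq_len + pad_short *before*
# computing the mel spectrogram, so the waveform and its features stay consistent.
if len(audio) < seq_len + pad_short:
    audio = np.pad(audio, (0, seq_len + pad_short - len(audio)),
                   mode='constant', constant_values=0.0)

# Stand-in for self.ap.melspectrogram(audio); only the frame count matters here.
num_frames = 1 + len(audio) // hop_len

# The existing length correction: pad by hop_len, then trim so the audio length
# is exactly num_frames * hop_len, keeping waveform samples and mel frames aligned.
audio = np.pad(audio, (0, hop_len), mode="edge")
audio = audio[:num_frames * hop_len]

assert len(audio) == num_frames * hop_len
print(len(audio), num_frames)
```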