Merge pull request #452 from nmstoker/dev

Small changes for vocoder
Eren Gölge 2020-07-12 11:04:35 +02:00 committed by GitHub
commit 4e68e3bb23
2 changed files with 10 additions and 9 deletions

@@ -16,23 +16,23 @@ You can see here an example (Soon)[Colab Notebook]() training MelGAN with LJSpeech
 In order to train a new model, you need to collect all your wav files under a common parent folder and give this path to the `data_path` field in `config.json`.
-You need to define other relevant parameters in your ```config.json``` and then start traning with the following command from Mozilla TTS root path.
+You need to define other relevant parameters in your `config.json` and then start training with the following command from the Mozilla TTS root path, where '0' is the ID of the GPU you wish to use.
-```CUDA_VISIBLE_DEVICES='1' python vocoder/train.py --config_path path/to/config.json```
+```CUDA_VISIBLE_DEVICES='0' python vocoder/train.py --config_path path/to/config.json```
 Example config files can be found under the `vocoder/configs/` folder.
 You can continue a previous training run with the following command.
-```CUDA_VISIBLE_DEVICES='1' python vocoder/train.py --continue_path path/to/your/model/folder```
+```CUDA_VISIBLE_DEVICES='0' python vocoder/train.py --continue_path path/to/your/model/folder```
 You can fine-tune a pre-trained model with the following command.
-```CUDA_VISIBLE_DEVICES='1' python vocoder/train.py --restore_path path/to/your/model.pth.tar```
+```CUDA_VISIBLE_DEVICES='0' python vocoder/train.py --restore_path path/to/your/model.pth.tar```
 Restoring a model starts a new training run in a different output folder and only loads the model weights from the given checkpoint file. Continuing a run, by contrast, resumes from exactly where the previous training left off.
 You can also follow your training runs on TensorBoard, as you do with our TTS models.
 ## Acknowledgement
-Thanks to @kan-bayashi for his [repository](https://github.com/kan-bayashi/ParallelWaveGAN) being the start point of our work.
+Thanks to @kan-bayashi for his [repository](https://github.com/kan-bayashi/ParallelWaveGAN), which was the starting point of our work.
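
For reference, here is a minimal sketch of what such a `config.json` might contain. Only `data_path` is named in this README; the other keys are assumptions inferred from the dataset code in the next file and may not match the actual schema of the files under `vocoder/configs/`.

```python
import json

# Illustrative sketch only: "data_path" is the field named in the README;
# "seq_len", "pad_short" and "hop_length" are assumed key names, inferred from
# the GANDataset attributes (self.seq_len, self.pad_short, self.hop_len) below.
config = {
    "data_path": "/data/LJSpeech-1.1/wavs/",  # common parent folder of all wav files
    "seq_len": 16384,    # assumed: training segment length in samples
    "pad_short": 2000,   # assumed: zero-pad margin for clips shorter than seq_len
    "hop_length": 256,   # assumed: STFT hop size linking mel frames to samples
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```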

@@ -87,6 +87,11 @@ class GANDataset(Dataset):
                 audio, mel = self.cache[idx]
             else:
                 audio = self.ap.load_wav(wavpath)
+                if len(audio) < self.seq_len + self.pad_short:
+                    audio = np.pad(audio, (0, self.seq_len + self.pad_short - len(audio)), \
+                                   mode='constant', constant_values=0.0)
                 mel = self.ap.melspectrogram(audio)
         else:
@@ -99,10 +104,6 @@ class GANDataset(Dataset):
                 audio = self.ap.load_wav(wavpath)
                 mel = np.load(feat_path)
-                if len(audio) < self.seq_len + self.pad_short:
-                    audio = np.pad(audio, (0, self.seq_len + self.pad_short - len(audio)), \
-                                   mode='constant', constant_values=0.0)
                 # correct the audio length wrt padding applied in stft
                 audio = np.pad(audio, (0, self.hop_len), mode="edge")
                 audio = audio[:mel.shape[-1] * self.hop_len]
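
To make the padding logic above concrete, here is a small self-contained sketch of the two `np.pad` calls, using hypothetical stand-in values for `self.seq_len`, `self.pad_short`, and `self.hop_len`, and a fake mel frame count in place of `ap.melspectrogram`:

```python
import numpy as np

# Hypothetical stand-ins for self.seq_len, self.pad_short, self.hop_len.
seq_len, pad_short, hop_len = 16384, 2000, 256

# (1) Clips shorter than seq_len + pad_short are zero-padded at the end so
#     that a full training segment can always be sliced out of them.
audio = np.zeros(10000, dtype=np.float32)  # stand-in for ap.load_wav(wavpath)
if len(audio) < seq_len + pad_short:
    audio = np.pad(audio, (0, seq_len + pad_short - len(audio)),
                   mode='constant', constant_values=0.0)
assert len(audio) == seq_len + pad_short

# (2) The STFT behind the mel spectrogram pads the signal, so the mel can hold
#     one more frame than the raw sample count covers. Edge-padding by hop_len
#     and then trimming to mel_frames * hop_len makes the audio length an exact
#     multiple of the hop size, keeping audio samples and mel frames aligned.
mel_frames = len(audio) // hop_len + 1  # stand-in for mel.shape[-1]
audio = np.pad(audio, (0, hop_len), mode="edge")
audio = audio[:mel_frames * hop_len]
assert len(audio) == mel_frames * hop_len
```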