Merge pull request #452 from nmstoker/dev

Small changes for vocoder
Eren Gölge 2020-07-12 11:04:35 +02:00 committed by GitHub
commit 4e68e3bb23
2 changed files with 10 additions and 9 deletions

View File

@@ -16,23 +16,23 @@
You can see an example [Colab Notebook]() (coming soon) for training MelGAN with the LJSpeech dataset.
In order to train a new model, you need to collect all your wav files under a common parent folder and give this path to the `data_path` field in `config.json`.
You need to define the other relevant parameters in your `config.json` and then start training with the following command from the Mozilla TTS root path, where '0' is the id of the GPU you wish to use.
```CUDA_VISIBLE_DEVICES='0' python vocoder/train.py --config_path path/to/config.json```
Example config files can be found under the `vocoder/configs/` folder.
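As an optional sanity check before launching training (this sketch is not part of the Mozilla TTS codebase; it only assumes `config.json` is plain JSON with a top-level `data_path` key, as described above), you can confirm that `data_path` really points at a folder containing wav files:

```python
import json
from pathlib import Path

# Load the training config; replace the path with wherever your config.json lives.
with open("path/to/config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

# `data_path` should be the common parent folder that holds all of your wav files.
data_path = Path(config["data_path"])
wav_files = sorted(data_path.rglob("*.wav"))

print(f"Found {len(wav_files)} wav files under {data_path}")
assert wav_files, "No wav files found - double-check the data_path field in config.json"
```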
You can continue a previous training run with the following command.
```CUDA_VISIBLE_DEVICES='0' python vocoder/train.py --continue_path path/to/your/model/folder```
You can fine-tune a pre-trained model with the following command.
```CUDA_VISIBLE_DEVICES='0' python vocoder/train.py --restore_path path/to/your/model.pth.tar```
Restoring a model starts a new training run in a different output folder; it only restores the model weights from the given checkpoint file. Continuing a training run, on the other hand, picks up from exactly where the previous run left off.
You can also follow your training runs on TensorBoard, as you do with our TTS models.
## Acknowledgement
Thanks to @kan-bayashi for his [repository](https://github.com/kan-bayashi/ParallelWaveGAN), which was the starting point of our work.

View File

@@ -87,6 +87,11 @@ class GANDataset(Dataset):
                 audio, mel = self.cache[idx]
             else:
                 audio = self.ap.load_wav(wavpath)
+                if len(audio) < self.seq_len + self.pad_short:
+                    audio = np.pad(audio, (0, self.seq_len + self.pad_short - len(audio)), \
+                            mode='constant', constant_values=0.0)
                 mel = self.ap.melspectrogram(audio)
         else:
@@ -99,10 +104,6 @@ class GANDataset(Dataset):
                 audio = self.ap.load_wav(wavpath)
                 mel = np.load(feat_path)
-                if len(audio) < self.seq_len + self.pad_short:
-                    audio = np.pad(audio, (0, self.seq_len + self.pad_short - len(audio)), \
-                            mode='constant', constant_values=0.0)
         # correct the audio length wrt padding applied in stft
         audio = np.pad(audio, (0, self.hop_len), mode="edge")
         audio = audio[:mel.shape[-1] * self.hop_len]
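To see what this change does in isolation, here is a minimal standalone sketch of the short-clip padding and length correction, with assumed values for `seq_len`, `pad_short` and `hop_len` (in the repository these are attributes of `GANDataset`); the mel-spectrogram call is replaced by a simple frame count, so this is an illustration rather than the dataset's actual code:

```python
import numpy as np

# Assumed values for illustration; in the repository these are attributes of GANDataset
# (typically taken from the training config).
seq_len = 8192     # number of audio samples per training segment (assumed)
pad_short = 2000   # safety margin for clips shorter than seq_len (assumed)
hop_len = 256      # STFT hop length (assumed)

# A clip that is too short to cut a full training segment from.
audio = np.random.randn(5000).astype(np.float32)

# The added code path: zero-pad short clips up to seq_len + pad_short *before*
# computing the mel spectrogram, so the waveform and its features stay consistent.
if len(audio) < seq_len + pad_short:
    audio = np.pad(audio, (0, seq_len + pad_short - len(audio)),
                   mode='constant', constant_values=0.0)

# Stand-in for self.ap.melspectrogram(audio); only the frame count matters here.
num_frames = 1 + len(audio) // hop_len

# The existing length correction: pad by hop_len, then trim so the audio length
# is exactly num_frames * hop_len, keeping waveform samples and mel frames aligned.
audio = np.pad(audio, (0, hop_len), mode="edge")
audio = audio[:num_frames * hop_len]

assert len(audio) == num_frames * hop_len
print(len(audio), num_frames)
```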