
Comments (8)

nshmyrev commented on August 18, 2024

> I want to resume the training from the latest checkpoint in case something happened to the previous session; I assume I should simply leave the speaker embedding untouched?

Yes, it picks up the latest checkpoint automatically.
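For context, a minimal sketch of what "picks up the latest checkpoint" usually means in VITS-style training scripts. The helper name latest_checkpoint is a hypothetical stand-in, not necessarily the repo's actual function; the idea is just to glob the model directory for the highest-numbered G_*.pth and resume from it.

import glob
import os
import re

def latest_checkpoint(model_dir, pattern="G_*.pth"):
    """Return the path of the highest-numbered checkpoint, or None if there is none."""
    paths = glob.glob(os.path.join(model_dir, pattern))
    if not paths:
        return None
    # Checkpoints are assumed to be named like G_12000.pth; pick the largest step.
    return max(paths, key=lambda p: int(re.search(r"(\d+)", os.path.basename(p)).group(1)))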


SoshyHayami commented on August 18, 2024

Yeah, good question. I hope the author answers this one.


nshmyrev commented on August 18, 2024

You can load a checkpoint and add as many new speakers as you want to the speaker embedding, then save and train:

import torch

# Assumes the usual mb-istft-vits2 training context (SynthesizerTrn, symbols,
# posterior_channels, hps, device). Build the generator with the *old* speaker
# count so the checkpoint loads cleanly.
net_g = SynthesizerTrn(
    len(symbols),
    posterior_channels,
    hps.train.segment_size // hps.data.hop_length,
    n_speakers=100,
    **hps.model).to(device)

all_model = torch.load("G_old.pth", map_location=device)
model = all_model["model"]
optimizer = all_model["optimizer"]  # loaded for reference only; intentionally not re-saved below
iteration = all_model["iteration"]
learning_rate = all_model["learning_rate"]
net_g.load_state_dict(model)

with torch.no_grad():
    weight = net_g.emb_g.weight  # old speaker embedding, shape (100, gin_channels)
    # Modify here as you like; for example, append randomly initialized rows
    # for 5 new speakers:
    new_rows = torch.randn(5, weight.shape[1], device=device) * 0.01
    new_embedding = torch.cat([weight, new_rows], dim=0)
    # freeze=False keeps the rows trainable (from_pretrained freezes by default)
    net_g.emb_g = torch.nn.Embedding.from_pretrained(new_embedding, freeze=False)
    net_g.n_speakers = 105

state_dict = net_g.state_dict()
torch.save({'model': state_dict,
            'iteration': iteration,
            # Don't save the optimizer: it contains the old speaker dimension, the trainer will reinit it
            'learning_rate': learning_rate}, "G_open.pth")


kafan1986 commented on August 18, 2024

> You can load a checkpoint and add as many new speakers as you want to the speaker embedding, then save and train: [code above]

What should the new_embedding value be? Should it be nn.Embedding(n_speakers, gin_channels)?


nshmyrev commented on August 18, 2024

> What should the new_embedding value be? Should it be nn.Embedding(n_speakers, gin_channels)?

Yes, you can do it like that too. Then you can copy weights from the old embeddings or just leave them random; the trainer will quickly update them.

The only thing is that you shouldn't copy identical rows for different new speakers, or the trainer will be confused. Random initialization is OK.
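A minimal sketch of that suggestion (the sizes n_old=100, n_new=5 and gin_channels=256 are assumptions for illustration, and net_g is the model loaded as in the snippet above): build a fresh nn.Embedding, copy in the trained rows, and leave the new rows at their random initialization.

import torch
import torch.nn as nn

n_old, n_new, gin_channels = 100, 5, 256  # assumed sizes for illustration

# Fresh embedding with room for the new speakers; every row starts random.
new_emb = nn.Embedding(n_old + n_new, gin_channels)

with torch.no_grad():
    # Keep the trained rows for the existing speakers; the new rows stay at
    # their (distinct) random init, which is fine per the advice above.
    new_emb.weight[:n_old] = net_g.emb_g.weight

net_g.emb_g = new_emb
net_g.n_speakers = n_old + n_new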


SoshyHayami commented on August 18, 2024

I want to resume the training from the latest checkpoint in case something happened to the previous session; I assume I should simply leave the speaker embedding untouched?


kafan1986 commented on August 18, 2024

> Yes, you can do it like that too. Then you can copy weights from the old embeddings or just leave them random; the trainer will quickly update them. [...]

I tried this, but it did not work. The model started from checkpoint 0 without any apparent benefit from using the previous checkpoint for initialization.

I am trying another approach: setting n_speakers in the config higher than the actual speaker count (I currently have 9 speakers, but in the config I set n_speakers to 20), just to keep some headroom for future speakers. I am not sure whether the empty slots will have any adverse impact on TTS quality until they get filled in the future. Any idea?
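For reference, the headroom idea amounts to constructing the model with more embedding slots than there are current speakers (a sketch reusing the constructor call from the snippet above; 20 and 9 are the numbers from this comment). Unused rows are never indexed by any training example, so they receive no gradient until a new speaker id starts pointing at them.

# Headroom approach: reserve extra speaker slots up front.
net_g = SynthesizerTrn(
    len(symbols),
    posterior_channels,
    hps.train.segment_size // hps.data.hop_length,
    n_speakers=20,  # actual speakers: 9; ids 9..19 are reserved for the future
    **hps.model).to(device)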


debasish-mihup commented on August 18, 2024

Keeping a higher speaker count with future headroom in mind works. Closing the issue.

