
Comments (8)

nshmyrev commented on August 18, 2024

> I want to resume the training from the latest checkpoint in case something happened to the previous session; I assume I should simply leave the speaker embedding untouched?

Yes, it picks up the latest checkpoint automatically.
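For context, a minimal sketch of what "picks up the latest checkpoint" usually means in VITS-style training scripts. The helper name latest_checkpoint is a hypothetical stand-in, not necessarily the repo's actual function; the idea is just to glob the model directory for the highest-numbered G_*.pth and resume from it.

import glob
import os
import re

def latest_checkpoint(model_dir, pattern="G_*.pth"):
    """Return the path of the highest-numbered checkpoint, or None if there is none."""
    paths = glob.glob(os.path.join(model_dir, pattern))
    if not paths:
        return None
    # Checkpoints are assumed to be named like G_12000.pth; pick the largest step.
    return max(paths, key=lambda p: int(re.search(r"(\d+)", os.path.basename(p)).group(1)))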


SoshyHayami commented on August 18, 2024

Yeah, good question. I hope the author answers this one.


nshmyrev commented on August 18, 2024

You can load a checkpoint and add as many new speakers as you want to the speaker embedding, then save and train:

import torch

# Assumes the usual mb-istft-vits2 training context (SynthesizerTrn, symbols,
# posterior_channels, hps, device). Build the generator with the *old* speaker
# count so the checkpoint loads cleanly.
net_g = SynthesizerTrn(
    len(symbols),
    posterior_channels,
    hps.train.segment_size // hps.data.hop_length,
    n_speakers=100,
    **hps.model).to(device)

all_model = torch.load("G_old.pth", map_location=device)
model = all_model["model"]
optimizer = all_model["optimizer"]  # loaded for reference only; intentionally not re-saved below
iteration = all_model["iteration"]
learning_rate = all_model["learning_rate"]
net_g.load_state_dict(model)

with torch.no_grad():
    weight = net_g.emb_g.weight  # old speaker embedding, shape (100, gin_channels)
    # Modify here as you like; for example, append randomly initialized rows
    # for 5 new speakers:
    new_rows = torch.randn(5, weight.shape[1], device=device) * 0.01
    new_embedding = torch.cat([weight, new_rows], dim=0)
    # freeze=False keeps the rows trainable (from_pretrained freezes by default)
    net_g.emb_g = torch.nn.Embedding.from_pretrained(new_embedding, freeze=False)
    net_g.n_speakers = 105

state_dict = net_g.state_dict()
torch.save({'model': state_dict,
            'iteration': iteration,
            # Don't save the optimizer: it contains the old speaker dimension, the trainer will reinit it
            'learning_rate': learning_rate}, "G_open.pth")


kafan1986 commented on August 18, 2024

> You can load a checkpoint and add as many new speakers as you want to the speaker embedding, then save and train: [code above]

What should the new_embedding value be? Should it be nn.Embedding(n_speakers, gin_channels)?


nshmyrev commented on August 18, 2024

> What should the new_embedding value be? Should it be nn.Embedding(n_speakers, gin_channels)?

Yes, you can do it like that too. Then you can copy weights from the old embeddings or just leave them random; the trainer will quickly update them.

The only thing is that you shouldn't copy identical rows for different new speakers, or the trainer will be confused. Random initialization is OK.
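A minimal sketch of that suggestion (the sizes n_old=100, n_new=5 and gin_channels=256 are assumptions for illustration, and net_g is the model loaded as in the snippet above): build a fresh nn.Embedding, copy in the trained rows, and leave the new rows at their random initialization.

import torch
import torch.nn as nn

n_old, n_new, gin_channels = 100, 5, 256  # assumed sizes for illustration

# Fresh embedding with room for the new speakers; every row starts random.
new_emb = nn.Embedding(n_old + n_new, gin_channels)

with torch.no_grad():
    # Keep the trained rows for the existing speakers; the new rows stay at
    # their (distinct) random init, which is fine per the advice above.
    new_emb.weight[:n_old] = net_g.emb_g.weight

net_g.emb_g = new_emb
net_g.n_speakers = n_old + n_new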


SoshyHayami commented on August 18, 2024

I want to resume the training from the latest checkpoint in case something happened to the previous session; I assume I should simply leave the speaker embedding untouched?


kafan1986 commented on August 18, 2024

> Yes, you can do it like that too. Then you can copy weights from the old embeddings or just leave them random; the trainer will quickly update them. [...]

I tried this, but it did not work. The model started from checkpoint 0 without any apparent benefit from using the previous checkpoint for initialization.

I am trying another approach: setting n_speakers in the config higher than the actual speaker count (I currently have 9 speakers, but in the config I set n_speakers to 20), just to keep some headroom for future speakers. I am not sure whether the empty slots will have any adverse impact on TTS quality until they get filled in the future. Any idea?
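For reference, the headroom idea amounts to constructing the model with more embedding slots than there are current speakers (a sketch reusing the constructor call from the snippet above; 20 and 9 are the numbers from this comment). Unused rows are never indexed by any training example, so they receive no gradient until a new speaker id starts pointing at them.

# Headroom approach: reserve extra speaker slots up front.
net_g = SynthesizerTrn(
    len(symbols),
    posterior_channels,
    hps.train.segment_size // hps.data.hop_length,
    n_speakers=20,  # actual speakers: 9; ids 9..19 are reserved for the future
    **hps.model).to(device)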


debasish-mihup commented on August 18, 2024

Keeping a higher speaker count with future headroom in mind works. Closing the issue.

