
Comments (5)

JCBrouwer commented on June 18, 2024

I've noticed the same thing in my own training runs as well. My first instinct was that it's related to mirroring the dataset, but it looks like you have that turned off!

All my videos are dominated by two modes of motion: a large-scale left-to-right movement and a faster, undulating, up-and-down flashing movement.

I'm starting to think this is inherent to the current design of the motion encoder.

Back in February I tried cleaning up the research zip from #1 and got these results training the motion encoder from scratch:
https://gfycat.com/gloriousgrizzledhadrosaurus
(not exactly sure what the settings were, but I think my motion_z_distance was too short, leading to the extremely quick motions)

With the release of the official code I tried again starting from the pre-trained faces checkpoint:
https://gfycat.com/generalquarterlykudu
Config: https://pastebin.com/WqrygJMA
The results are definitely smoother (probably because of the longer motion_z_distance / the better starting point), but this large-scale left-right movement is still very apparent in all of the videos.

The reason I think it might be inherent is that the same effect is present in the pre-trained checkpoint I started from. Here's the video from the start of training with the unchanged faces checkpoint:
https://gfycat.com/bouncyagonizingaustraliankestrel
It also contains the same undulating, flashing, periodic motion!

The same effect is also clearly visible in the SkyTimelapse GIF in the README. Look at how all the clouds make a long movement right and then a long movement back to the left.

Would love to know if there is a way to change up the motion encoder (or anything else?) to reduce this effect!

(paging @universome, thank you for the amazing work by the way :)


skymanaditya1 commented on June 18, 2024

Hi,

We have a dataset in which a liquid flows through water from right to left. We are trying to generate similar videos using StyleGAN-V, but the produced videos have a spring-like animation, i.e. the video first moves from right to left and then from left to right. For example, the video starts with a nice motion from right to left, but after some time it begins to move from left to right:

Will more training solve the issue or is there any optimization that we can do?

Thanks!

I faced a similar issue. I guess it could be because of the augmentations you are using. In your config file, you have bgc as the aug_pipe, which includes augmentations such as rotation and flipping. I guess that could be the reason for observing motion in two different directions.
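
For reference, a minimal sketch of what a pipe without the geometric augmentations could look like, assuming the StyleGAN2-ADA-style AugmentPipe this codebase builds on (the import path and parameter names here are assumptions and may differ in this repo):

```python
# Sketch only: build an augmentation pipe without flips/rotations so the
# discriminator never sees mirrored or rotated clips. Parameter names follow
# the StyleGAN2-ADA AugmentPipe; the config hook in StyleGAN-V may differ.
from training.augment import AugmentPipe  # path assumed from StyleGAN2-ADA

augment_pipe = AugmentPipe(
    xflip=0, rotate90=0,                  # disable horizontal flips and 90-degree rotations
    xint=1,                               # keep integer pixel translation
    scale=0, rotate=0, aniso=0, xfrac=0,  # disable the continuous geometric group
    brightness=1, contrast=1, lumaflip=1, hue=1, saturation=1,  # keep color augmentations
)
```

If I remember correctly, the bgc preset enables the blit, geometric, and color groups, so trimming it down to translation + color like this keeps some regularization while removing mirrored or rotated clips from what the discriminator sees.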


universome commented on June 18, 2024

Hi! To be honest, the issue you report does not seem to be easily fixable. I attribute it to the fact that the generator uses just a single 512-dimensional content code (w) while you are trying to generate an "infinite" amount of different content from it. But there are other factors at play as well.

To mitigate it, I would try the following things:

  • I am somewhat surprised that the discriminator does not catch the reversed movement. It would be explainable if you trained with 2 frames per clip, but as I can see, you are using 3 of them. Did you try switching off horizontal flips (xflip=1 => xflip=0) in the augmentation pipe? In that case, the discriminator would see just a single movement direction during training, which could help.
  • Passing motion codes via modulation rather than concatenation (maybe even removing the w codes completely). This can be done by setting cond_type to sum_w or concat_w in the config. In this way, it would be similar in spirit to having an "infinite" number of content codes.
  • Using spatio-temporal positional embeddings specifically for your dataset. This means that instead of concatenating time embeddings p_\theta(t) to the constant input tensor (which depend only on time), you concatenate joint positional embeddings p_w(t, x, y), similar to DIGAN (see the sketch after this list). Also, the role of the mapping network is somewhat unclear for your dataset, since all the videos are of the same style.
  • Processing the frame features in the discriminator in a more sophisticated way, rather than just concatenating them, and using more frames per clip. For example, use RNNs / temporal convolutions / attention with absolute time positional embeddings. This should help the discriminator catch the reversed movement.
  • Replacing the StyleGAN2 generator backbone with the StyleGAN3 backbone. I have a suspicion that it is difficult for the SG2 generator to "draw" content wherever it wants due to the 3x3 convolutions (which entangle pixels together) and the lack of positional embeddings, which makes it more difficult for it to "execute" movement in general. Using SG3 would give you spatio-temporal positional embeddings for the pixels.
  • Adapting a scheme similar to ALIS, where we "merge" an infinite number of content codes together. But it is not clear how to "move" them.
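
To make the spatio-temporal embedding idea more concrete, here is a minimal sketch of joint p(t, x, y) embeddings concatenated to the constant input, in the spirit of DIGAN. The shapes, frequencies, and function names are illustrative assumptions, not the actual implementation in this repo:

```python
import math
import torch

def joint_positional_embeddings(t, height=4, width=4, num_freqs=4):
    """Sinusoidal embeddings over joint (t, x, y) coordinates.
    t: (batch,) frame timestamps in [0, 1]; returns (batch, 6 * num_freqs, height, width)."""
    ys = torch.linspace(0, 1, height)
    xs = torch.linspace(0, 1, width)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")            # (H, W) each
    grid = torch.stack([yy, xx], dim=0)                       # (2, H, W)
    grid = grid.unsqueeze(0).expand(t.shape[0], -1, -1, -1)   # (B, 2, H, W)
    tt = t.view(-1, 1, 1, 1).expand(-1, 1, height, width)     # (B, 1, H, W)
    coords = torch.cat([tt, grid], dim=1)                     # (B, 3, H, W)

    # Sinusoidal features over the joint (t, x, y) coordinates.
    feats = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        feats.append(torch.sin(freq * coords))
        feats.append(torch.cos(freq * coords))
    return torch.cat(feats, dim=1)                            # (B, 3 * 2 * num_freqs, H, W)

# Usage: concatenate to the generator's constant 4x4 input before the first block,
# so every pixel sees where it is in space *and* time, not just a global time code.
const = torch.randn(8, 512, 4, 4)        # placeholder for the learned constant tensor
pos = joint_positional_embeddings(torch.rand(8))
x = torch.cat([const, pos], dim=1)       # (8, 512 + 24, 4, 4)
```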

Is the dataset you are using available publicly?


JCBrouwer commented on June 18, 2024

I faced a similar issue. I guess it could be because of the augmentations you are using. In your config file, you have bgc as the aug_pipe, which includes augmentations such as rotation and flipping. I guess that could be the reason for observing motion in two different directions.

In my case, at least, I have more than 100k frames in the dataset, so I'm quite confident there isn't any augmentation leakage. I've only ever seen that with very small datasets (<2000 imgs).


JCBrouwer commented on June 18, 2024

Thanks for the in-depth response @universome !

I'll definitely have a look at some of your suggestions. It seems to me that it might also make sense to supply the w-code to the motion encoder. Some motions might only be valid for certain styles and not for others, but currently the motion encoder does not have this information.
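
A rough sketch of what I mean, just to make it concrete (all names and shapes here are hypothetical, not from the repo):

```python
import torch
import torch.nn as nn

class WConditionedMotionMapper(nn.Module):
    """Maps per-frame motion noise to motion codes, conditioned on the content code w,
    so the available motions can depend on the style/content of the video."""
    def __init__(self, motion_dim=512, w_dim=512, hidden_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + w_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, motion_dim),
        )

    def forward(self, motion_noise, w):
        # motion_noise: (batch, num_frames, motion_dim), w: (batch, w_dim)
        w_expanded = w.unsqueeze(1).expand(-1, motion_noise.shape[1], -1)
        return self.net(torch.cat([motion_noise, w_expanded], dim=-1))

mapper = WConditionedMotionMapper()
motion_codes = mapper(torch.randn(4, 16, 512), torch.randn(4, 512))  # (4, 16, 512)
```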

Have you seen Generating Long Videos of Dynamic Scenes? It looks very promising! Of course they're using much more compute because they work with dense spatio-temporal representations all the way through, but perhaps some of their temporal-coherency-focused ideas could still be ported over to the motion encoder here for gains.

