Comments (122)

nbardy avatar nbardy commented on May 5, 2024 6

@nbardy are you planning on open sourcing the final model, or is this for commercial purposes for Facet?

Got the all clear to open source the weights.

Might finetune on some proprietary data. But the base model trained on LAION we'd release.

francqz31 avatar francqz31 commented on May 5, 2024 4

@nbardy I think you should just train it for the super-resolution upsampling task (128px to 4K), which is the highlight of the paper. GigaGAN's text-to-image is kinda meh, neither good nor impressive.

What's impressive and holds the current SOTA in text-to-image is this project: https://raphael-painter.github.io/. It even beats Midjourney v5.1, is competitive with v5.2, and has efficient finetuning.
lucid might implement RAPHAEL and you might train it; that would be a far better idea than wasting all that compute on nothing.

nbardy avatar nbardy commented on May 5, 2024 4

Definitely most interested in training the upscaler.

@lucidrains do you have an idea how much work is left for the upscaler code? Looking at the paper it seems pretty similar to the base unconditioned model with some tweaks.

although the paper is light on details about the upscaler

I’m still at the same startup, Facet.

Talking to the Google team and they said the performance is very similar between PyTorch and Jax now.

francqz31 avatar francqz31 commented on May 5, 2024 3

@francqz31 do correct me if i'm wrong about that paper. i will get around to reading it (too much in the queue)

Ok, here is a quick summary that I hacked together, because I read the paper before.

To implement the RAPHAEL model described in this paper, here are the main steps they used:

1. Data collection and preprocessing
   * They collect a large-scale dataset of text-prompt/image pairs. The paper uses LAION-5B, of course, plus some internal datasets.
   * They preprocess the images and text by removing noise, resizing images, etc.
2. Model architecture (see the sketch after this list)
   * The model is based on a U-Net architecture with 16 transformer blocks.
   * Each block contains:
     * a self-attention layer
     * a cross-attention layer over the text prompt
     * a space-Mixture-of-Experts (space-MoE) layer
     * a time-Mixture-of-Experts (time-MoE) layer
     * an edge-supervised learning module
3. Space-MoE
   * The space-MoE layer uses experts to model the relationship between text tokens and image regions.
   * A text gate network assigns text tokens to experts.
   * A thresholding mechanism determines the correspondence between text tokens and image regions.
   * There are 6 space experts in each of the 16 transformer blocks.
4. Time-MoE
   * The time-MoE layer uses experts to handle different diffusion timesteps.
   * A time gate network assigns timesteps to experts.
   * There are 4 time experts.
5. Edge-supervised learning
   * An edge detection module extracts edges from the input image.
   * The model is supervised with these edges using a focal loss; edge learning is paused after a certain timestep threshold.
6. Training
   * They use the AdamW optimizer with learning rate 1e-4.
   * They train for 2 months on 1000 GPUs with a batch size of 2000 and 20000 warmup steps.
   * They combine a denoising loss and an edge-supervised loss.
   * Optional: use LoRA, ControlNet, or SR-GAN for additional control or higher resolution.
   * They use a private, tailor-made SR-GAN model too, I think, not the public one, but that could be replaced by the GigaGAN upsampler ;).
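
For concreteness, here is a minimal PyTorch sketch of what one such block might look like. It is a reconstruction from the summary above, not the authors' code: the top-1 hard routing, widths, and layer ordering are all assumptions.

```python
import torch
from torch import nn

class MoE(nn.Module):
    """Hard-routed mixture of experts: each token is sent to its top-1 expert."""
    def __init__(self, dim, num_experts):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x, gate_input=None):
        # route on a conditioning signal if given (e.g. the timestep embedding
        # for time-MoE), otherwise on the token features themselves (space-MoE)
        routes = self.gate(x if gate_input is None else gate_input).argmax(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = routes == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

class RaphaelBlock(nn.Module):
    """One of the 16 blocks: self-attn -> text cross-attn -> space-MoE -> time-MoE.
    (The edge-supervised module is a separate loss head and is omitted here.)"""
    def __init__(self, dim, num_space_experts=6, num_time_experts=4, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.space_moe = MoE(dim, num_space_experts)
        self.time_moe = MoE(dim, num_time_experts)
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(4))

    def forward(self, x, text_tokens, time_emb):
        # x: (b, hw, dim) image tokens; text_tokens: (b, n, dim), assumed already
        # projected to dim; time_emb: (b, dim) timestep embedding
        h = self.norms[0](x)
        x = x + self.self_attn(h, h, h)[0]
        h = self.norms[1](x)
        x = x + self.cross_attn(h, text_tokens, text_tokens)[0]
        x = x + self.space_moe(self.norms[2](x))
        # every image token shares the timestep, so all tokens route identically
        t = time_emb.unsqueeze(1).expand_as(x)
        x = x + self.time_moe(self.norms[3](x), gate_input=t)
        return x
```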

lucidrains avatar lucidrains commented on May 5, 2024 3

@nbardy awesome! i will prioritize this! expect me to power through it this weekend

nbardy avatar nbardy commented on May 5, 2024 3

Exciting progress.

Trying to start some jobs this week and there are no actually available TPUv4s. We have the quota, but the LLM teams must be taking them all. Yet to see if we actually have compute :( or if it's a mirage.

Probably willing to pay to scale up a smaller version of this. It looks like the compute budget isn't too high for the upscaler.

francqz31 avatar francqz31 commented on May 5, 2024 2

@lucidrains Nope, there isn't. I asked one of the authors; he said something about releasing an API, but they will not open source it, that's 100% for sure. The downside of an API is that I don't think it will have fine-tuning. But yeah, overall they trained it on 1000 A100s for 2 months straight. If you implement it and nbardy trains it, it will be a huge leap for the open-source community.

lucidrains avatar lucidrains commented on May 5, 2024 2

@francqz31 i haven't dived into the paper yet, but i think there's basically nothing to it besides adding MoE and some hand wavy stuff about each expert being 'painters'. i just need to do to mixture-of-experts what i did to attention, and both language and generative image / videos will naturally improve if one replaces the feedforwards with them

lucidrains avatar lucidrains commented on May 5, 2024 2

ok, got the unet upsampler to a decent place, will move onwards to unconditional training tomorrow, and by week's end, conditional + unet upsampler training

lucidrains avatar lucidrains commented on May 5, 2024 2

Haha yeah, they are busy training Gemini I heard

No worries, take your time, as the training code isn't ready yet

nbardy avatar nbardy commented on May 5, 2024 2

Alright, we've got some other preview chips now (I think their existence is under NDA right now). But it should be plenty for the upscaler training.

nbardy avatar nbardy commented on May 5, 2024 2

I’ll be on a long weekend break. I can take a look at an upsampler training script next week

lucidrains avatar lucidrains commented on May 5, 2024 2

@nbardy hey! I'll circle back to this late next week unless you get to it first!

had to bring doggo out of the city to a hotel near airport since she is frightened by fireworks, so didn't get around to unconditional training code yet

[video attachment: 20230702_211551.mp4]

also currently working on another project and cannot context switch without losing progress

nbardy avatar nbardy commented on May 5, 2024 2

Exciting, I have cleared my schedule tomorrow and next week to work only on training the upsampler.

Correct me if I'm wrong, but looking at the code, it looks like the input width/height is fixed to the model architecture size. It's so fast we can tile it at inference time for different resolutions; shouldn't be a problem (see the tiling sketch below).
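
For reference, a minimal sketch of inference-time tiling, assuming an `upsampler` callable with a fixed square input size (`tile`) and a fixed scale factor. Overlap blending, which a real pipeline would want in order to hide seams, is omitted.

```python
import torch

@torch.no_grad()
def tiled_upsample(upsampler, lowres, tile=64, scale=4):
    # `upsampler` is assumed to map (b, c, tile, tile) -> (b, c, tile*scale, tile*scale)
    b, c, h, w = lowres.shape
    assert h % tile == 0 and w % tile == 0, 'pad input to a multiple of the tile size'
    out = torch.zeros(b, c, h * scale, w * scale, device=lowres.device)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = lowres[:, :, y:y + tile, x:x + tile]
            out[:, :, y * scale:(y + tile) * scale, x * scale:(x + tile) * scale] = upsampler(patch)
    return out
```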

lucidrains avatar lucidrains commented on May 5, 2024 2

ok, finished the text-conditioning logic for both base and upsampler

going to start wiring up accelerate probably this afternoon (as well as some hparams for more efficient recon and multi-scale losses)

nbardy avatar nbardy commented on May 5, 2024 1

512 TPUv4s from a Google startup grant.

Didn't get any response in LAION when I asked. Looks like nothing going on yet.

lucidrains avatar lucidrains commented on May 5, 2024 1

@francqz31 oh nice, wasn't aware of raphael. there is no implementation yet?

francqz31 avatar francqz31 commented on May 5, 2024 1

@lucidrains that's my pleasure, I indeed will. I even took some prompts from RAPHAEL and compared them with Midjourney v5.2; it is almost the same, if not even better. But in the paper they compare with v5.1, like this for example:

[image: RAPHAEL vs. Midjourney v5.1 comparison grid]

prompts in order:

  1. A cute little matte low poly isometric cherry blossom forest island, waterfalls, lighting, soft shadows, trending on Artstation, 3d render, monument valley, fez video game
  2. A shanty version of Tokyo, new rustic style, bold colors with all colors palette, video game, genshin, tribe, fantasy, overwatch.
  3. Cartoon characters, mini characters, figures, illustrations, flower fairy, green dress, brown hair, curly long hair, elf-like wings, many flowers and leaves, natural scenery, golden eyes, detailed light and shadow, a high degree of detail.
  4. Cartoon characters, mini characters, hand-made, illustrations, robot kids, color expressions, boy, short brown hair, curly hair, blue eyes, technological age, cyberpunk, big eyes, cute, mini, detailed light and shadow, high detail.

francqz31 avatar francqz31 commented on May 5, 2024 1

@nbardy no problem, don't feel any pressure. Dr. Phil might just implement it and leave it for the open source community, if anyone else is interested. Someone will be, hopefully.

lucidrains avatar lucidrains commented on May 5, 2024 1

@nbardy i'll get to it soon, but like anything in open source, no promises on timeline

@francqz31 oh please, don't address me that way. got enough of that in med school

nbardy avatar nbardy commented on May 5, 2024 1

Happy to jump in and help.

How up to date is the TODO list? You mentioned there is some work left on the unconditioned model code still.

nbardy avatar nbardy commented on May 5, 2024 1

🥳

nbardy avatar nbardy commented on May 5, 2024 1

[image: compute cost table from the paper]
Useful compute cost notes from the paper. The text-conditioned model is about 2 orders of magnitude more compute; super-res is much more reasonable.

Got a ray cluster running tonight. Should have some time to look into a training script Friday

francqz31 avatar francqz31 commented on May 5, 2024 1

That might be the cutest dog ever, look at him lying on the bed knowing he is a good boi hehe.

nbardy avatar nbardy commented on May 5, 2024 1

Okay great, I'm seeing the Generator has cross attention.

Looks like there is a text-conditioned generator and an unconditioned upscaler, if I'm reading the code right.

nbardy avatar nbardy commented on May 5, 2024 1

They don't indicate in the paper which upscaler was used for which samples.

Unclear how much it matters. Could probably get good results with the unconditioned one as well. But given the results from some diffusion papers on scaling up the text encoder, I could imagine the text conditioning stabilizing training a lot at scale.

Also, going from 64 -> 512 leaves a lot of room for artistic interpretation, so it's a nice feature to have. There's a lot of information loss scaling up from thumbnails.

nbardy avatar nbardy commented on May 5, 2024 1

Thanks for the update. Code looks great.

nbardy avatar nbardy commented on May 5, 2024 1

re: other project - there's been a small breakthrough leading to a few SOTAs in the geometric deep learning space (which is still being used for molecules / proteins) https://arxiv.org/abs/2302.03655 math was quite hairy, so took me nearly a week or two to nail down

This looks super interesting. Wish I had more time for the geometric stuff.

nbardy avatar nbardy commented on May 5, 2024 1

Thanks for the updates.

I split off the distributed train script and have been working on getting this train script to run.
https://github.com/nbardy/gigagan-pytorch/blob/main/training_scripts/train_simple.py

I can move over to testing your train script.

nbardy avatar nbardy commented on May 5, 2024 1

Yea, can you email me your Signal? Just waking up, will start work in a few hours.

[email protected]

nbardy avatar nbardy commented on May 5, 2024 1

:) Exciting. Managed to cancel all my meetings this week.

have you tested the discriminators yet?

nbardy avatar nbardy commented on May 5, 2024 1
[image: text-encoder figure from the paper]

Looks like the same text encoder as the generator. They use only the global code.

francqz31 avatar francqz31 commented on May 5, 2024 1

Based on the paper, it seems that GigaGAN uses separate text encoders for the generator and discriminator:

1. For the generator, it extracts text features from CLIP and processes them through additional learned attention layers T to get the text embeddings t.
2. For the discriminator, it similarly applies a pretrained text encoder like CLIP, followed by extra learned attention layers, to get the text descriptor t_D.

The paper mentions using t_local and t_global for the generator, and just a global descriptor t_D for the discriminator. So the text encoders have similar architectures (pretrained CLIP + extra attention) but separate learned parameters.

The motivation is that the generator and discriminator have different requirements for the text embedding. The generator needs both local word-level features t_local and global sentence-level features t_global, while the discriminator only needs an overall global descriptor of the prompt for its real/fake prediction. Using separate encoders allows customizing them for each task.
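
A minimal sketch of that arrangement; `LearnedTextTransform` is a hypothetical name, and the widths, depth, and the choice of the last token as the global code are assumptions, not the repo's API.

```python
import torch
from torch import nn

class LearnedTextTransform(nn.Module):
    """Frozen CLIP token features -> a few learned attention layers ->
    word-level codes t_local plus one sentence-level code t_global."""
    def __init__(self, clip_dim=768, dim=512, depth=4, heads=8):
        super().__init__()
        self.proj = nn.Linear(clip_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.attn_layers = nn.TransformerEncoder(layer, depth)

    def forward(self, clip_token_features):
        # clip_token_features: (b, n, clip_dim) from a frozen CLIP text encoder
        t = self.attn_layers(self.proj(clip_token_features))
        return t[:, :-1], t[:, -1]  # t_local, t_global

# separate parameters for generator and discriminator, per the reading above
text_transform_G = LearnedTextTransform()
text_transform_D = LearnedTextTransform()  # D would only consume the global code t_D
```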

nbardy avatar nbardy commented on May 5, 2024 1

Reading through training details. Some notes on datasets and models size from the paper.

with the exception of the 128-to-1024 upsampler model trained on Adobe's internal Stock images.

That is the 8x upsampler that gives the stunning results in the paper.

Unfortunately its hyperparameters are not in the paper, but I imagine it would be about the same size, maybe a little deeper to get some higher-resolution features. It should take less compute than the text-conditioned upscalers.

Also interesting

Additionally, we train a separate 256px class-conditional upsampler model and combine them with end-to-end finetuning stage.

Does this mean training the text->image and upsampler models in series for fine-tuning? I hadn't noticed that before.

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy where do you have the compute from? you should join the LAION discord and check to see first

i will be finishing the unconditional training code this week for starters, before the entire training code by end of month

lucidrains avatar lucidrains commented on May 5, 2024

ohh sweet, though you probably should do it in jax? or has the state of pytorch xla improved?

lucidrains avatar lucidrains commented on May 5, 2024

are you doing a startup? or working for a new one?

lucidrains avatar lucidrains commented on May 5, 2024

@francqz31 i see, they just added a ton of mixture of experts. i have been meaning to open source ST-MoE for language modeling front, so maybe this is good timing. also have a few ideas for improving PKM

lucidrains avatar lucidrains commented on May 5, 2024

@francqz31 it was on my plate anyways, since we now know GPT4 uses mixture of experts

lucidrains avatar lucidrains commented on May 5, 2024

@francqz31 do correct me if i'm wrong about that paper. i will get around to reading it (too much in the queue)

lucidrains avatar lucidrains commented on May 5, 2024

@francqz31 cool! yea, i guess this is yet another testament to using mixture-of-experts or conditional computation modules

nbardy avatar nbardy commented on May 5, 2024

@francqz31 thanks for sharing, too much work to implement and train a new model architecture on a short timeline. Raphael does look quite interesting, although expensive to run inference with MoE.

particularly interested in the openMUSe training going on.

francqz31 avatar francqz31 commented on May 5, 2024

It is more than enough that you are willing to train the upsampler. It is not easy work, plus it is the most important thing in the paper.

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy yea, the plan of attack was going to be to wire up hf accelerate for unconditional, following their example here, then move on to conditional, before finally tackling the upsampler modifications

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy are you planning on open sourcing the final model, or is this for commercial purposes for Facet?

lucidrains avatar lucidrains commented on May 5, 2024

@francqz31 thanks for the rundown!

yea, there is nothing surprising then. mostly more attention (transformer blocks), and the experts-per-diffusion-timestep idea goes back to eDiff from Balaji et al

the application of space and time MoE seems to be the main novelty, but that in itself is just porting over lessons from LLMs

lucidrains avatar lucidrains commented on May 5, 2024

didn't get to it this weekend 😢 caught up with some TTS work and Pride celebrations

going to work on it this morning!

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy the upsampler is nothing more than a unet with some high resolution downsampling layers removed, should be straightforward!

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy nice! i'll get unconditional training wired up tomorrow morning and make sure the discriminator works, before moving on to the rest of the training code next Monday (some of my favorite electronic music artists are in town this weekend)

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy always welcoming PRs, if you are in a hurry!

lucidrains avatar lucidrains commented on May 5, 2024

ok, let us reconvene on this Monday then

lucidrains avatar lucidrains commented on May 5, 2024

haha or Wednesday, whenever you are free

i'll take my time here then

nbardy avatar nbardy commented on May 5, 2024

It's unclear to me from the paper how the ImageNet super-res model and the text-conditioned upsampler compare in quality. Will have to see if they have ablations there.

nbardy avatar nbardy commented on May 5, 2024

In addition, for more controlled comparison, we train our model on the ImageNet unconditional superresolution task and compare performance with the diffusion-based models, including SR3 [81] and LDM [79].

Looks like the smaller one was mostly for benchmarking.

I think text-conditioned upscaling would be much more useful for running after other models' results in pipelines.

nbardy avatar nbardy commented on May 5, 2024

Getting up to speed with the code today. Feel like I understand most of it.

Looks like a little text conditioning is the only thing missing in the model architecture code.

Looking at the text encoding in the paper, it goes through cross attention and the style network. I see the style network and text encoder are already there.

Looks like I can just add a cross-attention layer with t_local here:
https://github.com/nbardy/gigagan-pytorch/blob/main/gigagan_pytorch/unet_upsampler.py#L495

And hook up t_global to the style network (rough sketch below).
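
Roughly something like this; the names here are illustrative, not the repo's actual modules.

```python
import torch
from torch import nn

class TextConditionedStage(nn.Module):
    """Image tokens cross-attend to the word-level codes t_local; the
    sentence-level code t_global is concatenated onto the style network input."""
    def __init__(self, dim, text_dim, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(
            dim, heads, kdim=text_dim, vdim=text_dim, batch_first=True
        )

    def forward(self, x, t_local):
        # x: (b, hw, dim) feature-map tokens; t_local: (b, n, text_dim)
        h = self.norm(x)
        return x + self.cross_attn(h, t_local, t_local)[0]

def style_network_input(z, t_global):
    # the mapping/style network would consume the latent joined with t_global
    return torch.cat((z, t_global), dim=-1)
```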

I started on a distributed train script today on my fork (pretty messy at the moment, maybe not worth taking a look at).

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy yup correct!

does the upscaler need text conditioning?

i'll get back to this mid-week. finally got over a big hurdle with another project

lucidrains avatar lucidrains commented on May 5, 2024

oh yup, text conditioning would still make sense for low res upscaling, let me aim to get that out of the way Wednesday

nbardy avatar nbardy commented on May 5, 2024

Going to try and get training code running tomorrow and try to get the unconditioned one converging.

Surprised how different the upscaler code looks from the Generator. Generator also has some stuff like skip_layer_excite that looks nice.

Glad to hear your other project is wrapped up. What was it? Open source?

nbardy avatar nbardy commented on May 5, 2024

Tried to add text conditioning to the Upscaler this evening. Seems like it should just be cross attention and global text code to the styleNetwork.

https://github.com/lucidrains/gigagan-pytorch/pull/20/files#diff-43ea16d9f61a65661c24088011c2c775964911cacf11aa87d17ed789730777caR434
(I linked to the relevant lines)

What formatter do you use? Would be nice to set mine the same for this repo. Getting a lot of formatting changes in the diff.

lucidrains avatar lucidrains commented on May 5, 2024

Going to try and get training code running tomorrow and try to get the unconditioned one converging.

Surprised how different the upscaler code looks from the Generator. Generator also has some stuff like skip_layer_excite that looks nice.

Glad to hear your other project is wrapped up. What was it? Open source?

hey, the PR looks great! yes, we can adopt a styling convention since it is clear you are a seasoned engineer from first glance

i heard ruff is all the rage these days?

lucidrains avatar lucidrains commented on May 5, 2024

re: other project - there's been a small breakthrough leading to a few SOTAs in the geometric deep learning space (which is still being used for molecules / proteins) https://arxiv.org/abs/2302.03655 math was quite hairy, so took me nearly a week or two to nail down

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy yes, i can get the unconditional training code done by end of the day, and then move towards text conditioned

i noticed you are also trying Lion, but i would caution against using it, as the paper never explored it in the GAN setting

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy for skip layer excitation, i tried it in a unet setting some time back and didn't see much of an improvement; i think the concatenative skip connections do most of the heavy lifting already. but i'm not much of an experimentalist, just didn't see a dramatic improvement in the first 5k steps. i could add it

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy ran out of steam, will work more on the training code tomorrow morning!

nbardy avatar nbardy commented on May 5, 2024

Is it obvious to you what the big contribution in this paper is?

Seems like the adaptive kernel is important to keep the parameter count down. And then just lots of tricks to keep the training stable at scale.

I will go through tomorrow and try to line up the hyper-parameters with the different paper models.

nbardy avatar nbardy commented on May 5, 2024

Do you think Lion will fail on a smaller model?

Looking at trying a few optimizers across a sweep to start. Distributed Shampoo and Adam are the top candidates right now. boris has a lot of positive notes on Shampoo for similar-sized models (~460M params, for dalle-mini).

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy i would just stick with Adam, as most of the architectural tricks we know probably overfit to Adam, for GAN training

lucidrains avatar lucidrains commented on May 5, 2024

Is it obvious to you what the big contribution in this paper is?

Seems like the adaptive kernel is important to keep the parameter count down. And then just lots of tricks to keep the training stable at scale.

I will go through tomorrow and try to line up the hyper-parameters with the different paper models.

i would say the adaptive conv kernel, incorporation of the l2 distance self attention, as well as the scale-invariant discriminator

a bit of a bag-of-tricks paper, but the results are what count
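
As an aside, a tiny sketch of the L2-distance self-attention mentioned here (the `torch.cdist` referenced further down is one way to compute it); this is the generic formulation, and GigaGAN's exact scaling may differ.

```python
import torch
import torch.nn.functional as F

def l2_distance_attention(q, k, v):
    # q, k, v: (batch, heads, seq, dim_head)
    # similarity is negative squared L2 distance instead of a dot product
    b, h, n, d = q.shape
    dist_sq = torch.cdist(q.reshape(b * h, n, d), k.reshape(b * h, n, d)) ** 2
    attn = F.softmax(-dist_sq.reshape(b, h, n, n) * d ** -0.5, dim=-1)
    return attn @ v
```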

lucidrains avatar lucidrains commented on May 5, 2024

the truth is, any of these concepts would benefit DDPMs as well.. but let's just keep moving forward. people can just pip install this library and experiment with the separate modules

never thought i'd be doing GANs after all this time

lucidrains avatar lucidrains commented on May 5, 2024

made a tiny bit of progress; unfortunately unconditional image synthesis didn't work on the first try

I'll try to debug what's going on tomorrow morning - still need to account for aux losses and gradient penalty. I think order of attack would be to be able to train a small upsampler on one machine before going full distributed

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy ah yea, i don't think different aspect ratios were used in the paper? may be a nice to have; we can start with square images, and get that working for starters

lucidrains avatar lucidrains commented on May 5, 2024

revisiting all this complicated GAN training code, all I can say is, thank god for denoising diffusion

lucidrains avatar lucidrains commented on May 5, 2024

hmm, no there's still something wrong, training blows up, even when i add gradient penalty (usually adding GP is enough for me to stabilize training early on) maybe there's a bug in the discriminator
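
For anyone following along, this is the standard R1 form of the gradient penalty being discussed (Mescheder et al., 2018), sketched generically; the repo's actual variant may differ.

```python
import torch

def r1_gradient_penalty(real_images, real_logits, weight=10.0):
    # real_images must have requires_grad_() set before the discriminator
    # forward pass that produced real_logits
    gradients, = torch.autograd.grad(
        outputs=real_logits.sum(),
        inputs=real_images,
        create_graph=True,  # keep the graph so the penalty itself is differentiable
    )
    return weight * gradients.pow(2).flatten(1).sum(dim=1).mean()
```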

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy you want to give it a try? we should make sure it can work for mnist (I'm using the old Oxford flowers dataset)

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy ok, i'm not sure what's going on, going to give up for the day

next plan of attack will probably be to copy paste the generator and discriminator from my working repositories (stylegan2-pytorch or lightweight-gan), and work from the bottom up. an alternative idea would be to pare down the generator and discriminator here and plug them into the working stylegan2-pytorch repo, and that would validate which modules are working or not, piece by piece

lucidrains avatar lucidrains commented on May 5, 2024

ok cool! yeah I'll resume trying to debug the system tomorrow morning

nbardy avatar nbardy commented on May 5, 2024

I can try testing the discriminators as classifiers today.

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy hey, good timing! i'm about to test the generator in lightweight gan, and rule out that the issue rests in the generator for starters

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy do you want to chat on Signal btw? (may be more convenient for a lot of back and forth) i can send you my number through email

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy ok cool, i'm actually nearing end of work and doing park stuff with doggo rest of day. will know more about whether generator is borked or not in half an hour

let me send you my number

lucidrains avatar lucidrains commented on May 5, 2024

[image: generated sample grid]

ok, generator is not the issue! will move on to gigagan discriminator tomorrow

lucidrains avatar lucidrains commented on May 5, 2024

[image: EMA sample grid]

11k steps for gigagan generator paired with lightweight gan discriminator

looks ok

lucidrains avatar lucidrains commented on May 5, 2024

bug is probably in discriminator somewhere, let me throw a few hours this morning at this, see if i can find it. pretty sure i can find it by tomorrow's end for sure (as well as solidify all the auxiliary losses)

lucidrains avatar lucidrains commented on May 5, 2024

hey no worries, rest up!

will need to move on to some other work mid-next week, but i'm sure it'll be semi functional by then, save for probably one or two aux losses + distributed. just so you can plan ahead for work

lucidrains avatar lucidrains commented on May 5, 2024

good news, have gigagan training using the lightweight gan peripheral training code. losses look ok; generator loss still a bit on the high side, but stabilizing

gigagan training is still borked, so either the self-supervised loss is crucial, or it is something else

will resume tomorrow morning; have a great rest of your Sunday and hope your neck feels better!

lucidrains avatar lucidrains commented on May 5, 2024

ok further good news, validated multi-scale inputs and scale invariant code + skip layer excitation all works over at lightweight gan. converges much nicer too

so maybe the issue was just with torch.cdist and / or any remaining issues with the gan training loop

lucidrains avatar lucidrains commented on May 5, 2024

training is now stable in the main repo, even without reconstruction loss 👌 turns out the GLU doesn't work that hot in this setting, so i removed it

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy aha yea, that's a good first step towards productivity

i haven't done the hinge loss for the multiscale outputs yet, but i reckon it should be fine. should know before noon

lucidrains avatar lucidrains commented on May 5, 2024

yea, it is working with the multiscale logits being involved, but the loss is very rocky for the first 1k steps

i'll let it run until 5-10k and see if it stabilizes. worst comes to worst, can always taper in the multi-scale contribution

lucidrains avatar lucidrains commented on May 5, 2024

ok, once i took out the gradient penalty contributions for multi-scale logits, training is back to being very stable

let us roll with that for now!

lucidrains avatar lucidrains commented on May 5, 2024

[screenshot: training sample grid]

looking great - will move on towards rest of the losses, accelerate integration, text conditioning tomorrow

do you know if the text encoder was shared between discriminator and generator, or separate?

nbardy avatar nbardy commented on May 5, 2024

I'm able to get the upscaler and base gan running locally.

Getting a missing op error when the gradient penalty is applied.
RuntimeError: derivative for aten::linear_backward is not implemented

Seems like a MacBook-related thing; I'll just ignore it for now.

I'm going to see if I can get it running on TPUs this afternoon.

lucidrains avatar lucidrains commented on May 5, 2024

I'm able to get the upscaler and base gan running locally.

Getting a missing op error when the gradient penalty is applied. RuntimeError: derivative for aten::linear_backward is not implemented

Seems like a MacBook-related thing; I'll just ignore it for now.

I'm going to see if I can get it running on TPUs this afternoon.

ok cool, i'm moving onto another project for the rest of the day; will throw a bit more hours at this tomorrow morning

lucidrains avatar lucidrains commented on May 5, 2024
Looks like the same text encoder as the generator. They use only the global code.

so they have a 'few learnable attention layers' in addition to the CLIP text encoder. i guess i'm wondering if that is shared between generator and discriminator or no

probably safest just to learn them separately

lucidrains avatar lucidrains commented on May 5, 2024

@francqz31 nice find!

nbardy avatar nbardy commented on May 5, 2024

I was not able to find the t_local and t_global sizes in the paper.

lucidrains avatar lucidrains commented on May 5, 2024

will also aim to get the eval for both base and upsampler done, using what @CerebralSeed pull requested as a starting point. then we can see the GAN working for some toy datasets for unconditional training

lucidrains avatar lucidrains commented on May 5, 2024

@nbardy or were you planning on doing the distributed stuff with accelerate + ray today? just making sure no overlapping work
