
Comments (12)

neonbjb commented on August 11, 2024

The author exports the NLL loss here:
https://github.com/andreas128/SRFlow/blob/master/code/models/modules/SRFlowNet_arch.py#L96

I've had some success training this flow network by simply taking the mean of that output and performing gradient descent as described in the paper. The code does appear to have a few bugs that need manual fixing, though, that are likely compensated for in the author's training code. If I can successfully reproduce results, I'll post something here.
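For reference, here is roughly what that looks like (a simplified sketch, not the actual training code; the forward call signature and return values are my reading of the linked file and may need adjusting):

```python
def srflow_training_step(model, optimizer, lr_img, hr_img):
    """One gradient-descent step on the exported NLL for an (LR, HR) batch."""
    # My assumption: the non-reverse forward pass returns the latent(s) plus
    # the per-example NLL, as computed in SRFlowNet_arch.py; adjust the
    # unpacking if the actual return signature differs.
    _, nll = model(gt=hr_img, lr=lr_img, reverse=False)
    loss = nll.mean()          # take the mean of the NLL output...
    optimizer.zero_grad()
    loss.backward()            # ...and do plain gradient descent on it
    optimizer.step()
    return loss.item()
```

With a standard Adam optimizer over model.parameters() and a paired LR/HR dataloader, that is essentially the whole loop.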


neonbjb commented on August 11, 2024

I have not explicitly trained models for facial SR, but I have used the pretrained models "successfully" in my repo. I'll upload a weight conversion script you can use to do the same if you care to. It'll be found in recipes/srflow/convert_official_weights.py

Before anyone spends a lot of GPU time training these models, though, I want to add some input based on my experience working with them. I would love it if the authors were to chime in and correct me where I am wrong.

Let's use faces as an example. I pulled a random face from the FFHQ dataset and downsampled it 8x:
Original:
[image: original HQ face from FFHQ]
LQ:
[image: 8x downsampled LR input]

I then went to the Jupyter notebook found in this repo and did a few upsample tests with the CelebA_8x model. Here is the best result:
[image: best result from the Jupyter notebook]
Note that it is missing a lot of high-frequency details and has some artifacts, notably a repeating pattern of squares (which may only be visible if you download the image and zoom in). These are typical of all SRFlow models I train, even with noise added to the HQ inputs as the authors suggest.

I then converted that same model into my repo, and ran a script I have been using to play with these models. One thing I can do with this script is generate the "mean" face for any LR input (simple really: you just feed Z=0 into the reverse pass of the flow network). Here is the output from that:
[image: Z=0 "mean" output]
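For reference, the "mean" generation is roughly this (a sketch; the reverse-pass keyword arguments are my recollection of the interface, so check them against the architecture code):

```python
import torch


def srflow_mean_image(model, lr_img):
    """Run the reverse pass with a collapsed latent to get the "mean" HQ prediction."""
    with torch.no_grad():
        # z=None with eps_std=0 (equivalently, an explicit all-zeros z)
        # collapses the latent distribution to its mean.
        out = model(lr=lr_img, z=None, eps_std=0.0, reverse=True)
    # some versions return (sr, logdet); keep only the image either way
    return out[0] if isinstance(out, tuple) else out
```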

So what you are seeing here is what the model thinks the "most likely" HQ image is for the given LQ input. For reference, here is the image difference between the original HQ and the mean:
[image: difference between the original HQ and the mean]

Note that the mean is missing a lot of the high-frequency details. My original suspicion for why this is happening is that the network is encoding these details into the Z vector that it trains on, i.e. the Z vector never really collapses into a true Gaussian, and instead holds on to structural information about the original image. To test this, I plotted the std(dim=1) and mean(dim=1) of the Z vectors (dim 1 is the channel/filter dimension):
Mean:
[images: channel-wise mean of Z (mean0_0, mean0_1, mean0_2)]
Std:
[images: channel-wise std of Z (std0_0, std0_1, std0_2)]

In a well-trained normalizing flow, these would be indistinguishable from noise. As you can see, they are not: the Z vector contains a ton of structural information about the underlying HQ image. This tells me that the network is unable to properly capture these high-frequency details and map them into a truly Gaussian latent.
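For anyone who wants to reproduce this check, the stats themselves are trivial to compute (a sketch; as above, the forward unpacking is my assumption, and if the model returns a list of per-level latents you would apply this to each level):

```python
import torch


def latent_channel_stats(model, lr_img, hr_img):
    """Channel-wise mean/std maps of the latent for one (LR, HR) pair."""
    with torch.no_grad():
        z, _ = model(gt=hr_img, lr=lr_img, reverse=False)
    # dim 1 is the channel/filter dimension; for a true N(0, 1) latent both
    # maps should look like flat noise (mean near 0, std near 1) with no
    # spatial structure carried over from the input image.
    return z.mean(dim=1), z.std(dim=1)
```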

This is, in general, my experience with SRFlow. I presented one image above, but the same behavior is exhibited in pretty much all inputs I have tested with. The best I can ever get out of the network is images with Z=0, which produces appealing, "smoothed" images that beat out models trained with PSNR losses, but it misses all of the high-frequency details that a true SR algorithm should be creating. No amount of noise at the Z-input produces these details: they are highly spatially correlated.

I think the idea behind SRFlow has some real merit. It is likely that these networks are not being trained properly, or that a better architecture needs to be presented to take advantage of that potential. I also think that projecting Z into a structural space might be harmful: the model can manipulate the Z statistics to "appear" close to a Gaussian but still preserve structural details within that "noise" - but that's just a hunch from a dumb computer programmer.

Oh - one last edit: if you are really interested in facial SR for some reason, I'd highly recommend checking out "GLEAN". I have an implementation in my repo and it is exceptionally good at producing high frequency details and doing extreme SR, with the hard limitation that it can only do so on tightly focused datasets like faces/cats/etc.


neonbjb commented on August 11, 2024

I promised to update on training code, here you go:
https://github.com/neonbjb/DL-Art-School/tree/gan_lab/recipes/srflow

I've been having a lot of fun with this architecture. Thanks to the authors for the ideas and for open sourcing the models.


martin-danelljan commented on August 11, 2024

Hi.

First, we finally got the thumbs up from the project funding partner to publish the training code, so @andreas128 will push it up ASAP after the holidays.

@neonbjb thank you for your analysis and for sharing your experience. Here are some comments and answers to your observations:

  • What downsampling kernel did you use? Unfortunately, all SR methods are very sensitive to this (not only SRFlow). We used MATLAB's bicubic, which is the most common choice in research. There are methods to make an SR model robust to different kernels, which could also be applied to SRFlow, but we haven't tried this.

  • Did you use the face SR model or the general SR model for this? It is important to note that no generic SR model can compete with specialized face SR methods, since they can learn very rich and specialized image priors for faces. On the other hand, our generic SR models have probably not even seen a single HR face in the training set. You could try our face SR model, but like most models, it assumes a particular face alignment and resolution, so I don't expect it to work directly on your image without doing the same preprocessing as we did in the paper (code for this will come soon).

  • Please note that SRFlow is a first step in this new research direction. Do not expect a final solution that works amazingly well everywhere. In the paper we demonstrate results that are very competitive with SOTA GAN-based approaches. But of course, there is large room for improvement. In fact, we know this because we are already working on an improved version of SRFlow that has substantially better output image quality. We hope to release a pre-print of this in the coming months.

  • We are also convinced of the underlying idea that an SR method needs to explicitly account for the ill-posedness of the problem, i.e. that the training HR ground truth is just one possibility. The network should thus not be "forced" to predict exactly this one. Flows provide a very exciting alternative for developing such an SR solution.

  • The exact structure of the latent space is an open research question. Although the experiments in our (and other) papers provide some insights, there is still much to understand that could guide us toward even better solutions.


machlea commented on August 11, 2024

The author exports the NLL loss here:
https://github.com/andreas128/SRFlow/blob/master/code/models/modules/SRFlowNet_arch.py#L96

I've had some success training this flow network by simply taking the mean of that output and performing gradient descent as described in the paper. The code does appear to have a few bugs that need manual fixing, though, that are likely compensated for in the author's training code. If I can successfully reproduce results, I'll post something here.

May I ask an easy question? Why isn't the output a tensor of size (1, 256), with each number in the tensor being a probability?


LyWangPX commented on August 11, 2024

The author exports the NLL loss here:
https://github.com/andreas128/SRFlow/blob/master/code/models/modules/SRFlowNet_arch.py#L96
I've had some success training this flow network by simply taking the mean of that output and performing gradient descent as described in the paper. The code does appear to have a few bugs that need manual fixing, though, that are likely compensated for in the author's training code. If I can successfully reproduce results, I'll post something here.

May I ask an easy question? Why isn't the output a tensor of size (1, 256), with each number in the tensor being a probability?

I am not 100% sure about your question, but if you sample a multivariate Gaussian, the overall probability is the product of the per-element probabilities, which becomes a sum in log space. So they collapse into a single value.
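A toy example of what I mean (not the repo's code):

```python
import torch

z = torch.randn(1, 256)  # a hypothetical latent of size (1, 256)

# per-element log-probabilities under independent N(0, 1) components
per_element = torch.distributions.Normal(0.0, 1.0).log_prob(z)   # shape (1, 256)

# joint log-probability under the full standard multivariate Gaussian
joint = torch.distributions.MultivariateNormal(
    torch.zeros(256), torch.eye(256)
).log_prob(z)                                                     # shape (1,)

# the product of probabilities becomes a sum of log-probabilities,
# so the whole tensor collapses into one scalar per sample
print(torch.allclose(per_element.sum(dim=1), joint))  # True
```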


erqiangli commented on August 11, 2024

I promised to update on training code, here you go:
https://github.com/neonbjb/DL-Art-School/tree/gan_lab/recipes/srflow

I've been having a lot of fun with this architecture. Thanks to the authors for the ideas and for open sourcing the models.
Hi, James Betker! I have followed some of the valuable open-source work you have done before. Thank you for sharing. You promised to update the SRFlow training code, but I could not find the train.py file through the link you sent. Could you please send me the train.py code? Looking forward to your reply!


machlea commented on August 11, 2024

The author exports the NLL loss here:
https://github.com/andreas128/SRFlow/blob/master/code/models/modules/SRFlowNet_arch.py#L96
I've had some success training this flow network by simply taking the mean of that output and performing gradient descent as described in the paper. The code does appear to have a few bugs that need manual fixing, though, that are likely compensated for in the author's training code. If I can successfully reproduce results, I'll post something here.

May I ask an easy question? Why isn't the output a tensor of size (1, 256), with each number in the tensor being a probability?

I am not 100% sure about your question, but if you sample a multivariate Gaussian, the overall probability is the product of the per-element probabilities, which becomes a sum in log space. So they collapse into a single value.

I've tried to define it as a sum of ln(x*x + 1); the x*x + 1 makes each element >= 1 so the log can be computed and ln(x*x + 1) >= 0. However, after only 30 or 40 epochs the loss suddenly becomes very large and the PSNR drops to 6.


yzcv commented on August 11, 2024

I promised to update on training code, here you go:
https://github.com/neonbjb/DL-Art-School/tree/gan_lab/recipes/srflow
I've been having a lot of fun with this architecture. Thanks to the authors for the ideas and for open sourcing the models.
Hi, James Betker! I have followed some of the valuable open-source work you have done before. Thank you for sharing. You promised to update the SRFlow training code, but I could not find the train.py file through the link you sent. Could you please send me the train.py code? Looking forward to your reply!

Hi, you can find the train.py here.
https://github.com/neonbjb/DL-Art-School/blob/gan_lab/codes/train.py


burb0 commented on August 11, 2024

I promised to update on training code, here you go:
https://github.com/neonbjb/DL-Art-School/tree/gan_lab/recipes/srflow

I've been having a lot of fun with this architecture. Thanks to the authors for the ideas and for open sourcing the models.

Have you tried this for face SR? Could it work for face SR simply by changing the dataset?


neonbjb commented on August 11, 2024

Hi @martin-danelljan - excellent! I look forward to trying that code out!

I am using the cv2 bicubic downsampling kernel - the same one used by ESRGAN, since it seems like you are borrowing a lot of your code from that repository. For this demonstration, I used the pretrained SRFlow_CelebA_8X.pth weights provided by setup.sh. The first result in the above post uses only code from this repo. The other results use code from my repo, which is nearly identical and should not have functional changes. I admit it is possible that there are some changes, but the similarity in the HQ results seems to indicate not. I want to note that I had better results when using images from the CelebA dataset, but I specifically wanted to demonstrate how the model generalizes here.
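Concretely, the LQ image above was produced roughly like this (paths are placeholders); note that cv2's INTER_CUBIC does not apply the antialiasing that MATLAB's imresize does when downscaling, which could well be part of the kernel mismatch you mention:

```python
import cv2

scale = 8
hr = cv2.imread("ffhq_face.png")  # placeholder path for the HQ image
lr = cv2.resize(
    hr,
    (hr.shape[1] // scale, hr.shape[0] // scale),  # dsize is (width, height)
    interpolation=cv2.INTER_CUBIC,
)
cv2.imwrite("ffhq_face_lr.png", lr)
```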

Agree with all your other comments. My comment here was not made with the intent of conveying that this is a dead research direction. On the contrary - I think it is extremely exciting and I have nothing but admiration for what you have done. After spending the last half of a year playing around with these types of models, I am becoming convinced that the way forward for realistic SR is going to involve mapping HQ images to some latent space, and then transferring the latent data that encodes high-frequency details from those images to corresponding LQ images. I think the approach SRFlow takes offers some tantalizing ways to make this happen.

My comment was instead meant to urge folks to be realistic about what they would get out of several hundred hours of GPU training using my repo, and to convey that I believe the poor results are not because I implemented something wrong. While I specifically dug into your pretrained model here, these are the same results I am seeing for models trained on the datasets that I am using.


martin-danelljan commented on August 11, 2024

Hi again. I fully understand, so don't worry :)

OK, so one other important thing regarding your experiment. When it comes to face models, they are trained for a certain resolution, and it is important that the same input/output resolution is used during inference. It's actually the same for StyleGAN and most other works. Basically, the network learns to use the absolute image location in order to decide where to generate eyes, nose, hair, etc. So if you further downscale your image to the resolution our pretrained network was trained for, I'm sure that you will get a much better result. And in order to make it super-resolve larger images from e.g. FFHQ, it needs to be retrained for that. We haven't tried this yet actually.
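As a rough sketch of what I mean (the resolution value is just a placeholder; take the real one from the pretrained model's training config):

```python
import cv2

TRAIN_LR_SIZE = (20, 20)  # placeholder (width, height) the face model was trained on

lr = cv2.imread("my_face_lr.png")  # placeholder path
lr_small = cv2.resize(lr, TRAIN_LR_SIZE, interpolation=cv2.INTER_CUBIC)
# feed lr_small to the pretrained face model instead of the full-size LR image
```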

Thanks again for your interest in our work. And happy new year :D

/Martin

