Comments (12)
The author exports the NLL loss here:
https://github.com/andreas128/SRFlow/blob/master/code/models/modules/SRFlowNet_arch.py#L96
I've had some success training this flow network by simply taking the mean of that output and performing gradient descent as described in the paper. The code does appear to have a few bugs that need manual fixing, though; these are likely compensated for in the author's training code. If I can successfully reproduce results, I'll post something here.
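A minimal sketch of that reduction-and-descent step, assuming the forward pass hands back a per-sample NLL. `ToyFlow` is a hypothetical stand-in for SRFlowNet (whose real forward signature differs); only the mean-then-backprop pattern is the point here.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for SRFlowNet: any module whose forward pass
# returns a per-sample negative log-likelihood tensor.
class ToyFlow(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16, 1)

    def forward(self, lq):
        return self.net(lq.flatten(1)).squeeze(1)  # shape (N,)

model = ToyFlow()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

lq = torch.randn(4, 1, 4, 4)  # toy LQ batch
nll = model(lq)               # per-sample NLL, shape (4,)
loss = nll.mean()             # reduce to a scalar...
opt.zero_grad()
loss.backward()               # ...and descend on it
opt.step()
```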
from srflow.
I have not explicitly trained models for facial SR, but I have used the pretrained models "successfully" in my repo. I'll upload a weight conversion script you can use to do the same if you care to. It'll be found in recipes/srflow/convert_official_weights.py
Before anyone spends a lot of GPU time training these models, though, I want to add a little input from my experience working with them. I would love it if the authors were to chime in and help me out where I am wrong.
Let's use faces as an example. I pulled a random face from the FFHQ dataset and downsampled it 8x:
Original:
LQ:
I then went to the Jupyter notebook found in this repo and did a few upsample tests with the CelebA_8x model. Here is the best result:
Note that it is missing a lot of high-frequency details and has some artifacts, notably a repeating pattern of squares (which may only be visible if you download and zoom in). These are fairly typical in all SRFlow models I train, even with noise added to the HQ inputs as the authors suggest.
I then converted that same model into my repo, and ran a script I have been using to play with these models. One thing I can do with this script is generate the "mean" face for any LR input (simple, really: you just feed Z=0 into the flow network's reverse pass). Here is the output from that:
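That Z=0 "mean" sample is easy to sketch with a toy conditional affine step. The names here are illustrative only; in the real network this is the full reverse (latent to image) pass conditioned on the LQ encoding.

```python
import torch

# Toy conditional affine "flow" step: x = z * exp(s) + t, where s and t
# would come from the LQ encoding in a real SRFlow-style model.
def reverse_flow(z, s, t):
    return z * torch.exp(s) + t

s = torch.randn(1, 3, 8, 8) * 0.1  # stand-ins for LQ-conditioned params
t = torch.randn(1, 3, 8, 8)

z = torch.zeros(1, 3, 8, 8)        # Z = 0: the mode of N(0, I)
mean_hq = reverse_flow(z, s, t)    # the "most likely" HQ prediction
```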
So what you are seeing here is what the model thinks the "most likely" HQ image is for the given LQ input. For reference, here is the image difference between the original HQ and the mean:
Note that the mean is missing a lot of the high-frequency details. My original suspicion for why this is happening is that the network is encoding these details into the Z vector that it trains on, i.e. the Z vector never really collapses into a true Gaussian, and instead holds on to structural information about the original image. To test this, I plotted the std(dim=1) and mean(dim=1) of the Z vectors (dim 1 is the channel/filter dimension):
Mean:
Std:
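Those two maps come down to simple reductions over the channel dimension; a minimal sketch (a random tensor stands in for a real latent here, and the shape is arbitrary):

```python
import torch

# z as returned by the forward (image -> latent) pass; shape (N, C, H, W).
z = torch.randn(1, 64, 20, 20)

z_mean = z.mean(dim=1)  # per-pixel mean across channels, shape (1, 20, 20)
z_std = z.std(dim=1)    # per-pixel std across channels

# For a well-trained flow, both maps should look like featureless noise
# (mean near 0, std near 1) with no trace of the HQ image's structure.
```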
In a well-trained normalizing flow, these would be indistinguishable from noise. As you can see, they are not: the Z vector contains a ton of structural information about the underlying HQ image. This tells me that the network is unable to properly capture these high-frequency details and fold them into a well-behaved latent distribution.
This is, in general, my experience with SRFlow. I presented one image above, but the same behavior shows up in pretty much every input I have tested with. The best I can ever get out of the network is images with Z=0, which produces appealing, "smoothed" images that beat out PSNR-trained outputs, but it misses all of the high-frequency details that a true SR algorithm should be creating. No amount of noise at the Z-input produces these details: they are highly spatially correlated.
I think the idea behind SRFlow has some real merit. It is likely that these networks are not being trained properly, or that a better architecture needs to be presented to take advantage of that potential. I also think that projecting Z into a structural space might be harmful: the model can manipulate the Z statistics to "appear" close to a Gaussian but still preserve structural details within that "noise". But that's just a hunch from a dumb computer programmer.
Oh - one last edit: if you are really interested in facial SR for some reason, I'd highly recommend checking out "GLEAN". I have an implementation in my repo and it is exceptionally good at producing high frequency details and doing extreme SR, with the hard limitation that it can only do so on tightly focused datasets like faces/cats/etc.
I promised to update on training code, here you go:
https://github.com/neonbjb/DL-Art-School/tree/gan_lab/recipes/srflow
I've been having a lot of fun with this architecture. Thanks to the authors for the ideas and for open sourcing the models.
Hi.
First, we finally got thumbs up from the project funding partner to publish the training code, so @andreas128 will push it up asap after the holidays.
@neonbjb thank you for your analysis and sharing your experience. Here are some comments and answers to your observation:
- What downsampling kernel did you use? Unfortunately, all SR methods are very sensitive to this (not only SRFlow). We used the Matlab bicubic, which is the most common choice in research. There are methods to train an SR model robust to different kernels, which could also be applied to SRFlow, but we haven't tried this.
- Did you use the face SR model or the general SR model for this? It is important to note that no generic SR model can compete with specialized face SR methods, since those can learn very rich and specialized image priors for faces. On the other hand, our generic SR models have probably not seen a single HR face in the training set. You could try our face SR model, but like most models, it assumes a particular face alignment and resolution, so I don't expect it to work directly on your image without the same preprocessing as we did in the paper (code for this will come soon).
- Please note that SRFlow is a first step in this new research direction. Do not expect a final solution that works amazingly well everywhere. In the paper we demonstrate results that are very competitive with SOTA GAN-based approaches, but of course there is large room for improvement. In fact, we know this because we are already working on an improved version of SRFlow with substantially better output image quality. We hope to release a pre-print of this in the coming months.
- We are also convinced of the underlying idea that an SR method needs to explicitly account for the ill-posedness of the problem, i.e. that the training HR ground truth is just one possibility. The network should thus not be "forced" to predict exactly this one. Flows provide a very exciting alternative for developing such an SR solution.
- The exact structure of the latent space is an open research question. Although the experiments in our (and other) papers provide some insights, there is still much to understand that could guide us toward even better solutions.
May I ask an easy question? Why isn't the NLL output a tensor of size (1, 256), with each number in the tensor being a probability?
I am not 100% sure I understand your question. But if you sample a multivariate Gaussian, the overall probability is the product of the per-element probabilities, which becomes a sum in log space. So they collapse into a single scalar.
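A small worked example of that collapse, using eight independent standard-normal elements (the dimensionality is arbitrary; the real latent is far larger):

```python
import math
import torch

# Independent standard-normal elements: the joint density is the product
# of the per-element densities, so the joint log-density is a sum.
x = torch.randn(8)

log_p_elem = -0.5 * (x ** 2 + math.log(2 * math.pi))  # per-element log N(0,1)
log_p_joint = log_p_elem.sum()                        # a single scalar

p_joint = log_p_elem.exp().prod()  # same quantity in probability space
# (for realistic dimensionalities this product underflows, which is
# exactly why implementations work with the summed log-likelihood)
```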
Hi, James Betker! I have followed some of the valuable open source work you have shared before; thank you. You promised to post the SRFlow training code, but I could not find a train.py file through the link you sent. Could you please point me to train.py? Looking forward to your reply!
I tried to define it as a sum of ln(x^2 + 1); since x^2 + 1 >= 1 for every element, the log is always defined and each term ln(x^2 + 1) >= 0. However, after only 30 or 40 epochs, the loss suddenly becomes very large and PSNR drops to 6.
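For reference, a sketch of that modified objective. One thing worth noting: each term ln(x^2 + 1) equals, up to the constant ln(pi), the negative log-likelihood of a standard Cauchy prior rather than a Gaussian one, so this swaps in a much heavier-tailed latent distribution, which may be related to the instability.

```python
import torch

# The modified objective described above: sum of ln(x^2 + 1).
# Each term is >= 0 since x^2 + 1 >= 1, but this is no longer the
# Gaussian NLL; it matches a standard Cauchy prior up to a constant.
def modified_loss(z):
    return torch.log(z ** 2 + 1).sum()

z = torch.randn(4, 64)
loss = modified_loss(z)  # always nonnegative
```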
Hi, you can find train.py here:
https://github.com/neonbjb/DL-Art-School/blob/gan_lab/codes/train.py
Have you tried for face SR? Could this work for face SR simply by changing datasets?
Hi @martin-danelljan - excellent! I look forward to trying that code out!
I am using the cv2 bicubic downsampling kernel, the same one used by ESRGAN, since it seems like you are borrowing a lot of your code from that repository. For this demonstration, I used the pretrained SRFlow_CelebA_8X.pth weights provided by setup.sh. The first result in the above post uses only code from this repo. The other results use code from my repo, which is nearly identical and should not have functional changes. I admit that it is possible there are some changes, but the similarity in the HQ results seems to indicate not. I want to note that I had better results when using images from the CelebA dataset, but I specifically wanted to demonstrate how the model generalizes here.
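The kernel mismatch mentioned above is easy to demonstrate: MATLAB's imresize applies an antialiasing prefilter by default when downscaling, while cv2.INTER_CUBIC does not. PyTorch's `interpolate` can approximate both behaviors (the `antialias` flag needs torch >= 1.11), which makes the gap visible directly:

```python
import torch
import torch.nn.functional as F

hq = torch.rand(1, 3, 160, 160)

# cv2.INTER_CUBIC-style downsampling: plain bicubic, no antialiasing.
lq_plain = F.interpolate(hq, scale_factor=1 / 8, mode='bicubic',
                         align_corners=False)

# Roughly MATLAB imresize-style: bicubic with an antialiasing prefilter.
lq_aa = F.interpolate(hq, scale_factor=1 / 8, mode='bicubic',
                      align_corners=False, antialias=True)

kernel_gap = (lq_plain - lq_aa).abs().mean()  # nonzero: the kernels differ
```

A model trained on one of these kernels will see subtly wrong inputs when fed the other, which is enough to degrade SR output.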
Agree with all your other comments. My comment here was not made with the intent of conveying that this is a dead research direction. On the contrary, I think it is extremely exciting and I have nothing but admiration for what you have done. After spending the last half of a year playing around with these types of models, I am becoming convinced that the way forward for realistic SR is going to involve mapping HQ images to some latent space, and then performing transfers of latent data encoding high-frequency details from those images to corresponding LQ images. I think the approach SRFlow takes offers some tantalizing ways to make this happen.
My comment was instead to urge folks to be realistic about what they would be getting out of several hundred hours of GPU training using my repo, and I wanted to convey the point that I believe the poor results are not because I implemented something wrong. While I specifically dug into your pretrained model here, these results are the same things I am seeing for models trained on the datasets that I am using.
Hi again. I fully understand, so dont worry :)
Ok, so one other important thing regarding your experiment. When it comes to face models, they are trained for a certain resolution, and it is important that the same input/output resolution is used during inference. It's actually the same for StyleGAN and most other works. Basically, the network learns to use the absolute image location in order to decide where to generate eyes, nose, hair, etc. So if you further downscale your image to the resolution our pretrained network was trained for, I'm sure that you will get a much better result. And in order to make it super-resolve larger images from e.g. FFHQ, it needs to be retrained for that. We haven't actually tried this yet.
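In practice that just means resizing the LQ input to the trained resolution before inference. A sketch, where the 20x20 target is an assumption (a 160x160 HR training crop at 8x); check the model config for the actual value:

```python
import torch
import torch.nn.functional as F

lq = torch.rand(1, 3, 64, 64)  # e.g. an FFHQ face downsampled 8x

# Resize to the (assumed) LQ resolution the pretrained model expects.
lq_resized = F.interpolate(lq, size=(20, 20), mode='bicubic',
                           align_corners=False)
```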
Thanks again for your interest in our work. And happy new year :D
/Martin