Comments (6)
Hi, what batch size and training dataset are you using?
We use a batch size of 64 and the training dataset consists of DIV2K, Flickr2K, BSD500, and WED, with over 8000 training images.
Different batch sizes and training data yield different results; larger batch sizes and more training images may lead to better results. Additionally, you may try training for longer: as the image you provided shows, the PSNR curve is still ascending.
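For reference, the PSNR metric tracked in these curves can be computed as below (a minimal NumPy sketch; the function and parameter names are illustrative, not from the repo):

```python
import numpy as np

def psnr(clean: np.ndarray, denoised: np.ndarray, data_range: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((clean.astype(np.float64) - denoised.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((data_range ** 2) / mse)
```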
from maskeddenoising.
Hi, I use a batch size of 64 as specified in your configuration file. Moreover, I use the same datasets as you, i.e. DIV2K, Flickr2K, BSD500, and WED. The only difference is that I crop DIV2K and Flickr2K into small patches to speed up data loading, which is a common practice used by Real-ESRGAN.
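The patch pre-cropping mentioned above can be sketched as follows (a minimal sliding-window crop in NumPy; the patch size, stride, and names are my own illustration, not Real-ESRGAN's actual script):

```python
import numpy as np

def crop_patches(img: np.ndarray, patch: int = 480, stride: int = 240) -> list:
    """Slide a window over an HWC image and collect patch-sized crops."""
    h, w = img.shape[:2]
    patches = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            patches.append(img[top:top + patch, left:left + patch])
    return patches
```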
Now I'm training models with 4 GPUs, for a total batch size of 256; the results are shown below:
I do not see much difference from single-GPU training. I may train the models for 300K iterations to see how they perform.
Hi, I have trained for 300K iterations and also validated on SIDD.
See the figure below; the light blue curve represents the naive SwinIR (ignore the masked-out part, which does not contain any useful information):
For both validation sets, the best-performing model is the naive SwinIR at an early iteration; the one trained with the masking strategy cannot surpass it in the end (the increasing trend disappears toward the end of training).
Comparing the two models at the end of training, the naive one performs better than the SwinIR trained with the masking strategy on McM, and worse on SIDD.
@haoyuc Could you please help me train a model to see the benefits of the masking strategy?
Firstly, we appreciate your in-depth engagement with our work.
- Regarding "the best performing model is observed in early iteration of naive SwinIR model":
It's important to clarify that our evaluation of model performance is based on the results from the final, stable stage of training, not the peak performance observed in early training iterations. This approach is predicated on the practical application context: in real-world scenarios, we often lack detailed knowledge about the degradation distribution of test images, and performance testing is often unfeasible. Therefore, we typically use models that have reached a state of convergence and avoid deploying models that have only undergone very limited training.
- Experiment on McM:
As your experiment rightly shows, the masked training significantly outperforms the baseline model upon convergence for McM (Poisson, 2). This advantage holds true across other noise categories and levels. Our method excels in situations where there is a significant disparity between the noise distribution of the training set and that of the test set. Therefore, when the noise level is intensified, the advantage of our method becomes even more evident, as demonstrated in our paper.
- Experiment on SIDD:
Regarding the SIDD experiments in our supplementary materials: "In order to simulate a scenario with extremely limited training samples, the training set only contains two 4K noisy – clean image pairs from SIDD".
In the experiments you conducted, it seems that you utilized all the training images from SIDD, leading to the observed results.
It's important to emphasize that our method is highly applicable in certain extreme real-world situations, such as
- when the number of available training set image pairs is minimal or
- when there is a significant disparity between the degradation distributions of the training and test sets.
We hope these responses address your queries. If you have any additional questions, let's discuss further.
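The input-masking idea discussed in this thread can be sketched as randomly zeroing a fraction of input pixel positions during training (a minimal illustration only; the repo's actual implementation may differ, e.g. in mask granularity, mask ratio, or where the mask is applied):

```python
import numpy as np

def mask_input(img: np.ndarray, ratio: float = 0.8, rng=None) -> np.ndarray:
    """Zero out a random fraction of pixel positions, shared across channels."""
    if rng is None:
        rng = np.random.default_rng()
    keep = rng.random(img.shape[:2]) >= ratio  # True where pixels survive
    return img * keep[..., None]
```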
Hello, I encountered the above error while training this model. Could it be possible that the dataroot_H path is incorrect?
@Maoeyu Hi, you need to change "dataroot_H" to your own data path.
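For example, assuming a KAIR-style JSON option file (the path below is a placeholder for your own clean training images), the field to change would look like:

```json
{
  "datasets": {
    "train": {
      "dataroot_H": "/path/to/your/clean_training_images"
    }
  }
}
```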
Related Issues (20)
- What caused the inconsistency between training and testing?
- The code for input mask and attention mask HOT 3
- About CKA calculation
- Some questions about details not mentioned in the paper HOT 2
- A question about the input mask code HOT 2
- How many images were used for training in total? Is the dataset resolution X2 or something else? HOT 1
- How to download the pretrained models? HOT 1
- how to denoise images which are not in 500X500 dimension?
- Can the model be trained with only noisy data? HOT 1
- A question about the weights HOT 2
- The attention mask at test time
- When will the new paper LWay be open-sourced? HOT 1
- // "dncnn" | "dnpatch" | "fdncnn" | "ffdnet" | "sr" | "srmd" | "dpsr" | "plain" | "plainpatch" | "jpeg" | "masked_denoising"
- Mixture noise
- hello
- hello
- Training problem HOT 1
- Training problem
- Implementation of Input Mask
- Code of SRGA