cszn / KAIR

Image Restoration Toolbox (PyTorch). Training and testing codes for DPIR, USRNet, DnCNN, FFDNet, SRMD, DPSR, BSRGAN, SwinIR

Home Page: https://cszn.github.io/

License: MIT License

Python 90.51% C++ 2.76% Cuda 5.10% MATLAB 1.62% M 0.01%

image-restoration denoising super-resolution sisr pytorch toolbox dncnn usrnet ffdnet srmd

kair's Introduction

Kai Zhang

[Homepage] [Google Scholar] [ResearchGate] [知乎]

I am currently a postdoctoral researcher at Computer Vision Lab, ETH Zurich, Switzerland.

⚡ News:

Our new work SCUNet for practical image denoising.

We released the testing code of BSRGAN!

🌱 My Repositories

SCUNet	BSRGAN	KAIR	USRNet	DPIR	IRCNN	SRMD	DPSR	DnCNN	FFDNet

The PyTorch training and testing code for SRMD, DnCNN, and FFDNet can be found in KAIR.

kair's Issues

I don't have permission to access the model. Any idea why?

DPSR Bad quality on testing

Hi @cszn,

I've trained DPSR on my own dataset, and the results look pretty good during training.
However, when I apply the learned weights to new images (very similar to the ones used for training), the output is very bad.
Here is an example:
1- The low-resolution image (cropped just for this example):

2- The result generated during phase 6 (testing) while training with 'main_train_dpsr.py':

I was expecting the same result when applying the learned weights to my low-resolution images, but the result from main_test_dpsr.py with the learned weights is:

Am I missing something? Could you please advise?
Many thanks in advance

IMDN Training

During training, all CPU cores are loaded at 100% while GPU usage is only 15-20%.
Any ideas on how to fix this?

train_imdn.json
"upsample_mode": "upconv"

main_test_imdn.py

from models.network_imdn import IMDN as net
model = net(in_nc=n_channels, out_nc=n_channels, nc=64, nb=8, upscale=scale_factor, act_mode='L', upsample_mode='pixelshuffle')

If I set
model.load_state_dict(torch.load(model_path), strict=False)
the trained model produces black images. Otherwise, main_test_imdn.py throws an error.

Cannot get similar results when testing the FFDNet color model in MATLAB and PyTorch

I tested the FFDNet models ffdnet_color.mat in MATLAB and ffdnet_color.pth in PyTorch with the same image and the same noise level. MATLAB gives a very good result, but the PyTorch result is worse.

I tested MATLAB with the demo Demo_Real_Color.m and PyTorch with main_test_ffdnet.py.

Why is training so slow?

My train log is:

21-04-08 09:20:55.248 :   task: srmd
  model: plain
  gpu_ids: [0]
  scale: 4
  n_channels: 3
  sigma: [0, 50]
  sigma_test: 0
  merge_bn: False
  merge_bn_startpoint: 400000
  path:[
    root: superresolution
    pretrained_netG: None
    task: superresolution/srmd
    log: superresolution/srmd
    options: superresolution/srmd/options
    models: superresolution/srmd/models
    images: superresolution/srmd/images
  ]
  datasets:[
    train:[
      name: train_dataset
      dataset_type: srmd
      dataroot_H: trainsets/trainH
      dataroot_L: None
      H_size: 96
      dataloader_shuffle: True
      dataloader_num_workers: 8
      dataloader_batch_size: 64
      phase: train
      scale: 4
      n_channels: 3
    ]
    test:[
      name: test_dataset
      dataset_type: srmd
      dataroot_H: testsets/set5
      dataroot_L: None
      phase: test
      scale: 4
      n_channels: 3
    ]
  ]
  netG:[
    net_type: srmd
    in_nc: 19
    out_nc: 3
    nc: 128
    nb: 12
    gc: 32
    ng: 2
    reduction: 16
    act_mode: R
    upsample_mode: pixelshuffle
    downsample_mode: strideconv
    init_type: orthogonal
    init_bn_type: uniform
    init_gain: 0.2
    scale: 4
  ]
  train:[
    G_lossfn_type: l1
    G_lossfn_weight: 1.0
    G_optimizer_type: adam
    G_optimizer_lr: 0.0001
    G_optimizer_clipgrad: None
    G_scheduler_type: MultiStepLR
    G_scheduler_milestones: [200000, 400000, 600000, 800000, 1000000, 2000000]
    G_scheduler_gamma: 0.5
    G_regularizer_orthstep: None
    G_regularizer_clipstep: None
    checkpoint_test: 5000
    checkpoint_save: 5000
    checkpoint_print: 200
  ]
  opt_path: options/train_srmd.json
  is_train: True

21-04-08 09:20:55.248 : loading PCA projection matrix...
21-04-08 09:20:55.248 : Random seed: 8094
21-04-08 09:20:55.380 : Number of train images: 3,550, iters: 56
21-04-08 09:20:57.633 : 
Networks name: SRMD
Params number: 1553200
Net structure:
SRMD(
  (model): Sequential(
    (0): Conv2d(19, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): ReLU(inplace=True)
    (6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (17): ReLU(inplace=True)
    (18): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (19): ReLU(inplace=True)
    (20): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (21): ReLU(inplace=True)
    (22): Conv2d(128, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (23): PixelShuffle(upscale_factor=4)
  )
)

21-04-08 09:20:57.636 : 
 |  mean  |  min   |  max   |  std   || shape               
 | -0.000 | -0.058 |  0.064 |  0.015 | torch.Size([128, 19, 3, 3]) || model.0.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.0.bias
 |  0.000 | -0.024 |  0.025 |  0.006 | torch.Size([128, 128, 3, 3]) || model.2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.2.bias
 | -0.000 | -0.025 |  0.025 |  0.006 | torch.Size([128, 128, 3, 3]) || model.4.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.4.bias
 | -0.000 | -0.027 |  0.024 |  0.006 | torch.Size([128, 128, 3, 3]) || model.6.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.6.bias
 | -0.000 | -0.029 |  0.024 |  0.006 | torch.Size([128, 128, 3, 3]) || model.8.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.8.bias
 |  0.000 | -0.025 |  0.024 |  0.006 | torch.Size([128, 128, 3, 3]) || model.10.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.10.bias
 |  0.000 | -0.029 |  0.027 |  0.006 | torch.Size([128, 128, 3, 3]) || model.12.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.12.bias
 | -0.000 | -0.025 |  0.025 |  0.006 | torch.Size([128, 128, 3, 3]) || model.14.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.14.bias
 | -0.000 | -0.026 |  0.025 |  0.006 | torch.Size([128, 128, 3, 3]) || model.16.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.16.bias
 | -0.000 | -0.025 |  0.027 |  0.006 | torch.Size([128, 128, 3, 3]) || model.18.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.18.bias
 |  0.000 | -0.027 |  0.027 |  0.006 | torch.Size([128, 128, 3, 3]) || model.20.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.20.bias
 | -0.000 | -0.023 |  0.024 |  0.006 | torch.Size([48, 128, 3, 3]) || model.22.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([48]) || model.22.bias

21-04-08 10:12:27.138 : <epoch:  3, iter:     200, lr:1.000e-04> G_loss: 1.228e-01 
21-04-08 11:03:26.168 : <epoch:  7, iter:     400, lr:1.000e-04> G_loss: 1.004e-01 
21-04-08 11:55:42.532 : <epoch: 10, iter:     600, lr:1.000e-04> G_loss: 8.437e-02 
21-04-08 12:46:08.205 : <epoch: 14, iter:     800, lr:1.000e-04> G_loss: 7.818e-02 
21-04-08 13:38:28.291 : <epoch: 18, iter:   1,000, lr:1.000e-04> G_loss: 5.932e-02 
21-04-08 14:28:32.066 : <epoch: 21, iter:   1,200, lr:1.000e-04> G_loss: 6.853e-02 
21-04-08 15:20:37.527 : <epoch: 25, iter:   1,400, lr:1.000e-04> G_loss: 5.390e-02 
21-04-08 16:11:03.519 : <epoch: 29, iter:   1,600, lr:1.000e-04> G_loss: 5.861e-02 
21-04-08 17:01:57.788 : <epoch: 32, iter:   1,800, lr:1.000e-04> G_loss: 5.812e-02 
21-04-08 17:54:55.700 : <epoch: 36, iter:   2,000, lr:1.000e-04> G_loss: 4.487e-02 
21-04-08 18:45:43.054 : <epoch: 39, iter:   2,200, lr:1.000e-04> G_loss: 5.985e-02 
21-04-08 19:38:15.152 : <epoch: 43, iter:   2,400, lr:1.000e-04> G_loss: 6.035e-02 
21-04-08 20:28:46.777 : <epoch: 47, iter:   2,600, lr:1.000e-04> G_loss: 5.407e-02 
21-04-08 21:21:19.155 : <epoch: 50, iter:   2,800, lr:1.000e-04> G_loss: 5.800e-02 
21-04-08 22:12:26.084 : <epoch: 54, iter:   3,000, lr:1.000e-04> G_loss: 4.669e-02 
21-04-08 23:04:49.046 : <epoch: 58, iter:   3,200, lr:1.000e-04> G_loss: 5.707e-02 
21-04-08 23:55:42.746 : <epoch: 61, iter:   3,400, lr:1.000e-04> G_loss: 5.521e-02 
21-04-09 00:48:11.666 : <epoch: 65, iter:   3,600, lr:1.000e-04> G_loss: 5.583e-02 
21-04-09 01:39:08.950 : <epoch: 69, iter:   3,800, lr:1.000e-04> G_loss: 4.659e-02 
21-04-09 02:30:07.278 : <epoch: 72, iter:   4,000, lr:1.000e-04> G_loss: 6.075e-02 
21-04-09 03:22:56.870 : <epoch: 76, iter:   4,200, lr:1.000e-04> G_loss: 5.796e-02 
21-04-09 04:13:49.914 : <epoch: 79, iter:   4,400, lr:1.000e-04> G_loss: 4.472e-02 
21-04-09 05:06:26.278 : <epoch: 83, iter:   4,600, lr:1.000e-04> G_loss: 4.891e-02 
21-04-09 05:56:58.472 : <epoch: 87, iter:   4,800, lr:1.000e-04> G_loss: 5.581e-02 
21-04-09 06:49:25.905 : <epoch: 90, iter:   5,000, lr:1.000e-04> G_loss: 6.413e-02 
21-04-09 06:49:25.905 : Saving the model.
21-04-09 06:49:26.138 : ---1-->   baby.bmp | 26.81dB
21-04-09 06:49:26.158 : ---2-->   bird.bmp | 22.54dB
21-04-09 06:49:26.170 : ---3--> butterfly.bmp | 18.75dB
21-04-09 06:49:26.218 : ---4-->   head.bmp | 26.36dB
21-04-09 06:49:26.253 : ---5-->  woman.bmp | 22.48dB
21-04-09 06:49:26.303 : <epoch: 90, iter:   5,000, Average PSNR : 23.39dB

21-04-09 07:40:04.505 : <epoch: 94, iter:   5,200, lr:1.000e-04> G_loss: 5.548e-02 
21-04-09 08:32:23.308 : <epoch: 98, iter:   5,400, lr:1.000e-04> G_loss: 5.314e-02 
21-04-09 09:22:56.333 : <epoch:101, iter:   5,600, lr:1.000e-04> G_loss: 5.548e-02

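Sorry, writing in English takes me a lot of effort, so I am describing the problem in Chinese.
I did the following calculation: training started at 09:20:57 and the first result came out at 06:49:26 the next morning, a total of 22 hours 27 minutes for 90 epochs, so one training pass takes about 15 minutes on average. The program is set to train for 1,000,000 passes, which would take 10,393.5 days in total, i.e. 28.5 years.
Training at this speed is far too slow. Is there any way to speed it up?
Also, GPU utilization stays very low the whole time and I don't know why. Is there a solution?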
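
As a rough cross-check, the timestamps in the log above can be converted into a per-iteration cost and extrapolated. The sketch below uses only two log entries (the 09:20:57 network printout as an approximate start, and the iteration-5,000 line); the function name is hypothetical and the result is an extrapolation, not a measurement.

```python
from datetime import datetime

def estimate_total_days(start, end, iters_done, iters_target):
    """Extrapolate total training time (in days) from elapsed wall-clock time."""
    fmt = "%y-%m-%d %H:%M:%S"
    elapsed = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    seconds_per_iter = elapsed / iters_done
    return seconds_per_iter, seconds_per_iter * iters_target / 86400.0

# Timestamps copied from the log above (fractional seconds dropped).
sec_per_iter, days = estimate_total_days(
    "21-04-08 09:20:57", "21-04-09 06:49:25", iters_done=5000, iters_target=1_000_000
)
print(f"{sec_per_iter:.1f} s/iter, about {days:.0f} days for 1,000,000 iterations")
```

On these numbers the run would need roughly half a year rather than decades, which suggests the 28.5-year figure multiplies the per-epoch time by the iteration count. About 15 seconds per iteration is still slow for a network of this size, so the question about low GPU utilization stands.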

Size mismatches and missing keys with testing

Hi, so after I've trained for a bit I want to actually use this model in another code. So following the example set in main_test, I run the following snippet:

    denoiser = net(in_nc=1, out_nc=1, nc=64, nb=17, act_mode='R')
    denoiser.load_state_dict(torch.load(os.path.join(args.model_dir, args.model_name)), strict=True)
    denoiser.eval()

However, when it loads the state dict, I get a myriad of size mismatches and missing keys. Admittedly, I am terminating training early because it's currently taking ~10 days to train a network, but I don't understand why the model can be run in training mode but not in testing mode.
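
Size mismatches and missing keys usually mean that the constructor arguments used at test time (for example in_nc, nb, or act_mode) describe a different architecture from the one that was trained. A minimal way to see exactly where two state dicts differ is to compare their parameter names and shapes. The helper below is a hypothetical sketch that works on plain name-to-shape mappings, so it does not depend on PyTorch itself.

```python
def diff_state_dicts(expected, loaded):
    """Compare two {parameter_name: shape} mappings and report the differences."""
    missing = sorted(k for k in expected if k not in loaded)
    unexpected = sorted(k for k in loaded if k not in expected)
    mismatched = sorted(
        (k, expected[k], loaded[k])
        for k in expected
        if k in loaded and tuple(expected[k]) != tuple(loaded[k])
    )
    return missing, unexpected, mismatched

# Toy shapes for illustration only; with PyTorch one would build these with
# {k: tuple(v.shape) for k, v in model.state_dict().items()} and likewise
# for the dictionary returned by torch.load(path).
model_shapes = {"model.0.weight": (64, 1, 3, 3), "model.2.weight": (64, 64, 3, 3)}
ckpt_shapes = {"model.0.weight": (64, 3, 3, 3), "model.3.weight": (64, 64, 3, 3)}

missing, unexpected, mismatched = diff_state_dicts(model_shapes, ckpt_shapes)
print("missing:", missing)
print("unexpected:", unexpected)
print("mismatched:", mismatched)
```

In this toy case the shifted layer index (model.2 versus model.3) is the kind of pattern an extra layer such as batch normalisation inside each block would produce, and the differing second dimension points at a different number of input channels.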

Slow convergence

Thanks for the great work again.

I am wondering whether there might be something wrong with the learning rate scheduler.
I tested "main_train_rrdb_psnr.py" and found that the learning rate is quickly reduced to "1.563e-06" at an early stage of training.
I checked update_learning_rate and the options, though, and the implementation seems OK.

But I observed much slower convergence compared to the implementation from xinntao.
Did you observe the same phenomenon?
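
To put the reported value in context, a MultiStepLR-style schedule multiplies the base rate by gamma once per milestone passed. The sketch below is a plain-Python illustration of that rule, not the toolbox's own code; the base rate, gamma, and milestones are assumptions borrowed from the SRMD options printed elsewhere on this page, and the RRDB options may differ.

```python
def multistep_lr(base_lr, milestones, gamma, step):
    """Learning rate after `step` scheduler steps under a MultiStepLR-style rule."""
    passed = sum(1 for m in milestones if step >= m)
    return base_lr * gamma ** passed

# Assumed values: base lr 1e-4, gamma 0.5 and six milestones, as in the SRMD
# options printed elsewhere on this page; the RRDB options may differ.
milestones = [200000, 400000, 600000, 800000, 1000000, 2000000]

lr_early = multistep_lr(1e-4, milestones, 0.5, step=1000)
lr_after_all = multistep_lr(1e-4, milestones, 0.5, step=2000000)
print(f"{lr_early:.3e} {lr_after_all:.3e}")
```

Under these assumptions, 1.563e-06 corresponds to 1e-4 × 0.5^6, the rate after all six milestones. Seeing it early in training would be consistent with the scheduler having been stepped many more times than the optimizer, which may be worth checking in the training script.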

Training issue

Getting this issue while running the training code. I guess it's because of the missing h_nc value in the json. What h_nc value should be kept for the model?

OOM when allocating tensor with shape[1,64,4028,3020]

Cannot find the training code for USRNet

Thank you very much for your excellent work. I would like to know whether you can open-source the training code of USRNet.

About the code KAIR/models/model_gan.py: 177-198

code is here:

if current_step % self.D_update_ratio == 0 and current_step > self.D_init_iters:  # updata D first
    if self.opt_train['G_lossfn_weight'] > 0:
        G_loss = self.G_lossfn_weight * self.G_lossfn(self.E, self.H)
        loss_G_total += G_loss                 # 1) pixel loss
    if self.opt_train['F_lossfn_weight'] > 0:
        real_fea = self.netF(self.H).detach()
        fake_fea = self.netF(self.E)
        F_loss = self.F_lossfn_weight * self.F_lossfn(fake_fea, real_fea)
        loss_G_total += F_loss                 # 2) VGG feature loss

    pred_g_fake = self.netD(self.E)
    if self.opt['train']['gan_type'] == 'gan':
        D_loss = self.D_lossfn_weight * self.D_lossfn(pred_g_fake, True)
    elif self.opt['train']['gan_type'] == 'ragan':
        pred_d_real = self.netD(self.var_ref).detach()
        D_loss = self.D_lossfn_weight * (
            self.D_lossfn(pred_d_real - torch.mean(pred_g_fake), False) +
            self.D_lossfn(pred_g_fake - torch.mean(pred_d_real), True)) / 2
    loss_G_total += D_loss                     # 3) GAN loss

    loss_G_total.backward()
    self.G_optimizer.step()

For example, if current_step == 1 and self.D_init_iters == 10, we should update model G and not D, but the code inside this if branch calls self.G_optimizer.step().
Does this code really update D first, as the comment says?

Or am I misunderstanding your code? :)
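
One way to check the example in the question is to evaluate the guard from the quoted snippet on its own. The sketch below mirrors only that condition (the function name and the settings are hypothetical); the discriminator update is not part of the excerpt, so nothing is assumed about it here.

```python
def generator_updates(current_step, d_update_ratio, d_init_iters):
    """Mirror of the quoted condition guarding the generator update."""
    return current_step % d_update_ratio == 0 and current_step > d_init_iters

# Hypothetical settings matching the example in the question.
d_update_ratio, d_init_iters = 1, 10
g_steps = [s for s in range(1, 15) if generator_updates(s, d_update_ratio, d_init_iters)]
print(g_steps)  # steps at which G_optimizer.step() would run
```

With these settings the branch is first entered at step 11, so at current_step == 1 the generator is not stepped. Read this way, the guard delays generator updates until after the first D_init_iters steps, which fits the "D first" comment.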

Stuck while training FFDNet

Hello. I refer to your research on a daily basis,
and I have one question.
I am trying to train FFDNet on my RGB dataset by running main_train_ffdnet.py,
but it stops at the line "for i, train_data in enumerate(train_loader):".
It runs fine with grayscale images.
I'll post the log. Please help me!

train.log

Dear author, how can I use the GPU to run IMDN?

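On choosing the degradation kernel and noise level at test time

Hello. During SR inference with USRNet, apart from trying every combination of degradation kernels (e.g. k1-k12) and noise levels (e.g. 0-0.1), is there a faster or more automated way to select the kernel and noise-level combination that gives the best visual quality on a given test set (e.g. a video clip)? What is your view or suggestion?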

Question regarding the batch normalisation after a convolution

In DnCNN, the 2D convolution is followed by a batch normalisation layer.
However, we can see that bias is used in the convolution (https://github.com/cszn/KAIR/blob/master/models/network_dncnn.py#L60-L63). From what I understand, it's important to remove that bias in the case of a convolution followed by a batch normalisation because it will be canceled out.

Is there any reason to keep it?
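
The cancellation described above can be checked numerically: batch normalisation subtracts the batch mean, so a constant added to every activation of a channel drops out. The sketch below is a plain-Python illustration with toy values and a hypothetical bias, using batch statistics and no affine parameters.

```python
from statistics import fmean, pvariance

def batch_norm(values, eps=1e-5):
    """Normalise a list of activations to zero mean and unit variance (no affine)."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

conv_out = [0.5, -1.25, 2.0, 0.75]        # toy pre-bias activations for one channel
bias = 3.0                                 # hypothetical per-channel conv bias
with_bias = batch_norm([v + bias for v in conv_out])
without_bias = batch_norm(conv_out)
max_diff = max(abs(a - b) for a, b in zip(with_bias, without_bias))
print(max_diff)
```

The two outputs agree up to floating-point error, so with batch statistics the convolution bias is redundant rather than harmful: it costs a few parameters, and the normalisation layer's own shift term can take over its role.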

Which file is the training file for the USRNet model?

Hey! Thanks for your contribution! This could help me a lot with my work! I am confused about the question in the title and hope you can answer it. Thank you!

Bad DPSR test results with default weights

Hi,
I'm trying to generate some results with the default weights (dpsr_x4.pth) on the testsets/set12 test set, using main_test_dpsr.py.
The result images keep the same dimensions (so no x4 is applied) and the quality is much worse than the originals.
For example, the original low-quality image:

The result after applying the weights:

These are the configuration values:

noise_level_img = 0                  # default: 0, noise level for LR image
noise_level_model = noise_level_img  # noise level for model    
model_name = 'dpsr_x4'           # 'dpsr_x2' | 'dpsr_x3' | 'dpsr_x4' | 'dpsr_x4_gan'
testset_name = 'set12'                # test set,  'set5' | 'srbsd68'
need_degradation = False              # default: True
x8 = False                           # default: False, x8 to boost performance
sf = [int(s) for s in re.findall(r'\d+', model_name)][0]
show_img = False                     # default: False
task_current = 'sr'       # 'dn' for denoising | 'sr' for super-resolution
n_channels = 3            # fixed
nc = 96                   # fixed, number of channels
nb = 16                   # fixed, number of conv layers
model_pool = 'model_zoo'  # fixed
testsets = 'testsets'     # fixed
results = 'results'       # fixed

Could you please suggest what I am doing wrong?
Thanks a lot in advance!

util.tensor2uint: It takes a lot of time to convert GPU tensor to CPU tensor

Hi, I successfully ran the test code of FFDNet, but I find that it takes a lot of time (4.5s) to move the GPU tensor to the CPU and convert it to a uint8 image. Can you help me solve this problem? Thanks.

About gan code

Hi there,
Thanks for your great work. I noticed that in your GAN training code in model_gan.py, the parameters of the discriminator D are frozen (requires_grad=False) before the generator G's gradients are back-propagated. Is this freezing step necessary? Thank you.

test issue

Hello! When I use an MSRResNet model obtained from the training process (e.g. 560000_G.pth) for testing with main_test_msrresnet.py, an error occurs. When I use msrresnet_x4_psnr.pth for testing, there are no errors. How can I solve this problem?

Training question

Thanks for sharing.

If I want to train on my own dataset, do I need to follow the format of DIV2K? I noticed that you have listed a few datasets, e.g. Flickr2K. Please advise. Thank you.

training usrnet

Hi,
I was trying to train USRNet, but it gets stuck every time at line 135 of main_train_msrresnet_psnr.py:
for i, train_data in enumerate(train_loader):
It gets stuck in this loop when trying to enumerate train_loader. Is there a solution, please?

Thanks

Could you please upload the pre-trained models to Baidu Netdisk?

Hello! I cannot download the Model_Zoo from Google. Could you please upload the pre-trained models to Baidu Netdisk if it's convenient? Thank you.

Hello, what should I do to train a DPSRGAN?

Training parameters for FFDNet

Hi,
Thanks for the code. I want to train FFDNet, but I don't understand why the training parameters in the JSON file differ from those in the paper: batch normalization is not activated (and therefore there is no merging either), the patch size is set to 64 instead of 70 (or 50 for color image denoising), the batch size is set to 64 instead of 128, L1 is used as the loss function instead of the one given in the paper, and the generation of training patches is also different, since the notion of an epoch is not implemented. Could you explain why these settings were changed, especially the absence of batch normalization? Thank you very much!

IndexError in testing after I complete SRMD model training

After I complete the SRMD model training, an IndexError is raised in the test.
But when I load the released pretrained models (srmd_x2, x3, x4), no error occurs.
If you can help me, I will be very grateful!

Excessive memory usage usrnet

I'm testing usrgan and I'm surprised that I'm not able to apply it to a 750x1000 px image, considering that the model has only 17016016 parameters. The error is CUDA out of memory, and my graphics card has 8GB of memory. I have also tested it on an 11GB 1080Ti and the error still happens. Can someone test it and shed some light on this matter?

I'm pasting the full traceback here:

Traceback (most recent call last):
File "main_test_usrnet.py", line 236, in <module>
main()
File "main_test_usrnet.py", line 190, in main
x = model(x, k, sf, sigma)
File "C:\Users\usrname\anaconda3\envs\tensor\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\Desktop\KAIR-master\models\network_usrnet.py", line 342, in forward
x = self.p(torch.cat((x, ab[:, i+self.n:i+self.n+1, ...].repeat(1, 1, x.size(2), x.size(3))), dim=1))
File "C:\Users\usrname\anaconda3\envs\tensor\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\Desktop\KAIR-master\models\network_usrnet.py", line 237, in forward
x2 = self.m_down1(x1)
File "C:\Users\usrname\anaconda3\envs\tensor\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\usrname\anaconda3\envs\tensor\lib\site-packages\torch\nn\modules\container.py", line 117, in forward
input = module(input)
File "C:\Users\usrname\anaconda3\envs\tensor\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\Desktop\KAIR-master\models\basicblock.py", line 222, in forward
res = self.res(x)
File "C:\Users\usrname\anaconda3\envs\tensor\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\usrname\anaconda3\envs\tensor\lib\site-packages\torch\nn\modules\container.py", line 117, in forward
input = module(input)
File "C:\Users\usrname\anaconda3\envs\tensor\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\usrname\anaconda3\envs\tensor\lib\site-packages\torch\nn\modules\conv.py", line 419, in forward
return self._conv_forward(input, self.weight)
File "C:\Users\usrname\anaconda3\envs\tensor\lib\site-packages\torch\nn\modules\conv.py", line 416, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 2.86 GiB (GPU 0; 8.00 GiB total capacity; 4.10 GiB already allocated; 1.88 GiB free; 4.43 GiB reserved in total by PyTorch)
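
The parameter count says little about inference memory, because the cost is dominated by feature maps at the output resolution. The sketch below is a back-of-the-envelope estimate with a hypothetical helper; the scale factor of 4, the 64 feature channels at full resolution, and float32 storage are assumptions, since the post does not state them.

```python
def tensor_gib(shape, bytes_per_element=4):
    """Size in GiB of a dense tensor with the given shape (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element / 2**30

# Assumptions: scale factor 4, 64 feature channels at full HR resolution, float32.
lr_h, lr_w, sf, channels = 750, 1000, 4, 64
one_feature_map = tensor_gib((1, channels, lr_h * sf, lr_w * sf))
print(f"{one_feature_map:.2f} GiB")
```

Under these assumptions a single feature map takes 2.86 GiB, the same figure as the failed allocation in the traceback. A network that keeps several such maps alive at once would exceed 8 GB or 11 GB quickly, so processing the image in tiles or using a smaller scale factor may be necessary.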

DPSR training problem

Thanks to @cszn for the contribution. DPSR is amazing work, so I gave the repository a star to say thanks! However, my training takes a long time. When I checked, I found that CPU usage was very high and GPU usage was very low (see image 1 and image 2), so I wanted to ask:
1. How long does DPSR training normally take?
2. Have you ever experienced high CPU usage and low GPU usage before? How do you solve the problem of low GPU usage and high CPU usage?

My email address is [email protected]，I look forward to further discussion with you.
image 1： https://imgchr.com/i/8shnK0
image 2 : https://imgchr.com/i/8s4wyq

Blind DNCNN

There isn't any training code available for the blind DnCNN in either this repository or the DnCNN one. I was wondering if this could be provided.

Dependencies for USRNet

Hi @cszn, where can I find the packages needed to install in order to be able to test and train USRNet? Thanks in advance

decompressing data: inconsistent stream state

Hello, when running python main_test_usrnet.py I encountered:
Traceback (most recent call last):
File "/root/userfolder/anaconda3/lib/python3.6/site-packages/hdf5storage/__init__.py", line 1768, in loadmat
with h5py.File(filename, mode='r') as f:
File "/root/userfolder/anaconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 271, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/root/userfolder/anaconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 101, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (File signature not found)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main_test_table1.py", line 230, in <module>
main()
File "main_test_table1.py", line 87, in main
kernels = hdf5storage.loadmat(os.path.join('kernels', 'kernels_12.mat'))['kernels']
File "/root/userfolder/anaconda3/lib/python3.6/site-packages/hdf5storage/__init__.py", line 1801, in loadmat
**keywords)
File "/root/userfolder/anaconda3/lib/python3.6/site-packages/scipy/io/matlab/mio.py", line 208, in loadmat
matfile_dict = MR.get_variables(variable_names)
File "/root/userfolder/anaconda3/lib/python3.6/site-packages/scipy/io/matlab/mio5.py", line 272, in get_variables
hdr, next_position = self.read_var_header()
File "/root/userfolder/anaconda3/lib/python3.6/site-packages/scipy/io/matlab/mio5.py", line 226, in read_var_header
mdtype, byte_count = self._matrix_reader.read_full_tag()
File "mio5_utils.pyx", line 548, in scipy.io.matlab.mio5_utils.VarReader5.read_full_tag
File "mio5_utils.pyx", line 556, in scipy.io.matlab.mio5_utils.VarReader5.cread_full_tag
File "streams.pyx", line 171, in scipy.io.matlab.streams.ZlibInputStream.read_into
File "streams.pyx", line 158, in scipy.io.matlab.streams.ZlibInputStream._fill_buffer
zlib.error: Error -2 while decompressing data: inconsistent stream state
My environment:
h5py 2.7.0
hdf5storage 0.1.15
torch 1.6.0+cu92
torchvision 0.7.0+cu92
Have you ever encountered such a problem? Thank you very much.

test srmd

I tested SRMD, but the resolution of the generated image has not changed.

USRNET training

Hi, I am a beginner with the super-resolution task. I was trying usrnet.py as given in the repo and created a similar .json file for USRNet, taking hints from the other models. I am having issues with the training. Please provide directions on how to train the model correctly.

main_train_msrresnet

How can I train USRGan and other tiny models?

Hey, thanks for contributing. Your work is awesome. I tried to find training support for the USRGAN and USRNet-tiny models but couldn't find any. Is there any support for this currently?

DatasetUSRNet

20-09-25 03:21:23.915 : <epoch:  5, iter:   5,000, lr:1.000e-04> G_loss: 3.441e-02 
20-09-25 03:21:23.916 : Saving the model.
Traceback (most recent call last):
  File "E:/Work/KAIR/main_train_msrresnet_psnr.py", line 219, in <module>
    main()
  File "E:/Work/KAIR/main_train_msrresnet_psnr.py", line 178, in main
    for test_data in test_loader:
  File "D:\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 363, in __next__
    data = self._next_data()
  File "D:\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "D:\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1014, in _process_data
    data.reraise()
  File "D:\anaconda3\lib\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "D:\anaconda3\lib\site-packages\torch\utils\data\_utils\worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "D:\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\Work\KAIR\data\dataset_usrnet.py", line 116, in __getitem__
    return {'L': img_L, 'H': img_H, 'k': k, 'sigma': noise_level, 'sf': self.sf, 'L_path': L_path, 'H_path': H_path}
AttributeError: 'DatasetUSRNet' object has no attribute 'sf'


Process finished with exit code 1
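
The traceback shows __getitem__ reading self.sf on an object where it was never assigned. One plausible cause (an assumption here, since the dataset source is not shown) is that the attribute is set only in a branch that the test phase does not take. The toy class below is a hypothetical stand-in showing the usual remedy of assigning the attribute in __init__; the option keys are illustrative.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset whose __getitem__ reads self.sf."""

    def __init__(self, opt):
        self.opt = opt
        # Setting the attribute once here guarantees it exists in every phase,
        # instead of assigning it only inside a phase-specific branch.
        self.sf = opt.get("scale", 4)

    def __getitem__(self, index):
        if self.opt["phase"] == "train":
            pass  # a train-only branch might choose a random scale factor here
        return {"index": index, "sf": self.sf}

sample = ToyDataset({"phase": "test", "scale": 4})[0]
print(sample)
```

With the attribute initialised up front, indexing works in the test phase as well, which is where the failure above occurs (the loop over test_loader right after the checkpoint is saved).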

Difference of almost 5dB when training with opencv/pillow bicubic.

Hello again,
There is a difference of almost 5dB when I train with the OpenCV/Pillow bicubic kernel compared to the default MATLAB one. Is the MATLAB kernel easier to learn than the others? Why is this happening?

DPSR Training Error

Thanks to @cszn for the contribution; DPSR is amazing work.
I tried to train on my own dataset with main_train_dpsr.py, using pretrained_netG.
For pretrained_netG I used the model from the DPSR repository, DPSRx4.pth.
But I got a runtime error:
RuntimeError: Error(s) in loading state_dict for SRResNet:
Missing key(s) in state_dict: "model.3.weight", "model.3.bias", "model.6.weight", "model.6.bias".
Unexpected key(s) in state_dict: "model.2.weight", "model.2.bias", "model.5.weight", "model.5.bias".

How can I fix this? I hope you can help. Thanks

Training error in release mode

Hi
First of all, I really appreciate your nice work.

When I tried to train FFDNet, it got stuck in the dataloader at the epoch "for" loop.

However, it works in debugging mode without getting stuck.
Have you ever run into this kind of problem?

If you have solved this problem before, please share the solution.

Thanks in advance.

About spatially variant degradation feature in Dimensionality Stretching

Great paper and great code!
I couldn't find the code corresponding to the "spatially variant degradation" feature, and I am also confused about how the "Dimensionality Stretching" strategy can incorporate spatial information...

A beginner hoping for your reply, thanks

Training progress

Hello, everybody. I want to monitor the loss with TensorBoard while the network is training.
How can I view the training progress in KAIR?
Thanks

model.init_train() should be called before logger.info()

Hi,

Your software is interesting. By the way,
during training, the error "no attribute 'opt_train'" occurred. The reason seems to be that logger.info() is called before model.init_train(). We should call model.init_train() before logger.info().

    # original order
    logger.info(model.info_network())
    model.init_train()   # this should be called before logger.info
    logger.info(model.info_params())
    # fixed order
    model.init_train()
    logger.info(model.info_network())
    logger.info(model.info_params())

The error log is

Traceback (most recent call last):
  File "main_train_msrresnet_gan.py", line 224, in <module>
    main()
  File "main_train_msrresnet_gan.py", line 128, in main
    logger.info(model.info_network())
  File "/work/s124087/KAIR/models/model_gan.py", line 297, in info_network
    if self.opt_train['F_lossfn_weight'] > 0:
AttributeError: 'ModelGAN' object has no attribute 'opt_train'

Also I encountered the following warning.

/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:100: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

Black and white images when training SRMD

Hi,

Thanks for this repository.

I am trying to retrain SRMD x4 on DIV2K so I changed options/train_srmd.json with my database path and changed the frequency of the checkpoints. After one day of training on GPU (around 1300 iterations), all the generated images seem to estimate the HR but in black and white (with 3 channels). I plotted the LR and HR images and they are in colour.

Does the colour usually appear later during the training or is there something to add?

Thanks,
Charles

upscale directly

Hi @cszn,
Thank you for this great project.
I noticed that the project first downscales the images in the testsets folder and then upscales them again.
How can I upscale a low-resolution image directly, without downscaling it first?

Optimizing DPSR training to avoid low and spiky GPU usage

I have DPSR training running, but it does not appear to fully utilise the GPU, whose usage spikes in a sawtooth profile as below:

I've tried adjusting the batch size, but the change doesn't improve overall GPU utilisation, only the shape of the sawtooth.
From the stats and GPU power draw, I estimate that it reaches about 50% utilisation at best, whereas other GAN training I have done sustained above 90% when optimised.

Are there any other parameters I should be looking to tweak?

DPSR error in training

/home/pjh/anaconda3/envs/DPSR/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:100: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

I keep getting this warning when I try to run the training code (main_train_dpsr.py).
I've seen issue #2, which deals with a similar error, but I'm wondering whether the problem has also been solved in the DPSR code. Do you have any solutions?

main_test_rrdb.py

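After training a model with main_train_rrdb_psnr.py, the test code cannot super-resolve images properly.
With the original model = net(in_nc=3, out_nc=3, nc=64, nb=23, gc=32, upscale=4, act_mode='L', upsample_mode='upconv')
the output turns yellow. I then changed it, following the options file, to
model = net(in_nc=3, out_nc=3, nc=64, nb=23, gc=32, upscale=4, act_mode='R', upsample_mode='upconv')
and the output turns black.
The validation during training super-resolves the images correctly.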

network_msresnet.py

Hi @cszn, what is the main difference between Msrresnet0 and Msrresnet1? Is it the case that Msrresnet1 converges more easily? Thanks!

Can't reproduce the same test results on BSD68 using my own trained USRNet model.

Hi, I recently trained USRNet using the code in this repository. I haven't found any problems with the data or the network during training, but I got worse results than those in your paper, so I'd like to ask for some hints on training the model correctly.
I was wondering whether the manual seed affects the results. If so, can you share the manual seed setting you used during training? Thanks for your help.

My results are shown above; the blue ones are the results in the paper. I can get the same results using the pretrained model downloaded from the drive. The red ones are my results, which show a large gap from the ones in the paper.

FFDNet on custom dataset

Hi! Thank you so much for making this work open source!

I have a custom dataset (train/test) which I would like to use for training and testing. I was able to change the dataroot_H paths and successfully begin training of the model. Is there anything else I should change in the code for the best results on my particular dataset?

Thanks!