
splinter21 commented on August 15, 2024

I have another question, comparing against the RCAN network. If we define a 3x3 conv on 64-channel feature maps (64 input, 64 output channels) as one basic unit and ignore the other layers, then:
RCAN's computational cost is 2x20x10 = 400 units.
One dense block costs 64x32 + 96x32 + 128x32 + 192x32 + 256x64 = 9 basic units.
So the RRDBNet with 23 RRDBs costs 9x3x23 = 621 units, more than 1.5x RCAN's cost, which also shows the network is much more complex than RCAN.
Yet, with self-ensemble disabled for both, its scores still trail RCAN's somewhat, especially on Urban100.
But once the training set becomes broader, the scores improve a lot. So I am curious: trained on the same DF2K dataset, which would score better, RCAN or RRDBNet?

from esrgan.

xinntao commented on August 15, 2024

Hi @Splinter22

  1. About the conv in the residual path: we have not investigated the 'long' and 'short' residual paths and their convs. You may try to investigate them.
  2. ESRGAN mainly targets visually pleasing SR results, so its structure has not been optimized for PSNR. We believe attention mechanisms such as the one in RCAN would further improve PSNR performance, but we have not used them.
    One observation in our work is that a deeper network improves the quality of GAN-based approaches while remaining easy to train (whereas SRGAN claimed that deeper models are increasingly difficult to train). Therefore, we tried a deep network built by simply stacking RRDB blocks.
  3. The calculation you describe is closer to a count of network parameters. I have computed the parameter counts of RRDB and RCAN (RRDB: 16,697,987; RCAN: 15,592,355). RRDB has slightly more parameters, but nowhere near 1.5x; perhaps you ignored some other parameters.
    Note that PSNR is not our main goal. We report PSNR for reference: the RRDB model with stacked blocks can also achieve high PSNR, and the DF2K dataset further improves its performance.

We provide a comparison of RRDB and RCAN trained on the same DIV2K dataset; their performance can be found in the supplementary materials.
Note that if you want higher PSNR, RCAN is the better choice for your baseline and comparison, while our method mainly aims at better visual quality.
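As a rough cross-check on the parameter counts quoted above, here is a minimal sketch (pure Python, my own illustration, assuming the standard RRDBNet body: 23 RRDBs, each with 3 dense blocks of five 3x3 convs at growth rate 32 on 64 base channels; the head, tail, and upsampling convs are left out):

```python
def conv2d_params(c_in, c_out, k=3, bias=True):
    """Parameter count of one k x k conv layer: weights plus biases."""
    return c_in * c_out * k * k + (c_out if bias else 0)

# One dense block: five 3x3 convs at growth rate 32 on 64 base channels.
dense_block = sum(conv2d_params(ci, co)
                  for ci, co in [(64, 32), (96, 32), (128, 32),
                                 (160, 32), (192, 64)])

# RRDBNet body: 23 RRDBs x 3 dense blocks each.
body = 23 * 3 * dense_block
print(body)  # 16546752 -- close to the quoted 16,697,987 total; the
             # remainder comes from the uncounted head/tail/upsampling convs.
```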


splinter21 commented on August 15, 2024

@xinntao How was the residual scaling parameter 0.2 obtained? If the best value has to be found through repeated experiments, the cost would be enormous for such a deep network. Is there a trick for finding it quickly, or a way to narrow down the range this value should lie in?


xinntao commented on August 15, 2024

Residual scaling was first proposed in "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning", with scaling factors between 0.1 and 0.3, and EDSR also uses this technique with a scaling factor of 0.1.
So, based on previous experience, it is best to choose a value between 0.1 and 0.3.

We chose 0.2 without many trials: once training converged with it, we kept that value. In other words, we did not run whole trainings to pick the optimal scaling factor.
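A back-of-the-envelope argument (a toy model of my own, not from the paper) suggests why such small factors work: if each residual branch is idealized as adding an independent, roughly unit-variance signal scaled by s, then after n scaled residual additions the activation variance grows like 1 + n*s^2, so s in [0.1, 0.3] keeps a very deep stack numerically tame:

```python
def output_variance(n_blocks, res_scale):
    """Toy estimate: the identity path has variance 1; each of n_blocks
    branches adds an independent unit-variance term scaled by res_scale."""
    return 1 + n_blocks * res_scale ** 2

# e.g. 69 scaled residual additions (23 RRDBs x 3 dense blocks each):
for s in (1.0, 0.3, 0.2, 0.1):
    print(s, output_variance(69, s))
# roughly: 70.0 at s=1.0, 7.21 at 0.3, 3.76 at 0.2, 1.69 at 0.1
```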


splinter21 commented on August 15, 2024

Sorry, I miscalculated the dense block cost. It should be 64x32 + 96x32 + 128x32 + 160x32 + 192x64 = 6.5 basic units, so against RCAN's 400 units it is 6.5x3x23 = 448.5. The gap is much smaller, nowhere near 1.5x. Also, RCAN's channel attention layers add a small extra cost that I did not count.
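For concreteness, this corrected tally can be reproduced with a short sketch (pure Python; the "basic unit" is a 3x3 conv with 64 input and 64 output channels, so the 3x3 kernel factor cancels out of the ratios):

```python
UNIT = 64 * 64  # a 64->64 3x3 conv defines one basic unit

def conv_units(c_in, c_out):
    """Relative cost of a 3x3 conv in basic units (kernel size cancels)."""
    return c_in * c_out / UNIT

# RCAN body: 10 residual groups x 20 RCABs x 2 convs (64->64 each),
# ignoring channel attention and the other layers.
rcan = 10 * 20 * 2 * conv_units(64, 64)

# One RRDB dense block: five convs at growth rate 32.
dense = sum(conv_units(ci, co)
            for ci, co in [(64, 32), (96, 32), (128, 32),
                           (160, 32), (192, 64)])

# RRDBNet body: 23 RRDBs x 3 dense blocks each.
rrdbnet = 23 * 3 * dense
print(rcan, dense, rrdbnet)  # 400.0 6.5 448.5
```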

Only today did I look through the issues on RCAN's GitHub: the full training takes 1000 epochs. Now I finally understand why my RCAN reproduction failed. I had adapted the PyTorch version of EDSR, which basically converges within 200-250 epochs after two learning-rate decays, whereas RCAN converges slowly but reaches very good results once converged. Training for only 250 epochs, I could never observe its final converged performance.

Do you have any advice on experimenting with model modifications and tuning? Modifying several models at full configuration (with the block count at the paper's maximum) and training for very many epochs makes the experiment cycle far too long and requires far too many GPUs. Modifying models at a baseline configuration (fewer feature maps and blocks) shortens the cycle and can show which modifications work and improve the metrics at that scale, but I do not know whether they also work at full configuration. A lab's servers cannot compete with a company's in GPU count, performance, or memory. With limited GPU compute, is there an efficient experimental plan or strategy?

Thanks!


splinter21 commented on August 15, 2024

About the training set: my network's test results (RGB, the four classic test sets plus the DIV2K 100-image validation set, data augmentation enabled) are that DIV2K and Flickr2K give about the same PSNR, while DF2K averages a bit over 0.1 dB higher than DIV2K (combined they score higher -- how should this be explained? In limbee/NTIRE2017#29 the EDSR author said the three make a negligible difference as long as there is no overfitting, which apparently is not correct). My gain is a bit smaller than RRDBNet's, so perhaps the DIV2K 800-image training set cannot fully bring out RRDBNet's potential?


xinntao commented on August 15, 2024

@Splinter22
On model tuning and optimization: the usual approach is to follow what other papers did, then experiment yourself to find good parameters.
The situation you describe does happen -- trying things on a smaller model and then transferring them to a large one. There are indeed works (including in high-level vision) whose proposed method is effective on small models but becomes less important on large models, because the larger capacity makes it matter less.
Still, one generally assumes that small and large models behave consistently, and then verifies the change on the large model.

Indeed, as models grow larger and algorithms more complex, training and testing become harder. The models in these repos all take very long to train; one remedy is multi-GPU training to speed things up.
If GPU compute is limited, that is a genuine handicap =-=. Perhaps look into work that is less GPU-hungry; I have no good solution....

My observation is that with large models, the more diverse and larger the dataset, the better the results. Under different settings the results may vary somewhat.


splinter21 commented on August 15, 2024

I have found one advantage of using small models for controlled experiments on whether a modified module really helps: on large models the PSNR gap in such comparisons is very small, since they are already close to the bottleneck... On small models the gap is slightly larger and better reflects the difference the module makes.
The most painful part of testing large models is that you must wait until all epochs finish before you can see which model ends up better. You cannot stop halfway, before full convergence, or you will draw wrong conclusions, because some models are simply a bit slower early on yet end up higher after converging.
So, as you said, assume that small and large models give consistent conclusions and run the controlled experiments on small models; otherwise the project is long dead by the time the results arrive...

