
idm's People

Contributors

ree1s


idm's Issues

About training on multiple GPUs

Hello! Thanks for sharing your excellent work! I tried the training code and have the following two problems.

  1. I have tried training with 2 GPUs using the command in run.sh, but the training stopped after several hours with an error from l_pix.backward():

File "/IDM-main/model/model.py", line 86, in optimize_parameters
    l_pix.backward()
File "anaconda3/envs/idm3/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [127.0.0.1]:47290
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2176380) of binary:

Have you ever met this problem?

  2. Regarding the main function: since I hit the problem in 1), I tried using sr.py (from the SR3 repository) instead of idm_main.py. idm_main.py uses nn.parallel.DistributedDataParallel while SR3 uses nn.DataParallel. However, when I ran sr.py with 2 GPUs, only a single GPU was used (see the sketch below). I wonder if you also ran into this problem and therefore switched to nn.parallel.DistributedDataParallel. Is it caused by basicsr?
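For anyone debugging the same single-GPU symptom, here is a minimal sketch of the usual nn.DataParallel pattern (the nn.Linear stand-in and the device ids are placeholders, not IDM's actual code); if torch.cuda.device_count() reports 1 inside the script, the limit is usually CUDA_VISIBLE_DEVICES rather than the wrapper:

import torch
import torch.nn as nn

# Stand-in module; in IDM this would be the diffusion network.
model = nn.Linear(16, 16)
print('visible GPUs:', torch.cuda.device_count())
if torch.cuda.device_count() > 1:
    # DataParallel replicates the module across the listed GPUs and
    # splits each input batch along dim 0 across the replicas.
    model = nn.DataParallel(model, device_ids=[0, 1])
model = model.cuda()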

Thanks for your patience in reading this issue! Looking forward to your reply.

Applying the pretrained model to other scales

Hi,
Thanks for sharing your excellent work!
I'm wondering whether the released pretrained model (x8 SR) can be directly applied to other scales (e.g., x4) by modifying the output resolution, or whether we should change the config file and train from scratch to obtain models for specific scales.

pre-trained checkpoints

I would like to ask whether the pretrained checkpoints/face_gen.pth is suitable for any scale between 16 and 128 on the face dataset, or for any starting resolution with a factor from x1 to x8, such as 32 to 256 on the face dataset.
My understanding is that face_gen.pth applies to any scale between 16 and 128 on the face dataset.

About training on a GPU

Is this the start of training? Even after an hour it still looks like this. How long does it take to train one epoch?
[Screenshot 2023-12-06 163622: training log]

23-12-06 07:29:46.861 - INFO: Model [DDPM] is created.
23-12-06 07:29:46.861 - INFO: Initial Model Finished
23-12-06 07:29:46.861 - INFO: Resuming training from epoch: 0, iter: 0.

CUDA version

I use CUDA toolkit version 11.1, and I get an error about "torch.distributed.elastic.multiprocessing.api failed". Do you know if this issue is related to the CUDA version?

It seems that SR on general images (the DIV2K dataset) has not been implemented?

According to the current code, the 'lr_dataroot' and 'hr_dataroot' keys in the config files are not used when creating the dataset.
The current dataset-related code, such as '__init__.py' and 'LRHR_dataset.py', only supports the FFHQ and CelebA-HQ datasets.
Besides, the cropped patches with lr_size 16 and hr_size 128 for DIV2K confuse me.
Given all this, I wonder whether SR on general images for the DIV2K dataset has actually been implemented.

About arbitrary size image upsampling

When I try to upsample an image of arbitrary size (e.g., the 228x344 Set5 woman.png), a dimension-mismatch error occurs at line 400 in unet.py.
The problem is in the else clause at line 404 of the same file, where the argument h is set to the width of feats (h=feats[-1].shape[-1]).
After changing it to h=feats[-1].shape[-2], upsampling images of arbitrary size becomes possible.
If the current code is not intended, it would be nice to fix it.

x = rearrange(x, 'b (h w) c -> b c h w', h=feats[-1].shape[-1])
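For reference, a sketch of the fix described above, assuming feats[-1] has the usual (b, c, h, w) layout so that shape[-2] is the height:

# rearrange comes from einops; pass the height rather than the width
x = rearrange(x, 'b (h w) c -> b c h w', h=feats[-1].shape[-2])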

A question about model training

Hi @Ree1s, may I ask a question: could you update the README with detailed training steps? It is not clear to me how your model is trained.

About prepare_data.py

Thanks for your great work!
After I ran 'prepare_data.py', I only got two '.mdb' files rather than images. Is that right?
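For what it's worth, two '.mdb' files (typically data.mdb and lock.mdb) are most likely an LMDB database. A minimal sketch for checking the output, assuming the lmdb Python package is installed; the path is a placeholder:

import lmdb

# Open the LMDB directory read-only; lock=False avoids creating lock files.
env = lmdb.open('dataset/ffhq_16_128.lmdb', readonly=True, lock=False)
with env.begin() as txn:
    print('entries:', txn.stat()['entries'])  # number of stored records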

Mistake in README

IDM/README.md

Line 24 in 735e836

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3

Here, in order to install PyTorch, you need to append the channel flag -c pytorch:
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

License Query

Can you add a LICENSE to your repository?

MIT would be cool, but CC BY-NC 2.0 is understandable.

You do interesting work!

distributed

I have a question about distributed training: how can I run the idm_main.py file on my single-GPU Windows computer? The error I get is: "RuntimeError: Default process group has not been initialized, please make sure to call init_process_group."
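One common workaround, sketched below under the assumption that a single process is enough (this is not the repository's documented setup; gloo is the backend generally available on Windows), is to initialize a process group of size one before the training code runs:

import os
import torch.distributed as dist

# A single-process "group of one" satisfies code that assumes an
# initialized process group. MASTER_ADDR/MASTER_PORT are arbitrary
# local values required by the default env:// rendezvous.
os.environ.setdefault('MASTER_ADDR', 'localhost')
os.environ.setdefault('MASTER_PORT', '29500')
dist.init_process_group(backend='gloo', rank=0, world_size=1)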

About the pretrained models

Regarding the pretrained models: in the link you provided I only see the _gen.pth file but no _opt.pth file, so I cannot resume training my model.

Missing checkpoint?

Hey,

For training, I'm missing the checkpoint: 'best_psnr_opt.pth'. Will this be uploaded?

Could you release the train.py and test.py?

Thanks!
I would like to train your model on other tasks.

After reading your excellent work, I would like to know more about the actual training parameters and process of your model.

About training epochs and time

What are the batch size and number of epochs when training 16×16 -> 128×128 SR on two 24GB NVIDIA RTX A5000 GPUs with the FFHQ dataset? How many days did one training run take? Thanks.

error: [WinError 2] The system cannot find the file specified

(idm) PS C:\wav2lip\IDM-main> python setup.py develop
C:\ProgramData\Anaconda3\lib\site-packages\setuptools\__init__.py:85: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated. Requirements should be satisfied by a PEP 517 installer. If you are using pip, you can try pip install --use-pep517.
dist.fetch_build_eggs(dist.setup_requires)
running develop
C:\ProgramData\Anaconda3\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
C:\ProgramData\Anaconda3\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running egg_info
writing basicsr.egg-info\PKG-INFO
writing dependency_links to basicsr.egg-info\dependency_links.txt
writing requirements to basicsr.egg-info\requires.txt
writing top-level names to basicsr.egg-info\top_level.txt
reading manifest file 'basicsr.egg-info\SOURCES.txt'
writing manifest file 'basicsr.egg-info\SOURCES.txt'
running build_ext
error: [WinError 2] The system cannot find the file specified.

a bug of distributed

When I run this command:
CUDA_VISIBLE_DEVICES=5 python -m torch.distributed.launch idm_main.py -p train -c config/ffhq_liifsr3_scaler_16_128.json -r /home/sx/sx_data/IDM-main/checkpoints/face/home/sx/sx_data/IDM-main/checkpoints/face
this error occurs; how can I fix it?
[error screenshot]
And when I run this command:
CUDA_VISIBLE_DEVICES=5 python -m torch.distributed.launch idm_main.py --local_rank=0 -p train -c config/ffhq_liifsr3_scaler_16_128.json -r /home/sx/sx_data/IDM-main/checkpoints/face/home/sx/sx_data/IDM-main/checkpoints/face
this error occurs:
[error screenshot]
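Without the screenshots the exact traceback is unknown, but one plausible cause (an assumption, not a confirmed diagnosis) is that scripts launched via torch.distributed.launch receive a --local_rank argument that they must parse themselves; a minimal sketch:

import argparse

parser = argparse.ArgumentParser()
# torch.distributed.launch injects --local_rank=<n> into each worker's argv
parser.add_argument('--local_rank', type=int, default=0)
args, unknown = parser.parse_known_args()
print('local rank:', args.local_rank)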

How do you decide the reported performance?

Hi, Ree1s:

Thanks for the nice work. How did you determine the reported performance? Do you select the models from the iterations with the best PSNR, or by other metrics? And do you average over multiple runs?

Thanks
Jinxin

Image quality on different scales

I can't get a good image at scales other than 8x, even with the pretrained checkpoints. As explained in the README, if additional training is performed, the results are okay at scales around 8x, but I cannot obtain the same results as the paper.
How do I reproduce the paper's results?

Regarding the resolution of the test image

It seems that the test image resolutions in the given code are all square, for example 16×16 or 128×128. If the test image resolution is rectangular, for example 512×256, can it still be successfully upscaled?

Results on DIV2K validation set

Dear authors, I have some questions about the results on the DIV2K validation set (Table 4):

  1. Did you use resized HR images to generate LR-HR pairs with predefined LR/HR resolutions, rather than the original DIV2K validation set? Could this potentially lead to biased evaluation results by omitting finer details from the original HR images?

  2. What about the LPIPS scores, given that all methods in Table 4 are generative SR methods?

  3. Can your model produce full-resolution images on the DIV2K validation set without cropping them into square patches? I'm curious because other works in the field seem to adopt this validation setting.

  4. If not, could you share your specific validation setting, including LR resolution, HR resolution, and any pertinent details?

Thank you! If I made some mistakes, please correct me.
