Comments (6)

XiaohangZhan commented on August 16, 2024

You only need to change data_source_cfg in the config; leave everything else unchanged. If your batch size is small, you may use SGD, and you may also need to adjust hyperparameters such as the learning rate (lr).
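
As a rough illustration of why changing one dict is enough, here is a plain-Python sketch of the dataset and optimizer parts of a BYOL config adapted to CIFAR-10. The values mirror the r50_cifar.py config posted in this thread; the pipelines are omitted, and split='train' is an assumed value modeled on configs/classification/cifar/r50.py, which the maintainer points to later in the thread.

```python
# Plain-Python sketch of part of a BYOL config for CIFAR-10 (pipelines omitted).
# Values mirror the r50_cifar.py config posted in this thread;
# split='train' is an assumed value, not copied from the project.

# Only data_source_cfg is switched from ImageNet to CIFAR-10.
data_source_cfg = dict(type='Cifar10', root='data')

data = dict(
    imgs_per_gpu=32,
    workers_per_gpu=4,
    train=dict(
        type='BYOLDataset',
        # data_source_cfg is unpacked into data_source, so editing that one
        # dict swaps the dataset. The CIFAR data source additionally needs
        # a split argument (see the maintainer's later reply).
        data_source=dict(split='train', **data_source_cfg),
    ),
)

# With a small total batch size, plain SGD; lr and weight_decay are the
# values from the posted config and may need tuning.
optimizer = dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0005)

print(data['train']['data_source'])
```

Because the config is ordinary Python, the ** unpacking is evaluated when the file is loaded, which is why the data source definition follows whatever data_source_cfg contains.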

from mmselfsup.

etbox commented on August 16, 2024

Thanks for your reply. Following your instructions, I reset my config and changed only data_source_cfg, but it still doesn't work.
The log is shown below:

(open-mmlab) lhy@mustdl2:/disk1/lhy/Documents/github/OpenSelfSup$ bash tools/dist_train.sh configs/selfsup/byol/r50_cifar.py 2
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
2020-09-01 15:35:03,052 - openselfsup - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.6 (default, Jan  8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda-8.0
NVCC: Cuda compilation tools, release 8.0, V8.0.61
GPU 0,1: GeForce GTX 1080 Ti
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
PyTorch: 1.5.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.6.0a0+35d732a
OpenCV: 4.3.0
MMCV: 1.0.3
OpenSelfSup: 0.2.0+dbfc6b1
------------------------------------------------------------

2020-09-01 15:35:03,053 - openselfsup - INFO - Distributed training: True
2020-09-01 15:35:03,053 - openselfsup - INFO - Config:
/disk1/lhy/Documents/github/OpenSelfSup/configs/base.py
train_cfg = {}
test_cfg = {}
optimizer_config = dict()  # grad_clip, coalesce, bucket_size_mb
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
dist_params = dict(backend='nccl')
cudnn_benchmark = True
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

/disk1/lhy/Documents/github/OpenSelfSup/configs/selfsup/byol/r50_cifar.py
import copy
_base_ = '../../base.py'
# Model settings
model = dict(
    type='BYOL',
    pretrained=None,
    base_momentum=0.996,
    backbone=dict(
        type='ResNet',
        depth=50,
        in_channels=3,
        out_indices=[4],  # 0: conv-1, x: stage-x
        norm_cfg=dict(type='BN')),
    neck=dict(
        type='NonLinearNeckV2',
        in_channels=2048,
        hid_channels=4096,
        out_channels=256,
        with_avg_pool=True),
    head=dict(type='LatentPredictHead',
              size_average=True,
              predictor=dict(type='NonLinearNeckV2',
                             in_channels=256, hid_channels=4096,
                             out_channels=256, with_avg_pool=False)))
# Dataset settings
data_source_cfg = dict(type='Cifar10', root='data')
# data_source_cfg = dict(
#     type='ImageNet',
#     memcached=True,
#     mclient_path='/mnt/lustre/share/memcached_client')
# data_train_list = 'data/imagenet/meta/train.txt'
# data_train_root = 'data/imagenet/train'
dataset_type = 'BYOLDataset'
img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_pipeline = [
    dict(type='RandomResizedCrop', size=224, interpolation=3), # bicubic
    dict(type='RandomHorizontalFlip'),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.4,
                contrast=0.4,
                saturation=0.2,
                hue=0.1)
        ],
        p=0.8),
    dict(type='RandomGrayscale', p=0.2),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='GaussianBlur',
                sigma_min=0.1,
                sigma_max=2.0,
                kernel_size=23)
        ],
        p=1.),
    dict(type='RandomAppliedTrans',
         transforms=[dict(type='Solarization')], p=0.),
    dict(type='ToTensor'),
    dict(type='Normalize', **img_norm_cfg),
]
train_pipeline1 = copy.deepcopy(train_pipeline)
train_pipeline2 = copy.deepcopy(train_pipeline)
train_pipeline2[4]['p'] = 0.1 # gaussian blur
train_pipeline2[5]['p'] = 0.2 # solarization

data = dict(
    imgs_per_gpu=32,  # total 32*8=256
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_source=dict(
            # list_file=data_train_list, root=data_train_root,
            **data_source_cfg),
        pipeline1=train_pipeline1,
        pipeline2=train_pipeline2))
# Additional hooks
custom_hooks = [
    dict(type='BYOLHook', end_momentum=1.)
]
# Optimizer
optimizer = dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0005)
# optimizer = dict(type='LARS', lr=0.2, weight_decay=0.0000015, momentum=0.9,
#                  paramwise_options={
#                     '(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay=0., lars_exclude=True),
#                     'bias': dict(weight_decay=0., lars_exclude=True)})
# Learning policy
lr_config = dict(
    policy='CosineAnnealing',
    min_lr=0.,
    warmup='linear',
    warmup_iters=2,
    warmup_ratio=0.0001, # cannot be 0
    warmup_by_epoch=True)
checkpoint_config = dict(interval=10)
# Runtime settings
total_epochs = 200

2020-09-01 15:35:03,053 - openselfsup - INFO - Set random seed to 0, deterministic: False
Traceback (most recent call last):
  File "tools/train.py", line 142, in <module>
    main()
  File "tools/train.py", line 124, in main
    datasets = [build_dataset(cfg.data.train)]
  File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/datasets/builder.py", line 37, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/utils/registry.py", line 79, in build_from_cfg
    return obj_cls(**args)
  File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/datasets/byol.py", line 18, in __init__
    self.data_source = build_datasource(data_source)
  File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/datasets/builder.py", line 43, in build_datasource
    return build_from_cfg(cfg, DATASOURCES)
  File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/utils/registry.py", line 79, in build_from_cfg
    return obj_cls(**args)
TypeError: __init__() missing 1 required positional argument: 'split'
Traceback (most recent call last):
  File "tools/train.py", line 142, in <module>
    main()
  File "tools/train.py", line 124, in main
    datasets = [build_dataset(cfg.data.train)]
  File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/datasets/builder.py", line 37, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/utils/registry.py", line 79, in build_from_cfg
    return obj_cls(**args)
  File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/datasets/byol.py", line 18, in __init__
    self.data_source = build_datasource(data_source)
  File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/datasets/builder.py", line 43, in build_datasource
    return build_from_cfg(cfg, DATASOURCES)
  File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/utils/registry.py", line 79, in build_from_cfg
    return obj_cls(**args)
TypeError: __init__() missing 1 required positional argument: 'split'
Traceback (most recent call last):
  File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/disk1/lhy/Applications/anaconda3/envs/open-mmlab/bin/python', '-u', 'tools/train.py', '--local_rank=1', 'configs/selfsup/byol/r50_cifar.py', '--work_dir', 'work_dirs/selfsup/byol/r50_cifar/', '--seed', '0', '--launcher', 'pytorch']' returned non-zero exit status 1.

Do I need to modify your code under /openselfsup?

XiaohangZhan commented on August 16, 2024

The bug is obvious: the log says __init__() is missing the argument "split". The "data_source" dict under data.train in the config must also pass a "split" argument; you can refer to configs/classification/cifar/r50.py to confirm this. I'm willing to help, but I suggest reading the log carefully and trying to find the bug yourself before raising an issue, so that we can save time for both of us :)
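
To see why the traceback ends in the data source constructor, here is a simplified sketch of how a registry-style builder turns a config dict into an object, based only on what the traceback shows (build_from_cfg finishes with obj_cls(**args)). REGISTRY, register, build_from_cfg and the Cifar10 class below are hypothetical stand-ins, not the real openselfsup code, and split='train' is an assumed value.

```python
# Simplified stand-in for a registry-based builder: the config's 'type' key
# selects a class and the remaining keys become constructor kwargs.
# All names here are illustrative, not the actual openselfsup implementation.

REGISTRY = {}


def register(cls):
    """Record a class under its own name so configs can refer to it by string."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
class Cifar10:
    """Hypothetical data source whose constructor requires 'split'."""

    def __init__(self, root, split):
        self.root = root
        self.split = split


def build_from_cfg(cfg):
    """Copy the cfg, pop 'type', look up the class, call it with the rest."""
    args = dict(cfg)
    obj_cls = REGISTRY[args.pop('type')]
    return obj_cls(**args)


data_source_cfg = dict(type='Cifar10', root='data')

# Without 'split', the constructor call fails just like in the posted log.
missing_split_error = None
try:
    build_from_cfg(dict(**data_source_cfg))
except TypeError as err:
    missing_split_error = str(err)
    print('error:', missing_split_error)

# Passing split (assumed value 'train') lets the data source build.
source = build_from_cfg(dict(split='train', **data_source_cfg))
print(source.root, source.split)
```

The exact wording of the TypeError prefix varies across Python versions, but the "missing 1 required positional argument: 'split'" part matches the log.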

etbox commented on August 16, 2024

Please forgive my carelessness. You are right! After fixing this bug, I ran into another one:

Original Traceback (most recent call last):
  File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/datasets/byol.py", line 29, in __getitem__
    img1 = self.pipeline1(img)
  File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 61, in __call__
    img = t(img)
  File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 680, in __call__
    i, j, h, w = self.get_params(img, self.scale, self.ratio)
  File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 641, in get_params
    width, height = _get_image_size(img)
  File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 40, in _get_image_size
    raise TypeError("Unexpected type {}".format(type(img)))
TypeError: Unexpected type <class 'tuple'>
raise self.exc_type(msg)

I then found that the img variable was not the image itself but a tuple containing it: (<PIL.Image.Image image mode=RGB size=32x32 at 0x7F1197BDA690>, 1). So I changed your code to extract the image from the tuple, and it works now.
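
The workaround described above can be sketched roughly as follows: the dataset unpacks the (image, label) tuple before handing the image to the two augmentation pipelines. This is a hypothetical illustration, not the poster's actual patch or the openselfsup BYOLDataset; TupleSource and TwoViewDataset are made-up names, and strings stand in for PIL images so the sketch runs without torchvision. The maintainer's reply recommends updating to the latest code rather than patching an old version.

```python
# Hypothetical sketch: unpack an (image, label) sample before augmentation.
# TupleSource and TwoViewDataset are illustrative stand-ins; strings play
# the role of PIL images so the example needs no extra dependencies.


class TupleSource:
    """Stand-in data source whose get_sample returns (image, label)."""

    def __init__(self, samples):
        self.samples = samples

    def get_sample(self, idx):
        return self.samples[idx]


class TwoViewDataset:
    """Minimal BYOL-style dataset producing two augmented views per sample."""

    def __init__(self, data_source, pipeline1, pipeline2):
        self.data_source = data_source
        self.pipeline1 = pipeline1
        self.pipeline2 = pipeline2

    def __getitem__(self, idx):
        sample = self.data_source.get_sample(idx)
        # Keep only the image if the source also returned a label; passing
        # the tuple to torchvision transforms raises
        # "TypeError: Unexpected type <class 'tuple'>".
        img = sample[0] if isinstance(sample, tuple) else sample
        return dict(img1=self.pipeline1(img), img2=self.pipeline2(img))


source = TupleSource([('image-0', 1), ('image-1', 7)])
dataset = TwoViewDataset(source, str.upper, lambda s: s + '!')
views = dataset[0]
print(views)  # {'img1': 'IMAGE-0', 'img2': 'image-0!'}
```

The isinstance check keeps the dataset working with sources that return a bare image as well as ones that return (image, label).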

Thank you for your help; your advice helped me a lot!

XiaohangZhan commented on August 16, 2024

I notice that your code is still an old version. Please update to the latest code; otherwise there may be bugs and the results may not be reproducible.

etbox commented on August 16, 2024

Roger that!