Comments (6)
You only need to change data_source_cfg in the config; leave everything else as it is. If your batch size is small, you may use SGD, and you may also adjust hyperparameters such as the learning rate (lr).
from mmselfsup.
Thanks for your reply. Following your instructions, I reset my config and changed only data_source_cfg, but it still has no effect.
The log is shown below:
(open-mmlab) lhy@mustdl2:/disk1/lhy/Documents/github/OpenSelfSup$ bash tools/dist_train.sh configs/selfsup/byol/r50_cifar.py 2
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2020-09-01 15:35:03,052 - openselfsup - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda-8.0
NVCC: Cuda compilation tools, release 8.0, V8.0.61
GPU 0,1: GeForce GTX 1080 Ti
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
PyTorch: 1.5.1
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
TorchVision: 0.6.0a0+35d732a
OpenCV: 4.3.0
MMCV: 1.0.3
OpenSelfSup: 0.2.0+dbfc6b1
------------------------------------------------------------
2020-09-01 15:35:03,053 - openselfsup - INFO - Distributed training: True
2020-09-01 15:35:03,053 - openselfsup - INFO - Config:
/disk1/lhy/Documents/github/OpenSelfSup/configs/base.py
train_cfg = {}
test_cfg = {}
optimizer_config = dict()  # grad_clip, coalesce, bucket_size_mb
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
dist_params = dict(backend='nccl')
cudnn_benchmark = True
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
/disk1/lhy/Documents/github/OpenSelfSup/configs/selfsup/byol/r50_cifar.py
import copy
_base_ = '../../base.py'
# Model settings
model = dict(
    type='BYOL',
    pretrained=None,
    base_momentum=0.996,
    backbone=dict(
        type='ResNet',
        depth=50,
        in_channels=3,
        out_indices=[4],  # 0: conv-1, x: stage-x
        norm_cfg=dict(type='BN')),
    neck=dict(
        type='NonLinearNeckV2',
        in_channels=2048,
        hid_channels=4096,
        out_channels=256,
        with_avg_pool=True),
    head=dict(type='LatentPredictHead',
              size_average=True,
              predictor=dict(type='NonLinearNeckV2',
                             in_channels=256, hid_channels=4096,
                             out_channels=256, with_avg_pool=False)))
# Dataset settings
data_source_cfg = dict(type='Cifar10', root='data')
# data_source_cfg = dict(
#     type='ImageNet',
#     memcached=True,
#     mclient_path='/mnt/lustre/share/memcached_client')
# data_train_list = 'data/imagenet/meta/train.txt'
# data_train_root = 'data/imagenet/train'
dataset_type = 'BYOLDataset'
img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_pipeline = [
    dict(type='RandomResizedCrop', size=224, interpolation=3),  # bicubic
    dict(type='RandomHorizontalFlip'),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.4,
                contrast=0.4,
                saturation=0.2,
                hue=0.1)
        ],
        p=0.8),
    dict(type='RandomGrayscale', p=0.2),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='GaussianBlur',
                sigma_min=0.1,
                sigma_max=2.0,
                kernel_size=23)
        ],
        p=1.),
    dict(type='RandomAppliedTrans',
         transforms=[dict(type='Solarization')], p=0.),
    dict(type='ToTensor'),
    dict(type='Normalize', **img_norm_cfg),
]
train_pipeline1 = copy.deepcopy(train_pipeline)
train_pipeline2 = copy.deepcopy(train_pipeline)
train_pipeline2[4]['p'] = 0.1  # gaussian blur
train_pipeline2[5]['p'] = 0.2  # solarization
data = dict(
    imgs_per_gpu=32,  # total 32*8=256
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_source=dict(
            # list_file=data_train_list, root=data_train_root,
            **data_source_cfg),
        pipeline1=train_pipeline1,
        pipeline2=train_pipeline2))
# Additional hooks
custom_hooks = [
    dict(type='BYOLHook', end_momentum=1.)
]
# Optimizer
optimizer = dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0005)
# optimizer = dict(type='LARS', lr=0.2, weight_decay=0.0000015, momentum=0.9,
#                  paramwise_options={
#                      '(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay=0., lars_exclude=True),
#                      'bias': dict(weight_decay=0., lars_exclude=True)})
# Learning policy
lr_config = dict(
    policy='CosineAnnealing',
    min_lr=0.,
    warmup='linear',
    warmup_iters=2,
    warmup_ratio=0.0001,  # cannot be 0
    warmup_by_epoch=True)
checkpoint_config = dict(interval=10)
# Runtime settings
total_epochs = 200
2020-09-01 15:35:03,053 - openselfsup - INFO - Set random seed to 0, deterministic: False
Traceback (most recent call last):
File "tools/train.py", line 142, in <module>
main()
File "tools/train.py", line 124, in main
datasets = [build_dataset(cfg.data.train)]
File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/datasets/builder.py", line 37, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/utils/registry.py", line 79, in build_from_cfg
return obj_cls(**args)
File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/datasets/byol.py", line 18, in __init__
self.data_source = build_datasource(data_source)
File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/datasets/builder.py", line 43, in build_datasource
return build_from_cfg(cfg, DATASOURCES)
File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/utils/registry.py", line 79, in build_from_cfg
return obj_cls(**args)
TypeError: __init__() missing 1 required positional argument: 'split'
Traceback (most recent call last):
File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/disk1/lhy/Applications/anaconda3/envs/open-mmlab/bin/python', '-u', 'tools/train.py', '--local_rank=1', 'configs/selfsup/byol/r50_cifar.py', '--work_dir', 'work_dirs/selfsup/byol/r50_cifar/', '--seed', '0', '--launcher', 'pytorch']' returned non-zero exit status 1.
Should I change your code in /openselfsup?
The cause is clear from the log: __init__() is missing the required argument 'split'. The data_source entry under data.train in the config must be given a split argument; see configs/classification/cifar/r50.py for an example. I'm willing to help, but I suggest reading the log carefully and trying to find the bug yourself before raising an issue, so that we save time for both of us :)
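To make the failure mode concrete, here is a minimal sketch of how a registry-style builder turns a config dict into keyword arguments, and why a missing key surfaces as this TypeError. The class Cifar10Source, the dict-based DATASOURCES registry, and this build_from_cfg are simplified, hypothetical stand-ins written for illustration only; they are not the actual OpenSelfSup implementations, and the value 'train' for split is an assumption.

```python
# Hypothetical, simplified stand-ins for the data source and the registry builder.
class Cifar10Source:
    def __init__(self, root, split):
        # 'split' has no default, so it must be supplied by the config.
        self.root = root
        self.split = split


DATASOURCES = {'Cifar10': Cifar10Source}


def build_from_cfg(cfg, registry):
    args = dict(cfg)                      # copy so the caller's config is untouched
    obj_cls = registry[args.pop('type')]  # look up the class by its 'type' name
    return obj_cls(**args)                # remaining keys become keyword arguments


data_source_cfg = dict(type='Cifar10', root='data')

# Without 'split', construction fails in the same way as in the log above.
try:
    build_from_cfg(data_source_cfg, DATASOURCES)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Supplying split next to the shared settings resolves the error
# ('train' is an assumed value for this sketch).
source = build_from_cfg(dict(split='train', **data_source_cfg), DATASOURCES)
```

In the real config, the analogous change would be to add a split entry to the data_source dict under data.train, next to **data_source_cfg.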
Please forgive my carelessness. You are right! After fixing this bug, I ran into another one:
Original Traceback (most recent call last):
File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/disk1/lhy/Documents/github/OpenSelfSup/openselfsup/datasets/byol.py", line 29, in __getitem__
img1 = self.pipeline1(img)
File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 61, in __call__
img = t(img)
File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 680, in __call__
i, j, h, w = self.get_params(img, self.scale, self.ratio)
File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 641, in get_params
width, height = _get_image_size(img)
File "/disk1/lhy/Applications/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 40, in _get_image_size
raise TypeError("Unexpected type {}".format(type(img)))
TypeError: Unexpected type <class 'tuple'>
raise self.exc_type(msg)
Then I found that the img variable holds a tuple, (<PIL.Image.Image image mode=RGB size=32x32 at 0x7F1197BDA690>, 1), whose first element is the original image, rather than the image itself. So I changed your code to extract the image from the tuple, and it works now.
Thank you for your help, and your instruction did inspire me a lot!
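The local workaround described above might look roughly like the following sketch. It assumes the data source returns either a bare image or an (image, label)-style tuple; unwrap_image and TwoViewDataset are hypothetical names used for illustration and do not reproduce the actual code in openselfsup/datasets/byol.py.

```python
def unwrap_image(sample):
    """Return the image from a sample that may be an (image, label)-style tuple."""
    if isinstance(sample, tuple):
        return sample[0]
    return sample


class TwoViewDataset:
    """Hypothetical two-view dataset: applies two pipelines to the same image."""

    def __init__(self, samples, pipeline1, pipeline2):
        self.samples = samples
        self.pipeline1 = pipeline1
        self.pipeline2 = pipeline2

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Drop the label (if any) so the transforms receive only the image.
        img = unwrap_image(self.samples[idx])
        return self.pipeline1(img), self.pipeline2(img)
```

With this guard in place, the transforms never receive a tuple, which is the condition the "Unexpected type <class 'tuple'>" error above reports.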
I notice that you are still using an old version of the code. Please update to the latest version; otherwise you may hit bugs and the results may not be reproducible.
Roger that!