tianchiguangdong2019_2th's Issues

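About the test/validation step after each training epoch

Hello, I followed your code and got it running, but I found that there is no test/validation step after each training epoch (did I run something wrong?). If one is needed, is it implemented in eval.py?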

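About the validation set data

Hello! I have recently been trying to reproduce the author's experiments and compare how different ways of using the templates differ. Could you provide temp_cls.pkl or the data corresponding to val_0.json? I need to evaluate the different models on the same validation set.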

question

[two images attached to the issue]

Why does this error occur?

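Question about adding global features

How exactly is the addition of global features implemented? I looked in the detector code and could not find it. Could you point out which parts of the code implement it? Thanks.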

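About testing after each training epoch

Hello, I used your code to train on my own custom dataset. Why is there no test stage after each epoch, and no results such as bboxes? I would like to see the AP after each epoch. Was this simply not set up when you used the code, or did I make a mistake while debugging, so that only training works and the test stage never runs? Is it handled by test.py or eval.py? I would appreciate your help. Thanks.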

Error when running tools/train.py

/home/dl/anaconda3/envs/open-mmlab/bin/python /opt/pycharm-2022.1.3/plugins/python/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client 127.0.0.1 --port 44603 --file /home/dl/Desktop/fabric_detect/TianchiGuangdong2019_2th-master/TianchiGuangdong2019_2th-master/src/tools/train.py /home/dl/Desktop/fabric_detect/TianchiGuangdong2019_2th-master/TianchiGuangdong2019_2th-master/config/cascade_r101.py --launcher none
Connected to pydev debugger (build 221.5921.27)
2022-08-09 10:40:28,557 - INFO - Distributed training: False
loading annotations into memory...
python-BaseException
Traceback (most recent call last):
File "/opt/pycharm-2022.1.3/plugins/python/helpers/pydev/pydevd.py", line 1491, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/opt/pycharm-2022.1.3/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/dl/Desktop/fabric_detect/TianchiGuangdong2019_2th-master/TianchiGuangdong2019_2th-master/src/tools/train.py", line 111, in <module>
main()
File "/home/dl/Desktop/fabric_detect/TianchiGuangdong2019_2th-master/TianchiGuangdong2019_2th-master/src/tools/train.py", line 80, in main
datasets = [build_dataset(cfg.data.train)]
File "/home/dl/Desktop/fabric_detect/TianchiGuangdong2019_2th-master/TianchiGuangdong2019_2th-master/src/mmdet/datasets/builder.py", line 39, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/home/dl/Desktop/fabric_detect/TianchiGuangdong2019_2th-master/TianchiGuangdong2019_2th-master/src/mmdet/utils/registry.py", line 76, in build_from_cfg
return obj_cls(**args)
File "/home/dl/Desktop/fabric_detect/TianchiGuangdong2019_2th-master/TianchiGuangdong2019_2th-master/src/mmdet/datasets/coco_r2.py", line 49, in __init__
test_mode)
File "/home/dl/Desktop/fabric_detect/TianchiGuangdong2019_2th-master/TianchiGuangdong2019_2th-master/src/mmdet/datasets/custom.py", line 65, in __init__
self.i

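Context ROI error help

Hello, I want to try out the context ROI from this repository, so I pasted SingleRoIExtractor directly into my own file. However, the addition on line 116, roi_feats_t[rois_[:, 0] == j] += context[i][j], appears to be an in-place operation, and the code only runs after I comment it out. Even after changing it to roi_feats_t[rois_[:, 0] == j] = roi_feats_t[rois_[:, 0] == j] + context[i][j], the same error is raised.
The error is as follows: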
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [38, 256, 7, 7]], which is output 0 of IndexPutBackward, is at version 5; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)
----------------
Did you do anything special to feat before it enters SingleRoIExtractor?
Thanks for your help.
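
The rewritten form `a[mask] = a[mask] + c` still writes into `a` through an indexed assignment, so autograd treats it as an in-place operation as well. Below is a minimal sketch of an out-of-place alternative. It assumes the per-image context can be stacked into one tensor and that each RoI carries the index of its image. The function names, tensor shapes, and the `exp()` stand-in are illustrative assumptions, not code from this repository, and whether this matches the cause of the error above cannot be confirmed from the report.

```python
import torch


def add_context_out_of_place(roi_feats, batch_inds, context):
    """Return roi_feats plus the context of each RoI's image, without in-place writes.

    roi_feats:  [num_rois, C, H, W]
    batch_inds: [num_rois], image index of each RoI (hypothetical layout)
    context:    [num_imgs, C, 1, 1] or [num_imgs, C, H, W]
    """
    # Indexing gathers one context entry per RoI; "+" allocates a new tensor.
    return roi_feats + context[batch_inds.long()]


def inplace_add_fails():
    """Reproduce the autograd error with a masked in-place add (toy example)."""
    x = torch.randn(4, 3, 2, 2, requires_grad=True)
    feats = x.exp()  # exp() keeps its output for the backward pass
    batch_inds = torch.tensor([0, 0, 1, 1])
    feats[batch_inds == 0] += 1.0  # indexed assignment modifies feats in place
    try:
        feats.sum().backward()
    except RuntimeError:
        return True
    return False


# Toy usage: the out-of-place version backpropagates without complaint.
x = torch.randn(4, 3, 2, 2, requires_grad=True)
feats = x.exp()
batch_inds = torch.tensor([0, 0, 1, 1])
context = torch.randn(2, 3, 1, 1)
out = add_context_out_of_place(feats, batch_inds, context)
out.sum().backward()  # gradients flow because feats was never overwritten
```

The toy example uses `exp()` only because it is an operation whose output is needed again during the backward pass; any such operation upstream of the RoI features would be affected in the same way.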

global context

Is the global context implementation missing from this code? Thanks.

No such file or directory: './source/cls2ind.pkl'

Could the author explain how to obtain this file?
The code is in tools/PrepareData.py:
def generate_coco(annos, out_file):
    cls2ind = mmcv.load("./source/cls2ind.pkl")
    ind2cls = mmcv.load("./source/ind2cls.pkl")
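
How the author produced these two files is not stated in this thread. Since `mmcv.load` reads `.pkl` files through pickle, one possible way to regenerate them is to build a class-name-to-index mapping from the annotation list, as in the sketch below. The annotation key `defect_name`, the alphabetical ordering, and the starting index of 1 are all assumptions; if the author used a different ordering, the indices will not match any released checkpoints.

```python
import pickle
from pathlib import Path


def build_class_maps(annos, key="defect_name", start=1):
    """Build name->index and index->name dicts from a list of annotation dicts.

    `key` and `start` are hypothetical; adjust them to the real annotation format.
    """
    names = sorted({anno[key] for anno in annos})
    cls2ind = {name: idx for idx, name in enumerate(names, start=start)}
    ind2cls = {idx: name for name, idx in cls2ind.items()}
    return cls2ind, ind2cls


def save_class_maps(cls2ind, ind2cls, out_dir="./source"):
    """Write both mappings as pickle files under out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "cls2ind.pkl", "wb") as f:
        pickle.dump(cls2ind, f)
    with open(out / "ind2cls.pkl", "wb") as f:
        pickle.dump(ind2cls, f)


# Example with made-up annotations:
# annos = [{"defect_name": "b"}, {"defect_name": "a"}]
# save_class_maps(*build_class_maps(annos))
```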

train error

I am new to this. When I run train_model1.sh, I get an error. Can you give me some tips? Thanks!

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:56: void ClassNLLCriterion_updateOutput_no_reduce_kernel(int, THCDeviceTensor<Dtype, 2, int, DefaultPtrTraits>, THCDeviceTensor<long, 1, int, DefaultPtrTraits>, THCDeviceTensor<Dtype, 1, int, DefaultPtrTraits>, Dtype *, int, int) [with Dtype = float]: block: [0,0,0], thread: [32,0,0] Assertion cur_target >= 0 && cur_target < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:56: void ClassNLLCriterion_updateOutput_no_reduce_kernel(int, THCDeviceTensor<Dtype, 2, int, DefaultPtrTraits>, THCDeviceTensor<long, 1, int, DefaultPtrTraits>, THCDeviceTensor<Dtype, 1, int, DefaultPtrTraits>, Dtype *, int, int) [with Dtype = float]: block: [0,0,0], thread: [33,0,0] Assertion cur_target >= 0 && cur_target < n_classes failed.
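
The assertion `cur_target >= 0 && cur_target < n_classes` fires when a class label passed to the classification loss falls outside the range the model was configured for. A minimal sketch for checking labels before training follows. It assumes the labels have already been collected into a plain list and that `num_classes` is the value the classification head is built with; both names and the sample values are hypothetical, not taken from this repository.

```python
def find_out_of_range_labels(labels, num_classes):
    """Return (position, label) pairs whose label is outside [0, num_classes)."""
    return [(i, y) for i, y in enumerate(labels) if not 0 <= y < num_classes]


# Made-up labels: the last one equals num_classes and would trip the assertion.
labels = [1, 5, 20, 0, 21]
bad = find_out_of_range_labels(labels, num_classes=21)
print(bad)
```

If this reports any entries, the class count in the config and the label ids in the annotation file probably disagree. Whether that is the cause here cannot be told from the log alone.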

Error when running train_model2.sh

When I run train_model2.sh, I get an error. Could you give me some tips? Thanks!
GPU: Titan V; memory: 32G
Run: ./train_model2.sh 1

Error:
Traceback (most recent call last):
File "./tools/train.py", line 108, in <module>
main()
File "./tools/train.py", line 104, in main
logger=logger)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/apis/train.py", line 58, in train_detector
_dist_train(model, dataset, cfg, validate=validate)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/apis/train.py", line 186, in _dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/nanhui2/anaconda3/envs/open-mmlab2/lib/python3.6/site-packages/mmcv/runner/runner.py", line 384, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/nanhui2/anaconda3/envs/open-mmlab2/lib/python3.6/site-packages/mmcv/runner/runner.py", line 283, in train
self.model, data_batch, train_mode=True, **kwargs)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/apis/train.py", line 38, in batch_processor
losses = model(**data)
File "/home/nanhui2/anaconda3/envs/open-mmlab2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/nanhui2/anaconda3/envs/open-mmlab2/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 376, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/nanhui2/anaconda3/envs/open-mmlab2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/core/fp16/decorators.py", line 75, in new_func
output = old_func(*new_args, **new_kwargs)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/models/detectors/base.py", line 86, in forward
return self.forward_train(img, img_meta, **kwargs)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/models/detectors/cascade_rcnn_pair.py", line 556, in forward_train
proposals=proposals)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/models/detectors/cascade_rcnn_pair.py", line 268, in forward_train_single
*rpn_loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/models/anchor_heads/rpn_head.py", line 51, in loss
gt_bboxes_ignore=gt_bboxes_ignore)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/core/fp16/decorators.py", line 152, in new_func
output = old_func(*new_args, **new_kwargs)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/models/anchor_heads/anchor_head.py", line 179, in loss
sampling=self.sampling)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/core/anchor/anchor_target.py", line 63, in anchor_target
unmap_outputs=unmap_outputs)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/core/utils/misc.py", line 24, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/core/anchor/anchor_target.py", line 116, in anchor_target_single
anchors, gt_bboxes, gt_bboxes_ignore, None, cfg)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/core/bbox/assign_sampling.py", line 30, in assign_and_sample
gt_labels)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/core/bbox/assigners/max_iou_assigner.py", line 93, in assign
overlaps = bbox_overlaps(gt_bboxes, bboxes)
File "/opt/data/TianchiGuangdong2019_2th/src/mmdet/core/bbox/geometry.py", line 51, in bbox_overlaps
wh = (rb - lt + 1).clamp(min=0) # [rows, cols, 2]
RuntimeError: [enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, nbytes) == 0. 12 vs 0

Traceback (most recent call last):
File "/home/nanhui2/anaconda3/envs/open-mmlab2/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/nanhui2/anaconda3/envs/open-mmlab2/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/nanhui2/anaconda3/envs/open-mmlab2/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
main()
File "/home/nanhui2/anaconda3/envs/open-mmlab2/lib/python3.6/site-packages/torch/distributed/launch.py", line 231, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['/home/nanhui2/anaconda3/envs/open-mmlab2/bin/python', '-u', './tools/train.py', '--local_rank=0', '../config/cascade_x101_20e.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
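
The first traceback ends in a host-side allocation failure: `posix_memalign` returned 12, which is ENOMEM on Linux, at the line in `bbox_overlaps` that builds an intermediate of shape `[rows, cols, 2]`. A rough sketch for estimating how large such an intermediate gets is shown below. The box and anchor counts are hypothetical, and float32 elements are assumed; the real numbers for this run are not given in the report.

```python
import errno
import os


def overlaps_intermediate_bytes(num_gts, num_anchors, bytes_per_elem=4):
    """Bytes needed for one intermediate of shape [num_gts, num_anchors, 2]."""
    return num_gts * num_anchors * 2 * bytes_per_elem


# Hypothetical sizes: many ground-truth boxes against a large anchor set.
num_gts, num_anchors = 200, 250_000
size = overlaps_intermediate_bytes(num_gts, num_anchors)
print(f"{size / 2**30:.2f} GiB per intermediate")

# Look up what error number 12 means on the current platform.
print(errno.errorcode.get(12), os.strerror(12))
```

Several intermediates of this shape can be alive at once during the overlap computation, so images with many ground-truth boxes can exhaust host RAM even when GPU memory is plentiful.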

When I train with your config, I encounter a problem

When I train with your config, I encounter the following problem:
Traceback (most recent call last):
File "src/tools/train.py", line 108, in <module>
main()
File "src/tools/train.py", line 86, in main
datasets = [build_dataset(cfg.data.train)]
File "/cache/user-job-dir/codes/mmdetection/src/mmdet/datasets/builder.py", line 39, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/cache/user-job-dir/codes/mmdetection/src/mmdet/utils/registry.py", line 76, in build_from_cfg
return obj_cls(**args)
TypeError: __init__() got an unexpected keyword argument 'normal'
How can I solve this problem? Thanks.
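
The traceback shows the dataset being built with `obj_cls(**args)`, so every key in the dataset config is forwarded to the class constructor as a keyword argument, and a key the constructor does not accept raises this TypeError. The sketch below lists such keys before the object is built. `ToyDataset` and the config values are hypothetical stand-ins, not classes or settings from this repository, and the sketch ignores the `type` key that the real builder consumes to select the class.

```python
import inspect


def unexpected_keys(obj_cls, args):
    """Return the keys in args that obj_cls.__init__ does not accept."""
    params = inspect.signature(obj_cls.__init__).parameters
    # A **kwargs parameter accepts any keyword, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [key for key in args if key not in params]


class ToyDataset:
    """Hypothetical dataset class whose constructor has no 'normal' argument."""

    def __init__(self, ann_file, img_prefix, test_mode=False):
        self.ann_file = ann_file
        self.img_prefix = img_prefix
        self.test_mode = test_mode


cfg = dict(ann_file="train.json", img_prefix="images/", normal="normal/")
print(unexpected_keys(ToyDataset, cfg))
```

A mismatch like this can appear when the config comes from one code base and the dataset class from another, for example a stock mmdetection checkout instead of the modified `mmdet` shipped in `src/`. That is only one possible explanation.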
