Comments (3)
I have a similar problem. I just want to test the whole thing on my GTX 970 with 4 GB of memory.
I get:
Traceback (most recent call last):
  File "fsod_train_net.py", line 118, in <module>
    args=(args,),
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/engine/launch.py", line 62, in launch
    main_func(*args)
  File "fsod_train_net.py", line 106, in main
    return trainer.train()
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 431, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 138, in train
    self.run_step()
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 441, in run_step
    self._trainer.run_step()
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 232, in run_step
    loss_dict = self.model(data)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selim/FewShot/FewX/fewx/modeling/fsod/fsod_rcnn.py", line 153, in forward
    support_features = self.backbone(support_images)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/modeling/backbone/resnet.py", line 444, in forward
    x = self.stem(x)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/modeling/backbone/resnet.py", line 355, in forward
    x = self.conv1(x)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/layers/wrappers.py", line 88, in forward
    x = self.norm(x)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/detectron2/layers/batch_norm.py", line 65, in forward
    eps=self.eps,
  File "/home/selim/anaconda3/envs/FewX/lib/python3.7/site-packages/torch/nn/functional.py", line 2058, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 1000.00 MiB (GPU 0; 3.94 GiB total capacity; 2.15 GiB already allocated; 340.25 MiB free; 2.79 GiB reserved in total by PyTorch)
I tried halving the BATCH_SIZE_PER_IMAGE and IMS_PER_BATCH settings in the config, but I still run out of memory. I don't want to make them too small because I think that would lead to bad results. I'm not an expert, though.
Did anyone find a solution?
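For anyone reading the error line above, here is a minimal sketch that pulls the numbers out of the message and works out how far the failed allocation overshot the free memory. It only parses the text quoted in this comment; the helper names (`parse_oom`, `_to_mib`) are made up for illustration and are not part of PyTorch, detectron2, or FewX.

```python
import re

# The error line exactly as quoted in the traceback above.
OOM_LINE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 1000.00 MiB "
    "(GPU 0; 3.94 GiB total capacity; 2.15 GiB already allocated; "
    "340.25 MiB free; 2.79 GiB reserved in total by PyTorch)"
)

# Conversion factors to MiB for the units that appear in the message.
_UNIT_TO_MIB = {"KiB": 1.0 / 1024.0, "MiB": 1.0, "GiB": 1024.0}


def _to_mib(value, unit):
    """Convert a (number, unit) pair from the message into MiB."""
    return float(value) * _UNIT_TO_MIB[unit]


def parse_oom(message):
    """Hypothetical helper: extract the memory figures (in MiB) from the message."""
    size = r"([\d.]+) (KiB|MiB|GiB)"
    patterns = {
        "requested": r"Tried to allocate " + size,
        "total": size + r" total capacity",
        "allocated": size + r" already allocated",
        "free": size + r" free",
        "reserved": size + r" reserved",
    }
    info = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError("field not found in message: " + name)
        info[name] = _to_mib(match.group(1), match.group(2))
    # How much more memory the failed allocation needed than was free.
    info["shortfall"] = info["requested"] - info["free"]
    return info


info = parse_oom(OOM_LINE)
print("shortfall: %.2f MiB" % info["shortfall"])  # shortfall: 659.75 MiB
```

In this case a single 1000 MiB request met only 340.25 MiB of free memory, so it missed by roughly 660 MiB on a card reporting 3.94 GiB in total.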
OK, so I kept trying to get it to work.
It worked once I set SOLVER.IMS_PER_BATCH to 1 in configs/fsod/Base-FSOD-C4.yaml.
I did not run a complete training process, since it would have taken 2 days and 11 hours, but training started without issues.
Hope this helps someone else too.
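On the worry that a very small batch hurts results: one common heuristic, not something confirmed in this thread or by the FewX authors, is the linear scaling rule, where the learning rate shrinks with the batch size and the iteration count grows by the same factor. Below is a rough sketch of that arithmetic. `scale_solver` is a hypothetical helper, and the reference values passed in are placeholders, not the actual values in Base-FSOD-C4.yaml; check your own config before using any of this.

```python
def scale_solver(base_lr, max_iter, ref_batch, new_batch):
    """Hypothetical helper applying the linear scaling rule.

    base_lr, max_iter and ref_batch describe the reference schedule;
    new_batch is the batch size that fits in memory.
    Returns suggested values keyed by detectron2-style config names.
    """
    if ref_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    factor = new_batch / ref_batch
    return {
        "SOLVER.IMS_PER_BATCH": new_batch,
        # Learning rate scales in proportion to the batch size.
        "SOLVER.BASE_LR": base_lr * factor,
        # Iterations scale inversely, so the number of images seen stays the same.
        "SOLVER.MAX_ITER": int(round(max_iter / factor)),
    }


# Placeholder reference schedule (assumed values, NOT read from the FewX config).
suggested = scale_solver(base_lr=0.02, max_iter=90000, ref_batch=16, new_batch=1)
for key, value in suggested.items():
    print(key, value)
```

Note that this keeps the number of images seen constant, so it makes the run longer rather than shorter; it is a starting point for tuning, not a guarantee of matching the original results.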
It depends on your support set. You could also try a smaller value for RPN.POST_NMS_TOPK_TEST (MODEL.RPN.POST_NMS_TOPK_TEST in the detectron2 config).