Comments (15)
It may be a PyTorch 1.5 version problem; 1.4 works. Closed!
Closing as the original issue seems to be resolved.
from yolov5.
Hello @lhwcv, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Google Colab Notebook, Docker Image, and GCP Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.
If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:
- Cloud-based AI surveillance systems operating on hundreds of HD video streams in realtime.
- Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
- Custom data training, hyperparameter evolution, and model exportation to any destination.
For more information please visit https://www.ultralytics.com.
@lhwcv I'm not able to reproduce your issue. I tried with our Docker container (PyTorch 1.5), and training runs correctly with your command on 4 GPUs:
Note: this may have been resolved by the fix applied for #15.
Not yet; the official PyTorch 1.5 release still has this issue:
/usr/local/lib/python3.6/dist-packages/torch/serialization.py:657: SourceChangeWarning: source code of class 'models.yolo.Model' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/distributed.py:303: UserWarning: Single-Process Multi-GPU is not the recommended mode for DDP. In this mode, each DDP instance operates on multiple devices and creates multiple module replicas within one process. The overhead of scatter/gather and GIL contention in every forward pass can slow down training. Please consider using one DDP instance per device or per module replica by explicitly setting device_ids or CUDA_VISIBLE_DEVICES. NB: There is a known issue in nn.parallel.replicate that prevents a single DDP instance to operate on multiple model replicas.
  "Single-Process Multi-GPU is not the recommended mode for "
Traceback (most recent call last):
  File "train.py", line 399, in <module>
    train(hyp)
  File "train.py", line 155, in train
    model = torch.nn.parallel.DistributedDataParallel(model)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/distributed.py", line 287, in __init__
    self._ddp_init_helper()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/distributed.py", line 380, in _ddp_init_helper
    expect_sparse_gradient)
RuntimeError: Model replicas must have an equal number of parameters.
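The warning above recommends one DDP instance per device, with `device_ids` set explicitly. The traceback shows `train.py` calling `torch.nn.parallel.DistributedDataParallel(model)` without `device_ids`, which in the PyTorch versions discussed here appears to default to all visible GPUs and so takes the single-process multi-GPU (replication) path that triggers the warning. Below is a minimal sketch of the recommended one-process-per-GPU setup. It assumes a launcher that starts one process per GPU and exports `LOCAL_RANK` (torchrun does this in newer PyTorch releases); the helper name `setup_ddp_model` and the single-process fallback values for `MASTER_ADDR`, `MASTER_PORT`, `RANK` and `WORLD_SIZE` are hypothetical. That this avoids the "Model replicas" error is an assumption based on the warning text, not a fix confirmed in this thread.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def setup_ddp_model(model: nn.Module) -> nn.Module:
    """Wrap `model` in one DDP instance bound to this process's own device.

    Hypothetical helper for illustration; not part of YOLOv5 or PyTorch.
    """
    # Assumption: the launcher exports LOCAL_RANK (torchrun does); default to 0
    # so the sketch can also run as a single process.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    if not dist.is_initialized():
        # Hypothetical single-process defaults; a real launcher sets these.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        os.environ.setdefault("RANK", "0")
        os.environ.setdefault("WORLD_SIZE", "1")
        backend = "nccl" if torch.cuda.is_available() else "gloo"
        # env:// reads MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE.
        dist.init_process_group(backend=backend, init_method="env://")
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)
        model = model.cuda(local_rank)
        # Exactly one device per DDP instance, so DDP does not replicate
        # the module across several GPUs inside this process.
        return DDP(model, device_ids=[local_rank], output_device=local_rank)
    # CPU fallback: device_ids must stay None for CPU modules.
    return DDP(model)
```

Each process would call `setup_ddp_model(model)` in place of the bare `DistributedDataParallel(model)` call shown in the traceback. Without a GPU the sketch falls back to the gloo backend, so it can be smoke-tested as a single CPU process.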
Same issue here with a custom dataset, using the pretrained yolov5x.pt weights:
RuntimeError: Model replicas must have an equal number of parameters.
I've reopened this, as the issue appears to still be present.
@mingmmq could you supply code to reproduce your issue? Is it reproducible on the coco128.yaml dataset?
I have the same problem with my custom dataset (24 classes).
I have the same problem with my custom dataset (11 classes).
Try downgrading PyTorch from 1.5 to 1.4. It works for me.
Run
pip install torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html
to fix "Model replicas must have an equal number of parameters."
Alternatively, see https://github.com/pytorch/pytorch/pull/36503. The bug is fixed in that pull request, but to get the fix you must build PyTorch 1.5 (cu102) from source yourself.
Downgrading torch from 1.5 to 1.4 works.
@panchengl does the recently released 1.5.1 fix this?
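Since the reports above hinge on the installed PyTorch version, a small check can flag the combination this thread calls out before training starts. The sketch below is a hypothetical helper (`torch_ddp_status`), not part of YOLOv5 or PyTorch; it only encodes what commenters reported: 1.4 works, 1.5.0 fails, and 1.5.1 is untested here. Source builds that include the fix from the pull request above may still report a 1.5.0 version string, so treat the result as a hint rather than a verdict.

```python
import re


def torch_ddp_status(version: str) -> str:
    """Classify a torch version string using only what this thread reports.

    Hypothetical helper for illustration; not part of YOLOv5 or PyTorch.
    """
    # Match the leading MAJOR.MINOR.PATCH, ignoring local suffixes like "+cu101".
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        return "unknown"
    major, minor, patch = (int(g) for g in match.groups())
    if (major, minor) == (1, 4):
        return "reported working"
    if (major, minor, patch) == (1, 5, 0):
        return "reported affected"
    # Includes 1.5.1, which nobody in this thread has confirmed either way.
    return "untested here"


if __name__ == "__main__":
    try:
        import torch

        print(torch.__version__, "->", torch_ddp_status(torch.__version__))
    except ImportError:
        print("torch is not installed")
```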
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.