Comments (5)
Name: nm-yolov5
Version: 1.5.0.60200
from sparseml.
I've learned that this happens with and without the --cache argument.
- If I use this training setup, it breaks at the QAT phase:
!sparseml.yolov5.train \
--weights "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn" \
--recipe "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn" \
--recipe-args '{"num_epochs":12, "quantization_epochs":10}' \
--data VisDrone.yaml \
--batch-size 64 \
--save-period 10 \
--cfg yolov5s.yaml \
--hyp hyps/hyp.finetune.yaml
Adjusted gradient clipping threshold to 10.0
Epoch GPU_mem box_loss obj_loss cls_loss Instances Size
1/11 13G 0.08168 0.02643 0.01815 437 640: 1
Class Images Instances P R mAP50 WARNING ⚠️ NMS time limit 6.900s exceeded
Class Images Instances P R mAP50 WARNING ⚠️ NMS time limit 6.900s exceeded
Class Images Instances P R mAP50 WARNING ⚠️ NMS time limit 6.900s exceeded
Class Images Instances P R mAP50 WARNING ⚠️ NMS time limit 6.900s exceeded
Class Images Instances P R mAP50 WARNING ⚠️ NMS time limit 2.300s exceeded
Class Images Instances P R mAP50
all 548 38759 0.0232 0.0303 0.0168 0.00501
Neural Magic: Starting QAT phase
Neural Magic: Turning off EMA (not supported with QAT)
Neural Magic: Turning off AMP (not supported with QAT)
Traceback (most recent call last):
File "/people/finl072/.conda/envs/n_magic/bin/sparseml.yolov5.train", line 8, in <module>
sys.exit(train())
File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/sparseml/yolov5/scripts.py", line 41, in train
train_run(**vars(opt))
File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 731, in run
main(opt)
File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 631, in main
train(opt.hyp, opt, device, callbacks)
File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 314, in train
new_batch_size, new_accumulate = sparsification_manager.rescale_gradient_accumulation(
TypeError: SparsificationManager.rescale_gradient_accumulation() got an unexpected keyword argument 'image_size'
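The TypeError points to a signature mismatch between the installed yolov5 fork and sparseml: the caller passes an `image_size` keyword the older method does not accept. Purely as an illustration of that failure mode (the `rescale_gradient_accumulation` stand-in below is hypothetical, not the library's actual API), a caller can filter keyword arguments against the callee's signature to stay compatible across versions:

```python
import inspect

def call_with_supported_kwargs(fn, **kwargs):
    """Drop any keyword arguments the callable does not accept,
    so a newer caller can still invoke an older library function."""
    params = inspect.signature(fn).parameters
    # If the function takes **kwargs, pass everything through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(**kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return fn(**supported)

# Hypothetical stand-in for the older method that lacks `image_size`:
def rescale_gradient_accumulation(batch_size, accumulate):
    return batch_size, accumulate

# Passing `image_size` no longer raises TypeError; it is silently dropped.
new_batch, new_acc = call_with_supported_kwargs(
    rescale_gradient_accumulation, batch_size=64, accumulate=1, image_size=640
)
```

The real fix, as noted later in the thread, is simply aligning the sparseml and nm-yolov5 versions; the sketch only shows why the mismatch raises a TypeError.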
- If I use the --cache argument, it also breaks at the QAT phase:
!sparseml.yolov5.train \
--weights "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn" \
--recipe "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn" \
--recipe-args '{"num_epochs":12, "quantization_epochs":10}' \
--data VisDrone.yaml \
--batch-size 64 \
--save-period 10 \
--cfg yolov5s.yaml \
--cache \
--hyp hyps/hyp.finetune.yaml
Starting training for 12 epochs...
Adjusted gradient clipping threshold to 10.0
Epoch GPU_mem box_loss obj_loss cls_loss Instances Size
0/11 13G 0.09091 0.02234 0.02223 564 640: 1
Class Images Instances P R mAP50
all 548 38759 0.00333 0.0256 0.00214 0.000509
Adjusted gradient clipping threshold to 10.0
Epoch GPU_mem box_loss obj_loss cls_loss Instances Size
1/11 13G 0.08168 0.02643 0.01815 437 640: 1
Class Images Instances P R mAP50
all 548 38759 0.0215 0.0856 0.026 0.00806
Neural Magic: Starting QAT phase
Neural Magic: Turning off EMA (not supported with QAT)
Neural Magic: Turning off AMP (not supported with QAT)
Traceback (most recent call last):
File "/people/finl072/.conda/envs/n_magic/bin/sparseml.yolov5.train", line 8, in <module>
sys.exit(train())
File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/sparseml/yolov5/scripts.py", line 41, in train
train_run(**vars(opt))
File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 731, in run
main(opt)
File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 631, in main
train(opt.hyp, opt, device, callbacks)
File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 314, in train
new_batch_size, new_accumulate = sparsification_manager.rescale_gradient_accumulation(
TypeError: SparsificationManager.rescale_gradient_accumulation() got an unexpected keyword argument 'image_size'
I also discovered that training must not start in the same epoch as the QAT phase
- i.e. num_epochs needs to be greater than quantization_epochs
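That constraint can be made explicit by checking the recipe args before launching training. This is a minimal sketch under my own naming (the function below is not part of sparseml):

```python
def qat_start_epoch(num_epochs: int, quantization_epochs: int) -> int:
    """Return the epoch at which the QAT phase would begin, requiring
    at least one non-quantized training epoch before it starts."""
    start = num_epochs - quantization_epochs
    if start < 1:
        raise ValueError(
            f"num_epochs ({num_epochs}) must be greater than "
            f"quantization_epochs ({quantization_epochs})"
        )
    return start

# With the recipe args from the commands above:
print(qat_start_epoch(12, 10))  # QAT begins after 2 non-quantized epochs
```

With `num_epochs=12` and `quantization_epochs=10`, two ordinary epochs run first, matching the logs above where QAT starts after epoch 1/11.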
Hi @ianfinley89, what version of nm-yolov5 do you have installed? This issue should have been resolved a few releases ago
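For anyone checking their own setup, the installed version can be read without importing the package, using only the standard library:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package: str):
    """Return the installed distribution's version string, or None if
    the package is not installed in the current environment."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print(installed_version("nm-yolov5"))
```

This prints the version string (or `None` if nm-yolov5 is absent), which is the number the maintainers are asking about here.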
I have finally got this working. However, I had to completely uninstall sparseml with the yolov5 extras (so torch, torchvision, torchaudio), deepsparse, and nm-yolov5. I essentially had to start from scratch to get around this issue, because upgrading in place never changed any of my versions or alleviated the problem.
Hi @ianfinley89, thanks for the update. It looks like moving to the latest release picked up the patch you needed, so I'm closing this.