Unexpected keyword argument 'image_size' (sparseml, CLOSED)

ianfinley89 commented on June 18, 2024

Unexpected keyword argument 'image_size'

from sparseml.

Comments (5)

ianfinley89 commented on June 18, 2024

Name: nm-yolov5
Version: 1.5.0.60200

ianfinley89 commented on June 18, 2024

I've learned that this happens with and without the --cache argument.

If I use this training setup it breaks at the QAT phase:

!sparseml.yolov5.train \
  --weights "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn" \
  --recipe "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn" \
  --recipe-args '{"num_epochs":12, "quantization_epochs":10}' \
  --data VisDrone.yaml \
  --batch-size 64 \
  --save-period 10 \
  --cfg yolov5s.yaml \
  --hyp hyps/hyp.finetune.yaml

Adjusted gradient clipping threshold to 10.0

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       1/11        13G    0.08168    0.02643    0.01815        437        640: 1
                 Class     Images  Instances          P          R      mAP50   WARNING ⚠️ NMS time limit 6.900s exceeded
                 Class     Images  Instances          P          R      mAP50   WARNING ⚠️ NMS time limit 6.900s exceeded
                 Class     Images  Instances          P          R      mAP50   WARNING ⚠️ NMS time limit 6.900s exceeded
                 Class     Images  Instances          P          R      mAP50   WARNING ⚠️ NMS time limit 6.900s exceeded
                 Class     Images  Instances          P          R      mAP50   WARNING ⚠️ NMS time limit 2.300s exceeded
                 Class     Images  Instances          P          R      mAP50   
                   all        548      38759     0.0232     0.0303     0.0168    0.00501
Neural Magic: Starting QAT phase
Neural Magic: Turning off EMA (not supported with QAT)
Neural Magic: Turning off AMP (not supported with QAT)
Traceback (most recent call last):
  File "/people/finl072/.conda/envs/n_magic/bin/sparseml.yolov5.train", line 8, in <module>
    sys.exit(train())
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/sparseml/yolov5/scripts.py", line 41, in train
    train_run(**vars(opt))
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 731, in run
    main(opt)
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 631, in main
    train(opt.hyp, opt, device, callbacks)
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 314, in train
    new_batch_size, new_accumulate = sparsification_manager.rescale_gradient_accumulation(
TypeError: SparsificationManager.rescale_gradient_accumulation() got an unexpected keyword argument 'image_size'

If I use the --cache argument, it also breaks at the QAT phase.

!sparseml.yolov5.train \
  --weights "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn" \
  --recipe "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn" \
  --recipe-args '{"num_epochs":12, "quantization_epochs":10}' \
  --data VisDrone.yaml \
  --batch-size 64 \
  --save-period 10 \
  --cfg yolov5s.yaml \
  --cache \
  --hyp hyps/hyp.finetune.yaml

Starting training for 12 epochs...
Adjusted gradient clipping threshold to 10.0

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       0/11        13G    0.09091    0.02234    0.02223        564        640: 1
                 Class     Images  Instances          P          R      mAP50   
                   all        548      38759    0.00333     0.0256    0.00214   0.000509
Adjusted gradient clipping threshold to 10.0

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       1/11        13G    0.08168    0.02643    0.01815        437        640: 1
                 Class     Images  Instances          P          R      mAP50   
                   all        548      38759     0.0215     0.0856      0.026    0.00806
Neural Magic: Starting QAT phase
Neural Magic: Turning off EMA (not supported with QAT)
Neural Magic: Turning off AMP (not supported with QAT)
Traceback (most recent call last):
  File "/people/finl072/.conda/envs/n_magic/bin/sparseml.yolov5.train", line 8, in <module>
    sys.exit(train())
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/sparseml/yolov5/scripts.py", line 41, in train
    train_run(**vars(opt))
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 731, in run
    main(opt)
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 631, in main
    train(opt.hyp, opt, device, callbacks)
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 314, in train
    new_batch_size, new_accumulate = sparsification_manager.rescale_gradient_accumulation(
TypeError: SparsificationManager.rescale_gradient_accumulation() got an unexpected keyword argument 'image_size'
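
The TypeError above is what Python raises when a caller passes a keyword that the callee's signature does not declare, which is consistent with yolov5/train.py and the installed sparseml being out of sync. Below is a minimal sketch of how one might check for that kind of mismatch. The names `accepts_keyword`, `OldManager`, and `NewManager` are hypothetical and only illustrate the idea; they do not reproduce the real sparseml signatures.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins, NOT the real sparseml classes or signatures.
class OldManager:
    def rescale_gradient_accumulation(self, batch_size, accumulate):
        return batch_size, accumulate


class NewManager:
    def rescale_gradient_accumulation(self, batch_size, accumulate, image_size):
        return batch_size, accumulate


print(accepts_keyword(OldManager().rescale_gradient_accumulation, "image_size"))  # False
print(accepts_keyword(NewManager().rescale_gradient_accumulation, "image_size"))  # True
```

Running the same helper against the method on the installed SparsificationManager would show whether that install accepts image_size at all.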

I also discovered that training must not start at the same epoch as the QAT phase, i.e. num_epochs needs to be greater than quantization_epochs.
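
The constraint above can be checked before launching a run. This is a minimal sketch, assuming that QAT covers the last quantization_epochs epochs, so that it starts at num_epochs minus quantization_epochs. That relation is an assumption inferred from the log above, where QAT begins after epochs 0 and 1 with 12 and 10, and is not taken from the recipe itself. `qat_start_epoch` is a hypothetical helper.

```python
import json


def qat_start_epoch(recipe_args):
    """Return the assumed QAT start epoch for a --recipe-args JSON string.

    Assumption: QAT occupies the final `quantization_epochs` epochs.
    """
    args = json.loads(recipe_args)
    num_epochs = args["num_epochs"]
    quantization_epochs = args["quantization_epochs"]
    if num_epochs <= quantization_epochs:
        raise ValueError(
            "num_epochs must be greater than quantization_epochs, "
            f"got {num_epochs} and {quantization_epochs}"
        )
    return num_epochs - quantization_epochs


print(qat_start_epoch('{"num_epochs":12, "quantization_epochs":10}'))  # 2
```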

bfineran commented on June 18, 2024

Hi @ianfinley89, what version of nm-yolov5 do you have installed? This issue should have been resolved a few releases ago.

ianfinley89 commented on June 18, 2024

I have finally got this working. However, I had to completely uninstall sparseml with the yolov5 extensions (so torch, vision, audio), deepsparse, and nm-yolov5. I essentially had to start again from scratch to get around this issue, because upgrading never changed any of my versions or alleviated the problem.
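
Since upgrading reportedly left the installed versions unchanged, it can help to print what the active environment actually resolves before and after a reinstall. This is a minimal sketch using the standard library's importlib.metadata. The package list is an assumption based on the names mentioned in this thread, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata

# Distribution names assumed from this thread; adjust to your environment.
PACKAGES = ["sparseml", "nm-yolov5", "deepsparse", "torch", "torchvision", "torchaudio"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter that launches sparseml.yolov5.train, so the report reflects the conda environment shown in the traceback paths.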

bfineran commented on June 18, 2024

Hi @ianfinley89, thanks for the update. It looks like getting to the latest release brought in the patch you needed. Closing.
