Comments (5)

ianfinley89 commented on June 18, 2024

Name: nm-yolov5
Version: 1.5.0.60200

from sparseml.

ianfinley89 commented on June 18, 2024

I've learned that this happens both with and without the --cache argument.

  • If I use this training setup it breaks at the QAT phase:
!sparseml.yolov5.train \
  --weights "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn" \
  --recipe "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn" \
  --recipe-args '{"num_epochs":12, "quantization_epochs":10}' \
  --data VisDrone.yaml \
  --batch-size 64 \
  --save-period 10 \
  --cfg yolov5s.yaml \
  --hyp hyps/hyp.finetune.yaml
Adjusted gradient clipping threshold to 10.0

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       1/11        13G    0.08168    0.02643    0.01815        437        640: 1
                 Class     Images  Instances          P          R      mAP50   WARNING ⚠️ NMS time limit 6.900s exceeded
                 Class     Images  Instances          P          R      mAP50   WARNING ⚠️ NMS time limit 6.900s exceeded
                 Class     Images  Instances          P          R      mAP50   WARNING ⚠️ NMS time limit 6.900s exceeded
                 Class     Images  Instances          P          R      mAP50   WARNING ⚠️ NMS time limit 6.900s exceeded
                 Class     Images  Instances          P          R      mAP50   WARNING ⚠️ NMS time limit 2.300s exceeded
                 Class     Images  Instances          P          R      mAP50   
                   all        548      38759     0.0232     0.0303     0.0168    0.00501
Neural Magic: Starting QAT phase
Neural Magic: Turning off EMA (not supported with QAT)
Neural Magic: Turning off AMP (not supported with QAT)
Traceback (most recent call last):
  File "/people/finl072/.conda/envs/n_magic/bin/sparseml.yolov5.train", line 8, in <module>
    sys.exit(train())
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/sparseml/yolov5/scripts.py", line 41, in train
    train_run(**vars(opt))
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 731, in run
    main(opt)
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 631, in main
    train(opt.hyp, opt, device, callbacks)
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 314, in train
    new_batch_size, new_accumulate = sparsification_manager.rescale_gradient_accumulation(
TypeError: SparsificationManager.rescale_gradient_accumulation() got an unexpected keyword argument 'image_size'
  • If I use the --cache argument, it also breaks at the QAT phase.
!sparseml.yolov5.train \
  --weights "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn" \
  --recipe "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn" \
  --recipe-args '{"num_epochs":12, "quantization_epochs":10}' \
  --data VisDrone.yaml \
  --batch-size 64 \
  --save-period 10 \
  --cfg yolov5s.yaml \
  --cache \
  --hyp hyps/hyp.finetune.yaml

Starting training for 12 epochs...
Adjusted gradient clipping threshold to 10.0

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       0/11        13G    0.09091    0.02234    0.02223        564        640: 1
                 Class     Images  Instances          P          R      mAP50   
                   all        548      38759    0.00333     0.0256    0.00214   0.000509
Adjusted gradient clipping threshold to 10.0

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       1/11        13G    0.08168    0.02643    0.01815        437        640: 1
                 Class     Images  Instances          P          R      mAP50   
                   all        548      38759     0.0215     0.0856      0.026    0.00806
Neural Magic: Starting QAT phase
Neural Magic: Turning off EMA (not supported with QAT)
Neural Magic: Turning off AMP (not supported with QAT)
Traceback (most recent call last):
  File "/people/finl072/.conda/envs/n_magic/bin/sparseml.yolov5.train", line 8, in <module>
    sys.exit(train())
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/sparseml/yolov5/scripts.py", line 41, in train
    train_run(**vars(opt))
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 731, in run
    main(opt)
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 631, in main
    train(opt.hyp, opt, device, callbacks)
  File "/people/finl072/.conda/envs/n_magic/lib/python3.10/site-packages/yolov5/train.py", line 314, in train
    new_batch_size, new_accumulate = sparsification_manager.rescale_gradient_accumulation(
TypeError: SparsificationManager.rescale_gradient_accumulation() got an unexpected keyword argument 'image_size'
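A TypeError of this kind usually means the caller (here yolov5/train.py) and the callee (the sparseml manager) come from releases that do not match. Below is a minimal sketch of how one could check whether a callable accepts a given keyword before calling it. The two manager classes are hypothetical stand-ins written for illustration only; they are not the real sparseml signatures.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer manager class.
# These are NOT the real sparseml signatures.
class OldManager:
    def rescale_gradient_accumulation(self, batch_size, accumulate):
        return batch_size, accumulate


class NewManager:
    def rescale_gradient_accumulation(self, batch_size, accumulate, image_size):
        return batch_size, accumulate


print(accepts_keyword(OldManager().rescale_gradient_accumulation, "image_size"))
print(accepts_keyword(NewManager().rescale_gradient_accumulation, "image_size"))
```

In a real environment the same helper could be pointed at the installed manager's method to see which side of the mismatch is out of date.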

I also discovered that training must not start at the same epoch as the QAT phase,

  • i.e. num_epochs needs to be greater than quantization_epochs.
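The constraint above can be checked before launching a run. The sketch below parses the same JSON string that is passed to --recipe-args and rejects values that violate it. The helper name `qat_start_epoch` is hypothetical, and the formula `num_epochs - quantization_epochs` for the first QAT epoch is an assumption inferred from the logs (epochs 0 and 1 ran before QAT started), not something taken from the recipe itself.

```python
import json


def qat_start_epoch(recipe_args_json):
    """Validate recipe args and return the assumed first QAT epoch."""
    args = json.loads(recipe_args_json)
    num_epochs = args["num_epochs"]
    quantization_epochs = args["quantization_epochs"]
    if num_epochs <= quantization_epochs:
        raise ValueError(
            "num_epochs must be greater than quantization_epochs, "
            f"got num_epochs={num_epochs}, quantization_epochs={quantization_epochs}"
        )
    # Assumption: QAT occupies the last `quantization_epochs` epochs.
    return num_epochs - quantization_epochs


# Same values as in the training command above.
print(qat_start_epoch('{"num_epochs":12, "quantization_epochs":10}'))
```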

bfineran commented on June 18, 2024

Hi @ianfinley89, what version of nm-yolov5 do you have installed? This issue should have been resolved a few releases ago.

ianfinley89 commented on June 18, 2024

I finally got this working. However, I had to completely uninstall sparseml with the yolov5 extensions (so torch, vision, audio), deepsparse, and nm-yolov5. I essentially had to start from scratch to get around this issue, because upgrading never changed any of my installed versions or alleviated the problem.
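To confirm whether an upgrade or reinstall actually changed anything, one option is to print the installed versions before and after. This is a minimal sketch using the standard library's importlib.metadata. The distribution names torchvision and torchaudio are assumptions based on the "torch, vision, audio" wording above; adjust the list to match your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed distribution names; edit as needed.
PACKAGES = ["sparseml", "nm-yolov5", "deepsparse", "torch", "torchvision", "torchaudio"]

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this inside the same conda environment that launches sparseml.yolov5.train matters, since the traceback paths show the packages are resolved from that environment's site-packages.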

bfineran commented on June 18, 2024

Hi @ianfinley89, thanks for the update. It looks like getting to the latest release brought in the patch you needed. Closing.
