Comments (13)
Update on my issue: it seems that the optimizer setter is not triggered by the callback: https://github.com/Lightning-AI/lightning/blob/618e1c8061753e767e7ae628cf55098b8fa6ad55/src/lightning/pytorch/strategies/strategy.py#L107
This means that the resulting LightningOptimizer is not properly initialized by LightningOptimizer._to_lightning_optimizer.
One quick hack would be to modify the already existing object, like so:
def on_fit_start(self, trainer: Trainer,
                 pl_module: LightningModule) -> None:
    if len(trainer.strategy.optimizers) > 1:
        raise MisconfigurationException(
            "SparseML only supports training with one optimizer.")
    optimizer = trainer.strategy.optimizers[0]
    optimizer = self.manager.modify(
        pl_module,
        optimizer,
        steps_per_epoch=trainer.estimated_stepping_batches,
        epoch=0)
    trainer.strategy._optimizers = [optimizer]
    trainer.strategy._lightning_optimizers[0]._optimizer = optimizer
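A cleaner variant might be to assign through the property instead of mutating private attributes. A minimal sketch, assuming Strategy.optimizers is a settable property in your Lightning version (assigning through it should run the setter linked above, which rebuilds the wrappers via LightningOptimizer._to_lightning_optimizer):

# Hedged sketch: assigning through the (assumed settable) property triggers
# the optimizers setter, which re-wraps the optimizer for Lightning.
trainer.strategy.optimizers = [optimizer]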
What do you think about this possible solution? I'm afraid this could break older versions of PyTorch Lightning, as I have not tested it on them.
Hey @clementpoiret we added support for torch<2.2 on the latest sparseml-nightly. Please give it a try and let me know if you need any other assistance.
Hi @clementpoiret, yes it looks like it does get overwritten here -
the reason why seems to be lost in the commit history, but we can dig a bit more. If you can make edits to your local install, you can try switching preserve to eval here.
Additionally, you can try running an eval and benchmark on your exported model to see if it meets your accuracy/performance needs, in case the warning is not material.
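For instance, a quick sanity-check sketch with onnxruntime (the model path, input name, and input shape below are hypothetical placeholders):

import numpy as np
import onnxruntime as ort

# Run one random batch through the exported model to confirm it loads and
# produces an output of the expected shape.
sess = ort.InferenceSession("./sparseml_models/sparse_model.onnx",
                            providers=["CPUExecutionProvider"])
x = np.random.randn(1, 1, 28, 28).astype(np.float32)
outputs = sess.run(None, {"input": x})
print("output shape:", outputs[0].shape)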
Hi @clementpoiret
As it's been some time without further discussion, I am going to go ahead and close this thread. However, feel free to re-open if you have further comments.
Thank you! 👋🏼
Jeannie / Neural Magic
Hey @clementpoiret I think the lightning integration isn't heavily used, so I wouldn't have concerns about backwards compatibility for that interface. My main concern would be with torch 2.1 affecting other flows. It sounds like that upgrade would still be needed for your DINO use case, correct?
Yes, unfortunately I can't export dinov2 to ONNX using torch 2.0, as I get torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::scaled_dot_product_attention' to ONNX opset version 17 is not supported.
This issue was fixed by the 2.1 update.
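For reference, a minimal repro sketch of that error (the module and shapes are hypothetical placeholders; this should raise on torch 2.0 and succeed on torch >= 2.1):

import torch
import torch.nn.functional as F

class Attn(torch.nn.Module):
    def forward(self, q, k, v):
        # The operator that torch 2.0 cannot export at opset 17
        return F.scaled_dot_product_attention(q, k, v)

x = torch.randn(1, 8, 16, 64)
# Raises torch.onnx.errors.UnsupportedOperatorError on torch 2.0;
# works with torch >= 2.1.
torch.onnx.export(Attn(), (x, x, x), "attn.onnx", opset_version=17)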
Hey @mgoin thanks for the update
I can confirm everything works on my side!
Okay @mgoin, I'll correct what I said 😂 My fix to the lightning callback was (I believe) just applying the optimizer incorrectly, which only dismissed the error. I actually end up with non-sparse models, meaning the optimizer step isn't applied correctly... So I still have the issue with sparseml 1.6.0: the ScheduledModifierManager and LightningOptimizer classes have issues working together. I ended up with the same error as above: self._strategy is None.
Okay, I believe it was just a mix of ambiguities here and there. What I said above is weird, and I still don't understand why modifying the base optimizer reset Lightning's strategy, but modifying the optimizer of the LightningOptimizer seems to be okay now.
What happened is quite simple: trainer.estimated_stepping_batches was incorrect, so the modifiers were almost never applied, and start_epoch and end_epoch were completely off. Manually indicating steps_per_epoch fixed the issue and applied all modifiers.
My last weird error is a warning when exporting to ONNX, telling me that I should disable constant folding when training=TrainingMode.PRESERVE or TrainingMode.TRAINING. I don't get how to modify this behavior: I keep getting the warning even when setting export_onnx(..., training=TrainingMode.EVAL) or setting model.training = False.
For the sake of reference, here is the updated SparseMLCallback:
from typing import Any, Optional

import torch
from lightning.pytorch import Callback, LightningModule, Trainer
from lightning.pytorch.utilities.exceptions import MisconfigurationException
from sparseml.pytorch.optim import ScheduledModifierManager
from sparseml.pytorch.utils import ModuleExporter


class SparseMLCallback(Callback):
    """Enables SparseML aware training. Requires a recipe to run during training.

    Args:
        recipe_path: Path to a SparseML compatible yaml recipe.
            More information at https://docs.neuralmagic.com/sparseml/source/recipes.html
        steps_per_epoch: Number of optimizer steps per epoch. If ``None``,
            falls back to ``trainer.estimated_stepping_batches``.
    """

    def __init__(self,
                 recipe_path: str,
                 steps_per_epoch: Optional[int] = None) -> None:
        self.manager = ScheduledModifierManager.from_yaml(recipe_path)
        self.steps_per_epoch = steps_per_epoch

    def on_fit_start(self, trainer: Trainer,
                     pl_module: LightningModule) -> None:
        optimizer = trainer.strategy._lightning_optimizers[0]._optimizer
        if len(trainer.optimizers) > 1:
            raise MisconfigurationException(
                "SparseML only supports training with one optimizer.")
        if self.steps_per_epoch is None:
            self.steps_per_epoch = trainer.estimated_stepping_batches
        optimizer = self.manager.modify(pl_module,
                                        optimizer,
                                        steps_per_epoch=self.steps_per_epoch,
                                        epoch=0)
        trainer.strategy._lightning_optimizers[0]._optimizer = optimizer

    def on_fit_end(self, trainer: Trainer, pl_module: LightningModule) -> None:
        self.manager.finalize(pl_module)

    # TODO: check for TrainingMode.EVAL
    @staticmethod
    def export_to_sparse_onnx(model: LightningModule,
                              output_dir: str,
                              sample_batch: Optional[torch.Tensor] = None,
                              name: str = "model.onnx",
                              opset: int = 14,
                              disable_bn_fusing: bool = True,
                              convert_qat: bool = True,
                              **export_kwargs: Any) -> None:
        """Exports the model to ONNX format."""
        exporter = ModuleExporter(model, output_dir=output_dir)
        sample_batch = sample_batch if sample_batch is not None else model.example_input_array  # type: ignore[assignment]  # noqa: E501
        if sample_batch is None:
            raise MisconfigurationException(
                "To export the model, a sample batch must be passed via "
                "``SparseMLCallback.export_to_sparse_onnx(model, output_dir, sample_batch=sample_batch)`` "
                "or an ``example_input_array`` property within the LightningModule"
            )
        exporter.export_onnx(
            sample_batch=sample_batch,
            name=name,
            opset=opset,
            disable_bn_fusing=disable_bn_fusing,
            convert_qat=convert_qat,
            **export_kwargs,
        )
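For completeness, a minimal usage sketch (the recipe path, model, and dataloader are hypothetical placeholders; passing steps_per_epoch explicitly sidesteps the estimated_stepping_batches issue described above):

# Continuing from the imports in the snippet above; recipe path, model, and
# dataloader are hypothetical placeholders.
callback = SparseMLCallback("recipe.yaml",
                            steps_per_epoch=len(train_dataloader))
trainer = Trainer(max_epochs=10, callbacks=[callback])
trainer.fit(model, train_dataloaders=train_dataloader)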
Hi @clementpoiret does the export with the warning still produce a valid ONNX model? Could you also paste the warning if you have it?
@bfineran it sounds like it is valid. Here is the code I use to save the model:
# Export to ONNX
clf.eval()
clf.training = False
sparseml.export_to_sparse_onnx(
    clf,
    output_dir="./sparseml_models/",
    name="sparse_model.onnx",
    sample_batch=torch.randn(1, 1, 28, 28),
    # opset_version=17,
    opset=14,
    disable_bn_fusing=False,
    convert_qat=True,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={
        "input": {0: "batch_size"},
        "output": {0: "batch_size"},
    },
)
And the warning:
/home/clementpoiret/micromamba/envs/torch211/lib/python3.10/site-packages/torch/onnx/utils.py:823: UserWarning: It is recommended that constant folding be turned off ('do_constant_folding=False') when exporting the model in training-amenable mode, i.e. with 'training=TrainingMode.TRAIN' or 'training=TrainingMode.PRESERVE' (when model is in training mode). Otherwise, some learnable model parameters may not translate correctly in the exported ONNX model because constant folding mutates model parameters. Please consider turning off constant folding or setting the training=TrainingMode.EVAL.
Passing training=TrainingMode.EVAL has no effect, as it seems to be overwritten later on.
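A possible workaround sketch, assuming the extra export kwargs are forwarded to torch.onnx.export (which the input_names/dynamic_axes arguments above suggest), would be to follow the warning's own advice and disable constant folding:

sparseml.export_to_sparse_onnx(
    clf,
    output_dir="./sparseml_models/",
    name="sparse_model.onnx",
    sample_batch=torch.randn(1, 1, 28, 28),
    # Follow the warning's suggestion instead of fighting the training mode;
    # assumes this kwarg is forwarded through to torch.onnx.export.
    do_constant_folding=False,
)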
Thanks for your answer. Do you know the practical implications of saving a model in training mode? Will the ONNX file be bigger, or the inference slower?
Not sure of any specific examples, but if the model behaves differently in training vs eval (batch norm updates, dropout, etc.), these operations may be represented in the trace.
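One way to check for this, sketched here with the onnx package (the model path is a hypothetical placeholder):

import onnx

# Scan the exported graph for ops that only matter in training mode.
model = onnx.load("./sparseml_models/sparse_model.onnx")
train_ops = {"Dropout", "BatchNormalization"}
found = [node.op_type for node in model.graph.node if node.op_type in train_ops]
print(found or "no training-mode ops found in the graph")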