Strange that it triggers only after so many steps; it seems like it could be a PyTorch/sync issue.
Just wanted to say: if you are using cross-entropy loss (for an LM), the SophiaG variant is more efficient, since it just squares the gradient (see https://github.com/Liuhong99/Sophia/blob/19f45d30723bbffcce3d18e4e858d95b0f36dbb6/sophia.py#L56). You can use it like so (not tested):
```python
hessian = list(map(lambda p: p.grad * p.grad, model.parameters()))
opt.step(hessian=hessian)
```
This also skips the 2nd order gradient calculation, so it could resolve your issue.
EDIT: you also need to filter out the non-trainable and sparse parameters, so it would be more like:
```python
hessian = [p.grad * p.grad for p in model.parameters() if p.requires_grad and p.grad is not None and not p.grad.is_sparse]
opt.step(hessian=hessian)
```
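A minimal plain-PyTorch sketch of the filter above, assuming a hypothetical toy model whose first layer is frozen to stand in for a PEFT/LoRA setup. It shows that the squared-gradient estimate needs only an ordinary backward pass (no create_graph), and that frozen parameters drop out because their .grad stays None. It does not call any optimizer; the resulting list is what the snippet above passes to opt.step(hessian=hessian).

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy model: a frozen "base" layer plus a trainable output layer,
# loosely mimicking a PEFT/LoRA setup where most weights are frozen.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
for p in model[0].parameters():
    p.requires_grad = False

x = torch.randn(5, 8)
targets = torch.randint(0, 4, (5,))
loss = nn.functional.cross_entropy(model(x), targets)

# Ordinary first-order backward: the squared-gradient estimate needs no create_graph.
loss.backward()

# Squared gradients of trainable, dense parameters only (same filter as above).
hessian = [
    p.grad * p.grad
    for p in model.parameters()
    if p.requires_grad and p.grad is not None and not p.grad.is_sparse
]

print(len(hessian), [tuple(h.shape) for h in hessian])
```

Because the frozen weights never receive a .grad, they are skipped rather than raising an error, so freezing layers by itself should not break this variant.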
from pytorch_optimizer.
Thank you for the quick response! I have applied your example to my own code (to the best of my ability), and while we're making progress, training bombs with a new error after reaching the first update_period:
Traceback (most recent call last):
File "/src/trainer.py", line 403, in <module>
ai.train(
File "/usr/local/lib/python3.10/dist-packages/aitextgen/aitextgen.py", line 804, in train
trainer.fit(train_model)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 532, in fit
call._call_and_handle_interrupt(
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 571, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 980, in _run
results = self._run_stage()
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 1023, in _run_stage
self.fit_loop.run()
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run
self.advance()
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py", line 355, in advance
self.epoch_loop.run(self._data_fetcher)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/training_epoch_loop.py", line 133, in run
self.advance(data_fetcher)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/training_epoch_loop.py", line 221, in advance
batch_output = self.manual_optimization.run(kwargs)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/optimization/manual.py", line 91, in run
self.advance(kwargs)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/optimization/manual.py", line 111, in advance
training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py", line 294, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/strategy.py", line 380, in training_step
return self.model.training_step(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/aitextgen/train.py", line 59, in training_step
opt.step()
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/optimizer.py", line 161, in step
step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/strategy.py", line 231, in optimizer_step
return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/plugins/precision/precision_plugin.py", line 116, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 280, in wrapper
out = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_optimizer/optimizer/sophia.py", line 92, in step
self.compute_hutchinson_hessian(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_optimizer/base/optimizer.py", line 100, in compute_hutchinson_hessian
h_zs = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=i < num_samples - 1)
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 303, in grad
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 4 of tensors does not require grad and does not have a grad_fn
I don't suspect this is the cause, but there is a warning at the beginning of training:
/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py:200: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at ../torch/csrc/autograd/engine.cpp:1151.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
It may be relevant to know that I am using the Hugging Face PEFT library for LoRA training. I don't suspect that is the issue either, since all it really does is add some extra layers to the model and freeze all the other layers.
I will troubleshoot some more when I get the chance. It's been a long day already, and I need to take a break. Thank you for the help thus far, and for maintaining such a useful library!
Hmm, I guess it's not an optimizer problem, but rather a PyTorch autograd internals issue or an issue in the training code (e.g. model, loss, etc.).
I just found that a similar error occurs when the loss is computed with a CPU-version loss function.
Maybe some modules are not on the same device, or there are unreachable parts of the graph (so they can't be backpropagated through).
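To illustrate the "unreachable graph" hypothesis, here is a minimal plain-PyTorch sketch (the cubic toy loss is an assumption, not anyone's actual training code) that reproduces the same RuntimeError. If backward() runs without create_graph=True, the stored gradients have no grad_fn, so the Hessian-vector product that compute_hutchinson_hessian takes via torch.autograd.grad (see the traceback) has nothing to differentiate through.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)


def loss_fn(w: torch.Tensor) -> torch.Tensor:
    # Cubic toy loss so the Hessian is non-constant: d2/dw2 of w**3 is 6*w.
    return (w ** 3).sum()


z = torch.randn_like(w)

# 1) Plain backward: w.grad carries no graph (no grad_fn), so H @ z cannot be taken.
loss_fn(w).backward()
try:
    torch.autograd.grad([w.grad], [w], grad_outputs=[z])
    failed = False
except RuntimeError as err:
    failed = True
    print(err)  # same kind of "does not require grad" error as in the tracebacks

# 2) With create_graph=True, w.grad keeps its graph and the product works.
w.grad = None
loss_fn(w).backward(create_graph=True)  # emits the create_graph reference-cycle warning
(hz,) = torch.autograd.grad([w.grad], [w], grad_outputs=[z])
print(torch.allclose(hz, 6 * w.detach() * z))
w.grad = None  # break the parameter <-> gradient cycle, as the warning recommends
```

The same failure would presumably appear for any parameter whose gradient was produced or modified outside the autograd graph, which is one way "element N of tensors" can point at a single tensor.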
SophiaG worked, but the performance is no better than Adam's, maybe because of the bias. So I want to try SophiaH, which doesn't have the bias.
Some last things to check:
- In the backward call, you have create_graph=True
- No batch accumulation (it makes create_graph very expensive)
If all of this is correct, then it pretty much has to be a bug in PyTorch (or in the training code).
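For context on why both checks matter, here is a minimal sketch of a Hutchinson diagonal-Hessian estimate, the kind of computation SophiaH.step triggers via compute_hutchinson_hessian in the tracebacks. The helper name hutchinson_diag and the toy quadratic loss are hypothetical, and this is not the library's implementation; it only mirrors the torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=...) call. The gradients must carry a graph, and every extra sample (or, presumably, every accumulated batch) keeps that second-order graph alive, which is where the cost comes from. It uses torch.autograd.grad instead of backward(create_graph=True), as the PyTorch warning above recommends.

```python
import torch


def hutchinson_diag(loss: torch.Tensor, params: list, num_samples: int = 1) -> list:
    """Hypothetical helper: estimate diag(H) as the mean of z * (H z) over Rademacher z."""
    # create_graph=True keeps the gradient graph so it can be differentiated again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimates = [torch.zeros_like(p) for p in params]
    for i in range(num_samples):
        # Rademacher probes: each entry is -1 or +1 with equal probability.
        zs = [torch.randint_like(p, 0, 2) * 2.0 - 1.0 for p in params]
        # Hessian-vector products H z; keep the graph until the last sample.
        h_zs = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=i < num_samples - 1)
        for est, h_z, z in zip(estimates, h_zs, zs):
            est.add_(h_z * z / num_samples)
    return estimates


# Toy quadratic loss 0.5 * sum(a * w**2): its Hessian is exactly diag(a).
torch.manual_seed(0)
a = torch.tensor([1.0, 2.0, 3.0])
w = torch.randn(3, requires_grad=True)
loss = 0.5 * (a * w ** 2).sum()
(est,) = hutchinson_diag(loss, [w], num_samples=4)
print(est)
```

For a diagonal Hessian the estimate is exact, because z * z == 1 for Rademacher probes; for real models it is only an unbiased estimate whose noise shrinks as num_samples grows.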
I have been running into a similar error message. I've been trying to use SophiaH with Lightning AI's automatic_optimization feature, but it always fails:
Traceback (most recent call last):
File "/src/trainer.py", line 403, in <module>
ai.train(
File "/usr/local/lib/python3.10/dist-packages/aitextgen/aitextgen.py", line 804, in train
trainer.fit(train_model)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 532, in fit
call._call_and_handle_interrupt(
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 571, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 980, in _run
results = self._run_stage()
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 1023, in _run_stage
self.fit_loop.run()
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run
self.advance()
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py", line 355, in advance
self.epoch_loop.run(self._data_fetcher)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/training_epoch_loop.py", line 133, in run
self.advance(data_fetcher)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/training_epoch_loop.py", line 219, in advance
batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/optimization/automatic.py", line 188, in run
self._optimizer_step(kwargs.get("batch_idx", 0), closure)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/optimization/automatic.py", line 266, in _optimizer_step
call._call_lightning_module_hook(
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py", line 146, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/module.py", line 1276, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/optimizer.py", line 161, in step
step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/strategy.py", line 231, in optimizer_step
return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/plugins/precision/precision_plugin.py", line 116, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 280, in wrapper
out = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_optimizer/optimizer/sophia.py", line 92, in step
self.compute_hutchinson_hessian(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_optimizer/base/optimizer.py", line 100, in compute_hutchinson_hessian
h_zs = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=i < num_samples - 1)
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 303, in grad
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
If I iterate through every parameter and set requires_grad to True, then I go OOM immediately at the update_period step:
```python
for n, p in self.model.named_parameters():
    p.requires_grad = True
```
If I set requires_grad to False, then training progresses, but the model never learns anything.
If requires_grad is unset for ANY parameter, I get the original error message.
I am unsure how to proceed at this point, but I would greatly appreciate any advice you have to offer.
Hello!
The SophiaH optimizer needs create_graph=True to be set when calling backward(). That means automatic_optimization should be set to False!
Here's an example:
```python
import os

from torch import nn, utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
import lightning.pytorch as pl
from pytorch_optimizer import SophiaH

# define any number of nn.Modules (or use your current ones)
encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))


class LitAutoEncoder(pl.LightningModule):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        # SophiaH needs create_graph=True, so run backward/step manually
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        opt.zero_grad()
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        # important: keep the gradient graph for the Hessian estimate
        self.manual_backward(loss, create_graph=True)
        opt.step()
        self.log("train_loss", loss)

    def configure_optimizers(self):
        return SophiaH(self.parameters())


dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(dataset)

autoencoder = LitAutoEncoder(encoder, decoder)
trainer = pl.Trainer(limit_train_batches=100, max_epochs=1)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
```
Alright, well I was able to test your example MNIST code, and it does work. So I know this isn't an environment issue.
I removed PEFT as well and tried standard fine-tuning. I also tried a couple of different models (GPT-2 and GPT-Neo) from the Hugging Face Transformers library. All of them ran into the same "tensors does not require grad and does not have a grad_fn" problem.
I'm sure the issue has to do with my training code. I'm carrying some legacy baggage, and I don't really have the proper skill set to know how to optimize manually (which is why I've relied on automatic_optimization until now). I haven't given up, but I probably am going to move on for now. I appreciate your help.