#params = 151111638 #non emb params = 41066400 | epoch 1 step 50 |

sophiah in https://github.com/booydar/LM-RMT, about kozistr/pytorch_optimizer

Comments (8)

i404788 commented on May 31, 2024 1

Strange that it triggers only after so many steps; it seems like it would be a PyTorch/sync issue.

Just wanted to say: if you are using cross-entropy loss (for LM), the SophiaG variant is more efficient, since it just squares the gradient (see https://github.com/Liuhong99/Sophia/blob/19f45d30723bbffcce3d18e4e858d95b0f36dbb6/sophia.py#L56). You can use it like so (not tested):

hessian = list(map(lambda p: p.grad * p.grad, model.parameters()))
opt.step(hessian=hessian)

This also skips the 2nd order gradient calculation, so it could resolve your issue.

EDIT: you also need to filter out the non-trainable & sparse parameters so it would be more like:

hessian = [p.grad*p.grad for p in model.parameters() if p.requires_grad and p.grad is not None and not p.grad.is_sparse]
opt.step(hessian=hessian)
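
To see what that filter actually keeps, here is a minimal sketch on a toy model with one frozen layer. The model, the data, and the names `named` and `hessian` are made up for illustration, and it stops before calling the optimizer; whether the filtered list lines up with the parameters the optimizer iterates over is an assumption worth checking in your own setup.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy classifier; the first layer is frozen to mimic non-trainable parameters.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
for p in model[0].parameters():
    p.requires_grad = False

x = torch.randn(16, 4)
targets = torch.randint(0, 3, (16,))
loss = nn.functional.cross_entropy(model(x), targets)
loss.backward()  # plain first-order backward, no create_graph needed here

# Same filter as in the comment above, but keeping the names so the
# estimate can be matched back to the parameter it belongs to.
named = [
    (name, p)
    for name, p in model.named_parameters()
    if p.requires_grad and p.grad is not None and not p.grad.is_sparse
]
hessian = [p.grad * p.grad for _, p in named]

for (name, p), h in zip(named, hessian):
    print(name, tuple(p.shape), tuple(h.shape))
```

Only the parameters of the second layer should show up, each with an estimate of the same shape as the parameter.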

from pytorch_optimizer.

Vectorrent commented on May 31, 2024 1

Thank you for the quick response! I have applied your example to my own code (to the best of my ability), and while we're making progress, training bombs with a new error after reaching the first update_period:

Traceback (most recent call last):
  File "/src/trainer.py", line 403, in <module>
    ai.train(
  File "/usr/local/lib/python3.10/dist-packages/aitextgen/aitextgen.py", line 804, in train
    trainer.fit(train_model)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 532, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 571, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 980, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 1023, in _run_stage
    self.fit_loop.run()
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run
    self.advance()
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py", line 355, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/training_epoch_loop.py", line 133, in run
    self.advance(data_fetcher)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/training_epoch_loop.py", line 221, in advance
    batch_output = self.manual_optimization.run(kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/optimization/manual.py", line 91, in run
    self.advance(kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/optimization/manual.py", line 111, in advance
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py", line 294, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/strategy.py", line 380, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aitextgen/train.py", line 59, in training_step
    opt.step()
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/optimizer.py", line 161, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/strategy.py", line 231, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/plugins/precision/precision_plugin.py", line 116, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_optimizer/optimizer/sophia.py", line 92, in step
    self.compute_hutchinson_hessian(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_optimizer/base/optimizer.py", line 100, in compute_hutchinson_hessian
    h_zs = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=i < num_samples - 1)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 303, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 4 of tensors does not require grad and does not have a grad_fn

I don't suspect this is the cause, but there is a warning at the beginning of training:

/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py:200: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at ../torch/csrc/autograd/engine.cpp:1151.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

It may be relevant to know that I am using the Huggingface PEFT library for LoRA training. I don't suspect that is the issue either, since all that really does is add some extra layers to the model, and freeze all the other layers.
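
To illustrate what "add some extra layers and freeze the rest" means for the parameter list, here is a rough sketch with a hand-rolled low-rank adapter. `LowRankAdapter` is a hypothetical stand-in written for this example, not PEFT's implementation or API; the sizes are arbitrary.

```python
import torch
from torch import nn


class LowRankAdapter(nn.Module):
    """Hypothetical LoRA-like wrapper: frozen base layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 2):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)  # the adapter starts as a no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))


layer = LowRankAdapter(nn.Linear(16, 16), rank=2)

trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
frozen = [n for n, p in layer.named_parameters() if not p.requires_grad]
n_trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in layer.parameters())

print("trainable:", trainable)
print("frozen:", frozen)
print(f"{n_trainable} of {n_total} parameters are trainable")
```

Listing the names this way makes it easy to check which tensors an optimizer would actually need to see.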

I will troubleshoot some more when I get the chance. It's been a long day already, and I need to take a break. Thank you for the help thus far, and for maintaining such a useful library!

kozistr commented on May 31, 2024

Hmm, I guess it's not an optimizer problem, but maybe an issue in PyTorch autograd internals or in the training code (e.g. model, loss, etc.).

I just found that a similar error occurs when the loss function is a CPU-version loss.

Maybe some modules are not on the same device, or there are unreachable graphs (which can't be backpropagated through).
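
A quick way to check both suspicions is to list the devices the parameters live on and the trainable parameters that received no gradient after a backward pass. This is only a sketch: `Toy` and `diagnose` are hypothetical names, and the unused layer is there on purpose to show what an unreachable parameter looks like.

```python
import torch
from torch import nn


class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 2)
        self.unused = nn.Linear(4, 2)  # never called in forward()

    def forward(self, x):
        return self.used(x)


def diagnose(model, loss):
    """Return the set of parameter devices and the names of trainable
    parameters that the backward pass never reached."""
    loss.backward()
    devices = {str(p.device) for p in model.parameters()}
    unreached = [
        name
        for name, p in model.named_parameters()
        if p.requires_grad and p.grad is None
    ]
    return devices, unreached


model = Toy()
loss = model(torch.randn(3, 4)).sum()
devices, unreached = diagnose(model, loss)
print("devices:", devices)
print("no gradient:", unreached)
```

More than one device in the first set, or any name in the second list, would be worth looking at before blaming the optimizer.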

robotzheng commented on May 31, 2024

SophiaG worked, but the performance is not better than Adam, maybe because of the bias. So I want to try SophiaH, which doesn't have that bias.
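
For context, the compute_hutchinson_hessian frame in the traceback above points at a Hutchinson-style estimate of the Hessian diagonal: draw a random sign vector z, compute the Hessian-vector product Hz by differentiating the gradients a second time, and use z * Hz. Below is a minimal sketch in plain PyTorch on a quadratic whose Hessian is known. `hutchinson_diag` is a hypothetical helper written for this example, not the library's code.

```python
import torch


def hutchinson_diag(loss, params, num_samples=1):
    """Estimate the Hessian diagonal as the average of z * (H z) over random sign vectors z."""
    # First-order gradients, keeping the graph so they can be differentiated again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimate = [torch.zeros_like(p) for p in params]
    for i in range(num_samples):
        # Rademacher probes: every entry is +1 or -1.
        zs = [torch.randint_like(p, 2) * 2 - 1 for p in params]
        # Hessian-vector products H z, obtained by backpropagating through the gradients.
        h_zs = torch.autograd.grad(
            grads, params, grad_outputs=zs, retain_graph=i < num_samples - 1
        )
        for e, z, h_z in zip(estimate, zs, h_zs):
            e += z * h_z / num_samples
    return estimate


# Quadratic loss 0.5 * sum(d * w^2): its Hessian is diagonal with entries d.
w = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
d = torch.tensor([2.0, 0.5, 4.0])
loss = 0.5 * (d * w ** 2).sum()

estimate = hutchinson_diag(loss, [w], num_samples=4)
print(estimate[0])
```

Because the Hessian here is diagonal, every probe recovers d exactly; with off-diagonal terms the estimate is only correct on average. The first `torch.autograd.grad` call uses `create_graph=True`, which is the same requirement that comes up later in this thread.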

i404788 commented on May 31, 2024

Some last things to check:

In the backward call you have create_graph=True
No batch accumulation (makes create_graph very expensive)

If this is all correct then it pretty much has to be a bug in pytorch (or the training code).
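
As a reference for the two points above, a plain PyTorch loop could look like the sketch below: one backward(create_graph=True) per batch, an optimizer step right after it (no accumulation), and gradients reset to None afterwards, as the PyTorch warning quoted earlier in the thread suggests. torch.optim.SGD is only a stand-in so the snippet runs without extra packages; the model and data are made up.

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # stand-in for a second-order optimizer
x, y = torch.randn(8, 3), torch.randn(8, 1)

losses = []
for _ in range(3):
    loss = nn.functional.mse_loss(model(x), y)
    # Keep the graph of the backward pass so the gradients could be differentiated again.
    loss.backward(create_graph=True)
    # Step once per batch: nothing is accumulated across batches.
    opt.step()
    # Drop the gradients (and the graph they reference) to break the reference cycle.
    opt.zero_grad(set_to_none=True)
    losses.append(loss.item())

print(losses)
```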

Vectorrent commented on May 31, 2024

I have been running into a similar error message. I've been trying to use SophiaH with Lightning AI's automatic_optimization feature, but it always fails:

Traceback (most recent call last):
  File "/src/trainer.py", line 403, in <module>
    ai.train(
  File "/usr/local/lib/python3.10/dist-packages/aitextgen/aitextgen.py", line 804, in train
    trainer.fit(train_model)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 532, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 571, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 980, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 1023, in _run_stage
    self.fit_loop.run()
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run
    self.advance()
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py", line 355, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/training_epoch_loop.py", line 133, in run
    self.advance(data_fetcher)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/training_epoch_loop.py", line 219, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/optimization/automatic.py", line 188, in run
    self._optimizer_step(kwargs.get("batch_idx", 0), closure)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/optimization/automatic.py", line 266, in _optimizer_step
    call._call_lightning_module_hook(
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py", line 146, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/module.py", line 1276, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/optimizer.py", line 161, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/strategy.py", line 231, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/plugins/precision/precision_plugin.py", line 116, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_optimizer/optimizer/sophia.py", line 92, in step
    self.compute_hutchinson_hessian(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_optimizer/base/optimizer.py", line 100, in compute_hutchinson_hessian
    h_zs = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=i < num_samples - 1)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 303, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

If I iterate through every parameter and set requires_grad to True, then I go OOM immediately at the "update_period" step:

for n, p in self.model.named_parameters():
    p.requires_grad = True

If I set requires_grad to False, then training will progress, but the model never learns anything.

If "requires_grad" is unset for ANY parameter group, I get the original error message.

I am unsure how to proceed at this point, but I would greatly appreciate any advice you have to offer.

kozistr commented on May 31, 2024

Hello!

The SophiaH optimizer needs create_graph=True to be set when calling backward(), which means automatic_optimization should be set to False!

Here's an example.

import os
from torch import nn, utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
import lightning.pytorch as pl

from pytorch_optimizer import SophiaH

# define any number of nn.Modules (or use your current ones)
encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))

class LitAutoEncoder(pl.LightningModule):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        opt.zero_grad()
        
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        
        loss = nn.functional.mse_loss(x_hat, x)

        # important: keep the graph of the backward pass so SophiaH can differentiate the gradients again
        self.manual_backward(loss, create_graph=True)
        opt.step()
        
        self.log("train_loss", loss)

    def configure_optimizers(self):
        return SophiaH(self.parameters())

dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(dataset)

autoencoder = LitAutoEncoder(encoder, decoder)

trainer = pl.Trainer(limit_train_batches=100, max_epochs=1)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
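
The same requirement can be reproduced without Lightning. The sketch below mimics the torch.autograd.grad call from the traceback on a toy model: with a plain backward() the stored gradients have no grad_fn and the second differentiation fails with the error from this thread, while with create_graph=True it goes through. `second_order_probe` is a hypothetical helper for this example; the model and data are arbitrary.

```python
import torch
from torch import nn


def second_order_probe(create_graph):
    """Try to differentiate the stored gradients once more; return "ok" or the error text."""
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
    loss = model(torch.randn(16, 4)).pow(2).mean()
    loss.backward(create_graph=create_graph)

    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad for p in params]
    zs = [torch.ones_like(p) for p in params]
    try:
        # Same shape of call as in the traceback: differentiate the gradients themselves.
        torch.autograd.grad(grads, params, grad_outputs=zs)
        return "ok"
    except RuntimeError as err:
        return str(err)
    finally:
        # Reset gradients so the graph is not kept alive by the parameters.
        for p in params:
            p.grad = None


print("create_graph=False:", second_order_probe(False))
print("create_graph=True: ", second_order_probe(True))
```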

Vectorrent commented on May 31, 2024

Alright, well I was able to test your example MNIST code, and it does work. So I know this isn't an environment issue.

I removed PEFT as well, and tried standard fine-tuning. I also tried a couple of different models (GPT-2 and GPT-Neo) from the Hugging Face Transformers library. All ran into the same problem with "tensors does not require grad and does not have a grad_fn".

I'm sure the issue has to do with my training code. I'm carrying some legacy baggage, and I don't really have the proper skill set to know how to optimize manually (which is why I've relied on automatic_optimization until now). I haven't given up, but I probably am going to move on for now. I appreciate your help.
