
pytorch-lr-finder's Introduction

PyTorch learning rate finder


A PyTorch implementation of the learning rate range test detailed in Cyclical Learning Rates for Training Neural Networks by Leslie N. Smith and the tweaked version used by fastai.

The learning rate range test provides valuable information about the optimal learning rate. During a pre-training run, the learning rate is increased linearly or exponentially between two boundaries. The low initial learning rate allows the network to start converging, and as the learning rate increases it will eventually become too large and the network will diverge.

Typically, a good static learning rate can be found half-way on the descending loss curve. In the plot below that would be lr = 0.002.

For cyclical learning rates (also detailed in Leslie Smith's paper) where the learning rate is cycled between two boundaries (start_lr, end_lr), the author advises the point at which the loss starts descending and the point at which the loss stops descending or becomes ragged for start_lr and end_lr respectively. In the plot below, start_lr = 0.0002 and end_lr=0.2.

[figure: Learning rate range test plot]

Installation

Python 3.5 and above:

pip install torch-lr-finder

Install with support for mixed precision training (see also this section):

pip install torch-lr-finder -v --global-option="apex"

Implementation details and usage

Tweaked version from fastai

Increases the learning rate in an exponential manner and computes the training loss for each learning rate. lr_finder.plot() plots the training loss versus logarithmic learning rate.

import torch.nn as nn
import torch.optim as optim
from torch_lr_finder import LRFinder

model = ...
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot() # to inspect the loss-learning rate graph
lr_finder.reset() # to reset the model and optimizer to their initial state

Leslie Smith's approach

Increases the learning rate linearly and computes the evaluation loss for each learning rate. lr_finder.plot() plots the evaluation loss versus learning rate. This approach typically produces more precise curves because the evaluation loss is more susceptible to divergence but it takes significantly longer to perform the test, especially if the evaluation dataset is large.

from torch_lr_finder import LRFinder

model = ...
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.1, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, val_loader=val_loader, end_lr=1, num_iter=100, step_mode="linear")
lr_finder.plot(log_lr=False)
lr_finder.reset()

Notes

  • Examples for CIFAR10 and MNIST can be found in the examples folder.
  • The optimizer passed to LRFinder should not have an LRScheduler attached to it.
  • LRFinder.range_test() will change the model weights and the optimizer parameters. Both can be restored to their initial state with LRFinder.reset().
  • The learning rate and loss history can be accessed through lr_finder.history. This will return a dictionary with lr and loss keys.
  • When using step_mode="linear" the learning rate range should be within the same order of magnitude.
  • LRFinder.range_test() expects a pair of (input, label) to be returned from the DataLoader objects passed to it. The input must be ready to be passed to the model and the label must be ready to be passed to the criterion without any further data processing/handling/conversion. If you need a workaround, you can use the classes TrainDataLoaderIter and ValDataLoaderIter to perform any data processing/handling/conversion between the DataLoader and the training/evaluation loop. You can find an example of how to use these classes in examples/lrfinder_cifar10_dataloader_iter; a minimal sketch is also shown below.
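
For reference, here is a minimal sketch of such a wrapper, assuming a DataLoader whose batches are dictionaries with hypothetical "image" and "label" keys:

from torch_lr_finder import LRFinder, TrainDataLoaderIter

class CustomTrainIter(TrainDataLoaderIter):
    def inputs_labels_from_batch(self, batch_data):
        # Convert the raw batch into the (input, label) pair that LRFinder expects
        return batch_data["image"], batch_data["label"]

custom_train_iter = CustomTrainIter(trainloader)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(custom_train_iter, end_lr=100, num_iter=100)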

Additional support for training

Gradient accumulation

You can set the accumulation_steps parameter in LRFinder.range_test() to an appropriate value to perform gradient accumulation:

from torch.utils.data import DataLoader
from torch_lr_finder import LRFinder

desired_batch_size, real_batch_size = 32, 4
accumulation_steps = desired_batch_size // real_batch_size

dataset = ...

# Beware of the `batch_size` used by `DataLoader`
trainloader = DataLoader(dataset, batch_size=real_batch_size, shuffle=True)

model = ...
criterion = ...
optimizer = ...

# (Optional) With this setting, `amp.scale_loss()` will be adopted automatically.
# model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=10, num_iter=100, step_mode="exp", accumulation_steps=accumulation_steps)
lr_finder.plot()
lr_finder.reset()

Mixed precision training

Both apex.amp and torch.amp are supported now; here are examples:

  • Using apex.amp:

    from torch_lr_finder import LRFinder
    from apex import amp
    
    # Add this line before running `LRFinder`
    model, optimizer = amp.initialize(model, optimizer, opt_level='O1')
    
    lr_finder = LRFinder(model, optimizer, criterion, device='cuda', amp_backend='apex')
    lr_finder.range_test(trainloader, end_lr=10, num_iter=100, step_mode='exp')
    lr_finder.plot()
    lr_finder.reset()
  • Using torch.amp:

    import torch
    from torch_lr_finder import LRFinder
    
    amp_config = {
        'device_type': 'cuda',
        'dtype': torch.float16,
    }
    grad_scaler = torch.cuda.amp.GradScaler()
    
    lr_finder = LRFinder(
        model, optimizer, criterion, device='cuda',
        amp_backend='torch', amp_config=amp_config, grad_scaler=grad_scaler
    )
    lr_finder.range_test(trainloader, end_lr=10, num_iter=100, step_mode='exp')
    lr_finder.plot()
    lr_finder.reset()

Note that the benefit of mixed precision training requires an NVIDIA GPU with Tensor Cores (see also: NVIDIA/apex #297).

You can also try setting torch.backends.cudnn.benchmark = True to improve training speed, but it doesn't help in every case; use it at your own risk.

Contributing and pull requests

All contributions are welcome but first, have a look at CONTRIBUTING.md.

pytorch-lr-finder's People

Contributors

alexgrig, chawater, davidtvs, marrrcin, michelml, mpaepper, naleraphael, pabloppp, scottclowe


pytorch-lr-finder's Issues

Device issue: error says the device is bool, not torch.device, but I have printed the device and it is torch.device("cuda:0")

Traceback (most recent call last):
  File "example_copy.py", line 31, in <module>
    lr_finder = LRFinder(model, optimizer, criterion)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py", line 166, in __init__
    self.state_cacher.store("model", self.model.state_dict())
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py", line 624, in store
    self.cached.update({key: copy.deepcopy(state_dict)})
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/copy.py", line 306, in _reconstruct
    value = deepcopy(value, memo)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/site-packages/torch/nn/parameter.py", line 32, in __deepcopy__
    result = type(self)(self.data.clone(memory_format=torch.preserve_format), self.requires_grad)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/site-packages/torch/nn/parameter.py", line 153, in __new__
    data = torch.tensor([], **factory_kwargs)
TypeError: tensor(): argument 'device' must be torch.device, not bool

How do I get the lr_finder to run multiple batches for each "iteration", as defined by `num_iter` in `lr_finder.range_test()`?

How do I get the lr_finder to run multiple batches for each "iteration"? Logic being that running multiple batches would give a more precise result.
Based on the naming, I'd assumed that num_iter in lr_finder.range_test() would control the number of batches/iterations for each value of lr in the given range. However, num_iter controls the number of unique lr values to test within the given interval, running only 1 batch through the network.

TrainDataLoaderIter post-process network prediction

Hello. Currently the *DataLoaderIter classes allow us to do some custom pre-processing of the (x, y) pairs with the "inputs_labels_from_batch" method.

I have a network where I do some post-processing on the output of the network, e.g. (simplified):

x, y = next(train_sampler)

Y_hat = model(x)
y_hat = custom_func(Y_hat)

loss = mse(y_hat, y)

Could/should this be an option of the data loader classes, i.e. an "output_labels_from_batch" method, so that we can post-process the model forward() output?

Thanks.
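
One possible workaround (a sketch, not an official feature of the library) is to wrap the model so the post-processing happens inside forward(), since LRFinder only calls outputs = model(inputs) followed by criterion(outputs, labels); custom_func below stands for the post-processing function described above:

import torch.nn as nn

class PostProcessedModel(nn.Module):
    """Wrap a model so its forward pass already includes the post-processing step."""
    def __init__(self, model, post_fn):
        super().__init__()
        self.model = model
        self.post_fn = post_fn

    def forward(self, x):
        return self.post_fn(self.model(x))

# lr_finder = LRFinder(PostProcessedModel(model, custom_func), optimizer, criterion)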

Issue with DataLoader with lr_finder.range_test

I tried to use:

class CustomTrainIter(TrainDataLoaderIter):
    def inputs_labels_from_batch(self, batch_data):
        return batch_data["img"], batch_data["target"]

to make my DataLoader work with lr_finder.range_test(), but I still get the error:
TypeError: list indices must be integers or slices, not str

TypeError                                 Traceback (most recent call last)
<ipython-input-60-b2a8b27d6c88> in <module>()
      3 optim = torch.optim.Adam(model_ft.parameters(), lr=1e-7, weight_decay=1e-2)
      4 lr_finder = LRFinder(model_ft,optim, criterion, device='cuda')
----> 5 lr_finder.range_test( custom_train_iter ,end_lr=100,num_iter=100)
      6 lr_finder.plot()
      7 lr_finder.reset()

3 frames
/usr/local/lib/python3.7/dist-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    318                 train_iter,
    319                 accumulation_steps,
--> 320                 non_blocking_transfer=non_blocking_transfer,
    321             )
    322             if val_loader:

/usr/local/lib/python3.7/dist-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    369         self.optimizer.zero_grad()
    370         for i in range(accumulation_steps):
--> 371             inputs, labels = next(train_iter)
    372             inputs, labels = self._move_to_device(
    373                 inputs, labels, non_blocking=non_blocking_transfer

/usr/local/lib/python3.7/dist-packages/torch_lr_finder/lr_finder.py in __next__(self)
     57         try:
     58             batch = next(self._iterator)
---> 59             inputs, labels = self.inputs_labels_from_batch(batch)
     60         except StopIteration:
     61             if not self.auto_reset:

<ipython-input-58-f89d28995874> in inputs_labels_from_batch(self, batch_data)
      4 
      5 
----> 6         return batch_data["img"], batch_data["target"]
      7 
      8 custom_train_iter = CustomTrainIter(train_dl)

TypeError: list indices must be integers or slices, not str

Any suggestion ? thanks !

LR Finder for RNN network

Hi,

I want to find the lr for my RNN network, which takes a sequence as input. When I try it with torch_lr_finder it throws this error:

File "/usr/local/lib/python3.5/dist-packages/torch_lr_finder/lr_finder.py", line 125, in range_test
    inputs, labels = next(iterator)
ValueError: too many values to unpack (expected 2)

Can you help me get past this error?
@davidtvs

Steepest gradient value

I want to be able to pull the value of the steepest gradient and use it in my code as a numeric value.
I see in the code that it is computed inside lr_finder.plot(), but I am not able to just assign min_grad to anything.
Can you please help me?
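
One way to pull this value out yourself (a sketch, relying on lr_finder.history, which the README above describes as a dict with "lr" and "loss" keys) is to locate the steepest, i.e. most negative, gradient of the recorded losses:

import numpy as np

lrs = np.array(lr_finder.history["lr"])
losses = np.array(lr_finder.history["loss"])

# Index of the steepest descent of the loss curve
min_grad_idx = np.gradient(losses).argmin()
suggested_lr = lrs[min_grad_idx]
print(f"LR at steepest gradient: {suggested_lr:.2e}")

Note that the result is a floating-point learning rate rather than an integer.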

LR Finder doesn't restore original model weights?

Hey! I love this repo, thanks for making it 💯

Everything works well except for one thing, after some digging around/experimenting, here's what I've found:

Below are some figures for the training loss and training accuracy (on MNIST, using a resnet18).

Problem:

  1. Using LRFinder on a model, and then training with it afterwards appears to hurt the model's learning (see pink curve below).

Solution:

  1. Using LRFinder on a model, and manually restoring the weights, appears to train the model optimally. (see green curve below).
  2. Using LRFinder on a clone of the model, and then using the original model for training, appears to train the model optimally. (see green curve below).

Regarding the figure/graphs below, both models used the same hyperparameters.

An in-code example of option 1) would be similar to what was given in the README.md:

from torch_lr_finder import LRFinder

model = ...
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()

# Then use "model" for training

An in-code example of option 3) would be:

from torch_lr_finder import LRFinder

model = ...
temp_model = ...  # create a model with the same architecture
# copy the weights over
temp_model.load_state_dict(model.state_dict())
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(temp_model.parameters(), lr=1e-7, weight_decay=1e-2)
# use the temp model in lr_finder
lr_finder = LRFinder(temp_model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()

[figure: training loss and accuracy curves for the runs described above]

ERROR: torchvision 0.4.2 has requirement torch==1.3.1, but you'll have torch 1.2.0 which is incompatible.

Executing in Google Colab the command,

!pip install https://download.pytorch.org/whl/cu100/torch-1.2.0-cp36-cp36m-manylinux1_x86_64.whl && pip install https://download.pytorch.org/whl/cu100/torchvision-0.4.0-cp36-cp36m-manylinux1_x86_64.whl

found in https://colab.research.google.com/drive/1BhWYtLFOa24wisNckt9i6rQhBKurVWWV

gives the error

ERROR: torchvision 0.4.2 has requirement torch==1.3.1, but you'll have torch 1.2.0 which is incompatible.

By the way, executing !cat /usr/local/cuda/version.txt gives
CUDA Version 10.0.130

Please advise.

Thanks,

Vassilis

Validation loader flat loss

I copied your example notebook to Colab and ran the code without changing anything, but the validation loss I get goes flat, which is clearly a mistake when compared to your example. I also experienced this with my other networks, which do the same: the loss just goes flat.

You can see my results from colab and your example in the figures below.

EDIT: If I replace val_iter with val_loader inside loss = self._validate(...) it does seem to "work" as I'd expect. So somewhere there seems to be a mistake in how the val_iter is iterated.

[figures: Colab result vs. the example notebook]

how to define num_iter?

Hi davidtvs,
When using the LR finder, I find the loss curve is a little different for different values of num_iter.
For example, the image count is 6700 and the batch size is 252. With num_iter=27 the loss decreases noticeably from 1e-4 to 1e-3, with num_iter=270 it decreases noticeably from 1e-4 to 5e-4, and with num_iter=540 it decreases noticeably from 1e-4 to 8e-4. So I am not sure which loss is correct?
@davidtvs
Thanks! It is a really good tool.

How can I use this library for auto-encoders?

In the manual of this library the examples are for a unified model, but auto-encoders are made of two parts: an encoder and a decoder. How can I use this library for an auto-encoder?
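
One option (a sketch, assuming the encoder and decoder are ordinary nn.Module objects applied one after the other) is to combine the two parts into a single module and pass that to LRFinder; for a reconstruction objective, the DataLoader can simply return the input as its own label:

import torch.nn as nn
import torch.optim as optim
from torch_lr_finder import LRFinder

autoencoder = nn.Sequential(encoder, decoder)  # encoder/decoder are your two sub-modules
criterion = nn.MSELoss()
optimizer = optim.Adam(autoencoder.parameters(), lr=1e-7)

# trainloader should yield (input, input) pairs for a reconstruction loss
lr_finder = LRFinder(autoencoder, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=10, num_iter=100)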

Multiple Input Support

Hey,
is there an elegant way to use a multiple input model with lr_finder?
Given forward looks like this:

def forward(self, x, x_embds):
    ...

and my DataLoader looks like this:

train_loader = DataLoader(TensorDataset(xtrain, xtrain_emb, ytrain), batch_size=BATCHSIZE, shuffle=True)

I want to separate numerical inputs from variables that will be used as embeddings in the model. Therefore the DataLoader yields the numerical variables (xtrain), the variable to be embedded (xtrain_emb) separately, and of course the labels y.
In this case, my lr_finder is called like this:

lrf = LRFinder(net, optim, criterion)
lrf.range_test(train_loader, val_loader, start_lr=0.00001, end_lr=1)
lrf.plot()
lrf.reset()

gives this stack trace because it does not pass the "additional" component from the DataLoader (xtrain_emb) to the forward method:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/TestProject/pytorch/torch_net.py in 
      200 if 1:
      201     lrf = LRFinder(net, optim, criterion)
----> 202     lrf.range_test(train_loader, val_loader, start_lr=0.00001, end_lr=1)
      203     lrf.plot()
      204     lrf.reset()

~/miniconda3/envs/py38/lib/python3.8/site-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    315         for iteration in tqdm(range(num_iter)):
    316             # Train on batch and retrieve loss
--> 317             loss = self._train_batch(
    318                 train_iter,
    319                 accumulation_steps,

~/miniconda3/envs/py38/lib/python3.8/site-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    375 
    376             # Forward pass
--> 377             outputs = self.model(inputs)
    378             loss = self.criterion(outputs, labels)
    379 

~/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

TypeError: forward() missing 1 required positional argument: 'x_embds'

Issue #18 mentions a scenario in which the DataLoader yields additional outputs, but it seems like the additional inputs are still not used. As a workaround I might be able to pass a single design matrix X through the loader and forward() and then, within the forward method, extract the column to be embedded. This seems like a not-so-nice workaround, though.
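
Another possible workaround (a sketch, assuming LRFinder moves a tuple of tensors to the device and passes it to the model unchanged): group the two inputs into a single tuple in a custom TrainDataLoaderIter and unpack them in a thin wrapper module, so that LRFinder's single model(inputs) call still works:

import torch.nn as nn
from torch_lr_finder import LRFinder, TrainDataLoaderIter

class MultiInputTrainIter(TrainDataLoaderIter):
    def inputs_labels_from_batch(self, batch_data):
        x, x_emb, y = batch_data
        # Pack both inputs into one object; LRFinder forwards it to the model as-is
        return (x, x_emb), y

class MultiInputWrapper(nn.Module):
    def __init__(self, net):
        super().__init__()
        self.net = net

    def forward(self, inputs):
        x, x_emb = inputs
        return self.net(x, x_emb)

lrf = LRFinder(MultiInputWrapper(net), optim, criterion)
lrf.range_test(MultiInputTrainIter(train_loader), start_lr=0.00001, end_lr=1)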

Question: Number of iterations

I tried the LR finder with 100 and with 1000 iterations (all other parameters staying the same) and got very different recommendations for the LR - for 100 iterations it was 1.2e-3, and for 1000 iterations 3.3e-5. I tried training with both of these, and they don't produce optimal results compared to a more aggressive learning rate - 5e-2 (which is actually found by the 100 iterations to be the maximum LR).

What would be the ideal number of iterations that you would recommend? Would this number be calculated using model / dataset size in any way? I get the feeling that doing more iterations in general affects the finder, because the maximum learning rate is found faster. I did not look at the code, but I guess that the weights are not reset after each iteration - do you think it would be good to reset them, such that the previous iterations don't affect subsequent ones?

Can we plot the learning rate vs accuracy and get LR at max accuracy using your library - need for SuperConvergence

Hi David @davidtvs,

Is there a way we can plot the learning rate vs accuracy and get LR at max accuracy using your library?

I am trying to use Super-Convergence (https://arxiv.org/pdf/1708.07120.pdf) by Leslie N. Smith, so I am using PyTorch's OneCycleLR scheduler, which expects a max_lr value.

I used your lr-finder, but it plots the loss curve against the learning rates and suggests the LR at the steepest descent. I am looking for learning rate vs accuracy so I can get the LR at maximum accuracy.

Please suggest to me.

Thanks in advance,
Naga Pavan

Lr-finder with multiple inputs, outputs and losses

Hello,

Firstly, thank you for this wonderful library.
I have a model which expects 2 inputs. I am working with 2 kinds of images, one of size (512, 1536) and the other of size (128, 384). Therefore, my train_loader contains 2 inputs and one target of shape (128, 384, 16). My model has 4 prediction heads and hence is trained using 4 losses for different purposes.

So my collate_fn for the data loader looks like this:

def detection_collate(batch):
    """Custom collate fn for dealing with batches of images that have a different
    number of associated object annotations (bounding boxes).
    Arguments:
        batch: (tuple) A tuple of tensor images and lists of annotations
    Return:
        A tuple containing:
            1) (tensor) batch of images stacked on their 0 dim
            2) (list of tensors) annotations for a given image are stacked on
                                 0 dim
    """
    targets = []
    imgs = []
    deps = []
    for sample in batch:
        imgs.append(sample[0])
        deps.append(sample[1])
        targets.append(sample[2])
    return torch.stack(imgs, 0), torch.stack(deps, 0), torch.stack(targets, 0)

As mentioned, there are 4 different losses: Custom Heatmap (Focal) loss, SmoothL1, SmoothL1, BCE loss.

The forward method of the model expects 2 inputs. A small snippet is shown below:

 def forward(self, x, dep=None, target=None):
        # Backbone: ResNet18, x is image size: (512, 1536)

Here, targets are the labels so to say.

In this case, how do I go about finding the best learning rate using lr-finder?
Notably, I can only use batch_size=2 because of the computational limitations.

Suggested LR not returned when min_grad_idx is 0 in plot()

When using the plot function in a situation like the following:
[figure: LR range test plot where the suggested learning rate is the first point]
where the suggested learning rate lrs[min_grad_idx] is the very first value (min_grad_idx == 0), the suggested learning rate is printed but it is not returned.

Expected behavior: return ax, lrs[min_grad_idx]
Observed behavior: return ax

These seem to be the relevant lines. Seems that ax is returned due to min_grad_idx evaluating to False (because it is 0):

if suggest_lr and min_grad_idx:
    return ax, lrs[min_grad_idx]
else:
    return ax

New release where .plot accepts ax

Hey,

I really enjoy using this handy package. Could you please make a new pip installable version where the plot method accepts the ax argument, this would be super helpful!

Thanks and best regards,

Fabio

Distributed training with ddp

Thanks for the work! I'd like to know if distributed training is supported, i.e. are DataParallel and DistributedDataParallel compatible with this work?

Flat loss

Hey guys,
I'm trying to find the optimal range of learning rates for Facial Expression classification, but the problem is that I'm getting a flat loss curve from 1e-6 to 1e-2, after which it just shoots up and diverges. The flat curve occurs at a loss of 1.4, which doesn't seem too low. So does it mean there's a problem with my dataset, or is it due to something else?
Thanks

TypeError: DataLoaderIterWrapper object is not an iterator?

class DataLoaderIterWrapper(object):
    def __init__(self, data_loader, auto_reset=True):
        self.data_loader = data_loader
        self.auto_reset = auto_reset
        self._iterator = iter(data_loader)

    def __next__(self):
        # Get a new set of inputs and labels
        try:
            # inputs, labels, *_ = next(self._iterator)
            inputs, labels = next(self._iterator)
        except StopIteration:
            if not self.auto_reset:
                raise
            self._iterator = iter(self.data_loader)
            # inputs, labels, *_ = next(self._iterator)
            inputs, labels = next(self._iterator)

        return inputs, labels

LRFinder w/ Gradient Accumulation

Great package! Thank you for sharing :)

  1. I was wondering if you plan on adding gradient accumulation support for using LRFinder with a larger batch size.
  2. Will you be adding mixed precision support?

LR finder for optimizing a single input tensor?

I have an optimization task that optimizes a single tensor by passing it through a set of transforms and then into the model. Losses are then calculated by using hooks attached to various model layers.

Is it possible to use this project for finding the optimal LR for my optimization task? The code looks like it requires a DataLoader instance.

How to find the best lr?

Hi, I am new to this package. I have followed the steps here. It gives me a set of loss values with their corresponding lr, but I don't know how to find the best lr. I would like to know if there is a method to automatically find the proposed best lr within this package.
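
In recent versions, plot() can also suggest a learning rate at the point of steepest loss descent; a sketch, assuming a version whose plot() accepts a suggest_lr argument (see the min_grad_idx issue above for the relevant return values):

lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
# When a suggestion is found, plot() returns the axes and the suggested LR
ax, suggested_lr = lr_finder.plot(suggest_lr=True)
print("Suggested learning rate:", suggested_lr)
lr_finder.reset()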

Update class DataLoaderIterWrapper to accommodate extra returned values

At https://github.com/davidtvs/pytorch-lr-finder/blob/master/torch_lr_finder/lr_finder.py#L453

inputs, labels = next(self._iterator)

Could it be changed to

inputs, labels, *rest = next(self._iterator)

to accommodate cases where more values are returned?

For example, if a weighted loss is used, then the weights also need to be returned to calculate the loss. This weighted loss can be found in the original U-Net paper. Another example is to return the training data file name being used; this is useful for debugging.

LR finder for regression problems

Is this code usable on regression problems, or only classification problems? I've been trying to get it to work for some time with no success.

Obtaining ValueError on m-BERT even after using TrainDataLoaderIter

I am training a multilingual-bert model for a sentiment classification task. My torch dataset returns a dictionary. I tried to run lr_finder.range_test(....) with and without TrainDataLoaderIter but I get the same ValueError both times.

Torch Dataset

class JigsawDataset:
    def __init__(self, df, train_transforms = None):
        self.comment_text = df["comment_text"].values
        self.target = df["toxic"].values
        self.tokenizer = config.BERT_TOKENIZER
        self.max_len = config.MAX_LEN
        self.langs = df["lang"].values
        self.train_transforms = train_transforms

    def __len__(self):
        return len(self.comment_text)

    def __getitem__(self, item):
        comment_text = str(self.comment_text[item])
        comment_text = " ".join(comment_text.split())
        lang = self.langs[item]
        
        if self.train_transforms:
            comment_text, _ = self.train_transforms(data=(comment_text, lang))['data']

        inputs = self.tokenizer.encode_plus(
            comment_text,
            None,
            add_special_tokens=True,
            max_length=self.max_len,
            pad_to_max_length=True,
            truncation=True
        )

        ids = inputs["input_ids"]
        mask = inputs["attention_mask"]
        token_type_ids = inputs["token_type_ids"]

        data_loader_dict = {}
        data_loader_dict["ids"] = torch.tensor(ids, dtype=torch.long)
        data_loader_dict["mask"] = torch.tensor(mask, dtype=torch.long)
        data_loader_dict["token_type_ids"] = torch.tensor(token_type_ids, dtype=torch.long)
        data_loader_dict["targets"] = torch.tensor(self.target[item], dtype=torch.float)
        
        return data_loader_dict

Run Function

%%time

def run():

    class CustomTrainIter(TrainDataLoaderIter):
        def input_labels_from_batch(self, batch_data):
            return batch_data["ids"], batch_data["mask"], batch_data["token_type_ids"], batch_data["targets"]
    
    def loss_fn(outputs, targets):
        return nn.BCEWithLogitsLoss()(outputs, targets.view(-1, 1))

    def train_fn(data_loader, model, optimizer, device,):
        
        model, optimizer, data_loader = accelerator.prepare(model, optimizer, data_loader)
        model.train()

        for bi, d in tqdm(enumerate(data_loader), total=len(data_loader)):
            ids = d["ids"]
            token_type_ids = d["token_type_ids"]
            mask = d["mask"]
            targets = d["targets"]

            ids = ids.to(device, dtype=torch.long)
            token_type_ids = token_type_ids.to(device, dtype=torch.long)
            mask = mask.to(device, dtype=torch.long)
            targets = targets.to(device, dtype=torch.float)
            
            optimizer.zero_grad()
            outputs = model(ids=ids, mask=mask, token_type_ids=token_type_ids)

            loss = loss_fn(outputs, targets)
            
            if bi % 1000 == 0:
                print(f"bi={bi}, loss={loss}")

            accelerator.backward(loss)
            optimizer.step()

    def eval_fn(data_loader, model, device):
        model.eval()
        fin_targets = []
        fin_outputs = []

        with torch.no_grad():
            for bi, d in tqdm(enumerate(data_loader), total=len(data_loader)):
                ids = d["ids"]
                token_type_ids = d["token_type_ids"]
                mask = d["mask"]
                targets = d["targets"]

                ids = ids.to(device, dtype=torch.long)
                token_type_ids = token_type_ids.to(device, dtype=torch.long)
                mask = mask.to(device, dtype=torch.long)
                targets = targets.to(device, dtype=torch.float)

                outputs = model(ids=ids, mask=mask, token_type_ids=token_type_ids)
                fin_targets.extend(targets.cpu().detach().numpy().tolist())
                fin_outputs.extend(torch.sigmoid(outputs).cpu().detach().numpy().tolist())
        return fin_outputs, fin_targets

    df1 = pd.read_csv(
        "/workspace/data/jigsaw-multilingual/input/jigsaw-data/jigsaw-toxic-comment-train.csv", 
        usecols = ["comment_text", "toxic"]    
    )
    
    df1 = df1.head(1000)

    df2 = pd.read_csv(
        "/workspace/data/jigsaw-multilingual/input/jigsaw-data/jigsaw-unintended-bias-train.csv",
        usecols = ["comment_text", "toxic"]
    )
    
    df2 = df2.head(1000)

    df_train = pd.concat([df1, df2], axis = 0).reset_index(drop = True)
    df_train["comment_text"] = df_train["comment_text"].apply(clean_text)

    df_valid = pd.read_csv("/workspace/data/jigsaw-multilingual/input/jigsaw-data/Translated Datasets/jigsaw_miltilingual_valid_translated.csv")
    df_valid["comment_text"] = df_valid["translated"]
    df_valid.drop("translated", axis = 1, inplace = True)
    df_valid["comment_text"] = df_valid["comment_text"].apply(clean_text)


    nlp_transform = NLPTransform()

    df_train['lang'] = 'en'
    non_toxic_sentences = set()
    for comment_text in tqdm(df_train['comment_text'], total=df.shape[0]):
        non_toxic_sentences.update(nlp_transform.get_sentences(comment_text), 'en')

    transform = AddNonToxicSentencesTransform(non_toxic_sentences=list(non_toxic_sentences), p=1.0, sentence_range=(1,2))
           
    train_dataset = JigsawDataset(
       df =  df_train,
       train_transforms = get_train_transforms()
    )

    train_data_loader = torch.utils.data.DataLoader(
        train_dataset, 
        batch_size=config.TRAIN_BATCH_SIZE, 
        num_workers=4
    )

    valid_dataset = JigsawDataset(
        df = df_valid,
    )

    valid_data_loader = torch.utils.data.DataLoader(
        valid_dataset, 
        batch_size=config.VALID_BATCH_SIZE, 
        num_workers=1
    )

    device = torch.device(config.DEVICE)
    model = BERTModel()

    param_optimizer = list(model.named_parameters())
    no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
    optimizer_parameters = [
        {
            "params": [
                p for n, p in param_optimizer if not any(nd in n for nd in no_decay)
            ],
            "weight_decay": 0.001,
        },
        {
            "params": [
                p for n, p in param_optimizer if any(nd in n for nd in no_decay)
            ],
            "weight_decay": 0.0,
        },
    ]

    num_train_steps = int(len(df_train) / config.TRAIN_BATCH_SIZE * config.EPOCHS)
    optimizer = AdamW(optimizer_parameters, lr=config.LEARNING_RATE)
    
    criterion = nn.BCEWithLogitsLoss()
    lr_finder = LRFinder(
        model, 
        optimizer, 
        criterion, 
        device = config.DEVICE
    )
    
    custom_train_iter = CustomTrainIter(train_data_loader)
    
    lr_finder.range_test(
        custom_train_iter, 
        end_lr = 10, 
        num_iter = 100, 
        step_mode = "exp"
    )

    best_accuracy = 0
    for epoch in range(config.EPOCHS):
        
        print(f"----------EPOCH: {epoch}----------")
        train_fn(train_data_loader, model, optimizer, device)
        outputs, targets = eval_fn(valid_data_loader, model, device)
        targets = np.array(targets) >= 0.5
        accuracy = metrics.roc_auc_score(targets, outputs)
        print(f"----------ROC AUC Score = {accuracy}----------")
        print()
        if accuracy > best_accuracy:
            torch.save(model.state_dict(), config.MODEL_PATH)
            best_accuracy = accuracy

if __name__ == "__main__":
    run()

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<timed exec> in <module>

<timed exec> in run()

/opt/conda/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    318                 train_iter,
    319                 accumulation_steps,
--> 320                 non_blocking_transfer=non_blocking_transfer,
    321             )
    322             if val_loader:

/opt/conda/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    369         self.optimizer.zero_grad()
    370         for i in range(accumulation_steps):
--> 371             inputs, labels = next(train_iter)
    372             inputs, labels = self._move_to_device(
    373                 inputs, labels, non_blocking=non_blocking_transfer

/opt/conda/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py in __next__(self)
     57         try:
     58             batch = next(self._iterator)
---> 59             inputs, labels = self.inputs_labels_from_batch(batch)
     60         except StopIteration:
     61             if not self.auto_reset:

/opt/conda/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py in inputs_labels_from_batch(self, batch_data)
     34                 "Your batch type is not supported: {}. Please inherit from "
     35                 "`TrainDataLoaderIter` or `ValDataLoaderIter` and override the "
---> 36                 "`inputs_labels_from_batch` method.".format(type(batch_data))
     37             )
     38 

ValueError: Your batch type is not supported: <class 'dict'>. Please inherit from `TrainDataLoaderIter` or `ValDataLoaderIter` and override the `inputs_labels_from_batch` method.

How to use in the command, not notebook?

Thanks for the great tool!

I want to find the best lr or save the figure.

I will be running this wonderful library from the Python command line, not a Jupyter notebook.

How can I save the figure or find the best lr?

Thanks.
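
A sketch of one way to do this from a script, assuming a version of plot() that accepts an ax argument (as requested in the release issue above): draw onto your own matplotlib axes and save the figure instead of showing it.

import matplotlib
matplotlib.use("Agg")  # no display needed when running from the command line
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
lr_finder.plot(ax=ax)  # plot onto the provided axes instead of opening a window
fig.savefig("lr_finder.png")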

please make it installable using pip

Currently it can't be installed with pip install git+https://github.com/davidtvs/pytorch-lr-finder

Collecting git+https://github.com/davidtvs/pytorch-lr-finder
  Cloning https://github.com/davidtvs/pytorch-lr-finder to /tmp/pip-req-build-bies1fy1
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/a_yaroshevich/anaconda3/envs/rnd/lib/python3.6/tokenize.py", line 452, in open
        buffer = _builtin_open(filename, 'rb')
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-req-build-bies1fy1/setup.py'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-bies1fy1/

For it to work you need to create a setup.py and list your dependencies there; plus, the modules need to be correct.

RuntimeError: Expected object of scalar type Long but got scalar type Byte for argument #2 'target'

With optimizer = torch.optim.Adam( model.parameters(), lr = learning_rate, weight_decay = weight_decay)

criterion = nn.CrossEntropyLoss( weight = None, ignore_index = ignore_index, reduce = False)

and then executing

lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(dataLoader['train'], end_lr=100, num_iter=100)
lr_finder.plot()  # to inspect the loss-learning rate graph
lr_finder.reset()  # to reset the model and optimizer to their initial state

I am getting the error,

RuntimeError: Expected object of scalar type Long but got scalar type Byte for argument #2 'target'

Please find below the whole trace.
So far training my models with the above optimizer and criterion I do not have any problem.

[screenshot: full error traceback]

TypeError: forward() missing 1 required positional argument: 'labels'

I've been following and making all the necessary changes required to run the lr_finder.range_test(). However, I'm still facing this error!
Here's my code defining the Dataset class:


class HappyWhaleDataset(Dataset):
    def __init__(self, df, transforms=None):
        self.df = df
        self.file_names = df['file_path'].values
        self.labels = df['individual_id'].values
        self.transforms = transforms
        
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, index):
        img_path = self.file_names[index]
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        label = self.labels[index]
        
        if self.transforms:
            img = self.transforms(image=img)["image"]
            
        return {
            'image': img,
            'label': torch.tensor(label, dtype=torch.long)
        }


def prepare_loaders(df, fold):
    df_train = df[df.kfold != fold].reset_index(drop=True)
    df_valid = df[df.kfold == fold].reset_index(drop=True)
    
    train_dataset = HappyWhaleDataset(df_train, transforms=data_transforms["train"])
    valid_dataset = HappyWhaleDataset(df_valid, transforms=data_transforms["valid"])

    train_loader = DataLoader(train_dataset, batch_size=CONFIG['train_batch_size'], 
                              num_workers=2, shuffle=True, pin_memory=True, drop_last=True)
    valid_loader = DataLoader(valid_dataset, batch_size=CONFIG['valid_batch_size'], 
                              num_workers=2, shuffle=False, pin_memory=True)
    
    return train_loader, valid_loader

train_loader, valid_loader = prepare_loaders(df, fold=0)

Note: Model training goes without error when I'm just creating a usual train_loader with the above code.

class CustomTrainIter(TrainDataLoaderIter):
    def inputs_labels_from_batch(self, batch_data):
        return batch_data["image"], batch_data["label"]
    
custom_loader = CustomTrainIter(train_loader)

lr_finder = LRFinder(model, optimizer, criterion, device=CONFIG['device'])
lr_finder.range_test(custom_loader, end_lr=1, num_iter=100, step_mode="linear")
lr_finder.plot(log_lr=False)
lr_finder.reset()
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_34/1446799792.py in <module>
      6 
      7 lr_finder = LRFinder(model, optimizer, criterion, device=CONFIG['device'])
----> 8 lr_finder.range_test(custom_loader, end_lr=1, num_iter=100, step_mode="linear")
      9 lr_finder.plot(log_lr=False)
     10 lr_finder.reset()

/opt/conda/lib/python3.7/site-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    318                 train_iter,
    319                 accumulation_steps,
--> 320                 non_blocking_transfer=non_blocking_transfer,
    321             )
    322             if val_loader:

/opt/conda/lib/python3.7/site-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    375 
    376             # Forward pass
--> 377             outputs = self.model(inputs)
    378             loss = self.criterion(outputs, labels)
    379 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

TypeError: forward() missing 1 required positional argument: 'labels'

'dict' object has no attribute 'param_groups'

I am facing the following error, any suggestions?

py3.8.egg/torch_lr_finder/lr_finder.py", line 361, in _check_for_scheduler
AttributeError: 'dict' object has no attribute 'param_groups'

The code is a simple one

        lr_finder = LRFinder(models, optimizers, criterion, device="cuda")
        lr_finder.range_test(train_loader, end_lr=100, num_iter=100, step_mode='exp')
        lr_finder.plot(log_lr=False) # to inspect the loss-learning rate graph
        lr_finder.reset()

Multi-output regression problem

Hi,

My model has several outputs from the forward method:

def forward(self, x):
       ---code---
       return ClCd, angle

This returns a tuple, which LR finder does not like. I get the following error message:

if not (target.size() == input.size()):
AttributeError: 'tuple' object has no attribute 'size'

Is there a way for LR finder to work with tuples?
Alternatively, should I be structuring the output from my forward method differently (i.e. using a single output tensor)? I tried outputting a single tensor with two columns from my forward method (each column representing an output), but this gave significantly worse results in training.
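
One way to keep the two-output forward() (a sketch, assuming purely for illustration that the target tensor has shape (batch, 2) with the ClCd target in column 0 and the angle target in column 1) is to wrap the criterion so it consumes the tuple:

import torch.nn as nn

class TupleOutputLoss(nn.Module):
    """Compare a (ClCd, angle) tuple of predictions against a (batch, 2) target."""
    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss()

    def forward(self, outputs, target):
        clcd_pred, angle_pred = outputs
        # Adjust the indexing/shapes here to match the actual output and target layout
        return self.mse(clcd_pred, target[:, 0]) + self.mse(angle_pred, target[:, 1])

# lr_finder = LRFinder(model, optimizer, TupleOutputLoss(), device="cuda")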

Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

I sent the model to cuda and froze certain layers:

  model = prep_model(args)
  model.cuda()
  freeze_layers(model, [True, True, False])

Then I did:

lr_finder = AccumulationLRFinder(
            model, run.optimizer, criterion, 
            accumulation_steps=accumulation_steps
        )
lr_finder.range_test(train_loader,end_lr=10, num_iter=100, step_mode="exp")
lr_finder.plot()
lr_finder.reset()

Note that when I don't send the model to the GPU and instead add the device parameter to lr_finder, it works. My question is: why isn't the input being sent to the GPU, given that is the device the model is on?

I'm getting a blank graph

I'm running a semantic segmentation model, Deeplabv3+, with a modified CrossEntropyLoss and either an SGD or Adam optimizer.
When I run the LRFinder, I get a blank graph with no losses shown, even though I printed the losses and the criterion is definitely returning valid values.

Sweeping across start_lr = 1e-07 and end_lr = 0.0001
  0%|                                                                                                                          | 0/10 [00:00<?, ?it/s]
loss:  tensor(89984., device='cuda:0', grad_fn=<DivBackward0>)
 10%|███████████▍                                                                                                      | 1/10 [00:06<00:54,  6.01s/it]
loss:  tensor(1588043.6250, device='cuda:0', grad_fn=<DivBackward0>)
 20%|██████████████████████▊                                                                                           | 2/10 [00:09<00:40,  5.12s/it]
loss:  tensor(420687.0938, device='cuda:0', grad_fn=<DivBackward0>)
 30%|██████████████████████████████████▏                                                                               | 3/10 [00:12<00:31,  4.50s/it]
loss:  tensor(653955.4375, device='cuda:0', grad_fn=<DivBackward0>)
 40%|█████████████████████████████████████████████▌                                                                    | 4/10 [00:15<00:24,  4.07s/it]
loss:  tensor(141592.6875, device='cuda:0', grad_fn=<DivBackward0>)
 50%|█████████████████████████████████████████████████████████                                                         | 5/10 [00:18<00:18,  3.76s/it]
loss:  tensor(97450.2891, device='cuda:0', grad_fn=<DivBackward0>)
 60%|████████████████████████████████████████████████████████████████████▍                                             | 6/10 [00:21<00:14,  3.55s/it]
loss:  tensor(160497.9375, device='cuda:0', grad_fn=<DivBackward0>)
 70%|███████████████████████████████████████████████████████████████████████████████▊                                  | 7/10 [00:24<00:10,  3.44s/it]
loss:  tensor(151121.3594, device='cuda:0', grad_fn=<DivBackward0>)
 80%|███████████████████████████████████████████████████████████████████████████████████████████▏                      | 8/10 [00:27<00:06,  3.38s/it]
loss:  tensor(123211.6484, device='cuda:0', grad_fn=<DivBackward0>)
 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████▌           | 9/10 [00:31<00:03,  3.40s/it]
loss:  tensor(98576.7578, device='cuda:0', grad_fn=<DivBackward0>)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:34<00:00,  3.43s/it]
Learning rate search finished. See the graph with {finder_name}.plot()

Lemme know what other details I can attach.

My criterion:

def cross_entropy2d(logit, target, ignore_index=255, weight=None, batch_average=True):
    """
    The loss is

    .. math::
        \sum_{i=1}^{\\infty} x_{i}

        `(minibatch, C, d_1, d_2, ..., d_K)`

    Args:
        logit (Tensor): Output of network
        target (Tensor): Ground Truth
        ignore_index (int, optional): Defaults to 255. The pixels with this labels do not contribute to loss
        weight (List, optional): Defaults to None. Weight assigned to each class
        batch_average (bool, optional): Defaults to True. Whether to consider the loss of each element in the batch.

    Returns:
        Float: The value of loss.
    """

    n, c, h, w = logit.shape
    target = target.squeeze(1)

    if weight is None:
        criterion = nn.CrossEntropyLoss(weight=weight, ignore_index=ignore_index, reduction='sum')
    else:
        criterion = nn.CrossEntropyLoss(weight=torch.tensor(weight, dtype=torch.float32),
                                        ignore_index=ignore_index,
                                        reduction='sum')

    loss = criterion(logit, target.long())

    if batch_average:
        loss /= n

    return loss

Help with lr-finder working with transformers?

I am in need of a tool like this for a particular problem that is very sensitive to the LR. I am, however, unable to get this package to work with any transformer model unfortunately.

My error is as below and I am wondering if you have any insight!

from torch_lr_finder import LRFinder
import torch.optim as optim
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification
model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=3).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-decc9b6c423b> in <module>
----> 1 lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")

~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    284                 train_iter,
    285                 accumulation_steps,
--> 286                 non_blocking_transfer=non_blocking_transfer,
    287             )
    288             if val_loader:

~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    342             # Forward pass
    343             outputs = self.model(inputs)
--> 344             loss = self.criterion(outputs, labels)
    345 
    346             # Loss should be averaged in each step

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    724             result = self._slow_forward(*input, **kwargs)
    725         else:
--> 726             result = self.forward(*input, **kwargs)
    727         for hook in itertools.chain(
    728                 _global_forward_hooks.values(),

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
    946     def forward(self, input: Tensor, target: Tensor) -> Tensor:
    947         return F.cross_entropy(input, target, weight=self.weight,
--> 948                                ignore_index=self.ignore_index, reduction=self.reduction)
    949 
    950 

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   2420     if size_average is not None or reduce is not None:
   2421         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2422     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   2423 
   2424 

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in log_softmax(input, dim, _stacklevel, dtype)
   1589         dim = _get_softmax_dim('log_softmax', input.dim(), _stacklevel)
   1590     if dtype is None:
-> 1591         ret = input.log_softmax(dim)
   1592     else:
   1593         ret = input.log_softmax(dim, dtype=dtype)

AttributeError: 'tuple' object has no attribute 'log_softmax'

How to use w/ LSTM

Hi,

I would like to use the lr-finder with an LSTM. In my training loop I do:

for epoch in range(100):
    model.train()
    hidden = model.init_hidden(batch_size)
    total_loss = 0

    for data, target in dataloader:
        hidden = repackage_hidden(hidden)
        output, hidden = model(data, hidden)
        loss = loss_fn(output, target.view(-1))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

My dataloader yields x, y where x is a sequence and y is the next step in the sequence (think language model), e.g.

data = [1, 2, 3, 4]
target = [2, 3, 4, 5]

Now when I try to do:

lr_finder.range_test(dataloader, end_lr=100, num_iter=100)

... I get the following error:

TypeError: forward() missing 1 required positional argument: 'hidden'

How can I pass hidden to the model using lr-finder?
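
A common workaround (a sketch, not an official feature) is to wrap the recurrent model so the hidden state is created inside forward(), which lets LRFinder call the model with a single input; this assumes batch-first data, so adjust the init_hidden argument to your layout:

import torch.nn as nn
from torch_lr_finder import LRFinder

class LSTMWrapper(nn.Module):
    """Create a fresh hidden state for every batch so LRFinder can call model(data)."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, data):
        hidden = self.model.init_hidden(data.size(0))
        output, _ = self.model(data, hidden)
        return output

lr_finder = LRFinder(LSTMWrapper(model), optimizer, loss_fn)
lr_finder.range_test(dataloader, end_lr=100, num_iter=100)

Note that LRFinder will call the criterion with the raw target from the DataLoader, so any reshaping such as target.view(-1) would need to move into the criterion or the dataset.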

Cannot determine `batch_size` from a list of string while running `range_test()` with `val_loader`

Hey @davidtvs, this issue is found while I was writing an example for utilizing this package with huggingface/transformers for #55 .

Condition

  • Input data: list of string (Dataset returns string)
  • Running range_test() with val_loader

Error message

---> 10 lr_finder.range_test(train_loader, val_loader=valid_loader, start_lr=1e-5, end_lr=10, num_iter=100, step_mode='linear')

1 frames

/usr/local/lib/python3.6/dist-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    288             if val_loader:
    289                 loss = self._validate(
--> 290                     val_iter, non_blocking_transfer=non_blocking_transfer
    291                 )
    292 

/usr/local/lib/python3.6/dist-packages/torch_lr_finder/lr_finder.py in _validate(self, val_iter, non_blocking_transfer)
    398 
    399                 if isinstance(inputs, tuple) or isinstance(inputs, list):
--> 400                     batch_size = inputs[0].size(0)
    401                 else:
    402                     batch_size = inputs.size(0)

AttributeError: 'str' object has no attribute 'size'

Description

In the current implementation, batch_size is determined dynamically according to the shape of inputs in LRFinder._validate(). (v0.2.0) L399-L402 will work normally only when the given inputs is a torch.Tensor, and that's why it fails when inputs is a list of strings.

Maybe it's not a usual case that a Dataset returns non-torch.Tensor values, but I think it would be easier to get the value from DataLoader.batch_size, since LRFinder._validate() iterates over a val_loader anyway.

Hence I proposed a fix for this in that notebook: it simply adds a line batch_size = val_iter.data_loader.batch_size before entering the loop and removes the if-else statement; you can check it out here.

But I'm having doubts about adding a property batch_size in DataLoaderIter, e.g.

class DataLoaderIter(object):
    # ...
    @property
    def batch_size(self):
        return self.data_loader.batch_size

With this property, proposed fix can be simplified a little into this:

class LRFinder(object):
    def _validate(self, val_iter, non_blocking_transfer=True):
        # Set model to evaluation mode and disable gradient computation
        running_loss = 0
        self.model.eval()

        with torch.no_grad():
            for inputs, labels in val_iter:
                # Move data to the correct device
                inputs, labels = self._move_to_device(
                    inputs, labels, non_blocking=non_blocking_transfer
                )

                # Forward pass and loss computation
                outputs = self.model(inputs)
                loss = self.criterion(outputs, labels)
                running_loss += loss.item() * val_iter.batch_size

        return running_loss / len(val_iter.dataset)

What do you think of it?

Is apex a must have or not?

In your README file apex seems to be an optional requirement.

However, at import time an annoying warning message telling me that I don't have this module installed keeps polluting my log.

I tried to reinstall the package using the recommended command and nothing changed at all. I also tried to use the warnings library to ignore the warning, but the message is emitted via the Python logging library and I can't remove it this way.

plot not showing anything

I tried to use the package, but when plotting the learning curve it doesn't show anything: just a plot with the labels but no curve.

ValueError ValDataLoaderIter next() call missing

In the _validate function, you try to iterate through the elements of ValDataLoaderIter with a simple for loop:

for inputs, labels in val_iter:

This throws a 'ValueError: too many values to unpack' because it is attempting to unpack the entire dataloader, which has way more than two elements. I think what you want here is just the next element in val_iter, similarly to your _train_batch function, so a loop over the length of val_iter with a call to next(val_iter) on every iteration, or an enumerate(val_iter), no?

No Data Loader?

Is it possible to make it also compatible when there's no dataloader? My dataset is fully loaded in memory.
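
If the data already lives in memory as tensors, wrapping it in a TensorDataset and DataLoader is usually enough; a minimal sketch with hypothetical x and y tensors:

from torch.utils.data import TensorDataset, DataLoader

# x: (num_samples, ...) inputs, y: (num_samples, ...) targets, both already in memory
dataset = TensorDataset(x, y)
trainloader = DataLoader(dataset, batch_size=32, shuffle=True)

lr_finder.range_test(trainloader, end_lr=10, num_iter=100)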

Plot not showing up

I'm seeing a suggested learning rate, but no plot when calling .plot() with all default arguments.
