pytorch-retraining's Introduction

pytorch-retraining

Transfer Learning shootout for PyTorch's model zoo (torchvision).

  • Load any pretrained model with a custom final layer (num_classes) from PyTorch's model zoo in one line
model_pretrained, diff = load_model_merged('inception_v3', num_classes)
  • Retrain a minimal set of layers (as inferred on load) or a custom number of layers on multiple GPUs, optionally with a Cyclical Learning Rate (Smith 2017).
final_param_names = [d[0] for d in diff]
stats = train_eval(model_pretrained, trainloader, testloader, final_param_names)
  • Chart training_time, evaluation_time (fps), top-1 accuracy for varying levels of retraining depth (shallow, deep and from scratch)
[Chart] Transfer learning on the example dataset Bee vs Ants with 2x V100 GPUs
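
The Cyclical Learning Rate option mentioned above can be illustrated with the triangular policy from Smith (2017). This is only a minimal sketch of that policy, not necessarily how retrain.py implements it; the function name and the parameters base_lr, max_lr and step_size are assumptions chosen for illustration.

```python
import math


def triangular_clr(iteration, base_lr, max_lr, step_size):
    """Learning rate at a given iteration under the triangular policy.

    The rate rises linearly from base_lr to max_lr over step_size
    iterations, then falls back to base_lr over the next step_size.
    All names here are illustrative assumptions.
    """
    # Index of the current cycle (1-based); one cycle spans 2 * step_size.
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the peak of the current cycle, scaled to [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


if __name__ == "__main__":
    # Print one full cycle with a half-cycle length of 4 iterations.
    for it in range(9):
        print(it, triangular_clr(it, base_lr=0.001, max_lr=0.006, step_size=4))
```

In a training loop, the returned value would be written to the optimizer's learning rate before each batch. Smith suggests a step_size of a few epochs' worth of iterations, but the best value depends on the dataset.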

Results on a more elaborate dataset

num_classes = 23, slightly unbalanced, high variance in rotation and motion blur artifacts, with 1x GTX 1080 Ti

[Chart] Constant LR with momentum
[Chart] Cyclical Learning Rate

pytorch-retraining's People

Contributors

ahirner

pytorch-retraining's Issues

CUDA running out of memory

Hi

Thanks for this wonderful script. It is really helpful when testing various models!
I am running out of GPU memory. I know this is not exactly a bug; it is a CUDA memory issue.

Is there any way to reduce GPU memory usage? I only have 2 GB on my GeForce GTX 1050.

It only happens when training from scratch and when training deep.

This is the error:

[29, 30] loss: nan [0.0044375000000000005]
[30, 30] loss: nan [0.0043333333333333392]
[31, 30] loss: nan [0.0011041666666666609]
[32, 30] loss: nan [0.0041250000000000002]
Finished Training
Evaluating...
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "retrain.py", line 380, in <module>
    CLR=use_clr)
  File "retrain.py", line 322, in train_eval
    stats_eval = evaluate_stats(net, testloader)
  File "retrain.py", line 304, in evaluate_stats
    outputs = net(Variable(images))
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 58, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib/python3.6/site-packages/torchvision/models/inception.py", line 81, in forward
    x = self.Conv2d_2b_3x3(x)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib/python3.6/site-packages/torchvision/models/inception.py", line 325, in forward
    x = self.bn(x)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 37, in forward
    self.training, self.momentum, self.eps)
  File "/usr/lib64/python3.6/site-packages/torch/nn/functional.py", line 639, in batch_norm
    return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:66
[tomppa@localhost pytorch-retraining]$

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.22                 Driver Version: 387.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0  On |                  N/A |
| 54%   58C    P0    N/A /  75W |   1942MiB /  1998MiB |     84%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1405      G   /usr/libexec/Xorg                             18MiB |
|    0      1444      G   /usr/bin/gnome-shell                          42MiB |
|    0      1776      G   /usr/libexec/Xorg                            114MiB |
|    0      1870      G   /usr/bin/gnome-shell                          87MiB |
|    0      6652      G   gnome-control-center                           1MiB |
|    0      7139      C   python3                                     1665MiB |
+-----------------------------------------------------------------------------+

CUDA version:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

Densenet not iterable

Densenets are giving errors:

Targeting densenet201 with 2 classes

Traceback (most recent call last):
  File "retrain.py", line 340, in <module>
    model_pretrained, diff = load_model_merged(name, num_classes)
TypeError: 'DenseNet' object is not iterable

num_batches_tracked error

In PyTorch 0.4.1 and later, BatchNorm layers have a num_batches_tracked buffer that the model_zoo checkpoints do not contain, which causes errors in diff_states.
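
One possible workaround is to ignore the num_batches_tracked entries before comparing the two state dicts. The sketch below is an assumption about how such a comparison could look, not the repository's actual diff_states; the helper names drop_tracked, shape_of and diff_names are hypothetical, and plain tuples stand in for tensor shapes so that it runs without PyTorch installed.

```python
from collections import OrderedDict

# Suffix of the per-layer BatchNorm buffer that older checkpoints lack.
TRACKED_SUFFIX = "num_batches_tracked"


def drop_tracked(state):
    """Copy a state-dict-like mapping without num_batches_tracked entries."""
    return OrderedDict(
        (name, value)
        for name, value in state.items()
        if not name.endswith(TRACKED_SUFFIX)
    )


def shape_of(value):
    # Tensors expose .shape; in this demo a plain tuple is the shape itself.
    return tuple(getattr(value, "shape", value))


def diff_names(new_state, pretrained_state):
    """Names in new_state that are missing from pretrained_state
    or whose shapes differ (hypothetical stand-in for diff_states)."""
    new_state = drop_tracked(new_state)
    pretrained_state = drop_tracked(pretrained_state)
    return [
        name
        for name, value in new_state.items()
        if name not in pretrained_state
        or shape_of(value) != shape_of(pretrained_state[name])
    ]


# Made-up layer names and shapes, for illustration only.
new_model = OrderedDict([
    ("bn1.weight", (64,)),
    ("bn1.num_batches_tracked", ()),
    ("fc.weight", (2, 512)),
    ("fc.bias", (2,)),
])
zoo_checkpoint = OrderedDict([
    ("bn1.weight", (64,)),
    ("fc.weight", (1000, 512)),
    ("fc.bias", (1000,)),
])

print(diff_names(new_model, zoo_checkpoint))  # ['fc.weight', 'fc.bias']
```

With real models, the same functions should accept the result of state_dict() directly, since tensors expose a shape attribute. Only the replaced final layer is then reported as different.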

For own data.

I have a dataset consisting of 20 classes. How can I use this code with my dataset? Thanks in advance.

Train-
    class1
        1.jpg
    class2
        1.jpg

Test-
    class1
        1.jpg
    class2
        1.jpg
