
skipnet's People

Contributors

fyu, richardliaw, xinw1012


skipnet's Issues

Typo in forward of ResNetRecurrentGateSP

There is a typo in the forward method of ResNetRecurrentGateSP in the CIFAR models (it could be elsewhere as well; I haven't checked): grob is used instead of gprob. The wrong values are returned all the way up to the top of the training loop, but thankfully they are not used there.

    for g in range(3):
        for i in range(0 + int(g == 0), self.num_layers[g]):
            ...
            mask, grob = self.control(gate_feature)  # typo: `grob` should be `gprob`
            gprobs.append(gprob)                     # so the value appended here is stale
            ...

About the inference speed

Hi, thank you for your work. I ran into a problem and hope you can help.
I first trained a model with the supervised pre-training (SP) part of SkipNet, but when I benchmarked the forward pass in PyTorch, the speed did not improve. I then tried increasing the parameter at https://github.com/ucbdrive/skipnet/blob/master/imagenet/models.py#L278 from 0.5 to 0.9; the accuracy on the test set did drop considerably, but the forward speed stayed almost the same. Do I need to modify the model for inference?
I am very confused about where the problem is, and I would greatly appreciate your help.
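
A possible explanation for the unchanged timing (my own note, not the authors' answer): if the gates are applied as soft masks, the skipped blocks are still executed and merely zeroed out, so batched inference does the same amount of work regardless of the threshold. A minimal sketch of the two styles, with `block` and the gate values as hypothetical stand-ins:

    import torch

    def gated_residual_soft(x, block, mask):
        # Soft gating: block(x) is computed for every input and then blended
        # with the identity path, so the FLOPs never change.
        return mask * block(x) + (1.0 - mask) * x

    def gated_residual_hard(x, block, gate_open):
        # Hard gating: block(x) is only computed when the gate fires. This
        # saves wall-clock time only when inputs are processed one by one
        # (or regrouped), since samples in a batch can take different paths.
        return block(x) if gate_open else x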

Confused about version of PyTorch

Hi,

Thanks for the excellent research and the readable code.

I am trying to run this repo with PyTorch 1.0 and am confused about the version number.
The README says that PyTorch 2.0 is needed, but the newest stable version of PyTorch is 1.0.1. Why, and what is PyTorch 2.0?

Meanwhile, I still get an error about multinomial() in RLFeedforwardGateI.
I checked the PyTorch docs from 0.1.0 to 0.4.1; the multinomial() function needs a parameter called num_samples, which is not optional.
But the code in RLFeedforwardGateI is written like this:

    if self.training:
        action = softmax.multinomial()
        self.saved_action = action

without any parameter, and I got an error:

    TypeError: multinomial() missing 1 required positional arguments: "num_samples"

I guess it should be:

        action = softmax.multinomial(num_samples=1)

Thanks in advance.
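
As an aside, on PyTorch 0.4 and later the usual replacement for this old stochastic-function sampling is torch.distributions; a minimal sketch (my own suggestion, not code from this repo):

    import torch
    from torch.distributions import Categorical

    probs = torch.tensor([[0.3, 0.7]])  # hypothetical gate probabilities
    dist = Categorical(probs=probs)
    action = dist.sample()              # replaces softmax.multinomial()
    log_prob = dist.log_prob(action)    # used later in the REINFORCE loss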

Bug when training cifar10_rnn_gate_rl_38

Excuse me, I encountered a bug. Can you run this command normally?

    python3 train_rl.py train cifar10_rnn_gate_rl_38 --resume resnet-38-rnn-sp-cifar10.pth.tar -d cifar10 --gate-type rnn

The traceback is listed below.
    Traceback (most recent call last):
      File "train_rl.py", line 492, in <module>
        main()
      File "train_rl.py", line 121, in main
        run_training(args)
      File "train_rl.py", line 217, in run_training
        output, masks, probs = model(input_var)
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
        output.reraise()
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
        raise self.exc_type(msg)
    TypeError: Caught TypeError in replica 0 on device 0.
    Original Traceback (most recent call last):
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
        output = module(*input, **kwargs)
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/wym/skipnet-master/cifar/models.py", line 1243, in forward
        mask, gprob = self.control(gate_feature)
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/wym/skipnet-master/cifar/models.py", line 1136, in forward
        action = bi_prob.multinomial()
    TypeError: multinomial() missing 1 required positional arguments: "num_samples"
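
This looks like the same multinomial() signature issue as in the previous report; presumably the call at cifar/models.py line 1136 needs an explicit sample count as well:

    action = bi_prob.multinomial(num_samples=1)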

NaN is encountered when training imagenet_rnn_gate_rl_50

I have met a problem when training imagenet_rnn_gate_rl_50 from the provided pretrained SP model:

    10-26-18 02:30:Epoch: [4][4010/5004] Time 1.793 (1.839) Data 0.000 (0.003) Loss nan (nan) Total rewards nan (nan) Prec@1 0.391 (75.964) Prec@5 1.953 (91.181)
    10-26-18 02:30:total gate rewards = 2.560
    10-26-18 02:30:*** Computation Percentage: 97.532 %
    10-26-18 02:30:Epoch: [4][4020/5004] Time 1.754 (1.838) Data 0.000 (0.003) Loss nan (nan) Total rewards nan (nan) Prec@1 0.000 (75.775) Prec@5 0.000 (90.954)

I didn't change any of the default configuration, yet the loss and the rewards became NaN. Is there anything else that needs attention? Please help.

Thanks,
Willy

RuntimeError is encountered when training cifar10_rnn_gate_rl_38

I get a RuntimeError when training cifar10_rnn_gate_rl_38:

    04-11-19 09:10:start training cifar10_rnn_gate_rl_38
    04-11-19 09:10:=> loading checkpoint ./save_checkpoints/cifar10_rnn_gate_38/model_best.pth.tar
    04-11-19 09:10:=> loaded checkpoint ./save_checkpoints/cifar10_rnn_gate_38/model_best.pth.tar (iter: 59000)
    Files already downloaded and verified
    Files already downloaded and verified
    start: 0
    04-11-19 09:10:Iter [0] learning rate = 0.0001
    Traceback (most recent call last):
      File "train_rl.py", line 492, in <module>
        main()
      File "train_rl.py", line 121, in main
        run_training(args)
      File "train_rl.py", line 235, in run_training
        R = r + args.gamma * R
      File "/seu_share/home/zhanjun/anaconda3/envs/pytorch0.2/lib/python3.6/site-packages/torch/tensor.py", line 293, in __add__
        return self.add(other)
    RuntimeError: invalid argument 3: sizes do not match at /pytorch/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:217

I didn't change any of the default configuration, please help. Thanks.
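
For anyone hitting this: the recursion R = r + args.gamma * R only works if every per-step reward r has the same shape as the running return R, so the size mismatch suggests the two tensors disagree. A minimal sketch of the discounted-return accumulation under that assumption (illustrative only, with made-up shapes):

    import torch

    gamma = 0.99
    # Hypothetical per-gate rewards, one (batch,)-shaped tensor per gate.
    rewards = [torch.randn(8) for _ in range(5)]

    R = torch.zeros(8)           # running return, same shape as each reward
    returns = []
    for r in reversed(rewards):  # accumulate from the last gate backwards
        R = r + gamma * R        # shapes must match, or this line raises
        returns.insert(0, R)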

cifar10 rl training

Hi,

I have met a problem with CIFAR-10 training.
When training feedforward_rl_38 and rnn_gate_rl_38, it fails at line 230 with 'RuntimeError: differentiating stochastic functions requires providing a reward'. Do you have any idea about this issue?

Thank you
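
For context: on PyTorch 0.3 and earlier, a Variable produced by multinomial() is a stochastic node, and calling backward() raises exactly this error unless reinforce() is called on the sampled action first. A rough sketch of that old-style pattern (illustrative, assuming PyTorch <= 0.3 semantics; compute_reward is a hypothetical placeholder):

    from torch import autograd

    action = probs.multinomial(num_samples=1)  # stochastic node (old autograd)
    reward = compute_reward(action)            # hypothetical per-sample reward
    action.reinforce(reward)                   # attach the reward to the node
    autograd.backward([action], [None])        # no longer raises the error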

Some question about RL

Thanks for your work. I encountered some problems while reproducing SkipNet; can you give me some advice? After supervised learning of the ResNet (with gates), the accuracy declines sharply during reinforcement learning. When I debug the code, I find that the gate sampling (which the policy gradient requires) has a bad influence on all the BatchNorm layers and on the backward parameter updates. How do you solve or avoid these problems?
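
A common mitigation (my own suggestion, not something confirmed by the authors here) is to freeze the BatchNorm layers during the RL stage so that the randomly sampled gates cannot corrupt their running statistics; a minimal sketch:

    import torch.nn as nn

    def freeze_batchnorm(model):
        # Put every BatchNorm layer into eval mode so its running mean/var
        # stop updating, and stop training its affine parameters.
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.eval()
                if m.affine:
                    m.weight.requires_grad_(False)
                    m.bias.requires_grad_(False)

    # Re-apply after every model.train() call, since train() resets BN.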

Accuracy of ResNets baseline model on cifar100?

I'm confused about the accuracy of the baseline models on CIFAR-100. In the paper, the validation top-1 accuracy is 68.54 for ResNet-38, 70.64 for ResNet-74, and 71.21 for ResNet-110. I followed the settings in the paper, but I get quite different results: 71.11 for ResNet-38, 74.26 for ResNet-74, and 76.34 for ResNet-110.
