
skipnet's People

Contributors

fyu, richardliaw, xinw1012


skipnet's Issues

Typo in forward of ResNetRecurrentGateSP

There is a typo in the forward method of ResNetRecurrentGateSP in the CIFAR models (it could be elsewhere as well; I haven't checked): grob is used instead of gprob. The wrong values are returned all the way up to the top of the training loop, but thankfully they are not used there.

    for g in range(3):
        for i in range(0 + int(g == 0), self.num_layers[g]):
            ...
            mask, grob = self.control(gate_feature)  # typo: `grob` should be `gprob`
            gprobs.append(gprob)                     # so the value appended here is stale
            ...

About the inference speed

Hi, thank you for your work. I ran into a problem and hope you can help.
I first trained a model with the supervised pre-training (SP) part of SkipNet, but when I benchmarked the forward pass in PyTorch, the speed did not improve. I then tried increasing the parameter at https://github.com/ucbdrive/skipnet/blob/master/imagenet/models.py#L278 from 0.5 to 0.9; the accuracy on the test set did drop considerably, but the forward speed stayed almost the same. Do I need to modify the model for inference?
I am very confused about where the problem is, and I would greatly appreciate your help.
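
A possible explanation for the unchanged timing (my own note, not the authors' answer): if the gates are applied as soft masks, the skipped blocks are still executed and merely zeroed out, so batched inference does the same amount of work regardless of the threshold. A minimal sketch of the two styles, with `block` and the gate values as hypothetical stand-ins:

    import torch

    def gated_residual_soft(x, block, mask):
        # Soft gating: block(x) is computed for every input and then blended
        # with the identity path, so the FLOPs never change.
        return mask * block(x) + (1.0 - mask) * x

    def gated_residual_hard(x, block, gate_open):
        # Hard gating: block(x) is only computed when the gate fires. This
        # saves wall-clock time only when inputs are processed one by one
        # (or regrouped), since samples in a batch can take different paths.
        return block(x) if gate_open else x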

Confused about version of PyTorch

Hi,

Thanks for the excellent research and the readable code.

I am trying to run this repo with PyTorch 1.0 and am confused about the version number.
The README says that PyTorch 2.0 is needed, but the newest stable version of PyTorch is 1.0.1. Why, and what is PyTorch 2.0?

Meanwhile, I still get an error about multinomial() in RLFeedforwardGateI.
I checked the PyTorch docs from 0.1.0 to 0.4.1; the multinomial() function needs a parameter called num_samples, which is not optional.
But the code in RLFeedforwardGateI is written like this:

    if self.training:
        action = softmax.multinomial()
        self.saved_action = action

without any parameter, and I got an error:

    TypeError: multinomial() missing 1 required positional arguments: "num_samples"

I guess it should be:

        action = softmax.multinomial(num_samples=1)

Thanks in advance.
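
As an aside, on PyTorch 0.4 and later the usual replacement for this old stochastic-function sampling is torch.distributions; a minimal sketch (my own suggestion, not code from this repo):

    import torch
    from torch.distributions import Categorical

    probs = torch.tensor([[0.3, 0.7]])  # hypothetical gate probabilities
    dist = Categorical(probs=probs)
    action = dist.sample()              # replaces softmax.multinomial()
    log_prob = dist.log_prob(action)    # used later in the REINFORCE loss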

Bug when training cifar10_rnn_gate_rl_38

Excuse me, I encountered a bug. Can you run this command normally?

    python3 train_rl.py train cifar10_rnn_gate_rl_38 --resume resnet-38-rnn-sp-cifar10.pth.tar -d cifar10 --gate-type rnn

The traceback is listed below.
    Traceback (most recent call last):
      File "train_rl.py", line 492, in <module>
        main()
      File "train_rl.py", line 121, in main
        run_training(args)
      File "train_rl.py", line 217, in run_training
        output, masks, probs = model(input_var)
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
        output.reraise()
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
        raise self.exc_type(msg)
    TypeError: Caught TypeError in replica 0 on device 0.
    Original Traceback (most recent call last):
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
        output = module(*input, **kwargs)
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/wym/skipnet-master/cifar/models.py", line 1243, in forward
        mask, gprob = self.control(gate_feature)
      File "/home/wym/anaconda3/envs/python_auto/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/wym/skipnet-master/cifar/models.py", line 1136, in forward
        action = bi_prob.multinomial()
    TypeError: multinomial() missing 1 required positional arguments: "num_samples"
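
This looks like the same multinomial() signature issue as in the previous report; presumably the call at cifar/models.py line 1136 needs an explicit sample count as well:

    action = bi_prob.multinomial(num_samples=1)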

NaN is encountered when training imagenet_rnn_gate_rl_50

I have met a problem when training imagenet_rnn_gate_rl_50 from the provided pretrained SP model:

    10-26-18 02:30:Epoch: [4][4010/5004] Time 1.793 (1.839) Data 0.000 (0.003) Loss nan (nan) Total rewards nan (nan) Prec@1 0.391 (75.964) Prec@5 1.953 (91.181)
    10-26-18 02:30:total gate rewards = 2.560
    10-26-18 02:30:*** Computation Percentage: 97.532 %
    10-26-18 02:30:Epoch: [4][4020/5004] Time 1.754 (1.838) Data 0.000 (0.003) Loss nan (nan) Total rewards nan (nan) Prec@1 0.000 (75.775) Prec@5 0.000 (90.954)

I didn't change any of the default configuration, yet the loss and the rewards became NaN. Is there anything else that needs attention? Please help.

Thanks,
Willy

RuntimeError is encountered when training cifar10_rnn_gate_rl_38

I get a RuntimeError when training cifar10_rnn_gate_rl_38:

    04-11-19 09:10:start training cifar10_rnn_gate_rl_38
    04-11-19 09:10:=> loading checkpoint ./save_checkpoints/cifar10_rnn_gate_38/model_best.pth.tar
    04-11-19 09:10:=> loaded checkpoint ./save_checkpoints/cifar10_rnn_gate_38/model_best.pth.tar (iter: 59000)
    Files already downloaded and verified
    Files already downloaded and verified
    start: 0
    04-11-19 09:10:Iter [0] learning rate = 0.0001
    Traceback (most recent call last):
      File "train_rl.py", line 492, in <module>
        main()
      File "train_rl.py", line 121, in main
        run_training(args)
      File "train_rl.py", line 235, in run_training
        R = r + args.gamma * R
      File "/seu_share/home/zhanjun/anaconda3/envs/pytorch0.2/lib/python3.6/site-packages/torch/tensor.py", line 293, in __add__
        return self.add(other)
    RuntimeError: invalid argument 3: sizes do not match at /pytorch/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:217

I didn't change any of the default configuration, please help. Thanks.
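
For anyone hitting this: the recursion R = r + args.gamma * R only works if every per-step reward r has the same shape as the running return R, so the size mismatch suggests the two tensors disagree. A minimal sketch of the discounted-return accumulation under that assumption (illustrative only, with made-up shapes):

    import torch

    gamma = 0.99
    # Hypothetical per-gate rewards, one (batch,)-shaped tensor per gate.
    rewards = [torch.randn(8) for _ in range(5)]

    R = torch.zeros(8)           # running return, same shape as each reward
    returns = []
    for r in reversed(rewards):  # accumulate from the last gate backwards
        R = r + gamma * R        # shapes must match, or this line raises
        returns.insert(0, R)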

cifar10 rl training

Hi,

I have met a problem with CIFAR-10 training.
When training feedforward_rl_38 and rnn_gate_rl_38, it fails at line 230 with 'RuntimeError: differentiating stochastic functions requires providing a reward'. Do you have any idea about this issue?

Thank you
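
For context: on PyTorch 0.3 and earlier, a Variable produced by multinomial() is a stochastic node, and calling backward() raises exactly this error unless reinforce() is called on the sampled action first. A rough sketch of that old-style pattern (illustrative, assuming PyTorch <= 0.3 semantics; compute_reward is a hypothetical placeholder):

    from torch import autograd

    action = probs.multinomial(num_samples=1)  # stochastic node (old autograd)
    reward = compute_reward(action)            # hypothetical per-sample reward
    action.reinforce(reward)                   # attach the reward to the node
    autograd.backward([action], [None])        # no longer raises the error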

Some question about RL

Thanks for your work. I encountered some problems while reproducing SkipNet; can you give me some advice? After supervised learning of the ResNet (with gates), the accuracy declines sharply during reinforcement learning. When I debug the code, I find that the gate sampling (which the policy gradient requires) has a bad influence on all the BatchNorm layers and on the backward parameter updates. How do you solve or avoid these problems?
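
A common mitigation (my own suggestion, not something confirmed by the authors here) is to freeze the BatchNorm layers during the RL stage so that the randomly sampled gates cannot corrupt their running statistics; a minimal sketch:

    import torch.nn as nn

    def freeze_batchnorm(model):
        # Put every BatchNorm layer into eval mode so its running mean/var
        # stop updating, and stop training its affine parameters.
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.eval()
                if m.affine:
                    m.weight.requires_grad_(False)
                    m.bias.requires_grad_(False)

    # Re-apply after every model.train() call, since train() resets BN.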

Accuracy of ResNets baseline model on cifar100?

I'm confused about the accuracy of the baseline models on CIFAR-100. In the paper, the validation top-1 accuracy is 68.54 for ResNet-38, 70.64 for ResNet-74, and 71.21 for ResNet-110. I followed the settings in the paper, but I get quite different results: 71.11 for ResNet-38, 74.26 for ResNet-74, and 76.34 for ResNet-110.
