
snas-series's Introduction

SNAS-Series

This repository contains the PyTorch implementation of the SNAS-Series papers, including:

SNAS: Stochastic Neural Architecture Search, ICLR 2019.

By Sirui Xie, Hehui Zheng, Chunxiao Liu, Liang Lin.

Paper-arxiv


Figure: Visualization of the forward pass and gradient back-propagation within SNAS.

DSNAS: Direct Neural Architecture Search without Parameter Retraining, CVPR 2020.

By Shoukang Hu*, Sirui Xie*, Hehui Zheng, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin.

Paper-arxiv


Figure: Visualization of the forward pass and gradient back-propagation within DSNAS.

ANALYSIS: Understanding the wiring evolution in differentiable neural architecture search, AISTATS 2021.

By Sirui Xie*, Shoukang Hu*, Xinjiang Wang, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin.

Paper-arxiv


Figure: Visualization of cost and gradient back-propagation within cell-based differentiable NAS.

snas-series's People

Contributors

haydenz, hehuizheng, skhu101, snas-series, srxie


snas-series's Issues

Question about calling backward twice?

Thanks for your publication of the code! This paper is very interesting and I have a question.


As loss = resource_loss.clone() + error_loss.clone(), why does "loss" have to be backpropagated again after the backward passes of "resource_loss" and "error_loss"?

In my opinion, a single backward pass for "resource_loss" and "error_loss" should be sufficient.
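
For concreteness, here is a minimal standalone sketch (made-up tensors, not the repo's actual model) of why I ask: PyTorch accumulates gradients across backward() calls, so calling backward on the summed loss again adds the same gradients a second time.

import torch

w = torch.ones(3, requires_grad=True)
error_loss = (w * 2).sum()      # d(error_loss)/dw = 2
resource_loss = (w * 3).sum()   # d(resource_loss)/dw = 3

error_loss.backward(retain_graph=True)
resource_loss.backward(retain_graph=True)
print(w.grad)                   # tensor([5., 5., 5.])  -- already the full gradient

loss = resource_loss.clone() + error_loss.clone()
loss.backward()
print(w.grad)                   # tensor([10., 10., 10.]) -- the extra backward doubles it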

failed to reproduce DSNAS paper performance

Dear author,
thank you for this great work and code.
I tried to reproduce the performance of DSNAS by following the instructions in the README (first searching the supernet for 80 epochs, and then continuing the search with train_imagenet_child.py),
but failed to reproduce the results. As shown in the attached curve below, I got only 73.57 top-1 accuracy, which is much lower than the 74.4 reported in the paper.
(attached image: top-1 accuracy training curve)

questions about equation in paper

Hi, SNAS is really fantastic, and I am following it now, but I cannot understand what alpha_1^k is in Eq. 28 of the paper. I would be grateful if you could explain it to me.

DSNAS: The implementation of dummy 1 gradient seems inconsistent with the formula in the paper

In the code, dummy 1 gradient is computed as:

(error_loss + loss_alpha).backward()                  # accumulates dL/d(dummy 1) plus the surrogate-loss gradient into self.weights.grad
self.block_reward = self.weights.grad.data.sum(-1)    # the reward is taken as the row-sum of the accumulated gradient

In the code, error_loss.backward() accumulates the gradient dL/d(dummy 1) into self.weights.grad, and loss_alpha.backward() accumulates an extra gradient on top of it, namely [p_0/p_k, p_1/p_k, p_2/p_k, ..., p_n/p_k] in each row of self.weights. So self.weights.grad.data.sum(-1) yields a vector whose entries are dL/d(dummy 1) + 1/p_k.
However, in the paper, the gradient should be dL/d(dummy 1) alone. The extra 1/p_k term is quite large, so it will affect the reward greatly.

The commented-out code
#self.block_reward = torch.autograd.grad(error_loss, self.weights, retain_graph=True, allow_unused=True)
seems to be the right way to compute dL/d(dummy 1), but it is incompatible with the PyTorch distributed settings.
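
For reference, a minimal self-contained sketch (hypothetical shapes and a stand-in surrogate loss, not the repo code) contrasting the two ways of obtaining the gradient with respect to the one-hot weights:

import torch

weights = torch.zeros(4, 3, requires_grad=True)               # stands in for the "dummy 1" weights
error_loss = (weights * torch.randn(4, 3)).sum()
log_p = torch.log_softmax(torch.randn(4, 3), dim=-1)
loss_alpha = (weights * log_p).sum()                           # stand-in for the alpha surrogate loss

# Way 1 (current code): one combined backward; loss_alpha's gradient is accumulated too.
(error_loss + loss_alpha).backward(retain_graph=True)
combined = weights.grad.data.sum(-1)

# Way 2 (the commented-out line): isolates dL/d(weights), but as noted above it is
# incompatible with the distributed training setup.
pure = torch.autograd.grad(error_loss, weights, retain_graph=True, allow_unused=True)[0].sum(-1)

print(combined - pure)   # the difference is exactly the extra term contributed by loss_alpha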

Is this a bug or are there some other considerations for doing so?

Questions about your early stop strategy while reproducing your results

Hi, I am trying to reproduce your experiment results, and I ran your step one with both commands:
python -m torch.distributed.launch --nproc_per_node=8 train_imagenet.py \
--SinglePath --bn_affine --flops_loss --flops_loss_coef 1e-6 --seed 48 --use_dropout --gen_max_child --early_fix_arch --config configs/SinglePath240epoch_arch_lr_1e-3_decay_0.yaml \
--remark 'search_arch_lr_1e-3_decay_0'
and the one with 30 pretrain epochs:
python -m torch.distributed.launch --nproc_per_node=8 train_imagenet.py \
--SinglePath --bn_affine --flops_loss --flops_loss_coef 1e-6 --seed 48 --use_dropout --pretrain_epoch 30 --gen_max_child --early_fix_arch --config configs/SinglePath240epoch_arch_lr_1e-3_decay_0.yaml \
--remark 'search_arch_lr_1e-3_decay_0'
Both experiments have now completed around 160 epochs, but I can only see that 8 of the 20 layer structures have met the early-stop condition. This is unlike what you said in another issue, that after around 80 epochs of training almost all of the structures are fixed by the early-stop strategy.
Can you help me with this?
Thx~

Implementation on Pytorch 1.x

Thanks for your great work.
However, do you plan to create a branch for PyTorch 1.x?
Or could you please provide guidance for users to upgrade the code to PyTorch 1.x themselves?
Thanks in advance.
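
For anyone attempting the upgrade themselves, these are the usual 0.3/0.4-era patterns that need changing in 1.x (general PyTorch migration notes, not repo-specific guidance):

import torch

# 1) Variable is merged into Tensor in 1.x:
#    old: x = torch.autograd.Variable(torch.randn(2), requires_grad=True)
x = torch.randn(2, requires_grad=True)

# 2) Scalar extraction from a 0-dim tensor:
#    old: value = loss.data[0]
loss = (x * x).sum()
value = loss.item()

# 3) Inference without tracking gradients:
#    old: y = model(Variable(inp, volatile=True))
with torch.no_grad():
    y = x * 2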

Implementation ERROR when you save alpha_log

Dear author,
When I tried to implement your method on my own task, I noticed that you save the model together with log_alpha after each epoch finishes, but you only reset log_alpha from the early-stop dictionary at the start of each epoch. However, log_alpha changes a lot over the course of the epoch, so the indices of the layers fixed by early stopping may no longer be consistent with the log_alpha at the end of the epoch. This means the first step saves the wrong log_alpha, and the second step then trains along the wrong path.
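
A possible workaround I have in mind (just a sketch with hypothetical names, assuming an early-stop dictionary like fix_arch_index that maps each fixed layer to its chosen op): record the decision at the moment the layer is fixed, and write it back into log_alpha right before checkpointing, so the saved log_alpha always agrees with the paths actually used.

import torch

def snapshot_fixed_layer(fix_arch_index, log_alpha, layer_id):
    # Record the argmax at the moment the layer satisfies the early-stop criterion.
    fix_arch_index[layer_id] = int(log_alpha[layer_id].argmax().item())

def restore_fixed_layers(fix_arch_index, log_alpha, large_value=10.0):
    # Before saving a checkpoint, overwrite the rows of fixed layers so the saved
    # log_alpha agrees with the decisions that were actually used during the epoch.
    with torch.no_grad():
        for layer_id, op_id in fix_arch_index.items():
            log_alpha[layer_id].fill_(0.0)
            log_alpha[layer_id, op_id] = large_value
    return log_alpha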
Hope to hear from you soon.

Question about policy gradient

(attached: screenshots of Eq. 7 and Eq. 8 from the paper)
I am curious whether Eq. 7 in the paper is missing a minus sign, because the authors say the reward is Eq. 8 and the policy gradient is reward * gradient.
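
To make my question concrete, here is a generic REINFORCE-style sketch (not the SNAS code): the estimator is reward * d(log pi)/d(theta), and since optimizers minimize, the surrogate loss that produces this gradient through backward() carries a minus sign, which is where I suspect the sign should enter.

import torch

logits = torch.randn(4, requires_grad=True)
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()
reward = torch.tensor(1.5)                       # stand-in reward, treated as a constant
surrogate = -(reward * dist.log_prob(action))    # minimizing this ascends on the reward
surrogate.backward()
print(logits.grad)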

Some questions when reproducing the results in the DSNAS paper

Dear Shoukang,
thanks for your great work. I can reproduce the results in the DSNAS paper by following the provided steps, but I have some questions about the code, as listed below:

  1. When directly training DSNAS for 240 epochs, I can only get [email protected], which has a gap with the final result ([email protected]).
  2. Why do you suggest setting the early-stop epoch to 80? That does not agree with formula (8) in the paper.
  3. How do you set the threshold h in formula (8) of the paper? I cannot obtain log_alpha values close to one-hot vectors, e.g. [0.5588, 0.2505, 0.1286, 0.0622], [0.3196, 0.2729, 0.3506, 0.0570].
  4. The second step, which continues the search stage from epoch 80, seems to only finetune the fixed architecture obtained from weights = torch.zeros_like(log_alpha).scatter_(1, torch.argmax(log_alpha, dim = -1).view(-1,1), 1) in train_imagenet_child.py, and log_alpha no longer changes, which violates the concept of end-to-end search (see the sketch right below).
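
To illustrate item 4 with a minimal sketch (made-up sizes, not the training script itself): the argmax + scatter_ construction is non-differentiable with respect to log_alpha, so once the child architecture is built this way, log_alpha receives no gradient at all.

import torch

log_alpha = torch.randn(3, 4, requires_grad=True)             # 3 layers x 4 candidate ops (made up)

# One-hot weights derived as in train_imagenet_child.py: argmax + scatter_.
# zeros_like + scatter_ creates a new tensor with no grad path back to log_alpha.
weights = torch.zeros_like(log_alpha).scatter_(
    1, torch.argmax(log_alpha, dim=-1).view(-1, 1), 1)

op_outputs = torch.randn(3, 4, requires_grad=True)             # stands in for the network side
loss = (weights * op_outputs).sum()
loss.backward()
print(op_outputs.grad is not None)   # True: network parameters still receive gradients
print(log_alpha.grad)                # None: the architecture parameters are never updated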

Looking forward to your replies.
Thanks a lot!
