
snas-series's Introduction

SNAS-Series

This repository contains the PyTorch implementation of the SNAS-Series papers, including:

SNAS: Stochastic Neural Architecture Search, ICLR 2019.

By Sirui Xie, Hehui Zheng, Chunxiao Liu, Liang Lin.

Paper-arxiv


Figure: Visualization of the forward pass and gradient back-propagation within SNAS.

DSNAS: Direct Neural Architecture Search without Parameter Retraining, CVPR 2020.

By Shoukang Hu*, Sirui Xie*, Hehui Zheng, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin.

Paper-arxiv


Figure: Visualization of the forward pass and gradient back-propagation within DSNAS.

ANALYSIS: Understanding the wiring evolution in differentiable neural architecture search, AISTATS 2021.

By Sirui Xie*, Shoukang Hu*, Xinjiang Wang, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin.

Paper-arxiv


Figure: Visualization of cost and gradient back-propagation within cell-based differentiable NAS.

snas-series's People

Contributors

haydenz, hehuizheng, skhu101, snas-series, srxie


snas-series's Issues

Question about calling backward twice?

Thanks for your publication of the code! This paper is very interesting and I have a question.


As loss = resource_loss.clone() + error_loss.clone(), why does "loss" have to be backpropagated again after the backward passes of "resource_loss" and "error_loss"?

In my opinion, a single backward pass for "resource_loss" and "error_loss" should be sufficient.
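
For concreteness, here is a minimal standalone sketch (made-up tensors, not the repo's actual model) of why I ask: PyTorch accumulates gradients across backward() calls, so calling backward on the summed loss again adds the same gradients a second time.

import torch

w = torch.ones(3, requires_grad=True)
error_loss = (w * 2).sum()      # d(error_loss)/dw = 2
resource_loss = (w * 3).sum()   # d(resource_loss)/dw = 3

error_loss.backward(retain_graph=True)
resource_loss.backward(retain_graph=True)
print(w.grad)                   # tensor([5., 5., 5.])  -- already the full gradient

loss = resource_loss.clone() + error_loss.clone()
loss.backward()
print(w.grad)                   # tensor([10., 10., 10.]) -- the extra backward doubles it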

failed to reproduce DSNAS paper performance

Dear author,
thank you for this great work and code.
I tried to reproduce the performance of DSNAS by following the instructions in the README (first searching the supernet for 80 epochs, and then continuing the search with train_imagenet_child.py),
but failed to reproduce the results. As shown in the attached curve below, I got only 73.57 top-1 accuracy, which is much lower than the 74.4 reported in the paper.
(attached image: top-1 accuracy training curve)

questions about equation in paper

Hi, SNAS is really fantastic, and I am following it now, but I cannot understand what alpha_1^k is in Eq. 28 of the paper. I would be grateful if you could explain it to me.

DSNAS: The implementation of dummy 1 gradient seems inconsistent with the formula in the paper

In the code, dummy 1 gradient is computed as:

(error_loss + loss_alpha).backward()                  # accumulates dL/d(dummy 1) plus the surrogate-loss gradient into self.weights.grad
self.block_reward = self.weights.grad.data.sum(-1)    # the reward is taken as the row-sum of the accumulated gradient

In the code, error_loss.backward() accumulates the gradient dL/d(dummy 1) into self.weights.grad, and loss_alpha.backward() accumulates an extra gradient on top of it, namely [p_0/p_k, p_1/p_k, p_2/p_k, ..., p_n/p_k] in each row of self.weights. So self.weights.grad.data.sum(-1) yields a vector whose entries are dL/d(dummy 1) + 1/p_k.
However, in the paper, the gradient should be dL/d(dummy 1) alone. The extra 1/p_k term is quite large, so it will affect the reward greatly.

The commented-out code
#self.block_reward = torch.autograd.grad(error_loss, self.weights, retain_graph=True, allow_unused=True)
seems to be the right way to compute dL/d(dummy 1), but it is incompatible with the PyTorch distributed settings.
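
For reference, a minimal self-contained sketch (hypothetical shapes and a stand-in surrogate loss, not the repo code) contrasting the two ways of obtaining the gradient with respect to the one-hot weights:

import torch

weights = torch.zeros(4, 3, requires_grad=True)               # stands in for the "dummy 1" weights
error_loss = (weights * torch.randn(4, 3)).sum()
log_p = torch.log_softmax(torch.randn(4, 3), dim=-1)
loss_alpha = (weights * log_p).sum()                           # stand-in for the alpha surrogate loss

# Way 1 (current code): one combined backward; loss_alpha's gradient is accumulated too.
(error_loss + loss_alpha).backward(retain_graph=True)
combined = weights.grad.data.sum(-1)

# Way 2 (the commented-out line): isolates dL/d(weights), but as noted above it is
# incompatible with the distributed training setup.
pure = torch.autograd.grad(error_loss, weights, retain_graph=True, allow_unused=True)[0].sum(-1)

print(combined - pure)   # the difference is exactly the extra term contributed by loss_alpha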

Is this a bug or are there some other considerations for doing so?

Questions about your early stop strategy while reproducing your results

Hi, I am trying to reproduce your experiment results, and I ran your step one with both commands:
python -m torch.distributed.launch --nproc_per_node=8 train_imagenet.py \
--SinglePath --bn_affine --flops_loss --flops_loss_coef 1e-6 --seed 48 --use_dropout --gen_max_child --early_fix_arch --config configs/SinglePath240epoch_arch_lr_1e-3_decay_0.yaml \
--remark 'search_arch_lr_1e-3_decay_0'
and the one with 30 pretrain epochs:
python -m torch.distributed.launch --nproc_per_node=8 train_imagenet.py \
--SinglePath --bn_affine --flops_loss --flops_loss_coef 1e-6 --seed 48 --use_dropout --pretrain_epoch 30 --gen_max_child --early_fix_arch --config configs/SinglePath240epoch_arch_lr_1e-3_decay_0.yaml \
--remark 'search_arch_lr_1e-3_decay_0'
Both experiments have now completed around 160 epochs, but I can only see that 8 of the 20 layer structures have met the early-stop condition. This is unlike what you said in another issue, that after around 80 epochs of training almost all of the structures are fixed by the early-stop strategy.
Can you help me with this?
Thx~

Implementation on Pytorch 1.x

Thanks for your great work.
However, do you plan to create a branch for PyTorch 1.x?
Or could you please provide guidance for users to upgrade the code to PyTorch 1.x themselves?
Thanks in advance.
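
For anyone attempting the upgrade themselves, these are the usual 0.3/0.4-era patterns that need changing in 1.x (general PyTorch migration notes, not repo-specific guidance):

import torch

# 1) Variable is merged into Tensor in 1.x:
#    old: x = torch.autograd.Variable(torch.randn(2), requires_grad=True)
x = torch.randn(2, requires_grad=True)

# 2) Scalar extraction from a 0-dim tensor:
#    old: value = loss.data[0]
loss = (x * x).sum()
value = loss.item()

# 3) Inference without tracking gradients:
#    old: y = model(Variable(inp, volatile=True))
with torch.no_grad():
    y = x * 2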

Implementation ERROR when you save alpha_log

Dear author,
When I tried to implement your method on my own task, I noticed that you save the model together with log_alpha after each epoch finishes, but you only reset log_alpha from the early-stop dictionary at the start of each epoch. However, log_alpha changes a lot over the course of the epoch, so the indices of the layers fixed by early stopping may no longer be consistent with the log_alpha at the end of the epoch. This means the first step saves the wrong log_alpha, and the second step then trains along the wrong path.
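
A possible workaround I have in mind (just a sketch with hypothetical names, assuming an early-stop dictionary like fix_arch_index that maps each fixed layer to its chosen op): record the decision at the moment the layer is fixed, and write it back into log_alpha right before checkpointing, so the saved log_alpha always agrees with the paths actually used.

import torch

def snapshot_fixed_layer(fix_arch_index, log_alpha, layer_id):
    # Record the argmax at the moment the layer satisfies the early-stop criterion.
    fix_arch_index[layer_id] = int(log_alpha[layer_id].argmax().item())

def restore_fixed_layers(fix_arch_index, log_alpha, large_value=10.0):
    # Before saving a checkpoint, overwrite the rows of fixed layers so the saved
    # log_alpha agrees with the decisions that were actually used during the epoch.
    with torch.no_grad():
        for layer_id, op_id in fix_arch_index.items():
            log_alpha[layer_id].fill_(0.0)
            log_alpha[layer_id, op_id] = large_value
    return log_alpha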
Hope to hear from you soon.

Question about policy gradient

(attached: screenshots of Eq. 7 and Eq. 8 from the paper)
I am curious whether Eq. 7 in the paper is missing a minus sign, because the authors say the reward is Eq. 8 and the policy gradient is reward * gradient.
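
To make my question concrete, here is a generic REINFORCE-style sketch (not the SNAS code): the estimator is reward * d(log pi)/d(theta), and since optimizers minimize, the surrogate loss that produces this gradient through backward() carries a minus sign, which is where I suspect the sign should enter.

import torch

logits = torch.randn(4, requires_grad=True)
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()
reward = torch.tensor(1.5)                       # stand-in reward, treated as a constant
surrogate = -(reward * dist.log_prob(action))    # minimizing this ascends on the reward
surrogate.backward()
print(logits.grad)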

Some questions when reproducing the results in the DSNAS paper

Dear Shoukang,
thanks for your great work. I can reproduce the results in the DSNAS paper by following the provided steps, but I have some questions about the code, as listed below:

  1. When directly training DSNAS for 240 epochs, I can only get [email protected], which has a gap with the final result ([email protected]).
  2. Why do you suggest setting the early-stop epoch to 80? That does not agree with formula (8) in the paper.
  3. How do you set the threshold h in formula (8) of the paper? I cannot obtain log_alpha values close to one-hot vectors, e.g. [0.5588, 0.2505, 0.1286, 0.0622], [0.3196, 0.2729, 0.3506, 0.0570].
  4. The second step, which continues the search stage from epoch 80, seems to only finetune the fixed architecture obtained from weights = torch.zeros_like(log_alpha).scatter_(1, torch.argmax(log_alpha, dim = -1).view(-1,1), 1) in train_imagenet_child.py, and log_alpha no longer changes, which violates the concept of end-to-end search (see the sketch right below).
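
To illustrate item 4 with a minimal sketch (made-up sizes, not the training script itself): the argmax + scatter_ construction is non-differentiable with respect to log_alpha, so once the child architecture is built this way, log_alpha receives no gradient at all.

import torch

log_alpha = torch.randn(3, 4, requires_grad=True)             # 3 layers x 4 candidate ops (made up)

# One-hot weights derived as in train_imagenet_child.py: argmax + scatter_.
# zeros_like + scatter_ creates a new tensor with no grad path back to log_alpha.
weights = torch.zeros_like(log_alpha).scatter_(
    1, torch.argmax(log_alpha, dim=-1).view(-1, 1), 1)

op_outputs = torch.randn(3, 4, requires_grad=True)             # stands in for the network side
loss = (weights * op_outputs).sum()
loss.backward()
print(op_outputs.grad is not None)   # True: network parameters still receive gradients
print(log_alpha.grad)                # None: the architecture parameters are never updated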

Looking forward to your replies.
Thanks a lot!
