autodl-projects's Introduction

Hi there 👋

autodl-projects's People

Contributors

ain-soph, cclauss, d-x-y, gongxinyuu, liulu112601, priyanshu95663, yongsubaek, yulv-git

autodl-projects's Issues

Some doubts about the architecture sampling

Hi, thanks for your great work. After reading your paper, I have some doubts.

  1. When searching with n GPUs, is one architecture sampled or n architectures?
  2. The noise o_k sampled from Gumbel(0, 1) has a known range of values, but A^k_{i,j} may vary a lot under different weight decay (wd) settings. What is the best range for A?

Thanks.
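To make the second question concrete, here is a minimal pure-Python sketch of Gumbel-max sampling, assuming the usual formulation where the sampled op is argmax_k(A_k + o_k) with o_k ~ Gumbel(0, 1). It is an illustration, not the repository's code; `gumbel_max_sample` and `empirical_freq` are hypothetical helper names. Because the noise has a fixed scale, multiplying A by a constant makes the sampling sharper, which is why the range of A matters.

```python
import math
import random


def sample_gumbel(rng: random.Random) -> float:
    """Draw o ~ Gumbel(0, 1) via o = -log(-log(u)), u ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, and log(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(logits, rng: random.Random) -> int:
    """Return argmax_k (A_k + o_k); this index is distributed as softmax(A)."""
    scores = [a + sample_gumbel(rng) for a in logits]
    return max(range(len(scores)), key=scores.__getitem__)


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def empirical_freq(logits, n: int, seed: int = 0):
    """Fraction of n Gumbel-max draws that pick each index."""
    rng = random.Random(seed)
    counts = [0] * len(logits)
    for _ in range(n):
        counts[gumbel_max_sample(logits, rng)] += 1
    return [c / n for c in counts]


if __name__ == "__main__":
    A = [1.0, 0.5, 0.0]
    print(softmax(A), empirical_freq(A, 20000))
    # Scaling A by 10 sharpens the distribution: index 0 wins almost every draw.
    print(softmax([10 * a for a in A]), empirical_freq([10 * a for a in A], 20000))
```

The empirical frequencies track softmax(A), so the same noise yields nearly uniform sampling when the entries of A are small and nearly deterministic sampling when they are large.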

Question about the search model code

Hi buddy, thanks for your great work!
Do you plan to share your search model code, and if so, when? Looking forward to your answer!

Does this code support multiple GPUs?

When applying the Gumbel-softmax trick on multiple GPUs, the noise has to be sampled from a Gumbel distribution. But if the noise-sampling code is written in the forward() function, different GPUs may sample different noise and therefore activate different paths. Have you considered this case in your code?
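One way to keep all replicas on the same path is to derive the noise from something every replica shares, such as a per-step seed, instead of an independent random state per device. The pure-Python sketch below simulates that; `noise_for_step` and its seeding scheme are hypothetical. In an actual multi-GPU PyTorch run, the equivalent would be to sample the noise once and broadcast it, or to seed an identical generator on every process.

```python
import math
import random


def gumbel_noise(rng: random.Random, k: int):
    """k independent Gumbel(0, 1) draws."""
    out = []
    for _ in range(k):
        u = rng.random()
        while u == 0.0:
            u = rng.random()
        out.append(-math.log(-math.log(u)))
    return out


def noise_for_step(base_seed: int, step: int, k: int):
    """Hypothetical scheme: every replica seeds its RNG from (base_seed, step),
    so all replicas draw identical noise at the same training step."""
    return gumbel_noise(random.Random(f"{base_seed}-{step}"), k)


def sampled_path(logits, noise):
    scores = [a + o for a, o in zip(logits, noise)]
    return max(range(len(scores)), key=scores.__getitem__)


logits = [0.2, 0.1, 0.0, -0.1]
num_gpus, step = 4, 17

# Shared per-step seed: every simulated replica picks the same op.
shared = [sampled_path(logits, noise_for_step(1234, step, len(logits)))
          for _ in range(num_gpus)]

# Independent per-device RNGs (the situation described above): picks may differ.
independent = [sampled_path(logits, gumbel_noise(random.Random(gpu), len(logits)))
               for gpu in range(num_gpus)]
```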

Are the DARTS models (V1, V2) the same?

If you look at V1 and V2 of your DARTS implementation, they seem to be exactly the same. Is this intended?

V1 and V2

Training process of searched architecture

Hi, I first read the DARTS code, and I couldn't find the part of your code that trains the architecture, e.g., the Gumbel trick described in your paper. Looking forward to your reply!

A question about the weighted-sum in GDAS

Hi, thanks for your great work. But I have a question about the weighted-sum in GDAS.
In lib/models/cell_searchs/search_cells.py, line 58, the weighted sum is calculated as follows:
weigsum = sum( weights[_ie] * edge(nodes[j]) if _ie == argmaxs else weights[_ie] for _ie, edge in enumerate(self.edges[node_str]) )
I don't understand why we need to add weights[_ie] when _ie != argmaxs. I think weigsum = weights[_ie] * edge(nodes[j]) with _ie == argmaxs alone would be fine. Is there any difference I haven't noticed?
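To make the question concrete, the pure-Python sketch below treats the edge output as a scalar, omits the Gumbel noise and temperature, and assumes the straight-through construction hardwts = one_h - probs.detach() + probs. Under that construction the forward pass sees the one-hot vector and the backward pass uses the softmax Jacobian. The sketch compares the line-58 formula with the argmax-only alternative; `forward_and_grad` is a hypothetical helper, not repository code.

```python
import math


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def forward_and_grad(logits, edge_out, include_others):
    """Forward value of the edge's weighted sum and its gradient w.r.t. the logits.

    Straight-through assumption: the forward pass uses the one-hot vector,
    the backward pass uses the softmax Jacobian dp_k/dz_j = p_k (delta_kj - p_j).
    edge_out is a scalar stand-in for edge(nodes[j]) at the argmax.
    """
    probs = softmax(logits)
    a = max(range(len(probs)), key=probs.__getitem__)
    hard = [1.0 if k == a else 0.0 for k in range(len(probs))]
    if include_others:
        # Line-58 formula: weights[a] * edge + the other (zero-valued) weights.
        value = sum(hard[k] * edge_out if k == a else hard[k] for k in range(len(hard)))
        upstream = [edge_out if k == a else 1.0 for k in range(len(hard))]
    else:
        # Argmax-only alternative: weights[a] * edge.
        value = hard[a] * edge_out
        upstream = [edge_out if k == a else 0.0 for k in range(len(hard))]
    # Chain rule through the softmax: grad_j = p_j * (upstream_j - sum_k upstream_k p_k)
    dot = sum(u * p for u, p in zip(upstream, probs))
    grad = [p * (u - dot) for u, p in zip(upstream, probs)]
    return value, grad
```

In this scalar setting both variants produce the same forward value, but their gradients with respect to the architecture logits differ. The softmax weights sum to one, so adding the non-argmax weights gives the same gradient as using (F - 1) instead of F for the argmax weight. Whether that is the intended effect is a question for the authors.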

Concern about CWI (channel-wise interpolation) in TAS.

Hi, thanks for your great work.

After reading your TAS paper, I don't understand the channel-wise interpolation. Could you explain it or give some insight into channel-wise interpolation (i.e., taking a weighted sum of feature maps of different sizes) and its effects?

Thanks.
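One way to read "a weighted sum on feature maps of different sizes" is to resample every candidate's output to a common channel count first and then add the results with their sampling weights. The sketch below assumes the resampling is adaptive average pooling along the channel axis. That operator is an assumption for illustration, so check the TAS paper and code for the exact one. Each feature map is reduced to one scalar per channel (spatial dimensions omitted), and `channel_interp` and `cwi` are hypothetical names.

```python
def channel_interp(channels, out_c):
    """Resample per-channel values to out_c channels by adaptive average
    pooling along the channel axis (assumed operator)."""
    c = len(channels)
    out = []
    for i in range(out_c):
        start = (i * c) // out_c
        end = -(-((i + 1) * c) // out_c)  # ceiling division
        segment = channels[start:end]
        out.append(sum(segment) / len(segment))
    return out


def cwi(feature_maps, weights):
    """Weighted sum of feature maps with different channel counts, after
    aligning all of them to the largest channel count."""
    out_c = max(len(f) for f in feature_maps)
    aligned = [channel_interp(f, out_c) for f in feature_maps]
    return [sum(w * f[i] for w, f in zip(weights, aligned)) for i in range(out_c)]
```

For example, cwi([[1.0, 3.0], [2.0, 2.0, 2.0, 2.0]], [0.5, 0.5]) first stretches the 2-channel map to [1.0, 1.0, 3.0, 3.0] and then returns [1.5, 1.5, 2.5, 2.5].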

Out-of-memory error when running on a single 10 GB 1080 Ti GPU

Output:
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=265 error=2 : out of memory
Traceback (most recent call last):
  File "/media/tcl2/lilei/workspace/GDAS/exps-cnn/train_base.py", line 89, in <module>
    main()
  File "/media/tcl2/lilei/workspace/GDAS/exps-cnn/train_base.py", line 84, in main
    main_procedure(config, args.dataset, args.data_path, args, genotype, args.init_channels, args.layers, None, log)
  File "/media/tcl2/lilei/workspace/GDAS/exps-cnn/train_utils.py", line 96, in main_procedure
    train_acc1, train_acc5, train_los = _train(train_loader, model, criterion, optimizer, 'train', epoch, config, args.print_freq, log)
  File "/media/tcl2/lilei/workspace/GDAS/exps-cnn/train_utils.py", line 129, in _train
    for i, (inputs, targets) in enumerate(xloader):
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 178, in _pin_memory_loop
    batch = pin_memory_batch(batch)
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 245, in pin_memory_batch
    return [pin_memory_batch(sample) for sample in batch]
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 245, in <listcomp>
    return [pin_memory_batch(sample) for sample in batch]
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 239, in pin_memory_batch
    return batch.pin_memory()
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/THCCachingHostAllocator.cpp:265

ImportError: cannot import name 'load_config'

Hi, thank you for the nice code. However, I ran into a problem when running the script:

Traceback (most recent call last):
  File "./exps/search-shape.py", line 14, in <module>
    from config_utils import load_config, configure2str, obtain_search_single_args as obtain_args
ImportError: cannot import name 'load_config'

Thanks in advance for any help.
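A frequent cause of `ImportError: cannot import name ...` is that Python finds a different module with the same name earlier on sys.path. This is not confirmed for this report. A small standard-library check like the one below shows which file a plain `import config_utils` would load. Run it with the same working directory and sys.path setup as the script. `module_origin` is a hypothetical helper name.

```python
import importlib.util


def module_origin(name: str):
    """Return the file a top-level `import name` would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # If this path is not the copy you expect from this repository,
    # another module named config_utils is shadowing it.
    print(module_origin("config_utils"))
```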

Training Algorithm while Searching

First of all, thanks for your great work!

However, I have some questions about the training algorithm during the search. Since the softmax function is applied in back-propagation, will the gradients generated on one path (the argmax) also update the weights on other paths? Or will the gradients only update the weights on the sampled path?

Another question is about a description in Sec. 3.2 (Acceleration) of the paper:
"Within one training batch, each sample produces a different h_{i,j}, and, therefore, each element in A_{i,j} has a high possibility of being updated with gradients"
In my understanding, if each sample in the mini-batch sampled a different architecture, it would behave as if the batch size were 1.

Thanks again!
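The coverage argument in the quoted sentence can be checked numerically. The pure-Python sketch below uses plain Gumbel-max sampling without a temperature and hypothetical helper names. It draws one op per sample for a mini-batch and collects the distinct ops that were picked, compared with a single draw shared by the whole batch. It says nothing about how the repository batches the actual computation.

```python
import math
import random


def gumbel_argmax(logits, rng):
    scores = []
    for a in logits:
        u = rng.random()
        while u == 0.0:
            u = rng.random()
        scores.append(a - math.log(-math.log(u)))  # a + Gumbel(0, 1) noise
    return max(range(len(scores)), key=scores.__getitem__)


def ops_touched(logits, batch_size, per_sample, seed=0):
    """Set of op indices picked in one mini-batch.

    per_sample=True: each example draws its own h_{i,j}.
    per_sample=False: one draw is shared by the whole batch.
    """
    rng = random.Random(seed)
    if not per_sample:
        return {gumbel_argmax(logits, rng)}
    return {gumbel_argmax(logits, rng) for _ in range(batch_size)}
```

With five equally likely ops and a batch of 64, per-sample draws almost surely make every entry of A_{i,j} the argmax for at least one example in that step, whereas a shared draw selects only one.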

What is the usage of 'value' in genotype?

In the genotype of GDAS_GF, there are three items for each op in a cell: (op, index, value).
For example
('skip_connect', 0, 0.13017432391643524)

What is the 'value' used for? Thanks.

Cannot find the paper 'GDAS'

Sorry to trouble you, but I'm a little confused about the GDAS paper. I can no longer find the paper "Gradient-based search using Differentiable Architecture Sampler". Is it the same paper as "Searching for A Robust Neural Architecture in Four GPU Hours"?

How did you find the reduction cell?

Hello,

I'm curious about how you decided on the fixed reduction cell. Did you test different reduction cells? How did those cells perform?

Thank you!

How to show the network structure?

Thanks for your work.

I have trained on CIFAR-10 with GDAS_F1, and I want to know the network structure found by the search. How can I get it?

thank you.

Where is cifar10-0.5.pth?

Hi, when I run "CUDA_VISIBLE_DEVICES=0,1 bash ./scripts-search/search-depth-gumbel.sh cifar10 ResNet110 CIFARX 0.57 -1", I get the following error:

Traceback (most recent call last):
  File "./exps/search-shape.py", line 202, in <module>
    main(args)
  File "./exps/search-shape.py", line 40, in main
    assert split_file_path.exists(), '{:} does not exist'.format(split_file_path)
AssertionError: .latent-data/splits/cifar10-0.5.pth does not exist

However, I couldn't find this file in the Google Drive folder. Has it been renamed?
Thank you!

Question regarding running given algorithms on Nasbench102

Thank you very much for this remarkable contribution.
I am trying to run the given algorithms on Nasbench102 and noticed that a data path has to be given. Why is it needed, since the benchmark dataset already contains all the evaluation results? What should I do if I just want to test those algorithms on Nasbench102?
Thank you very much!

Typo

Neural instead of Nueral in the title

How to train the model I searched?

Thanks for your excellent work.

As a newcomer to NAS, I can't find a way to train the model I searched with SETN. I ran the training and searching scripts separately, but the training script just trains the default searched model instead of the one I searched.

Thank you.

Reproduce the searched results for GDAS

Thanks for your awesome work!

The script for searching with GDAS on a small search space (NAS-Bench-102) is given in the README. If I want to reproduce the search results in the GDAS paper, is there a script for searching on the full search space with the same network settings as in the paper? I've noticed that "search_space_name" in GDAS.sh can be changed to "full"; however, the search code is based on the NAS-Bench-102 settings. Or does that not matter, and switching "search_space_name" to "full" is all I need?

Thank you.

Regarding Gumbel-Softmax

Hi there,

I'm wondering how the Gumbel-softmax was implemented in your scripts. Was a hard one-hot sample generated at each iteration of the search phase, or a soft one? I noticed that torch.nn.functional.gumbel_softmax() has an option to set 'hard' to true to generate a one-hot sample. Was this the approach you took?

Many thanks,

X
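For reference, PyTorch's torch.nn.functional.gumbel_softmax(logits, tau=1, hard=False, ...) returns a soft sample by default. With hard=True it returns a one-hot vector in the forward pass but is differentiated as if it were the soft sample (a straight-through estimator). Whether the repository uses that function is for the authors to answer. The pure-Python sketch below reproduces only the forward values of both variants, so the effect of the temperature is visible without PyTorch; `gumbel_softmax_forward` is a hypothetical name.

```python
import math
import random


def gumbel_softmax_forward(logits, tau, hard, rng):
    """Forward value of a Gumbel-softmax sample.

    soft: softmax((logits + g) / tau) with g ~ Gumbel(0, 1)
    hard: one-hot of the argmax of that same soft sample, i.e. the value a
          straight-through estimator uses in the forward pass
    """
    g = []
    for _ in logits:
        u = rng.random()
        while u == 0.0:
            u = rng.random()
        g.append(-math.log(-math.log(u)))
    z = [(a + gi) / tau for a, gi in zip(logits, g)]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    soft = [v / s for v in e]
    if not hard:
        return soft
    k = max(range(len(soft)), key=soft.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(soft))]
```

With the same noise (same seed), lowering tau pushes the soft sample toward the one-hot vector that hard=True returns, but for any tau > 0 the soft sample is not exactly one-hot.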

setname: x-valid and ori-test

In order to use the .get_metrics() API, I need to pass in the name of the dataset as well as the "setname". What exactly is this?

There seem to be 'train', 'x-valid', 'x-test' and 'ori-test', but there doesn't seem to be any documentation on what they are.

I would like to get the test performance of an architecture on all tested datasets (that is cifar10, cifar100 and imagenet16-120). Which setnames should I then use?

Question about NAS Acceleration

Hi! Thanks for your great work!

I have some questions about the Fast Acceleration version. The main paper says that Gumbel-softmax with a temperature is adopted to relax the argmax function during the search. But in "model_search.py", I found that it uses the normal version, which computes the probabilities of all ops (the same as in DARTS).

Is something wrong, or am I misunderstanding it?

The details of "samples one sub-graph at one training iteration"

Can you talk about the details of "samples one sub-graph at one training iteration"?

As far as I know, the result of Gumbel-softmax may not be a one-hot vector; it may be a vector like [0.96, 0.01, 0.01, 0.01, 0.01].

When you sample one sub-graph at training, do you just drop all the connections with weights 0.01?

Thanks.
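In the common straight-through construction hardwts = one_h - probs.detach() + probs, the forward value is the one-hot vector (up to floating-point rounding) even when the soft probabilities look like [0.96, 0.01, 0.01, 0.01, 0.01]. The soft probabilities only matter in the backward pass. The pure-Python sketch below checks the forward arithmetic (there is no autograd here, so detach() is simply a copy). It suggests why edges whose forward weight is zero can be skipped instead of computed with weight 0.01. Whether GDAS does exactly this is for the authors to confirm, and both helper names are hypothetical.

```python
def straight_through_forward(probs):
    """Forward value of one_h - probs.detach() + probs, plus the argmax index.

    Without autograd, detach() is just a copy of the same numbers, so this is
    the arithmetic a framework performs in the forward pass.
    """
    k = max(range(len(probs)), key=probs.__getitem__)
    one_h = [1.0 if i == k else 0.0 for i in range(len(probs))]
    detached = list(probs)
    return [o - d + p for o, d, p in zip(one_h, detached, probs)], k


def sampled_edge_output(probs, edge_fns, x):
    """Evaluate only the argmax edge; every other edge has forward weight 0."""
    weights, k = straight_through_forward(probs)
    return weights[k] * edge_fns[k](x)
```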

Does TAS apply weight sharing?

Hi, Thank you for your excellent work!
While searching for width, I wonder whether the operations with different channel numbers share weights from the one-shot network. If so, how are the weights in the one-shot network updated, given that they receive K (K = 2 in your framework figure) separate gradients from the K sampled operations?
Btw, in "When τ → 0, p̂ = [p̂_1, ..., p̂_j, ...] becomes one-shot, and the Gumbel-softmax distribution...", do you mean "one-hot" instead of "one-shot"?
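If width search uses the common slicing scheme, the K gradients need no special handling. Under that scheme, a candidate with c output channels uses the first c filters of the largest layer; this is an assumption, not something confirmed here. Every candidate that uses a filter contributes to that filter's gradient, and the contributions add up. The pure-Python sketch below writes out that accumulation for a linear layer with a hypothetical loss L = Σ_k coeff_k · sum(output_k); `sliced_forward` and `shared_grad` are hypothetical names.

```python
def sliced_forward(W, x, width):
    """Linear layer restricted to its first `width` output channels (rows of W)."""
    return [sum(w * xi for w, xi in zip(W[o], x)) for o in range(width)]


def shared_grad(W, x, widths, coeffs):
    """Gradient of L = sum_k coeffs[k] * sum(sliced_forward(W, x, widths[k]))
    with respect to the shared W.

    d(output_o)/d(W[o][i]) = x[i], so row o collects coeffs[k] * x[i] from
    every sampled width k that includes channel o; the contributions add up.
    """
    grad = [[0.0] * len(row) for row in W]
    for width, c in zip(widths, coeffs):
        for o in range(width):
            for i, xi in enumerate(x):
                grad[o][i] += c * xi
    return grad
```

With widths [2, 4] and coefficients [0.25, 0.75], the first two rows receive gradient from both candidates while the last two rows receive it only from the wider one.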

questions about forward propagation and backward propagation

Thank you for making the search code publicly accessible; it is really good material for NAS researchers. After reading the paper and reviewing the search code, there are a few points I don't clearly understand.

In the paper, you mention: "since Eq. (3) needs to sample from a discrete probability distribution, we cannot back-propagate gradients. To allow back-propagation, we use the Gumbel-Max trick."
In the acceleration part: "During the backward procedure, we only back-propagate the gradient generated at the argmax."

First, let F1, F2, F3, F4 be the functions between two nodes, with corresponding architecture parameters a1, a2, a3, a4 and Gumbel-softmax probabilities p1, p2, p3, p4.
During the forward pass, the index with the maximum probability is taken and converted into a one-hot vector with the code:
hardwts = one_h - probs.detach() + probs
Assume the one-hot architecture weights are w1, w2, w3, w4 and the argmax index is 2.
The forward code is as follows:
weigsum = sum( weights[_ie] * edge(nodes[j]) if _ie == argmaxs else weights[_ie] for _ie, edge in enumerate(self.edges[node_str]) )
According to the code, the forward result is:
weightsum = w1 + w2*F2 + w3 + w4

q1: Is this code the accelerated version or not?
According to the acceleration part of your paper, we only need to back-propagate through the argmax, which would mean back-propagating only to F2 and ignoring F1, F3, F4. For the architecture parameters, would we only back-propagate to w2 and ignore w1, w3, w4? From the code, it seems that w1, w3, w4 are also back-propagated.

q2: Only the argmax is used in the forward pass, but in the code w1, w3, w4 are also added to weightsum. Their values are zero, so the result is the same as computing only w2*F2; what is the purpose of adding w1, w3, w4? What would happen if we computed weightsum as only w2*F2?

q3: If the soft Gumbel-softmax is used instead of the one-hot one, can we still compute weightsum as w1 + w2*F2 + w3 + w4? I think maybe not, because w and w*F can be of different orders of magnitude.

These questions have confused me a lot; it would be really helpful if you could give me some suggestions. Thank you!
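For q3, a quick numerical check helps. The sketch below (pure Python, hypothetical numbers, scalar edge outputs) evaluates three forward formulas: the DARTS-style mixture Σ_k w_k F_k, the argmax-only term w_a F_a, and the formula from the quoted code, Σ_{k≠a} w_k + w_a F_a. Soft weights sum to one, so the code's formula then equals the argmax-only term plus the constant 1 − w_a. With exact one-hot weights, all three formulas agree.

```python
def forward_variants(weights, outputs):
    """Return (mixture, argmax_only, code_formula) for one edge with scalar outputs."""
    a = max(range(len(weights)), key=weights.__getitem__)
    mixture = sum(w * f for w, f in zip(weights, outputs))
    argmax_only = weights[a] * outputs[a]
    code_formula = sum(weights[k] * outputs[k] if k == a else weights[k]
                       for k in range(len(weights)))
    return mixture, argmax_only, code_formula


soft = forward_variants([0.1, 0.6, 0.2, 0.1], [2.0, 3.0, 4.0, 5.0])
hard = forward_variants([0.0, 1.0, 0.0, 0.0], [2.0, 3.0, 4.0, 5.0])
```

Here soft is approximately (3.3, 1.8, 2.2), so the code's formula differs from both alternatives, while hard is (3.0, 3.0, 3.0).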

how do I search from scratch?

That's a very nice paper, and thanks for the repo. The training scripts for the searched architectures just worked; however, I have been unable to figure out which scripts to use to search for the cells from scratch. Thank you.

Why retrain the last searched config of architecture by default instead of the best?

Hi Xuanyi, thanks for your excellent work; your code is really convenient to use!
I noticed that in the scripts-search folder of TAS, the scripts use seed-${rseed}-last.config by default to retrain the last searched architecture. I am a little confused why they don't use seed-${rseed}-best.config to retrain the best searched architecture, which I think may have better performance.
Which searched architecture was retrained for the results reported in your paper, the last or the best?
Thank you!

Optimizing the searched network

Thanks for your excellent work!

I would like to ask two questions about optimizing the searched network and applying KD algorithm.

First. In my understanding, the searched network is randomly initialized and optimized to learn from the un-pruned network. Why isn't it initialized with the weights already trained in the un-pruned network and then fine-tuned, as in the three-stage pruning paradigm (training a large network, pruning, re-training)? Please correct me if I have misunderstood.

Second. I believe the KD algorithm has been slightly modified to suit this case. In particular, Eq. (9) looks like a cross-entropy term of the form −Σ P(x) log Q(x). In my understanding, P is the true distribution and Q is the estimated distribution. Therefore, I wonder whether there is a mistake in Eq. (9), and I suggest it should be −Σ P(ẑ) log Q(z) instead of −Σ P(z) log Q(ẑ).

Thanks for your attention.
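The asymmetry the second question relies on is easy to check. The pure-Python sketch below computes the cross-entropy H(P, Q) = −Σ P log Q between temperature-softened softmax distributions, where P plays the role of the target (true) distribution and Q the estimate. Which of z and ẑ belongs in each slot of Eq. (9) is exactly what the question asks, so the code does not assume an answer. The logits and the temperature of 4.0 are hypothetical.

```python
import math


def softmax_t(logits, temperature=1.0):
    """Softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    e = [math.exp(v - m) for v in scaled]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log q_i, with p the target distribution."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


z = [2.0, 1.0, 0.1]
z_hat = [0.5, 1.5, 0.3]
p, q = softmax_t(z, 4.0), softmax_t(z_hat, 4.0)
```

Swapping the arguments changes the value, so cross_entropy(p, q) and cross_entropy(q, p) are different objectives. By Gibbs' inequality, H(p, q) is also never smaller than the entropy H(p, p).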

Missing Structure object?

In the file search_model_gdas.py in the lib/models/cell_searchs package:

from .genotypes import Structure

I cannot find the Structure object. Could you share this file?

The search procedure is slow

Hi @D-X-Y, thanks for sharing your excellent work. I find your work very interesting, and I want to try your code.

However, I found the search is not as fast as the paper said. Specifically, I run the command
"CUDA_VISIBLE_DEVICES=0 bash ./scripts-cnn/train-cifar.sh GDAS_F1 cifar10 cut"
on a PC with an NVIDIA TITAN X (Pascal) 12GB, and it eventually took 27 hours to find a network with a top-1 error rate of 3.31%. Here are some of the logs:

 train[2019-05-13-06:15:55] Epoch: [486][000/521] Time 0.62 (0.62) Data 0.26 (0.26) Loss 0.033 (0.033)  Prec@1 100.00 (100.00) Prec@5 100.00 (100.00)
 train[2019-05-13-06:16:31] Epoch: [486][100/521] Time 0.36 (0.37) Data 0.00 (0.00) Loss 0.129 (0.083)  Prec@1 96.88 (98.62) Prec@5 100.00 (100.00)
 train[2019-05-13-06:17:08] Epoch: [486][200/521] Time 0.38 (0.37) Data 0.00 (0.00) Loss 0.142 (0.084)  Prec@1 95.83 (98.57) Prec@5 100.00 (100.00)
 train[2019-05-13-06:17:45] Epoch: [486][300/521] Time 0.38 (0.37) Data 0.00 (0.00) Loss 0.024 (0.084)  Prec@1 100.00 (98.58) Prec@5 100.00 (100.00)
 train[2019-05-13-06:18:22] Epoch: [486][400/521] Time 0.37 (0.37) Data 0.00 (0.00) Loss 0.057 (0.085)  Prec@1 100.00 (98.54) Prec@5 100.00 (99.99)
 train[2019-05-13-06:19:00] Epoch: [486][500/521] Time 0.36 (0.37) Data 0.00 (0.00) Loss 0.098 (0.087)  Prec@1 95.83 (98.46) Prec@5 100.00 (99.99)
 train[2019-05-13-06:19:07] Epoch: [486][520/521] Time 0.32 (0.37) Data 0.00 (0.00) Loss 0.084 (0.087)  Prec@1 98.75 (98.45) Prec@5 100.00 (99.99)
[2019-05-13-06:19:07] **train** Prec@1 98.45 Prec@5 99.99 Error@1 1.55 Error@5 0.01 Loss:0.087
 test [2019-05-13-06:19:07] Epoch: [486][000/105] Time 0.34 (0.34) Data 0.26 (0.26) Loss 0.104 (0.104)  Prec@1 96.88 (96.88) Prec@5 100.00 (100.00)
 test [2019-05-13-06:19:16] Epoch: [486][100/105] Time 0.09 (0.09) Data 0.00 (0.00) Loss 0.032 (0.139)  Prec@1 98.96 (96.49) Prec@5 100.00 (99.94)
 test [2019-05-13-06:19:16] Epoch: [486][104/105] Time 0.03 (0.09) Data 0.00 (0.00) Loss 0.048 (0.140)  Prec@1 93.75 (96.44) Prec@5 100.00 (99.93)
[2019-05-13-06:19:16] **test** Prec@1 96.44 Prec@5 99.93 Error@1 3.56 Error@5 0.07 Loss:0.140
----> Best Accuracy : Acc@1=96.69, Acc@5=99.93, Error@1=3.31, Error@5=0.07
----> Save into ./output/NAS-CNN/GDAS_F1-cifar10-cut-E600/seed-166-checkpoint-cifar10-model.pth

==>>[2019-05-13-06:19:17] [Epoch=487/600] [Need: 06:21:40] LR=0.0022 ~ 0.0022, Batch=96
 train[2019-05-13-06:19:17] Epoch: [487][000/521] Time 0.69 (0.69) Data 0.31 (0.31) Loss 0.071 (0.071)  Prec@1 98.96 (98.96) Prec@5 100.00 (100.00)
 train[2019-05-13-06:19:54] Epoch: [487][100/521] Time 0.37 (0.37) Data 0.00 (0.00) Loss 0.069 (0.087)  Prec@1 97.92 (98.39) Prec@5 100.00 (99.99)
 train[2019-05-13-06:20:31] Epoch: [487][200/521] Time 0.36 (0.37) Data 0.00 (0.00) Loss 0.125 (0.086)  Prec@1 96.88 (98.45) Prec@5 100.00 (99.99)
 train[2019-05-13-06:21:08] Epoch: [487][300/521] Time 0.37 (0.37) Data 0.00 (0.00) Loss 0.212 (0.086)  Prec@1 94.79 (98.49) Prec@5 100.00 (99.99)
 train[2019-05-13-06:21:45] Epoch: [487][400/521] Time 0.36 (0.37) Data 0.00 (0.00) Loss 0.090 (0.087)  Prec@1 98.96 (98.49) Prec@5 100.00 (99.99)
 train[2019-05-13-06:22:22] Epoch: [487][500/521] Time 0.38 (0.37) Data 0.00 (0.00) Loss 0.150 (0.088)  Prec@1 96.88 (98.44) Prec@5 100.00 (99.99)
 train[2019-05-13-06:22:29] Epoch: [487][520/521] Time 0.31 (0.37) Data 0.00 (0.00) Loss 0.078 (0.088)  Prec@1 97.50 (98.46) Prec@5 100.00 (99.99)
[2019-05-13-06:22:29] **train** Prec@1 98.46 Prec@5 99.99 Error@1 1.54 Error@5 0.01 Loss:0.088
 test [2019-05-13-06:22:29] Epoch: [487][000/105] Time 0.32 (0.32) Data 0.24 (0.24) Loss 0.059 (0.059)  Prec@1 96.88 (96.88) Prec@5 100.00 (100.00)
 test [2019-05-13-06:22:38] Epoch: [487][100/105] Time 0.09 (0.09) Data 0.00 (0.00) Loss 0.029 (0.141)  Prec@1 98.96 (96.42) Prec@5 100.00 (99.93)
 test [2019-05-13-06:22:38] Epoch: [487][104/105] Time 0.03 (0.09) Data 0.00 (0.00) Loss 0.001 (0.140)  Prec@1 100.00 (96.43) Prec@5 100.00 (99.93)
[2019-05-13-06:22:38] **test** Prec@1 96.43 Prec@5 99.93 Error@1 3.57 Error@5 0.07 Loss:0.140
----> Best Accuracy : Acc@1=96.69, Acc@5=99.93, Error@1=3.31, Error@5=0.07
----> Save into ./output/NAS-CNN/GDAS_F1-cifar10-cut-E600/seed-166-checkpoint-cifar10-model.pth


Since I am new to NAS, I cannot figure out what I have missed.
Any suggestions or help would be greatly appreciated! Thanks.
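A back-of-the-envelope estimate can be made from numbers in the log above. The output path ends in E600, the progress line reads Epoch=487/600, and each epoch runs 521 training iterations at roughly 0.37 s plus 105 test iterations at roughly 0.09 s. That puts the full 600-epoch run at around 34 hours on this GPU. This suggests the log comes from training a fixed network for 600 epochs rather than from the architecture search itself, but that reading is an inference from the log, not a confirmed diagnosis. `estimate_hours` is a hypothetical helper.

```python
def estimate_hours(train_iters, train_sec_per_iter, test_iters, test_sec_per_iter, epochs):
    """Wall-clock estimate (hours) from per-iteration times and epoch count."""
    per_epoch = train_iters * train_sec_per_iter + test_iters * test_sec_per_iter
    return per_epoch * epochs / 3600.0


# Values read off the log: 521 train iters at ~0.37 s, 105 test iters at ~0.09 s, 600 epochs.
hours = estimate_hours(521, 0.37, 105, 0.09, 600)  # about 33.7
```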

What's the usage of 'index' in genotypes?

In the genotype of GDAS_GF, there are three items for each op in a cell: (op, index, value).
For example
('skip_connect', 0, 0.13017432391643524)

What is the 'index' used for? Thanks.
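This sketch assumes that GDAS genotypes follow the DARTS convention, which is not confirmed here. Under that convention, each intermediate node has two entries, and index names the input of the edge: 0 and 1 are the outputs of the two preceding cells, and 2 and above are earlier intermediate nodes of the same cell. The value field is printed but left uninterpreted. The pure-Python sketch decodes a list of (op, index, value) tuples under that assumption; `describe_edges`, `input_name`, and the node labels are hypothetical.

```python
def input_name(index):
    """Label for an edge input under the assumed DARTS-style convention."""
    if index == 0:
        return "c_{k-2}"  # output of the cell two steps back
    if index == 1:
        return "c_{k-1}"  # output of the previous cell
    return f"node {index - 2}"  # an earlier intermediate node in this cell


def describe_edges(cell):
    """cell: list of (op, index, value) tuples, two per intermediate node."""
    lines = []
    for pos, (op, index, value) in enumerate(cell):
        node = pos // 2
        lines.append(f"node {node} <- {op}({input_name(index)}) [value={value:.3f}]")
    return lines


print(describe_edges([('skip_connect', 0, 0.13017432391643524)]))
```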

Custom training data

@D-X-Y, thank you for your hard work.

How can I train on my own dataset, which has a simple structure:

  • input original images
  • input mask images
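The repository's own data pipeline is not described on this page, so the following is only a generic first step. It assumes a hypothetical layout where images/ and masks/ hold files with matching names (e.g. images/0001.png and masks/0001.png). The standard-library sketch pairs each image with its mask by filename stem. Feeding the pairs into the training scripts would still require a dataset class that the repository may or may not provide. `pair_images_and_masks` is a hypothetical name.

```python
from pathlib import Path


def pair_images_and_masks(image_dir, mask_dir, exts=(".png", ".jpg", ".jpeg")):
    """Match files in image_dir and mask_dir by filename stem.

    Returns a sorted list of (image_path, mask_path) pairs and raises if an
    image has no mask, so mismatches surface before training starts.
    """
    masks = {p.stem: p for p in Path(mask_dir).iterdir() if p.suffix.lower() in exts}
    pairs = []
    for img in sorted(Path(image_dir).iterdir()):
        if img.suffix.lower() not in exts:
            continue
        if img.stem not in masks:
            raise FileNotFoundError(f"no mask for {img.name}")
        pairs.append((img, masks[img.stem]))
    return pairs
```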

Is there any pretrained model after searching for width or depth released?

Hi, I checked the .pth files in the Google Drive folder (like the files in the basic-results directory) and found that their weights have the same shapes and depth as the standard model. So I guess these are models before searching. Am I mistaken, or has any pretrained model after searching for width or depth been released?
Thank you for your help!

Can you release the searched model for ImageNet?

I want to use your searched model for study. Could you release the final ImageNet model checkpoints from Table 3 of your paper, or the searched model configurations? Either would be fine, thanks!

Does the affine parameter of BN have effect in search phase?

Thanks for your impressive work and the released code!
I saw that in DARTS, BN is not learnable in the search phase, and the authors state that

Learnable affine parameters in all batch normalizations are disabled during the search process to avoid rescaling the outputs of the candidate operations.

In contrast, BN is learnable by default in the search phase of GDAS (unless I missed something important).
Does the affine parameter have some effect on the search phase? Could you give me some hints?
Thanks in advance!
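The DARTS rationale quoted above can be illustrated with a scalar sketch (pure Python, hypothetical numbers). Suppose a candidate op ends in a BN layer with a learnable scale gamma. Then the product of the architecture weight and gamma is what reaches the output, so a small architecture weight can be compensated by a larger gamma. With affine disabled, the op's output scale is fixed and the architecture weight alone controls its contribution. This shows the ambiguity DARTS mentions; it does not measure how much it matters in GDAS.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5, affine=True):
    """Normalize a batch of scalars; apply gamma and beta only if affine."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    normed = [(x - mean) / math.sqrt(var + eps) for x in xs]
    if not affine:
        return normed
    return [gamma * v + beta for v in normed]


xs = [0.5, 1.5, -2.0, 3.0]
# An architecture weight of 0.25 combined with a learned gamma of 2.0 ...
a = [0.25 * v for v in batch_norm(xs, gamma=2.0)]
# ... gives the same output as an architecture weight of 0.5 with gamma 1.0.
b = [0.5 * v for v in batch_norm(xs, gamma=1.0)]
```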

What's the difference from DARTS?

Thanks for sharing the code.

I have a question about how the implementation differs from DARTS. The training code looks very similar to DARTS (https://github.com/quark0/darts).

As you mentioned in the paper,
"2. Instead of using the whole DAG, GDAS samples one sub-graph at one training iteration, accelerating the searching procedure. Besides, the sampling in GDAS is learnable and contributes to finding a better cell."

But in the forward function of MixedOp, the output is just the weighted sum of all ops, same as DARTS.

def forward(self, x, weights):
    return sum(w * op(x) for w, op in zip(weights, self._ops))

So, can you point out the code that "samples one sub-graph at one training iteration"? Thanks.
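The quoted forward evaluates every candidate op. The contrast the question is asking about can be sketched in pure Python by putting a call counter on each op. With one-hot weights, the DARTS-style sum and an argmax-only forward produce the same value, but the argmax-only version evaluates one op instead of all of them. `CountingOp` and `sampled_forward` are hypothetical stand-ins, not repository code, and the sketch does not show where (or whether) the repository performs the sampling.

```python
class CountingOp:
    """A stand-in candidate op that records how often it is evaluated."""

    def __init__(self, scale):
        self.scale = scale
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.scale * x


def mixed_forward(ops, weights, x):
    # DARTS-style MixedOp, as quoted above: every candidate op is evaluated.
    return sum(w * op(x) for w, op in zip(weights, ops))


def sampled_forward(ops, weights, x):
    # Hypothetical sampled variant: only the argmax op is evaluated. The other
    # one-hot weights are zero, so skipping them leaves the forward value unchanged.
    k = max(range(len(weights)), key=weights.__getitem__)
    return weights[k] * ops[k](x)
```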
