d-x-y / autodl-projects
Automated deep learning algorithms implemented in PyTorch.
License: MIT License
Hi, thanks for your great work. After reading your paper, I have some doubts.
Thanks.
I want to search for a model on my own dataset. I hope the author can release the code when convenient. Best wishes.
Hi Buddy, Thanks for your great work!
Do you plan to share your search code, and if so, when? Looking forward to your answer!
When applying the Gumbel-softmax trick on multiple GPUs, the Gumbel-softmax needs to sample noise from the Gumbel distribution. But if we write the noise-sampling code in the forward() function, different GPUs may sample different noise and activate different paths. Did you consider this case in your code?
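One way to reason about this concern is to make the noise a deterministic function of a shared seed and the step counter, so that every replica draws the same sample. The following is a minimal sketch in plain Python under that assumption; the function names and the seeding scheme are hypothetical and are not taken from the repository.

```python
import math
import random


def sample_gumbel(rng, n, eps=1e-20):
    """Draw n Gumbel(0, 1) samples via g = -log(-log(u)), u ~ Uniform(0, 1)."""
    return [-math.log(-math.log(rng.random() + eps) + eps) for _ in range(n)]


def sampled_op_index(logits, seed, step):
    """Pick an operation with the Gumbel-Max trick, seeding from (seed, step)."""
    rng = random.Random(seed * 1_000_003 + step)  # same value on every replica
    noise = sample_gumbel(rng, len(logits))
    perturbed = [a + g for a, g in zip(logits, noise)]
    return max(range(len(logits)), key=perturbed.__getitem__)


logits = [0.1, 0.5, -0.3, 0.2]  # made-up architecture parameters for one edge
replica_0 = [sampled_op_index(logits, seed=7, step=s) for s in range(5)]
replica_1 = [sampled_op_index(logits, seed=7, step=s) for s in range(5)]
print(replica_0 == replica_1)  # both replicas use the same seeds
```

Another option, not shown here, would be to sample the noise once outside forward() and pass it to every replica as an input.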
Hi, I read the DARTS code first, and I couldn't find the part of your code that trains the architecture, e.g., the Gumbel trick in your paper. Looking forward to your reply!
I ran CUDA_VISIBLE_DEVICES=0 bash ./scripts-cnn/train-cifar.sh GDAS_FG cifar10 cut and got:
Must set TORCH_HOME envoriment variable for data dir saving
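The message suggests that the scripts read the TORCH_HOME environment variable to decide where data is stored. A minimal sketch of setting it from Python follows; the directory is a hypothetical placeholder, and the exact layout the scripts expect under it is not stated here.

```python
import os
from pathlib import Path

# Hypothetical location; point this at wherever your datasets should live.
data_root = Path.home() / ".torch"

# setdefault keeps any value that is already exported in the shell.
os.environ.setdefault("TORCH_HOME", str(data_root))

print("TORCH_HOME =", os.environ["TORCH_HOME"])
```

Because the training command is a bash script, the variable has to be visible to that shell, for example by running export TORCH_HOME=/path/to/data before the command.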
Hi, thanks for your great work. But I have a question about the weighted-sum in GDAS.
In lib/models/cell_searchs/search_cells.py, line 58, the weighted sum is calculated as follows:
weigsum = sum( weights[_ie] * edge(nodes[j]) if _ie == argmaxs else weights[_ie] for _ie, edge in enumerate(self.edges[node_str]) )
I don't understand why we need to add weights[_ie] when _ie != argmaxs. I think weigsum = weights[_ie] * edge(nodes[j]) where _ie == argmaxs would be fine. Is there a difference that I am missing?
Hi, thanks for your great work.
After reading your paper TAS, I don't understand the channel-wise interpolation. Could you explain or give an insight on channel-wise interpolation (i.e., making weighted sum on feature maps of different sizes), and the corresponding effects.
Thanks.
Output:
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=265 error=2 : out of memory
Traceback (most recent call last):
  File "/media/tcl2/lilei/workspace/GDAS/exps-cnn/train_base.py", line 89, in <module>
    main()
  File "/media/tcl2/lilei/workspace/GDAS/exps-cnn/train_base.py", line 84, in main
    main_procedure(config, args.dataset, args.data_path, args, genotype, args.init_channels, args.layers, None, log)
  File "/media/tcl2/lilei/workspace/GDAS/exps-cnn/train_utils.py", line 96, in main_procedure
    train_acc1, train_acc5, train_los = _train(train_loader, model, criterion, optimizer, 'train', epoch, config, args.print_freq, log)
  File "/media/tcl2/lilei/workspace/GDAS/exps-cnn/train_utils.py", line 129, in _train
    for i, (inputs, targets) in enumerate(xloader):
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 178, in _pin_memory_loop
    batch = pin_memory_batch(batch)
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 245, in pin_memory_batch
    return [pin_memory_batch(sample) for sample in batch]
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 245, in <listcomp>
    return [pin_memory_batch(sample) for sample in batch]
  File "/media/tcl2/lilei/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 239, in pin_memory_batch
    return batch.pin_memory()
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/THCCachingHostAllocator.cpp:265
Hi, Thank you for the nice code. But I got a problem when I run the script.
Traceback (most recent call last):
File "./exps/search-shape.py", line 14, in <module>
from config_utils import load_config, configure2str, obtain_search_single_args as obtain_args
ImportError: cannot import name 'load_config'
Thanks if you could help.
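An ImportError like this can occur when Python resolves a different config_utils module than the intended one, although that is only a guess about the cause here. A minimal sketch for checking which file a module name resolves to, using only the standard library (the helper name where_is is hypothetical):

```python
import importlib.util


def where_is(module_name):
    """Return the file a module name would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# "config_utils" is the module from the traceback; "json" is a stdlib control.
for name in ("config_utils", "json"):
    print(name, "->", where_is(name))
```

Running it from the same directory and with the same PYTHONPATH as the failing script shows whether config_utils is found at all, and from where.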
How does it compare with models such as EfficientNet-B0 to B7?
First of all, thanks for your great work!
However, I have some questions about the training algorithm used during the search. Since the softmax function is applied in back-propagation, will the gradients generated on one path (the one chosen by argmax) also update the weights on other paths? Or will the gradients only update the weights on the sampled path?
Another question is about a description in Sec. 3.2, Acceleration, of the paper:
"Within one training batch, each sample produces a different $h_{i,j}$, and, therefore, each element in $A_{i,j}$ has a high possibility of being updated with gradients"
In my understanding, if each sample in the mini-batch samples a different architecture, training will behave as if the batch size were 1.
Thanks again!
Hi, thanks for your excellent work!
Can you release the code and model of MobileNetV2 from Network Pruning via Transformable Architecture Search?
Hi, Xuanyi. Could you please provide more details about CWI? I am having trouble following it in http://arxiv.org/abs/1905.09717
In the genotype of GDAS_GF, there are three items for each op in a cell: (op, index, value).
For example:
('skip_connect', 0, 0.13017432391643524)
What is the 'value' used for? Thanks.
Sorry to trouble you, but I am a little confused about the GDAS paper. I can't find the paper "Gradient-based search using Differentiable Architecture Sampler" anymore. Is it the same paper as "Searching for A Robust Neural Architecture in Four GPU Hours"?
Hi, Xuanyi,
I am very interested in NAS-Bench-102, but how do I set the path to build the API?
Did you release the benchmark files?
Many thanks.
Hello,
I'm interested in how you decided on the fixed reduction cell. Did you test different reduction cells? How did those cells perform?
Thank you!
Thanks for your work.
I have trained on CIFAR-10 with GDAS_F1, and I want to know the network structure found by the search. How can I get it?
Thank you.
Hi, when I ran "CUDA_VISIBLE_DEVICES=0,1 bash ./scripts-search/search-depth-gumbel.sh cifar10 ResNet110 CIFARX 0.57 -1", I got the following error:
Traceback (most recent call last):
File "./exps/search-shape.py", line 202, in <module>
main(args)
File "./exps/search-shape.py", line 40, in main
assert split_file_path.exists(), '{:} does not exist'.format(split_file_path)
AssertionError: .latent-data/splits/cifar10-0.5.pth does not exist
However, I couldn't find the file at the Google Drive URL. Has it been renamed?
Thank you!
Thank you very much for this remarkable contribution.
I am trying to run the given algorithms on NAS-Bench-102. I noticed that a data path has to be given. Why is it needed, since the benchmark already contains all the evaluation results? What should I do if I just want to test those algorithms on NAS-Bench-102?
Thank you very much!
"Neural" instead of "Nueral" in the title
...
Thanks for your excellent work.
As a newcomer to NAS, I can't find a way to train the model I have searched with SETN. I ran the training and searching scripts separately, but the training script just trains the default searched model instead of my searched model.
Thank you.
Hi,
Can you release the code for Auto-ReID?
Thanks
Thanks for your awesome work!
The script for searching GDAS on a small search space (NAS-Bench-102) is given in README. If I want to reproduce the searched results in GDAS paper, is there a script available for searching on the full search space as well as the same network setting in the paper? I've noticed that the "search_space_name" in GDAS.sh can be changed to "full", however, the search code is based on NAS-Bench-102 settings. Or actually it doesn't matter and switching "search_space_name" to "full" is all I need?
Thank you.
Hi there,
Just wondering how was the gumbel-softmax implemented in your scripts. Was a hard one-hot sample generated at each iteration of the searching phase, or a soft one? I noticed there's an option in the torch.nn.functional.gumbel_softmax() of setting 'hard' to true to generate a one-hot sample. Was this the approach that you took?
Many thanks,
X
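To make the soft versus hard distinction in this question concrete, here is a minimal plain-Python sketch of Gumbel-softmax sampling. It is not the repository's implementation and has no autograd; in PyTorch, torch.nn.functional.gumbel_softmax with hard=True additionally uses a straight-through form so that gradients still flow through the soft sample. The function below and its inputs are illustrative assumptions.

```python
import math
import random


def gumbel_softmax(logits, tau, rng, hard=False):
    """Return a relaxed sample; with hard=True, return its one-hot argmax."""
    gumbels = [-math.log(-math.log(rng.random() + 1e-20) + 1e-20) for _ in logits]
    scaled = [(a + g) / tau for a, g in zip(logits, gumbels)]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    soft = [e / total for e in exps]
    if not hard:
        return soft
    k = max(range(len(soft)), key=soft.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(soft))]


rng = random.Random(0)
logits = [1.0, 0.5, 0.1, -0.2]  # made-up values
print(gumbel_softmax(logits, tau=1.0, rng=rng))             # soft probabilities
print(gumbel_softmax(logits, tau=1.0, rng=rng, hard=True))  # one-hot vector
```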
In order to use the .get_metrics() API, I need to pass the name of the dataset as well as the "setname". What exactly is this?
There seem to be 'train', 'x-valid', 'x-test' and 'ori-test', but there doesn't seem to be any documentation on what they are.
I would like to get the test performance of an architecture on all tested datasets (that is, cifar10, cifar100 and imagenet16-120). Which setnames should I use?
Hi! Thanks for the great work!
I have some questions about the Fast Acceleration version. The main paper says that Gumbel-softmax with a temperature is adopted to relax the argmax function during the search. But in "model_search.py", I found that it uses the normal version, which calculates the probabilities of all ops (the same as in DARTS).
Is something wrong, or did I misunderstand?
Can you talk about the details of "samples one sub-graph at one training iteration"?
As far as I know, the result of Gumbel-softmax may not be a one-hot vector. It may be a vector like [0.96, 0.01, 0.01, 0.01, 0.01].
When you sample one sub-graph at training, do you just drop all the connections with weights 0.01?
Thanks.
Hi, Thank you for your excellent work!
While searching for width, I wonder whether the operations with different channel numbers share the weights of the one-shot network. If yes, how do the weights in the one-shot network get updated, since they receive K (K=2 in your framework figure) separate gradients from the K sampled operations?
Btw, in "When τ -> 0, $\hat{p} = [\hat{p}_1, \ldots, \hat{p}_j, \ldots]$ becomes one-shot, and the Gumbel-softmax distribution...", do you mean "one-hot" instead of "one-shot"?
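The limit mentioned in the quoted sentence can be checked numerically: as the temperature τ shrinks, a softmax over fixed scores concentrates on the largest entry, i.e. it approaches a one-hot vector. A small sketch with made-up scores (the function name is an assumption, not from the paper or the repository):

```python
import math


def softmax_with_temperature(scores, tau):
    """Softmax of scores / tau, computed stably."""
    scaled = [s / tau for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]  # made-up perturbed logits
for tau in (10.0, 1.0, 0.1, 0.01):
    probs = softmax_with_temperature(scores, tau)
    print(tau, [round(p, 4) for p in probs])
```

At large τ the output is close to uniform, and at small τ nearly all of the mass sits on the first entry.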
Thank you for making the search code publicly accessible; it is really good material for NAS researchers. After reading the paper and reviewing the search code, there are a few points that I do not clearly understand.
In the paper, you mention: "since Eq. (3) needs to sample from a discrete probability distribution, we cannot back-propagate gradients. To allow back-propagation, we use the Gumbel-Max trick."
And in the acceleration part: "During the backward procedure, we only back-propagate the gradient generated at the argmax."
First, let F1, F2, F3, F4 be the functions between two nodes, with corresponding architecture parameters a1, a2, a3, a4 and Gumbel-softmax probabilities p1, p2, p3, p4.
During the forward pass, we sample the index with the maximum probability and get a one-hot vector with the code:
hardwts = one_h - probs.detach() + probs
Assume the one-hot architecture weights are w1, w2, w3, w4, and the argmax index is 2.
The forward code is as follows:
weigsum = sum( weights[_ie] * edge(nodes[j]) if _ie == argmaxs else weights[_ie] for _ie, edge in enumerate(self.edges[node_str]) )
According to the code, the forward result is:
weightsum = w1 + w2*F2 + w3 + w4
q1. Is this code the accelerated version or not?
According to the acceleration part of your paper, we only need to back-propagate through the argmax, which means back-propagating only to F2 while F1, F3, F4 are ignored. For the architecture parameters, is it also the case that gradients go only to w2 while w1, w3, w4 are ignored? From the code, it seems that w1, w3, w4 also receive gradients.
q2. Only the argmax is used in the forward pass, but in the code w1, w3, w4 are also added to weightsum. Since their values are zero, the result is the same as computing only w2*F2, so what is the purpose of adding w1, w3, w4? What would happen if we computed weightsum as w2*F2 only?
q3. If the soft Gumbel-softmax is applied rather than the one-hot one, can we still calculate weightsum as w1 + w2*F2 + w3 + w4? I think maybe not, because w and w*F can differ by orders of magnitude.
These questions have confused me a lot. It would be really helpful if you could kindly give me some suggestions. Thank you!
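The quantities behind q1 to q3 can be written out directly from the two quoted code lines. This is a worked sketch of what those lines compute, not a statement of the authors' intent. It assumes that sg(·) stands for detach() (stop-gradient), that e_i is the one-hot entry, and that k is the argmax index.

```latex
\begin{aligned}
w_i &= e_i - \operatorname{sg}(p_i) + p_i
  &&\Rightarrow\quad w_i = e_i \ \text{(forward value)}, \qquad
     \frac{\partial w_i}{\partial p_i} = 1, \\
\text{weigsum} &= w_k\,F_k(x) + \sum_{i \neq k} w_i
  &&\Rightarrow\quad \text{weigsum} = F_k(x) \ \text{(forward value)}, \\
\frac{\partial\,\text{weigsum}}{\partial p_k} &= F_k(x),
  &&\phantom{\Rightarrow}\quad
     \frac{\partial\,\text{weigsum}}{\partial p_i} = 1 \quad (i \neq k,\ \text{broadcast over the feature map}).
\end{aligned}
```

Under these assumptions only F_k is evaluated, yet every p_i has a direct gradient path through the added scalar terms. With weigsum = w_k F_k(x) alone, p_k would be the only entry of p with a direct path, although the softmax that produces p still couples all architecture parameters.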
That's a very nice paper and thanks for the repo. Training scripts on searched architectures just worked, however, I have been unable to figure out which scripts to use to search the cells from scratch. Thank you.
How can I find your paper?
Hi Xuanyi, thanks for your excellent work and your code is really convenient to use!
I noticed that in the scripts-search folder of TAS, the scripts use seed-${rseed}-last.config to retrain the last searched architecture by default. I am a little confused about why they do not use seed-${rseed}-best.config to retrain the best searched architecture, which I think may perform better.
Which searched architecture was retrained for the results reported in your paper: the last or the best?
Thank you!
Can you point out the differences between GDAS and SNAS?
Thanks.
Hi there, I am trying to train it on a Jetson Nano with 4 GB of memory.
Is this possible?
Can I reduce the resources required?
Thanks!
Thanks for your excellent work!
I would like to ask two questions about optimizing the searched network and applying the KD algorithm.
First, in my understanding, the searched network is randomly initialized and optimized to learn from the unpruned network. Why isn't it initialized with the weights already trained in the unpruned network and then fine-tuned, as in the three-stage pruning paradigm (training a large network, pruning, re-training)? Please correct me if I have misunderstood.
Second, I believe that the KD algorithm has been slightly modified to suit this case. In particular, Eq. (9) looks like a cross-entropy term of the form "-sigma(P(x)log(Q(x)))". In my understanding, the distribution P is the true distribution and the distribution Q is the estimated distribution. Therefore, I wonder if there is a mistake in Eq. (9), and I suggest that it should be "-sigma(P(z_hat)log(Q(z)))" instead of "-sigma(P(z)log(Q(z_hat)))".
Thanks for your attention.
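The ordering question in the second point matters because cross entropy is not symmetric in its two arguments. The sketch below illustrates this with made-up logits; it does not claim which ordering the paper's Eq. (9) uses, and the teacher and student labels are assumptions made only for the example.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, estimate):
    """H(target, estimate) = -sum_i target_i * log(estimate_i)."""
    return -sum(t * math.log(e) for t, e in zip(target, estimate))


teacher = softmax([3.0, 1.0, 0.2])   # made-up logits of the unpruned network
student = softmax([1.5, 1.2, 0.3])   # made-up logits of the searched network

print(cross_entropy(teacher, student))  # teacher as the target distribution
print(cross_entropy(student, teacher))  # roles swapped
```

The two printed values differ, so swapping the roles of the two distributions changes the loss and its gradients.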
In the file search_model_gdas.py in the lib/models/cell_searchs package, there is the line:
from .genotypes import Structure
I cannot find the Structure object. Could you share this file?
Hi @D-X-Y, thanks for sharing your excellent work. I find your work very interesting, and I want to try your code.
However, I found that the search is not as fast as the paper says. Specifically, I ran the command
"CUDA_VISIBLE_DEVICES=0 bash ./scripts-cnn/train-cifar.sh GDAS_F1 cifar10 cut"
on a PC with an NVIDIA TITAN X (Pascal) 12GB, and it eventually took 27 hours to find a network with a top-1 error rate of 3.31. Here are some logs:
train[2019-05-13-06:15:55] Epoch: [486][000/521] Time 0.62 (0.62) Data 0.26 (0.26) Loss 0.033 (0.033) Prec@1 100.00 (100.00) Prec@5 100.00 (100.00)
train[2019-05-13-06:16:31] Epoch: [486][100/521] Time 0.36 (0.37) Data 0.00 (0.00) Loss 0.129 (0.083) Prec@1 96.88 (98.62) Prec@5 100.00 (100.00)
train[2019-05-13-06:17:08] Epoch: [486][200/521] Time 0.38 (0.37) Data 0.00 (0.00) Loss 0.142 (0.084) Prec@1 95.83 (98.57) Prec@5 100.00 (100.00)
train[2019-05-13-06:17:45] Epoch: [486][300/521] Time 0.38 (0.37) Data 0.00 (0.00) Loss 0.024 (0.084) Prec@1 100.00 (98.58) Prec@5 100.00 (100.00)
train[2019-05-13-06:18:22] Epoch: [486][400/521] Time 0.37 (0.37) Data 0.00 (0.00) Loss 0.057 (0.085) Prec@1 100.00 (98.54) Prec@5 100.00 (99.99)
train[2019-05-13-06:19:00] Epoch: [486][500/521] Time 0.36 (0.37) Data 0.00 (0.00) Loss 0.098 (0.087) Prec@1 95.83 (98.46) Prec@5 100.00 (99.99)
train[2019-05-13-06:19:07] Epoch: [486][520/521] Time 0.32 (0.37) Data 0.00 (0.00) Loss 0.084 (0.087) Prec@1 98.75 (98.45) Prec@5 100.00 (99.99)
[2019-05-13-06:19:07] **train** Prec@1 98.45 Prec@5 99.99 Error@1 1.55 Error@5 0.01 Loss:0.087
test [2019-05-13-06:19:07] Epoch: [486][000/105] Time 0.34 (0.34) Data 0.26 (0.26) Loss 0.104 (0.104) Prec@1 96.88 (96.88) Prec@5 100.00 (100.00)
test [2019-05-13-06:19:16] Epoch: [486][100/105] Time 0.09 (0.09) Data 0.00 (0.00) Loss 0.032 (0.139) Prec@1 98.96 (96.49) Prec@5 100.00 (99.94)
test [2019-05-13-06:19:16] Epoch: [486][104/105] Time 0.03 (0.09) Data 0.00 (0.00) Loss 0.048 (0.140) Prec@1 93.75 (96.44) Prec@5 100.00 (99.93)
[2019-05-13-06:19:16] **test** Prec@1 96.44 Prec@5 99.93 Error@1 3.56 Error@5 0.07 Loss:0.140
----> Best Accuracy : Acc@1=96.69, Acc@5=99.93, Error@1=3.31, Error@5=0.07
----> Save into ./output/NAS-CNN/GDAS_F1-cifar10-cut-E600/seed-166-checkpoint-cifar10-model.pth
==>>[2019-05-13-06:19:17] [Epoch=487/600] [Need: 06:21:40] LR=0.0022 ~ 0.0022, Batch=96
train[2019-05-13-06:19:17] Epoch: [487][000/521] Time 0.69 (0.69) Data 0.31 (0.31) Loss 0.071 (0.071) Prec@1 98.96 (98.96) Prec@5 100.00 (100.00)
train[2019-05-13-06:19:54] Epoch: [487][100/521] Time 0.37 (0.37) Data 0.00 (0.00) Loss 0.069 (0.087) Prec@1 97.92 (98.39) Prec@5 100.00 (99.99)
train[2019-05-13-06:20:31] Epoch: [487][200/521] Time 0.36 (0.37) Data 0.00 (0.00) Loss 0.125 (0.086) Prec@1 96.88 (98.45) Prec@5 100.00 (99.99)
train[2019-05-13-06:21:08] Epoch: [487][300/521] Time 0.37 (0.37) Data 0.00 (0.00) Loss 0.212 (0.086) Prec@1 94.79 (98.49) Prec@5 100.00 (99.99)
train[2019-05-13-06:21:45] Epoch: [487][400/521] Time 0.36 (0.37) Data 0.00 (0.00) Loss 0.090 (0.087) Prec@1 98.96 (98.49) Prec@5 100.00 (99.99)
train[2019-05-13-06:22:22] Epoch: [487][500/521] Time 0.38 (0.37) Data 0.00 (0.00) Loss 0.150 (0.088) Prec@1 96.88 (98.44) Prec@5 100.00 (99.99)
train[2019-05-13-06:22:29] Epoch: [487][520/521] Time 0.31 (0.37) Data 0.00 (0.00) Loss 0.078 (0.088) Prec@1 97.50 (98.46) Prec@5 100.00 (99.99)
[2019-05-13-06:22:29] **train** Prec@1 98.46 Prec@5 99.99 Error@1 1.54 Error@5 0.01 Loss:0.088
test [2019-05-13-06:22:29] Epoch: [487][000/105] Time 0.32 (0.32) Data 0.24 (0.24) Loss 0.059 (0.059) Prec@1 96.88 (96.88) Prec@5 100.00 (100.00)
test [2019-05-13-06:22:38] Epoch: [487][100/105] Time 0.09 (0.09) Data 0.00 (0.00) Loss 0.029 (0.141) Prec@1 98.96 (96.42) Prec@5 100.00 (99.93)
test [2019-05-13-06:22:38] Epoch: [487][104/105] Time 0.03 (0.09) Data 0.00 (0.00) Loss 0.001 (0.140) Prec@1 100.00 (96.43) Prec@5 100.00 (99.93)
[2019-05-13-06:22:38] **test** Prec@1 96.43 Prec@5 99.93 Error@1 3.57 Error@5 0.07 Loss:0.140
----> Best Accuracy : Acc@1=96.69, Acc@5=99.93, Error@1=3.31, Error@5=0.07
----> Save into ./output/NAS-CNN/GDAS_F1-cifar10-cut-E600/seed-166-checkpoint-cifar10-model.pth
Since I am new to NAS, I cannot figure out what I have missed.
Any suggestions/help are greatly appreciated! Thanks.
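The timing figures in the log can be turned into a rough estimate of the total run time. This is only arithmetic on numbers read off the log above (batches per epoch, average seconds per batch, and the 600-epoch schedule). Whether this run is the search itself or the training of an already searched network is not stated in the log.

```python
# Values read off the log above.
train_batches, train_sec_per_batch = 521, 0.37
test_batches, test_sec_per_batch = 105, 0.09
total_epochs, next_epoch = 600, 487

sec_per_epoch = train_batches * train_sec_per_batch + test_batches * test_sec_per_batch
hours_total = total_epochs * sec_per_epoch / 3600
hours_elapsed = next_epoch * sec_per_epoch / 3600
hours_left = (total_epochs - next_epoch) * sec_per_epoch / 3600

print(f"per epoch: {sec_per_epoch:.0f} s")
print(f"all {total_epochs} epochs: {hours_total:.1f} h")
print(f"elapsed before epoch {next_epoch}: {hours_elapsed:.1f} h")
print(f"remaining: {hours_left:.1f} h")
```

The estimate of roughly 6.3 remaining hours agrees with the "Need: 06:21:40" field in the log, and the elapsed estimate of roughly 27 hours matches the reported run time, which suggests the time is dominated by the 600-epoch loop at about 202 seconds per epoch.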
In the genotype of GDAS_GF, there are three items for each op in a cell: (op, index, value).
For example:
('skip_connect', 0, 0.13017432391643524)
What is the 'index' used for? Thanks.
@D-X-Y thank you for your hard work
How can I train on my own dataset, with a simple structure of:
Hi, I checked the .pth files at the Google Drive URL (such as the files in the basic-results directory) and found that their weights have the same shape and depth as the standard model. So I guess that they are the models before searching. Am I mistaken, or has any pretrained model after searching for width and depth been released?
Thank you for your help!
I want to use your searched model for study. Could you release the final ImageNet model checkpoints from Table 3 of your paper, or the searched model configurations? Either would be fine, thanks!
Thanks for your impressive work and the released code!
I saw that in DARTS, BN is not learnable in the search phase, and the authors state:
"Learnable affine parameters in all batch normalizations are disabled during the search process to avoid rescaling the outputs of the candidate operations."
In contrast, BN is set to be learnable by default in the search phase of GDAS (if I didn't miss some important points).
Does the affine parameter have some effect on the search phase? Could you give me some hints?
Thanks in advance!
Hi! I read that your code for finding the cell structure from scratch is under review.
When will you release it?
I am very much looking forward to it! :)
Thanks for sharing the code.
I have a question about how the implementation differs from DARTS. The training code looks very similar to DARTS (https://github.com/quark0/darts).
As you mentioned in the paper,
"2. Instead of using the whole DAG, GDAS samples one sub-graph at one training iteration, accelerating the searching procedure. Besides, the sampling in GDAS is learnable and contributes to finding a better cell."
But in the forward function of MixedOp, the output is just the weighted sum of all ops, same as DARTS.
def forward(self, x, weights):
    return sum(w * op(x) for w, op in zip(weights, self._ops))
So, can you point out the code that "samples one sub-graph at one training iteration"? Thanks.
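The difference this question is asking about can be illustrated with a toy example that counts how many candidate operations are actually evaluated. This is a plain-Python sketch with made-up operations and weights, not code from either repository. For simplicity the selected index is the argmax of the weights rather than a Gumbel sample.

```python
calls = []


def make_op(name, scale):
    """Build a toy operation that records each evaluation."""
    def op(x):
        calls.append(name)
        return scale * x
    return op


ops = [make_op("skip", 1.0), make_op("conv3x3", 2.0), make_op("pool", 0.5)]
weights = [0.2, 0.7, 0.1]  # made-up softmax weights for one edge


def mixed_all(x):
    """DARTS-style: weighted sum over every candidate operation."""
    return sum(w * op(x) for w, op in zip(weights, ops))


def mixed_sampled(x, index):
    """Sampled style: evaluate only the selected operation."""
    return ops[index](x)


mixed_all(3.0)
evaluated_all = len(calls)
calls.clear()
mixed_sampled(3.0, index=weights.index(max(weights)))
evaluated_sampled = len(calls)
print(evaluated_all, evaluated_sampled)
```

The weighted sum evaluates every operation on the edge, while the sampled variant evaluates one, which is where a sub-graph sampling method would save compute.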