iboing / ista-nas
Released code for the paper: ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding
Hi, I have read your NeurIPS 2020 paper, and I like it.
I just have a question about the operation weights. They often contain negative values during training; what does a negative weight for an operation mean? Also, the non-zero values in the weight matrix are not close to 1, so is there still a gap between the searched sub-graph and the final genotype?
Dear authors,
Thanks for your nice work ISTA-NAS, which is very interesting to me.
After running your code, I find that the selected cell is determined by the line ``id1, id2 = np.abs(x).argsort()[-2:]`` (L109 of ista_nas/trainer.py). In this way, we cannot guarantee that the two selected feature maps of a node come from different previous nodes. Take the first node as an example: if the two largest values both fall on edges from the same previous node, both incoming feature maps of the first node will come from that single predecessor, which conflicts with the original DARTS setting. I wonder how to include this constraint in your framework.
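A minimal sketch of the issue and one possible fix. The flat layout of `x` (op weights concatenated per previous node) and `num_ops` are assumptions for illustration, not the actual shapes used in ISTA-NAS:

```python
import numpy as np

# Assumed layout: for one node, x concatenates the op weights of each
# candidate input edge, with num_ops entries per previous node.
num_ops = 8  # hypothetical number of candidate operations per edge

def top2_unconstrained(x):
    """Selection as in trainer.py L109: top-2 entries by magnitude,
    which may both fall on edges from the same previous node."""
    id1, id2 = np.abs(x).argsort()[-2:]
    return id1 // num_ops, id2 // num_ops  # previous-node index of each pick

def top2_distinct_nodes(x):
    """Sketch of a constrained variant: take the strongest op on each
    previous node, then keep the two strongest distinct nodes."""
    per_node = np.abs(x).reshape(-1, num_ops)
    best_op = per_node.argmax(axis=1)   # best op per previous node
    best_val = per_node.max(axis=1)
    nodes = best_val.argsort()[-2:]     # two strongest distinct nodes
    return [(int(n), int(best_op[n])) for n in nodes]

# Example where the two largest weights share a previous node:
x = np.zeros(2 * num_ops)
x[num_ops] = 0.9      # previous node 1, op 0
x[num_ops + 1] = 0.8  # previous node 1, op 1
print(top2_unconstrained(x))   # both picks land on previous node 1
print(top2_distinct_nodes(x))  # one pick per previous node
```

The constrained variant guarantees distinct input nodes, at the cost of possibly discarding the second-largest weight overall.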
Best,
Shun Lu
Thanks for releasing the code.
I tried to reproduce the CIFAR-10 result from scratch following your guidance (cutout enabled):
python ./tools/evaluation.py --auxiliary --cutout --onestage --arch ISTA_onestage
However, the accuracy of the model on CIFAR-10 is 97.3% after 600 epochs of training, which is lower than the 97.64% (2.36±0.06 error rate) reported in your paper.
Could you provide the training logs to help me find the source of the gap?
Here is one training log produced with your code: test_one_stage.log
Thanks again.
Hello, I recently read your NeurIPS 2020 paper and the code. It is interesting work and outperforms existing methods.
Congratulations!
But I have a question about the random measurement matrix A (the dictionary).
In both the paper and the code, A (base_A_[normal/reduce]) is initialized by sampling from a normal distribution.
However, in traditional sparse coding, the dictionary A is itself learned, updated to minimize the reconstruction error.
With A randomly initialized and kept fixed, it seems to merely project the sparse op-selection vector into a random measurement space.
Why is a randomly initialized, non-trainable dictionary still useful?