iboing / ista-nas
Released code for the paper: ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding
Hi, I have read your NeurIPS 2020 paper, and I like it.
I just have a question about the operation weights. They often contain negative values during training; what does a negative weight for an operation mean? Also, the non-zero values in the weight matrix are not close to 1, so is there still a gap between the searched sub-graph and the final genotype?
Dear authors,
Thanks for your nice work ISTA-NAS, which is very interesting to me.
After running your code, I find that the selected cell is determined by the line ``id1, id2 = np.abs(x).argsort()[-2:]`` (L109 of ista_nas/trainer.py). In this way, we cannot guarantee that the two selected feature maps of a node come from different previous nodes. Take the first node as an example: if the two largest values both fall on edges from the same previous node, both incoming feature maps of the first node will come from that single predecessor, which conflicts with the original DARTS setting. I wonder how to include this constraint in your framework.
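A minimal sketch of the issue and one possible fix. The flat layout of `x` (op weights concatenated per previous node) and `num_ops` are assumptions for illustration, not the actual shapes used in ISTA-NAS:

```python
import numpy as np

# Assumed layout: for one node, x concatenates the op weights of each
# candidate input edge, with num_ops entries per previous node.
num_ops = 8  # hypothetical number of candidate operations per edge

def top2_unconstrained(x):
    """Selection as in trainer.py L109: top-2 entries by magnitude,
    which may both fall on edges from the same previous node."""
    id1, id2 = np.abs(x).argsort()[-2:]
    return id1 // num_ops, id2 // num_ops  # previous-node index of each pick

def top2_distinct_nodes(x):
    """Sketch of a constrained variant: take the strongest op on each
    previous node, then keep the two strongest distinct nodes."""
    per_node = np.abs(x).reshape(-1, num_ops)
    best_op = per_node.argmax(axis=1)   # best op per previous node
    best_val = per_node.max(axis=1)
    nodes = best_val.argsort()[-2:]     # two strongest distinct nodes
    return [(int(n), int(best_op[n])) for n in nodes]

# Example where the two largest weights share a previous node:
x = np.zeros(2 * num_ops)
x[num_ops] = 0.9      # previous node 1, op 0
x[num_ops + 1] = 0.8  # previous node 1, op 1
print(top2_unconstrained(x))   # both picks land on previous node 1
print(top2_distinct_nodes(x))  # one pick per previous node
```

The constrained variant guarantees distinct input nodes, at the cost of possibly discarding the second-largest weight overall.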
Best,
Shun Lu
Thanks for releasing the code.
I tried to reproduce the CIFAR-10 result from scratch following your guidance (cutout enabled):
python ./tools/evaluation.py --auxiliary --cutout --onestage --arch ISTA_onestage
However, the accuracy of the model on CIFAR-10 is 97.3% after 600 epochs of training, which is lower than the 97.64% (2.36±0.06 error rate) reported in your paper.
Could you provide the training logs to help me find the source of the gap?
Here is one training log produced with your code: test_one_stage.log
Thanks again.
Hello, I recently read your NeurIPS 2020 paper and the code. It is interesting work and outperforms existing methods.
Congratulations!
But I have a question about the random measurement matrix A (the dictionary).
In both the paper and the code, A (base_A_[normal/reduce]) is initialized by sampling from a normal distribution.
However, in traditional sparse coding, the dictionary A is itself learned, updated to minimize the reconstruction error.
With A randomly initialized and kept fixed, it seems to merely project the sparse op-selection vector into a random measurement space.
Why is a randomly initialized, non-trainable dictionary still useful?