facebookresearch / lamcts Goto Github PK
View Code? Open in Web Editor NEWThe release codes of LA-MCTS with its application to Neural Architecture Search.
License: Other
The release codes of LA-MCTS with its application to Neural Architecture Search.
License: Other
Hi,
When I tried to download the pre-trained checkpoint lanas_128_99.03.zip
from the url https://drive.google.com/file/d/1bZsEoG-sroVyYR4F_2ozGLA5W50CT84P/view?usp=sharing
you provided, I failed. Therefore, are there any other ways to get the pre-trained checkpoint?
I would like to test it out for genetic algorithm. But I'm not sure where the fitness function, or the parent selection function would go. Would you have a demo that perharps applies this?
我注意到在collect_samples function中对函数值是做了*-1,我的理解是想取到函数的最小值,但是为何在输出 best f(x)时使用np.absolute(curt_best_value)?
The objective goal is to maximize my function f(range from 100 to 300). To use LaMCTS, I minimize the -f returning to LaMCTS. Just wanna double-check in this case Cp should be -10
, correct? Thank you.
Hi,
Recently I am trying to use LA-MCTS to optimize my own task, but some errors occurred during the execution.
My task has 9 dimensions, and the HP setting is:
Cp
= 10 ( follow the suggestion from the paper, ~= 10% of max f(x) )leaf_size
= 10ninits
= 40kernel_type
= "rbf"Here is the error log:
Traceback (most recent call last):
File "mcts-exec.py", line 30, in <module>
agent.search(iterations = args.iterations)
File "/root/workspace/lamcts/MCTS.py", line 239, in search
self.dynamic_treeify()
File "/root/workspace/lamcts/MCTS.py", line 116, in dynamic_treeify
bad_kid.update_bag( bad_kid_data )
File "/root/workspace/lamcts/Node.py", line 83, in update_bag
self.is_svm_splittable = self.classifier.is_splittable_svm()
File "/root/workspace/lamcts/Classifier.py", line 55, in is_splittable_svm
self.learn_boundary(plabel)
File "/root/workspace/lamcts/Classifier.py", line 410, in learn_boundary
self.svm.fit(self.X, plabel)
File "/opt/app-root/lib/python3.6/site-packages/sklearn/svm/_base.py", line 173, in fit
y = self._validate_targets(y)
File "/opt/app-root/lib/python3.6/site-packages/sklearn/svm/_base.py", line 560, in _validate_targets
" class" % len(cls))
ValueError: The number of classes has to be greater than one; got 1 class
Since it occurs at Classifier.learn_clusters()
, I print the result of self.kmean.predict(tmp)
and find some clues:
plabel
(the results of self.kmean.predict(tmp)
) are all the same# normal
plabel: [0 1 1 0 0 0 1 0 0 1 1]
# exception
plabel: [0 0 0 0 0]
I temporarily avoid this exception by making is_splittable_svm
return False when plabel
only contains a single class.
However, I would like to know that is it possible to happen in the general case? Or it may be caused by my own function?
Thank you for the work & sharing your code!
Why does not provide TurBO code in Classifier.py ?
I notice the method of def propose_samples_turbo(self, num_samples, path, func)
in https://github.com/facebookresearch/LaMCTS/blob/master/LA-MCTS/Classifier.py#L343 is commented.
Are there any plans to add setuptools scripts (or other), so that LA-MCTS and LaNAS can be packaged and installed through pip? It would greatly simplify development of projects that would like to use them as dependencies.
I would be happy to help out with this.
Hi,
firstly, thank you for great work and for sharing your code!
I'm trying to run train the best architecture on CIFAR-10, yet I cannot seem to get the reported numbers, I guess I must be missing something. I tried both 1500 epochs as well as 600 as you suggested in another ticket, but still the accuracy seems lower than 99%. Is there perhaps another hyperparameter setting I need to adjust ?
python train.py --auxiliary --batch_size=32 --init_ch=128 --layer=24 --arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]' --model_ema --model-ema-decay 0.9999 --auto_augment --epochs 600
`2021-01-28 03:44:52,831 gpu device = 0
2021-01-28 03:44:52,831 args = Namespace(arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]', auto_augment=True, auxiliary=True, auxiliary_weight=0.4, batch_size=32, cutout=False, cutout_length=16, data='../data', drop_path_prob=0.2, epochs=600, exp_path='exp/cifar10', gpu=0, grad_clip=5, init_ch=128, layers=24, lr=0.025, model_ema=True, model_ema_decay=0.9999, model_ema_force_cpu=False, model_path='saved_models', momentum=0.9, report_freq=50, save='checkpoints/auto_aug-600-[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]-128-True-0.9999-0.2-24-16', seed=0, track_ema=False, wd=0.0003)
2021-01-28 03:44:56,852 param size = 44.591498MB
2021-01-28 03:44:59,182 epoch 0 lr 2.499966e-02
...
2021-02-03 00:52:40,405 valid_acc: 98.410000
2021-02-03 00:52:40,405 best_acc: 98.490000
2021-02-03 00:52:42,316 epoch 599 lr 0.000000e+00
or for 1500 epochs respectivelly: ```
2021-01-11 10:40:08,600 gpu device = 0
2021-01-11 10:40:08,600 args = Namespace(arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]', auto_augment=True, auxiliary=True, auxiliary_weight=0.4, batch_size=32, cutout=False, cutout_length=16, data='../data', drop_path_prob=0.2, epochs=1500, exp_path='exp/cifar10', gpu=0, grad_clip=5, init_ch=128, layers=24, lr=0.025, model_ema=True, model_ema_decay=0.9999, model_ema_force_cpu=False, model_path='saved_models', momentum=0.9, report_freq=50, save='checkpoints/auto_aug-1500-[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]-128-True-0.9999-0.2-24-16', seed=0, track_ema=False, wd=0.0003)
2021-01-11 10:40:11,371 param size = 44.591498MB
2021-01-11 10:40:12,744 epoch 0 lr 2.499995e-02
2021-01-27 07:02:55,357 valid_acc: 98.250000
2021-01-27 07:02:55,358 best_acc: 98.310000
2021-01-27 07:02:57,400 epoch 1499 lr 5.483112e-08
Thank you!
Lukas
When I use the scripts:
python train.py --auxiliary --batch_size=32 --init_ch=128 --layer=24 --arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]' --model_ema --model-ema-decay 0.9999 --auto_augment --epochs 1500
to train the cifar-10 dataset, I met this runtime error
How could I fix this?
I just use one P100 GPU
my pytorch version is 1.11.0+cu113
Hi
I am now reproducing the acc in cifar10 (99%),since the process is very slow and i get limited gpus ,also i noticed that the previous issue about lanet ,so i want to know what configs should i use.
tks a lot
adamas
Hello I have problem to run your code on CiFar10
i cannot find the function cutmix
from cutmix.cutmix import CutMix from cutmix.utils import CutMixCrossEntropyLoss
Hello,
I am testing LA-MCTS together with TuRBO-1 and found it works well. I tried to increase the batch_size
in TuRBO-1 to reduce the time spent on GP fitting. From what I got, the TuRBO paper indicates this is a legit choice for increasing overall computational efficiency without affecting the quality of the results.
To this end, I only modified collect_samples()
in MCTS.py to handle the multiple values returned by each TuRBO-1 iteration. Nothing particularly ambiguous here.
And now on rare occasion I get an AssertionError
thrown at line 47 in update_kids()
in Node.py:
assert self.kids[0].classifier.get_mean() > self.kids[1].classifier.get_mean()
I'm not sure why just yet. And I don't understand why returning more samples from TuRBO-1 could trigger this.
Could you please indicate possible reasons for this?
I believe increasing batch_size
in TuRBO-1 is something desirable and could benefit many.
Or am I misunderstanding something in the interplay between LA-MCTS and TuRBO-1?
Thanks for the great algo!
In the line
Line 418 in fa76cb5
Hello, I am now re-implementing the LA-MCTS according to the paper. My question is:
I use the labels predicted by SVM to decide which nodes of the parent should go to the good kid or bad kid. However, the SVM predicted labels are NOT always the same as those of K-Means. In some extreme cases, the mean of the bad kid was bigger than that of the good kid, due to the SVM classification error.
I checked the code of yours (as below), I found K-Means labels were used, instead of the SVM labels. I don't understand why should use the clustering labels, they don't represent the splitting, right?
def split_data(self):
good_samples = []
bad_samples = []
train_good_samples = []
train_bad_samples = []
if len(self.samples) == 0:
return good_samples, bad_samples
plabel = self.learn_clusters()
self.learn_boundary(plabel)
for idx in range(0, len(plabel)):
if plabel[idx] == 0:
# ensure the consistency
assert self.samples[idx][-1] - self.fX[idx] <= 1
good_samples.append(self.samples[idx])
train_good_samples.append(self.X[idx])
else:
bad_samples.append(self.samples[idx])
train_bad_samples.append(self.X[idx])
train_good_samples = np.array(train_good_samples)
train_bad_samples = np.array(train_bad_samples)
assert len(good_samples) + len(bad_samples) == len(self.samples)
return good_samples, bad_samples
Hello,
I'd like to apply LaMCTS to a discrete but highly dimensional search space (e.g., 200 dimensions).
Each dimension, however, has only four options.
Would LaMCTS be suitable for this? How hard is it to adapt the code for this?
Hi, recently we try to reproduce the result of mujoco task Ant-v2 in your paper. However we found all algorithms include LAMCTS cannot find reward more than -1000 after 10000 iterations. (In the paper LAMCTS should get a reward more than 1000.)
According to your paper, We use mean and std for linear policy from https://github.com/modestyachts/ARS and use 20 rollouts to get an average reward. (We use turbo as the local optimizer)
Actually, we got similar results of the paper in simple tasks like Swim, Hopper and HalfCheetah. But for Ant-888d task, we cannot reproduce the result in your paper. Can you reproduce the Ant-v2 result using the current code? Is there anything we need to change to reproduce the results?
Hi, when I try to replicate the one-shot LaNAS with the source code, I cannot get the curve as in the paper.
Then I check the code and find something strange, could you please help me work it out?
(1) In one-shot LaNAS/LaNAS/Classifier.py
, it seems that the learning rate is too small and the learned linear model is far from the optimal. I think the learning rate should be bigger (such as 0.01), is that true?
(2) When (1) is solved, I find that in function search_samples_under_constraints
in one-shot LaNAS/LaNAS/MCTS.py
, why just a pair of W and b is considered? And when I change the code into considering all Ws and bs, it seems to be hard to retrieve a sample under all constraints. How to solve this issue?
(3) In the search
function in one-shot LaNAS/LaNAS/MCTS.py
, the tree is updated after each sample is evaluated, but in other scripts (like Distributed_LanAS and LaNAS_NASBench101), the tree is updated after evaluating 20/50 samples. How often should the "learning phase" be recondutced (or in other words, how many samples should be retrieved in each "searching phase")?
(4) Which Cp should be chosen to replicate the result of one-shot LaNAS? I find it very big (Cp=10) in the code and it seems a little bit unreasonable.
I found in the readme that the command python train.py --auxiliary --batch_size=32 --init_ch=128 --**layer**=24
sets the parameter **layer**
to 24. However, in the actual code, it should be specified as **layers**
using parser.add_argument('--**layers**', type=int, default=20, help='total number of layers')
. I am not sure if using 20 for layers
can achieve an accuracy of 99.03%.
Dear authors,
Thank you for open-sourcing the code! I have a few questions regarding the Mujoco environments. First, I wonder how the mean and std arrays of `
class Hopper:
def __init__(self):
self.mean = np.array([1.41599384, -0.05478602, -0.25522216, -0.25404721,
0.27525085, 2.60889529, -0.0085352, 0.0068375,
-0.07123674, -0.05044839, -0.45569644])
self.std = np.array([0.19805723, 0.07824488, 0.17120271, 0.32000514,
0.62401884, 0.82814161, 1.51915814, 1.17378372,
1.87761249, 3.63482761, 5.7164752 ])`
are computed?
I am also hoping to figure out how to construct the equivalent function evaluation class for 2dWalker, HalfCheetah, Ant, and Humanoid as reported in the paper. As these are not provided in the repo, could you let me know how to do so? Do they also require some hard-coded mean and std values to normalize the policy matrix M?
Thank you very much!
Hi,
For example, the search space size of NASNet is 3.5x10^21. it is [3.5x10^21, 160] after encoding all arch to masks. this array is quite huge, so it can not work on my device. so how do you deal with it?
Thanks in advance!
Hi, thank you for your good work.
Now I am trying to reproduce the results of Ackley-20d in the LAMCTS paper (Fig. 4). I use the same hyper-parameters as (cp=1, leaf size=20) and run:
cd LA-MCTS
python run.py --func ackley --dims 20 --iterations 1000
I have repeated the process for 20 times, however I cannot obtain the the results in the paper. Is there anything I need to change to reproduce the results?
Hi, I want to reproduce the experiment using TuRBO. It seems that following the steps in https://github.com/facebookresearch/LaMCTS/tree/main/LA-MCTS can not successfully run the commented code. I have some questions about current code.
Could you give me some suggestions about TuRBO reproduction?
Hi, is there a multi-gpu running option?
when I try to set --gpu=0,1 .. I get an error.
I'm asking because I saw that the checkpoint that got 99% acc had 96 batch size and on one GPU I can't run a train with 96 batch size.
Thanks
The training speed on the V100 without using any tricks is about 0.23h/epoch, which is too slow. If autoaug and ema are used, the speed will become more slowly.
So I would like to ask: how long did it take the author to train the LaNet with 1000 EPOCH.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.