naszilla / naszilla
Naszilla is a Python library for neural architecture search (NAS)
License: Apache License 2.0
I have run the BANANAS ablation test many times with encode_paths=True and with encode_paths=False, but I cannot reproduce the large performance gap between these two cases that is shown in Fig. 5.2 of the paper.
I used the default configuration. In the evaluated results, bananas_f denotes the bananas algorithm without path encoding.
How did you get such a big performance gap?
Command: python naszilla/run_experiments.py --search_space nasbench_101 --algo_params all_algos --queries 30 --trials 1
Log:
...
Finished GP-BayesOpt query 0
Finished GP-BayesOpt query 10
* Running NAS algorithm: {'algo_name': 'gcn_predictor', 'total_queries': 30}
Traceback (most recent call last):
  File "naszilla/run_experiments.py", line 121, in <module>
    main(args)
  File "naszilla/run_experiments.py", line 104, in main
    run_experiments(args, save_path)
  File "naszilla/run_experiments.py", line 52, in run_experiments
    result, val_result, run_datum = run_nas_algorithm(algorithm_params[j], search_space, mp)
  File "/home/ubuntu/naszilla/naszilla/nas_algorithms.py", line 47, in run_nas_algorithm
    data = gcn_predictor(search_space, **ps)
  File "/home/ubuntu/naszilla/naszilla/nas_algorithms.py", line 420, in gcn_predictor
    fit(net, xtrain, seed=seed)
  File "/home/ubuntu/naszilla/naszilla/gcn/train_gcn.py", line 53, in fit
    prediction = net(batch)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/naszilla/naszilla/gcn/model.py", line 66, in forward
    out = layer(out, adj_with_diag)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/naszilla/naszilla/gcn/model.py", line 37, in forward
    output1 = F.relu(torch.matmul(norm_adj, torch.matmul(inputs, self.weight1)))
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_mm
This happened on both single-GPU and multi-GPU machines.
Is the sampling of NASBench architectures in the generate_random_dataset and random_cell methods valid?
I tried sampling hashes uniformly and got much worse test errors (see this notebook).
Perhaps generating random architectures until they are valid biases the sample towards more predictable architectures.
Also, how do you ensure that train and test samples do not intersect?
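One way to guarantee that train and test samples do not intersect is to deduplicate by a canonical hash while sampling. The sketch below is not naszilla's implementation: sample_arch, is_valid, and arch_hash are hypothetical stand-ins for drawing a random architecture, checking its validity, and computing an isomorphism-invariant hash.

```python
import random

def sample_arch(rng, n_bits=6):
    """Hypothetical sampler: an architecture is a tuple of edge bits."""
    return tuple(rng.randint(0, 1) for _ in range(n_bits))

def is_valid(arch):
    """Hypothetical validity rule: at least two edges are present."""
    return sum(arch) >= 2

def arch_hash(arch):
    """Hypothetical canonical hash; here simply the tuple itself."""
    return arch

def sample_disjoint(num_train, num_test, seed=0):
    """Rejection-sample valid architectures, skipping any hash seen before."""
    rng = random.Random(seed)
    seen, samples = set(), []
    while len(samples) < num_train + num_test:
        arch = sample_arch(rng)
        key = arch_hash(arch)
        if not is_valid(arch) or key in seen:
            continue  # reject invalid or duplicate architectures
        seen.add(key)
        samples.append(arch)
    return samples[:num_train], samples[num_train:]

train, test = sample_disjoint(num_train=20, num_test=10)
```

Note that rejection sampling of this kind is uniform over valid encodings, not necessarily over unique hashes: if several encodings map to the same hash, that architecture is drawn more often than it would be under uniform hash sampling.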
The encodings experiments do not work with the given command:
python run_experiments.py --algo_params evo_encodings --search_space nasbench_101
From reading the params.py code, I see that the project [A Study on Encodings for Neural Architecture Search](https://github.com/naszilla/nas-encodings/tree/master) has not been integrated here. This part of the code base does not seem to be implemented. Is there an implementation anywhere?
I would like to run some comparative experiments on architecture encodings. Is there a convenient platform for this?
May I also ask whether the NASLib platform from your new paper integrates implementations of each architecture encoding?
In the file nas_algorithms.py, the Data object is created on every run:
def run_nas_algorithm(algo_params, metann_params):

    # set up search space
    mp = copy.deepcopy(metann_params)
    ss = mp.pop('search_space')
    search_space = Data(ss)

    # run nas algorithm
    ps = copy.deepcopy(algo_params)
    algo_name = ps.pop('algo_name')
This is quite time consuming. Why is the Data object created on every run?
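A minimal sketch of one way to avoid rebuilding the object, assuming it is safe to share between runs. The Data class here is a hypothetical stand-in that only counts how often it is constructed, and get_search_space is a made-up helper, not part of naszilla.

```python
import copy
import functools

class Data:
    """Stand-in for an expensive search-space object (hypothetical)."""
    instances_built = 0

    def __init__(self, search_space):
        Data.instances_built += 1
        self.search_space = search_space

@functools.lru_cache(maxsize=None)
def get_search_space(name):
    """Build the Data object once per search-space name and reuse it."""
    return Data(name)

def run_nas_algorithm(algo_params, metann_params):
    # set up search space (cached instead of rebuilt)
    mp = copy.deepcopy(metann_params)
    search_space = get_search_space(mp.pop('search_space'))

    # run nas algorithm (omitted in this sketch)
    ps = copy.deepcopy(algo_params)
    algo_name = ps.pop('algo_name')
    return algo_name, search_space

params = {'search_space': 'nasbench_101'}
first = run_nas_algorithm({'algo_name': 'random'}, params)
second = run_nas_algorithm({'algo_name': 'bananas'}, params)
```

Sharing is only safe if the algorithms do not mutate the object. The traceback quoted in another issue on this page shows a call of the form run_nas_algorithm(algorithm_params[j], search_space, mp), in which the search space is passed in by the caller; that is another way to build it once.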
Hey @crwhite14, how do you predict validation accuracies with D-VAE? Do you use a setting similar to the experiment in Section 4.2 of the D-VAE paper (i.e., using a sparse GP)?
Hi,
I followed the installation steps and tried the first test: python naszilla/run_experiments.py --search_space nasbench_101 --algo_params all_algos --queries 30 --trials 1. However, it resulted in a ModuleNotFoundError.
Traceback (most recent call last):
  File "naszilla/run_experiments.py", line 11, in <module>
    from naszilla.nas_benchmarks import Nasbench101, Nasbench201, Nasbench301
  File "/home/ubuntu/naszilla/naszilla/nas_benchmarks.py", line 8, in <module>
    import nasbench301 as nb
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/__init__.py", line 1, in <module>
    from nasbench301.api import load_ensemble
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/api.py", line 6, in <module>
    from nasbench301.surrogate_models import utils
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/surrogate_models/utils.py", line 20, in <module>
    from nasbench301.surrogate_models.gnn.gnn import GNNSurrogateModel
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/surrogate_models/gnn/gnn.py", line 16, in <module>
    from nasbench301.surrogate_models.gnn.models.deeper_gnn import DeeperGCN
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/surrogate_models/gnn/models/deeper_gnn.py", line 8, in <module>
    from nasbench301.surrogate_models.gnn.models.gcn_lib.sparse.torch_nn import norm_layer
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/surrogate_models/gnn/models/gcn_lib/sparse/__init__.py", line 1, in <module>
    from .torch_nn import *
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/surrogate_models/gnn/models/gcn_lib/sparse/torch_nn.py", line 3, in <module>
    from surrogate_models.gnn.models.gcn_utils.data_util import get_atom_feature_dims, get_bond_feature_dims
ModuleNotFoundError: No module named 'surrogate_models'
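In the traceback above, torch_nn.py imports surrogate_models as a top-level module, while the other files import it as nasbench301.surrogate_models. A top-level import like that only resolves if the directory containing surrogate_models is on sys.path. The sketch below reproduces the effect with a throwaway package; toy_pkg, toy_surrogate_models, and import_with_package_dir are made-up names for illustration, and this is a possible workaround rather than a confirmed fix.

```python
import importlib
import os
import sys
import tempfile

def import_with_package_dir(package_dir, module_name):
    """Import module_name after putting package_dir on sys.path."""
    if package_dir not in sys.path:
        sys.path.insert(0, package_dir)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)

# Toy layout standing in for .../nasbench301/surrogate_models
root = tempfile.mkdtemp()
inner = os.path.join(root, "toy_pkg", "toy_surrogate_models")
os.makedirs(inner)
for directory in (os.path.join(root, "toy_pkg"), inner):
    open(os.path.join(directory, "__init__.py"), "w").close()

# Without the package directory on sys.path, the top-level import fails.
try:
    importlib.import_module("toy_surrogate_models")
    found_before = True
except ModuleNotFoundError:
    found_before = False

# With the package directory on sys.path, the same import succeeds.
module = import_with_package_dir(os.path.join(root, "toy_pkg"), "toy_surrogate_models")
```

For the layout in the traceback, the directory to add would be /home/ubuntu/naszilla/src/nasbench301/nasbench301. Changing the import in torch_nn.py to the nasbench301.surrogate_models form used by the other modules would be the alternative.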
Hi Colin,
We've recently released a modular architecture search framework and we were looking at getting reference implementations of search algorithms in it. The main selling point is that we have a well-defined language to write search spaces (our paper to appear at NeurIPS 2019). Once an algorithm is implemented, it can be applied to arbitrary search spaces that a user may write. I would appreciate some help in moving your algorithms here to our framework (if your code is under the MIT license). The main questions are:
Our search algorithm implementations are based on the sample (get an architecture from the search space) and update (pass the results back to the searcher and update its state) API.
Do you want to look into it? Here is the repo and colab notebook.
Thanks,
Renato
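The sample/update interface described in the message above can be sketched as follows. This is a toy random searcher under stated assumptions, not the API of the framework mentioned: RandomSearcher, evaluate, and the integer search space are all hypothetical.

```python
import random

class RandomSearcher:
    """Toy searcher exposing sample() and update(); all names are hypothetical."""

    def __init__(self, search_space, seed=0):
        self.search_space = list(search_space)
        self.rng = random.Random(seed)
        self.history = []  # (architecture, result) pairs seen so far

    def sample(self):
        """Get an architecture from the search space."""
        return self.rng.choice(self.search_space)

    def update(self, arch, result):
        """Pass the result back to the searcher and update its state."""
        self.history.append((arch, result))

    def best(self):
        """Return the (architecture, result) pair with the lowest result."""
        return min(self.history, key=lambda pair: pair[1])

def evaluate(arch):
    """Hypothetical validation error; stands in for training the architecture."""
    return abs(arch - 7) / 10.0

searcher = RandomSearcher(search_space=range(16))
for _ in range(30):
    arch = searcher.sample()
    searcher.update(arch, evaluate(arch))
best_arch, best_error = searcher.best()
```

A model-based algorithm would keep more state in update (for example, refitting a predictor on the history) and use it in sample, while the outer loop stays the same.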
I created the project environment in Anaconda with python=3.7. While installing the packages from requirements.txt, problems occurred with torch-scatter, torch-sparse, torch-cluster, and torch-spline-conv, so in the end I used "wget" to download torch-*** directly. After that, I downloaded the three NAS benchmark datasets with the three wget commands and followed "Test Installation" with "python naszilla/run_experiments.py --search_space nasbench_101 --algo_params all_algos --queries 30 --trials 1". While running run_experiments.py, the program ended with the following error:
" File "/home/umin/.conda/envs/SXH_AUTOML/lib/python3.7/site-packages/torch_geometric/loader/dynamic_batch_sampler.py", line 9, in
class DynamicBatchSampler(torch.utils.data.sampler.Sampler[List[int]]):
TypeError: 'type' object is not subscriptable"
So, is this just because I configured the project environment incorrectly?
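The message "'type' object is not subscriptable" is what Python raises when a class that is not generic is subscripted, as in Sampler[List[int]]. One plausible reading, stated as an assumption, is that the installed torch_geometric expects a newer torch in which Sampler is declared generic. The sketch below shows the mechanism with two stand-in classes; PlainSampler and GenericSampler are made up and are not torch classes.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")

class PlainSampler:
    """Stand-in for a Sampler base class that is not generic."""

class GenericSampler(Generic[T]):
    """Stand-in for a Sampler base class declared as Generic."""

def can_subscript(cls):
    """Return True if cls[List[int]] is accepted, False if it raises TypeError."""
    try:
        cls[List[int]]
    except TypeError:
        return False
    return True

plain_ok = can_subscript(PlainSampler)      # subscripting a plain class fails
generic_ok = can_subscript(GenericSampler)  # subscripting a Generic class works
```

If that reading is right, the fix would be to install torch and torch_geometric versions that are compatible with each other, rather than to change anything in naszilla.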
The bananas method in nas_algorithms.py consumes all of the system memory and causes a memory error.
I have tried it on two different computers, and both hit the error.
Could you fix this bug?
The paper says: "We note that a plot with respect to wall-clock time would look nearly identical, since all NAS algorithms finish the set of 150 queries in roughly the same amount of time. The fastest, random search, takes 46.5 TPU hours, while BANANAS, the slowest, takes 47.0 TPU hours."
I am running run_experiments_sequential.py from this repository. The wall-clock times for 150 queries with the bananas, random, and evolution algorithms are quite different: bananas is ten times slower than random. The following line is the output:
[359.68584871292114, 35.89821910858154, 35.73087739944458]
Did I make a mistake?
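The two sets of numbers can be put side by side with a little arithmetic. The sketch assumes the three reported values are in seconds and in the order bananas, random, evolution (the order in which the post lists the algorithms), and takes the TPU-hour figures from the quoted sentence.

```python
# Wall-clock values reported above; unit (seconds) and order
# (bananas, random, evolution) are assumptions.
walltimes = {
    "bananas": 359.68584871292114,
    "random": 35.89821910858154,
    "evolution": 35.73087739944458,
}

# TPU hours quoted from the paper.
paper_hours = {"bananas": 47.0, "random": 46.5}
num_queries = 150

# Relative slowdown of bananas versus random search.
measured_ratio = walltimes["bananas"] / walltimes["random"]
paper_ratio = paper_hours["bananas"] / paper_hours["random"]

# Absolute extra time per query, in seconds.
measured_overhead = (walltimes["bananas"] - walltimes["random"]) / num_queries
paper_overhead = (paper_hours["bananas"] - paper_hours["random"]) * 3600 / num_queries
```

The relative slowdown differs a lot (about 10x measured versus about 1.01x in the paper), but the absolute extra time per query is a few seconds in both cases. One possible reading, not confirmed by the authors, is that the paper's TPU hours include the cost of training each queried architecture, so a fixed per-query overhead is a small fraction of the total there.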
Dear authors,
I wrote a genotype based on the best architecture graph found by BANANAS, as follows:
BANANAS = Genotype(
    normal=[
        ('skip_connect', 0), ('sep_conv_3x3', 1),
        ('sep_conv_5x5', 0), ('sep_conv_5x5', 2),
        ('sep_conv_5x5', 0), ('sep_conv_5x5', 1),
        ('sep_conv_3x3', 0), ('skip_connect', 2),
    ],
    normal_concat=[2, 3, 4, 5],
    reduce=[
        ('max_pool_3x3', 0), ('sep_conv_3x3', 1),
        ('max_pool_3x3', 0), ('none', 1),
        ('dil_conv_3x3', 2), ('sep_conv_5x5', 3),
        ('sep_conv_5x5', 4), ('sep_conv_3x3', 1),
    ],
    reduce_concat=[2, 3, 4, 5],
)
But it reported:
'''
RuntimeError: Error(s) in loading state_dict for NetworkCIFAR:
Missing key(s) in state_dict: "cells.0._ops.1.op.1.weight", "cells.0._ops.1.op.2.weight", "cells.0._ops.1.op.3.bias", "cells.0._ops.1.op.3.running_mean", "cells.0._ops.1.op.3.weight", "cells.0._ops.1.op.3.running_var", "cells.0._ops.1.op.5.weight", "cells.0._ops.1.op.6.weight", "cells.0._ops.1.op.7.bias", "cells.0._ops.1.op.7.running_mean", "cells.0._ops.1.op.7.weight",
.....
Unexpected key(s) in state_dict: "auxiliary_head.features.2.weight", "auxiliary_head.features.3.weight", "auxiliary_head.features.3.bias", "auxiliary_head.features.3.running_mean", "auxiliary_head.features.3.running_var", "auxiliary_head.features.3.num_batches_tracked", "auxiliary_head.features.5.weight", "auxiliary_head.features.6.weight", "auxiliary_head.features.6.bias"
....
size mismatch for cells.6._ops.6.op.1.weight: copying a param with shape torch.Size([72, 1, 5, 5]) from checkpoint, the shape in current model is torch.Size([72, 1, 3, 3]).
size mismatch for cells.6._ops.6.op.5.weight: copying a param with shape torch.Size([72, 1, 5, 5]) from checkpoint, the shape in current model is torch.Size([72, 1, 3, 3]).
size mismatch for cells.6._ops.7.op.1.weight: copying a param with shape torch.Size([72, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([72, 1, 5, 5]).
size mismatch for cells.6._ops.7.op.5.weight: copying a param with shape torch.Size([72, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([72, 1, 5, 5]).
size mismatch for cells.13._ops.6.op.1.weight: copying a param with shape torch.Size([144, 1, 5, 5]) from checkpoint, the shape in current model is torch.Size([144, 1, 3, 3]).
size mismatch for cells.13._ops.6.op.5.weight: copying a param with shape torch.Size([144, 1, 5, 5]) from checkpoint, the shape in current model is torch.Size([144, 1, 3, 3]).
size mismatch for cells.13._ops.7.op.1.weight: copying a param with shape torch.Size([144, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([144, 1, 5, 5]).
size mismatch for cells.13._ops.7.op.5.weight: copying a param with shape torch.Size([144, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([144, 1, 5, 5]).
'''
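In an error of this form, "missing" keys are parameters the current model has but the checkpoint lacks, "unexpected" keys are in the checkpoint but not in the model, and "size mismatch" entries exist in both with different shapes. All three usually mean the model was constructed differently from the one that produced the checkpoint. The sketch below reproduces that comparison on plain dictionaries of shapes so the differences can be listed in full; diff_state_dicts is a hypothetical helper, and the two placeholder shapes are marked as such.

```python
def diff_state_dicts(checkpoint_shapes, model_shapes):
    """Compare two {parameter name: shape} mappings the way a strict load does."""
    missing = sorted(set(model_shapes) - set(checkpoint_shapes))
    unexpected = sorted(set(checkpoint_shapes) - set(model_shapes))
    mismatched = sorted(
        name for name in set(model_shapes) & set(checkpoint_shapes)
        if tuple(model_shapes[name]) != tuple(checkpoint_shapes[name])
    )
    return missing, unexpected, mismatched

# A few entries taken from the error above. Shapes given as () are
# placeholders, because the error message does not show them.
checkpoint = {
    "cells.6._ops.6.op.1.weight": (72, 1, 5, 5),
    "cells.6._ops.7.op.1.weight": (72, 1, 3, 3),
    "auxiliary_head.features.2.weight": (),
}
model = {
    "cells.6._ops.6.op.1.weight": (72, 1, 3, 3),
    "cells.6._ops.7.op.1.weight": (72, 1, 5, 5),
    "cells.0._ops.1.op.1.weight": (),
}

missing, unexpected, mismatched = diff_state_dicts(checkpoint, model)
```

With PyTorch, the two mappings could be built from the loaded checkpoint and from model.state_dict() by taking tuple(tensor.shape) for each entry. In the error above, the unexpected auxiliary_head keys suggest the checkpoint came from a model built with an auxiliary head, and the 3x3 versus 5x5 mismatches suggest the operations at those positions differ between the two genotypes.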
Hey @crwhite14:
The code that generates the path encoding of a cell architecture in DARTS seems different from the one for NASBench-101. Besides the input-to-output paths, paths that do not directly connect to the output are also used to update the path encoding vector. The paths variable in the code stores all of the paths, even those that do not connect to the output.
naszilla/naszilla/nas_bench_301/cell_301.py
Line 220 in 393e55e
I am confused by the code. Is this a bug, or does an input-to-output path mean something different in the DARTS search space than in NASBench-101?
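The distinction being asked about can be made concrete on a toy graph. The sketch below is not the code in cell_301.py: enumerate_paths and the small DAG are hypothetical, with operations placed on nodes and one node (4) that never reaches the output.

```python
def enumerate_paths(edges, ops, source, sink):
    """Return (every_path, to_output) as sets of operation sequences.

    every_path holds the op sequence of each path starting at `source`,
    wherever it ends; to_output keeps only paths that end at `sink`.
    """
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)

    every_path, to_output = set(), set()

    def walk(node, ops_so_far):
        if node != source:
            every_path.add(ops_so_far)
        if node == sink:
            to_output.add(ops_so_far)
        for child in children.get(node, []):
            label = ops.get(child)
            walk(child, ops_so_far + ((label,) if label else ()))

    walk(source, ())
    return every_path, to_output

# Toy cell: node 0 is the input, node 3 the output, node 4 is a dead end.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (1, 4)]
ops = {1: "conv3x3", 2: "maxpool", 4: "conv1x1"}
every_path, to_output = enumerate_paths(edges, ops, source=0, sink=3)
extra = every_path - to_output  # paths that never reach the output
```

In DARTS-style cells the outputs of the intermediate nodes are typically concatenated to form the cell output (the genotype in another issue on this page has normal_concat = [2, 3, 4, 5]), so a path ending at any intermediate node may already count as reaching the output. Whether that is the intent of the code in question is not confirmed here.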
Hello, I am trying to reproduce the network encoding part of the project, and the following problems occurred while configuring the environment:
When I reinstall scikit-learn=0.23.1, another problem arises:
This seems to be a dilemma. What should I do to solve this error? I hope to get your guidance.
The code snippet in the bananas method in nas_algorithms.py:
while query <= total_queries:

    candidates = search_space.get_candidates(data,
                                             acq_opt_type=acq_opt_type,
                                             encode_paths=encode_paths,
                                             allow_isomorphisms=allow_isomorphisms,
                                             deterministic_loss=deterministic)

    xcandidates = np.array([c[1] for c in candidates])
    predictions = []

    # train an ensemble of neural networks
    train_error = 0
    for _ in range(num_ensemble):
        meta_neuralnet = MetaNeuralnet()
        train_error += meta_neuralnet.fit(xtrain, ytrain, **metann_params)

        # predict the validation loss of the candidate architectures
        predictions.append(np.squeeze(meta_neuralnet.predict(xcandidates)))
    train_error /= num_ensemble
    if verbose:
        print('Query {}, Meta neural net train error: {}'.format(query, train_error))
In each iteration of the while loop, you create five completely new neural networks and train all five on the same training data.
I cannot understand why the five meta neural networks are trained this way, using the same training data and training from scratch in each iteration.
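A common reason for this pattern, offered as a possible reading rather than the authors' stated rationale, is that ensemble members trained on the same data differ only in their random initialization and sample order, so the spread of their predictions can serve as an uncertainty estimate for choosing the next candidate. The sketch below illustrates the idea with tiny linear models instead of neural networks; fit_linear and the data are made up.

```python
import random
import statistics

def fit_linear(xs, ys, seed, steps=50, lr=0.05):
    """Fit y = w*x + b by a few SGD steps from a seed-dependent random start."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(steps):
        i = rng.randrange(len(xs))
        err = (w * xs[i] + b) - ys[i]
        w -= lr * err * xs[i]
        b -= lr * err
    return w, b

# Same training data for every ensemble member.
xtrain = [0.0, 0.5, 1.0]
ytrain = [0.1, 0.2, 0.3]
xcandidates = [0.5, 5.0]

# Train five fresh models from scratch, one per seed.
predictions = []
for seed in range(5):
    w, b = fit_linear(xtrain, ytrain, seed)
    predictions.append([w * x + b for x in xcandidates])

# Per-candidate mean and spread across the ensemble.
mean = [statistics.mean(col) for col in zip(*predictions)]
std = [statistics.pstdev(col) for col in zip(*predictions)]
```

Members typically disagree more on candidates that are far from the training data. If the training set grows between iterations, as is usual in a Bayesian optimization loop, retraining from scratch keeps each member fitted to the full current data set.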
Dear Naszilla team,
I am getting this error when I try to install. It seems that pip can no longer find tensorflow 1.14.0 in the standard locations. This is an Ubuntu 18.04 64-bit system with a P100 GPU.
(base) dedey@ubuntubox:~/naszilla$ cat requirements.txt | xargs -n 1 -L 1 pip install
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==1.14.0
ERROR: No matching distribution found for tensorflow-gpu==1.14.0
ERROR: Could not find a version that satisfies the requirement tensorflow==1.14.0
ERROR: No matching distribution found for tensorflow==1.14.0
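pip reports "Could not find a version that satisfies the requirement" when no published wheel matches the running interpreter and platform. A rough check, assuming TensorFlow 1.14 wheels were published only for Python 3.7 and older (the Python version of the base environment is not stated in the report):

```python
import sys

def tf_1_14_wheel_likely(version_info=None):
    """Rough check, assuming TensorFlow 1.14 wheels stop at Python 3.7."""
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) <= (3, 7)

# Check the interpreter that pip would install into.
current_ok = tf_1_14_wheel_likely()
```

If this returns False, creating a separate environment with Python 3.7 before installing the requirements may help.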
Hi,
I have a question about the get_paths function in cell_301.py. In the second loop (cell_301.py -> get_paths function -> Line 180), you iterate over each tuple in self.arch. But in this loop, the last element of the list (the last edge, op pair) is not visited. For example, suppose I have an architecture like the one below.
self.arch = ([(1, 2), (0, 1), (0, 2), (2, 0), (2, 2), (3, 1), (4, 3), (0, 2)],
             [(0, 3), (1, 4), (0, 6), (2, 1), (1, 4), (3, 6), (0, 5), (4, 5)])
In the inner loop between lines 181 and 193, the last elements of the list are not included because the loop is defined like this:
for j in range(len(OPS)):
I think it needs to be changed to:
for j in range(len(OPS) + 1):
Thanks,