naszilla's People

Contributors

administrator2992, ajscheff, crwhite14, daikikatsuragawa, willieneis

naszilla's Issues

Performance gap with and without path encoding

I have run the BANANAS ablation test many times with encode path=True and with encode path=False, using the default configuration, but I cannot reproduce the large performance gap between the two cases shown in Fig. 5.2 of the paper.
The evaluated results are below; bananas_f denotes the BANANAS algorithm without path encoding.
[Screenshot: Screenshot_2019-12-17_15-31-35]
How did you obtain such a large performance gap?

RuntimeError with GCN predictor in installation test

Command: python naszilla/run_experiments.py --search_space nasbench_101 --algo_params all_algos --queries 30 --trials 1
Log:

...
Finished GP-BayesOpt query 0
Finished GP-BayesOpt query 10

* Running NAS algorithm: {'algo_name': 'gcn_predictor', 'total_queries': 30}
Traceback (most recent call last):
  File "naszilla/run_experiments.py", line 121, in <module>
    main(args)
  File "naszilla/run_experiments.py", line 104, in main
    run_experiments(args, save_path)
  File "naszilla/run_experiments.py", line 52, in run_experiments
    result, val_result, run_datum = run_nas_algorithm(algorithm_params[j], search_space, mp)
  File "/home/ubuntu/naszilla/naszilla/nas_algorithms.py", line 47, in run_nas_algorithm
    data = gcn_predictor(search_space, **ps)
  File "/home/ubuntu/naszilla/naszilla/nas_algorithms.py", line 420, in gcn_predictor
    fit(net, xtrain, seed=seed)
  File "/home/ubuntu/naszilla/naszilla/gcn/train_gcn.py", line 53, in fit
    prediction = net(batch)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/naszilla/naszilla/gcn/model.py", line 66, in forward
    out = layer(out, adj_with_diag)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/naszilla/naszilla/gcn/model.py", line 37, in forward
    output1 = F.relu(torch.matmul(norm_adj, torch.matmul(inputs, self.weight1)))
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_mm

This happened on both single-GPU and multi-GPU machines.

nasbench sampling procedure

Is the sampling of nasbench architectures in the generate_random_dataset and random_cell methods valid?

I tried sampling hashes uniformly and got much worse test errors (see this notebook).

Generating random architectures until a valid one is found may bias the sample towards more predictable architectures.

Also, how do you ensure that the train and test samples do not intersect?
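
One way to keep train and test samples disjoint is to deduplicate by architecture hash while sampling. The sketch below is a minimal illustration on a hypothetical toy search space; `arch_hash`, `sample_unique`, `OPS`, and the tuple-of-ops architecture format are assumptions made for the example, not the repository's API.

```python
import hashlib
import random

def arch_hash(arch):
    """Stable identifier for an architecture given as a tuple of op names."""
    return hashlib.md5(repr(arch).encode("utf-8")).hexdigest()

def sample_unique(sample_fn, n, exclude=frozenset(), max_tries=10_000):
    """Draw up to n architectures whose hashes are unique and not in `exclude`."""
    seen, out = set(exclude), []
    for _ in range(max_tries):
        if len(out) == n:
            break
        arch = sample_fn()
        h = arch_hash(arch)
        if h not in seen:  # skip duplicates and anything already excluded
            seen.add(h)
            out.append(arch)
    return out

OPS = ("conv1x1", "conv3x3", "pool3x3")  # hypothetical toy search space
rng = random.Random(0)

def sample_fn():
    """Sample a toy architecture: four ops chosen uniformly at random."""
    return tuple(rng.choice(OPS) for _ in range(4))

train = sample_unique(sample_fn, 20)
# Passing the training hashes as `exclude` guarantees an empty intersection.
test = sample_unique(sample_fn, 20, exclude={arch_hash(a) for a in train})
```

The guarantee is only as strong as the hash: if two encodings of the same architecture hash differently, they can still land on both sides of the split.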

Encodings experiments do not work with the given command

The encodings experiments do not work with the given command:
python run_experiments.py --algo_params evo_encodings --search_space nasbench_101
From reading params.py, it appears that the project [A Study on Encodings for Neural Architecture Search](https://github.com/naszilla/nas-encodings/tree/master) has not been integrated here, and this part of the code base does not seem to be implemented. Is there an implementation available?
I would like to run comparative experiments on architecture encodings. Is there a convenient platform for this?
Does the NASLib platform from your new article integrate implementations of each architecture encoding?

Why is the nasbench data file loaded on every run?

In nas_algorithms.py, the Data object is created on every call to run_nas_algorithm:

def run_nas_algorithm(algo_params, metann_params):
    # set up search space
    mp = copy.deepcopy(metann_params)
    ss = mp.pop('search_space')
    search_space = Data(ss)

    # run nas algorithm
    ps = copy.deepcopy(algo_params)
    algo_name = ps.pop('algo_name')

This is quite time-consuming. Why is the Data object created on every run?
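
A possible workaround is to cache the loaded object per search-space name, so repeated calls reuse it. The following is a minimal sketch, not the repository's code: the `Data` class here is a hypothetical stand-in whose loading cost is simulated, and `get_search_space` is a helper introduced only for illustration.

```python
import functools
import time

class Data:
    """Hypothetical stand-in for the search-space object; loading is simulated."""
    load_count = 0

    def __init__(self, search_space):
        Data.load_count += 1
        time.sleep(0.01)  # pretend to read a large benchmark file
        self.search_space = search_space

@functools.lru_cache(maxsize=None)
def get_search_space(name):
    """Build the Data object once per search-space name and reuse it afterwards."""
    return Data(name)

def run_nas_algorithm(algo_params, metann_params):
    # Copy the dictionaries so the caller's arguments are not modified.
    mp = dict(metann_params)
    ss = mp.pop('search_space')
    search_space = get_search_space(ss)  # cached after the first call

    ps = dict(algo_params)
    algo_name = ps.pop('algo_name')
    return algo_name, search_space

first = run_nas_algorithm({'algo_name': 'random'}, {'search_space': 'nasbench_101'})
second = run_nas_algorithm({'algo_name': 'bananas'}, {'search_space': 'nasbench_101'})
```

Sharing one object across runs is only safe if the algorithms treat it as read-only; if any of them mutates the search-space object, a fresh copy per run is the more conservative choice.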

ModuleNotFoundError: No module named 'surrogate_models'

Hi,

I followed the installation steps and tried the first test python naszilla/run_experiments.py --search_space nasbench_101 --algo_params all_algos --queries 30 --trials 1. However, it resulted in ModuleNotFoundError.

Traceback (most recent call last):
  File "naszilla/run_experiments.py", line 11, in <module>
    from naszilla.nas_benchmarks import Nasbench101, Nasbench201, Nasbench301
  File "/home/ubuntu/naszilla/naszilla/nas_benchmarks.py", line 8, in <module>
    import nasbench301 as nb
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/__init__.py", line 1, in <module>
    from nasbench301.api import load_ensemble
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/api.py", line 6, in <module>
    from nasbench301.surrogate_models import utils
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/surrogate_models/utils.py", line 20, in <module>
    from nasbench301.surrogate_models.gnn.gnn import GNNSurrogateModel
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/surrogate_models/gnn/gnn.py", line 16, in <module>
    from nasbench301.surrogate_models.gnn.models.deeper_gnn import DeeperGCN
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/surrogate_models/gnn/models/deeper_gnn.py", line 8, in <module>
    from nasbench301.surrogate_models.gnn.models.gcn_lib.sparse.torch_nn import norm_layer
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/surrogate_models/gnn/models/gcn_lib/sparse/__init__.py", line 1, in <module>
    from .torch_nn import *
  File "/home/ubuntu/naszilla/src/nasbench301/nasbench301/surrogate_models/gnn/models/gcn_lib/sparse/torch_nn.py", line 3, in <module>
    from surrogate_models.gnn.models.gcn_utils.data_util import get_atom_feature_dims, get_bond_feature_dims
ModuleNotFoundError: No module named 'surrogate_models' 
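
The last frame of the traceback imports `surrogate_models` as a top-level module from a file that lives inside the `nasbench301` package, whereas the earlier frames use the fully qualified `nasbench301.surrogate_models` prefix. The sketch below reproduces that pattern on a throwaway package to show why the unqualified form fails when only the package's parent directory is on `sys.path`. The names `demo_pkg` and `demo_inner` are hypothetical, and whether the same change is the right fix for nasbench301 is an assumption.

```python
import importlib
import os
import sys
import tempfile

def write(path, text):
    """Create parent directories as needed and write a small source file."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

root = tempfile.mkdtemp()
# Hypothetical layout: demo_pkg/demo_inner/util.py plus two modules that import it.
write(os.path.join(root, "demo_pkg", "__init__.py"), "")
write(os.path.join(root, "demo_pkg", "demo_inner", "__init__.py"), "")
write(os.path.join(root, "demo_pkg", "demo_inner", "util.py"), "VALUE = 42\n")
# Unqualified import: treats demo_inner as if it were a top-level module.
write(os.path.join(root, "demo_pkg", "broken.py"),
      "from demo_inner.util import VALUE\n")
# Fully qualified import: resolves through the enclosing package.
write(os.path.join(root, "demo_pkg", "fixed.py"),
      "from demo_pkg.demo_inner.util import VALUE\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

try:
    importlib.import_module("demo_pkg.broken")
    broken_error = None
except ModuleNotFoundError as exc:
    broken_error = str(exc)

fixed = importlib.import_module("demo_pkg.fixed")
print(broken_error)
print(fixed.VALUE)
```

Adding the package directory itself to `sys.path` would also make the unqualified import resolve, but it loads the same code under two different module names, so qualifying the import is usually the cleaner option.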

Questions about porting algorithms to architecture search framework

Hi Colin,

We've recently released a modular architecture search framework and we were looking at getting reference implementations of search algorithms in it. The main selling point is that we have a well-defined language to write search spaces (our paper to appear at NeurIPS 2019). Once an algorithm is implemented, it can be applied to arbitrary search spaces that a user may write. I would appreciate some help in moving your algorithms here to our framework (if your code is under the MIT license). The main questions are:

  • What search space encodings do you use?
  • How do your search algorithms plug into (i.e., interface or query) the search space?
  • What information do the search algorithms consume?
    Some pointers would be great!

Our search algorithm implementations are based on the sample (get an architecture from the search space) and update (pass the results back to the searcher and update its state) API.

Do you want to look into it? Here is the repo and colab notebook.

Thanks,
Renato

Does Python version matter?

I created the project environment in Anaconda with python=3.7. While installing the packages from requirements.txt, problems occurred with torch-scatter, torch-sparse, torch-cluster, and torch-spline-conv, so I used wget to download torch-*** directly. I then downloaded the three NAS benchmark datasets with the three wget commands and followed "Test Installation" by running "python naszilla/run_experiments.py --search_space nasbench_101 --algo_params all_algos --queries 30 --trials 1". The program ended with the following error:

" File "/home/umin/.conda/envs/SXH_AUTOML/lib/python3.7/site-packages/torch_geometric/loader/dynamic_batch_sampler.py", line 9, in
class DynamicBatchSampler(torch.utils.data.sampler.Sampler[List[int]]):
TypeError: 'type' object is not subscriptable"

So, is this simply because I configured the project environment incorrectly?
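
The error message itself says that a class was subscripted (`Sampler[List[int]]`) although it does not support subscripting. A plain class behaves this way, while a class that inherits from `typing.Generic` accepts the subscript. The sketch below shows the difference with two hypothetical stand-in classes. That the installed torch provides a non-generic `Sampler` while the installed torch_geometric expects a generic one is an assumption, which would point to a version mismatch between the two packages rather than to the Python version alone.

```python
from typing import Generic, List, TypeVar

T_co = TypeVar("T_co", covariant=True)

class PlainSampler:
    """Hypothetical stand-in for a base class that is NOT generic."""

class GenericSampler(Generic[T_co]):
    """Hypothetical stand-in for a base class declared as generic."""

def can_subscript(cls):
    """Return True if cls[List[int]] evaluates without raising TypeError."""
    try:
        cls[List[int]]
        return True
    except TypeError:
        return False

plain_ok = can_subscript(PlainSampler)
generic_ok = can_subscript(GenericSampler)
print(plain_ok, generic_ok)
```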

Why is the BANANAS algorithm so much slower than random search?

The paper says: "We note that a plot with respect to wall-clock time would look nearly identical, since all NAS algorithms finish the set of 150 queries in roughly the same amount of time. The fastest, random search, takes 46.5 TPU hours, while BANANAS, the slowest, takes 47.0 TPU hours."
I am running run_experiments_sequential.py from this repository. The wall-clock times for 150 queries with the bananas, random, and evolution algorithms are quite different: bananas is about ten times slower than random. The output is:
[359.68584871292114, 35.89821910858154, 35.73087739944458]
Did I make a mistake?

BANANAS genotype

Dear authors,

I wrote a genotype based on the best architecture graph found by BANANAS, as follows:

BANANAS = Genotype(
    normal=[
        ('skip_connect', 0), ('sep_conv_3x3', 1),
        ('sep_conv_5x5', 0), ('sep_conv_5x5', 2),
        ('sep_conv_5x5', 0), ('sep_conv_5x5', 1),
        ('sep_conv_3x3', 0), ('skip_connect', 2),
    ],
    normal_concat=[2, 3, 4, 5],
    reduce=[
        ('max_pool_3x3', 0), ('sep_conv_3x3', 1),
        ('max_pool_3x3', 0), ('none', 1),
        ('dil_conv_3x3', 2), ('sep_conv_5x5', 3),
        ('sep_conv_5x5', 4), ('sep_conv_3x3', 1),
    ],
    reduce_concat=[2, 3, 4, 5],
)

But it reported the following error:
'''
RuntimeError: Error(s) in loading state_dict for NetworkCIFAR:
Missing key(s) in state_dict: "cells.0._ops.1.op.1.weight", "cells.0._ops.1.op.2.weight", "cells.0._ops.1.op.3.bias", "cells.0._ops.1.op.3.running_mean", "cells.0._ops.1.op.3.weight", "cells.0._ops.1.op.3.running_var", "cells.0._ops.1.op.5.weight", "cells.0._ops.1.op.6.weight", "cells.0._ops.1.op.7.bias", "cells.0._ops.1.op.7.running_mean", "cells.0._ops.1.op.7.weight",
.....

Unexpected key(s) in state_dict: "auxiliary_head.features.2.weight", "auxiliary_head.features.3.weight", "auxiliary_head.features.3.bias", "auxiliary_head.features.3.running_mean", "auxiliary_head.features.3.running_var", "auxiliary_head.features.3.num_batches_tracked", "auxiliary_head.features.5.weight", "auxiliary_head.features.6.weight", "auxiliary_head.features.6.bias"
....

size mismatch for cells.6._ops.6.op.1.weight: copying a param with shape torch.Size([72, 1, 5, 5]) from checkpoint, the shape in current model is torch.Size([72, 1, 3, 3]).
size mismatch for cells.6._ops.6.op.5.weight: copying a param with shape torch.Size([72, 1, 5, 5]) from checkpoint, the shape in current model is torch.Size([72, 1, 3, 3]).
size mismatch for cells.6._ops.7.op.1.weight: copying a param with shape torch.Size([72, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([72, 1, 5, 5]).
size mismatch for cells.6._ops.7.op.5.weight: copying a param with shape torch.Size([72, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([72, 1, 5, 5]).
size mismatch for cells.13._ops.6.op.1.weight: copying a param with shape torch.Size([144, 1, 5, 5]) from checkpoint, the shape in current model is torch.Size([144, 1, 3, 3]).
size mismatch for cells.13._ops.6.op.5.weight: copying a param with shape torch.Size([144, 1, 5, 5]) from checkpoint, the shape in current model is torch.Size([144, 1, 3, 3]).
size mismatch for cells.13._ops.7.op.1.weight: copying a param with shape torch.Size([144, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([144, 1, 5, 5]).
size mismatch for cells.13._ops.7.op.5.weight: copying a param with shape torch.Size([144, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([144, 1, 5, 5]).
'''

Path encoding in DARTS search space

Hey @crwhite14:

The code for generating the path encoding of a cell architecture in the DARTS search space seems to differ from the NASBench-101 version. In addition to the input-to-output paths, paths that do not connect directly to the output are also used to update the path encoding vector. The paths variable in the code stores all paths, even those that do not connect to the output.

for i, paths in enumerate((normal_paths, reduce_paths)):

I am confused by the code. Is this a bug, or does an input-to-output path mean something different in the DARTS search space than in NASBench-101?
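
For reference, an encoding that counts only input-to-output paths can be sketched as follows. This is a minimal illustration on a small DAG, not the repository's NASBench-101 or DARTS implementation: the operation names, the adjacency-matrix format, and both function names are assumptions made for the example.

```python
from itertools import product

OPS = ["conv1x1", "conv3x3", "pool3x3"]  # hypothetical operation names

def input_to_output_paths(adjacency, ops):
    """Return the op sequence of every path from node 0 to the last node.

    Assumes `adjacency` describes a DAG (for example, upper triangular).
    """
    last = len(adjacency) - 1
    paths = []

    def walk(node, ops_so_far):
        if node == last:
            paths.append(tuple(ops_so_far))
            return
        for nxt, connected in enumerate(adjacency[node]):
            if connected:
                # Intermediate nodes carry an op; the output node does not.
                step = [] if nxt == last else [ops[nxt]]
                walk(nxt, ops_so_far + step)

    walk(0, [])
    return paths

def path_encoding(adjacency, ops, max_len):
    """Binary vector with one entry per possible op sequence of length 0..max_len."""
    vocabulary = [seq for n in range(max_len + 1) for seq in product(OPS, repeat=n)]
    present = set(input_to_output_paths(adjacency, ops))
    return [1 if seq in present else 0 for seq in vocabulary]

# Four nodes: 0 = input, 3 = output; node 2 is a dead end with no edge to the output.
adjacency = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
ops = [None, "conv3x3", "pool3x3", None]

paths = input_to_output_paths(adjacency, ops)
encoding = path_encoding(adjacency, ops, max_len=2)
print(paths)          # [('conv3x3',)]
print(sum(encoding))  # 1
```

In this sketch the dead-end branch through node 2 contributes nothing to the encoding, which is the behaviour the question above expects from an input-to-output definition.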

About the error in the environment configuration

Hello, I am trying to reproduce the network encoding part of the project, and I ran into the following problems while configuring the environment:

  • ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
  • naszilla 1.0 requires scikit-learn>=0.23.1, but you have scikit-learn 0.23.0 which is incompatible.

When I install scikit-learn 0.23.1 instead, another problem arises:

  • autopytorch 0.0.2 requires scikit-learn==0.23.0, but you have scikit-learn 0.23.1 which is incompatible.

This looks like a dilemma. What should I do to resolve it? I hope to get your guidance.
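
The two requirements quoted above cannot be met by any single scikit-learn version, which a few lines of Python make explicit. This is a simplified sketch: the `parse` helper handles only plain dotted numeric versions, and the candidate list is illustrative.

```python
def parse(version):
    """Turn '0.23.1' into (0, 23, 1) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def naszilla_ok(version):
    """Requirement reported for naszilla 1.0: scikit-learn>=0.23.1."""
    return parse(version) >= parse("0.23.1")

def autopytorch_ok(version):
    """Requirement reported for autopytorch 0.0.2: scikit-learn==0.23.0."""
    return parse(version) == parse("0.23.0")

candidates = ["0.23.0", "0.23.1", "0.23.2", "0.24.0"]
satisfying_both = [v for v in candidates if naszilla_ok(v) and autopytorch_ok(v)]
print(satisfying_both)  # []
```

Because the exact pin excludes everything the lower bound allows, the conflict has to be resolved outside the resolver, for example by keeping the two packages in separate environments or by installing one of them with `pip install --no-deps` and accepting the mismatch. Which option is appropriate here is a judgment call that the issue does not settle.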

About the Meta Neural Network training method

The following snippet is from the bananas method in nas_algorithms.py:

    while query <= total_queries:
        candidates = search_space.get_candidates(data,
                                                 acq_opt_type=acq_opt_type,
                                                 encode_paths=encode_paths,
                                                 allow_isomorphisms=allow_isomorphisms,
                                                 deterministic_loss=deterministic)
        xcandidates = np.array([c[1] for c in candidates])
        predictions = []
        # train an ensemble of neural networks
        train_error = 0
        for _ in range(num_ensemble):
            meta_neuralnet = MetaNeuralnet()
            train_error += meta_neuralnet.fit(xtrain, ytrain, **metann_params)
            # predict the validation loss of the candidate architectures
            predictions.append(np.squeeze(meta_neuralnet.predict(xcandidates)))
        train_error /= num_ensemble
        if verbose:
            print('Query {}, Meta neural net train error: {}'.format(query, train_error))

In each iteration of the while loop, five entirely new neural networks are created and trained on the same training data.
I do not understand why the five meta neural networks are trained this way, on the same data and from scratch in every iteration.
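
The quoted loop collects one prediction vector per ensemble member in `predictions`. A common use of such a collection is to reduce it to a mean and a spread per candidate, where the spread comes from the members disagreeing with each other. The sketch below shows that reduction on made-up numbers; the values and the `ensemble_summary` helper are hypothetical, and the source does not confirm that this is how the predictions are used afterwards.

```python
import statistics

# Hypothetical predictions: one row per ensemble member, one column per candidate.
predictions = [
    [6.1, 7.4, 5.9],
    [6.3, 7.0, 6.8],
    [6.0, 7.2, 5.2],
]

def ensemble_summary(predictions):
    """Return per-candidate mean and population standard deviation."""
    per_candidate = list(zip(*predictions))  # transpose: one tuple per candidate
    means = [statistics.mean(col) for col in per_candidate]
    stds = [statistics.pstdev(col) for col in per_candidate]
    return means, stds

means, stds = ensemble_summary(predictions)
# Index of the candidate on which the ensemble members disagree the most.
most_uncertain = max(range(len(stds)), key=stds.__getitem__)
```

If every member were trained identically on the same data, all rows would coincide and the spread would be zero, so some source of variation between members, such as a different random initialisation, is needed for the spread to carry information.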

Dependency installation error on Ubuntu

Dear Naszilla team,

I am getting this error when I try to install. It seems that pip can no longer find tensorflow 1.14.0 in the standard locations. This is an Ubuntu 18.04 64-bit system with a P100 GPU.

(base) dedey@ubuntubox:~/naszilla$ cat requirements.txt | xargs -n 1 -L 1 pip install
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==1.14.0
ERROR: No matching distribution found for tensorflow-gpu==1.14.0
ERROR: Could not find a version that satisfies the requirement tensorflow==1.14.0
ERROR: No matching distribution found for tensorflow==1.14.0

Cell301 get_paths function

Hi,

I have a question about the get_paths function in cell_301.py. In the second loop (cell_301.py -> get_paths function -> line 180), you iterate over each tuple in self.arch, but the last element of the list (the last edge, op pair) is not visited. For example, consider the following:

self.arch = ([(1, 2), (0, 1), (0, 2), (2, 0), (2, 2), (3, 1), (4, 3), (0, 2)],
             [(0, 3), (1, 4), (0, 6), (2, 1), (1, 4), (3, 6), (0, 5), (4, 5)])

In the inner loop between lines 181 and 193, the last elements of the lists are not visited because the loop is defined as:

for j in range(len(OPS)):

I think it needs to be changed to:

for j in range(len(OPS) + 1):

Thanks,

