The Cerebros package is an ultra-precise Neural Architecture Search (NAS) / AutoML that is intended to much more closely mimic biological neurons than conventional neural network architecture strategies.

License: Other

Jupyter Notebook 89.58% Python 10.42%

accuracy automl deep-learning keras-tensorflow machine-learning mean-square-error neural-architecture-search tensorflow automl-algorithms automl-api

cerebros-core-algorithm-alpha's Introduction

Cerebros AutoML

The Cerebros package is an ultra-precise Neural Architecture Search (NAS) / AutoML that is intended to much more closely mimic biological neurons than conventional Multi Layer Perceptron based neural network architecture search strategies.

Cerebros Community Edition and Cerebros Enterprise

The Cerebros community edition provides an open-source minimum viable single parameter set NAS and also also provides an example manifest for an exhaustive Neural Architecture Search to run on Kubeflow/Katib. This is licensed for free use provided that the use is consistent with the ethical use provisions in the license described at the bottom of this page. You can easily reproduce this with the Jupyter notebook in the directory /kubeflow-pipeline, using the Kale Jupyter notebook extension. For a robust managed neural architecture search experience hosted on Google Cloud Platform and supported by our SLA, we recommend Cerebros Enterprise, our commercial version. Soon you will be able to sign up and immediately start using it at https://www.cerebros.one. In the meantime, we can set up your own Cerbros managed neural architecture search pipeline for you with a one business day turnaround. We offer consulting, demos, full service machine learning service and can provision you with your own full neural architecture search pipeline complete with automated Bayesian hyperparameter search. Contact David Thrower:[email protected] or call us at (US country code 1) (650) 789-4375. Additionally, we can complete machine learning tasks for your organization. Give us a call.

In summary what is it and what is different:

A biological brain looks like this:

Multi layer perceptrons look like this:

If the goal of MLPs was to mimic how a biological neuron works, why do we still build neural networks that are structurally similar to the first prototypes from 1989? At the time, it was the closest we could get, but both hardware and software have changed since.

In a biological brain, neurons connect in a multi-dimensional lattice of vertical and lateral connections, which may repeat. Why don't we try to mimic this? In recent years, we got a step closer to this by using single skip connections, but why not simply randomize the connectivity to numerous levels in the network's structure altogether and add lateral connections that overlap like a biological brain? (We presume God knew what He was doing, so why re-invent the wheel.)

That is what we did here. We built a neural architecture search that connects Dense layers in this manner.

What if we made a multi-layer pereceptron that looks like this: (Green triangles are Keras Input layers. Blue Squares are Keras Concatenate layers. The Pink stretched ovals are Keras Dense layers. The one stretched red oval is the network's Output layer. It is presumed that there is a batch normaliation layer between each Concatenate layer and the Dense layer it feeds into.)

... or what if we made one like this:

and like this

What if we made a single-layer perceptron that looks like this:

The deeper technical details can be found here:

Use example: Try it for yourself:

shell:

Clone the repo git clone https://github.com/david-thrower/cerebros-core-algorithm-alpha.git

cd into it cd cerebros-core-algorithm-alpha

install all required packages

pip3 install -r requirements.txt

Run the Ames housing data example:

python3 regression-example-ames-no-preproc.py

Example output from this task:

... # lots of summaries of training trials
...

Best result this trial was: 169.04592895507812
Type of best result: <class 'float'>
Best model name: 2023_01_12_23_42_cerebros_auto_ml_test_meta_0/models/tr_0000000000000006_subtrial_0000000000000000
...

Summary of Results

Ames housing data set, not pre-processed or scaled, non-numerical columns dropped:
House sell price predictions, val_rmse $169.04592895507812.
The mean sale price in the data was $180,796.06.
Val set RMSE was 0.0935% of the mean sale price. In other words, on average, the model predicted the sale price accurate to less than 0.1% of the actual sale price. Yes, you are reading it right. Less than 1/10 of a percent off on average.
There was no pre-trained base model used. The data in ames.csv which was selected for training is the only data any of the model's weights have ever seen.

For further details, see

Documentation

Open source license:

license.md

Licnse terms may be amended at any time as deemed necessry at Cerebros sole discretion.

Acknowledgements:

My Jennifer and my step-kids who have chosen to stay around and have rode out quite a storm because of my career in science.
My son Aidyn, daughter Jenna, and my collaborators Max Morganbesser and Andres Espinosa.
Mingxing Tan, Quoc V. Le for EfficientNet (recommeded image embedding base model).
My colleagues who I work with every day.
Tensorflow, Keras, Kubeflow, Kale, Optuna, Keras Tuner, and Ray open source communities and contributors.
Google Cloud Platform, Arikto, Canonical, and Paperspace and their support staff for the commercial compute and ML OPS platforms used.
Microk8s, minikube,and the core Kubernetes communities and associated projects.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", 2018. Base embedding usee for text classification tests.
Andrew Howard1, Mark Sandler1, Grace Chu1, Liang-Chieh Chen1, Bo Chen1, Mingxing Tan2, Weijun Wang1, Yukun Zhu1, Ruoming Pang2, Vijay Vasudevan2, Quoc V. Le2, Hartwig Ada MobileNet image embedding used for CICD tests.

Legal disclaimers:

Cerebros is an independent initiative. Nothing published herein, nor any predictions made by models developed by the Cerebros algorithm should be construed as an opinion of any Cerebros maintainer or contributor or community member nor any of such community member's, clients, or employer, whether private companies, academic institutions, or government agencies.
Although Cerebros may produce astoundingly accurate models from a relatively minuscule amount of data as the example above depicts, past performance does not constitute a promise of similar results on your data set or even that such results would bear relevance in your business use case. Numerous variables will determine the outcome of your experiments and models used in production developed therefrom, including but not limited to:
1. The characteristics, distribution, and scale of your data
2. Sampling methods used
3. How data was trained - test split (hint, if samples with identical data is a possibility, random selection is usually not the best way, hashing each sample then modulus division by a constant, and placing samples where the result of this is <= train set proportion, is better. This will force all occurrences of a given set of identical samples on the same side of the train, test split),
4. Hyperparameter selection and tuning algorithm chosen
5. Feature selection practices and features available in your use case
6. Model drift, changes in the patterns in data, trends over time, climate change, social changes over time, evolution, etc.
Users are responsible for validating one's own models and the suitability for their use case. Cerebros does not make predictions. Cerebros parses neural networks (models) that your data will train, and these models will make predictions based on your data whether or not it is correct, sampled in a sensible way, or otherwise unbiased and useful. Cerebros does a partial validation, solely by metrics such as 'val_root_mean_squared_error'. This is a preliminary metric of how the model is performing, assuming numerous logical and ethical assumptions that only humans with subject matter expertise can validate (think spurious associations and correlations), in addition to statistical parameters such as valid sampling of the training data and that the distribution of the data is not skewed.
The mechanism by which Cerebros works, gives it an ability to deduce and extrapolate intermediate variables which are not in your training data. This is in theory how it is able to make such accurate predictions in data sets which seem to not have enough features to make such accurate predictions. With this said, care should be taken to avoid including proxy variables that can be used to extract variables which are unethical to consider in decision making in your use case. An example would be an insurance company including a variable closely correlated with race and or disability status, such as residential postal code in a model development task which will be used to build models that determine insurance premium pricing. This is unethical, and using Cerebros or any derivative work to facilitate such is prohibited and will be litigated without notice or opportunity to voluntarily settle, if discovered by Cerebros maintainers.
Furthermore, an association however strong it may be does not imply causality, nor implies that it is ethical to apply the knowledge of such association in your business case. You are encouraged to use as conservative of judgment as possible in such, and if necessary consulting with the right subject matter experts to assist in making these determinations. Failure to do so is a violation of the license agreement.

cerebros-core-algorithm-alpha's People

Contributors

Stargazers

Watchers

Forkers

qazihamza23 vasu018 sashakolpakov

cerebros-core-algorithm-alpha's Issues

Investigate race conditions with using a self-contained hyperameter tuner such as Optuna

Optuna has made one update that may have the issue fixed on their end.
If there is a way to use a local blockchain based SQL as a replacement for Sqlite and this is more realistic to do as a self contained component than a full blown SQL server like Postgres, this may be an approach to look into.

create-new-architectural-pattern

update the Katib job for wine dataset

add-efficientnetv2s-base-model

update ames hoiusing, parameterize file name and target column

Add license agreement.

add-contributing-guidelines

Copy of content from SOP-0001
Required to get higher GitHub ratings...

Integrate a suitable AI to automatically write the documentation

add Cerebros prameters to the documentation

distributed-training-example: make min_x a constant, make max x tunable

distributed-training-example: make min_x a constant, make max x tunable, where the min in Katib it the min is hard set in Katib to one value and the max is set to explore range [min + 1, max].

Miswording on readme kubeflow-pipeline

add pip install to Kale notebook

to the header of the Kale notebook, add:

Cell 1


! pip3 install -r requirements

cell 2

# Restart to apply installs
import sys
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

Add to the base models / embeddings / encoders branch a computer vision example

A sub_task of issue 67.

add-github-page-for-cerebros-enterprise-for-customers

New repo
Depends on SLA
Depends on #39
Depends on #37
Depends on #36
Depends on #34
Add documentation for the enterprise version.

Add some abstractions for basic problems

Add an ancillary module or class that loads data like the notebook and searches a problem - specific presumed feasible space.

Dependencies: Fix issues with race conditions with past attempts at making hyperparameter tuning self-contained: hold to #74

Possible limitations:

May be infeasible for images and text, as these may depend on true distributed training.

Add a caption about cerebros enterprise

Rearange the API such that the tune function and random search are all a self-container python package.

add-code-of-conduct

Required to raise Cerebros rating of Github community standards.

add a classification example

add a header to the readme.md that is a bit more novice user friendly

Add option for custom activation (e.g. activation with hyperparams).

Add a python package release in actions

Kind of issue: Process Change

Additional context:

Make the package installable via pip / pypi; Use Github actions to automate pypi release.

Optional additional items

Add a SHAP or other ML explainability package integration with Cerebros

Kind of issue: enhancement / integration

Question: do we make a direct integration in the Cerebros API or do we build the integration within the Cerebros Enterprise / Kale-Kubeflow templates?

Additional context
Add any other context about the problem here.

Create an automated ML explainability for Cerebros.

imporve-the-ames-example

add-successful-ames-katib-nb-and-yaml

add disclosure not endorsed by employer | government | clients and IP ...

Documentation updates - Move documentation to separate folder

Move the documentation to a different folder.
Add the documentation for adding an embedding.

Pull request template

Required for github community guidelines ratings ...

Fix formatting on Acknowledgements

Add Security policy

Required for better Github ratings
Will involve content from various SOPs
See what the template Github provides supplies us with

Clean-up-readme-1

add-basic-visualizations

de-clutter the codebase.

Add logo

Add a way to get the best model

Add activation randomization (not all Dense nodes have the same activation)

All param for list of activations
Parameter for probably of a given option [e.g. list of n activations to be tried as one param, list of n probabilities, that must add up to 1 as the second parameter]

move license to license.md

try adding a conv1d layer base model example

A conv1d example, for example time series, like sames forecasting:

This can be used as a template for the embedding to duplicate for this:

https://colab.research.google.com/github/kmkarakaya/ML_tutorials/blob/master/Conv1d_Predict_house_prices.ipynb#scrollTo=A4JNcPxXqrBo

This may be a suitable example data set.

https://www.kaggle.com/competitions/competitive-data-science-predict-future-sales/data?select=sales_train.csv

replace diabetes example with breast cancer dataset example

remove wine test from CICD test pipeline

add ames example to readme

fix CVE-2022-41902 and CVE-2022-41902

issue templates

Required to get better Github community standards rating.

Add customer template for NLP-Text-Classification

Kind of issue: feature-request-or-enhancement

Steps to reproduce the behavior:

Add a task template to load a task on Cerebros Enterprise.

Re-visit MLflow integration

Challenges:

Making sure that if this is run in the same environment, it will not conflict the existing DB, e.g. overwriting instead of appending.
race condition #74

Update demo with new best run

Add a notebook for deploying a model

base models / embeddings / encoders

Kind of issue: feature-request-or-enhancement

Parameter base_models, list will take one or many objects of type Keras model (having a 1d output) and will insert these embeddings between the input layer(s) and downstream layers in order that matches the list of inputs. (Mirror of Cerebros Enterprise repo issue of same title)

Add a summary text to full text module

If we add a word2vec or tfidf input embedding to a Cerebros module and make 500 separate softmax outputs, 1 for each of 500 sequential words, then an argmax then a vocabulary lookup, we could train this on a small sample of pairs of summarized writing instructions and desired full text writing task results. This may make this do the same type of task as ChatGPT.

Also, doing the inverse operation may make a good summarization engine.

add init file to package cerebros

Kind of issue: Performance enhancement

Realized I failed to include init in the root directory of the cerebros package. This may be resulting in un-compiled code running and slowing Cerebros down.

Execution environment: [Linux (distro and version if you have it) | Mac]: All

Suggested Labels (If you don't know, that's ok): kind/performance

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.

david-thrower / cerebros-core-algorithm-alpha Goto Github PK