produvia / ai-platform
An open-source platform for automating tasks using machine learning models
License: MIT License
I have a coding assignment test, and I have created a dog image classification model that would benefit this project. May I open a pull request with it as my submission?
The repo:
https://github.com/cfcdavidchan/dog_image_project
Consider three approaches to developing meta-models:
Source: Learning to Learn (NeurIPS 2018)
Each dataset (public or private) must include a LICENSE file which identifies the usage and permission rights to the data.
Examples of commercial licenses include:
Research and commercial
Research and commercial licenses available
CC BY 3.0
Attribution 3.0 Unported (CC BY 3.0) -
--
| You are free to:
| Share - copy and redistribute,
| Adapt - remix, transform, and build upon, even commercially,
| Under the following terms:
| Attribution - you must give appropriate credit.
CC-0
CC-0 - No Copyright
CC-BY-SA 3.0
Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) -
--
| You are free to:
| Share - copy and redistribute,
| Adapt - remix, transform, and build upon, even commercially,
| Under the following terms:
| Attribution - you must give appropriate credit,
| ShareAlike - if you make changes, you must distribute your contributions under the same license as the original.
CC BY 4.0
Attribution 4.0 International (CC BY 4.0) -
--
| You are free to:
| Share - copy and redistribute,
| Adapt - remix, transform, and build upon, even commercially,
| Under the following terms:
| Attribution - you must give appropriate credit.
MIT
MIT - You are free to: use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the work. Under the following terms: the work is provided "as is", you must include copyright and the license in all copies or substantial uses of the work.
GPL
GPL - You are free to: copy, distribute and modify the software as long as you track changes/dates in source files. Under the following terms: any modifications to or software including (via compiler) GPL-licensed code must also be made available under the GPL along with build & install instructions.
Non-commercial & commercial
Non-commercial and commercial licenses available
CC BY-NC-SA 4.0 or commercial
Attribution-NonCommercial-ShareAlike 4.0 International - You are free to:
| Share - copy and redistribute,
| Adapt - remix, transform, and build upon the material,
| Under the following terms:
| Attribution - you must give appropriate credit,
| NonCommercial - you may not use the material for commercial purposes,
| ShareAlike - if you make changes, you must distribute your contributions under the same license as the original.
Examples of non-commercial licenses include:
Non-commercial
Can only be used for research and educational purposes. Commercial use is prohibited.
CC-BY-NC-SA 4.0
Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) -
--
| You are free to:
| Share - copy and redistribute,
| Adapt - remix, transform, and build upon the material,
| Under the following terms:
| Attribution - you must give appropriate credit,
| NonCommercial - you may not use the material for commercial purposes,
| ShareAlike - if you make changes, you must distribute your contributions under the same license as the original.
Various
The dataset contains data from several sources, check the links on the website for individual licenses
Not Found
License information not found
To scale AI Platform, we created a database that tracks machine learning models and all model implementations, each identified by a universally unique identifier (UUID).
We will keep track of tasks, models and model UUIDs using SQLite. This dataset can be found on:
We are open to suggestions on how to improve this system.
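As a starting point for discussion, here is a minimal sketch of how tasks, models, and model UUIDs could be tracked in SQLite. The table names, column names, and helper functions (`create_registry`, `register_model`, `models_for_task`) are hypothetical and are not taken from the ai-platform codebase; the actual schema may differ.

```python
import sqlite3
import uuid


def create_registry(conn):
    """Create hypothetical 'tasks' and 'models' tables if they do not exist."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS tasks (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS models (
            uuid    TEXT PRIMARY KEY,
            task_id INTEGER NOT NULL REFERENCES tasks(id),
            name    TEXT NOT NULL
        );
        """
    )


def register_model(conn, task_name, model_name):
    """Register a model implementation under a task and return its new UUID."""
    # Create the task on first use; an existing task is left untouched.
    conn.execute("INSERT OR IGNORE INTO tasks (name) VALUES (?)", (task_name,))
    task_id = conn.execute(
        "SELECT id FROM tasks WHERE name = ?", (task_name,)
    ).fetchone()[0]

    # A random (version 4) UUID distinguishes each model implementation.
    model_uuid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO models (uuid, task_id, name) VALUES (?, ?, ?)",
        (model_uuid, task_id, model_name),
    )
    conn.commit()
    return model_uuid


def models_for_task(conn, task_name):
    """Return (uuid, name) pairs for every model registered under a task."""
    rows = conn.execute(
        """
        SELECT m.uuid, m.name
        FROM models AS m
        JOIN tasks AS t ON t.id = m.task_id
        WHERE t.name = ?
        ORDER BY m.name
        """,
        (task_name,),
    )
    return rows.fetchall()
```

With an in-memory database (`sqlite3.connect(":memory:")`), calling `register_model` twice for the same task yields two distinct UUIDs, and `models_for_task` lists both implementations.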
Network I will use:
CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED.
I will implement this CNN in TensorFlow and train it on the SIGNS dataset, which represents the numbers 0 to 5. The approximate accuracy I expect is:
train accuracy : 93-99%
test accuracy : 78-99%
(I got a match on AngelList and you asked me to do a coding assignment. Will implementing this model be sufficient to clear the coding round?)
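To make the proposed layer stack concrete, the following sketch walks tensor shapes through CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED in plain Python. It does not train anything. The 64x64x3 input size, the filter counts (8 and 16), and the pooling windows (8 and 4) are illustrative assumptions, not values stated in the issue; only the six output classes (digits 0 to 5) come from the text above.

```python
def conv2d_same(shape, filters):
    """CONV2D with stride 1 and 'same' padding: height and width are kept."""
    h, w, _ = shape
    return (h, w, filters)


def maxpool_same(shape, size):
    """MAXPOOL with a size x size window, stride == size, 'same' padding."""
    h, w, c = shape
    # Ceiling division: 'same' padding rounds partial windows up.
    return (-(-h // size), -(-w // size), c)


def flatten(shape):
    """FLATTEN collapses height, width, and channels into one vector."""
    h, w, c = shape
    return (h * w * c,)


def walk_shapes(input_shape=(64, 64, 3), num_classes=6):
    """Return (layer name, output shape) pairs for the proposed network.

    All hyperparameters here are assumed for illustration only.
    RELU is element-wise, so it does not change the shape.
    """
    shapes = [("INPUT", input_shape)]
    s = conv2d_same(input_shape, filters=8)
    shapes.append(("CONV2D -> RELU", s))
    s = maxpool_same(s, size=8)
    shapes.append(("MAXPOOL", s))
    s = conv2d_same(s, filters=16)
    shapes.append(("CONV2D -> RELU", s))
    s = maxpool_same(s, size=4)
    shapes.append(("MAXPOOL", s))
    s = flatten(s)
    shapes.append(("FLATTEN", s))
    shapes.append(("FULLYCONNECTED", (num_classes,)))
    return shapes


if __name__ == "__main__":
    for name, shape in walk_shapes():
        print(f"{name:16s} {shape}")
```

Under these assumed settings the flattened vector has 2 * 2 * 16 = 64 features feeding the final six-way fully connected layer.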
Goals
As a developer, I want to figure out how to manage 100s or 1,000s of AI tasks, so that I can scale the ai-platform better.
As a developer, I want to experiment and create 10s or 100s of variations of each AI task, so that I can determine the most performant machine learning algorithm and method for solving each task.
Motivation
We currently store each AI task into a separate folder (see /tasks folder). This allows us to develop independent machine learning models and conduct experiments on each task using MLflow. The current folder structure also aligns with MLflow storage format. As the ai-platform repository grows in size, we need to better scale the management of independent AI tasks.
Possible Solutions
Use distributed storage, such as: AWS S3, DBFS, or Google Drive, to keep inputs (models or datasets) and outputs (models) out of the repository.
Use submodules, such as: Git Submodules or Google Repo, to manage multi-repository integrations in Git.
Use virtual filesystems or large-file extensions, such as: VFS for Git (developed by Microsoft) or Git LFS (developed by GitHub, with contributions from Atlassian), to enable Git to operate at enterprise scale. Microsoft wrote about its motivations to launch VFS for Git: here and here. Atlassian wrote a tutorial about Git LFS: here.
Additional Considerations
MLflow was designed to scale to large data sets, large output files (for example, models), and large numbers of experiments (MLflow Docs, 2019).
- An individual MLflow run can execute on a distributed cluster, for example, using Apache Spark. You can launch runs on the distributed infrastructure of your choice and report results to a Tracking Server to compare them. MLflow includes a built-in API to launch runs on Databricks.
- MLflow supports launching multiple runs in parallel with different parameters, for example, for hyperparameter tuning. You can simply use the Projects API to start multiple runs and the Tracking API to track them.
- MLflow Projects can take input from, and write output to, distributed storage systems such as AWS S3 and DBFS. MLflow can automatically download such files locally for projects that can only run on local files, or give the project a distributed storage URI if it supports that. This means that you can write projects that build large datasets, such as featurizing a 100 TB file.
Let's consider dataset sizes used for training machine learning models. Imagine ai-platform contains 100 models and each dataset is 100 GB (e.g., an image dataset). Storing datasets on ai-platform will require 10 TB. If ai-platform contains 1,000 models, that translates to 100 TB.
Let's consider machine learning model sizes used to solve tasks. Imagine ai-platform contains 100 models and each model is 100 MB. Storing models on ai-platform will require 10 GB. If ai-platform contains 1,000 models, that translates to 100 GB.
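The two estimates above can be reproduced with a few lines of arithmetic. This is a small sketch that assumes decimal units (1 TB = 1,000 GB, 1 GB = 1,000 MB) and one dataset per model; the function names are hypothetical.

```python
GB_PER_TB = 1000  # decimal units assumed
MB_PER_GB = 1000


def dataset_storage_tb(num_models, gb_per_dataset=100):
    """Total dataset storage in TB, assuming one dataset per model."""
    return num_models * gb_per_dataset / GB_PER_TB


def model_storage_gb(num_models, mb_per_model=100):
    """Total model storage in GB."""
    return num_models * mb_per_model / MB_PER_GB


if __name__ == "__main__":
    for n in (100, 1000):
        print(
            f"{n} models: "
            f"{dataset_storage_tb(n):g} TB of datasets, "
            f"{model_storage_gb(n):g} GB of models"
        )
```

The datasets dominate by three orders of magnitude, which is why the storage question matters more for inputs than for trained models.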