aredier / chariots
versioned machine learning pipelines
License: GNU Lesser General Public License v3.0
requires #4
this is particularly a problem in training pipelines when, for instance, a vectorizer transforms the x but we need the y to be passed along
using Runner::from_runner_iterator
returns a `Runner<RunnerBatch<DataType, Op>, Op>`
instead of a `Runner<DataType, Op>`
we should have some
It would be great to have a named tuple holding all the configuration, so that it can be passed to all the different servers and clients
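A minimal sketch of what such a shared configuration could look like, using `typing.NamedTuple`. The class name `ChariotsConfig` and every field in it are hypothetical, chosen only for illustration; they are not part of chariots.

```python
from typing import NamedTuple


class ChariotsConfig(NamedTuple):
    """Hypothetical shared configuration; all field names are illustrative."""

    host: str = "localhost"
    port: int = 5000
    op_store_path: str = "/tmp/op_store"

    @property
    def backend_url(self) -> str:
        # Derived value that both a server and a client could rely on
        return f"http://{self.host}:{self.port}"


config = ChariotsConfig(port=8080)
# Named tuples are immutable: _replace returns a modified copy
test_config = config._replace(host="127.0.0.1")
print(config.backend_url)
print(test_config.backend_url)
```

Because the tuple is immutable, the same object can be handed to every server and client without one of them changing it for the others.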
it should be possible to write something along the lines of
pipe = Pipeline(
    [
        DataLoadingNode( ..., output_nodes="loaded_data"),
        Node(train_test_split, input_nodes=["loaded_data"], output_nodes=["train_data", "test_data"]),
        ...
    ]
)
this would make it possible to fit some common machine learning workflows (train/test split, x/y split, ...) and many more
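The named-node idea above can be sketched with a tiny executor that resolves each node's inputs by name. This is only an illustration under stated assumptions: `Node` and `run_pipeline` here are hypothetical stand-ins, not the chariots classes, and nodes are simply run in list order.

```python
class Node:
    """Hypothetical node: a function plus the names of the values it reads and writes."""

    def __init__(self, func, input_nodes=(), output_nodes=()):
        self.func = func
        self.input_nodes = list(input_nodes)
        self.output_nodes = list(output_nodes)


def run_pipeline(nodes):
    """Run nodes in list order, looking up each input by name."""
    values = {}
    for node in nodes:
        args = [values[name] for name in node.input_nodes]
        result = node.func(*args)
        if not node.output_nodes:
            continue  # nothing to publish for downstream nodes
        if len(node.output_nodes) == 1:
            result = (result,)  # treat a single output as a 1-tuple
        for name, value in zip(node.output_nodes, result):
            values[name] = value
    return values


def load_data():
    return list(range(10))


def train_test_split(data):
    cut = int(len(data) * 0.8)
    return data[:cut], data[cut:]


results = run_pipeline([
    Node(load_data, output_nodes=["loaded_data"]),
    Node(train_test_split, input_nodes=["loaded_data"],
         output_nodes=["train_data", "test_data"]),
])
print(results["train_data"], results["test_data"])
```

Since values are addressed by name rather than by position, a node with several outputs (such as a train/test split) can feed different downstream nodes independently.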
we should move the bulk of the code to Python
all the endpoints in the backend should have a mapping in the client (no curl/request should ever be required to use chariots).
should we build a data type that represents a sort of DataFrame with fields, so that ops know which fields they are dealing with?
Would be useful once #10 is implemented
for now, as the graph of operations in a pipeline is represented as a dict, an operation can only appear once in its keys
ops that can be trained, saved, and loaded
the idea would be to have a new, more usable structure with max one level:
chariots
the idea is to have a structure that is different from the
today an op just assumes that a certain op was run upstream in the pipeline. We need to find a way for an op to be relatively agnostic to its ancestors and still remain reliable (e.g. A/B testing)
we should be able to release for Python versions 3.5 to 3.7
choose between numpy and google style docstrings rather than rst:
we should have logging
for now, an op cannot change the data type between its input and its output; this should change
be able to inject in the framework:
requires #2
we should be able to merge and split runners in order to make fully functional graphs
the version hash comes from stringifying the versioned field. This should be done more rigorously
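One possible way to make this more rigorous, sketched under the assumption that the versioned value is JSON-serializable: hash a canonical encoding rather than the output of `str()`. The helper `version_hash` is hypothetical and is not how chariots computes its hashes.

```python
import hashlib
import json


def version_hash(value) -> str:
    """Hash a JSON-serializable value through a canonical encoding."""
    # sort_keys and fixed separators make the encoding independent of
    # dict insertion order and of whitespace choices
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


same_a = version_hash({"a": 1, "b": 2})
same_b = version_hash({"b": 2, "a": 1})
different = version_hash({"a": 1, "b": 3})
print(same_a == same_b, same_a == different)
```

Values that are not JSON-serializable (arbitrary objects, functions) would need an explicit serialization rule, which this sketch does not attempt to define.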
there are too many versions being used everywhere in a fairly undocumented way. this needs to be clearer
Have pre commit hooks for:
testing
we should test that all the code snippets in the doc work, both for the docstrings and for the rst files
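As a rough sketch of how docstring snippets can be checked, the standard library `doctest` module can parse and run the examples embedded in a docstring. The function `double` is a made-up example; for rst files, `doctest.testfile` plays a similar role.

```python
import doctest


def double(x):
    """Return twice the input.

    >>> double(2)
    4
    """
    return 2 * x


# Parse the examples out of the docstring and run them
parser = doctest.DocTestParser()
test = parser.get_doctest(double.__doc__, {"double": double}, "double", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results.failed, results.attempted)
```

In a real project the same check is usually wired into the test suite (for instance by calling `doctest.testmod()` on each module) rather than run by hand.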
the version type enum is confusing since it actually defines the types of subversions. this should be renamed.
chariots/versioning/version_type.py::VersionType
it seems that because the versioned fields become real python objects instead of their underlying class, their behavior changes. this is not the seamless integration of versioning in the ops that is our objective.
from chariots.core.ops import BaseOp
from chariots.core.versioning import VersionField
from chariots.core.versioning import VersionType


class VersionedOp(BaseOp):
    name = "fake_op"
    versioned_field = VersionField(VersionType.MAJOR, default_value=2)

    def _main(self):
        pass


op = VersionedOp()
op.versioned_field = 3
op2 = VersionedOp()
op2.versioned_field
this outputs 3 instead of 2 (the default factory value)
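One plausible cause, which is an assumption and is not confirmed by this report, is that the assigned value ends up stored at class level and is therefore shared by every instance. A minimal sketch of a descriptor that keeps values per instance is shown below; `IsolatedField` and `FakeOp` are hypothetical names, not the chariots implementation, and `__set_name__` requires Python 3.6 or later.

```python
class IsolatedField:
    """Hypothetical descriptor that stores each instance's value separately."""

    def __init__(self, default_value):
        self.default_value = default_value

    def __set_name__(self, owner, name):
        # Key under which the value is kept in the instance __dict__
        self._storage_name = "_versioned_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return instance.__dict__.get(self._storage_name, self.default_value)

    def __set__(self, instance, value):
        # Write to the instance, never to the descriptor or the class
        instance.__dict__[self._storage_name] = value


class FakeOp:
    versioned_field = IsolatedField(default_value=2)


op = FakeOp()
op.versioned_field = 3
op2 = FakeOp()
print(op.versioned_field, op2.versioned_field)
```

With this layout, assigning on one instance leaves the default untouched for every other instance, which is the behavior the snippet above expects.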
It has become apparent that storing metadata on each op and pipeline is key. I should do this sooner rather than later
errors that occur on the server are not transmitted to the client; instead we get this generic error:
ValueError: the execution of the pipeline failed, see _deployment logs for traceback
which doesn't help and is getting on my nerves.
today, to execute a Pipeline locally (in a notebook for instance) you still need to set up an OpStore and a Runner. This should be hidden during the prototyping stage and left to deal with during deployment (the only actual work needed to go from one to the other)
the aim of this is to create a tutorial to show how to implement an RL pipeline with chariots using Gym environments and pytorch. The aim is also to implement all necessary changes in chariots to make this process streamlined
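A rough sketch of one way the original error could reach the caller: the server serializes the exception type, message, and traceback into its response, and the client re-raises with that detail. Both `handle_request` and `read_response` are hypothetical helpers written for this illustration; they do not correspond to existing chariots endpoints.

```python
import json
import traceback


def handle_request(pipeline_func, payload):
    """Hypothetical server-side handler that reports failures in detail."""
    try:
        return json.dumps({"ok": True, "result": pipeline_func(payload)})
    except Exception as exc:
        return json.dumps({
            "ok": False,
            "error_type": type(exc).__name__,
            "message": str(exc),
            "traceback": traceback.format_exc(),
        })


def read_response(body):
    """Hypothetical client-side reader that surfaces the server error."""
    response = json.loads(body)
    if not response["ok"]:
        raise RuntimeError(
            "{error_type}: {message}\n{traceback}".format(**response)
        )
    return response["result"]


def failing_pipeline(_payload):
    raise KeyError("missing_column")


try:
    read_response(handle_request(failing_pipeline, None))
except RuntimeError as err:
    client_message = str(err)
    print(client_message.splitlines()[0])
```

Sending full tracebacks to clients can leak internal details, so a deployment might want to make this opt-in.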
we need to add some stuff to the documentation
we need ops to support building Keras NNs, and potentially composing them