flipkart-incubator / databuilderframework
A data driven execution engine
DataDelta is not cloned before its data is passed to builders. If a builder modifies data that was present in the DataDelta and produces that same object, the DataDelta content changes as well, because both hold the same reference.
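A minimal sketch of the aliasing problem and a defensive-copy fix. The types here (DeltaCopySketch, Payload) are hypothetical stand-ins, not the framework's DataDelta or Data classes, and the copy strategy is only one possible approach:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types; not the framework's DataDelta or Data classes.
final class DeltaCopySketch {

    /** A mutable payload, standing in for a data object carried by a delta. */
    static final class Payload {
        final List<String> items;

        Payload(List<String> items) {
            this.items = new ArrayList<>(items); // own copy of the list
        }

        Payload copy() {
            return new Payload(items);
        }
    }

    /** Copies every payload so a builder mutating its input cannot touch the caller's delta. */
    static Map<String, Payload> defensiveCopy(Map<String, Payload> delta) {
        Map<String, Payload> copy = new HashMap<>();
        for (Map.Entry<String, Payload> entry : delta.entrySet()) {
            copy.put(entry.getKey(), entry.getValue().copy());
        }
        return copy;
    }

    /** A builder that modifies its input in place and returns it as its output. */
    static Payload mutatingBuilder(Map<String, Payload> input) {
        Payload payload = input.get("data1");
        payload.items.add("added-by-builder");
        return payload;
    }
}
```

If the builder is handed defensiveCopy(delta) instead of delta itself, the caller's original payload stays unchanged after the builder runs.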
Parallelization, even with the overhead of context switches, would be beneficial if there is more than one builder in the same rank.
Let's assume that in flow #1 a builder produced data1. In the second execution the same builder runs and now returns null, but we are still able to access data1, which is stale. Ideally data1 should not be present in the next available data.
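One way the expected behaviour could look, as a sketch only. StaleDataSketch and applyBuilderOutput are hypothetical names, and a plain Map stands in for the framework's data set:

```java
import java.util.Map;

// Hypothetical sketch; a plain Map stands in for the framework's data set.
final class StaleDataSketch {

    /**
     * Applies one builder's output to the data set.
     * A null output removes any value left over from an earlier execution,
     * so later readers never see stale data.
     */
    static void applyBuilderOutput(Map<String, Object> dataSet, String producedName, Object output) {
        if (output == null) {
            dataSet.remove(producedName);
        } else {
            dataSet.put(producedName, output);
        }
    }
}
```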
A builder should be able to access data in addition to the data declared in its "consumes" list. For the builder, all data mentioned in the "accesses" list will be access-only, non-mandatory and nullable.
The context variable is not passed to the execution listener. It could be passed, as certain clients could make better use of this variable.
I'm using this with dropwizard. How do I inject some of the dependent objects into the builder classes?
Can you suggest if the framework needs any changes to support this?
Thanks
After the introduction of Access, no builder needs to add the data it produces to consumes just for the sake of access. Hence this check should be removed: a databuilder flow should be capable of being a cyclic graph if needed, and this check prevents it.
I am creating a SimpleDataFlowExecutor with a custom DataBuilderFactory:
DataFlowExecutor executor = new SimpleDataFlowExecutor(myDataBuilderFactory);
However, when I run a dataflow with this executor the factory set in the constructor of DataFlowExecutor is not used.
The databuilderFactory of dataFlow takes precedence over the executor's builderFactory.
https://github.com/flipkart-incubator/databuilderframework/blob/master/src/main/java/com/flipkart/databuilderframework/engine/DataFlowExecutor.java#L57
public DataExecutionResponse run(DataFlow dataFlow, DataDelta dataDelta) throws DataBuilderFrameworkException, DataValidationException {
    Preconditions.checkNotNull(dataFlow);
    Preconditions.checkArgument(null != dataFlow.getDataBuilderFactory() || null != this.dataBuilderFactory);
    return this.run(new DataBuilderContext(), new DataFlowInstance(), dataDelta, dataFlow, dataFlow.getDataBuilderFactory());
}
Suggestion: If the dataflow's builderFactory is null, the executor's factory can be used.
So whenever I have to run the dataflow, I have to explicitly set the databuilderFactory and then call run:
dataFlow.setDataBuilderFactory(myDatabuilderFactory);
result = executor.run(dataFlow, data);
Also, the default factory set in DataFlowBuilder is MixedDataBuilderFactory, so I can't really use DataFlowBuilder to create a DataFlow.
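The fallback suggested above could be sketched roughly as follows. FactoryFallbackSketch, BuilderFactory and resolve are hypothetical names used only for illustration; BuilderFactory stands in for DataBuilderFactory, and this is not the framework's actual code:

```java
import java.util.Objects;

// Hypothetical sketch of the suggested fallback; not the framework's code.
final class FactoryFallbackSketch {

    /** Stand-in for DataBuilderFactory. */
    interface BuilderFactory { }

    /**
     * Prefers the factory set on the data flow and falls back to the
     * executor's factory when the flow has none.
     */
    static BuilderFactory resolve(BuilderFactory flowFactory, BuilderFactory executorFactory) {
        BuilderFactory chosen = (flowFactory != null) ? flowFactory : executorFactory;
        return Objects.requireNonNull(chosen, "no builder factory on the data flow or the executor");
    }
}
```

With something like this in run(...), the resolved factory would be passed on instead of dataFlow.getDataBuilderFactory(), so the explicit setDataBuilderFactory call before each run would not be needed.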
As of now there are two methods by which we can get the dataset from DataBuilderContext:
getDataSet() (marked as @Deprecated)
getDataSet(DataBuilder builder)
While implementing DataBuilder#process(DataBuilderContext context) we need to get the dataset. Is context.getDataSet() the correct way to access it? (If yes, then why is it marked as deprecated?)
I don't see a clean way to use getDataSet(DataBuilder builder) inside process: if I call getDataSet(this) there will be an NPE, as there won't be any dataBuilderMeta set on the current instance.
getDataSet(DataBuilder builder) enforces the use of only the data classes mentioned in consumes, optional and access. This enforcement already happens in the executors when they call process anyway.
Ref:
Either we set dataBuilderMeta when the builder is processed in withDataBuilder here, with something like dataBuilder.setDataBuilderMeta(dataBuilderMeta), which sets it on the dataBuilder instance so that it can be accessed when processing, or we make the other method non-deprecated. Or I may be missing something.
Since builders are topologically sorted, the builder executor has a check that breaks out if the first builder in a rank does not run.
But this is a bug: builders in the same rank can consume independent data, so the first builder may expect Data A, which is produced by a builder above it, while the second builder does not depend on Data A at all.
The two builders are in the same rank because the execution graph is built from the bottom up.
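A sketch of the per-builder check this implies: skip only the builder whose inputs are missing, rather than breaking out of the whole rank. RankExecutionSketch, BuilderSpec and runnableInRank are hypothetical names, not the executor's real types:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; BuilderSpec is not a framework class.
final class RankExecutionSketch {

    /** A builder described by its name and the data names it consumes. */
    static final class BuilderSpec {
        final String name;
        final Set<String> consumes;

        BuilderSpec(String name, Set<String> consumes) {
            this.name = name;
            this.consumes = consumes;
        }
    }

    /** Returns the names of the builders in one rank that can run with the available data. */
    static List<String> runnableInRank(List<BuilderSpec> rank, Set<String> available) {
        List<String> runnable = new ArrayList<>();
        for (BuilderSpec builder : rank) {
            if (!available.containsAll(builder.consumes)) {
                continue; // skip only this builder; do not break out of the rank
            }
            runnable.add(builder.name);
        }
        return runnable;
    }
}
```

With a rank of two builders consuming A and B respectively, and only B available, the second builder is still selected even though the first cannot run.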
Suppose a builder is stateful: it accesses and produces the same data, because the action of the builder depends on the previous state of that data.
Our application of Databuilder was primarily used for orchestrating downstream service calls. While using the multithreaded executor we noticed that a lot of threads had to be created for Databuilder: the downstream services, in their own thread pools, were blocking, and the builder was running out of threads, with its threads sitting in timed_wait. The situation worsens when the builder starts blocking controller threads.
The idea here is to make databuilder threads recyclable and reusable, so that they can hand IO off to the respective httpClient pools.
DataBuilderExecutor would need to implement an Observable kind of interface, returning data when possible. Also, internal to the builder, each process method invocation is blocking. This should also be reactive.
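As a rough illustration of a non-blocking process step, here is a sketch using the JDK's CompletableFuture rather than an Observable. AsyncBuilderSketch and its process signature are assumptions for illustration, not a proposed framework API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical sketch; uses JDK CompletableFuture in place of an Observable.
final class AsyncBuilderSketch {

    /**
     * A non-blocking process step: the downstream call returns a future, and the
     * builder's result is derived from it without parking the calling thread.
     */
    static CompletableFuture<String> process(
            String input, Function<String, CompletableFuture<String>> downstreamCall) {
        return downstreamCall.apply(input)
                .thenApply(response -> "built:" + response);
    }
}
```

The calling thread returns as soon as the future is wired up, so it can be reused while the downstream client's own pool completes the IO.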
Builder A produces some dataA.
No other builder has a dependency on dataA, neither in consumes nor in optionals. Because of the bottom-up graph construction, builder A doesn't end up in the execution graph.
One of the primary use cases of a builder is state management of an entity that is represented as data. For this, a builder accesses the same data that it produces. If the builder exits prematurely for some reason (exceptions), a partial commit happens, which is not correct. IMO an immutable copy should ideally be passed to handle these scenarios.
With the DataBuilderClassInfo annotation we get to specify classes rather than String names. But this uses the canonical name of the class by default. We need to make this customizable and expose it to the client configuring DatabuilderMetaManager, so that they can choose their own logic, such as using simpleName rather than canonicalName.
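A pluggable naming strategy could be sketched along these lines. DataNameStrategySketch and nameOf are hypothetical; how such a strategy would be wired into DatabuilderMetaManager is left open here:

```java
import java.util.function.Function;

// Hypothetical sketch of a pluggable class-to-name strategy.
final class DataNameStrategySketch {

    /** Current default behaviour described above: the canonical class name. */
    static final Function<Class<?>, String> CANONICAL = Class::getCanonicalName;

    /** Alternative a client might choose: the simple class name. */
    static final Function<Class<?>, String> SIMPLE = Class::getSimpleName;

    /** Resolves the data name for a class using the strategy supplied by the client. */
    static String nameOf(Class<?> dataClass, Function<Class<?>, String> strategy) {
        return strategy.apply(dataClass);
    }
}
```

A client could then pass SIMPLE, CANONICAL, or any other Function of its own when configuring the manager.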