flipkart-incubator / databuilderframework

A data driven execution engine

Java 100.00%

databuilderframework's Issues

DataDelta content changes if a builder produces the same data that is present in the DataDelta.

DataDelta is not cloned before its data is passed to builders. If a builder modifies data that was present in the DataDelta and produces it as its output, the DataDelta content changes as well, because both hold the same reference.
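
The aliasing described here can be reproduced without the framework. The following is a minimal sketch under stated assumptions: `Payload`, `mutatingBuilder`, `runShared` and `runWithCopy` are hypothetical stand-ins and are not the framework's `DataDelta`, `Data` or builder types. It only illustrates why handing the same reference to a builder lets the builder change the caller's data, and how copying first avoids that.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; these are NOT the framework's DataDelta or Data classes.
class DeltaAliasingSketch {
    static final class Payload {
        int value;
        Payload(int value) { this.value = value; }
        Payload copy() { return new Payload(value); }
    }

    // A builder that modifies its input and returns the same object as its output.
    static Payload mutatingBuilder(Payload input) {
        input.value += 1;
        return input;
    }

    // Passes the caller's object straight through: the caller's delta is modified.
    static int runShared(List<Payload> delta) {
        return mutatingBuilder(delta.get(0)).value;
    }

    // Copies every entry first: the caller's delta is left untouched.
    static int runWithCopy(List<Payload> delta) {
        List<Payload> copy = new ArrayList<>();
        for (Payload p : delta) {
            copy.add(p.copy());
        }
        return mutatingBuilder(copy.get(0)).value;
    }
}
```

Whether a shallow or deep copy is appropriate depends on how the data classes are structured; the sketch assumes each data object can copy itself.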

Readme.md sample image

MultiThreadedExecutor submits to the executor even when there is only one builder in the respective rank.

Parallelization, with its context-switch overhead, is only beneficial when there is more than one builder in the same rank.
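
One possible shape for this optimization, as a sketch only: `RankRunnerSketch` and `runRank` are hypothetical names, and builders are modelled as plain `Callable<String>` tasks rather than the framework's builder types. A rank with a single task runs on the calling thread; larger ranks go to the pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch; builders are modelled as Callable<String> tasks.
class RankRunnerSketch {
    // Runs one rank. A single-task rank is executed inline to avoid the hand-off cost.
    static List<String> runRank(List<Callable<String>> rank, ExecutorService pool) {
        List<String> results = new ArrayList<>();
        try {
            if (rank.size() == 1) {
                results.add(rank.get(0).call());
                return results;
            }
            // invokeAll returns futures in the same order as the submitted tasks.
            for (Future<String> future : pool.invokeAll(rank)) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```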

Null returned from a data builder does not nullify the previously generated data.

Let's assume that in flow#1 a builder produced data1. In a second execution the same builder runs again and returns null, but we are still able to access data1, which is now stale. Ideally, data1 should no longer be present in the available data after the second execution.
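
The requested behaviour can be sketched as a merge step in which a null output removes the earlier value. This is an illustration only: `StaleDataSketch`, `applyOutput` and the `Map<String, Object>` data set are assumptions and do not reflect how the framework stores its data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the available data is modelled as a plain map keyed by data name.
class StaleDataSketch {
    // Merges one builder's output into the available data.
    // A null output removes any value left over from an earlier execution.
    static void applyOutput(Map<String, Object> available, String dataName, Object output) {
        if (output == null) {
            available.remove(dataName);
        } else {
            available.put(dataName, output);
        }
    }
}
```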

Accessible datas

A builder should be able to access data in addition to what is declared in its "consumes" list. For the builder, all data mentioned in the "accesses" list will be access-only, non-mandatory and nullable.

DataBuilderContext not accessible in DataBuilderExecutionListener

The context variable is not passed to the execution listener. It could be passed along, as certain clients could make good use of it.

Guice support

I'm using this with dropwizard. How do I inject some of the dependent objects into the builder classes?
Can you suggest if the framework needs any changes to support this?

Thanks

ProcessedBuilders check removal

After the introduction of accesses, no builder needs to add the data it produces to its consumes list just for the sake of access. Hence this check should be removed: a data builder flow should be capable of being a cyclic graph if needed, and this check prevents it.

DataFlowExecutor's databuilderFactory is not used

I am creating a SimpleDataFlowExecutor with a custom DataBuilderFactory:

    DataFlowExecutor executor = new SimpleDataFlowExecutor(myDataBuilderFactory);

However, when I run a data flow with this executor, the factory set in the constructor of DataFlowExecutor is not used.
The dataBuilderFactory of the DataFlow takes precedence over the executor's factory.
https://github.com/flipkart-incubator/databuilderframework/blob/master/src/main/java/com/flipkart/databuilderframework/engine/DataFlowExecutor.java#L57

    public DataExecutionResponse run(DataFlow dataFlow, DataDelta dataDelta) throws DataBuilderFrameworkException, DataValidationException {
        Preconditions.checkNotNull(dataFlow);
        Preconditions.checkArgument(null != dataFlow.getDataBuilderFactory() || null != this.dataBuilderFactory);
        return this.run(new DataBuilderContext(), new DataFlowInstance(), dataDelta, dataFlow, dataFlow.getDataBuilderFactory());
    }

Suggestion: if the data flow's builder factory is null, the executor's factory can be used.
As it stands, whenever I have to run the data flow, I have to explicitly set the dataBuilderFactory and then call run.

    dataFlow.setDataBuilderFactory(myDataBuilderFactory);
    result = executor.run(dataFlow, data);

Also, the default factory set in DataFlowBuilder is MixedDataBuilderFactory, so I can't really use DataFlowBuilder to create a DataFlow.
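
The fallback suggested in this issue amounts to a small resolution step. A minimal sketch follows, in which `Factory` and `resolve` are hypothetical names standing in for the framework's factory type and for the logic that would sit inside `run`; the precedence shown (data flow first, then executor) mirrors the existing behaviour plus the proposed fallback.

```java
// Hypothetical sketch; Factory is a stand-in, not the framework's DataBuilderFactory.
class FactoryFallbackSketch {
    interface Factory {
        String name();
    }

    // Prefers the factory set on the data flow and falls back to the executor's factory.
    static Factory resolve(Factory flowFactory, Factory executorFactory) {
        if (flowFactory != null) {
            return flowFactory;
        }
        if (executorFactory != null) {
            return executorFactory;
        }
        throw new IllegalArgumentException("No builder factory set on the data flow or the executor");
    }
}
```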

Proper way to access dataSet from DataBuilderContext

As of now there are two methods by which we can get dataset from DataBuilderContext

getDataSet() (marked as @Deprecated)
getDataSet(DataBuilder builder)

While implementing DataBuilder#process(DataBuilderContext context) we need to get the data set. Isn't context.getDataSet() the correct way to access it? (If yes, why is it marked as deprecated?)
I don't see a clean way to use getDataSet(DataBuilder builder) inside process: if I call getDataSet(this), there will be an NPE, as no dataBuilderMeta is set on the current instance.

getDataSet(DataBuilder builder) restricts access to the data classes mentioned in consumes, optionals and accesses, but this enforcement already happens in the executors when they call process.
Ref:

Either we set dataBuilderMeta in withDataBuilder, with something like dataBuilder.setDataBuilderMeta(dataBuilderMeta), so that it is set on the dataBuilder instance and can be accessed when processing, or the other method should be made non-deprecated. Or I might be missing something 😅

Builders of the same rank terminate when first one does not run

Since builders are topologically sorted, the builder executor has a check that breaks out of the loop if the first builder in the respective rank does not run.
But this is a bug, as builders in the same rank can consume independent data: the first builder may expect Data A, which is produced by a builder above it, while the second builder does not depend on Data A.
These two builders are in the same rank because the execution graph is built from the bottom up.
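
The difference between breaking out of the rank and skipping a single builder can be sketched as follows. `RankSkipSketch`, `Builder` and `runnable` are hypothetical names used only for illustration, and the sketch does not reproduce the executor's actual loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; Builder only records a name and the data it consumes.
class RankSkipSketch {
    static final class Builder {
        final String name;
        final Set<String> consumes;
        Builder(String name, Set<String> consumes) {
            this.name = name;
            this.consumes = consumes;
        }
    }

    // Returns the names of the builders in a rank whose inputs are all available.
    static List<String> runnable(List<Builder> rank, Set<String> available) {
        List<String> ran = new ArrayList<>();
        for (Builder builder : rank) {
            if (!available.containsAll(builder.consumes)) {
                // Skip only this builder; a break here would drop the rest of the rank.
                continue;
            }
            ran.add(builder.name);
        }
        return ran;
    }
}
```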

Making accesses mandatory for running a builder defeats the purpose of statefulness of a builder

Suppose a builder is stateful: it accesses and produces the same data, because the action of the builder depends on the previous state of that data.

EnhancementRequest - ReactiveExecutor

Our application primarily uses Databuilder to orchestrate downstream service calls. While using the multithreaded executor, we noticed that a lot of threads had to be created for Databuilder: the downstream service calls, running in their own thread pools, were blocking, and the builder was running out of threads, which were stuck in timed_wait. The situation worsens when the builder starts blocking controller threads.

The idea here is to make Databuilder threads recyclable and reusable, so that they can hand off IO to the respective httpClient pools.

DataBuilderExecutor would need to implement an Observable kind of interface, returning data when it becomes available. Also, inside the builder, each invocation of the process method is blocking. This should be reactive as well.
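
As a rough illustration of the non-blocking hand-off being asked for, the sketch below uses the JDK's `CompletableFuture` rather than an Observable type. `AsyncBuilderSketch` and `chain` are hypothetical names, and a builder step is modelled as a plain `Function<String, String>`; none of this is part of the framework.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical sketch; a builder step is modelled as a Function<String, String>.
class AsyncBuilderSketch {
    // Attaches the next builder step to a pending downstream call.
    // The calling thread returns immediately and is free to do other work;
    // the step runs once the downstream call completes.
    static CompletableFuture<String> chain(CompletableFuture<String> downstreamCall,
                                           Function<String, String> nextBuilder) {
        return downstreamCall.thenApply(nextBuilder);
    }
}
```

With this shape no builder thread waits on the downstream response; the continuation runs on whichever thread completes the future.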

Builder whose data is not consumed is left out of the execution graph (because of bottom-up building of the graph)

Builder A produces some dataA.
No other builder has a dependency on dataA, neither in consumes nor in optionals. Because of the bottom-up graph construction, builder A doesn't come into the execution graph.
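
To see why a bottom-up traversal can miss such a builder, consider a simplified model. This is a sketch under assumptions: `BottomUpGraphSketch`, `buildersReached` and the two maps are hypothetical, and the framework's real graph construction may differ in detail. The traversal starts at the target data and only follows "consumes" edges backwards, so a builder whose output nobody consumes is never visited.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a bottom-up walk from the target data to its producers.
class BottomUpGraphSketch {
    // producerOf maps a data name to the builder producing it;
    // consumes maps a builder to the data names it consumes.
    static Set<String> buildersReached(String target,
                                       Map<String, String> producerOf,
                                       Map<String, List<String>> consumes) {
        Set<String> reached = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(target);
        while (!pending.isEmpty()) {
            String data = pending.pop();
            String builder = producerOf.get(data);
            if (builder == null || !reached.add(builder)) {
                continue; // no producer, or builder already visited
            }
            for (String input : consumes.getOrDefault(builder, Collections.emptyList())) {
                pending.push(input);
            }
        }
        return reached;
    }
}
```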

DataSet passed to builder should be immutable

One of the primary use cases of a builder is state management of an entity that is represented as data. For this, a builder accesses the same data that it produces. If the builder exits prematurely for some reason (for example, an exception), a partial commit happens, which is not correct. IMO an immutable copy should ideally be passed to handle these scenarios.
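
A minimal sketch of the idea, assuming the state can be modelled as a `Map<String, Integer>`: `CommitOnSuccessSketch` and `runBuilder` are hypothetical names, and the builder is modelled as a `UnaryOperator`. The builder receives an unmodifiable snapshot, so it cannot partially commit changes; the new state is adopted only if the builder returns normally.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch; state is a plain map and the builder is a UnaryOperator.
class CommitOnSuccessSketch {
    // Hands the builder an unmodifiable snapshot of the state.
    // Returns the builder's output on success and the original state on failure.
    static Map<String, Integer> runBuilder(Map<String, Integer> state,
                                           UnaryOperator<Map<String, Integer>> builder) {
        Map<String, Integer> snapshot = Collections.unmodifiableMap(new HashMap<>(state));
        try {
            return builder.apply(snapshot);
        } catch (RuntimeException e) {
            // Nothing was committed: the builder could not mutate the snapshot.
            return state;
        }
    }
}
```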

DataBuilderClassInfo - customize Builder Name

With the DataBuilderClassInfo annotation we get to specify classes rather than String names. But this uses the canonical name of the class by default. We need to provide the ability to customize this and expose it to the client configuring DatabuilderMetaManager, so that they can choose their own logic, such as using simpleName rather than canonicalName.
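
One way such a customization point could look is a pluggable naming strategy. This is a sketch only: `BuilderNameSketch`, `CANONICAL`, `SIMPLE` and `nameFor` are hypothetical names and not part of the framework; only `Class#getCanonicalName` and `Class#getSimpleName` are real JDK methods.

```java
import java.util.function.Function;

// Hypothetical sketch of a pluggable naming strategy for data and builder classes.
class BuilderNameSketch {
    // Current default described in the issue: the canonical class name.
    static final Function<Class<?>, String> CANONICAL = Class::getCanonicalName;
    // Alternative a client might prefer: the simple class name.
    static final Function<Class<?>, String> SIMPLE = Class::getSimpleName;

    // Resolves the name for a class using the strategy supplied by the client.
    static String nameFor(Class<?> type, Function<Class<?>, String> strategy) {
        return strategy.apply(type);
    }
}
```

Note that simple names are only safe when no two registered classes share the same simple name across packages.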
