
event-reduce's Introduction


Event-Reduce

An algorithm to optimize database queries that run multiple times




  • 1. You make a query to the database, which returns the result in 100 milliseconds
  • 2. A write event occurs on the database and changes some data
  • 3. To get the new version of the query's results, you now have three options:
    • a. Run the query against the database again, which takes another 100 milliseconds
    • b. Write complex code that somehow merges the incoming event with the old state
    • c. Use Event-Reduce to calculate the new results on the CPU, nearly instantly and without disk I/O


Efficiency

In the browser demo you can see that, for randomly generated events, about 94% of them can be optimized by EventReduce. In real-world usage with non-random events, this share can be even higher. Across the implementations for common browser databases, we observe that new query results are displayed up to 12 times faster after a write occurs.

How it works

EventReduce uses 19 different state functions to 'describe' an event+previousResults combination. A state function is a function that returns a boolean value, like isInsert(), wasResultsEmpty(), sortParamsChanged() and so on.
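
As a rough illustration (not the library's actual source), two such state functions could look like the sketch below; the shape of the change event is an assumption made only for this example:

// Hedged sketch of two state functions. The ChangeEvent shape is an
// assumption for illustration, not the library's exact type.
interface ChangeEvent<DocType> {
    operation: 'INSERT' | 'UPDATE' | 'DELETE';
    doc: DocType | null;       // document state after the event
    previous: DocType | null;  // document state before the event
}

// true when the incoming event is an insert
function isInsert<DocType>(event: ChangeEvent<DocType>): boolean {
    return event.operation === 'INSERT';
}

// true when the previous query results were empty
function wasResultsEmpty<DocType>(previousResults: DocType[]): boolean {
    return previousResults.length === 0;
}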

There are also 16 different action functions. An action function gets the event+previousResults and modifies the results array in a given way, like insertFirst(), replaceExisting(), insertAtSortPosition(), doNothing() and so on.
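
Again only as a hedged sketch (the names come from the list above, the implementations are assumed): an action function mutates the previous results so that they match what a fresh query would return:

// Sketch of three action functions that modify previousResults in place.
function insertFirst<DocType>(previousResults: DocType[], doc: DocType): void {
    previousResults.unshift(doc);
}

function doNothing(): void {
    // the event cannot have changed the results, so keep them as they are
}

// keep the query's sort order by inserting at the right position; the
// comparator is passed in explicitly here for illustration
function insertAtSortPosition<DocType>(
    previousResults: DocType[],
    doc: DocType,
    compare: (a: DocType, b: DocType) => number
): void {
    const index = previousResults.findIndex(other => compare(doc, other) < 0);
    if (index === -1) {
        previousResults.push(doc);
    } else {
        previousResults.splice(index, 0, doc);
    }
}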

For each of our 2^19 state combinations, we calculate which action function gives the same results that the database would return when the full query is executed again.
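
A rough sketch of that offline step is shown below; every helper passed into the function is hypothetical and only stands in for the real generation tooling:

// For every state combination, find an action function whose output equals a
// full re-execution of the query; fall back to re-running the query otherwise.
type Fixture = unknown;

function buildTruthTable(
    allStateCombinations: string[],                            // e.g. '0101000000010011001'
    buildFixtureFor: (state: string) => Fixture,               // query, previous results, event
    executeFullQuery: (fixture: Fixture) => unknown[],         // ground truth from the database
    actionFunctions: Record<string, (fixture: Fixture) => unknown[]>,
    deepEqual: (a: unknown, b: unknown) => boolean
): Map<string, string> {
    const truthTable = new Map<string, string>();
    for (const state of allStateCombinations) {
        const fixture = buildFixtureFor(state);
        const correctResults = executeFullQuery(fixture);
        let chosenAction = 'runFullQueryAgain';                // always-correct fallback
        for (const [name, actionFn] of Object.entries(actionFunctions)) {
            if (deepEqual(actionFn(fixture), correctResults)) {
                chosenAction = name;
                break;
            }
        }
        truthTable.set(state, chosenAction);
    }
    return truthTable;
}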

From these state-action combinations we create a big truth table, which is used to build a binary decision diagram (BDD). The BDD is then optimized to call as few state functions as possible to determine the correct action for an incoming event-results combination.
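
To give an intuition for what the optimized BDD buys at runtime, here is a hedged sketch of evaluating such a diagram; the node layout is an assumption for illustration, not the shipped data structure. Only the state functions along one root-to-leaf path are ever called:

// Walk one path through the binary decision diagram; every state function
// that is not on that path is never evaluated.
interface BddLeaf { action: string; }
interface BddInternal {
    stateFunction: (input: unknown) => boolean;  // which state function to ask
    whenFalse: BddNode;
    whenTrue: BddNode;
}
type BddNode = BddLeaf | BddInternal;

function resolveAction(root: BddNode, input: unknown): string {
    let node: BddNode = root;
    while (!('action' in node)) {
        node = node.stateFunction(input) ? node.whenTrue : node.whenFalse;
    }
    return node.action;
}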

The resulting optimized BDD is then shipped as the EventReduce algorithm and can be used in different programming languages and implementations. The programmer does not need to know about any of this optimization machinery and can directly use a few simple functions, as shown in the JavaScript implementation.
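
A hedged usage sketch in TypeScript: calculateActionName is referenced in the issues further down, while runAction and the exact input fields are assumptions about typical usage rather than a verbatim copy of the package documentation:

// Sketch only: exact export names and signatures may differ from the
// published event-reduce-js API.
import { calculateActionName, runAction } from 'event-reduce-js';

async function handleChangeEvent(
    changeEvent: any,                    // the write event from the database
    queryParams: any,                    // normalized query params (matcher, sort comparator, limit, ...)
    previousResults: any[],              // the query results we already have
    reRunQuery: () => Promise<any[]>     // fallback: execute the query against the database
): Promise<any[]> {
    const input: any = { queryParams, changeEvent, previousResults };
    const actionName = calculateActionName(input);
    if (actionName === 'runFullQueryAgain') {
        // the event cannot be reduced; fall back to a normal database query
        return reRunQuery();
    }
    // apply the calculated action to the previous results in memory
    (runAction as any)(actionName, queryParams, changeEvent, previousResults);
    return previousResults;
}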

When to use this

You can use this to

  • reduce the latency until a change to the database updates your application
  • make observing query results more scalable by doing less disk-io
  • reduce the bandwidth when streaming realtime query results from the backend to the client
  • create a better form of caching where instead of invalidating the cache on write, you update its content

Limitations

  • EventReduce only works with queries that have a predictable sort order for any given set of documents. (You can make any query predictable by adding the primary key as the last sort parameter; see the example after this list.)

  • EventReduce can be used with relational databases, but not on relational queries that run over multiple tables/collections. (You can use views as a workaround so that you only query over a single table.) In theory, Event-Reduce could also be used for relational queries, but I did not need this so far. Also, running all optimizations takes about one week on an average machine, and the cost of adding more state functions appears to grow like an NP-hard problem.
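
For the sort-order limitation above, the fix is simply to append the primary key as the last sort parameter, exactly as the example query from the issue further down already does:

// A query sorted only by 'name' is ambiguous when two documents share a name.
// Appending the primary key ('_id') as the last sort parameter makes the
// order deterministic, which is what EventReduce requires.
const predictableQuery = {
    selector: {
        age: { $gt: 18 },
        gender: 'm'
    },
    limit: 10,
    sort: ['name', '_id']   // '_id' acts as the tie-breaker
};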

Implementations

At the moment there is only the JavaScript implementation, which you can install via npm. Pull requests for other languages are welcome.

Previous Work

FAQ

Is this something like materialized views? Yes and no. Materialized views solve a similar problem, but in a different way and with different trade-offs. When you have many users, all subscribing to different queries, you cannot create that many views, because they are all recalculated on each write to the database. EventReduce scales better because it does not affect write performance, and the calculation is done when fresh query results are requested, not beforehand.
Is this something like event sourcing or CQRS? No. Event sourcing is mostly used to calculate a current state by applying the full event stream to a starting state, which allows for things like time travel. EventReduce solves a completely different (performance) problem and only shares some common keywords, like "event".
Isn't this optimization already done by database engines? No. I tested EventReduce with many common databases like MongoDB, MySQL and Postgres. Each of them had better performance with Event-Reduce than just observing the event stream and running the queries again. If you understand exactly what Event-Reduce does, it becomes clear that this optimization cannot be done by pull-based databases, because they lack the required information.
Isn't this the same as product XY? No. EventReduce is not a product; it is not comparable to any database or streaming backend out there. EventReduce is an algorithm with a specific input and output, nothing more, nothing less. You can use EventReduce without having to change your underlying data infrastructure.

event-reduce's People

Contributors

acaloiaro, joshuaissac, pubkey, qubbit, renovate-bot, renovate[bot]


event-reduce's Issues

Incorrect event-reduce action

We discovered some scary behavior in our app where items were being sorted incorrectly. We dug in and realized that event-reduce is actually suggesting an insertLast when it should be an insertFirst.

Here's the test case showing it: #443

I thiiink (though I don't fully understand the library yet) the issue is somewhere in the truth table to bdd conversion?

I think this because the JSON truth table's calculateActionFromMap seems to correctly tell us runFullQueryAgain (not ideal, the ideal would be insertFirst, but at least it does not give us incorrect results) for this case of 0101000000010011001.

However, calculateActionName is wrong. It tells us to insertLast.

--

We can obviously patch this, but the issue opens up some serious questions for us:

  1. From reading the iterative fuzzing process, it seems there is no guarantee of correctness -- is that true?
  2. How can we be confident there are no further issues like this?

Intuitive explanation of how the library works

I can understand the use-case for this library, but I don't understand (even marginally) the "how" of it.

Let's take the example query in the readme:

const exampleQuery: MongoQuery = {
    selector: {
        age: {
            $gt: 18
        },
        gender: 'm'
    },
    limit: 10,
    sort: ['name', '_id']
};

And let's say these are the initial database contents: (as JSON)

[
    {gender: "m", age: 10, name: "Bob"},
    {gender: "m", age: 20, name: "Dan"},
    {gender: "f", age: 10, name: "Alice"},
    {gender: "f", age: 20, name: "Sally"},
]

So then, the initial result set for the query would be this:

[
    {gender: "m", age: 20, name: "Dan"},
]

However, let's say a new user was just added:

{gender: "m", age: 30, name: "Mike"},

The new entry is, of course, a match for the query. And since the name "Mike" is later than "Dan", it should be added to the end of the query result set (rather than the start).

My specific question for this case then is: How on earth is your generic event-reduce algorithm able to know that the new "Mike" entry should be added after the "Dan" entry rather than before it?

This is of course easy to solve if you're hand-coding the query-result update system (eg. you do a binary search through the old results, to find the insert point for the new entry). But I currently have no idea how your library is able to accomplish this generically, based just on the MongoQuery params, the old results, and a change-stream entry. (it seems like it would have to recreate the entire MongoDB query execution code within the library -- yet the library's described as not being specific to MongoDB)

Explaining the entire algorithm would be too much to ask, so I give the specific case above in the hope that there is an intuitive answer for this small case that can give insight into how the library solves these cases in general.
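
For reference, here is a sketch of the hand-coded approach described in the question, with a comparator derived from the query's sort (['name', '_id']) and a simple scan standing in for the binary search; it only illustrates the idea and is not the library's code:

// Derive a comparator from the query's sort and find the position where the
// new document belongs in the old results.
interface Person { _id?: string; gender: string; age: number; name: string; }

function compareBySort(a: Person, b: Person): number {
    if (a.name !== b.name) return a.name < b.name ? -1 : 1;
    const aId = a._id ?? '';
    const bId = b._id ?? '';
    return aId === bId ? 0 : (aId < bId ? -1 : 1);
}

function insertSorted(previousResults: Person[], doc: Person): Person[] {
    const index = previousResults.findIndex(other => compareBySort(doc, other) < 0);
    if (index === -1) return [...previousResults, doc];
    return [...previousResults.slice(0, index), doc, ...previousResults.slice(index)];
}

// 'Mike' sorts after 'Dan', so the new entry is appended to the end.
const updated = insertSorted(
    [{ gender: 'm', age: 20, name: 'Dan' }],
    { gender: 'm', age: 30, name: 'Mike' }
);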

