badtuple / remits

Remote Iterator Server

Rust 100.00%

data database iterator stream-processing streams

remits's People

Forkers

andrewscibek volgorean

remits's Issues

When adding a Message to Log, return Message ID to caller

Right now, adding a message just results in an error response or an "ok" response. It'd be useful to return the message ID to the caller.
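A minimal sketch of what this could look like on the server side. The `Log` struct and the `add_msg` method here are hypothetical stand-ins, not the actual Remits types, and the sketch assumes a message ID is simply the message's zero-based position in the Log:

```rust
/// Hypothetical in-memory Log; the real Remits `Log` type may look different.
#[derive(Default)]
pub struct Log {
    messages: Vec<Vec<u8>>,
}

impl Log {
    /// Appends a message and returns the ID assigned to it.
    /// Assumption: an ID is the zero-based position of the message in the Log.
    pub fn add_msg(&mut self, payload: Vec<u8>) -> u64 {
        let id = self.messages.len() as u64;
        self.messages.push(payload);
        id
    }
}
```

The handler could then put the returned ID into the response in place of the bare "ok".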

Integration tests should tear down created resources

As we add persistence to the db, we can't just turn off the server and turn it back on anymore. Instead, we'll need to delete the logs and iterators we create so we can start from a clean slate.

Idea: Promote Iterators to Logs

Initially, we had the idea for normal Iterators that operated at query time and an optional "Indexed" Iterator that would persist the query to disk so that it could be re-queried without redoing the work.

The more I think about it, the more I'm of the opinion that Indexed Iterators sound a little too much like Logs. What if instead, we could define an Iterator, but promote it to a Log? Then we could have a single concept for a "disk persisted stream".

There's at least one serious question that arises when you go down this route:

Do promoted iterators keep the newly formed Log up to date? An indexed iterator would have created the new values as you queried them and persisted them to disk...but that's really not how Logs work.

Right now, Log == Persistence && Iterator == Query is a pretty simple idea in my head. But at the same time, a Log that extends based on another Log seems to violate the initial idea...at that point an Indexed Iterator just seems conceptually more correct.

Just opening the issue for discussion.

Decide on Iterator query language

When writing Iterators, we need to define the map, reduce, and filter functions using a scripting language. It should be a language we can embed and interpret from inside Remits.

In an ideal world, Lua would be the best language imo. It's a very simple but powerful language, is well known, and has a lot of precedent for this exact use case. However, my understanding is that there are no good Lua bindings for Rust. There are plenty of bindings, but in the past I've read that due to how Lua was built, certain memory operations are inherently unsafe, and that's not really something anyone can work around without rewriting Lua natively. This is just something I read somewhere at some point though, so perhaps it's not actually an issue.

We should:

Investigate whether Lua can be used safely and ergonomically from Rust.
Investigate alternative languages. (dyon, gluon, javascript, etc...)

Whichever language we choose, we'll have to provide standard functions for serialization and deserialization, among other things. Remits' use case seems particularly suited to JIT-ing, since there are a few (or more likely just one) very hot functions that get called on every iteration. While a JIT isn't necessary imo, JIT support would be a massive plus for any candidate language.

CLI Client

We need a CLI client that can connect to a running Remits instance and execute commands.

This will help:

Testing during development
Finding any weirdness or edge cases in the protocol
Instrumenting outside integration tests
As a reference implementation for other client libraries

I would envision it very much like the Redis CLI client: a very bare-bones, line-based REPL that takes commands, sends them to Remits, and simply displays the response.

Wrap messages in an object when responding to Iterator Next

So it turns out, we don't return the message id or ingestion timestamp with the message 🤦 . That's pretty necessary to be able to take the last message id and get the next batch.

To do that, we'll need to wrap each message in an object containing its ID. I can't think of any other metadata to hang there, but the wrapper would give us a place to put it if we ever need one.
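As a rough sketch of the wrapper idea: the `Envelope` type and `next_batch` function below are hypothetical names, and the sketch assumes message IDs are zero-based positions in the Log. It shows how a caller could use the last ID it received to ask for the next batch:

```rust
/// Hypothetical wrapper returned by Iterator Next: the payload plus its ID.
#[derive(Debug, Clone, PartialEq)]
pub struct Envelope {
    pub id: u64,
    pub payload: Vec<u8>,
}

/// Returns up to `limit` messages that come after the message with ID `after`,
/// or from the start of the log when `after` is `None`.
/// Assumption: an ID is the zero-based position of the message in the log.
pub fn next_batch(log: &[Vec<u8>], after: Option<u64>, limit: usize) -> Vec<Envelope> {
    let start = after.map_or(0, |id| (id as usize).saturating_add(1));
    log.iter()
        .enumerate()
        .skip(start)
        .take(limit)
        .map(|(i, payload)| Envelope {
            id: i as u64,
            payload: payload.clone(),
        })
        .collect()
}
```

A client would pass `batch.last().map(|e| e.id)` back in as `after` to keep paging.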

"Iterator List" should return single message response containing an array.

"Iterator List" and "Log List" respond in different ways right now.

Log List returns a single message. Within that message is a CBOR encoded array that contains the names of each Log.

Iterator List returns 1 message per Iterator. Each of those messages is a string representing the name of an Iterator.

For consistency, we should align on returning a single message, and convert Iterator List to work like Log List does.

Define and normalize response schemas

Right now our request format is very well defined. As specified in the design doc, it uses framed requests with a 32-bit length header followed by a command.

Responses, however, are ad hoc. Sometimes we return data, sometimes errors, and sometimes messages that just say whether something succeeded.

@volgorean has expressed interest in taking a pass at implementing a client, but before he's able to do that we have to define what a response should look like. This includes how we specify whether something is data, an error, or info, and how we encode it on the wire.

Opening an issue since it'll likely take some thought and it's good to track it.
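To make the discussion concrete, here is one possible shape, offered only as a starting point. Everything in it is an assumption rather than a decision: the `Response` enum, the one-byte kind tag, the tag values, and big-endian byte order for the length header. It mirrors the request framing by prefixing each response with a 32-bit length:

```rust
/// Hypothetical response type covering the three cases mentioned above.
#[derive(Debug, PartialEq)]
pub enum Response {
    Data(Vec<u8>),
    Error(String),
    Info(String),
}

impl Response {
    /// Encodes as: 4-byte length (big-endian, assumed) | 1-byte kind tag | body.
    /// The length counts the tag byte plus the body.
    pub fn encode(&self) -> Vec<u8> {
        let (tag, body): (u8, &[u8]) = match self {
            Response::Data(bytes) => (0, bytes.as_slice()),
            Response::Error(msg) => (1, msg.as_bytes()),
            Response::Info(msg) => (2, msg.as_bytes()),
        };
        let len = (body.len() + 1) as u32;
        let mut out = Vec::with_capacity(4 + 1 + body.len());
        out.extend_from_slice(&len.to_be_bytes());
        out.push(tag);
        out.extend_from_slice(body);
        out
    }

    /// Decodes a single frame; returns `None` if it is truncated or malformed.
    pub fn decode(frame: &[u8]) -> Option<Response> {
        let header = frame.get(..4)?;
        let len = u32::from_be_bytes([header[0], header[1], header[2], header[3]]) as usize;
        let rest = frame.get(4..4usize.checked_add(len)?)?;
        let (tag, body) = rest.split_first()?;
        match *tag {
            0 => Some(Response::Data(body.to_vec())),
            1 => String::from_utf8(body.to_vec()).ok().map(Response::Error),
            2 => String::from_utf8(body.to_vec()).ok().map(Response::Info),
            _ => None,
        }
    }
}
```

A tag byte keeps the client's dispatch trivial. Encoding the kind inside the body itself (for example, as part of a CBOR map) would be another reasonable option.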

Hold a RWMutex per Log, not per DB

Right now, there's a single Mutex wrapped around the entire db:

// from main()
let db = Arc::new(Mutex::new(db::DB::new()));

This was mainly for simplicity while bootstrapping. Since adding a Message to a Log is the only place where mutation happens, we should instead have an RwLock on the Log itself. That would allow us to have:

Multiple Logs adding messages at the same time
Multiple reads happening from the same Log at the same time as long as a write isn't happening.
Multiple connections not blocking each other trying to gain a db handle.

Once the persistence layer is in place, this will likely be our main non-IO bottleneck. It's not an issue yet, there's way more to do first...but I wanted to make sure to log the issue so we can come back to it.
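A minimal sketch of the per-Log locking idea, using only `std`. The `DB` and `Log` structs and their methods are simplified, hypothetical stand-ins for the real types. The outer lock guards only the map of Log names and is held just long enough to look up or create a handle, while each Log carries its own `RwLock`:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical in-memory Log.
#[derive(Default)]
pub struct Log {
    messages: Vec<Vec<u8>>,
}

/// Hypothetical DB: one lock for the name map, one lock per Log.
#[derive(Default)]
pub struct DB {
    logs: RwLock<HashMap<String, Arc<RwLock<Log>>>>,
}

impl DB {
    /// Returns a handle to the named Log, creating it if it doesn't exist.
    /// The map lock is released before the caller touches the Log itself.
    pub fn log(&self, name: &str) -> Arc<RwLock<Log>> {
        let existing = self.logs.read().unwrap().get(name).cloned();
        if let Some(log) = existing {
            return log;
        }
        let mut logs = self.logs.write().unwrap();
        logs.entry(name.to_string()).or_default().clone()
    }

    /// Appends under the Log's own write lock; other Logs are unaffected.
    pub fn append(&self, name: &str, payload: Vec<u8>) -> u64 {
        let log = self.log(name);
        let mut guard = log.write().unwrap();
        guard.messages.push(payload);
        (guard.messages.len() - 1) as u64
    }

    /// Counts messages under a read lock, so concurrent readers don't block
    /// each other. Note: as written, this creates the Log if it is missing.
    pub fn message_count(&self, name: &str) -> usize {
        let log = self.log(name);
        let count = log.read().unwrap().messages.len();
        count
    }
}
```

With this layout, `main()` could hold an `Arc<DB>` without wrapping it in a `Mutex`, since every method takes `&self`.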

If you are familiar with Rust and aren't scared of the borrow checker, this would be a great first issue with a really large impact. If unfamiliar with Rust, it's still a good first issue but realize you're going to get very familiar with the borrow checker.

Major overhaul of project structure

I'm coming back to the project and the vision for it is still the same, but I'm going to overhaul the structure to better support rigorous testing and modularity. I may also pull in pipelang as a crate within the workspace.

Along with this I'm moving to the most recent Rust stable and updating all dependencies. Previously we were using nightly but I think all nightly features we used have been stabilized at this point.

I don't think anyone is actively working on Remits right now, though there are some forks, so I wanted to make an issue just in case and let people know that over the next week or so I'm going to be rebasing and changing things freely on master. At a certain point I'll close this issue so people know it's safe to rebase/repull without having to worry about reconciling with the main branch.
