badtuple / remits
Remote Iterator Server
Right now, adding a message just results in either an error response or an "ok" response. It'd be useful to return the message id to the caller as well.
As we add persistence to the db, we can't just turn off the server and turn it back on anymore. Instead, we'll need to delete the logs and iterators we create so we can start from a clean slate.
Initially, we had the idea for normal Iterators that operated at query time and an optional "Indexed" Iterator that would persist the query to disk so that it could be re-queried without redoing the work.
The more I think about it, the more I'm of the opinion that Indexed Iterators sound a little too much like Logs. What if instead, we could define an Iterator, but promote it to a Log? Then we could have a single concept for a "disk persisted stream".
There's at least one serious question that arises when you go down this route:
Right now, Log == Persistence && Iterator == Query is a pretty simple idea in my head. But at the same time, a Log that extends based on another Log seems to violate the initial idea...at that point an Indexed Iterator just seems conceptually more correct.
Just opening the issue for discussion.
When writing Iterators, we need to define the map, reduce, and filter functions using a scripting language. It should be a language we can embed and interpret from inside Remits.
In an ideal world, Lua would be the best language imo. It's a very simple but powerful language, is well known, and has a lot of precedent for this exact use case. However, my understanding is that there are no good Lua bindings for Rust. There are plenty of bindings, but in the past I've read that due to how Lua was built, certain memory actions are by their nature unsafe, and it's not really something someone can work around without rewriting Lua natively. This is just something I read somewhere at some point though, so perhaps it's not actually an issue.
We should:
Whatever languages we choose, we'll have to provide standard functions for serialization and deserialization, among other things. Remits' use case seems particularly suited to JIT-ing, since you have a few (or more likely just one) very hot functions that get called on every iteration. While a JIT isn't necessary imo, it seems like it'd be a massive plus for a language if it's supported.
We need a CLI client that can connect to a running Remits instance and execute commands.
This will help:
I would envision it very much like the Redis cli client. A very barebones line-based repl that takes commands and sends them to Remits, and simply displays the response.
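A minimal sketch of such a line-based loop, in Rust. Everything here is an assumption for illustration: the `repl` function, the `quit` command, and the `send` closure (which stands in for whatever transport actually talks to Remits) are hypothetical names, not part of the project.

```rust
use std::io::{self, BufRead, Write};

/// Reads commands line by line, hands each one to `send`, and prints the reply.
/// `send` is a placeholder for the real connection to a Remits instance.
pub fn repl<R, W, F>(input: R, mut output: W, mut send: F) -> io::Result<()>
where
    R: BufRead,
    W: Write,
    F: FnMut(&str) -> io::Result<String>,
{
    for line in input.lines() {
        let line = line?;
        let command = line.trim();
        if command.is_empty() {
            continue; // skip blank lines
        }
        if command == "quit" {
            break; // hypothetical exit command
        }
        let response = send(command)?;
        writeln!(output, "{}", response)?;
    }
    Ok(())
}
```

Taking the input, output, and transport as parameters keeps the loop testable without a running server; a real client would pass locked stdin, stdout, and a closure wrapping a socket.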
So it turns out, we don't return the message id or ingestion timestamp with the message 🤦. That's pretty necessary to be able to take the last message id and get the next batch.
To do that, we'll need to wrap each message in an object containing its ID. I can't think of any other metadata to hang there, but the wrapper would be available if we need a place for it.
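One way the wrapper could look, as a rough Rust sketch. The `Envelope` type, its field names, the `u64` id, and the `next_batch` helper are all hypothetical; the only thing taken from the discussion above is that each message carries its id so a client can ask for "everything after the last id I saw".

```rust
/// Hypothetical wrapper returned for each stored message.
#[derive(Debug, Clone, PartialEq)]
pub struct Envelope {
    /// Id of the message within its Log (assumed here to be an increasing integer).
    pub id: u64,
    /// Raw message payload, exactly as it was added.
    pub body: Vec<u8>,
}

/// Returns up to `limit` messages whose id is greater than `last_id`.
/// Passing `None` starts from the beginning of the log.
pub fn next_batch(log: &[Envelope], last_id: Option<u64>, limit: usize) -> Vec<Envelope> {
    log.iter()
        .filter(|m| last_id.map_or(true, |last| m.id > last))
        .take(limit)
        .cloned()
        .collect()
}
```

A client would feed the id of the last envelope in one batch back in as `last_id` for the next call.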
"Iterator List" and "Log List" respond in different ways right now.
Log List returns a single message. Within that message is a CBOR encoded array that contains the names of each Log.
Iterator List returns 1 message per Iterator. Each of those messages is a string representing the name of an Iterator.
For consistency, we should align on returning a single message, and convert Iterator List to work like Log List does.
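To make the "single message containing a CBOR array of names" shape concrete, here is a small hand-rolled Rust sketch of that encoding. It is only an illustration: `cbor_head` and `encode_name_list` are hypothetical helpers, and a real implementation would presumably go through whichever CBOR library the project already uses rather than writing bytes by hand.

```rust
/// Writes a CBOR item header: the major type plus its length argument.
fn cbor_head(major: u8, len: u64, out: &mut Vec<u8>) {
    let m = major << 5;
    if len < 24 {
        out.push(m | len as u8); // length fits in the initial byte
    } else if len <= u8::MAX as u64 {
        out.push(m | 24);
        out.push(len as u8);
    } else if len <= u16::MAX as u64 {
        out.push(m | 25);
        out.extend_from_slice(&(len as u16).to_be_bytes());
    } else if len <= u32::MAX as u64 {
        out.push(m | 26);
        out.extend_from_slice(&(len as u32).to_be_bytes());
    } else {
        out.push(m | 27);
        out.extend_from_slice(&len.to_be_bytes());
    }
}

/// Encodes a list of Log or Iterator names as one CBOR array of text strings.
pub fn encode_name_list(names: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    cbor_head(4, names.len() as u64, &mut out); // major type 4: array
    for name in names {
        cbor_head(3, name.len() as u64, &mut out); // major type 3: text string
        out.extend_from_slice(name.as_bytes());
    }
    out
}
```

With this shape, both list commands would produce exactly one message, and an empty list is still a valid (zero-length) array instead of zero messages.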
Right now our request format is very well defined. As specified in the design doc, it uses framed requests with a 32-bit length header followed by a command.
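For reference, length-prefixed framing of this kind can be sketched in Rust as below. The function names are hypothetical, and the byte order is an assumption (big-endian is used here); the design doc is the authority on the actual header layout.

```rust
use std::io::{self, Read, Write};

/// Writes one frame: a 32-bit length header (big-endian assumed) and the payload.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "payload does not fit in a 32-bit length header",
        ));
    }
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)
}

/// Reads one frame: the 4-byte header first, then exactly that many payload bytes.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let len = u32::from_be_bytes(header) as usize;
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

A production reader would likely also cap `len` before allocating, so a malformed header cannot request a multi-gigabyte buffer.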
Responses, however, are super ad hoc. Sometimes we return data, sometimes errors, and sometimes messages that just say whether something succeeded.
@volgorean has expressed interest in taking a pass at implementing a client, but before he's able to do that we have to define what a response should look like. This includes how we specify whether something is data, an error, or info, and how we encode it on the line.
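As a starting point for that discussion, one possible shape is a tagged response: a single leading byte says whether the rest is info, data, or an error. The Rust sketch below is purely a strawman; the `Response` enum, the tag values, and the UTF-8 text bodies are assumptions and not anything the project has decided.

```rust
/// Strawman response type with the three kinds mentioned above.
#[derive(Debug, Clone, PartialEq)]
pub enum Response {
    Info(String),
    Data(Vec<u8>),
    Error(String),
}

impl Response {
    /// Encodes as one tag byte followed by the body (tag values are hypothetical).
    pub fn encode(&self) -> Vec<u8> {
        let (tag, body): (u8, &[u8]) = match self {
            Response::Info(s) => (0x00, s.as_bytes()),
            Response::Data(b) => (0x01, b.as_slice()),
            Response::Error(s) => (0x02, s.as_bytes()),
        };
        let mut out = Vec::with_capacity(1 + body.len());
        out.push(tag);
        out.extend_from_slice(body);
        out
    }

    /// Decodes a tagged response; returns None for an empty buffer,
    /// an unknown tag, or a text body that is not valid UTF-8.
    pub fn decode(bytes: &[u8]) -> Option<Response> {
        let (&tag, body) = bytes.split_first()?;
        match tag {
            0x00 => String::from_utf8(body.to_vec()).ok().map(Response::Info),
            0x01 => Some(Response::Data(body.to_vec())),
            0x02 => String::from_utf8(body.to_vec()).ok().map(Response::Error),
            _ => None,
        }
    }
}
```

The encoded bytes could then be wrapped in the same length-prefixed frame that requests use, so a client needs only one framing routine.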
Opening an issue since it'll likely take some thought and it's good to track it.
Right now, there's a single Mutex wrapped around the entire db:
// from main()
let db = Arc::new(Mutex::new(db::DB::new()));
This was mainly for simplicity while bootstrapping. Since adding a Message to a Log is the only place where mutation happens, we should instead have an RwLock on the Log itself. That would allow us to have:
Once the persistence layer is in place, this will likely be our main non-IO bottleneck. It's not an issue yet and there's way more to do first, but I wanted to make sure to log the issue so we can come back to it.
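A rough sketch of what per-Log locking could look like, using only `std`. The `DB` and `Log` types below are simplified stand-ins, not the project's real structs, and their fields and methods are assumptions. The outer lock protects only the map of names; each Log has its own `RwLock`, so writers to different Logs do not block each other and readers of the same Log can run concurrently.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Simplified stand-in for a Log: just an in-memory list of messages.
#[derive(Default)]
pub struct Log {
    messages: Vec<Vec<u8>>,
}

/// Simplified stand-in for the db: a map of names to individually locked Logs.
#[derive(Default)]
pub struct DB {
    logs: RwLock<HashMap<String, Arc<RwLock<Log>>>>,
}

impl DB {
    /// Returns a handle to the named Log, creating it if it does not exist yet.
    fn log(&self, name: &str) -> Arc<RwLock<Log>> {
        // Fast path: shared read lock on the map only.
        let existing = self.logs.read().unwrap().get(name).cloned();
        if let Some(log) = existing {
            return log;
        }
        // Slow path: exclusive lock on the map, held only while inserting.
        let mut logs = self.logs.write().unwrap();
        logs.entry(name.to_string()).or_default().clone()
    }

    /// Appends a message and returns its index within the Log.
    pub fn add_message(&self, name: &str, body: Vec<u8>) -> u64 {
        let log = self.log(name);
        let mut guard = log.write().unwrap(); // locks this one Log only
        guard.messages.push(body);
        (guard.messages.len() - 1) as u64
    }

    /// Counts messages under a shared read lock on the Log.
    pub fn message_count(&self, name: &str) -> usize {
        let log = self.log(name);
        let guard = log.read().unwrap();
        guard.messages.len()
    }
}
```

With this layout `main()` could share a plain `Arc<DB>` instead of `Arc<Mutex<DB>>`, since all methods take `&self`.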
If you are familiar with Rust and aren't scared of the borrow checker, this would be a great first issue with a really large impact. If unfamiliar with Rust, it's still a good first issue but realize you're going to get very familiar with the borrow checker.
I'm coming back to the project and the vision for it is still the same, but I'm going to overhaul the structure to better support rigorous testing and modularity. I may also pull in pipelang as a crate within the workspace.
Along with this I'm moving to the most recent Rust stable and updating all dependencies. Previously we were using nightly but I think all nightly features we used have been stabilized at this point.
I don't think anyone is actively working on Remits right now, though there are some forks, so I wanted to make an issue just in case and let people know that over the next week or so I'm going to be rebasing and changing things freely on master. At a certain point I'll close this issue so people know it's safe to rebase/repull without having to worry about reconciling with the main branch.