Comments (15)
@alexeyzimarev - this sounds very interesting.
I wondered if the following case would be handled by this.
We were running EventStore 21.6 and one of our projections started writing out every single state change as an event (the projection was partitioned).
So the stream created for a specific partition was:
$projections-projectionStreamManager-StoreProduct-027c3bace2d1188bf5a2c3c5f193c49f-result
And this contained the state every time it changed.
Our EventStore is flooded with this (gigabytes of space, unfortunately).
We were going to write a tool to try to reclaim the space, but maybe this gives us an alternative.
Btw, upgrading to 21.10 fixed the issue.
from eventuous.
The archive is more for closing business entities that aren't going to change anymore, like when an order is fulfilled and you might not need to keep it. Restore in this case would bring the stream back, so the original entity can be mutated with new events, and read models can re-project the stream and make it available for queries.
from eventuous.
Ok, the design proposal is:
- Archiving could be done by moving events to the archive store
- Events can be copied all the time (with Elastic connector, for example), or it is an explicit action
- Implicit archive: stream max-age is set to one year, and old events get scavenged. When needed, old events can be retrieved from the archive
- Explicit archive: when the stream reaches a certain size, events get archived, and the stream is truncated
- When reading the stream to handle a command, we try loading events from the primary store. If the stream is missing or incomplete (the first event number is higher than zero), the missing events are fetched from the archive
- Snapshot is an event
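The read-path fallback in the proposal can be sketched as follows. This is a minimal in-memory sketch, not the Eventuous API; `read_stream` and the store shapes are hypothetical:

```python
# Minimal sketch of the proposed composite read: try the hot (primary) store
# first, and if the stream is missing or truncated (first event version > 0),
# fill the gap from the archive store. All names here are hypothetical.

def read_stream(primary: dict, archive: dict, stream: str) -> list:
    """Return the full event list for a stream, falling back to the archive."""
    hot = primary.get(stream, [])
    if hot and hot[0]["version"] == 0:
        return hot  # the hot store has the complete stream
    # Stream is missing or truncated: fetch the missing prefix from the archive.
    first_hot_version = hot[0]["version"] if hot else None
    archived = archive.get(stream, [])
    prefix = [e for e in archived
              if first_hot_version is None or e["version"] < first_hot_version]
    return prefix + hot


# Example: the hot store was scavenged, so it only holds versions 2..3.
primary = {"Order-1": [{"version": 2, "type": "Paid"},
                       {"version": 3, "type": "Shipped"}]}
archive = {"Order-1": [{"version": 0, "type": "Created"},
                       {"version": 1, "type": "ItemAdded"},
                       {"version": 2, "type": "Paid"},
                       {"version": 3, "type": "Shipped"}]}

events = read_stream(primary, archive, "Order-1")
assert [e["version"] for e in events] == [0, 1, 2, 3]
```

The command path never notices the archive: it always sees the full stream, whichever store the events came from.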
from eventuous.
Regarding handling commands on archived streams, I foresee a problem with projected read models.
Given a stream is archived, and a read model is rehydrated afterwards, it will not receive the archived events. When a command is handled (based on archived events), the new event(s) would be the first events of that aggregate that said projection would receive.
My suggestion is to not promote such a feature by providing it in the library. People can build it themselves should they need to. It could be helpful to have a specific exception thrown by command handling if/when a command is handled for an archived stream (I imagine the stream would only contain a "stream archived" event at that time, previous events being truncated/scavenged).
A compromise could be to, by configuration, allow the command handler to "un-archive" a stream automatically, writing old events back to the primary store as part of the command-handling transaction.
from eventuous.
Yes, we were discussing this as well :) Replays will be highly problematic.
However, a colleague made this point: you introduced a new feature, so you can hardly expect all 15 years of data to be present in this new feature. Say you want to show the number of bookings cancelled over time. I wouldn't expect to see the total number of cancellations going back 10 years; in reality, I only want to know this for the last couple of months. The archiving strategy needs to be tuned with those requirements in mind.
Plus, we don't really want to have a StreamArchived event. The idea is to have a composite event store, which loads the stream from the operational store and checks the first event version. If the version is higher than zero, it attempts to get the remaining events from the archive store. So, for executing commands it will be fully transparent; you'd need to be aware of it and prepare for it in projections, though.
The archiving process can be done mechanically. For example (that's what we plan), the Elastic connector replicates everything to the archive store. Not only do you get all the analytics in Kibana, you also get an archive with tiered storage, and it's very cheap. "Archiving" as such happens by setting the max age of the stream in ESDB. So, when you unexpectedly need to execute a transaction on a very old stream, you get all the events from both stores.
from eventuous.
Un-archiving streams would be undesirable, as it would mess up the versioning and create additional concerns for projections and for replication of the events to the archive store (it must be fully idempotent, and we must use the original event ids).
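The idempotency point can be illustrated with a sketch: if the replicator keys archived events by their original event id, re-running the same copy is a no-op. The shapes and names below are hypothetical, not the connector's real API:

```python
# Sketch of idempotent replication to the archive: events are stored keyed by
# their ORIGINAL event id, so replaying the same batch never duplicates
# anything. The store shape is a hypothetical stand-in.

def replicate(archive: dict, stream: str, events: list) -> int:
    """Copy events into the archive; return how many were actually new."""
    bucket = archive.setdefault(stream, {})
    written = 0
    for e in events:
        if e["id"] not in bucket:   # original event id makes the copy idempotent
            bucket[e["id"]] = e
            written += 1
    return written


batch = [{"id": "a1", "version": 0, "type": "Created"},
         {"id": "a2", "version": 1, "type": "Paid"}]
archive = {}
first_run = replicate(archive, "Order-1", batch)    # copies both events
second_run = replicate(archive, "Order-1", batch)   # replay is a no-op
assert (first_run, second_run) == (2, 0)
```

If the replicator generated fresh ids on each copy, a connector restart would silently duplicate the archive, which is why the original ids matter.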
from eventuous.
That's why I want dev calls
from eventuous.
Yea, having a discussion asynchronously like this is never optimal 😌
You make good points (no surprise). I'm just concerned about the "You'd need to be aware of this and be prepared for it in projections accordingly" part. At the very least, this caveat should be prominently mentioned in the documentation around archiving. I'd be curious to see examples of how to prepare for it: would your projections need access to the archive as well? To me it seems they would have to, unless you can simply ignore a new event on an otherwise archived stream.
The scenario I'm circling around is "users invoking commands for an archived aggregate" - maybe I'm overthinking it...
The user would probably have a view from a projected read model from before the stream was archived, and would assume that view to be updated with the changes they made - which would work. The day after, though, that order (or booking, or whatever) would be gone, because the projection was replayed with only the change from the day before, and the entity would therefore not be viewable (or at least look very wrong).
If we allow users to handle commands on archived streams, shouldn't we expect they could make several, over days or weeks, even though the stream has been archived for years?
To me it seems like one of those situations where it's better to say "you can't, but you can easily make a copy of the old one and make your changes to that" - so not un-archiving, more like superseding.
I'm always open for a call if you'd like to chat
from eventuous.
> The user would probably have a view from a projected read model from before the stream was archived, and would assume that view to be updated with the changes they made - which would work.
Correct, that's the idea
> The day after, though, that order (or booking, or whatever) would be gone, because the projection was replayed with only the change from the day before, and would therefore not be viewable (or at least look very wrong).
If we talk about the same aggregate, we have the following:
- There's a read model built before from archived events, so it's correct
- We get a new command
- We read the archived stream to get the aggregate state
- The new event gets appended to the stream in the hot store
- This new event gets projected to all the read models
- It will also stay in the hot store until it gets truncated based on the stream TTL or size
There's no issue here.
We do still have an issue with re-projecting everything. But then again, think about it: was replaying everything ever a good idea? Say we have a projection "pending arrivals". It always looks into the future, so replaying all the history would just create a lot of useless operations on the reporting database: we add a record to "pending arrivals" when the booking is paid and remove it after the guest checks in. Projecting the whole history here would mean doing a lot of appends and deletions for no reason at all, as none of those historical arrivals is relevant; they happened long in the past. If we only project from what's in the hot store, we don't project the whole history, just the relevant part of it.
What we discussed is that the data in the hot store could give enough events to build new read models with relevant information about what recently happened. It depends on the use case, of course. It makes me think that the starting point of a new projection needs some flexibility, as right now it's always from the beginning of time.
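The flexible starting point could look something like this. A minimal sketch with hypothetical names, assuming each event carries its global position:

```python
# Sketch of a projection that starts from a configurable position instead of
# the beginning of time. Names and shapes are hypothetical stand-ins.

def project(events: list, handler, start_position: int = 0) -> None:
    """Feed only events at or after start_position to the handler."""
    for e in events:
        if e["position"] >= start_position:
            handler(e)


log = [{"position": p, "type": t}
       for p, t in enumerate(["Paid", "CheckedIn", "Paid", "Paid", "Cancelled"])]

seen = []
# Start the new read model from position 3, skipping irrelevant history.
project(log, seen.append, start_position=3)
assert [e["position"] for e in seen] == [3, 4]
```

For a "pending arrivals"-style read model, starting from a recent position avoids all the pointless append/delete churn that a full replay would cause.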
On the other hand, if all the events in the archive store are produced by a connector, they look 100% identical to the original events, including the commit position, etc. So I don't think it's impossible to have a subscription there: a catch-up subscription without the realtime functionality.
from eventuous.
As I said "I'm probably overthinking it"
What trips me up is: if you can execute one command on an archived aggregate, why not several - with maybe months in between, and read model rehydrations in the meantime?
That temporal coupling seems unnerving.
But I probably just don't have a good enough idea of a relevant use case to think it through.
The ability to archive streams is definitely a good and important feature, and it should be modelled explicitly how it fits into the respective application.
from eventuous.
The use case is simple. We have 15 years of data collected in SQL, and now we are migrating to an event-sourced system. About 1.6 billion streams will be initiated by an ImportedFromLegacy event. Our users, however, only make mutations to streams that are just a couple of weeks old (that's the usual behaviour almost everywhere), maybe two months. We also have a cap on aggregate mutability, which is set to two years from the date the aggregate is created.
So, we decided not to move all the events through ESDB, as they would just be sitting there doing nothing, occupying space and making the database too large. We will project the legacy data to the read model with a one-time importer, and at the same time produce those events, but directly to the archive. That gives us the first scenario: if the stream is not found, look in the archive. A mutation of an archived aggregate would then follow the normal flow:
- rehydrate from the archive
- apply new events
- new events go to the regular store
- new events get copied to the archive using the connector
- new events get projected to all the read models as normal
- eventually, those events will get scavenged, and disappear from the main store
The original issue was about something else, which is still planned: a deliberate action to archive a stream. But it's very similar, as the first step would be to support reading events from the archive seamlessly. The writing part would for now be connector-based (Elastic). Other types of archive store can be added later, and then the explicit archive will need to be implemented (cloud buckets, etc.).
from eventuous.
"rehydrate from the archive" solves my concern, as you could do this if you needed to rehydrate while having "unfinished business" to do on an otherwise archived stream (the commands that triggered me).
Thanks for elaborating
from eventuous.
Right, I might have not explained the idea well enough.
I already built a store for Elastic, which can read and write (not sure about the writing part though).
The plan is now to split readers and writers, then make an archiving store using a composition of two readers and a single writer.
from eventuous.
Merged #80; it has the aggregate store with archive fallback and a sample implementation with Elastic.
from eventuous.
Releasing it as 0.7.0
from eventuous.