Comments (15)

StevenBlair123 commented on May 26, 2024

@alexeyzimarev - this sounds very interesting.
I wondered if the following case would be handled by this.

We were running EventStore 21.6, and one of our projections started writing out every single state change as an event (the projection was partitioned).

So the stream created for a specific partition was:

$projections-projectionStreamManager-StoreProduct-027c3bace2d1188bf5a2c3c5f193c49f-result

And this contained the state every time it changed.

Our EventStore is flooded with this (gigabytes of space, unfortunately).

We were going to write a tool to try to reclaim this, but maybe this gives us an alternative.

Btw, upgrading to 21.10 fixed the issue.

alexeyzimarev commented on May 26, 2024

The archive is more for closing business entities that aren't going to change anymore. Like when an order is fulfilled, you might not need to keep it anymore. Restore in this case would bring the stream back so the original entity can be mutated with new events; read models can also re-project the stream and make it available for queries.

alexeyzimarev commented on May 26, 2024

Ok, the design proposal is:

  • Archiving could be done by moving events to the archive store
  • Events can be copied continuously (with the Elastic connector, for example), or archiving is an explicit action
    • Implicit archive: the stream max age is set to, say, one year, and old events get scavenged. When needed, old events can be retrieved from the archive
    • Explicit archive: when the stream reaches a certain size, events get archived, and the stream is truncated
  • When reading a stream to handle a command, we try loading events from the primary store. If the stream is missing, or is incomplete (the first event number is greater than zero), the missing events are fetched from the archive (see the sketch after this list)
  • Snapshot is an event
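
A minimal sketch of that fallback read, assuming hypothetical IEventReader and StreamEvent abstractions (the names and signatures are illustrative stand-ins, not the actual Eventuous API):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public record StreamEvent(long Version, object Payload);

public class StreamNotFoundException : Exception { }

public interface IEventReader {
    Task<StreamEvent[]> ReadStream(string streamName, CancellationToken ct);
}

// Composite reader: the hot store is authoritative for recent events,
// the archive fills in whatever has been scavenged.
public class TieredEventReader : IEventReader {
    readonly IEventReader _hot;      // operational store, e.g. EventStoreDB
    readonly IEventReader _archive;  // archive store, e.g. Elasticsearch

    public TieredEventReader(IEventReader hot, IEventReader archive)
        => (_hot, _archive) = (hot, archive);

    public async Task<StreamEvent[]> ReadStream(string streamName, CancellationToken ct) {
        StreamEvent[] hot;
        try {
            hot = await _hot.ReadStream(streamName, ct);
        }
        catch (StreamNotFoundException) {
            // The stream was fully scavenged from (or never present in)
            // the hot store: read the archive only.
            return await _archive.ReadStream(streamName, ct);
        }

        // A first event version above zero means the head was truncated:
        // fetch the missing prefix from the archive and stitch both together.
        if (hot.Length > 0 && hot[0].Version > 0) {
            var archived = await _archive.ReadStream(streamName, ct);
            return archived.Where(e => e.Version < hot[0].Version).Concat(hot).ToArray();
        }

        return hot;
    }
}
```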

JulianMay commented on May 26, 2024

Regarding handling commands on archived streams, I foresee a problem with projected read models.
Given a stream is archived, and a read model is rehydrated afterwards, it will not receive archived events. When a command is handled (based on archived events), the new event(s) would be the first events of that aggregate that said projection would receive.

My suggestion is to not promote such a feature by providing it in the library. People can build it themselves should they need to. It could be helpful to have a specific exception thrown by command handling if/when a command is handled for an archived stream (I imagine the stream would only contain a "stream archived" event at that time, previous events being truncated/scavenged).

A compromise could be to, by configuration, allow the command handler to "un-archive" a stream automatically, writing old events back to the primary store as part of the command-handling transaction.

alexeyzimarev commented on May 26, 2024

Yes, we were discussing this as well :) Replays will be highly problematic.

However, a colleague made this point: you introduced a new feature, so you can hardly expect all your 15 years of data to be present in this new feature. Say you want to show the number of bookings cancelled over time. I wouldn't expect to see the total number of cancellations going 10 years back; in reality, I only want to know this for the last couple of months. The archiving strategy needs to be tuned with those requirements in mind.

Plus, we don't really want to have a StreamArchived event. The idea is to have a composite event store, which will load the stream from the operational store and check the first event version. If the version is higher than zero, it will attempt to get the remaining set of events from the archive store. So, for executing commands it will be fully transparent; in projections, you'd need to be aware of this and be prepared for it. The archiving process can be done mechanically. For example (that's what we plan), the Elastic connector replicates everything to the archive store. Not only do you get all the analytics in Kibana, you also get an archive with tiered storage, and it's very cheap. "Archiving" as such happens by setting the max age of the stream in ESDB. So, when you happen to need to execute a transaction on a very old stream, you get all the events from both stores.
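
For reference, setting the stream max age with the EventStoreDB gRPC client looks roughly like this (connection string, stream name, and the one-year TTL are example values):

```csharp
using System;
using EventStore.Client;

// Implicit archiving: events older than a year become eligible for scavenging
// in ESDB, while the connector-produced archive copy keeps the full history.
var settings = EventStoreClientSettings.Create("esdb://localhost:2113?tls=false");
await using var client = new EventStoreClient(settings);

await client.SetStreamMetadataAsync(
    "Booking-1234",                                     // example stream name
    StreamState.Any,
    new StreamMetadata(maxAge: TimeSpan.FromDays(365))  // one-year TTL
);
```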

alexeyzimarev commented on May 26, 2024

Un-archiving streams would be undesirable, as it would mess up the versioning and create additional concerns for projections and for replication of the events to the archive store (replication must be fully idempotent, and we must use the original event ids; see the sketch below).
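
To make the idempotency point concrete: if archive writes are keyed by the original event id, replaying the replication is a harmless overwrite rather than a duplication. A hedged sketch with made-up types (with Elasticsearch, the key would map to the document _id):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public record ArchivedEvent(Guid EventId, string Stream, long Version, string Payload);

public interface IArchiveWriter {
    // Upsert keyed by the original event id: writing the same event twice
    // is a no-op, which makes connector-based replication safe to replay.
    Task Upsert(string key, ArchivedEvent evt, CancellationToken ct);
}

public class IdempotentArchiver {
    readonly IArchiveWriter _writer;

    public IdempotentArchiver(IArchiveWriter writer) => _writer = writer;

    public Task Archive(ArchivedEvent evt, CancellationToken ct)
        // In Elasticsearch terms: index with _id = EventId, so retries
        // overwrite the same document instead of creating a new one.
        => _writer.Upsert(evt.EventId.ToString("N"), evt, ct);
}
```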

alexeyzimarev commented on May 26, 2024

That's why I want dev calls 😅. We were discussing those issues just two hours ago...

JulianMay commented on May 26, 2024

Yea, having a discussion asynchronously like this is never optimal 😌
You make good points (no surprise); I'm just concerned about the "you'd need to be aware of this and be prepared for it in projections" part. At the very least, this caveat should be prominently mentioned in the documentation around archiving. I would be curious to see examples of how to prepare for it: would your projections need access to the archive as well? It seems to me they would have to, unless you can simply ignore a new event on an otherwise archived stream.

The scenario I'm circling around is "users invoking commands for an archived aggregate" - maybe I'm overthinking it...
The user would probably have a view from a projected read model from before the stream was archived, and would assume that view to be updated with the changes they made - which would work. The day after, though, that (order, booking, whatever) would be gone, because the projection was replayed with only the change from the day before, making the view unavailable (or at least very wrong-looking).

If we allow users to handle commands on archived streams, shouldn't we expect they could issue several, over days or weeks, even though the stream has been archived for years?

To me it seems like one of those situations where it's better to say "you can't, but you can easily make a copy of the old one and make your changes to that" - so not un-archiving, more like superseding.

I'm always open for a call if you'd like to chat 🤙

alexeyzimarev commented on May 26, 2024

"The user would probably have a view from a projected read model from before the stream was archived, and would assume that view to be updated with the changes they made - which would work."

Correct, that's the idea.

"The day after, though, that (order, booking, whatever) would be gone, because the projection was replayed with only the change from the day before, making the view unavailable (or at least very wrong-looking)."

If we talk about the same aggregate, we have the following:

  • There's a read model built earlier from the archived events, so it's correct
  • We get a new command
  • We read the archived stream to get the aggregate state
  • The new event gets appended to the stream in the hot store
  • This new event gets projected to all the read models
  • It will also stay in the hot store until it gets truncated, based on the stream TTL or size

There's no issue here.

We do still have an issue with re-projecting everything. But then again, just think about it: was replaying everything ever a good idea? Say we have a projection, "pending arrivals". It always looks into the future, so replaying all the history would just create a lot of useless ops on the reporting database: we add a record to "pending arrivals" when the booking is paid, and remove it after the guest checks in. Projecting the whole history here would mean doing a lot of appends and deletions for no reason at all, as none of those historical arrivals is relevant; they happened long in the past. If we just project from what's in the hot store, we don't project the whole history, but just the relevant part of it (sketched below).
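
The "pending arrivals" shape, sketched with hypothetical event and read-model types: a record only exists between payment and check-in, so replaying old history inserts and deletes rows that cancel out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical events for the example.
public record BookingPaid(string BookingId, DateTime ArrivalDate);
public record GuestCheckedIn(string BookingId);

// Forward-looking projection: replaying the full history would add and then
// remove every historical arrival, churning the reporting database for nothing.
public class PendingArrivalsProjection {
    readonly Dictionary<string, DateTime> _pending = new();

    public void When(object evt) {
        switch (evt) {
            case BookingPaid paid:
                _pending[paid.BookingId] = paid.ArrivalDate; // add on payment
                break;
            case GuestCheckedIn checkedIn:
                _pending.Remove(checkedIn.BookingId);        // remove on check-in
                break;
        }
    }
}
```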

What we discussed is that the data in the hot store could give enough events to build new read models with relevant information about what recently happened. It depends on the use case, of course. It makes me think that the starting point of a new projection needs some flexibility, as right now it's from the beginning of time (see the sketch below).
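
At the subscription level, the EventStoreDB client already allows this: a catch-up subscription can start from a stored position instead of the start (the checkpoint value below is purely illustrative):

```csharp
using System;
using System.Threading.Tasks;
using EventStore.Client;

var settings = EventStoreClientSettings.Create("esdb://localhost:2113?tls=false");
await using var client = new EventStoreClient(settings);

// Start the projection from a checkpoint rather than the beginning of time.
var checkpoint = new Position(commitPosition: 1_000_000, preparePosition: 1_000_000);

await client.SubscribeToAllAsync(
    FromAll.After(checkpoint),
    eventAppeared: (subscription, resolved, ct) => {
        Console.WriteLine($"{resolved.Event.EventStreamId} #{resolved.Event.EventNumber}");
        return Task.CompletedTask;
    }
);
```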

On the other hand, if all the events in the archive store are produced by a connector, they look 100% identical to the original events, including the commit position, etc. So, I don't think it's impossible to have a subscription there: a catch-up subscription without the real-time functionality.

JulianMay commented on May 26, 2024

As I said "I'm probably overthinking it" 😌
What trips me up is: if you can handle one command on an archived aggregate, why not several - maybe with months in between, and read model rehydrations in the meantime?

That temporal coupling seems unnerving.
But I probably just don't have a good enough idea of a relevant use case to think it through.
The ability to archive streams is definitely a good and important feature, and how it fits into the respective application should be modelled.

alexeyzimarev commented on May 26, 2024

The use case is simple. We have 15 years of data collected in SQL, and now we are migrating to an event-sourced system. About 1.6 billion streams will be initiated by the ImportedFromLegacy event. Our users, however, only mutate streams that are just a couple of weeks old, maybe two months (that's the usual behaviour almost everywhere). We also have a cap on aggregate mutability, which is set to two years from the date the aggregate is created.

So, we decided not to move all the events through ESDB, as they would just be sitting there doing nothing, occupying space and making the database too large. We will project the legacy data to the read models with a one-time importer, at the same time as we produce those events; but we want to produce the events directly to the archive. That gives us the first scenario: if the stream is not found, look in the archive. Then, a mutation of an archived aggregate would follow the normal flow:

  • rehydrate from the archive
  • apply new events
  • new events go to the regular store
  • new events get copied to the archive using the connector
  • new events get projected to all the read models as normal
  • eventually, those events will get scavenged, and disappear from the main store

The original issue was about something else, which is still in the plans: a deliberate action to archive the stream. But it's very similar, as the first step would be to support reading events from the archive seamlessly. The writing part would, for now, be connector-based (Elastic). Other types of archive store can be added later, and then the explicit archive will need to be implemented (cloud buckets, etc.). A sketch of the truncation half of that follows.
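
For the explicit variant, the truncation half maps onto ESDB's truncate-before stream metadata. A hedged sketch, assuming events up to version 100 have already been confirmed in the archive:

```csharp
using EventStore.Client;

var settings = EventStoreClientSettings.Create("esdb://localhost:2113?tls=false");
await using var client = new EventStoreClient(settings);

// Explicit archive: events 0..99 are confirmed copied to the archive store,
// so mark them for removal from the hot store on the next scavenge.
await client.SetStreamMetadataAsync(
    "Order-42",                                                 // example stream
    StreamState.Any,
    new StreamMetadata(truncateBefore: new StreamPosition(100))
);
```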

JulianMay commented on May 26, 2024

"rehydrate from the archive" solves my concern, as you could do this if you needed to rehydrate while having "unfinished business" to do on an otherwise archived stream (the commands that triggered me).
Thanks for elaborating 👍

alexeyzimarev commented on May 26, 2024

Right, I might not have explained the idea well enough.

I already built a store for Elastic, which can read and write (not sure about the writing part though).

The plan now is to split readers and writers, then make an archiving store using a composition of two readers and a single writer (outlined below).
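
In outline (the interface names are illustrative; the fallback read logic itself is the earlier composite-reader sketch):

```csharp
// Illustrative shape of the split: the archiving store composes two readers
// for the fallback read, but only ever writes to the hot store; the connector
// is what copies new events to the archive.
public interface IEventReader { /* ReadStream(...) */ }
public interface IEventWriter { /* AppendEvents(...) */ }

public class ArchivingEventStore : IEventReader, IEventWriter {
    readonly IEventReader _hotReader;
    readonly IEventReader _archiveReader; // e.g. the Elastic-backed reader
    readonly IEventWriter _hotWriter;     // appends never target the archive directly

    public ArchivingEventStore(
        IEventReader hotReader,
        IEventReader archiveReader,
        IEventWriter hotWriter
    ) {
        _hotReader     = hotReader;
        _archiveReader = archiveReader;
        _hotWriter     = hotWriter;
    }
    // Reads: try _hotReader, fall back to _archiveReader for the missing prefix.
    // Writes: delegate to _hotWriter only.
}
```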

alexeyzimarev commented on May 26, 2024

Merged #80; it has the aggregate store with archive fallback, and a sample implementation with Elastic.

alexeyzimarev commented on May 26, 2024

Releasing it as 0.7.0
