Comments (15)

StevenBlair123 commented on May 26, 2024

@alexeyzimarev - this sounds very interesting.
I wondered if the following case would be handled by this.

We were running EventStore 21.6, and one of our projections started writing out every single state change as an event (the projection was partitioned).

So the stream created for a specific partition was:

$projections-projectionStreamManager-StoreProduct-027c3bace2d1188bf5a2c3c5f193c49f-result

And this contained the state every time it changed.

Our EventStore is flooded with this (gigabytes of space, unfortunately).

We were going to write a tool to try to reclaim this, but maybe this gives us an alternative.

Btw, upgrading to 21.10 fixed the issue.

alexeyzimarev commented on May 26, 2024

The archive is more for closing business entities that aren't going to change anymore. Like when an order is fulfilled, you might not need to keep it anymore. Restore in this case would bring the stream back so the original entity can be mutated with new events; read models can also re-project the stream and make it available for queries.

alexeyzimarev commented on May 26, 2024

Ok, the design proposal is:

  • Archiving could be done by moving events to the archive store
  • Events can be copied continuously (with the Elastic connector, for example), or archiving is an explicit action
    • Implicit archive: the stream max age is set to, say, one year, and old events get scavenged. When needed, old events can be retrieved from the archive
    • Explicit archive: when the stream reaches a certain size, events get archived, and the stream is truncated
  • When reading a stream to handle a command, we try loading events from the primary store. If the stream is missing, or is incomplete (the first event number is greater than zero), the missing events are fetched from the archive (see the sketch after this list)
  • Snapshot is an event
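
A minimal sketch of that fallback read, assuming hypothetical IEventReader and StreamEvent abstractions (the names and signatures are illustrative stand-ins, not the actual Eventuous API):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public record StreamEvent(long Version, object Payload);

public class StreamNotFoundException : Exception { }

public interface IEventReader {
    Task<StreamEvent[]> ReadStream(string streamName, CancellationToken ct);
}

// Composite reader: the hot store is authoritative for recent events,
// the archive fills in whatever has been scavenged.
public class TieredEventReader : IEventReader {
    readonly IEventReader _hot;      // operational store, e.g. EventStoreDB
    readonly IEventReader _archive;  // archive store, e.g. Elasticsearch

    public TieredEventReader(IEventReader hot, IEventReader archive)
        => (_hot, _archive) = (hot, archive);

    public async Task<StreamEvent[]> ReadStream(string streamName, CancellationToken ct) {
        StreamEvent[] hot;
        try {
            hot = await _hot.ReadStream(streamName, ct);
        }
        catch (StreamNotFoundException) {
            // The stream was fully scavenged from (or never present in)
            // the hot store: read the archive only.
            return await _archive.ReadStream(streamName, ct);
        }

        // A first event version above zero means the head was truncated:
        // fetch the missing prefix from the archive and stitch both together.
        if (hot.Length > 0 && hot[0].Version > 0) {
            var archived = await _archive.ReadStream(streamName, ct);
            return archived.Where(e => e.Version < hot[0].Version).Concat(hot).ToArray();
        }

        return hot;
    }
}
```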

JulianMay commented on May 26, 2024

Regarding handling commands on archived streams, I foresee a problem with projected read models.
Given a stream is archived, and a read model is rehydrated afterwards, it will not receive archived events. When a command is handled (based on archived events), the new event(s) would be the first events of that aggregate that said projection would receive.

My suggestion is to not promote such a feature by providing it in the library. People can build it themselves should they need to. It could be helpful to have a specific exception thrown by command handling if/when a command is handled for an archived stream (I imagine the stream would only contain a "stream archived" event at that time, previous events being truncated/scavenged).

A compromise could be to, by configuration, allow the command handler to "un-archive" a stream automatically, writing old events back to the primary store as part of the command-handling transaction.

alexeyzimarev commented on May 26, 2024

Yes, we were discussing this as well :) Replays will be highly problematic.

However, a colleague made this point: you introduced a new feature, so you can hardly expect all your 15 years of data to be present in this new feature. Say you want to show the number of bookings cancelled over time. I wouldn't expect to see the total number of cancellations going 10 years back; in reality, I only want to know this for the last couple of months. The archiving strategy needs to be tuned with those requirements in mind.

Plus, we don't really want to have a StreamArchived event. The idea is to have a composite event store, which will load the stream from the operational store and check the first event version. If the version is higher than zero, it will attempt to get the remaining set of events from the archive store. So, for executing commands it will be fully transparent; in projections, you'd need to be aware of this and be prepared for it. The archiving process can be done mechanically. For example (that's what we plan), the Elastic connector replicates everything to the archive store. Not only do you get all the analytics in Kibana, you also get an archive with tiered storage, and it's very cheap. "Archiving" as such happens by setting the max age of the stream in ESDB. So, when you happen to need to execute a transaction on a very old stream, you get all the events from both stores.
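
For reference, setting the stream max age with the EventStoreDB gRPC client looks roughly like this (connection string, stream name, and the one-year TTL are example values):

```csharp
using System;
using EventStore.Client;

// Implicit archiving: events older than a year become eligible for scavenging
// in ESDB, while the connector-produced archive copy keeps the full history.
var settings = EventStoreClientSettings.Create("esdb://localhost:2113?tls=false");
await using var client = new EventStoreClient(settings);

await client.SetStreamMetadataAsync(
    "Booking-1234",                                     // example stream name
    StreamState.Any,
    new StreamMetadata(maxAge: TimeSpan.FromDays(365))  // one-year TTL
);
```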

alexeyzimarev commented on May 26, 2024

Un-archiving streams would be undesirable, as it would mess up the versioning and create additional concerns for projections and for replication of the events to the archive store (replication must be fully idempotent, and we must use the original event ids; see the sketch below).
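
To make the idempotency point concrete: if archive writes are keyed by the original event id, replaying the replication is a harmless overwrite rather than a duplication. A hedged sketch with made-up types (with Elasticsearch, the key would map to the document _id):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public record ArchivedEvent(Guid EventId, string Stream, long Version, string Payload);

public interface IArchiveWriter {
    // Upsert keyed by the original event id: writing the same event twice
    // is a no-op, which makes connector-based replication safe to replay.
    Task Upsert(string key, ArchivedEvent evt, CancellationToken ct);
}

public class IdempotentArchiver {
    readonly IArchiveWriter _writer;

    public IdempotentArchiver(IArchiveWriter writer) => _writer = writer;

    public Task Archive(ArchivedEvent evt, CancellationToken ct)
        // In Elasticsearch terms: index with _id = EventId, so retries
        // overwrite the same document instead of creating a new one.
        => _writer.Upsert(evt.EventId.ToString("N"), evt, ct);
}
```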

alexeyzimarev commented on May 26, 2024

That's why I want dev calls 😅. We were discussing those issues just two hours ago...

JulianMay commented on May 26, 2024

Yea, having a discussion asynchronously like this is never optimal 😌
You make good points (no surprise); I'm just concerned about the "you'd need to be aware of this and be prepared for it in projections" part. At the very least, this caveat should be prominently mentioned in the documentation around archiving. I would be curious to see examples of how to prepare for it: would your projections need access to the archive as well? It seems to me they would have to, unless you can simply ignore a new event on an otherwise archived stream.

The scenario I'm circling around is "users invoking commands for an archived aggregate" - maybe I'm overthinking it...
The user would probably have a view from a projected read model from before the stream was archived, and would assume that view to be updated with the changes they made - which would work. The day after, though, that (order, booking, whatever) would be gone, because the projection was replayed with only the change from the day before, making the view unavailable (or at least very wrong-looking).

If we allow users to handle commands on archived streams, shouldn't we expect they could issue several, over days or weeks, even though the stream has been archived for years?

To me it seems like one of those situations where it's better to say "you can't, but you can easily make a copy of the old one and make your changes to that" - so not un-archiving, more like superseding.

I'm always open for a call if you'd like to chat 🤙

alexeyzimarev commented on May 26, 2024

"The user would probably have a view from a projected read model from before the stream was archived, and would assume that view to be updated with the changes they made - which would work."

Correct, that's the idea.

"The day after, though, that (order, booking, whatever) would be gone, because the projection was replayed with only the change from the day before, making the view unavailable (or at least very wrong-looking)."

If we talk about the same aggregate, we have the following:

  • There's a read model built earlier from the archived events, so it's correct
  • We get a new command
  • We read the archived stream to get the aggregate state
  • The new event gets appended to the stream in the hot store
  • This new event gets projected to all the read models
  • It will also stay in the hot store until it gets truncated, based on the stream TTL or size

There's no issue here.

We do still have an issue with re-projecting everything. But then again, just think about it: was replaying everything ever a good idea? Say we have a projection, "pending arrivals". It always looks into the future, so replaying all the history would just create a lot of useless ops on the reporting database: we add a record to "pending arrivals" when the booking is paid, and remove it after the guest checks in. Projecting the whole history here would mean doing a lot of appends and deletions for no reason at all, as none of those historical arrivals is relevant; they happened long in the past. If we just project from what's in the hot store, we don't project the whole history, but just the relevant part of it (sketched below).
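
The "pending arrivals" shape, sketched with hypothetical event and read-model types: a record only exists between payment and check-in, so replaying old history inserts and deletes rows that cancel out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical events for the example.
public record BookingPaid(string BookingId, DateTime ArrivalDate);
public record GuestCheckedIn(string BookingId);

// Forward-looking projection: replaying the full history would add and then
// remove every historical arrival, churning the reporting database for nothing.
public class PendingArrivalsProjection {
    readonly Dictionary<string, DateTime> _pending = new();

    public void When(object evt) {
        switch (evt) {
            case BookingPaid paid:
                _pending[paid.BookingId] = paid.ArrivalDate; // add on payment
                break;
            case GuestCheckedIn checkedIn:
                _pending.Remove(checkedIn.BookingId);        // remove on check-in
                break;
        }
    }
}
```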

What we discussed is that the data in the hot store could give enough events to build new read models with relevant information about what recently happened. It depends on the use case, of course. It makes me think that the starting point of a new projection needs some flexibility, as right now it's from the beginning of time (see the sketch below).
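
At the subscription level, the EventStoreDB client already allows this: a catch-up subscription can start from a stored position instead of the start (the checkpoint value below is purely illustrative):

```csharp
using System;
using System.Threading.Tasks;
using EventStore.Client;

var settings = EventStoreClientSettings.Create("esdb://localhost:2113?tls=false");
await using var client = new EventStoreClient(settings);

// Start the projection from a checkpoint rather than the beginning of time.
var checkpoint = new Position(commitPosition: 1_000_000, preparePosition: 1_000_000);

await client.SubscribeToAllAsync(
    FromAll.After(checkpoint),
    eventAppeared: (subscription, resolved, ct) => {
        Console.WriteLine($"{resolved.Event.EventStreamId} #{resolved.Event.EventNumber}");
        return Task.CompletedTask;
    }
);
```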

On the other hand, if all the events in the archive store are produced by a connector, they look 100% identical to the original events, including the commit position, etc. So, I don't think it's impossible to have a subscription there: a catch-up subscription without the real-time functionality.

JulianMay commented on May 26, 2024

As I said "I'm probably overthinking it" 😌
What trips me up is: if you can handle one command on an archived aggregate, why not several - maybe with months in between, and read model rehydrations in the meantime?

That temporal coupling seems unnerving.
But I probably just don't have a good enough idea of a relevant use case to think it through.
The ability to archive streams is definitely a good and important feature, and how it fits into the respective application should be modelled.

alexeyzimarev commented on May 26, 2024

The use case is simple. We have 15 years of data collected in SQL, and now we are migrating to an event-sourced system. About 1.6 billion streams will be initiated by the ImportedFromLegacy event. Our users, however, only mutate streams that are just a couple of weeks old, maybe two months (that's the usual behaviour almost everywhere). We also have a cap on aggregate mutability, which is set to two years from the date the aggregate is created.

So, we decided not to move all the events through ESDB, as they would just be sitting there doing nothing, occupying space and making the database too large. We will project the legacy data to the read models with a one-time importer, at the same time as we produce those events; but we want to produce the events directly to the archive. That gives us the first scenario: if the stream is not found, look in the archive. Then, a mutation of an archived aggregate would follow the normal flow:

  • rehydrate from the archive
  • apply new events
  • new events go to the regular store
  • new events get copied to the archive using the connector
  • new events get projected to all the read models as normal
  • eventually, those events will get scavenged, and disappear from the main store

The original issue was about something else, which is still in the plans: a deliberate action to archive the stream. But it's very similar, as the first step would be to support reading events from the archive seamlessly. The writing part would, for now, be connector-based (Elastic). Other types of archive store can be added later, and then the explicit archive will need to be implemented (cloud buckets, etc.). A sketch of the truncation half of that follows.
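
For the explicit variant, the truncation half maps onto ESDB's truncate-before stream metadata. A hedged sketch, assuming events up to version 100 have already been confirmed in the archive:

```csharp
using EventStore.Client;

var settings = EventStoreClientSettings.Create("esdb://localhost:2113?tls=false");
await using var client = new EventStoreClient(settings);

// Explicit archive: events 0..99 are confirmed copied to the archive store,
// so mark them for removal from the hot store on the next scavenge.
await client.SetStreamMetadataAsync(
    "Order-42",                                                 // example stream
    StreamState.Any,
    new StreamMetadata(truncateBefore: new StreamPosition(100))
);
```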

JulianMay commented on May 26, 2024

"rehydrate from the archive" solves my concern, as you could do this if you needed to rehydrate while having "unfinished business" to do on an otherwise archived stream (the commands that triggered me).
Thanks for elaborating 👍

alexeyzimarev commented on May 26, 2024

Right, I might not have explained the idea well enough.

I already built a store for Elastic, which can read and write (not sure about the writing part though).

The plan now is to split readers and writers, then make an archiving store using a composition of two readers and a single writer (outlined below).
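
In outline (the interface names are illustrative; the fallback read logic itself is the earlier composite-reader sketch):

```csharp
// Illustrative shape of the split: the archiving store composes two readers
// for the fallback read, but only ever writes to the hot store; the connector
// is what copies new events to the archive.
public interface IEventReader { /* ReadStream(...) */ }
public interface IEventWriter { /* AppendEvents(...) */ }

public class ArchivingEventStore : IEventReader, IEventWriter {
    readonly IEventReader _hotReader;
    readonly IEventReader _archiveReader; // e.g. the Elastic-backed reader
    readonly IEventWriter _hotWriter;     // appends never target the archive directly

    public ArchivingEventStore(
        IEventReader hotReader,
        IEventReader archiveReader,
        IEventWriter hotWriter
    ) {
        _hotReader     = hotReader;
        _archiveReader = archiveReader;
        _hotWriter     = hotWriter;
    }
    // Reads: try _hotReader, fall back to _archiveReader for the missing prefix.
    // Writes: delegate to _hotWriter only.
}
```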

alexeyzimarev commented on May 26, 2024

Merged #80; it has the aggregate store with archive fallback, and a sample implementation with Elastic.

alexeyzimarev commented on May 26, 2024

Releasing it as 0.7.0
