
Comments (4)

yitizi commented on June 16, 2024

Thanks for your advice. We are in the process of updating Axon to version 4.7.6 to see if it resolves the issue. The problem has occurred, and we are working to identify the cause. Given that the code appears to have undergone minimal changes, we suspect it might recur. Additionally, this problem is not 100% reproducible.


smcvb commented on June 16, 2024

First off, thanks for filing this concern with us, @yitizi.

Secondly, perhaps an obvious pointer, but Axon Framework 4.4.5 is an old version (November 2020) of Axon Framework, as we are currently at release 4.9.1.
Although we make exceptions from time to time, we aim to provide critical bug fixes about three minor releases back.

Hence, it would be very helpful if you could show the same predicament under a more recent release of Axon Framework.
Not just for us, but also for you and your team, so that you can actually receive or arrive at a fix, of course.

Another point that interests me is your specific mention of the optimizeEventConsumption property being set to true in your environment.
Does that mean you have shown the issue not to happen when optimizeEventConsumption is set to false?

Now let me comment a little on your explanation:

Firstly, the event store has some events, and then the tracking processor will process those events.
When it consumes to the end, it will start an EventProducer to fetch events instead of the private stream.

The EmbeddedEventStore, when optimizeEventConsumption is enabled (which is the default), will aim to use a common EventConsumer among different StreamingEventProcessor instances as a means to optimize.
If a given StreamingEventProcessor falls outside the cache window of (by default) 10_000 events, the StreamingEventProcessor will receive a custom EventConsumer as its message source.

Secondly, insert a lot of events into the event store, about 10,000, which cannot be done immediately.

The number you're inserting is practically identical to the default cache window of the EmbeddedEventStore.
Have you, perchance, made any other configuration changes to the EmbeddedEventStore besides specifically setting optimizeEventConsumption to true?
As stated earlier, when making this shift, your environment will consciously shift from a common EventConsumer for all tailing StreamingEventProcessors to unique EventConsumer instances per StreamingEventProcessor.
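
For reference, both properties are set on the EmbeddedEventStore builder. A minimal sketch, assuming the storage engine is the JdbcEventStorageEngine you already use; the values shown are simply the defaults mentioned above:

```java
import org.axonframework.eventsourcing.eventstore.EmbeddedEventStore;
import org.axonframework.eventsourcing.eventstore.EventStorageEngine;

class EventStoreConfig {

    // Sketch: an EmbeddedEventStore with the consumption optimization enabled and
    // an explicit cache window. The storage engine is assumed to be the existing
    // JdbcEventStorageEngine; the values are the framework defaults, not advice.
    EmbeddedEventStore eventStore(EventStorageEngine storageEngine) {
        return EmbeddedEventStore.builder()
                                 .storageEngine(storageEngine)
                                 .optimizeEventConsumption(true) // default: true
                                 .cachedEvents(10_000)           // default cache window
                                 .build();
    }
}
```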

This raises another question: how many StreamingEventProcessors (as you're using version 4.4.5, only the TrackingEventProcessor applies) does your application run?

Thirdly, release the TrackingToken that it holds, so another instance will take it.

How are you exactly telling application X to release "a" TrackingToken so that the other instance picks it up?


Concluding: let's get to the bottom of this, @yitizi 💪


yitizi commented on June 16, 2024

First off, thanks for filing this concern with us, @yitizi.

Thanks for the reply. You suggested that I update the Axon Framework to the newest version, but the application is already published in production, so the update cannot be done quickly. I also noticed that the EventProducer code hasn't changed between versions 4.4.5 and 4.9.1.

Secondly, perhaps an obvious pointer, but Axon Framework 4.4.5 is an old version (November 2020) of Axon Framework, as we are currently at release 4.9.1. Although we make exceptions from time to time, we aim to provide critical bug fixes about three minor releases back.

Hence, it would be very helpful if you could show the same predicament under a more recent release of Axon Framework. Not just for us but also for you and your team receiving or coming to a fix, of course.

Another point that interests me, is your specific mention of the optimizeEventConsumption property being set to true in your environment. Does that mean you have shown the issue not to happen when optimizeEventConsumption is set to false?

Yes, only when optimizeEventConsumption is set to true, will the EventProducer initiate, and the linked list-like usage will be activated. When the EventProducer opens an event stream with JdbcEventStorageEngine, it fetches to the end, regardless of the existence of trailing consumers. Therefore, when I insert numerous events, they get cached in memory, and the trimCache method resets the oldest. However, these cached nodes still need to wait for the JVM garbage collection to be triggered.

Now let me comment a little on your explanation:

Firstly, the event store has some events, and then the tracking processor will process those events.
When it consumes to the end, it will start an EventProducer to fetch events instead of the private stream.

The EmbeddedEventStore, when optimizeEventConsumption is enabled (which is the default), will aim to use a common EventConsumer among different StreamingEventProcessor instances as a means to optimize. If a given StreamingEventProcessor falls outside the cache window of (by default) 10_000 events, the StreamingEventProcessor will receive a custom EventConsumer as its message source.

I have no doubt about the capability to switch from a standard EventConsumer to a custom EventConsumer. It's a fantastic feature that can significantly decrease the usage of database connections.

Secondly, insert a lot of events into the event store, about 10,000, which cannot be done immediately.

The number you're inserting is practically identical to the default cache window of the EmbeddedEventStore. Have you, perchance, made any other configuration changes to the EmbeddedEventStore besides specifically setting optimizeEventConsumption to true? As stated earlier, when making this shift, your environment will consciously shift from a common EventConsumer for all tailing StreamingEventProcessors to unique EventConsumer instances per StreamingEventProcessor.

This raises another question: how many StreamingEventProcessors (as you're using version 4.4.5, only the TrackingEventProcessor applies) does your application run?

about two.
(image attached)

Thirdly, release the TrackingToken that it holds, so another instance will take it.

How are you exactly telling application X to release "a" TrackingToken so that the other instance picks it up?

In the debug environment, this is done by calling the org.axonframework.eventhandling.TrackingEventProcessor#releaseSegment(int) method. In production, sometimes an error just occurs. But when I searched the application I mentioned before, it seems the cleaner is triggered, yet the cleanup just cannot be completed? I'm sure there are just two tracking processors. It has these logs.
(log screenshots attached)
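
For completeness, a minimal sketch of such a release call, assuming the processor is looked up from the EventProcessingConfiguration; the processor name "MySagaProcessor" and segment 0 are placeholders:

```java
import org.axonframework.config.EventProcessingConfiguration;
import org.axonframework.eventhandling.TrackingEventProcessor;

class SegmentReleaser {

    // Sketch: look up a TrackingEventProcessor by name and release one of its
    // segments so another application instance can claim the token.
    // "MySagaProcessor" and segment 0 are placeholders.
    void releaseSegment(EventProcessingConfiguration processingConfiguration) {
        processingConfiguration
                .eventProcessor("MySagaProcessor", TrackingEventProcessor.class)
                .ifPresent(processor -> processor.releaseSegment(0));
    }
}
```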

Concluding: let's get to the bottom of this, @yitizi 💪

The EventProducer's fetching method and the caching of nodes in a linked-list structure put pressure on the garbage collector (GC). Eventually, this prevents the garbage collector from functioning properly.
@smcvb


smcvb commented on June 16, 2024

Thanks for the reply. You suggested that I update the Axon Framework to the newest version, but the application is already published in production, so the update cannot be done quickly. I also noticed that the EventProducer code hasn't changed between versions 4.4.5 and 4.9.1.

Although I am aware that we have practically made no changes to the EmbeddedEventStore over the last several releases, the chances of a patch release on AF 4.4, 4.5, and 4.6 are very slim.
So, although your input here is appreciated, if you want to use a fix once it is present, you will simply have to upgrade to a more recent version of Axon Framework.

Note that this choice is not something we've made lightly.
It is purely in place to minimize the maintenance effort for the Axon Framework team.
Given the team size, we are unfortunately required to make this request of you. My apologies for that.


With that said, I will reflect on your other comments.

Yes, only when optimizeEventConsumption is set to true, will the EventProducer initiate, and the linked list-like usage will be activated.
When the EventProducer opens an event stream with JdbcEventStorageEngine, it fetches to the end, regardless of the existence of trailing consumers.
Therefore, when I insert numerous events, they get cached in memory, and the trimCache method resets the oldest.
However, these cached nodes still need to wait for the JVM garbage collection to be triggered.

Thanks for the investigation here, @yitizi :-)
Do you perhaps also have an idea how to adjust trimCache to release resources quicker?
To be frank, I am at this stage not entirely certain how to improve things there...hence the request.

about two.
(I would prefer to quote a helpful text, but in this case there's only an image I can work with.)

It seems like your environment only has two Streaming Event Processors in place.

It's a fantastic feature that can significantly decrease the usage of database connections.

Although true, if you just have two Streaming Event Processors with a single token each, the TrackingEventProcessor will only open a single event stream per instance to begin with.
As such, the optimization is very limited, and apparently causes more harm in your environment than benefit.

Granted, if you would increase the number of segments for your TrackingEventProcessor, you would have an event stream (so database connection) per segment.
This is in turn extremely easily covered with the PooledStreamingEventProcessor, which in most cases has better performance characteristics anyhow.
However, the PooledStreamingEventProcessor was introduced in 4.5; again meriting an upgrade of your environment to a more recent release of Axon Framework.
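
A rough sketch of what such a switch could look like on the EventProcessingConfigurer after upgrading to 4.5 or later; the processing group name is a placeholder:

```java
import org.axonframework.config.Configurer;

class ProcessorConfig {

    // Sketch (requires Axon Framework 4.5+): register the processing group as a
    // PooledStreamingEventProcessor instead of the default TrackingEventProcessor.
    // "MySagaProcessor" stands in for your actual processing group name.
    void configure(Configurer configurer) {
        configurer.eventProcessing(processing ->
                processing.registerPooledStreamingEventProcessor("MySagaProcessor"));
    }
}
```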

In the debug environment, this is done by calling the org.axonframework.eventhandling.TrackingEventProcessor#releaseSegment(int) method. In production, sometimes an error just occurs. But when I searched the application I mentioned before, it seems the cleaner is triggered, yet the cleanup just cannot be completed? I'm sure there are just two tracking processors. It has these logs.

Furthermore, they're both Sagas, which I base on the processor names.
Given how most users implement Sagas, it is to be expected that one or both of your Streaming Event Processors lags behind compared to the other.

The log messages provided by Axon Framework are set to WARN on purpose, as this typically shouldn't happen unless you have bursts of events.
And if you do have such bursts, resolving this would require (1) disabling optimizeEventConsumption or (2) changing the cachedEvents value to a number you would expect to be feasible.

The EventProducer's fetching method and the caching of nodes in a linked-list structure put pressure on the garbage collector (GC). Eventually, this prevents the garbage collector from functioning properly.

If you would point to the linked list as the issue, that would mean references to Nodes are kept (too long).
However, if that were the case, I would expect the issue you have encountered to occur for a lot more people, which I haven't noticed to be the case either 🤔
Just an idea, but perhaps it helps if you enforce your EmbeddedEventStore to clean out EventConsumers that fall behind quicker.
To achieve that, you can adjust the cleanupDelayMillis value, which defaults to 10_000.

Would you mind running a test with that value set to, e.g., 1_000? Just to see whether the issue remains by the way!
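
A minimal sketch of that test configuration, assuming the EmbeddedEventStore builder exposes the value under the same name, with 1_000 being the trial value suggested above:

```java
import org.axonframework.eventsourcing.eventstore.EmbeddedEventStore;
import org.axonframework.eventsourcing.eventstore.EventStorageEngine;

class CleanupTestConfig {

    // Sketch: the same EmbeddedEventStore, but with the cleanup task running every
    // second instead of the default 10 seconds, so lagging consumers are released sooner.
    EmbeddedEventStore eventStore(EventStorageEngine storageEngine) {
        return EmbeddedEventStore.builder()
                                 .storageEngine(storageEngine)
                                 .cleanupDelayMillis(1_000) // default: 10_000
                                 .build();
    }
}
```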

By the way, I have spent quite some time in the EmbeddedEventStore to look for a spot where the Node instances that are stored may keep the Node#next field.
As in your original comment, it is the Node#next field that hits the 4Gb mark.

The most reasonable culprits for this would be the Node oldest value of the overall EmbeddedEventStore, and the Node lastNode of an EventConsumer.
The latter should be covered by the Cleaner task. The former sounds like the most reasonable culprit, but to be frank, I would expect this predicament to occur for virtually every user of Axon Framework if that were the case 🤔
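
As a generic illustration of that suspicion (this is deliberately not Axon's actual code, just a minimal Java sketch): as long as any live reference to the oldest node, or to a consumer's last node, is retained, the entire chain behind the next pointers stays reachable and cannot be collected:

```java
// Generic illustration, not Axon code: a singly linked cache of nodes.
class Node {
    final byte[] payload = new byte[1024]; // stand-in for a cached event
    Node next;
}

class ChainRetentionDemo {
    public static void main(String[] args) {
        Node oldest = new Node();
        Node tail = oldest;
        for (int i = 0; i < 10_000; i++) {
            tail.next = new Node(); // grow the chain, like an event cache filling up
            tail = tail.next;
        }
        // As long as 'oldest' (or any consumer's retained last node) is still referenced,
        // every node behind it remains reachable via the 'next' chain, so the garbage
        // collector cannot reclaim any of those payloads.
    }
}
```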


So, let me conclude this comment:

  1. Upgrading to a recent version of Axon Framework helps you to receive bug fixes at all. If you decide to stick to old versions, the effort lies on your shoulders instead.
  2. Having just two Sagas does not necessarily merit the use of the optimizeEventConsumption property. Perhaps this will change in the future, but as you're still on 4.4.5, I anticipate there's not too much active development happening on this project.
  3. Having just one segment per Tracking Event Processor does not necessarily merit the use of the optimizeEventConsumption property.
  4. If you could validate in your debug environment whether lowering the cleanupDelayMillis value resolves the predicament, that would be valuable information.

Thanks again for opening this issue, @yitizi! :-)

