Comments (6)

andreabalducci avatar andreabalducci commented on September 17, 2024

I haven't worked with SqlAzure yet, but this kind of issue is possible on a distributed system. You should handle this "glitch" in your client, either by:
A) polling client -> sequencer -> projection
B) modifying the polling client to read with a delay of a few milliseconds

I will go for A
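Option A can be sketched as a small buffer that sits between the polling client and the projection and only forwards commits in strict checkpoint order. This is a minimal Python sketch, not NEventStore code: `Commit` and `Sequencer` are hypothetical names, and the sketch assumes every checkpoint number is eventually observed.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class Commit:
    # Hypothetical stand-in for a commit; ordering compares the checkpoint only.
    checkpoint: int
    bucket_id: str = field(default="default", compare=False)


class Sequencer:
    """Buffers out-of-order commits from the polling client and forwards them
    to the projection strictly in checkpoint order (option A)."""

    def __init__(self, last_checkpoint: int, on_next: Callable[[Commit], None]) -> None:
        self.last_checkpoint = last_checkpoint
        self.on_next = on_next
        self._pending: List[Commit] = []  # min-heap keyed on checkpoint

    def push(self, commit: Commit) -> None:
        if commit.checkpoint <= self.last_checkpoint:
            return  # already delivered, e.g. re-read by a later poll
        heapq.heappush(self._pending, commit)
        while self._pending:
            head = self._pending[0]
            if head.checkpoint <= self.last_checkpoint:
                heapq.heappop(self._pending)  # drop a duplicate that was buffered twice
            elif head.checkpoint == self.last_checkpoint + 1:
                heapq.heappop(self._pending)
                self.on_next(head)
                self.last_checkpoint = head.checkpoint
            else:
                break  # hole: wait until the missing checkpoint shows up
```

The weak spot is the assumption above: a checkpoint that never appears (for example an identity value the database skipped) stalls this sequencer indefinitely, so a real implementation would need some timeout or gap policy on top.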

from neventstore.persistence.sql.

adamfur avatar adamfur commented on September 17, 2024

I've experimented a bit and received transaction exceptions in the polling client while using the EnlistInAmbientTransaction() call during Wireup(). I'm not sure whether it actually solves anything, but I'm going to try it out for a few days.

Regarding options A and B:
I think creating a sequencer is difficult. We expect a lot of holes in the CheckpointNumber identity column because we use several buckets, and SqlAzure also sometimes "randomly" bumps the identity by +10'000.

I implemented a version of B where we changed ObserveFrom*() to pass UtcNow - 300ms and ignore all commits newer than that, giving the infrastructure some time to catch up.
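The delay-based variant of option B boils down to a cutoff on the commit timestamp. This is a minimal Python sketch under stated assumptions: `Commit`, `settled_commits` and the `commit_stamp` field are hypothetical, and the change described above was made to NEventStore's ObserveFrom*() in C#, not to code like this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

SETTLE_DELAY = timedelta(milliseconds=300)  # the "UtcNow - 300ms" from the comment above


@dataclass
class Commit:
    checkpoint: int
    commit_stamp: datetime  # time the commit was written (assumed UTC)


def settled_commits(
    commits: Iterable[Commit], now: datetime, delay: timedelta = SETTLE_DELAY
) -> List[Commit]:
    """Return commits in checkpoint order, stopping at the first one newer than now - delay."""
    cutoff = now - delay
    settled: List[Commit] = []
    for commit in sorted(commits, key=lambda c: c.checkpoint):
        if commit.commit_stamp > cutoff:
            break  # too fresh: leave it, and everything after it, for the next poll
        settled.append(commit)
    return settled
```

Callers would pass `datetime.now(timezone.utc)` as `now`. Stopping at the first too-recent commit, rather than merely skipping it, is a design choice of this sketch: it keeps the poller from advancing its checkpoint past a commit it has deferred.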

If we settle on option B, I will eventually send a pull request with a SqlAzureDialect.

adamfur avatar adamfur commented on September 17, 2024

Our issues:

  • Azure SQL sometimes bumps up the identity (CheckpointNumber) value by 10'000.
  • Fetching a range of recently written commits sometimes misses a portion of the data.
  • There are unexplained gaps in CheckpointNumber, usually of 1 to 2.

The workaround:

  • Added a predicate that filters on the bucketId, so we can decide whether to call OnNext() for the current commit.
  • Always fetch from all buckets (the only way to know whether we have a gap).
  • Always consume commits older than 5 seconds.
  • Throw if the next commit's CheckpointNumber is not the last CheckpointNumber + 1.
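The workaround rules above (read from all buckets, filter by bucket before OnNext(), always accept commits at least 5 seconds old, otherwise throw on a non-consecutive checkpoint) can be sketched in Python as follows. All names here are hypothetical; this illustrates the logic and is not the poster's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable

MAX_AGE = timedelta(seconds=5)  # commits at least this old are consumed even across a gap


class SequenceGapError(Exception):
    """The next commit is not last_checkpoint + 1 and is too young to trust yet."""

    def __init__(self, last_checkpoint: int, found: int) -> None:
        super().__init__(f"expected checkpoint {last_checkpoint + 1}, got {found}")
        self.last_checkpoint = last_checkpoint  # progress made before the gap
        self.found = found


@dataclass
class Commit:
    checkpoint: int
    bucket_id: str
    commit_stamp: datetime  # assumed UTC


def consume(
    commits: Iterable[Commit],
    last_checkpoint: int,
    wants_bucket: Callable[[str], bool],
    on_next: Callable[[Commit], None],
    now: datetime,
) -> int:
    """Walk commits read from *all* buckets and return the new last checkpoint."""
    for commit in sorted(commits, key=lambda c: c.checkpoint):
        if commit.checkpoint <= last_checkpoint:
            continue  # already processed
        in_sequence = commit.checkpoint == last_checkpoint + 1
        aged = now - commit.commit_stamp >= MAX_AGE
        if not in_sequence and not aged:
            raise SequenceGapError(last_checkpoint, commit.checkpoint)
        # Every commit advances the checkpoint; only wanted buckets reach on_next.
        if wants_bucket(commit.bucket_id):
            on_next(commit)
        last_checkpoint = commit.checkpoint
    return last_checkpoint
```

The exception carries the checkpoint reached before the gap, so a caller can resume from there instead of replaying the whole batch.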

Notes:

  1. If we stumble upon a gap, clients have to tolerate a lag of up to 5 seconds before their projections are updated (this happens about twice a day).
  2. Retrieving an invalid sequence is treated as a transient error: we retry until either the next commit has aged to 5 seconds or more, or we receive the correct sequence.
  3. In our logs we can see that it has taken almost 0.5 s before we ultimately received the expected sequence number.
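Note 2 treats a gap as a transient error. A hypothetical retry wrapper around a poll call might look like the sketch below; the 100 ms delay and the attempt cap are assumptions, and `SequenceGapError` stands in for whatever exception the consumer actually throws.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SequenceGapError(Exception):
    """Hypothetical stand-in for the consumer's 'unexpected next checkpoint' error."""


def retry_on_gap(
    poll: Callable[[], T],
    delay_s: float = 0.1,
    max_attempts: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call poll() until it stops raising SequenceGapError or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return poll()
        except SequenceGapError:
            if attempt >= max_attempts:
                raise  # give up and surface the gap
            attempt += 1
            sleep(delay_s)  # give the database time to expose the missing commit
```

If the consumer always accepts commits at least 5 seconds old, the retries should end on their own within roughly that window; the cap is only a safety net. The injectable `sleep` makes the loop easy to test without real delays.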

fschmied avatar fschmied commented on September 17, 2024

I believe this is caused by READ COMMITTED SNAPSHOT, which seems to be enabled by default in Azure SQL (https://blogs.msdn.microsoft.com/sqlcat/2013/12/26/be-aware-of-the-difference-in-isolation-levels-if-porting-an-application-from-windows-azure-sql-db-to-sql-server-in-windows-azure-virtual-machine/) and is incompatible with NES.

I wonder if creating an AzureSqlDialect using READCOMMITTEDLOCK would have resolved the issue (if that works on Azure SQL).

fschmied avatar fschmied commented on September 17, 2024

I just did a bit of experimentation that showed adding the WITH (READCOMMITTEDLOCK) table hint to NEventStore's queries would probably solve the problem observed by @adamfur as it reintroduces the blocking behavior of the polling client normally seen under SQL Server, but lost under Azure SQL.

fschmied avatar fschmied commented on September 17, 2024

We've created a subclass of MsSqlDialect that adds the READCOMMITTEDLOCK table hint for Azure SQL, and it seems to fix the main problem.
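The actual fix is a C# subclass of MsSqlDialect. The Python sketch below only illustrates the query rewrite it implies: appending SQL Server's `WITH (READCOMMITTEDLOCK)` table hint after a table reference. The function name and the sample query are hypothetical, and the sketch assumes unaliased, unbracketed table names.

```python
import re

HINT = "WITH (READCOMMITTEDLOCK)"


def add_read_committed_lock(sql: str, table: str) -> str:
    """Append the READCOMMITTEDLOCK table hint after each FROM/JOIN reference to `table`."""
    pattern = re.compile(
        # Match "FROM <table>" or "JOIN <table>" unless a WITH (...) hint already follows.
        rf"\b(FROM|JOIN)\s+({re.escape(table)})\b(?!\s+WITH\s*\()",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: f"{m.group(1)} {m.group(2)} {HINT}", sql)


query = "SELECT CheckpointNumber FROM Commits WHERE CheckpointNumber > @p0 ORDER BY CheckpointNumber"
print(add_read_committed_lock(query, "Commits"))
```

The negative lookahead keeps the rewrite idempotent, so applying it to an already hinted query leaves the query unchanged.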

What remains is the very low likelihood of #21 occurring, but I don't think anyone has ever actually seen it in production.

from neventstore.persistence.sql.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.