
nservicebus.persistence.cosmosdb's Introduction

NServiceBus.Persistence.CosmosDB

NServiceBus.Persistence.CosmosDB is the official NServiceBus persistence for Azure Cosmos DB utilizing the Core (SQL) API.

It is part of the Particular Service Platform, which includes NServiceBus and tools to build, monitor, and debug distributed systems.

See the Azure Cosmos DB Persistence documentation for more details on how to use it.

Running tests locally

All test projects utilize NUnit. The test projects can be executed using the test runner included in Visual Studio or using the dotnet test command from the command line.

The tests in the AcceptanceTesting projects and the PersistenceTests project require a Cosmos DB server in order for the tests to pass.

Using the Cosmos DB Emulator

The AcceptanceTests and PersistenceTests projects will connect to a local Cosmos DB emulator without configuring a connection string.

The Cosmos DB Emulator, including its data explorer, can be accessed at https://localhost:8081/_explorer/index.html.

Once the emulator is set up, create a database named CosmosDBPersistence.
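
The database can also be created programmatically. The following is a minimal sketch using the Cosmos SDK; the connection string placeholder must be replaced with the emulator's connection string:

using Microsoft.Azure.Cosmos;

// Creates the database the test projects expect, if it does not already exist.
var client = new CosmosClient("<emulator-connection-string>");
await client.CreateDatabaseIfNotExistsAsync("CosmosDBPersistence");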

Using the Cosmos DB Service

To create a Cosmos DB Core (SQL) account, refer to the Microsoft instructions for managing accounts.

Once a Cosmos DB account is set up, you can use the Azure Cosmos explorer to create a database named CosmosDBPersistence, which is required by the test projects.

To use the created Cosmos DB Account, set an environment variable named CosmosDBPersistence_ConnectionString with a Cosmos DB connection string for your Account.
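
How the connection string is consumed is up to the test code; as an illustration only (the exact lookup used by the test projects may differ), reading the variable in C# looks like this:

using System;

// Reads the connection string that the test projects are expected to pick up.
var connectionString =
    Environment.GetEnvironmentVariable("CosmosDBPersistence_ConnectionString")
    ?? throw new InvalidOperationException(
        "Set the CosmosDBPersistence_ConnectionString environment variable.");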

nservicebus.persistence.cosmosdb's People

Contributors

adamralph, aleksandr-samila, andreasohlund, awright18, boblangley, bording, danielmarbach, davidboike, dependabot[bot], helenktsai, heskandari, internalautomation[bot], jpalac, kbaley, kentdr, lailabougria, mauroservienti, mikeminutillo, particularbot, ramonsmits, seanfeldman, sergioc, soujay, szymonpobiega, timbussmann, tmasternak, williambza, yvesgoeleven


Forkers

eventellect

nservicebus.persistence.cosmosdb's Issues

Unclear to users what the cause is when OCC fails

Improve the experience for users that use sagas when saga instances fail due to an OCC conflict:


Solutions could be:

Catch the exception and rethrow a new exception, such as OptimisticConcurrencyException, with details on why this happens.

Throw an exception that provides information about all the operations that failed within the batch instead of the first one

Currently, we throw TransactionalBatchOperationException, which gives access to the TransactionalBatchOperationResult of the first operation that failed.
Technically, we could collect all the failures and then throw some sort of aggregate exception. For example, rename TransactionalBatchOperationException to TransactionalBatchOperationsException and expose a list of TransactionalBatchOperationResult there, or, even better, have TransactionalBatchOperationsException hold a list of TransactionalBatchOperationException instances, because that would also allow us to expose a message with some additional reasons.
Alternatively, TransactionalBatchOperationsException could expose "tuples" of string and TransactionalBatchOperationResult, which I find a bit weird honestly.
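
As a rough illustration of the aggregate-style option above (the type name and shape are only a sketch, not an actual API):

using System;
using System.Collections.Generic;
using Microsoft.Azure.Cosmos;

// Illustrative shape only: surfaces every failed operation in the batch
// instead of just the first one.
public class TransactionalBatchOperationsException : Exception
{
    public TransactionalBatchOperationsException(
        string message,
        IReadOnlyCollection<TransactionalBatchOperationResult> failedOperations)
        : base(message)
    {
        FailedOperations = failedOperations;
    }

    // Results for all operations that failed within the transactional batch.
    public IReadOnlyCollection<TransactionalBatchOperationResult> FailedOperations { get; }
}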

Add support for configuring partition key extraction rules

Raised by @Ivan-L

We found that teams who used our persistence package had a difficult time wrapping their heads around the fact that one needs to look inside a message in order to determine the partition to which a Cosmos session needs to be scoped. We thought about implementing a fluent API for configuring message partition key extraction rules, to make that configuration easier and clearer to reason about, but never got around to it. For example, we considered the following:

endpointConfiguration.UsePersistence<CosmosDbPersistence>()
    .CosmosClient(new CosmosClient("ConnectionString"))
    .DatabaseName("DatabaseName")
    .ForMessage<OneSpecificMessage>(message => (new ContainerInformation("containerName", "partitionKeyPath"), message.PartitionKeyValue))
    .ForMessage<ISomeMessageInterface>(message => (new ContainerInformation("containerName", "partitionKeyPath"), message.PartitionKeyValue))

Outbox cleaning using Time-To-Live does not work

Symptoms

Outbox records for dispatched messages are not evicted.

Who's affected

Anyone using Cosmos DB persistence with outbox feature enabled.

Root cause

CosmosDB doesn't have a conventional "outbox cleaner"; instead it relies on a TTL feature so that each record is automatically deleted once its TTL elapses.

However, according to the docs linked above, if the container TTL is set to null then it doesn't matter what the per-item TTL is set to: the item will never expire.

The container is created here and does not include setting a TTL, so outbox records will never expire.

Containers should be created with a default TTL of -1, which still means "no expiry" but allows per-item TTLs to work correctly.
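
For reference, a minimal sketch of creating a container with a default TTL of -1 via the Cosmos SDK (the connection string, container name, and partition key path are placeholders):

using Microsoft.Azure.Cosmos;

// DefaultTimeToLive = -1 turns TTL on for the container without expiring items
// by default, which lets per-item TTL values on outbox records take effect.
var client = new CosmosClient("<connection-string>");
var database = client.GetDatabase("CosmosDBPersistence");
await database.CreateContainerIfNotExistsAsync(
    new ContainerProperties("ContainerName", "/partition/key/path")
    {
        DefaultTimeToLive = -1
    });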

Workaround

Manually update the TTL of the container to On (no default).

Plan of action

Verify client-side encryption with Always Encrypted can be used

More details about the feature

At first glance, initializing the client should work by specifying the key resolver and taking control over the client creation:

using Azure.Identity;
using Azure.Security.KeyVault.Keys.Cryptography;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Encryption;
var tokenCredential = new DefaultAzureCredential();
var keyResolver = new KeyResolver(tokenCredential);
var client = new CosmosClient("<connection-string>")
    .WithEncryption(keyResolver, KeyEncryptionKeyResolverName.AzureKeyVault);

Container-level encryption policies are currently not considered when running installers because there is no way to take control over the installation process when installers run via EnableInstallers. The sane way to do this is probably to disable installers and create the container by other means. Reading and writing encrypted data should then happen automatically.

There might be caveats with filter queries on encrypted properties, see https://learn.microsoft.com/en-us/azure/cosmos-db/how-to-always-encrypted?tabs=dotnet#filter-queries-on-encrypted-properties

NServiceBus.Persistence.CosmosDB - Public preview release available

We've just released a public preview version of NServiceBus.Persistence.CosmosDB with support for sagas and outbox, as well as a migration path from Azure Storage persistence.

Azure Cosmos DB is a great choice to persist your saga data and custom data when running in Azure with NServiceBus. As a fully managed, globally distributed, elastically scaled, pay-as-you-go service, your NServiceBus-based systems can benefit from guaranteed single-digit-millisecond latency with 99.999% availability.

The NServiceBus.Persistence.CosmosDB persistence comes with the following benefits:

  • Support for the outbox with transactional guarantees for messaging and business data
  • Run with SQL persistence guarantees at Cosmos DB cost
  • Faster than Azure Storage persistence while transactional
  • Fully partitioning aware, unlocking advanced data storage scenarios such as multi-tenancy
  • Supported migration from Azure Storage persistence

Getting Started

To use NServiceBus.Persistence.CosmosDB, install the NuGet package:

Install-Package NServiceBus.Persistence.CosmosDB

Configure the NServiceBus endpoint to use Cosmos DB:

endpointConfiguration.UsePersistence<CosmosDBPersistence>()
   .CosmosClient(new CosmosClient("your-connection-string"))
   .DatabaseName("your-database-name")
   .DefaultContainer("container-name", "/partition/key/path");

Now you are ready to use sagas with Azure Cosmos DB.

For detailed configuration options see the documentation.

About the public preview

The NServiceBus.Persistence.CosmosDB package is released as a public preview. Public previews are separately licensed, production-ready packages, aiming to react more quickly to customers' needs. See the support policy for previews for more information about our support commitment. Preview packages may transition to fully supported versions after the preview period.

User adoption is crucial and helps us decide whether to make NServiceBus.Persistence.CosmosDB a permanent part of the Particular Platform. Please let us know if you are using this preview by emailing us at [email protected].

We'd also love to receive your feedback about the new NServiceBus.Persistence.CosmosDB package via our support channels, the project repository, or our public previews discussion group.

Where to get it

You can install the preview from NuGet.

With thanks,
The team in Particular

Please read our release policy for more details. Follow @ParticularNews to be notified of new releases and bug fixes.

Deduplication and the outbox are executed for non-business (e.g. subscription) messages, resulting in a NullReferenceException being thrown because of the missing PartitionKey


Observed behavior

Deduplication and the outbox are executed for non-business (e.g. subscription) messages, resulting in a NullReferenceException being thrown because of the missing PartitionKey.

Expected behavior

Messages that don't have an associated partition key are skipped by the outbox mechanism.

Symptoms

Subscription messages are sent to the error queue and endpoints can't perform pub/sub on transports that rely on message-driven pub/sub

Steps to reproduce

Run pub/sub with non-native pub/sub transport such as ASQ

Is it a regression?

No

Who's affected

All customers who wish to use the CosmosDB persistence with ASQ transport

Root cause

Known Workaround

Have a behavior at the transport receive stage (ITransportReceiveContext) that sets some default partition key to be used for non-business messages, and another behavior later in the pipeline that sets the correct partition key for business messages.
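
A minimal sketch of the first of those behaviors, assuming the persistence picks up a Microsoft.Azure.Cosmos PartitionKey from the pipeline context extensions (as described in the persistence documentation); the fallback key value is arbitrary:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using NServiceBus.Pipeline;

// Sets a fallback partition key early in the pipeline so that non-business
// messages (e.g. subscription messages) don't fail; a later behavior can
// overwrite it with the real partition key for business messages.
class DefaultPartitionKeyBehavior : Behavior<ITransportReceiveContext>
{
    public override Task Invoke(ITransportReceiveContext context, Func<Task> next)
    {
        context.Extensions.Set(new PartitionKey("default"));
        return next();
    }
}

The behavior would then be registered via endpointConfiguration.Pipeline.Register(...), together with the later behavior that supplies the correct partition key for business messages.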

Triage

Use the following checklist to help triage the bug by assigning the points corresponding to any applicable factors. A staff member from Particular Software will review and adjust if necessary.

| Effect | Points | Notes |
| --- | --- | --- |
| Performance degrades over time requiring manual interventions or automated restarts | 1 point | |
| The cost of utilizing the Preview significantly exceeds what is justified | 2 points | |
| The user interface responsiveness is longer than 10s between user request and UI response | 1 point | |
| The issue impacts the correctness of the system but does not cause irreversible damage | 3 points | 3 |
| The issue causes unscheduled or unplanned process shutdown | 3 points | |
| The issue impacts the system correctness causing irreversible damage | 5 points | |
| The issue occurs in less than 1 in 1000 usages/invocations | 0 points | |
| The issue occurs in greater than 1 in 1000 usages/invocations | 1 point | |
| The issue occurs in greater than 1 in 100 usages/invocations | 2 points | 2 |
| The issue significantly impacts the adoption of the Preview for the core use-case | 3 points | 3 |

Exception thrown for handlers that do not use synchronized storage

Who's affected

  • All customers using the persistence

Symptoms

When a message is destined for a handler that doesn't use CosmosDB, a synchronized storage session is still created behind the scenes. In such cases it is not necessary to provide the container information, yet it is enforced by us and an exception is thrown:

Unable to retrieve the container name and the partition key during processing. Make sure that either persistence.Container() is used or the relevant container information is available on the message handling pipeline.

Backported to

Prevent Outbox behavior from creating large amounts of duplicates when RU limits have been reached

As part of the Outbox process, the last step is to mark the dispatched outgoing messages as dispatched in the Outbox storage. However, this operation might run into an exception when the provisioned RUs have been exceeded while the system is under heavy load (this can easily be the case when combining CosmosDB persistence with auto-scaling hosting options like Azure Functions). When Azure throttles the client due to exceeded RU capacity, the exception thrown will make NServiceBus retry the message, even though the outgoing messages have already been dispatched to the transport, as there is no distributed transaction.
Retrying the message will continue to create duplicate messages for the same outbox record while the system stays under heavy load.

It seems the read operations that happen first as part of the Outbox process are less likely to hit the RU limits due to lower RU costs compared to the involved update operation (especially when trying to update multiple records when there are multiple outgoing messages).

It would be desirable to minimize the amount of duplicate messages, e.g. by throttling the endpoint processing once 429 status code errors have been received. Additionally, better feedback and guidance might be helpful in case users run into such situations.
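
This is not something the package does today, but as an illustration of the kind of back-off reaction described above, a caller can honor the delay the SDK reports when a request is throttled (the helper below is purely hypothetical):

using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

static class ThrottlingAwareWrites
{
    // Illustration only: waits for the SDK-suggested delay and retries once
    // when Cosmos DB reports request throttling (HTTP 429).
    public static async Task UpsertWithBackoffAsync<T>(Container container, T item, PartitionKey partitionKey)
    {
        try
        {
            await container.UpsertItemAsync(item, partitionKey);
        }
        catch (CosmosException exception) when (exception.StatusCode == HttpStatusCode.TooManyRequests)
        {
            await Task.Delay(exception.RetryAfter ?? TimeSpan.FromSeconds(1));
            await container.UpsertItemAsync(item, partitionKey);
        }
    }
}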

Triage

This triage checklist should be filled in by a staff member from Particular Software.

| Yes / No | Triage criteria |
| --- | --- |
| Hard to estimate, potentially No | The expected effort to code and document the enhancement is less than 8 hours. |
| Hard to estimate, potentially No | A reviewer can approve any changes in less than 1 hour. |
| No | The solution is obvious and does not require analysis by a task force. |
| Yes | The issue is NOT blocking the customer from using the Preview in production. |
| No | The issue is blocking the user from using the Preview in production. |
| Personally I'm aware of one user running into this scenario. | There's evidence solving the problem will benefit multiple users. |

Update Microsoft.Azure.Cosmos package

Microsoft had a bug in the version of the Cosmos DB SDK (the Microsoft.Azure.Cosmos 3.14.0 package) we're referencing and has unlisted it. Unfortunately, being a transitive dependency, anyone pulling down our Preview persistence package will get the unlisted version. We should release a new minor version with a bumped Cosmos DB dependency.

Cache partition key path and segments to reduce allocations

While writing outbox and saga records, the partition key enrichment procedure transforms the partition key path and splits out the various segments from the path for every operation. The partition key path has low cardinality, and thus the result can be cached to save allocations.
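
As an illustration of that idea (not the actual implementation), the split segments could be memoized per path:

using System;
using System.Collections.Concurrent;

// Partition key paths have low cardinality, so the segments split out of a
// path can be cached instead of being recomputed for every operation.
static class PartitionKeyPathCache
{
    static readonly ConcurrentDictionary<string, string[]> cache = new();

    public static string[] GetSegments(string partitionKeyPath) =>
        cache.GetOrAdd(
            partitionKeyPath,
            path => path.Split('/', StringSplitOptions.RemoveEmptyEntries));
}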

Could not enumerate all types for 'NServiceBus.Persistence.CosmosDB'

Hi,

I have been having some issues during a recent upgrade/migration to .NET 6.0, following on from this announcement. I originally raised my issue on the Core repository. This issue details the exception and stack trace that I got after upgrading to NServiceBus.Persistence.CosmosDB 1.0.0. This package does not play nicely when I introduce another reference to Microsoft.Azure.Cosmos, specifically any reference to a version > 3.20.1.

Support for Custom Saga Finders

Hi,

It would be great to have support for Custom Saga finders for the CosmosDB Persistence package. It appears as though we do not support it, see SagaIdGenerator.

I ran into this issue while I was building a more complicated look-up scenario that involved correlating two message headers, like [Type]_[Id]. I can work around this by creating a new header with this value, allowing me to use the more conventional look-up mechanisms under ConfigureHowToFindSaga.
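
A sketch of that workaround, assuming an NServiceBus version that supports mapping a saga correlation property to a message header; the message, header, and property names are made up, and the combined [Type]_[Id] value is assumed to be written into the header before the message reaches the saga:

using System.Threading.Tasks;
using NServiceBus;

class CorrelatedMessage : ICommand
{
}

class LookupSaga : Saga<LookupSaga.LookupSagaData>,
    IAmStartedByMessages<CorrelatedMessage>
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<LookupSagaData> mapper)
    {
        // Correlate on a header that carries the combined "[Type]_[Id]" value.
        mapper.MapSaga(saga => saga.CorrelationValue)
            .ToMessageHeader<CorrelatedMessage>("My.Correlation");
    }

    public Task Handle(CorrelatedMessage message, IMessageHandlerContext context)
        => Task.CompletedTask;

    public class LookupSagaData : ContainSagaData
    {
        public string CorrelationValue { get; set; }
    }
}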

I'm anticipating that this was intentional, given the resource-consumption concerns when querying CosmosDB incorrectly (i.e. not within a single logical partition).

Many thanks,

Provide a little more info around what happens with high contention creating a saga

Describe the suggested improvement

Is your improvement related to a problem? Please describe.

Not a problem really, just an easily misunderstood behavior of CosmosDB saga persistence. What happens when many similar events arrive close together that map to the same id for the saga?

Describe the suggested solution

Make the expectations for this behavior clearer so devs know that it is important to avoid it. Could you run the tests for this PR so that I can link to the results in some internal docs for my team? After the tests are run, I am fine with closing the PR without merging.
#653

Describe alternatives you've considered

SQL Persistence handles this better?

Additional Context

No response

Add support for pessimistic lease lock when handling sagas

In high contention scenarios, for example scatter-gather, saga performance may suffer due to a large number of message retries.

The performance decrease is caused by optimistic concurrency control, forcing many messages related to the same saga instance to be retried multiple times. In some cases, retries may even be exhausted, and messages may end up in the error queue.

In such cases, pessimistic locking may be a better approach. With pessimistic locking, message processing is serialized, and unnecessary retries are avoided.

CosmosDB doesn't support pessimistic locking of documents. The implementation is based on a lease-lock model.
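
Conceptually, the lease model means a processor claims the saga record for a limited time, and competing processors wait and retry instead of failing with an OCC conflict. The sketch below is an illustration of that idea only, not the persistence's implementation; all names are made up:

using System;
using System.Threading.Tasks;

// Illustration only: poll until a lease on the saga record can be claimed
// (e.g. via an ETag-guarded write), instead of failing on an OCC conflict.
static class LeaseLock
{
    public static async Task<LeaseToken> AcquireAsync(
        Func<Task<LeaseToken>> tryClaimLease, TimeSpan pollInterval, TimeSpan timeout)
    {
        var giveUpAt = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < giveUpAt)
        {
            // tryClaimLease returns null while another processor holds the lease.
            var token = await tryClaimLease();
            if (token != null)
            {
                return token;
            }
            await Task.Delay(pollInterval);
        }
        throw new TimeoutException("Could not acquire the saga lease.");
    }
}

sealed record LeaseToken(string ETag, DateTime ExpiresAtUtc);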

Provide guidance on RU planning

As NServiceBus uses different CosmosDB operations depending on the user's code and configuration (e.g. whether sagas, the outbox, or both are used), it's not clearly visible how many RUs need to be provisioned for the Cosmos DB account. Given that users have little insight into the actual operations performed by the persistence, it's not possible to calculate the persistence's needs from a user perspective.

Ideally, there would be some guidance that helps users understand the rough RU needs, e.g. per message processed, depending on the activated features. That would help provision an RU capacity that is close enough: not too expensive, yet greatly reducing the risk of running into throttling. This is especially valuable in cases where it might prevent duplicates being created due to the Outbox running into request throttling.

Triage

This triage checklist should be filled in by a staff member from Particular Software.

| Yes / No | Triage criteria |
| --- | --- |
|  | The expected effort to code and document the enhancement is less than 8 hours. |
|  | A reviewer can approve any changes in less than 1 hour. |
|  | The solution is obvious and does not require analysis by a task force. |
|  | The issue is NOT blocking the customer from using the Preview in production. |
|  | The issue is blocking the user from using the Preview in production. |
|  | There's evidence solving the problem will benefit multiple users. |

Outbox operations may leak during concurrent processing or retries

Who's affected

You are affected if:

Symptoms

Concurrent outbox operations for messages whose partition keys are extracted from the message content may leak into each other. This causes the amount of memory used by the endpoint processing those messages to grow, which may lead to out of memory problems and the dispatch of multiple redundant outgoing messages.

Root cause

The ConcurrentStack of transportOperations for the incoming logical message was declared as a static (constant) field and was not being cleared before the outbox transportOperations were added.
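
As an illustration of the general bug pattern only (not the persistence's actual code), a static collection shared across messages keeps accumulating entries unless it is cleared:

using System.Collections.Concurrent;

// Bug pattern: because the stack is static, transport operations from every
// processed message accumulate and leak into later messages unless cleared,
// growing memory use and causing redundant dispatches. The fix is to scope
// the collection to the current incoming message instead.
static class OutboxState
{
    public static readonly ConcurrentStack<object> TransportOperations = new();
}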

Plan of action
