bitfaster / bitfaster.caching
High performance, thread-safe in-memory caching primitives for .NET
License: MIT License
On every lookup, item age is calculated and compared to the TTL. Internally, this results in a call to Stopwatch.GetTimestamp(), which is relatively expensive compared to the dictionary lookup that fetches the item.
Instead of calling Stopwatch.GetTimestamp() on every lookup, a current-time provider object could be used that holds a cached timestamp property. High-frequency calls to get the current timestamp would then become very cheap.
At initialization, this provider would create a new thread/task that updates its cached timestamp at regular intervals. The interval length would depend on the time resolution passed to the constructor. For example, even a resolution of once per second would probably be enough for most purposes, and would make the cost of calling Stopwatch.GetTimestamp() negligible.
class TimeProvider
{
    // creates and runs a background thread (Task)
    public TimeProvider(TimeSpan resolution) { ... }

    // cached timestamp, updated at regular intervals by the background thread
    public long Timestamp { get; private set; }

    ...
}
NOTE: I haven't run any benchmarks to validate this. This is just an idea I had while reading the docs.
First of all, thanks for this gem. Really handy!
Are there any plans to "expire" items and remove them from the dictionary in the background, like MemoryCache does? Something like a background thread/timer that evicts old items.
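Roughly what I have in mind (the TrimExpired name is just a placeholder here, not an existing API, and a timer-based sweep is only one possible approach):

using System;
using System.Threading;

// periodically sweep the cache instead of relying only on eviction during lookups
var sweepTimer = new Timer(
    _ => cache.TrimExpired(),   // placeholder for a method that removes expired entries
    state: null,
    dueTime: TimeSpan.FromMinutes(1),
    period: TimeSpan.FromMinutes(1));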
Use case: basically I am caching some db records, and an external function deletes the records using different properties and not the key. The workaround for now is to read the key from the database before deleting the records, and then use the key to remove them from the cache.
Having an option to enumerate all keys would give me a faster way of checking those properties in memory instead of the database, and that would be sweet.
Additionally, it could also be used for diagnostic purposes.
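As a rough sketch of how that could be used (enumerating the cache as key/value pairs is the requested capability here, and DbRecord/CustomerId are placeholder names):

using System.Linq;

// assumes the cache can be enumerated as KeyValuePair<int, DbRecord>
var keysToRemove = cache
    .Where(kvp => kvp.Value.CustomerId == customerId)   // match on a non-key property
    .Select(kvp => kvp.Key)
    .ToList();

foreach (var key in keysToRemove)
{
    cache.TryRemove(key);
}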
ConcurrentDictionary has an overload of GetOrAdd that takes a value instead of a valueFactory; can an overload be added for the Lru?
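For reference, the dictionary overload in question, plus what the proposed cache equivalent could look like (the cache signature is hypothetical):

using System.Collections.Concurrent;

var dict = new ConcurrentDictionary<int, string>();
string existing = dict.GetOrAdd(1, "one");   // value overload, no factory delegate

// proposed (hypothetical) equivalent on the LRU:
// V GetOrAdd(K key, V value);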
Item-specific expiry TTL.
Probably by inheriting LruItem and providing an LruItemPolicy implementation that applies individual expiry to items.
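A minimal, self-contained sketch of the idea, using hypothetical type names (the library's actual LruItem and policy shapes may differ):

using System;
using System.Diagnostics;

// hypothetical item type that carries its own TTL alongside its creation timestamp
public class ExpiringLruItem<K, V>
{
    public ExpiringLruItem(K key, V value, TimeSpan timeToLive)
    {
        Key = key;
        Value = value;
        TickCount = Stopwatch.GetTimestamp();
        TtlTicks = (long)(timeToLive.TotalSeconds * Stopwatch.Frequency);
    }

    public K Key { get; }
    public V Value { get; }
    public long TickCount { get; }   // creation timestamp in stopwatch ticks
    public long TtlTicks { get; }    // per-item TTL in stopwatch ticks
}

// hypothetical policy that discards each item based on its own TTL
public struct PerItemTtlPolicy<K, V>
{
    public bool ShouldDiscard(ExpiringLruItem<K, V> item)
        => Stopwatch.GetTimestamp() - item.TickCount > item.TtlTicks;
}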
I am currently using FastConcurrentTLru for my caching. For diagnostic purposes I would like to implement a dump feature which basically returns all keys/values. There doesn't seem to be an enumerator option.
How can I achieve this?
It would be nice to be able to evict all items from the cache.
It would be great to be able to use "expire after access" in addition to the already implemented "expire after write" found in ConcurrentTlru.
My use case is resource optimization, unloading unused values, rather than limiting cache duration, which is what I have used the "expire after write" implementation for recently. If I use the "expire after write" implementation, I end up running the same expensive loading code more often.
I could use the plain LRU, but then I am always keeping close to capacity in memory, which results in high resource use.
Caffeine provides expireAfterAccess for the same scenarios: https://github.com/ben-manes/caffeine/wiki/Eviction#time-based
The TryRemove(K) method implementation retrieves the value associated with the given key, but does not return that value. This means that if we need the value associated with the given key, we first need to invoke TryGet(K, out V) to get the value and then invoke TryRemove(K), which is wasteful.
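A short sketch of the current two-step workaround versus the requested single call (the overload below mirrors ConcurrentDictionary.TryRemove(K, out V) and is an assumption, not an existing API):

// today: two lookups are needed to both fetch and remove the value
if (cache.TryGet(key, out var value))
{
    cache.TryRemove(key);
    ReleaseResources(value);   // placeholder for whatever needs the removed value
}

// requested (hypothetical) overload that removes and returns in one call:
// bool TryRemove(K key, out V value);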
Not sure whether doing this would defeat the purpose of being "fast", but what if we want to show the rough memory usage of the cache, maybe in a dashboard? Can we have something like mycache.memoryusage()? I know this may be a problem when caching objects, as there are similar questions out there and no straightforward method of doing that except attaching a profiler and finding it out.
I am using this library to store cached results from a vendor API, and I essentially set up my valueFactory to pull from a database if I already have the result, or to go to the vendor API otherwise.
As part of the contract terms, they set a maximum length of time I am allowed to cache a value. Obviously this is not a concern for a value that came from the vendor's API, but it is a potential concern for a value I pulled from the database. Yes, I can check that the value is still within the absolute limit when it is retrieved from the db, but the current write expiry will give it the full length, not the remaining time to live. Is there any way to have an item's time to live be defined against a property of the value?
For example, Google geocode results can only be used in a cache or stored for 30 days by contract.
I have an LRU cache that is used like this:
1. GetOrAdd is called on an atomic LRU to enumerate the current directory entries and store them in the cache.
2. TryRemove is called on the LRU to evict the entry from the cache.

However, looking at the source code of AtomicFactoryCache, it doesn't look like TryRemove is guarded at all, which means I can run into this scenario:
1. GetOrAdd starts running. It gets the directory enumerations at this point in time.
2. TryRemove is called to remove the item from the cache.
3. GetOrAdd hasn't finished running yet, but the value produced by the value factory will be stale.

I thought initially to set some sort of "stale" flag before and after TryRemove executes, and have the value factory check it before returning (if the stale flag is set at the end of the value factory, it would clear the stale flag and re-evaluate itself). However, I don't believe this will be perfect, because it's still possible for this sequence of events to occur:
1. TryRemove runs and sets the stale flag.

Admittedly the window here is much, much smaller than in the normal case, but I would still like to prevent it.
What I would like is a setting that makes TryRemove not only discard the current value in the cache, but also prevent any current in-flight GetOrAdd requests from storing their results in the cache (it's totally fine for them to return stale data for their individual requests, I just don't want it persisting in the cache).
FastConcurrentTLru uses TLruLongTicksPolicy. This policy discards items via:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public bool ShouldDiscard(LongTickCountLruItem<K, V> item)
{
    if (Stopwatch.GetTimestamp() - item.TickCount > this.timeToLive)
    {
        return true;
    }

    return false;
}
but Stopwatch.GetTimestamp() gives platform-dependent results (the tick frequency differs across platforms); see
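As a rough illustration of the platform difference (not code from the library): Stopwatch.Frequency varies by platform, so a TimeSpan TTL has to be scaled by the stopwatch frequency rather than compared using TimeSpan.Ticks directly.

using System;
using System.Diagnostics;

TimeSpan timeToLive = TimeSpan.FromMinutes(5);

// wrong on platforms where the stopwatch does not tick at 100ns resolution
long naiveTicks = timeToLive.Ticks;

// portable: scale by the actual stopwatch tick frequency
long stopwatchTicks = (long)(timeToLive.TotalSeconds * Stopwatch.Frequency);

Console.WriteLine($"TimeSpan ticks: {naiveTicks}, stopwatch ticks: {stopwatchTicks}, frequency: {Stopwatch.Frequency}");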
Given this unit test:
[Fact]
public void WhenItemEvictedDueToCapacityShouldNoLongerBeRetrievable()
{
    for (var i = 1; i < 100; i++)
    {
        lru.AddOrUpdate(i, i.ToString());
    }

    bool found;
    string value;

    // Last 3 items should be retrievable
    found = lru.TryGet(99, out value);
    found.Should().BeTrue();
    value.Should().Be(99.ToString());

    found = lru.TryGet(98, out value);
    found.Should().BeTrue();
    value.Should().Be(98.ToString());

    found = lru.TryGet(97, out value);
    found.Should().BeTrue();
    value.Should().Be(97.ToString());

    // First 3 items should not be retrievable
    found = lru.TryGet(1, out value);
    found.Should().BeFalse();

    found = lru.TryGet(2, out value);
    found.Should().BeFalse(); // fails at this point

    found = lru.TryGet(3, out value);
    found.Should().BeFalse();
}
Are my assumptions incorrect here?
Getting this as the keys present in the LRU when debugging:
Looks like the first entry of 1 is being successfully evicted, but entries 2, 3, and 4 are for some reason staying in the queue.
FWIW it looks like items 2, 3, and 4 are for some reason "stuck" in the warm queue:
Originally posted by @khellang in #399 (comment)
I think we're bumping up against the same bug. Can you confirm this also affects ConcurrentLru<K, V>?
Is there a workaround? Looping through entries and calling TryRemove?
We're using v2.2.1.
UPDATE: Looping with TryRemove seems to leave the cache in a broken state as well.
Is it possible to serialize the contents of the LRU? I want to use the LRU as a bounded error log, so I can easily view the X most recent errors.
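A minimal sketch of the kind of dump I mean, assuming the LRU can be enumerated as key/value pairs (that enumeration support and the use of System.Text.Json are assumptions for illustration):

using System.Linq;
using System.Text.Json;

// snapshot the current entries and serialize them; the enumeration order is whatever
// the enumerator yields, not necessarily most-recently-used order
var snapshot = lru.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
string json = JsonSerializer.Serialize(snapshot);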
I have a situation where clearing the cache doesn't work. I can reproduce it with the following test case:
[Test]
public async Task TestClear()
{
    var cache = new ConcurrentLfuBuilder<string, string>()
        .WithScheduler(new BackgroundThreadScheduler())
        .WithCapacity(40)
        .WithAtomicGetOrAdd()
        .AsAsyncCache()
        .Build();

    var emptyCount = 0;

    for (var a = 0; a < 200; a++)
    {
        for (var i = 0; i < 10; i++)
        {
            var str = "a" + i;
            await getOrAdd(str);
        }

        cache.Clear();
        cache.Clear(); // attempt to trigger maintenance after previous clear operation

        emptyCount += cache.Count;
    }

    Assert.That(emptyCount, Is.EqualTo(0));

    async Task getOrAdd(string key)
    {
        await cache.GetOrAddAsync(key, Task.FromResult);
    }
}
When I run this test, it fails most of the time, but occasionally succeeds. When it fails, emptyCount varies from 2 to 7.
When I debugged the real situation, I ended up in ConcurrentLfu.Trim. Here, I noticed that:
1. this.Count was 3.
2. this.probationLru had something like 26 nodes in it, but it looked like multiple nodes had the same key.
3. TakeCandidatesInLruOrder produced 3 candidates.
4. TryRemove failed to remove the candidates.

Debugging from the test shows the same pattern: TryRemove fails since this.dictionary doesn't contain the candidate, so it appears to be out of sync.
I know that there is some sort of async maintenance going on (that's why I try to force maintenance in the test with the second Clear), but in the application where the cache is used, the entries don't go away even after a while; the count is stuck at non-zero after clearing.
Hi,
Would you consider adding an ICache.Put(K key, V value) method to add or replace an item in the cache without using a callback to retrieve it?
From reading your documentation, it looks like the GetOrAdd method assumes that the lambda will always be successful in getting an item to add. I could throw an exception from the callback, but that has poor performance (unless returning a null value prevents the key from being stored?).
Also, there are situations where you know the old value for a key is out of date and you already have the replacement value to hand.
Starting from .NET Framework 4.7.2 we have a GetOrAdd overload for ConcurrentDictionary<,> that takes an additional argument which is passed to the value factory delegate. The suggestion is to add the same to ICache<,>.GetOrAdd and IAsyncCache<,>.GetOrAddAsync.
This would eliminate the need to allocate instances for the closures necessary to pass additional context/state or a CancellationToken to the async value factories.
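For illustration, a small self-contained example of the existing ConcurrentDictionary overload, together with what the proposed cache signatures could look like (the cache signatures below are assumptions, not existing APIs):

using System;
using System.Collections.Concurrent;

record User(int Id, string Name);

class Db
{
    public User LoadUser(int id) => new User(id, "Name " + id);
}

class Example
{
    static void Main()
    {
        var db = new Db();
        var dict = new ConcurrentDictionary<int, User>();

        // The TArg overload passes 'db' as explicit state, so the static lambda
        // captures nothing and no closure object is allocated per call.
        User user = dict.GetOrAdd(1, static (key, state) => state.LoadUser(key), db);
        Console.WriteLine(user);

        // Proposed (hypothetical) cache equivalents:
        // V GetOrAdd<TArg>(K key, Func<K, TArg, V> valueFactory, TArg factoryArgument);
        // ValueTask<V> GetOrAddAsync<TArg>(K key, Func<K, TArg, Task<V>> valueFactory, TArg factoryArgument);
    }
}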
Given this test on v2.3.2, running .NET 7 on an M1 Mac:
private record FakeCacheKey(int SomeProperty);

private record FakeCacheValue
{
    public int SomeProperty { get; set; }
};

[Theory]
[InlineData(true)]
[InlineData(false)]
public void WhenClearingCache_ShouldActuallyClearCache(bool shouldClearTwice)
{
    var cache = new ConcurrentTLru<FakeCacheKey, FakeCacheValue>(3, TimeSpan.FromMinutes(10));

    var keyOne = new FakeCacheKey(1);
    var keyTwo = new FakeCacheKey(2);
    var cacheValue = new FakeCacheValue();

    cache.AddOrUpdate(keyOne, cacheValue);
    cache.AddOrUpdate(keyTwo, cacheValue);

    cache.TryGet(keyOne, out var retrievedKeyValueOne);
    retrievedKeyValueOne.Should().BeSameAs(cacheValue);

    cache.TryGet(keyTwo, out var retrievedKeyTwoValue);
    retrievedKeyTwoValue.Should().BeSameAs(cacheValue);

    cache.Clear();

    if (shouldClearTwice)
        cache.Clear();

    cache.TryGet(keyOne, out retrievedKeyValueOne);
    retrievedKeyValueOne.Should().NotBeSameAs(cacheValue);

    cache.TryGet(keyTwo, out retrievedKeyTwoValue);
    retrievedKeyTwoValue.Should().NotBeSameAs(cacheValue);
}
The test is consistently failing except when cache.Clear() is run a second time. Is this a race condition of some sort? Am I just misunderstanding what Clear() is supposed to be doing?
Stepping through the "double clear" test while watching the internal cache state:
It would be nice to have a feature, for those caches that support TTL, that returns the stale value during revalidation of the cached instance, updating the cache in the background once the newer instance is retrieved so that later hits see it.
This would speed up misses by allowing a stale object to be returned during the revalidation window.
This could be optionally defined on the policies or cache options.
Thank you :)
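A rough sketch of the behavior I mean, written against the existing TryGet/AddOrUpdate surface; the Report type and the IsStale/LoadAsync helpers are placeholders for this example, and the real feature would presumably live inside the cache policies rather than in caller code:

// return a possibly-stale value immediately and refresh it in the background
async Task<Report> GetReportAsync(string key)
{
    if (cache.TryGet(key, out var cached))
    {
        if (IsStale(cached))
        {
            // fire-and-forget refresh; later hits observe the updated value
            _ = Task.Run(async () => cache.AddOrUpdate(key, await LoadAsync(key)));
        }

        return cached;
    }

    var fresh = await LoadAsync(key);
    cache.AddOrUpdate(key, fresh);
    return fresh;
}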
It seems like this feature is not available in any of the caches.
Is there a way to configure the cache to not store values? Basically a Set instead of a Dictionary. The best I can come up with is this:
var cache = new FastConcurrentLru<Guid, Object>(capacity: 128);
cache.TryGet(someId, out _);
cache.AddOrUpdate(someId, null);
Something like this would be really nice. It would be a simpler API surface and would probably use less memory internally:
var cache = new FastConcurrentLru[Set]<Guid>(capacity: 128);
cache.TryGet(someId);
cache.AddOrUpdate(someId);
Thank you for a great caching library!
I have a value factory that may throw an exception. I noticed that if this happens, an entry remains in the cache. This happens when I use WithAtomicGetOrAdd.
Example code:
using BitFaster.Caching.Lfu;

var cache = new ConcurrentLfuBuilder<string, bool>().WithAtomicGetOrAdd().Build();

try
{
    _ = cache.GetOrAdd("foo", s => throw new Exception(s));
}
catch
{
    // ignore
}

Console.WriteLine(cache.Count); // prints: 1
Console.WriteLine(string.Join(", ", cache.Keys)); // prints: foo
It happens for an LRU cache as well.
The main problem, as I see it, is that this results in cache eviction. Consider this longer example:
using BitFaster.Caching.Lfu;
using BitFaster.Caching.Scheduler;

var cache = new ConcurrentLfuBuilder<string, bool>()
    .WithAtomicGetOrAdd()
    .WithScheduler(new ForegroundScheduler())
    .WithCapacity(3)
    .Build();

try
{
    _ = cache.GetOrAdd("aa", _ => true);
    _ = cache.GetOrAdd("bb", _ => true);
    _ = cache.GetOrAdd("bb", _ => true);
    _ = cache.GetOrAdd("cc", _ => true);
    _ = cache.GetOrAdd("cc", _ => true);
    _ = cache.GetOrAdd("foo", s => throw new Exception(s));
}
catch
{
    // ignore
}

Console.WriteLine(cache.Count); // prints: 3
Console.WriteLine(string.Join(", ", cache.Keys)); // prints: cc, bb, foo
Here, aa is unexpectedly (in my opinion) evicted from the cache.
var lru = new ConcurrentLru<int, User>(5);
var cur = new ConcurrentDictionary<int, User>();

for (int i = 0; i < 5; i++)
{
    lru.GetOrAdd(i, k =>
    {
        Console.WriteLine("Called LRU " + i);
        return new User { Id = i, UserName = "Name " + k };
    });

    cur.GetOrAdd(i, k =>
    {
        Console.WriteLine("Called Dict " + i);
        return new User { Id = i, UserName = "Name " + k };
    });
}

for (int i = 0; i < 5; i++)
{
    lru.GetOrAdd(i, k =>
    {
        Console.WriteLine("Called Again LRU " + i);
        return new User { Id = i, UserName = "Name " + k };
    });

    cur.GetOrAdd(i, k =>
    {
        Console.WriteLine("Called Again Dict " + i);
        return new User { Id = i, UserName = "Name " + k };
    });
}

Console.ReadLine();
We have a use case where we need to get notified when an item is evicted. Would it be possible to add an event one could wire up to?
For some specific capacity partitions and cache sizes, the cold queue becomes unbounded and increases infinitely.
It can be observed with the test WhenKeysAreContinuouslyRequestedInTheOrderTheyAreAddedCountIsBounded, using WarmFavourPartition with a warm ratio of 0.6 and cache sizes 64 and 128. It affects the ConcurrentLru implementation.
The fix is to not allow items to cycle back from the cold queue to warm if warm is full.
I noticed that I wasn't getting any IntelliSense documentation while using the package, even though you've got documentation comments in the code.
I took a quick look at the project file, and the reason the XML documentation file isn't being included in the package is that <GenerateDocumentationFile>true</GenerateDocumentationFile> is missing from the PropertyGroup.
It would be very helpful to have that added!
Adding a TTL policy that takes a Func(Value, Timestamp) to resolve the timestamp from the value would allow users to have a dynamic TTL based on each item. It would only require a few modifications to TLruLongTicksPolicy.
We sometimes need to cache items where the older the item in the cache, the more likely it is to be accessed. Once accessed, the item is very unlikely to be accessed again. Therefore, we wish to evict/replace most-recently-used items. It would be great if a set of MRU cache types were to be made available in this fantastic library.
Maybe we can increase the speed when https://github.com/VSadov/NonBlocking is used instead of ConcurrentDictionary?
Hi, I'm looking for a better collection than ConcurrentDictionary for a read-heavy website in terms of memory usage. I have several million items of string / objects.
Is this what you are targeting?
Essentially what I am trying to achieve here is to "try to remove this key only if it has this value". To give some background on why this is needed, my use case for this library is to have an LRU cache for resource mappings done by an external server. Every resource mapping comes with an expiration date, so at some point the value has to be refreshed from the server. I am using the AsAsyncCache and WithAtomicGetOrAdd options to cache the result of the network call. Each value inserted into the cache has an ExpiryTime property. After calling GetOrAddAsync, if the ExpiryTime has passed, the existing value has to be removed and replaced with a new one. Here is the basic flow with existing APIs:
var entry = await this.cache.GetOrAddAsync(key, async (k) => ...);

if (entry.ExpiryTime < DateTimeOffset.UtcNow)
{
    this.logger.Log(...);
    this.cache.TryRemove(key);
    entry = await this.cache.GetOrAddAsync(key, async (k) => ...);
}
The problem with TryRemove(K) in this scenario is that multiple threads will access this logic, so there would be a race between TryRemove and GetOrAddAsync. You can correct me if I'm wrong on how those are synchronized, but what I imagine this means is that it's possible for one thread to see an expired entry, remove it, call GetOrAddAsync to cache and return a new value, and then for another thread which had also previously read the expired value to call TryRemove, remove the newly added value, and make another expensive network call in the second GetOrAddAsync. In reality this would be rare, but when it does happen it would artificially lower the cache hit rate.
So the solution here would be to have an atomic operation to "remove this key if it has the old value". With the new API, it would be guaranteed in that situation that the network call is only done by one thread:
var entry = await this.cache.GetOrAddAsync(key, async (k) => ...);

if (entry.ExpiryTime < DateTimeOffset.UtcNow)
{
    this.logger.Log(...);
    this.cache.TryRemove(new KeyValuePair<K, V>(key, entry));
    entry = await this.cache.GetOrAddAsync(key, async (k) => ...);
}
Looking at the ConcurrentLruCore implementation of TryRemove(K), this looks like it should be a relatively simple addition, because it would just skip the enclosing while loop in the existing TryRemove logic and go straight to the atomic remove step. But I'm not sure if there are other factors to consider.
I'm sure there are workarounds for this today, but based on my research I would need a custom IItemPolicy for my cache entry type which customizes ShouldDiscard, and I would need to chain together my own async cache with atomic get-or-add rather than being able to use the ConcurrentLruBuilder. So having this API feels like the simplest option. Let me know what you think. Thanks!
I'd like to start with a thank you for this library. I am evaluating it as a replacement for IMemoryCache from Microsoft and wondering whether it is possible to disable automatic trimming/eviction and handle it manually. My understanding is that if I allocate the cache with a big enough capacity, which I know won't be fully utilized, the trimming/eviction process will not run at all. Is that right?
Scenario:
If I create a cache object with WithCapacity(1_000_000), will I effectively disable the automatic eviction? Also, are there any downsides to allocating a capacity much larger than the application will ever use?