bitfaster / bitfaster.caching
High performance, thread-safe in-memory caching primitives for .NET
License: MIT License
On every lookup, item age is calculated and compared to the TTL. Internally, this results in a call to Stopwatch.GetTimestamp(), which is relatively expensive compared to the dictionary lookup that fetches the item.
Instead of calling Stopwatch.GetTimestamp() on every lookup, a current-time provider object could be used that holds a cached timestamp property. High-frequency calls to get the current timestamp would then become very cheap.
At initialization, this provider would create a new thread/task that updates its cached timestamp at regular intervals. The interval length would depend on the time resolution passed to the constructor. For example, even a resolution of once per second would probably be enough for most purposes, and would make the cost of calling Stopwatch.GetTimestamp() negligible.
class TimeProvider
{
    // creates and runs a background thread (Task)
    public TimeProvider(TimeSpan resolution) { ... }

    // cached timestamp, updated at regular intervals by the background thread
    public long Timestamp { get; private set; }

    ...
}
NOTE: I haven't run any benchmarks to validate this. This is just an idea I had while reading the docs.
First of all, thanks for this gem. Really handy!
Are there any plans to "expire" items and remove them from the dictionary in the background, like MemoryCache does? Something like a background thread/timer that evicts old items.
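Roughly what I have in mind (the TrimExpired name is just a placeholder here, not an existing API, and a timer-based sweep is only one possible approach):

using System;
using System.Threading;

// periodically sweep the cache instead of relying only on eviction during lookups
var sweepTimer = new Timer(
    _ => cache.TrimExpired(),   // placeholder for a method that removes expired entries
    state: null,
    dueTime: TimeSpan.FromMinutes(1),
    period: TimeSpan.FromMinutes(1));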
Use case: basically I am caching some db records, and an external function deletes the records using different properties and not the key. The workaround for now is to read the key from the database before deleting the records, and then use the key to remove them from the cache.
Having an option to enumerate all keys would give me a faster way of checking those properties in memory instead of the database, and that would be sweet.
Additionally, it could also be used for diagnostic purposes.
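As a rough sketch of how that could be used (enumerating the cache as key/value pairs is the requested capability here, and DbRecord/CustomerId are placeholder names):

using System.Linq;

// assumes the cache can be enumerated as KeyValuePair<int, DbRecord>
var keysToRemove = cache
    .Where(kvp => kvp.Value.CustomerId == customerId)   // match on a non-key property
    .Select(kvp => kvp.Key)
    .ToList();

foreach (var key in keysToRemove)
{
    cache.TryRemove(key);
}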
ConcurrentDictionary has an overload of GetOrAdd that takes a value instead of a valueFactory; can an overload be added for the Lru?
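For reference, the dictionary overload in question, plus what the proposed cache equivalent could look like (the cache signature is hypothetical):

using System.Collections.Concurrent;

var dict = new ConcurrentDictionary<int, string>();
string existing = dict.GetOrAdd(1, "one");   // value overload, no factory delegate

// proposed (hypothetical) equivalent on the LRU:
// V GetOrAdd(K key, V value);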
Item-specific expiry TTL.
Probably by inheriting LruItem and providing an LruItemPolicy implementation that applies individual expiry to items.
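A minimal, self-contained sketch of the idea, using hypothetical type names (the library's actual LruItem and policy shapes may differ):

using System;
using System.Diagnostics;

// hypothetical item type that carries its own TTL alongside its creation timestamp
public class ExpiringLruItem<K, V>
{
    public ExpiringLruItem(K key, V value, TimeSpan timeToLive)
    {
        Key = key;
        Value = value;
        TickCount = Stopwatch.GetTimestamp();
        TtlTicks = (long)(timeToLive.TotalSeconds * Stopwatch.Frequency);
    }

    public K Key { get; }
    public V Value { get; }
    public long TickCount { get; }   // creation timestamp in stopwatch ticks
    public long TtlTicks { get; }    // per-item TTL in stopwatch ticks
}

// hypothetical policy that discards each item based on its own TTL
public struct PerItemTtlPolicy<K, V>
{
    public bool ShouldDiscard(ExpiringLruItem<K, V> item)
        => Stopwatch.GetTimestamp() - item.TickCount > item.TtlTicks;
}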
I am currently using FastConcurrentTLru for my caching. For diagnostic purposes I would like to implement a dump feature which basically returns all keys/values. There doesn't seem to be an enumerator option.
How can I achieve this?
It would be nice to be able to evict all items from the cache.
It would be great to be able to use "expire after access" in addition to the already implemented "expire after write" found in ConcurrentTlru.
My use case is resource optimization, unloading unused values, rather than limiting cache duration, which is what I have used the "expire after write" implementation for recently. If I use the "expire after write" implementation, I end up running the same expensive loading code more often.
I could use the plain LRU, but then I am always keeping close to capacity in memory, which results in high resource use.
Caffeine provides expireAfterAccess for the same scenarios: https://github.com/ben-manes/caffeine/wiki/Eviction#time-based
The TryRemove(K) method implementation retrieves the value associated with the given key, but does not return that value. This means that if we need the value associated with the given key, we first need to invoke TryGet(K, out V) to get the value and then invoke TryRemove(K), which is wasteful.
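A short sketch of the current two-step workaround versus the requested single call (the overload below mirrors ConcurrentDictionary.TryRemove(K, out V) and is an assumption, not an existing API):

// today: two lookups are needed to both fetch and remove the value
if (cache.TryGet(key, out var value))
{
    cache.TryRemove(key);
    ReleaseResources(value);   // placeholder for whatever needs the removed value
}

// requested (hypothetical) overload that removes and returns in one call:
// bool TryRemove(K key, out V value);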
Not sure whether doing this would defeat the purpose of being "fast", but what if we want to show the rough memory usage of the cache, maybe in a dashboard? Can we have something like mycache.memoryusage()? I know this may be a problem when caching objects, as there are similar questions out there and no straightforward method of doing that except attaching a profiler and finding it out.
I am using this library to store cached results from a vendor API, and I essentially set up my valueFactory to pull from a database if I already have the result, or to go to the vendor API otherwise.
As part of the contract terms, they set a maximum length of time I am allowed to cache a value. Obviously this is not a concern for a value that came from the vendor's API, but it is a potential concern for a value I pulled from the database. Yes, I can check that the value is still within the absolute limit when it is retrieved from the db, but the current write expiry will give it the full length, not the remaining time to live. Is there any way to have an item's time to live be defined against a property of the value?
For example, Google geocode results can only be used in a cache or stored for 30 days by contract.
I have an LRU cache that is used like this:
1. GetOrAdd is called on an atomic LRU to enumerate the current directory entries and store them in the cache.
2. TryRemove is called on the LRU to evict the entry from the cache.

However, looking at the source code of AtomicFactoryCache, it doesn't look like TryRemove is guarded at all, which means I can run into this scenario:
1. GetOrAdd starts running. It gets the directory enumerations at this point in time.
2. TryRemove is called to remove the item from the cache.
3. GetOrAdd hasn't finished running yet, but the value produced by the value factory will be stale.

I thought initially to set some sort of "stale" flag before and after TryRemove executes, and have the value factory check it before returning (if the stale flag is set at the end of the value factory, it would clear the stale flag and re-evaluate itself). However, I don't believe this will be perfect, because it's still possible for this sequence of events to occur:
1. TryRemove runs and sets the stale flag.

Admittedly the window here is much, much smaller than in the normal case, but I would still like to prevent it.
What I would like is a setting that makes TryRemove not only discard the current value in the cache, but also prevent any current in-flight GetOrAdd requests from storing their results in the cache (it's totally fine for them to return stale data for their individual requests, I just don't want it persisting in the cache).
FastConcurrentTLru uses TLruLongTicksPolicy. This policy discards items via:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public bool ShouldDiscard(LongTickCountLruItem<K, V> item)
{
    if (Stopwatch.GetTimestamp() - item.TickCount > this.timeToLive)
    {
        return true;
    }

    return false;
}
but Stopwatch.GetTimestamp() gives platform-dependent results (the tick frequency differs across platforms); see
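As a rough illustration of the platform difference (not code from the library): Stopwatch.Frequency varies by platform, so a TimeSpan TTL has to be scaled by the stopwatch frequency rather than compared using TimeSpan.Ticks directly.

using System;
using System.Diagnostics;

TimeSpan timeToLive = TimeSpan.FromMinutes(5);

// wrong on platforms where the stopwatch does not tick at 100ns resolution
long naiveTicks = timeToLive.Ticks;

// portable: scale by the actual stopwatch tick frequency
long stopwatchTicks = (long)(timeToLive.TotalSeconds * Stopwatch.Frequency);

Console.WriteLine($"TimeSpan ticks: {naiveTicks}, stopwatch ticks: {stopwatchTicks}, frequency: {Stopwatch.Frequency}");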
Given this unit test:
[Fact]
public void WhenItemEvictedDueToCapacityShouldNoLongerBeRetrievable()
{
    for (var i = 1; i < 100; i++)
    {
        lru.AddOrUpdate(i, i.ToString());
    }

    bool found;
    string value;

    // Last 3 items should be retrievable
    found = lru.TryGet(99, out value);
    found.Should().BeTrue();
    value.Should().Be(99.ToString());

    found = lru.TryGet(98, out value);
    found.Should().BeTrue();
    value.Should().Be(98.ToString());

    found = lru.TryGet(97, out value);
    found.Should().BeTrue();
    value.Should().Be(97.ToString());

    // First 3 items should not be retrievable
    found = lru.TryGet(1, out value);
    found.Should().BeFalse();

    found = lru.TryGet(2, out value);
    found.Should().BeFalse(); // fails at this point

    found = lru.TryGet(3, out value);
    found.Should().BeFalse();
}
Are my assumptions incorrect here?
Getting this as the keys present in the LRU when debugging:
Looks like the first entry of 1 is being successfully evicted, but entries 2, 3, and 4 are for some reason staying in the queue.
FWIW it looks like items 2, 3, and 4 are for some reason "stuck" in the warm queue:
Originally posted by @khellang in #399 (comment)
I think we're bumping up against the same bug. Can you confirm this also affects ConcurrentLru<K, V>?
Is there a workaround? Looping through entries and calling TryRemove?
We're using v2.2.1.
UPDATE: Looping with TryRemove seems to leave the cache in a broken state as well.
Is it possible to serialize the contents of the LRU? I want to use the LRU as a bounded error log, so I can easily view the X most recent errors.
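A minimal sketch of the kind of dump I mean, assuming the LRU can be enumerated as key/value pairs (that enumeration support and the use of System.Text.Json are assumptions for illustration):

using System.Linq;
using System.Text.Json;

// snapshot the current entries and serialize them; the enumeration order is whatever
// the enumerator yields, not necessarily most-recently-used order
var snapshot = lru.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
string json = JsonSerializer.Serialize(snapshot);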
I have a situation where clearing the cache doesn't work. I can reproduce it with the following test case:
[Test]
public async Task TestClear()
{
    var cache = new ConcurrentLfuBuilder<string, string>()
        .WithScheduler(new BackgroundThreadScheduler())
        .WithCapacity(40)
        .WithAtomicGetOrAdd()
        .AsAsyncCache()
        .Build();

    var emptyCount = 0;

    for (var a = 0; a < 200; a++)
    {
        for (var i = 0; i < 10; i++)
        {
            var str = "a" + i;
            await getOrAdd(str);
        }

        cache.Clear();
        cache.Clear(); // attempt to trigger maintenance after previous clear operation

        emptyCount += cache.Count;
    }

    Assert.That(emptyCount, Is.EqualTo(0));

    async Task getOrAdd(string key)
    {
        await cache.GetOrAddAsync(key, Task.FromResult);
    }
}
When I run this test, it fails most of the time, but occasionally succeeds. When it fails, emptyCount varies from 2 to 7.
When I debugged the real situation, I ended up in ConcurrentLfu.Trim. Here, I noticed that:
1. this.Count was 3.
2. this.probationLru had something like 26 nodes in it, but it looked like multiple nodes had the same key.
3. TakeCandidatesInLruOrder produced 3 candidates.
4. TryRemove failed to remove the candidates.

Debugging from the test shows the same pattern: TryRemove fails since this.dictionary doesn't contain the candidate, so it appears to be out of sync.
I know that there is some sort of async maintenance going on (that's why I try to force maintenance in the test with the second Clear), but in the application where the cache is used, the entries don't go away even after a while; the count is stuck at non-zero after clearing.
Hi,
Would you consider adding an ICache.Put(K key, V value) method to add or replace an item in the cache without using a callback to retrieve it?
From reading your documentation, it looks like the GetOrAdd method assumes that the lambda will always be successful in getting an item to add. I could throw an exception from the callback, but that has poor performance (unless returning a null value prevents the key from being stored?).
Also, there are situations where you know the old value for a key is out of date and you already have the replacement value to hand.
Starting from .NET Framework 4.7.2 we have a GetOrAdd overload for ConcurrentDictionary<,> that takes an additional argument which is passed to the value factory delegate. The suggestion is to add the same to ICache<,>.GetOrAdd and IAsyncCache<,>.GetOrAddAsync.
This would eliminate the need to allocate instances for the closures necessary to pass additional context/state or a CancellationToken to the async value factories.
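For illustration, a small self-contained example of the existing ConcurrentDictionary overload, together with what the proposed cache signatures could look like (the cache signatures below are assumptions, not existing APIs):

using System;
using System.Collections.Concurrent;

record User(int Id, string Name);

class Db
{
    public User LoadUser(int id) => new User(id, "Name " + id);
}

class Example
{
    static void Main()
    {
        var db = new Db();
        var dict = new ConcurrentDictionary<int, User>();

        // The TArg overload passes 'db' as explicit state, so the static lambda
        // captures nothing and no closure object is allocated per call.
        User user = dict.GetOrAdd(1, static (key, state) => state.LoadUser(key), db);
        Console.WriteLine(user);

        // Proposed (hypothetical) cache equivalents:
        // V GetOrAdd<TArg>(K key, Func<K, TArg, V> valueFactory, TArg factoryArgument);
        // ValueTask<V> GetOrAddAsync<TArg>(K key, Func<K, TArg, Task<V>> valueFactory, TArg factoryArgument);
    }
}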
Given this test on v2.3.2, running .NET 7 on an M1 Mac:
private record FakeCacheKey(int SomeProperty);

private record FakeCacheValue
{
    public int SomeProperty { get; set; }
};

[Theory]
[InlineData(true)]
[InlineData(false)]
public void WhenClearingCache_ShouldActuallyClearCache(bool shouldClearTwice)
{
    var cache = new ConcurrentTLru<FakeCacheKey, FakeCacheValue>(3, TimeSpan.FromMinutes(10));

    var keyOne = new FakeCacheKey(1);
    var keyTwo = new FakeCacheKey(2);
    var cacheValue = new FakeCacheValue();

    cache.AddOrUpdate(keyOne, cacheValue);
    cache.AddOrUpdate(keyTwo, cacheValue);

    cache.TryGet(keyOne, out var retrievedKeyValueOne);
    retrievedKeyValueOne.Should().BeSameAs(cacheValue);

    cache.TryGet(keyTwo, out var retrievedKeyTwoValue);
    retrievedKeyTwoValue.Should().BeSameAs(cacheValue);

    cache.Clear();

    if (shouldClearTwice)
        cache.Clear();

    cache.TryGet(keyOne, out retrievedKeyValueOne);
    retrievedKeyValueOne.Should().NotBeSameAs(cacheValue);

    cache.TryGet(keyTwo, out retrievedKeyTwoValue);
    retrievedKeyTwoValue.Should().NotBeSameAs(cacheValue);
}
The test is consistently failing except when cache.Clear() is run a second time. Is this a race condition of some sort? Am I just misunderstanding what Clear() is supposed to be doing?
Stepping through the "double clear" test while watching the internal cache state:
It would be nice to have a feature, for those caches that support TTL, that returns the stale value during revalidation of the cached instance, updating the cache in the background once the newer instance is retrieved so that later hits see it.
This would speed up misses by allowing a stale object to be returned during the revalidation window.
This could be optionally defined on the policies or cache options.
Thank you :)
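A rough sketch of the behavior I mean, written against the existing TryGet/AddOrUpdate surface; the Report type and the IsStale/LoadAsync helpers are placeholders for this example, and the real feature would presumably live inside the cache policies rather than in caller code:

// return a possibly-stale value immediately and refresh it in the background
async Task<Report> GetReportAsync(string key)
{
    if (cache.TryGet(key, out var cached))
    {
        if (IsStale(cached))
        {
            // fire-and-forget refresh; later hits observe the updated value
            _ = Task.Run(async () => cache.AddOrUpdate(key, await LoadAsync(key)));
        }

        return cached;
    }

    var fresh = await LoadAsync(key);
    cache.AddOrUpdate(key, fresh);
    return fresh;
}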
It seems like this feature is not available in any of the caches.
Is there a way to configure the cache to not store values? Basically a Set instead of a Dictionary. The best I can come up with is this:
var cache = new FastConcurrentLru<Guid, Object>(capacity: 128);
cache.TryGet(someId, out _);
cache.AddOrUpdate(someId, null);
Something like this would be really nice. It would be a simpler API surface and would probably use less memory internally:
var cache = new FastConcurrentLru[Set]<Guid>(capacity: 128);
cache.TryGet(someId);
cache.AddOrUpdate(someId);
Thank you for a great caching library!
I have a value factory that may throw an exception. I noticed that if this happens, an entry remains in the cache. This happens when I use WithAtomicGetOrAdd.
Example code:
using BitFaster.Caching.Lfu;

var cache = new ConcurrentLfuBuilder<string, bool>().WithAtomicGetOrAdd().Build();

try
{
    _ = cache.GetOrAdd("foo", s => throw new Exception(s));
}
catch
{
    // ignore
}

Console.WriteLine(cache.Count); // prints: 1
Console.WriteLine(string.Join(", ", cache.Keys)); // prints: foo
It happens for an LRU cache as well.
The main problem, as I see it, is that this results in cache eviction. Consider this longer example:
using BitFaster.Caching.Lfu;
using BitFaster.Caching.Scheduler;

var cache = new ConcurrentLfuBuilder<string, bool>()
    .WithAtomicGetOrAdd()
    .WithScheduler(new ForegroundScheduler())
    .WithCapacity(3)
    .Build();

try
{
    _ = cache.GetOrAdd("aa", _ => true);
    _ = cache.GetOrAdd("bb", _ => true);
    _ = cache.GetOrAdd("bb", _ => true);
    _ = cache.GetOrAdd("cc", _ => true);
    _ = cache.GetOrAdd("cc", _ => true);
    _ = cache.GetOrAdd("foo", s => throw new Exception(s));
}
catch
{
    // ignore
}

Console.WriteLine(cache.Count); // prints: 3
Console.WriteLine(string.Join(", ", cache.Keys)); // prints: cc, bb, foo
Here, aa is unexpectedly (in my opinion) evicted from the cache.
var lru = new ConcurrentLru<int, User>(5);
var cur = new ConcurrentDictionary<int, User>();

for (int i = 0; i < 5; i++)
{
    lru.GetOrAdd(i, k =>
    {
        Console.WriteLine("Called LRU " + i);
        return new User { Id = i, UserName = "Name " + k };
    });

    cur.GetOrAdd(i, k =>
    {
        Console.WriteLine("Called Dict " + i);
        return new User { Id = i, UserName = "Name " + k };
    });
}

for (int i = 0; i < 5; i++)
{
    lru.GetOrAdd(i, k =>
    {
        Console.WriteLine("Called Again LRU " + i);
        return new User { Id = i, UserName = "Name " + k };
    });

    cur.GetOrAdd(i, k =>
    {
        Console.WriteLine("Called Again Dict " + i);
        return new User { Id = i, UserName = "Name " + k };
    });
}

Console.ReadLine();
We have a use case where we need to get notified when an item is evicted. Would it be possible to add an event one could wire up to?
For some specific capacity partitions and cache sizes, the cold queue becomes unbounded and increases infinitely.
It can be observed with the test WhenKeysAreContinuouslyRequestedInTheOrderTheyAreAddedCountIsBounded, using WarmFavourPartition with a warm ratio of 0.6 and cache sizes 64 and 128. It affects the ConcurrentLru implementation.
The fix is to not allow items to cycle back from the cold queue to warm if warm is full.
I noticed that I wasn't getting any IntelliSense documentation while using the package, even though you've got documentation comments in the code.
I took a quick look at the project file, and the reason the XML documentation file isn't being included in the package is that <GenerateDocumentationFile>true</GenerateDocumentationFile> is missing from the PropertyGroup.
It would be very helpful to have that added!
Adding a TTL policy that takes a Func(Value, Timestamp) to resolve the timestamp from the value would allow users to have a dynamic TTL based on each item. It would only require a few modifications to TLruLongTicksPolicy.
We sometimes need to cache items where the older the item in the cache, the more likely it is to be accessed. Once accessed, the item is very unlikely to be accessed again. Therefore, we wish to evict/replace most-recently-used items. It would be great if a set of MRU cache types were to be made available in this fantastic library.
Maybe we can increase the speed when https://github.com/VSadov/NonBlocking is used instead of ConcurrentDictionary?
Hi, I'm looking for a better collection than ConcurrentDictionary for a read-heavy website in terms of memory usage. I have several million items of string / objects.
Is this what you are targeting?
Essentially what I am trying to achieve here is to "try to remove this key only if it has this value". To give some background on why this is needed, my use case for this library is to have an LRU cache for resource mappings done by an external server. Every resource mapping comes with an expiration date, so at some point the value has to be refreshed from the server. I am using the AsAsyncCache and WithAtomicGetOrAdd options to cache the result of the network call. Each value inserted into the cache has an ExpiryTime property. After calling GetOrAddAsync, if the ExpiryTime has passed, the existing value has to be removed and replaced with a new one. Here is the basic flow with existing APIs:
var entry = await this.cache.GetOrAddAsync(key, async (k) => ...);

if (entry.ExpiryTime < DateTimeOffset.UtcNow)
{
    this.logger.Log(...);
    this.cache.TryRemove(key);
    entry = await this.cache.GetOrAddAsync(key, async (k) => ...);
}
The problem with TryRemove(K) in this scenario is that multiple threads will access this logic, so there would be a race between TryRemove and GetOrAddAsync. You can correct me if I'm wrong on how those are synchronized, but what I imagine this means is that it's possible for one thread to see an expired entry, remove it, call GetOrAddAsync to cache and return a new value, and then for another thread which had also previously read the expired value to call TryRemove, remove the newly added value, and make another expensive network call in the second GetOrAddAsync. In reality this would be rare, but when it does happen it would artificially lower the cache hit rate.
So the solution here would be to have an atomic operation to "remove this key if it has the old value". With the new API, it would be guaranteed in that situation that the network call is only done by one thread:
var entry = await this.cache.GetOrAddAsync(key, async (k) => ...);

if (entry.ExpiryTime < DateTimeOffset.UtcNow)
{
    this.logger.Log(...);
    this.cache.TryRemove(new KeyValuePair<K, V>(key, entry));
    entry = await this.cache.GetOrAddAsync(key, async (k) => ...);
}
Looking at the ConcurrentLruCore implementation of TryRemove(K), this looks like it should be a relatively simple addition, because it would just skip the enclosing while loop in the existing TryRemove logic and go straight to the atomic remove step. But I'm not sure if there are other factors to consider.
I'm sure there are workarounds for this today, but based on my research I would need a custom IItemPolicy for my cache entry type which customizes ShouldDiscard, and I would need to chain together my own async cache with atomic get-or-add rather than being able to use the ConcurrentLruBuilder. So having this API feels like the simplest option. Let me know what you think. Thanks!
I'd like to start with a thank you for this library. I am evaluating it as a replacement for IMemoryCache from Microsoft and wondering whether it is possible to disable automatic trimming/eviction and handle it manually. My understanding is that if I allocate the cache with a big enough capacity, which I know won't be fully utilized, the trimming/eviction process will not run at all. Is that right?
Scenario:
If I create a cache object with WithCapacity(1_000_000), will I effectively disable the automatic eviction? Also, are there any downsides to allocating a capacity much larger than the application will ever use?