Giter Club home page Giter Club logo

Comments (9)

bitfaster avatar bitfaster commented on May 30, 2024 1

You beat me to it - I was going to try with ForegroundScheduler.

I think the reason ForegroundScheduler is working and BackgroundThreadScheduler fails (even when calling Clear twice) is due to a bug - the Clear method is immediately purging the write buffer which contains the removed items that were just added by TryRemove - they are removed before they have been processed in the background. Foreground scheduler works because TryRemove is fully completed before the write buffer is cleared.

What isn't clear to me is how the unit test for this can pass. I will debug this in detail.

from bitfaster.caching.

bitfaster avatar bitfaster commented on May 30, 2024 1

I looked at this some more yesterday - the code is wrong and leaves orphan nodes in the window/probation/protected LRUs. These are the bogus items you observed in the probation LRU.

The unit test I have passes because it only validates via the count and whether the removed keys exist. Both count and exists are based on the internal dictionary. Clear internally calls TryRemove which immediately removes the key from the dictionary (thus test criteria is met and the unit test passes). TryRemove also stores a record of removal in the write buffer (so the LRUs can be cleaned up by maintenance later). But since Clear immediately drops the buffers before maintenance, orphans are left behind in window/probation/protected. These are not observable by the test when performing only a single iteration of clear.

In your test, it is possible background maintenance runs before Clear drops the write buffer (so it can pass sometimes). If maintenance doesn't run, orphans are left behind and subsequent iterations build up more orphans. Since Clear will attempt to Trim n cache entries, the orphans with dupe keys will be processed ahead of other valid items that appear in the list after n items are processed. For example, let's say the cache is size 3 and contains keys [a, b, c], but the probation LRU ends up with orphans of key a, so contains [a, a, a, b, c]. Trim will take n=3 candidates from probation, then attempt to remove key a 3 times. The next time around, it may have cleared all the orphans and produce a valid result, so it works intermittently.

I will fix this so that the write buffer items are not dropped. I will also implement an integrity check in the tests to validate the internal data structures are in a consistent state to prevent re-introducing this bug. Last month I added some soak tests for ConcurrentLru (multi-threaded tests that interleave different read/write method pairs), I will also get this done for LFU and check for any other similar bugs.

from bitfaster.caching.

provegard avatar provegard commented on May 30, 2024

Update: I tested with ForegroundScheduler, and that makes the test green of course, which means that the test doesn't reflect the real-world situation, where BackgroundThreadScheduler supposedly never gets around to clean things up.

from bitfaster.caching.

provegard avatar provegard commented on May 30, 2024

I would have thought that calling Clear multiple times would work, since it calls Maintenance internally, which also is what the background scheduler does.

from bitfaster.caching.

khellang avatar khellang commented on May 30, 2024

Hey @bitfaster! 👋🏻

I think we're bumping up against the same bug. Can you confirm this also affects ConcurrentLru<K, V>?

Is there a workaround? Looping through entries and calling TryRemove?

We're using v2.2.1.

UPDATE: Looping with TryRemove seems to leave the cache in a broken state as well 😅

from bitfaster.caching.

khellang avatar khellang commented on May 30, 2024

Moved repro to #402

from bitfaster.caching.

khellang avatar khellang commented on May 30, 2024

Moved repro to #402

from bitfaster.caching.

khellang avatar khellang commented on May 30, 2024

Moved repro to #402

from bitfaster.caching.

bitfaster avatar bitfaster commented on May 30, 2024

@khellang I created a new issue for ConcurrentLru here #402, and copied over all your notes.

from bitfaster.caching.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.