
Comments (6)

otoolep commented on June 13, 2024

Yeah, I like that. Expose the information, so it's there for diagnostics and power users if needed.

from rqlite.

otoolep commented on June 13, 2024

Actually, it wouldn't need to change the busy timeout. Just execute the SQL statement (PRAGMA wal_checkpoint) with a timeout; the SQLite driver supports this via context objects.


otoolep commented on June 13, 2024

An alternative idea, which might be more elegant, would be to use channels (or condition variables) to signal when there are Store.Request() or Store.Query() calls active. Whenever the number of those requests goes from non-zero to zero, a window has opened up in which a snapshot could take place. Use that window to block further read requests, check whether a snapshot is needed due to WAL size, and take it if necessary.
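
One way this could look with a condition variable, as a rough sketch only. `readGate` and its methods are hypothetical names, not part of rqlite's Store; `needed` stands in for the WAL-size check and `snapshot` for the actual snapshot.

```go
package main

import "sync"

// readGate counts in-flight reads and opens a snapshot window only when
// that count is zero. Hypothetical type, for illustration.
type readGate struct {
	mu           sync.Mutex
	cond         *sync.Cond
	active       int  // reads currently running
	snapshotting bool // true while the snapshot window is held
}

func newReadGate() *readGate {
	g := &readGate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// Begin registers a read, waiting while a snapshot holds the window.
func (g *readGate) Begin() {
	g.mu.Lock()
	for g.snapshotting {
		g.cond.Wait()
	}
	g.active++
	g.mu.Unlock()
}

// End unregisters a read and wakes waiters when the count drops to zero.
func (g *readGate) End() {
	g.mu.Lock()
	g.active--
	if g.active == 0 {
		g.cond.Broadcast()
	}
	g.mu.Unlock()
}

// SnapshotWhenIdle waits until no reads are active, blocks new reads,
// and runs snapshot only if needed reports true.
func (g *readGate) SnapshotWhenIdle(needed func() bool, snapshot func() error) error {
	g.mu.Lock()
	for g.active > 0 || g.snapshotting {
		g.cond.Wait()
	}
	g.snapshotting = true
	g.mu.Unlock()

	var err error
	if needed() {
		err = snapshot()
	}

	g.mu.Lock()
	g.snapshotting = false
	g.cond.Broadcast() // let blocked reads proceed
	g.mu.Unlock()
	return err
}
```

Note that this sketch only waits for a natural gap in reads; if reads overlap continuously the count never reaches zero and the window never opens.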


Tjstretchalot commented on June 13, 2024

I like the alternative idea. The only thing I would add is a hint, visible to clients, indicating whether a snapshot is desired right now, perhaps as an addition to one of the existing endpoints. For example, adding "snapshot_overdue_ms": 123 to store in /db/info, giving the number of milliseconds by which snapshotting is overdue. The same metric could also decide whether to stop accepting new queries on the instance so that snapshotting can complete, e.g., a setting like "if the WAL snapshot is more than 5s overdue, return 503 to all new none or weak level queries to the instance".

This would be useful both for analytics and for potentially deferring non-urgent requests, and would guarantee snapshotting completes.


Tjstretchalot commented on June 13, 2024

Building on that, if additional 503s that aren't opt-in aren't desirable, this could instead be accomplished as a precondition on the request, similar to freshness. I can't think of a good name for it, but e.g.

POST /db/query?level=none&freshness=5m&only_if_snapshot_overdue_less_than=5s

This would make the 503s opt-in, which could be friendlier for environments where client retries aren't robustly implemented (e.g., short-term projects, one-off scripts, etc.).


Tjstretchalot commented on June 13, 2024

For additional context, I'm relinking https://sqlite.org/forum/forumpost/d6ac6bf752 here. My interpretation of the SQLite forum post you made is that it's not just one read query that slows it down, it's this scenario:

req6 |           xxxxx
req5 |         xxxxx
req4 |       xxxxx
req3 |     xxxxx
req2 |   xxxxx
req1 | xxxxx
walL | 111112233445566   
     -----------------
       time ->

reqN = incoming none-level request N
walL = WAL snapshotting is prevented due to lock,
       indicating the oldest owner of the lock
       (1,2,3,4,5,6 = request number)

As I understand it, this can be resolved in only one of two ways:

  • TRUNCATE tries to take an exclusive lock on the WAL, which prevents reads/writes from going through. Your doubling of the time for this lock, assuming the lock strategy is fair, means TRUNCATE will eventually wait longer than any read running alongside it, and hence succeed:
req6 |           BBxxxxx
req5 |         BBBBxxxxx
req4 |       xxxxx    
req3 |     Bxxxxx
req2 |   xxxxx
req1 | xxxxx
walL | 11111223344 55566   
snap |  F  BF  BBBS  F
     -------------------
       time ->

reqN = incoming none-level request N; x means running,
       B means blocked
walL = WAL snapshotting is prevented due to lock,
       indicating the oldest owner of the lock
       (1,2,3,4,5,6 = request number)
snap = WAL snapshot attempt; B = blocked, F = fail,
       S = success

Depending on the system this could be pretty decent; you don't get failures on your none-queries, but they might take a very long time to respond because they spend most of the request time blocked in the extreme case.

  • Incoming requests start being rejected in order to allow snapshotting:
req6 |           F
req5 |         F
req4 |       xxxxx
req3 |     F
req2 |   F
req1 | xxxxx
walL | 11111 44444    
snap |  FRRRS  FRRS
     -------------------
       time ->

reqN = incoming none-level request N; x means running,
       F means failed due to snapshotting
walL = WAL snapshotting is prevented due to lock,
       indicating the oldest owner of the lock
       (1,2,3,4,5,6 = request number)
snap = WAL snapshot attempt; F = fail, R = causing rejections
       to new queries, S = success

Depending on the system this could be pretty decent; if you have a large enough cluster, presumably there is at least one node not snapshotting at a time and so standard retrying on other nodes will find you a place to put your request. Requests are fast in one sense: they either instantly fail, or are dominated by the actual time required for the query.

Personally, I prefer the latter since it allows for more control later (e.g., prioritizing queries), because rqlite is managing the failures rather than SQLite's locks, which treat all queries equally.
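
A rough sketch of what rqlite-managed rejection could look like. The `admission` type and its methods are hypothetical, and the policy for when a snapshot becomes pending is left to the caller.

```go
package main

import (
	"errors"
	"sync"
)

// errSnapshotPending is what a rejected read receives; an HTTP layer
// would likely map it to a 503 so the client retries on another node.
var errSnapshotPending = errors.New("snapshot pending, retry on another node")

// admission decides whether new reads may start. Hypothetical type.
type admission struct {
	mu      sync.Mutex
	active  int  // reads currently running
	pending bool // a snapshot is waiting for reads to drain
}

// TryBegin admits a read unless a snapshot is pending. It never blocks,
// so a request either fails at once or runs at full speed.
func (a *admission) TryBegin() error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.pending {
		return errSnapshotPending
	}
	a.active++
	return nil
}

// End marks a previously admitted read as finished.
func (a *admission) End() {
	a.mu.Lock()
	a.active--
	a.mu.Unlock()
}

// SetPending turns rejection of new reads on or off.
func (a *admission) SetPending(p bool) {
	a.mu.Lock()
	a.pending = p
	a.mu.Unlock()
}

// Idle reports whether no reads are running, i.e. the snapshot can go.
func (a *admission) Idle() bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.active == 0
}
```

Because the decision is made in `TryBegin` rather than inside SQLite's locking, it is the natural place to add prioritization later, for example by admitting some classes of query even while a snapshot is pending.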

