Comments (6)
Yeah, I like that. Expose the information, so it's there for diagnostics and power users if needed.
from rqlite.
Actually, it wouldn't need to change the busy timeout. Just exec the SQL statement (PRAGMA CHECKPOINT) with a timeout -- the SQLite driver supports it via context objects.
from rqlite.
An alternative idea, which might be more elegant, would be to use channels (or condition variables) to signal when there are Store.Request() or Store.Query() calls active. Whenever the number of those requests goes from non-zero to zero, a window has opened up in which a snapshot could take place. Use that window to block more read requests, check if a snapshot is needed due to WAL size, and do it if necessary.
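A rough sketch of the condition-variable variant, assuming a hypothetical `readGate` type that the store's read paths would call into (none of these names exist in rqlite). It counts in-flight reads, and when the count drops to zero it holds new reads back while the snapshot check runs.

```go
package main

import (
	"fmt"
	"sync"
)

// readGate (hypothetical) tracks in-flight reads and the snapshot window.
type readGate struct {
	mu           sync.Mutex
	cond         *sync.Cond
	active       int  // number of in-flight reads
	snapshotting bool // true while the snapshot window is held
}

func newReadGate() *readGate {
	g := &readGate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// beginRead blocks while a snapshot holds the window, then registers a read.
func (g *readGate) beginRead() {
	g.mu.Lock()
	defer g.mu.Unlock()
	for g.snapshotting {
		g.cond.Wait()
	}
	g.active++
}

// endRead unregisters a read and wakes waiters when the count reaches zero.
func (g *readGate) endRead() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.active--
	if g.active == 0 {
		g.cond.Broadcast()
	}
}

// snapshotWhenIdle waits for the count to reach zero, blocks new reads,
// and runs snapshot only if needed() says so. It reports whether it ran.
func (g *readGate) snapshotWhenIdle(needed func() bool, snapshot func() error) (bool, error) {
	g.mu.Lock()
	for g.active > 0 || g.snapshotting {
		g.cond.Wait()
	}
	g.snapshotting = true
	g.mu.Unlock()

	defer func() {
		g.mu.Lock()
		g.snapshotting = false
		g.mu.Unlock()
		g.cond.Broadcast() // let blocked reads continue
	}()

	if !needed() {
		return false, nil
	}
	return true, snapshot()
}

func main() {
	g := newReadGate()
	g.beginRead()
	done := make(chan bool)
	go func() {
		ran, _ := g.snapshotWhenIdle(
			func() bool { return true }, // e.g. WAL size above a threshold
			func() error { return nil }, // the actual snapshot would go here
		)
		done <- ran
	}()
	g.endRead() // count goes from 1 to 0: the window opens
	fmt.Println("snapshot ran:", <-done)
}
```

This sketch only waits for a window; under a steady stream of overlapping reads the count may never reach zero, so by itself it does not guarantee that a snapshot ever runs.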
from rqlite.
I like the alternative idea. The only part I would add is a hint, visible to clients, for whether a snapshot is desired right now, perhaps as an addition to one of the existing endpoints. For example, adding "snapshot_overdue_ms": 123 to store in /db/info, which is how many milliseconds overdue snapshotting is. This could also be the metric used to decide whether to stop accepting new queries to the instance so that snapshotting can complete, e.g., a setting like "if the WAL snapshot is more than 5s overdue, return 503 to all new none or weak level queries to the instance".
This would be useful both for analytics and for potentially deferring non-urgent requests, and would guarantee snapshotting completes.
from rqlite.
Building on that, if additional 503s that aren't opt-in aren't desirable, it could instead be accomplished as a precondition on the request, similar to freshness. I can't think of a good name for it, but e.g.
POST /db/query?level=none&freshness=5m&only_if_snapshot_overdue_less_than=5s
This would allow the 503s to be opt-in, which could be friendlier for environments where client retries aren't robustly implemented (e.g., short-term projects, one-off scripts, etc.).
from rqlite.
Additional context, which I'm relinking here: my interpretation of the sqlite forum post you made (https://sqlite.org/forum/forumpost/d6ac6bf752) is that it's not just one read query that slows it down, it's this scenario:
req6 |           xxxxx
req5 |         xxxxx
req4 |       xxxxx
req3 |     xxxxx
req2 |   xxxxx
req1 | xxxxx
walL | 111112233445566
-----------------
time ->
reqN = incoming none-level request N
walL = WAL snapshotting is prevented due to lock,
       indicating the oldest owner of the lock
       (1,2,3,4,5,6 = request number)
which can be resolved only in one of 2 ways as I understand it:
- TRUNCATE tries to take an exclusive lock on the WAL, which will prevent reads/writes from going through. Your doubling of the time for this lock, assuming the lock strategy is fair, means that eventually TRUNCATE will wait longer than any reads running at the time and hence succeed:
req6 | BBxxxxx
req5 | BBBBxxxxx
req4 | xxxxx
req3 | Bxxxxx
req2 | xxxxx
req1 | xxxxx
walL | 11111223344 55566
snap | F BF BBBS F
-------------------
time ->
reqN = incoming none-level request N; x means running,
B means blocked
walL = WAL snapshotting is prevented due to lock,
indicating the oldest owner of the lock
(1,2,3,4,5,6 = request number)
snap = WAL snapshot attempt; B = blocked, F = fail,
S = success
Depending on the system this could be pretty decent; you don't get failures on your none-queries, but they might take a very long time to respond because they spend most of the request time blocked in the extreme case.
- Incoming requests start being rejected in order to allow snapshotting:
req6 | F
req5 | F
req4 | xxxxx
req3 | F
req2 | F
req1 | xxxxx
walL | 11111 44444
snap | FRRRS FRRS
-------------------
time ->
reqN = incoming none-level request N; x means running,
F means failed due to snapshotting
walL = WAL snapshotting is prevented due to lock,
indicating the oldest owner of the lock
(1,2,3,4,5,6 = request number)
snap = WAL snapshot attempt; F = fail, R = causing rejections
to new queries, S = success
Depending on the system this could be pretty decent; if you have a large enough cluster, presumably there is at least one node not snapshotting at a time and so standard retrying on other nodes will find you a place to put your request. Requests are fast in one sense: they either instantly fail, or are dominated by the actual time required for the query.
Personally, I prefer the latter since it allows for more control later, e.g., prioritizing queries, since rqlite is managing the failures rather than sqlite's locks, which would treat all the queries equally.
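For anyone who wants to play with the timelines, here is a small sketch that recomputes the walL row from a list of requests. The numbers are my reading of the diagrams, not measurements: requests arrive every 2 ticks and each runs for 5 ticks. The type and function names are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// request is one none-level read in the diagrams (id is a single digit).
type request struct {
	id    int
	start int // tick at which the request begins running
}

// oldestOwnerRow returns, for each tick, the digit of the oldest request
// still running (the one holding back the WAL), or a space if none is.
func oldestOwnerRow(reqs []request, duration int) string {
	end := 0
	for _, r := range reqs {
		if r.start+duration > end {
			end = r.start + duration
		}
	}
	var row strings.Builder
	for t := 0; t < end; t++ {
		owner := byte(' ')
		oldest := -1
		for _, r := range reqs {
			running := r.start <= t && t < r.start+duration
			if running && (oldest == -1 || r.start < oldest) {
				oldest = r.start
				owner = byte('0' + r.id)
			}
		}
		row.WriteByte(owner)
	}
	return row.String()
}

func main() {
	// Every request admitted: there is never a tick without a reader.
	all := []request{{1, 0}, {2, 2}, {3, 4}, {4, 6}, {5, 8}, {6, 10}}
	fmt.Printf("walL | %s\n", oldestOwnerRow(all, 5))

	// Requests 2, 3, 5 and 6 rejected: a gap opens at tick 5.
	admitted := []request{{1, 0}, {4, 6}}
	fmt.Printf("walL | %s\n", oldestOwnerRow(admitted, 5))
}
```

With all six requests admitted the row has no blank tick, so a snapshot never gets a chance; with only requests 1 and 4 admitted there is a blank tick between them, which is the window the rejection approach is meant to create.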
from rqlite.