ahrtr / etcd-defrag
An easier to use and smarter etcd defragmentation tool
License: MIT License
1.22.3 includes security fixes.
Hi Benjamin Wang
Is it already ready for production?
I have seen dirty and unreliable scripts doing the etcd defragmentation, that's why I am very happy to see this tool. Thank you!
Currently, when etcd-defrag skips defragmentation because a defrag rule is not satisfied, the log message reporting this is written to stderr.
This is not an error, so it should not go to stderr; stdout would be more appropriate.
Here is the related Go line:
Line 141 in 88a7fdd
I can create a PR if you think it's OK!
I have set up a defragmentation rule: dbSizeInUse / dbSize < 0.5.
Based on the etcd database size, none of the endpoints should be defragmented.
Endpoints rule calculation:
https://10.8.38.111:2379: 48812032 (dbSizeInUse) / 48812032 (dbSize) = 1
https://10.8.60.123:2379: 48799744 (dbSizeInUse) / 48799744 (dbSize) = 1
https://10.8.62.107:2379: 48803840 (dbSizeInUse) / 48807936 (dbSize) ≈ 0.99992
But for some reason defragmentation was executed anyway.
etcd-defrag execution example:
Validating configuration.
Validating the defragmentation rule: dbSizeInUse / dbSize < 0.5 ... valid
Performing health check.
endpoint: https://10.8.60.123:2379, health: true, took: 13.016349ms, error:
endpoint: https://10.8.62.107:2379, health: true, took: 11.974902ms, error:
endpoint: https://10.8.38.111:2379, health: true, took: 18.972517ms, error:
Getting members status
endpoint: https://10.8.38.111:2379, dbSize: 48812032, dbSizeInUse: 48812032, memberId: b12d455af0b42502, leader: 27e0fbbbc2bc90b, revision: 1560450397, term: 5761, index: 1905174647
endpoint: https://10.8.60.123:2379, dbSize: 48799744, dbSizeInUse: 48799744, memberId: 27e0fbbbc2bc90b, leader: 27e0fbbbc2bc90b, revision: 1560450397, term: 5761, index: 1905174648
endpoint: https://10.8.62.107:2379, dbSize: 48807936, dbSizeInUse: 48803840, memberId: c97a792b85f34523, leader: 27e0fbbbc2bc90b, revision: 1560450397, term: 5761, index: 1905174648
Running compaction until revision: 1560450397 ... successful
3 endpoint(s) need to be defragmented: [https://10.8.38.111:2379 https://10.8.62.107:2379 https://10.8.60.123:2379]
[Before defragmentation] endpoint: https://10.8.38.111:2379, dbSize: 49053696, dbSizeInUse: 46804992, memberId: b12d455af0b42502, leader: 27e0fbbbc2bc90b, revision: 1560450404, term: 5761, index: 1905174656
Defragmenting endpoint "https://10.8.38.111:2379"
Finished defragmenting etcd endpoint "https://10.8.38.111:2379". took 1.083007173s
[Post defragmentation] endpoint: https://10.8.38.111:2379, dbSize: 46170112, dbSizeInUse: 46170112, memberId: b12d455af0b42502, leader: 27e0fbbbc2bc90b, revision: 1560450416, term: 5761, index: 1905174668
[Before defragmentation] endpoint: https://10.8.62.107:2379, dbSize: 49025024, dbSizeInUse: 46235648, memberId: c97a792b85f34523, leader: 27e0fbbbc2bc90b, revision: 1560450420, term: 5761, index: 1905174672
Defragmenting endpoint "https://10.8.62.107:2379"
Finished defragmenting etcd endpoint "https://10.8.62.107:2379". took 962.878881ms
[Post defragmentation] endpoint: https://10.8.62.107:2379, dbSize: 46219264, dbSizeInUse: 46219264, memberId: c97a792b85f34523, leader: 27e0fbbbc2bc90b, revision: 1560450429, term: 5761, index: 1905174681
[Before defragmentation] endpoint: https://10.8.60.123:2379, dbSize: 49004544, dbSizeInUse: 46272512, memberId: 27e0fbbbc2bc90b, leader: 27e0fbbbc2bc90b, revision: 1560450432, term: 5761, index: 1905174684
Defragmenting endpoint "https://10.8.60.123:2379"
Finished defragmenting etcd endpoint "https://10.8.60.123:2379". took 936.36402ms
[Post defragmentation] endpoint: https://10.8.60.123:2379, dbSize: 46223360, dbSizeInUse: 46215168, memberId: 27e0fbbbc2bc90b, leader: 27e0fbbbc2bc90b, revision: 1560450435, term: 5761, index: 1905174687
The defragmentation is successful.
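The rule calculation from this report can be reproduced with a few lines of Go. This is only a sketch of the arithmetic under the assumption that the rule is evaluated with floating-point division; the `status` type and `needsDefrag` function are hypothetical and do not describe how etcd-defrag evaluates rules internally.

```go
package main

import "fmt"

// status holds the two sizes reported in the "Getting members status" output.
type status struct {
	endpoint    string
	dbSize      int64
	dbSizeInUse int64
}

// needsDefrag evaluates dbSizeInUse / dbSize < threshold using
// floating-point division. (Hypothetical helper for illustration.)
func needsDefrag(s status, threshold float64) bool {
	if s.dbSize == 0 {
		return false // avoid dividing by zero on an empty database
	}
	return float64(s.dbSizeInUse)/float64(s.dbSize) < threshold
}

func main() {
	// Values copied from the member status lines in the report.
	members := []status{
		{"https://10.8.38.111:2379", 48812032, 48812032},
		{"https://10.8.60.123:2379", 48799744, 48799744},
		{"https://10.8.62.107:2379", 48807936, 48803840},
	}
	for _, m := range members {
		ratio := float64(m.dbSizeInUse) / float64(m.dbSize)
		fmt.Printf("%s: ratio=%.5f defrag=%v\n", m.endpoint, ratio, needsDefrag(m, 0.5))
	}
}
```

Under that assumption none of the three endpoints satisfies the rule, which matches the expectation stated in the report.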
We recently tried to use etcd-defrag to defragment our etcd clusters, and found that it fails quickly because the learner node in the cluster does not support the health check.
Here is the execution log:
Validating configuration.
Validating the defragmentation rule: dbQuotaUsage > 0.8 || dbSizeFree/dbQuotaUsage > 0.5 ... valid
Performing health check.
{"level":"warn","ts":"2023-10-12T17:51:18.358902+0800","logger":"client","caller":"[email protected]/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00030a000/11.11.11.11:2379","method":"/etcdserverpb.KV/Range","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner"}
endpoint: https://11.11.11.11:2379/, health: false, took: 7.499546ms, error: etcdserver: rpc not supported for learner
endpoint: https://33.33.33.33:2379/, health: true, took: 7.733879ms, error:
endpoint: https://44.44.44.44:2379/, health: true, took: 9.555876ms, error:
endpoint: https://55.55.55.55:2379/, health: true, took: 10.164246ms, error:
endpoint: https://66.66.66.66:2379/, health: true, took: 9.741549ms, error:
endpoint: https://22.22.22.22:2379/, health: true, took: 43.014812ms, error:
So is this an ongoing issue?
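One possible way to avoid this failure is to leave learner members out of the health check, since the log shows the learner rejecting the KV Range request used for it. The sketch below assumes the caller already knows which members are learners (the etcd membership API reports this per member); the `member` type and `votingEndpoints` function are hypothetical and are not taken from etcd-defrag.

```go
package main

import "fmt"

// member is a simplified stand-in for a cluster member entry.
// (Hypothetical type; real member listings carry more fields.)
type member struct {
	endpoint  string
	isLearner bool
}

// votingEndpoints returns the endpoints of all non-learner members,
// preserving their original order.
func votingEndpoints(members []member) []string {
	var endpoints []string
	for _, m := range members {
		if m.isLearner {
			continue // learners reject the Range RPC used by the health check
		}
		endpoints = append(endpoints, m.endpoint)
	}
	return endpoints
}

func main() {
	// Endpoints copied from the execution log; which one is the learner
	// follows from the error message in that log.
	members := []member{
		{"https://11.11.11.11:2379/", true},
		{"https://22.22.22.22:2379/", false},
		{"https://33.33.33.33:2379/", false},
	}
	fmt.Println(votingEndpoints(members))
}
```

Whether a learner should also be skipped for defragmentation itself is a separate design question that this sketch does not address.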
To include the following fix & enhancement,
Is it really necessary to set the quota bytes as a CLI argument? Can it be pulled from the etcd server instead? Currently we have to tune this CLI flag for every cluster, which is tedious.
And thanks for a really useful tool!
Add a flag --compaction, and execute compaction before the defragmentation if it's true (default). If users don't want to execute compaction, please set --compaction=false.
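The proposed flag behavior can be sketched with Go's standard `flag` package. This is an illustration only: the tool may use a different flag library, and the `parseCompaction` function and the flag set name are hypothetical. Only the flag name `--compaction` and its default of true come from the proposal above.

```go
package main

import (
	"flag"
	"fmt"
)

// parseCompaction parses the given arguments and reports whether
// compaction should run before defragmentation. The default is true.
func parseCompaction(args []string) (bool, error) {
	fs := flag.NewFlagSet("defrag-sketch", flag.ContinueOnError)
	compaction := fs.Bool("compaction", true, "run compaction before defragmentation")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *compaction, nil
}

func main() {
	for _, args := range [][]string{{}, {"--compaction=false"}} {
		enabled, err := parseCompaction(args)
		if err != nil {
			fmt.Println("invalid arguments:", err)
			continue
		}
		fmt.Printf("args=%v compaction=%v\n", args, enabled)
	}
}
```

A boolean flag that defaults to true has to be disabled with the explicit `--compaction=false` form, which is the usage the proposal describes.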
cc @batistein @bradjones1320 @chaochn47 @guettli @janiskemper @TechDufus
I think it will be convenient if we add the following defragmentation rule variables, so that we don't need to write expressions manually:
dbQuotaUsage = dbSize/dbQuota
dbSizeUnused = dbSize - dbSizeInUse
I can make a pull request to work on these.
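The two proposed variables are simple to derive from values the tool already prints. The sketch below shows the arithmetic only; the `sizes` type and its methods are hypothetical, and the 2 GiB quota is an assumed example value, not something taken from the clusters discussed here.

```go
package main

import "fmt"

// sizes holds the raw inputs for the derived rule variables.
// (Hypothetical type for illustration.)
type sizes struct {
	dbSize      int64 // total size of the database file, in bytes
	dbSizeInUse int64 // bytes actually in use
	dbQuota     int64 // configured backend quota, in bytes
}

// dbQuotaUsage returns dbSize / dbQuota as a fraction.
func (s sizes) dbQuotaUsage() float64 {
	if s.dbQuota == 0 {
		return 0 // no quota known; avoid dividing by zero
	}
	return float64(s.dbSize) / float64(s.dbQuota)
}

// dbSizeUnused returns dbSize - dbSizeInUse in bytes.
func (s sizes) dbSizeUnused() int64 {
	return s.dbSize - s.dbSizeInUse
}

func main() {
	// dbSize and dbSizeInUse are example values; the quota is assumed.
	s := sizes{dbSize: 48807936, dbSizeInUse: 48803840, dbQuota: 2 * 1024 * 1024 * 1024}
	fmt.Printf("dbQuotaUsage=%.4f dbSizeUnused=%d\n", s.dbQuotaUsage(), s.dbSizeUnused())
}
```

Exposing these as named variables would let a rule refer to them directly instead of spelling out each division or subtraction.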