briangallew / cassandra_range_repair
This project is forked from mstump/cassandra_range_repair.
A Python script to repair the primary range of a Cassandra node in N discrete steps.
License: MIT License
Hello,
First off, thank you for keeping this tool maintained. It is appreciated.
I am having some confusion around the number of steps and the number of workers. I was testing this and found that with the default 100 steps and a single worker, it takes a very long time to repair a small keyspace. I stopped it after 12 hours on a single node. How can I tweak these values to increase performance yet still retain the advantages of the ranged repair? What would be considered the "default" behavior of Cassandra? 15 steps with a single worker?
I have 12 nodes in a single datacenter and on my largest keyspace it can take upwards of 2 hours to repair a single node. If you throw in the occasional failure, then it can take over 24 hours to repair one of my datacenters. I would like to improve on this.
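To see why the defaults are slow, it helps to count the repair invocations. The sketch below is not the script's exact algorithm, just an illustration of splitting one token range into N equal sub-ranges; with vnodes (e.g. 256 per node) each vnode range is split again, so 100 steps means 25,600 `nodetool repair` calls per node:

```python
def sub_ranges(start, stop, steps):
    """Split the token range (start, stop] into `steps` contiguous sub-ranges."""
    chunk, remainder = divmod(stop - start, steps)
    edges, cursor = [], start
    for i in range(steps):
        # Spread any remainder over the first few chunks so the union
        # of sub-ranges exactly covers (start, stop].
        nxt = cursor + chunk + (1 if i < remainder else 0)
        edges.append((cursor, nxt))
        cursor = nxt
    return edges

ranges = sub_ranges(0, 1000, 100)
print(len(ranges))  # 100
```

Lowering `-s` (steps) reduces the per-invocation overhead at the cost of larger Merkle-tree comparisons per call, which is the main knob for the trade-off described above.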
In my environment, the nodetool command only works with the following arguments:
nodetool -h <hostname/ip> -u cassandra -pw xxxxxx repair
The repair script fails to execute and triggers the errors below.
Could anyone please suggest where I can update the script to pass the <hostname/ip> -u -pw arguments?
Script error:
./range_repair.py -k
Error fetching ring tokens
nodetool: Failed to connect to '127.0.0.1:7199' - FailedLoginException: 'Required key 'username' is missing'.
In ./src/range_repair.py, line 241 (called from get_host_tokens) uses self.options.port, which is an int rather than a str, so the `" ".join(command)` call fails because the port is an int and not a str.
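A minimal fix for that TypeError is to coerce every command part to `str` before joining. The helper below is a simplified stand-in for the script's `run_command`, shown only to illustrate the coercion:

```python
def build_command(*command):
    """Join command parts into a shell string, coercing non-string parts
    (like an int JMX port) so ' '.join() cannot raise
    'TypeError: sequence item 4: expected string, int found'."""
    return " ".join(str(part) for part in command)

print(build_command("nodetool", "-h", "localhost", "-p", 7199, "ring"))
# nodetool -h localhost -p 7199 ring
```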
Hi all,
seems like C* 3 doesn't play well yet with the repair script.
Attached the error (python 2.7).
Cheers,
Stefano
root@node5:~# ./range_repair.py -H localhost --inc --par --verbose --dry-run
INFO 2016-07-04 09:05:07,494 get_ring_tokens line: 135 : running nodetool ring, this will take a little bit of time
INFO 2016-07-04 09:05:12,456 get_ring_tokens line: 157 : Found 3328 tokens
INFO 2016-07-04 09:05:12,508 repair line: 353 : [1/256] repairing range (+09208181131779273900, -09221551068845876240) in 100 steps for keyspace <all>
Traceback (most recent call last):
File "./range_repair.py", line 475, in <module>
main()
File "./range_repair.py", line 471, in main
repair(options)
File "./range_repair.py", line 364, in repair
r.get()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
raise self._value
TypeError: sequence item 4: expected string, int found
0001/1/256 0002/1/256 0003/1/256 0004/1/256 0005/1/256 0006/1/256 0007/1/256 0008/1/256 0009/1/256 0010/1/256 0011/1/256 0012/1/256 0013/1/256 0014/1/256 0015/1/256 0016/1/256 0017/1/256 0018/1/256 0019/1/256 0020/1/256 0021/1/256 0022/1/256 0023/1/256 0024/1/256 0025/1/256 0026/1/256 0027/1/256 0028/1/256 0029/1/256 0030/1/256 0031/1/256 0032/1/256 0033/1/256 0034/1/256 0035/1/256 0036/1/256 0037/1/256 0038/1/256 0039/1/256 0040/1/256 0041/1/256 0042/1/256 0043/1/256 0044/1/256 0045/1/256 0046/1/256 0047/1/256 0048/1/256 0049/1/256 0050/1/256 0051/1/256 0052/1/256 0053/1/256 0054/1/256 0055/1/256 0056/1/256 0057/1/256 0058/1/256 0059/1/256 0060/1/256 0061/1/256 0062/1/256 0063/1/256 0064/1/256 0065/1/256 0066/1/256 0067/1/256 0068/1/256 0069/1/256 0070/1/256 0071/1/256 0072/1/256 0073/1/256 0074/1/256
Our JMX port is 16199.
I tried hard-coding
"-p","16199",
into the various nodetool calls, but no dice. Is -h necessary if I'm local to the Cassandra node?
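For reference, `nodetool` does accept `-p <port>` for a non-default JMX port, and `-h` can be omitted when running on the node itself (it defaults to localhost). A hedged sketch of assembling such a command, with `nodetool_cmd` being a hypothetical helper rather than part of the script:

```python
def nodetool_cmd(host, port, *args):
    """Assemble a nodetool invocation with an explicit JMX port.
    The port must be passed as a string when the command is joined."""
    cmd = ["nodetool"]
    if host:  # -h is optional when running locally; nodetool defaults to localhost
        cmd += ["-h", host]
    cmd += ["-p", str(port)]
    return cmd + list(args)

print(nodetool_cmd("node5", 16199, "ring"))
```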
Hi,
I am running the script from a remote machine that accesses all the nodes through a remote nodetool.
I get the following errors in the middle of the repair process, always at the same step (I can reproduce them every time I run the script).
/var/log/cassandra/repair-server1.log:2014-11-06 08:45:46,907 - ERROR - FAILED: 6/256 step 0004 nodetool -h server1 repair myks -pr -st 003566854792210432805804684718019426758 -et 003576489470876730731489057894769207747
/var/log/cassandra/repair-server1.log:2014-11-06 08:45:46,907 - ERROR -
/var/log/cassandra/repair-server1.log:2014-11-06 08:45:46,952 - ERROR - FAILED: 6/256 step 0003 nodetool -h server1 repair myks -pr -st 003557220113544134880120311541269645769 -et 003566854792210432805804684718019426758
/var/log/cassandra/repair-server1.log:2014-11-06 08:45:46,952 - ERROR -
/var/log/cassandra/repair-server1.log:2014-11-06 08:45:48,380 - ERROR - FAILED: 6/256 step 0005 nodetool -h server1 repair myks -pr -st 003576489470876730731489057894769207747 -et 003586124149543028657173431071518988736
/var/log/cassandra/repair-server1.log:2014-11-06 08:45:48,380 - ERROR -
/var/log/cassandra/repair-server1.log:2014-11-06 08:48:33,737 - ERROR - FAILED: 6/256 step 0009 nodetool -h server1 repair myks -pr -st 003615028185541922434226550601768331703 -et 003624662864208220359910923778518112692
/var/log/cassandra/repair-server1.log:2014-11-06 08:48:33,737 - ERROR -
/var/log/cassandra/repair-server1.log:2014-11-06 08:48:34,975 - ERROR - FAILED: 6/256 step 0010 nodetool -h server1 repair myks -pr -st 003624662864208220359910923778518112692 -et 003634297542874518285595296955267893690
/var/log/cassandra/repair-server1.log:2014-11-06 08:48:34,975 - ERROR -
/var/log/cassandra/repair-server1.log:2014-11-06 09:07:03,448 - ERROR - FAILED: 48/256 step 0002 nodetool -h server1 repair myks -pr -st 028668104739202107587390869338472501441 -et 028671794365167405627200796676495488044
/var/log/cassandra/repair-server1.log:2014-11-06 09:07:03,448 - ERROR -
/var/log/cassandra/repair-server1.log:2014-11-06 09:09:11,053 - ERROR - FAILED: 48/256 step 0006 nodetool -h server1 repair myks -pr -st 028682863243063299746630578690564447853 -et 028686552869028597786440506028587434456
/var/log/cassandra/repair-server1.log:2014-11-06 09:09:11,054 - ERROR -
/var/log/cassandra/repair-server1.log:2014-11-06 09:14:09,969 - ERROR - FAILED: 59/256 step 0004 nodetool -h server1 repair myks -pr -st 031996204459821099337625237853642567879 -et 032003300043610329938193956053709712248
/var/log/cassandra/repair-server1.log:2014-11-06 09:14:09,969 - ERROR -
/var/log/cassandra/repair-server1.log:2014-11-06 09:14:09,983 - ERROR - FAILED: 59/256 step 0001 nodetool -h server1 repair myks -pr -st 031974917708453407535919083253441134772 -et 031982013292242638136487801453508279141
/var/log/cassandra/repair-server1.log:2014-11-06 09:14:09,983 - ERROR -
/var/log/cassandra/repair-server1.log:2014-11-06 09:14:11,364 - ERROR - FAILED: 59/256 step 0006 nodetool -h server1 repair myks -pr -st 032010395627399560538762674253776856617 -et 032017491211188791139331392453844000986
/var/log/cassandra/repair-server1.log:2014-11-06 09:14:11,364 - ERROR -
/var/log/cassandra/repair-server2.log:2014-11-06 08:50:23,550 - ERROR - FAILED: 1/256 step 0004 nodetool -h server2 repair myks -pr -st 000062581616001737953441086003047058605 -et 000067972228648236125697676642629550658
/var/log/cassandra/repair-server2.log:2014-11-06 08:50:23,550 - ERROR -
/var/log/cassandra/repair-server2.log:2014-11-06 08:55:13,495 - ERROR - FAILED: 1/256 step 0008 nodetool -h server2 repair myks -pr -st 000084144066587730642467448561377026817 -et 000089534679234228814724039200959518870
/var/log/cassandra/repair-server2.log:2014-11-06 08:55:13,495 - ERROR -
/var/log/cassandra/repair-server2.log:2014-11-06 08:55:15,535 - ERROR - FAILED: 1/256 step 0009 nodetool -h server2 repair myks -pr -st 000089534679234228814724039200959518870 -et 000094925291880726986980629840542010923
/var/log/cassandra/repair-server2.log:2014-11-06 08:55:15,535 - ERROR -
/var/log/cassandra/repair-server2.log:2014-11-06 08:55:16,975 - ERROR - FAILED: 1/256 step 0010 nodetool -h server2 repair myks -pr -st 000094925291880726986980629840542010923 -et 000100315904527225159237220480124502977
/var/log/cassandra/repair-server2.log:2014-11-06 08:55:16,975 - ERROR -
/var/log/cassandra/repair-server2.log:2014-11-06 08:58:28,616 - ERROR - FAILED: 4/256 step 0001 nodetool -h server2 repair myks -pr -st 002566461680764933896469632486562479823 -et 002567675350693503585419490978573669130
/var/log/cassandra/repair-server2.log:2014-11-06 08:58:28,617 - ERROR -
/var/log/cassandra/repair-server4.log:2014-11-06 08:55:30,110 - ERROR - FAILED: 1/256 step 0004 nodetool -h server4 repair myks -pr -st 000757548894433026070250815910734884539 -et 000759690434458478126763932193890239443
/var/log/cassandra/repair-server4.log:2014-11-06 08:55:30,110 - ERROR -
/var/log/cassandra/repair-server3.log:2014-11-06 09:10:08,145 - ERROR - FAILED: 2/256 step 0005 nodetool -h server3 repair myks -pr -st 002407373345623250677953856079648273474 -et 002412903179755960954826113682508450817
/var/log/cassandra/repair-server3.log:2014-11-06 09:10:08,145 - ERROR -
/var/log/cassandra/repair-server3.log:2014-11-06 09:10:13,766 - ERROR - FAILED: 2/256 step 0007 nodetool -h server3 repair myks -pr -st 002418433013888671231698371285368628160 -et 002423962848021381508570628888228805503
/var/log/cassandra/repair-server3.log:2014-11-06 09:10:13,767 - ERROR -
/var/log/cassandra/repair-server3.log:2014-11-06 09:10:21,332 - ERROR - FAILED: 2/256 step 0009 nodetool -h server3 repair myks -pr -st 002429492682154091785442886491088982846 -et 002435022516286802062315144093949160189
/var/log/cassandra/repair-server3.log:2014-11-06 09:10:21,333 - ERROR -
Is that a bug you have faced too?
When I run the failed repairs directly from bash, they work like a charm, which is pretty weird. The script keeps going, failing from time to time for no obvious reason...
Maybe a quick fix would be to retry on failure...
I will dig a bit, but any insight would be very welcome.
Hi,
First of all, this project looks interesting. Are you keeping it up to date / still using it? Maybe I can help somehow?
I am using a 3-node cluster with C* 1.2.18, vnodes activated, and RandomPartitioner.
I have this at debug level:
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
Looks like it processes some parts up to 10 times (maybe more). I ran with this command:
./cassandra_range_repair/range_repair.py -k myks -H 10.55.xxx.xxx -s 100 -w 16 -D eu-west-xl -v -d --logfile=/var/log/cassandra/result.log
Also, a ps aux shows that things appear to be run twice:
$ ps aux | grep nodetool
root 28906 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003527227502730104786072662883066462783 -et 003527873595605486575538223266831799943
root 28907 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003527227502730104786072662883066462783 -et 003527873595605486575538223266831799943
root 28952 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003527873595605486575538223266831799943 -et 003528519688480868365003783650597137103
root 28953 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003527873595605486575538223266831799943 -et 003528519688480868365003783650597137103
root 29004 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003528519688480868365003783650597137103 -et 003529165781356250154469344034362474263
root 29005 0.0 0.0 4400 740 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003528519688480868365003783650597137103 -et 003529165781356250154469344034362474263
root 29037 0.0 0.0 4400 612 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003529165781356250154469344034362474263 -et 003529811874231631943934904418127811423
root 29038 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003529165781356250154469344034362474263 -et 003529811874231631943934904418127811423
root 29185 0.0 0.0 4400 612 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003529811874231631943934904418127811423 -et 003530457967107013733400464801893148583
root 29186 0.0 0.0 4400 748 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003529811874231631943934904418127811423 -et 003530457967107013733400464801893148583
root 29248 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003530457967107013733400464801893148583 -et 003531104059982395522866025185658485743
root 29249 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003530457967107013733400464801893148583 -et 003531104059982395522866025185658485743
root 29262 0.0 0.0 4400 612 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003531104059982395522866025185658485743 -et 003531750152857777312331585569423822903
root 29263 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003531104059982395522866025185658485743 -et 003531750152857777312331585569423822903
root 29295 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003531750152857777312331585569423822903 -et 003532396245733159101797145953189160063
root 29311 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003531750152857777312331585569423822903 -et 003532396245733159101797145953189160063
root 29732 0.0 0.0 4400 616 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003532396245733159101797145953189160063 -et 003533042338608540891262706336954497223
root 29733 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003532396245733159101797145953189160063 -et 003533042338608540891262706336954497223
root 29736 0.0 0.0 4400 616 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003533042338608540891262706336954497223 -et 003533688431483922680728266720719834383
root 29737 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003533042338608540891262706336954497223 -et 003533688431483922680728266720719834383
root 29763 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003533688431483922680728266720719834383 -et 003534334524359304470193827104485171543
root 29767 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003533688431483922680728266720719834383 -et 003534334524359304470193827104485171543
root 29840 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003534334524359304470193827104485171543 -et 003534980617234686259659387488250508703
root 29847 0.0 0.0 4400 740 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003534334524359304470193827104485171543 -et 003534980617234686259659387488250508703
root 30263 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003534980617234686259659387488250508703 -et 003535626710110068049124947872015845863
root 30264 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003534980617234686259659387488250508703 -et 003535626710110068049124947872015845863
root 30329 0.0 0.0 4400 616 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003535626710110068049124947872015845863 -et 003536272802985449838590508255781183023
root 30330 0.0 0.0 4400 748 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003535626710110068049124947872015845863 -et 003536272802985449838590508255781183023
root 30378 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003536272802985449838590508255781183023 -et 003536918895860831628056068639546520183
root 30379 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003536272802985449838590508255781183023 -et 003536918895860831628056068639546520183
root 30392 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003536918895860831628056068639546520183 -et 003537564988736213417521629023311857343
root 30393 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003536918895860831628056068639546520183 -et 003537564988736213417521629023311857343
root 30629 0.0 0.0 8108 924 pts/0 S+ 15:48 0:00 grep --color=auto nodetool
and:
$ ps aux | grep nodetool | wc -l
33
i.e. 16 workers * 2 processes + 1 (the last line being the grep itself).
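The doubled process count is likely explained by how the commands are launched, not by duplicate repairs: passing a command string with `shell=True` spawns an intermediate `/bin/sh -c ...` process, and `/usr/bin/nodetool` is itself a shell wrapper script, so each repair shows up as two `sh` lines in `ps aux`. A minimal demonstration of the `shell=True` wrapper behavior:

```python
import subprocess

# A string command with shell=True runs as '/bin/sh -c <string>', adding
# one extra shell process per invocation on top of the command itself.
proc = subprocess.Popen("echo repair-step", shell=True, stdout=subprocess.PIPE)
out, _ = proc.communicate()
print(out.decode().strip())  # repair-step
```

So 16 workers yields 32 matching `ps` lines (plus the grep), without any step actually running twice; the repeated DEBUG lines are a separate logging-handler issue.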
$ make test
...
writing manifest file 'cassandra_range_repair.egg-info/SOURCES.txt'
running build_ext
test_ten_commands (tests.test_execution_counts.execution_count_tests) ... Traceback (most recent call last):
File "/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py", line 405, in <module>
main()
File "/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py", line 401, in main
repair(options)
File "/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py", line 279, in repair
tokens = Token_Container(options)
File "/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py", line 36, in __init__
self.get_host_tokens()
File "/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py", line 116, in get_host_tokens
success, _, stdout, stderr = run_command(*cmd)
File "/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py", line 194, in run_command
cmd = " ".join(command)
TypeError: sequence item 4: expected string, int found
ERROR
test_Murmur3_format_length (tests.test_ranges.range_tests) ... ok
test_Murmur3_range_end_zero (tests.test_ranges.range_tests) ... ok
test_Murmur3_range_start_zero (tests.test_ranges.range_tests) ... ok
test_Murmur3_range_wrap (tests.test_ranges.range_tests) ... ok
test_Random_range_end_zero (tests.test_ranges.range_tests) ... ok
test_Random_range_start_zero (tests.test_ranges.range_tests) ... ok
test_Random_range_wrap (tests.test_ranges.range_tests) ... ok
======================================================================
ERROR: test_ten_commands (tests.test_execution_counts.execution_count_tests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/jens/Development/src/cassandra_range_repair/tests/test_execution_counts.py", line 14, in test_ten_commands
subprocess.check_output(cmd)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 573, in check_output
raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py', '--nodetool', '/Users/jens/Development/src/cassandra_range_repair/tests/mock_nodetool_script', '-s', '4', '-w', '2']' returned non-zero exit status 1
----------------------------------------------------------------------
Ran 8 tests in 0.084s
FAILED (errors=1)
make: *** [test] Error 1
I'm running Python 2.7.10. This might have been caught earlier if something like #36 was in place.
https://groups.google.com/d/msg/nosql-databases/peTArLfhXMU/IuUYnnUhBgAJ claims it doesn't work.
I was investigating this because I was curious if I could slowly enable incremental repairs using this script.
Ranges for repairs need to be calculated against tokens held by local DC ring members only. Currently, if you have a situation like this:
DC1: nodeA(1), nodeB(3)
DC2: nodeC(2), nodeD(4)
running this script on nodeA will result in repairs of the range (1,2), which is insufficient for the needs of the local DC.
I originally opened this as mstump#3 but that repo is really dead.
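One way to express the fix: filter the ring's (token, datacenter) pairs down to the local DC before pairing consecutive tokens into ranges. `local_dc_ranges` is a hypothetical helper, not the script's actual function; the script would gather the token/DC data via `nodetool ring`:

```python
def local_dc_ranges(token_dc_pairs, dc):
    """Keep only the local DC's tokens and pair consecutive ones into
    repair ranges, wrapping the last token around to the first."""
    tokens = sorted(t for t, d in token_dc_pairs if d == dc)
    return [(tokens[i - 1], tokens[i]) for i in range(len(tokens))]

# The example topology from above: DC1 holds tokens 1 and 3, DC2 holds 2 and 4.
ring = [(1, "DC1"), (2, "DC2"), (3, "DC1"), (4, "DC2")]
print(local_dc_ranges(ring, "DC1"))  # [(3, 1), (1, 3)]
```

With the filter, nodeA's primary range becomes (3, 1] instead of the too-narrow (1, 2] produced by using every ring member's tokens.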
...what do you think about adding Travis CI (or another CI tool) to check pull requests?
$ make debian
rm -rf debian/rangerepair
rm -rf debian/rangerepair.debhelper.log
rm -rf debian/rangerepair.substvars
rm -rf debian/rangerepair.postinst.debhelper
find ../ -maxdepth 1 -iname 'rangerepair_*_amd64.changes' -exec rm -f {} +
find ../ -maxdepth 1 -iname 'rangerepair_*_amd64.deb' -exec rm -f {} +
find ../ -maxdepth 1 -iname 'rangerepair_*.dsc' -exec rm -f {} +
find ../ -maxdepth 1 -iname 'rangerepair_*.tar.gz' -exec rm -f {} +
sh make_deb.sh
+ python setup.py sdist
running sdist
[pbr] Writing ChangeLog
[pbr] Generating ChangeLog
[pbr] ChangeLog complete (0.0s)
[pbr] Generating AUTHORS
[pbr] AUTHORS complete (0.0s)
running egg_info
writing pbr to cassandra_range_repair.egg-info/pbr.json
writing requirements to cassandra_range_repair.egg-info/requires.txt
writing cassandra_range_repair.egg-info/PKG-INFO
writing top-level names to cassandra_range_repair.egg-info/top_level.txt
writing dependency_links to cassandra_range_repair.egg-info/dependency_links.txt
[pbr] Processing SOURCES.txt
[pbr] In git context, generating filelist from git
warning: no previously-included files found matching '.gitreview'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
writing manifest file 'cassandra_range_repair.egg-info/SOURCES.txt'
warning: sdist: standard file not found: should have one of README, README.rst, README.txt
running check
warning: check: missing required meta-data: url
creating cassandra_range_repair-0.0.1.dev124
creating cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
creating cassandra_range_repair-0.0.1.dev124/debian
creating cassandra_range_repair-0.0.1.dev124/src
creating cassandra_range_repair-0.0.1.dev124/tests
making hard links in cassandra_range_repair-0.0.1.dev124...
hard linking .travis.yml -> cassandra_range_repair-0.0.1.dev124
hard linking AUTHORS -> cassandra_range_repair-0.0.1.dev124
hard linking ChangeLog -> cassandra_range_repair-0.0.1.dev124
hard linking LICENSE.md -> cassandra_range_repair-0.0.1.dev124
hard linking Makefile -> cassandra_range_repair-0.0.1.dev124
hard linking README.md -> cassandra_range_repair-0.0.1.dev124
hard linking make_deb.sh -> cassandra_range_repair-0.0.1.dev124
hard linking requirements.txt -> cassandra_range_repair-0.0.1.dev124
hard linking setup.cfg -> cassandra_range_repair-0.0.1.dev124
hard linking setup.py -> cassandra_range_repair-0.0.1.dev124
hard linking cassandra_range_repair.egg-info/PKG-INFO -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking cassandra_range_repair.egg-info/SOURCES.txt -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking cassandra_range_repair.egg-info/dependency_links.txt -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking cassandra_range_repair.egg-info/not-zip-safe -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking cassandra_range_repair.egg-info/pbr.json -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking cassandra_range_repair.egg-info/requires.txt -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking cassandra_range_repair.egg-info/top_level.txt -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking debian/changelog -> cassandra_range_repair-0.0.1.dev124/debian
hard linking debian/compat -> cassandra_range_repair-0.0.1.dev124/debian
hard linking debian/control -> cassandra_range_repair-0.0.1.dev124/debian
hard linking debian/files -> cassandra_range_repair-0.0.1.dev124/debian
hard linking debian/rangerepair.install -> cassandra_range_repair-0.0.1.dev124/debian
hard linking debian/rangerepair.postinst -> cassandra_range_repair-0.0.1.dev124/debian
hard linking debian/rules -> cassandra_range_repair-0.0.1.dev124/debian
hard linking src/__init__.py -> cassandra_range_repair-0.0.1.dev124/src
hard linking src/range_repair.py -> cassandra_range_repair-0.0.1.dev124/src
hard linking src/repair_failed_ranges.py -> cassandra_range_repair-0.0.1.dev124/src
hard linking tests/__init__.py -> cassandra_range_repair-0.0.1.dev124/tests
hard linking tests/mock_nodetool_script -> cassandra_range_repair-0.0.1.dev124/tests
hard linking tests/rangerepair_test.py -> cassandra_range_repair-0.0.1.dev124/tests
hard linking tests/test_execution_counts.py -> cassandra_range_repair-0.0.1.dev124/tests
hard linking tests/test_ranges.py -> cassandra_range_repair-0.0.1.dev124/tests
hard linking tests/test_retry.py -> cassandra_range_repair-0.0.1.dev124/tests
copying setup.cfg -> cassandra_range_repair-0.0.1.dev124
Writing cassandra_range_repair-0.0.1.dev124/setup.cfg
Creating tar archive
removing 'cassandra_range_repair-0.0.1.dev124' (and everything under it)
+ grep PACKAGE.*:= Makefile
+ sed -e s/[ \t]*//g -e s/.*:=//
+ PACKAGE=rangerepair
+ ls dist/rangerepair-*.tar.gz
ls: cannot access 'dist/rangerepair-*.tar.gz': No such file or directory
+ VERSION_BUILDER=
Makefile:44: recipe for target 'debian' failed
make: *** [debian] Error 2
Notice I've added set -ex at the top of make_deb.sh to get more verbose output.
Currently, when hitting control-c to stop the script the following error is printed.
Process PoolWorker-1:
Traceback (most recent call last):
File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python2.6/multiprocessing/pool.py", line 68, in worker
result = (True, func(*args, **kwds))
File "/root/bin/range_repair.py", line 221, in repair_range
success, cmd, _, stderr = run_command(*cmd)
File "/root/bin/range_repair.py", line 194, in run_command
stdout, stderr = proc.communicate()
File "/usr/lib64/python2.6/subprocess.py", line 732, in communicate
stdout, stderr = self._communicate(input, endtime)
File "/usr/lib64/python2.6/subprocess.py", line 1316, in _communicate
stdout, stderr = self._communicate_with_poll(input, endtime)
File "/usr/lib64/python2.6/subprocess.py", line 1388, in _communicate_with_poll
ready = poller.poll(self._remaining_time(endtime))
KeyboardInterrupt
The process then hangs until the pid is killed from a different session.
It would be nice to add authentication functionality to nodetool like this:
cmd = [self.options.nodetool, "-h", self.options.host, "ring", "-u", self.options.username, "-pw", self.options.password]
Many Thanks!
Sebastian
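Building on the one-liner above, the flags could be appended conditionally so unauthenticated clusters keep working. A sketch only: `username` and `password` are hypothetical option names the script does not currently define, and `SimpleNamespace` stands in for its parsed options object:

```python
from types import SimpleNamespace

def nodetool_base(options):
    """Build the base nodetool invocation, adding -u/-pw only when
    credentials were supplied on the command line."""
    cmd = [options.nodetool, "-h", options.host]
    if getattr(options, "username", None):
        cmd += ["-u", options.username]
    if getattr(options, "password", None):
        cmd += ["-pw", options.password]
    return cmd

opts = SimpleNamespace(nodetool="nodetool", host="server1",
                       username="cassandra", password="xxxxxx")
print(nodetool_base(opts) + ["ring"])
```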
command I used: python range_repair.py -H 127.0.0.1 -s 1 --datacenter DC2
$ nodetool ring | grep -B1 $(facter ipaddress) | tail -n 2
10.2.0.1 R1 Up Normal 15.57 GiB ? 9099366847329376090
10.2.0.2 R2 Up Normal 14.53 GiB ? 9124888514323768492
$ nodetool repair -st 9099366847329376090 -et 9124888514323768492 -pr system_auth
[2016-09-27 20:56:31,822] Starting repair command #4071, repairing keyspace system_auth with repair options (parallelism: parallel, primary range: true, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 1)
[2016-09-27 20:56:31,884] Requested range intersects a local range but is not fully contained in one; this would lead to imprecise repair
[2016-09-27 20:56:31,885] null
system_auth is: CREATE KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '8', 'DC2': '8'} AND durable_writes = true;
The two tokens (9099366847329376090, 9124888514323768492) are also the ones used by range_repair. Those tokens are in DC2, but another DC1 token, 9108060243154565075, sits between them. When I trigger two individual nodetool repair commands for the sub-ranges (9099366847329376090, 9108060243154565075] and (9108060243154565075, 9124888514323768492], they work fine. Only when the two ranges are merged does it fail.
Ironically, not passing --datacenter to the script allows repairs to complete.
This issue is bound to show up here eventually, so I'm creating a task for it :) See 536c80b.
Document in README how to install this.
Hi
https://github.com/BrianGallew/cassandra_range_repair#multiple-datacenters
mentions-
"If you have multiple datacenters in your ring, then you MUST specify the name of the datacenter containing the node you are repairing as part of the command-line options (--datacenter=DCNAME). Failure to do so will result in only a subset of your data being repaired (approximately data/number-of-datacenters). This is because nodetool has no way to determine the relevant DC on its own, which in turn means it will use the tokens from every ring member in every datacenter."
So, if we are running repair on every node in a multi-DC Cassandra cluster, do we need to specify --datacenter=name on every node, or are we fine without specifying the datacenter?
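Per the README text quoted above, the flag must name the datacenter of the node being repaired, so when repairing every node you pass it on every node, with a value that varies per DC. A hedged sketch of a per-node wrapper (`run_range_repair` is hypothetical, not part of the project):

```python
def run_range_repair(keyspace, local_dc, dry_run=True):
    """Build the range_repair.py invocation for one node.  local_dc must
    be the datacenter this node belongs to; in a multi-DC ring, omitting
    --datacenter would repair only a subset of the data (hypothetical
    wrapper, not part of the project)."""
    cmd = ["./range_repair.py", "-k", keyspace,
           "--datacenter={0}".format(local_dc)]
    if dry_run:
        cmd.append("--dry-run")
    return cmd  # e.g. hand this list to subprocess.call()

# On a node that lives in DC1:
run_range_repair("odin", "DC1")
```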
On a 3.x cluster, even without asking for incremental repair (running
./range_repair.py -k odin -l -v
), I still keep seeing this warning in the logs:
RepairOption.java:148 - Incremental repair can't be requested with subrange repair because each subrange repair would generate an anti-compacted table. The repair will occur but without anti-compaction.
# This is a really well-specified value. If the format of the
# output of 'nodetool gossipinfo' changes, this will have to be
# revisited.
The time has come: the output has changed in version 2.1 ;)
Cassandra added numbers to the gossipinfo output, so the search value no longer matches:
cassandra 2.0.x output
/<ip_address_node_x>
generation:1447001651
heartbeat:1474563
HOST_ID:7728d6dc-81f5-4b65-8f6f-56154a699aa9
STATUS:NORMAL,-1256802229644094855
SEVERITY:6.938893903907228E-18
NET_VERSION:6
LOAD:4.42385489232E11
DC:datacenter1
SCHEMA:5b0d09a2-5a55-3307-9349-c9c6a7af4581
RACK:rack1
RPC_ADDRESS:<ip_address_node_x>
RELEASE_VERSION:1.2.19
/<ip_address_node_x>
generation:1448029831
heartbeat:427997
HOST_ID:5d11e4e6-219e-4a22-8c6b-245a5cbd6d3a
STATUS:NORMAL,6214099238388467859
SEVERITY:0.0
NET_VERSION:6
LOAD:3.2200086618E11
DC:datacenter1
SCHEMA:5b0d09a2-5a55-3307-9349-c9c6a7af4581
RACK:rack1
RPC_ADDRESS:<ip_address_node_x>
RELEASE_VERSION:1.2.19
cassandra 2.1.x output
/<ip_address_node_x>
generation:1445591070
heartbeat:8677350
NET_VERSION:1:8
HOST_ID:2:3bd77948-2dad-45d2-9a39-46fca7498641
STATUS:14:NORMAL,-107043648109545801
RPC_ADDRESS:3:<ip_address_node_x>
SEVERITY:8677352:0.0
SCHEMA:8316519:ab586e11-897e-30ff-8732-391b15786f79
LOAD:8677214:2.9703713869E10
RELEASE_VERSION:4:2.1.10
DC:6:datacenter1
RACK:8:rack1
TOKENS:13:<hidden>
/<ip_address_node_x>
generation:1445591100
heartbeat:8677298
NET_VERSION:1:8
HOST_ID:2:72f794c7-6066-4117-888c-92e91c779dfb
STATUS:14:NORMAL,-1002667987248120783
RPC_ADDRESS:3:<ip_address_node_x>
SEVERITY:8677297:0.0
SCHEMA:8316467:ab586e11-897e-30ff-8732-391b15786f79
LOAD:8677249:2.7910465723E10
RELEASE_VERSION:4:2.1.10
DC:6:datacenter1
RACK:8:rack1
TOKENS:13:<hidden>
We use only cassandra version 2.1.x, so that I changed the search_value to
search_value = "\n DC:6:{datacenter}\n".format(datacenter=self.options.datacenter)
I'm not a strong Python developer, so I hope that someone can create a better solution that handles both the version and the search_value.
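A version-agnostic alternative to hard-coding the 2.1 field number (`DC:6:`) is to make the numeric prefix optional in the match. A hedged Python sketch that accepts both the 2.0 `DC:<name>` and 2.1 `DC:<n>:<name>` forms (`find_datacenter_nodes` is a hypothetical helper, not the script's actual code):

```python
import re

def find_datacenter_nodes(gossip_output, datacenter):
    """Return the node headers (lines starting with '/') from
    'nodetool gossipinfo' output whose DC field matches `datacenter`,
    tolerating both the 2.0 'DC:<name>' and the 2.1 'DC:<version>:<name>'
    field formats.  A sketch, not the project's actual fix."""
    dc_re = re.compile(r"^\s*DC:(?:\d+:)?(?P<dc>\S+)$")
    current, matched = None, []
    for line in gossip_output.splitlines():
        if line.startswith("/"):
            current = line.strip()  # new node section begins
        else:
            m = dc_re.match(line)
            if m and m.group("dc") == datacenter and current:
                matched.append(current)
    return matched
```

The optional `(?:\d+:)?` group is what absorbs the field-version number that 2.1 inserts, so the same pattern works against both output formats shown above.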
How does the script handle a repair session that hangs after successfully repairing part of the repair range?
On 2.0.x, I usually run repair with range_repair -k keyspace -c columnfamily
, which runs:
nodetool repair -st (start token) -et (end token) $keyspace $columnfamily
The Cassandra log usually showed something like:
StorageService.java (line 2496) starting user-requested repair of range [(300005707039417874,300156110175996167]] for keyspace $keyspace and column families [$columnfamily]
In 2.1, nodetool doesn't pass the "column family" (cfnames) option when startToken/endToken are used (as seen in the code in NodeTool.java), so you repair all tables in the range: see https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob;f=src/java/org/apache/cassandra/tools/NodeTool.java;h=1d4a42024dd406f0c90fc6b99f599b4ab4a0948e;hb=refs/heads/cassandra-2.1#l1917
if (!startToken.isEmpty() || !endToken.isEmpty())
    probe.forceRepairRangeAsync(System.out, keyspace, parallelismDegree, dataCenters, hosts, startToken, endToken, !incrementalRepair);
else
    probe.forceRepairAsync(System.out, keyspace, parallelismDegree, dataCenters, hosts, primaryRange, !incrementalRepair, cfnames);
And the log :
StorageService.java:2846 - starting user-requested repair of range [(9187601611781349802,9189958813088431561]] for keyspace $keyspace and column families []
The "column family" list is empty, so we are repairing all column families between these two tokens.
Is it a bug (in Cassandra?), is it expected, or am I missing something? :)
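The consequence of the 2.1 branch quoted above is that any column family names appended after the keyspace are silently ignored whenever -st/-et are given. A minimal sketch of the command construction (`build_repair_command` is a hypothetical stand-in for the script's internals, not its real code):

```python
def build_repair_command(keyspace, start, end, columnfamilies=None):
    """Sketch of the per-subrange nodetool invocation that range_repair
    issues.  On 2.1, NodeTool.java's startToken/endToken branch calls
    forceRepairRangeAsync without cfnames, so everything appended after
    the keyspace here is silently dropped server-side."""
    cmd = ["nodetool", "repair",
           "-st", str(start), "-et", str(end),  # str() matters: joining
           keyspace]                            # int tokens would fail
    if columnfamilies:
        cmd.extend(columnfamilies)  # honored on 2.0.x, ignored on 2.1
    return cmd
```

So the script builds the same command on both versions; the difference in behavior comes entirely from how nodetool dispatches it.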