cassandra_range_repair's Issues

Question regarding steps and workers

Hello,

First off, thank you for keeping this tool maintained. It is appreciated.

I am confused about the number of steps and the number of workers. Testing with the default 100 steps and a single worker, it takes a very long time to repair a small keyspace; I stopped it after 12 hours on a single node. How can I tweak these values to increase performance while still retaining the advantages of ranged repair? And what would be considered the "default" behavior of Cassandra: 15 steps with a single worker?

I have 12 nodes in a single datacenter, and on my largest keyspace it can take upwards of 2 hours to repair a single node. Throw in the occasional failure and it can take over 24 hours to repair one of my datacenters. I would like to improve on this.
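For scale: the log lines elsewhere on this page show that the unit of work is vnode-range x step ("[1/256] repairing range ... in 100 steps"), so the defaults issue 25,600 sequential nodetool calls per node with one worker. Fewer steps and more workers should cut wall-clock time, e.g. (values illustrative only, flags as used elsewhere on this page):

./range_repair.py -k my_keyspace -H <node_ip> -s 8 -w 4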

Failed to run repair due to JMX authentication requiring username and password

On my cluster, nodetool only works with the arguments below:
nodetool -h <hostname/ip> -u cassandra -pw xxxxxx repair

Running the repair script failed with the errors below.
Could anyone please suggest where I can update the script to pass the arguments <hostname/ip> -u -pw?

Script error:

./range_repair.py -k
Error fetching ring tokens
nodetool: Failed to connect to '127.0.0.1:7199' - FailedLoginException: 'Required key 'username' is missing'.
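For anyone hitting this: the script builds its nodetool invocations as Python lists, so a local patch along these lines should work (a sketch mirroring the snippet in the "Add authentication functionality" issue further down; the username/password options are hypothetical):

# Hypothetical: extend each nodetool command with JMX credentials when set.
cmd = [self.options.nodetool, "-h", self.options.host, "ring"]
if getattr(self.options, "username", None):
    cmd.extend(["-u", self.options.username, "-pw", self.options.password])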

Wrong parameters when executing a dry run

Hi all,

it seems like C* 3 doesn't play well with the repair script yet.
Below is the error (Python 2.7).

Cheers,
Stefano

root@node5:~# ./range_repair.py -H localhost --inc --par --verbose --dry-run
INFO       2016-07-04 09:05:07,494 get_ring_tokens      line: 135 : running nodetool ring, this will take a little bit of time
INFO       2016-07-04 09:05:12,456 get_ring_tokens      line: 157 : Found 3328 tokens
INFO       2016-07-04 09:05:12,508 repair               line: 353 : [1/256] repairing range (+09208181131779273900, -09221551068845876240) in 100 steps for keyspace <all>
Traceback (most recent call last):
  File "./range_repair.py", line 475, in <module>
    main()
  File "./range_repair.py", line 471, in main
    repair(options)
  File "./range_repair.py", line 364, in repair
    r.get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
TypeError: sequence item 4: expected string, int found
0001/1/256 0002/1/256 0003/1/256 0004/1/256 0005/1/256 0006/1/256 0007/1/256 0008/1/256 0009/1/256 0010/1/256 0011/1/256 0012/1/256 0013/1/256 0014/1/256 0015/1/256 0016/1/256 0017/1/256 0018/1/256 0019/1/256 0020/1/256 0021/1/256 0022/1/256 0023/1/256 0024/1/256 0025/1/256 0026/1/256 0027/1/256 0028/1/256 0029/1/256 0030/1/256 0031/1/256 0032/1/256 0033/1/256 0034/1/256 0035/1/256 0036/1/256 0037/1/256 0038/1/256 0039/1/256 0040/1/256 0041/1/256 0042/1/256 0043/1/256 0044/1/256 0045/1/256 0046/1/256 0047/1/256 0048/1/256 0049/1/256 0050/1/256 0051/1/256 0052/1/256 0053/1/256 0054/1/256 0055/1/256 0056/1/256 0057/1/256 0058/1/256 0059/1/256 0060/1/256 0061/1/256 0062/1/256 0063/1/256 0064/1/256 0065/1/256 0066/1/256 0067/1/256 0068/1/256 0069/1/256 0070/1/256 0071/1/256 0072/1/256 0073/1/256 0074/1/256
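The traceback points at run_command joining a command list that still contains an int (the same failure appears in the "One test failing" issue below, where the offending line is cmd = " ".join(command)). A minimal, hypothetical fix is to coerce each element first:

# In run_command: tolerate non-string arguments such as ports or step counts.
cmd = " ".join(str(x) for x in command)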

JMXPort option?

Ours is on 16199.

I tried hard-coding
"-p","16199",
into the various nodetool calls, but no dice. Is -h necessary if I'm local to the Cassandra node?
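nodetool does accept a JMX port via -p (and without -h it connects to 127.0.0.1:7199, as the FailedLoginException message elsewhere on this page shows, so -h can be omitted locally). A sketch of threading a port option through the script's command lists (option name hypothetical; note str(), to avoid the int-join TypeError reported on this page):

cmd = [options.nodetool, "-h", options.host, "-p", str(options.jmx_port), "ring"]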

ERROR - FAILED with no info

Hi,

I am running the script from a remote machine that accesses all the nodes through a remote nodetool.

I get the following errors in the middle of the repair process, always at the same steps (I can reproduce them any time I run the script).

/var/log/cassandra/repair-server1.log:2014-11-06 08:45:46,907 - ERROR - FAILED: 6/256 step 0004 nodetool -h server1 repair myks   -pr -st 003566854792210432805804684718019426758 -et 003576489470876730731489057894769207747
/var/log/cassandra/repair-server1.log:2014-11-06 08:45:46,907 - ERROR - 
/var/log/cassandra/repair-server1.log:2014-11-06 08:45:46,952 - ERROR - FAILED: 6/256 step 0003 nodetool -h server1 repair myks   -pr -st 003557220113544134880120311541269645769 -et 003566854792210432805804684718019426758
/var/log/cassandra/repair-server1.log:2014-11-06 08:45:46,952 - ERROR - 
/var/log/cassandra/repair-server1.log:2014-11-06 08:45:48,380 - ERROR - FAILED: 6/256 step 0005 nodetool -h server1 repair myks   -pr -st 003576489470876730731489057894769207747 -et 003586124149543028657173431071518988736
/var/log/cassandra/repair-server1.log:2014-11-06 08:45:48,380 - ERROR - 
/var/log/cassandra/repair-server1.log:2014-11-06 08:48:33,737 - ERROR - FAILED: 6/256 step 0009 nodetool -h server1 repair myks   -pr -st 003615028185541922434226550601768331703 -et 003624662864208220359910923778518112692
/var/log/cassandra/repair-server1.log:2014-11-06 08:48:33,737 - ERROR - 
/var/log/cassandra/repair-server1.log:2014-11-06 08:48:34,975 - ERROR - FAILED: 6/256 step 0010 nodetool -h server1 repair myks   -pr -st 003624662864208220359910923778518112692 -et 003634297542874518285595296955267893690
/var/log/cassandra/repair-server1.log:2014-11-06 08:48:34,975 - ERROR - 
/var/log/cassandra/repair-server1.log:2014-11-06 09:07:03,448 - ERROR - FAILED: 48/256 step 0002 nodetool -h server1 repair myks   -pr -st 028668104739202107587390869338472501441 -et 028671794365167405627200796676495488044
/var/log/cassandra/repair-server1.log:2014-11-06 09:07:03,448 - ERROR - 
/var/log/cassandra/repair-server1.log:2014-11-06 09:09:11,053 - ERROR - FAILED: 48/256 step 0006 nodetool -h server1 repair myks   -pr -st 028682863243063299746630578690564447853 -et 028686552869028597786440506028587434456
/var/log/cassandra/repair-server1.log:2014-11-06 09:09:11,054 - ERROR - 
/var/log/cassandra/repair-server1.log:2014-11-06 09:14:09,969 - ERROR - FAILED: 59/256 step 0004 nodetool -h server1 repair myks   -pr -st 031996204459821099337625237853642567879 -et 032003300043610329938193956053709712248
/var/log/cassandra/repair-server1.log:2014-11-06 09:14:09,969 - ERROR - 
/var/log/cassandra/repair-server1.log:2014-11-06 09:14:09,983 - ERROR - FAILED: 59/256 step 0001 nodetool -h server1 repair myks   -pr -st 031974917708453407535919083253441134772 -et 031982013292242638136487801453508279141
/var/log/cassandra/repair-server1.log:2014-11-06 09:14:09,983 - ERROR - 
/var/log/cassandra/repair-server1.log:2014-11-06 09:14:11,364 - ERROR - FAILED: 59/256 step 0006 nodetool -h server1 repair myks   -pr -st 032010395627399560538762674253776856617 -et 032017491211188791139331392453844000986
/var/log/cassandra/repair-server1.log:2014-11-06 09:14:11,364 - ERROR - 
/var/log/cassandra/repair-server2.log:2014-11-06 08:50:23,550 - ERROR - FAILED: 1/256 step 0004 nodetool -h server2 repair myks   -pr -st 000062581616001737953441086003047058605 -et 000067972228648236125697676642629550658
/var/log/cassandra/repair-server2.log:2014-11-06 08:50:23,550 - ERROR - 
/var/log/cassandra/repair-server2.log:2014-11-06 08:55:13,495 - ERROR - FAILED: 1/256 step 0008 nodetool -h server2 repair myks   -pr -st 000084144066587730642467448561377026817 -et 000089534679234228814724039200959518870
/var/log/cassandra/repair-server2.log:2014-11-06 08:55:13,495 - ERROR - 
/var/log/cassandra/repair-server2.log:2014-11-06 08:55:15,535 - ERROR - FAILED: 1/256 step 0009 nodetool -h server2 repair myks   -pr -st 000089534679234228814724039200959518870 -et 000094925291880726986980629840542010923
/var/log/cassandra/repair-server2.log:2014-11-06 08:55:15,535 - ERROR - 
/var/log/cassandra/repair-server2.log:2014-11-06 08:55:16,975 - ERROR - FAILED: 1/256 step 0010 nodetool -h server2 repair myks   -pr -st 000094925291880726986980629840542010923 -et 000100315904527225159237220480124502977
/var/log/cassandra/repair-server2.log:2014-11-06 08:55:16,975 - ERROR - 
/var/log/cassandra/repair-server2.log:2014-11-06 08:58:28,616 - ERROR - FAILED: 4/256 step 0001 nodetool -h server2 repair myks   -pr -st 002566461680764933896469632486562479823 -et 002567675350693503585419490978573669130
/var/log/cassandra/repair-server2.log:2014-11-06 08:58:28,617 - ERROR - 
/var/log/cassandra/repair-server4.log:2014-11-06 08:55:30,110 - ERROR - FAILED: 1/256 step 0004 nodetool -h server4 repair myks   -pr -st 000757548894433026070250815910734884539 -et 000759690434458478126763932193890239443
/var/log/cassandra/repair-server4.log:2014-11-06 08:55:30,110 - ERROR - 
/var/log/cassandra/repair-server3.log:2014-11-06 09:10:08,145 - ERROR - FAILED: 2/256 step 0005 nodetool -h server3 repair myks   -pr -st 002407373345623250677953856079648273474 -et 002412903179755960954826113682508450817
/var/log/cassandra/repair-server3.log:2014-11-06 09:10:08,145 - ERROR - 
/var/log/cassandra/repair-server3.log:2014-11-06 09:10:13,766 - ERROR - FAILED: 2/256 step 0007 nodetool -h server3 repair myks   -pr -st 002418433013888671231698371285368628160 -et 002423962848021381508570628888228805503
/var/log/cassandra/repair-server3.log:2014-11-06 09:10:13,767 - ERROR - 
/var/log/cassandra/repair-server3.log:2014-11-06 09:10:21,332 - ERROR - FAILED: 2/256 step 0009 nodetool -h server3 repair myks   -pr -st 002429492682154091785442886491088982846 -et 002435022516286802062315144093949160189
/var/log/cassandra/repair-server3.log:2014-11-06 09:10:21,333 - ERROR - 

Is this a bug you have faced too?

When I run the failed repairs directly from bash, they work like a charm. That's pretty weird. The script keeps going, failing from time to time for no apparent reason...

Maybe a quick fix would be to retry on failure...

I will dig a bit, but any insight would be very nice.
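A retry wrapper is easy to sketch against run_command's four-value return (visible in the tracebacks on this page); the attempts knob is hypothetical:

import logging

def run_with_retries(cmd, attempts=3):
    # Re-run a failed subrange repair a few times before giving up.
    for attempt in range(1, attempts + 1):
        success, cmd_string, stdout, stderr = run_command(*cmd)
        if success:
            break
        logging.warning("attempt %d/%d failed: %s", attempt, attempts, cmd_string)
    return success, cmd_string, stdout, stderr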

Multiprocessing = processing the same thing x times?

Hi,

First of all, this project looks interesting. Are you keeping it up to date / still using it? Maybe I can help somehow?

I am using a 3-node cluster with C* 1.2.18, vnodes enabled, and RandomPartitioner.

I have this at debug level:

2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks.
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210
2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210

It looks like some parts are processed up to 10 times (maybe more). I ran with this command:

./cassandra_range_repair/range_repair.py -k myks -H 10.55.xxx.xxx -s 100 -w 16 -D eu-west-xl -v -d --logfile=/var/log/cassandra/result.log

Also, while running ps aux, it seems things are run twice.

$ ps aux | grep nodetool
root 28906 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003527227502730104786072662883066462783 -et 003527873595605486575538223266831799943
root 28907 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003527227502730104786072662883066462783 -et 003527873595605486575538223266831799943
root 28952 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003527873595605486575538223266831799943 -et 003528519688480868365003783650597137103
root 28953 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003527873595605486575538223266831799943 -et 003528519688480868365003783650597137103
root 29004 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003528519688480868365003783650597137103 -et 003529165781356250154469344034362474263
root 29005 0.0 0.0 4400 740 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003528519688480868365003783650597137103 -et 003529165781356250154469344034362474263
root 29037 0.0 0.0 4400 612 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003529165781356250154469344034362474263 -et 003529811874231631943934904418127811423
root 29038 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003529165781356250154469344034362474263 -et 003529811874231631943934904418127811423
root 29185 0.0 0.0 4400 612 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003529811874231631943934904418127811423 -et 003530457967107013733400464801893148583
root 29186 0.0 0.0 4400 748 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003529811874231631943934904418127811423 -et 003530457967107013733400464801893148583
root 29248 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003530457967107013733400464801893148583 -et 003531104059982395522866025185658485743
root 29249 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003530457967107013733400464801893148583 -et 003531104059982395522866025185658485743
root 29262 0.0 0.0 4400 612 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003531104059982395522866025185658485743 -et 003531750152857777312331585569423822903
root 29263 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003531104059982395522866025185658485743 -et 003531750152857777312331585569423822903
root 29295 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003531750152857777312331585569423822903 -et 003532396245733159101797145953189160063
root 29311 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003531750152857777312331585569423822903 -et 003532396245733159101797145953189160063
root 29732 0.0 0.0 4400 616 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003532396245733159101797145953189160063 -et 003533042338608540891262706336954497223
root 29733 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003532396245733159101797145953189160063 -et 003533042338608540891262706336954497223
root 29736 0.0 0.0 4400 616 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003533042338608540891262706336954497223 -et 003533688431483922680728266720719834383
root 29737 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003533042338608540891262706336954497223 -et 003533688431483922680728266720719834383
root 29763 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003533688431483922680728266720719834383 -et 003534334524359304470193827104485171543
root 29767 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003533688431483922680728266720719834383 -et 003534334524359304470193827104485171543
root 29840 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003534334524359304470193827104485171543 -et 003534980617234686259659387488250508703
root 29847 0.0 0.0 4400 740 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003534334524359304470193827104485171543 -et 003534980617234686259659387488250508703
root 30263 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003534980617234686259659387488250508703 -et 003535626710110068049124947872015845863
root 30264 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003534980617234686259659387488250508703 -et 003535626710110068049124947872015845863
root 30329 0.0 0.0 4400 616 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003535626710110068049124947872015845863 -et 003536272802985449838590508255781183023
root 30330 0.0 0.0 4400 748 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003535626710110068049124947872015845863 -et 003536272802985449838590508255781183023
root 30378 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003536272802985449838590508255781183023 -et 003536918895860831628056068639546520183
root 30379 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003536272802985449838590508255781183023 -et 003536918895860831628056068639546520183
root 30392 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003536918895860831628056068639546520183 -et 003537564988736213417521629023311857343
root 30393 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003536918895860831628056068639546520183 -et 003537564988736213417521629023311857343
root 30629 0.0 0.0 8108 924 pts/0 S+ 15:48 0:00 grep --color=auto nodetool

AND

$ ps aux | grep nodetool | wc -l
33

--> 16 workers * 2 + 1 (the last line being the grep itself: root 30629 0.0 0.0 8108 924 pts/0 S+ 15:48 0:00 grep --color=auto nodetool)
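Two notes, hedged: the doubled ps entries are expected, since each subprocess call spawns /bin/sh -c, which in turn executes the /usr/bin/nodetool wrapper script, so every repair shows up as two shell processes. The identically timestamped, repeated DEBUG lines look instead like a logging handler being registered more than once; a hypothetical guard:

import logging

def setup_logging(logfile):
    # Configure handlers exactly once per process so repeated setup
    # (e.g. in forked pool workers) does not multiply every log line.
    logger = logging.getLogger()
    if logger.handlers:
        return logger
    handler = logging.FileHandler(logfile)
    handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger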

One test failing

$ make test
...
writing manifest file 'cassandra_range_repair.egg-info/SOURCES.txt'
running build_ext
test_ten_commands (tests.test_execution_counts.execution_count_tests) ... Traceback (most recent call last):
  File "/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py", line 405, in <module>
    main()
  File "/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py", line 401, in main
    repair(options)
  File "/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py", line 279, in repair
    tokens = Token_Container(options)
  File "/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py", line 36, in __init__
    self.get_host_tokens()
  File "/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py", line 116, in get_host_tokens
    success, _, stdout, stderr = run_command(*cmd)
  File "/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py", line 194, in run_command
    cmd = " ".join(command)
TypeError: sequence item 4: expected string, int found
ERROR
test_Murmur3_format_length (tests.test_ranges.range_tests) ... ok
test_Murmur3_range_end_zero (tests.test_ranges.range_tests) ... ok
test_Murmur3_range_start_zero (tests.test_ranges.range_tests) ... ok
test_Murmur3_range_wrap (tests.test_ranges.range_tests) ... ok
test_Random_range_end_zero (tests.test_ranges.range_tests) ... ok
test_Random_range_start_zero (tests.test_ranges.range_tests) ... ok
test_Random_range_wrap (tests.test_ranges.range_tests) ... ok

======================================================================
ERROR: test_ten_commands (tests.test_execution_counts.execution_count_tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jens/Development/src/cassandra_range_repair/tests/test_execution_counts.py", line 14, in test_ten_commands
    subprocess.check_output(cmd)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['/Users/jens/Development/src/cassandra_range_repair/tests/../src/range_repair.py', '--nodetool', '/Users/jens/Development/src/cassandra_range_repair/tests/mock_nodetool_script', '-s', '4', '-w', '2']' returned non-zero exit status 1

----------------------------------------------------------------------
Ran 8 tests in 0.084s

FAILED (errors=1)
make: *** [test] Error 1

I'm running Python 2.7.10. This might have been caught earlier if something like #36 was in place.

Unsafe in a multi-DC environment

Ranges for repairs need to be calculated against tokens held by local DC ring members only. Currently, if you have a situation like this:
DC1: nodeA(1), nodeB(3)
DC2: nodeC(2), nodeD(4)

running this script on nodeA will result in repairs of the range (1,2), which is insufficient for the needs of the local DC.
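A sketch of the required filtering, with hypothetical data shapes (ring_tokens maps token -> host, host_dc maps host -> datacenter):

# Compute ranges only from tokens owned by local-DC ring members.
local_tokens = sorted(token for token, host in ring_tokens.items()
                      if host_dc[host] == options.datacenter)

With the example above, nodeA's range would then be derived from DC1's tokens (1 and 3) rather than from the full four-token ring.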

I originally opened this as mstump#3 but that repo is really dead.

TravisCI integration

What do you think about adding Travis CI (or another CI tool) to check pull requests?
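For reference, a minimal config could be as small as this (a sketch; the Makefile's test target is shown in the "One test failing" issue above):

# .travis.yml
language: python
python:
  - "2.7"
script: make test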

Unable to build Debian package

$ make debian
rm -rf debian/rangerepair
rm -rf debian/rangerepair.debhelper.log
rm -rf debian/rangerepair.substvars
rm -rf debian/rangerepair.postinst.debhelper
find ../ -maxdepth 1 -iname 'rangerepair_*_amd64.changes' -exec rm -f {} +
find ../ -maxdepth 1 -iname 'rangerepair_*_amd64.deb' -exec rm -f {} +
find ../ -maxdepth 1 -iname 'rangerepair_*.dsc' -exec rm -f {} +
find ../ -maxdepth 1 -iname 'rangerepair_*.tar.gz' -exec rm -f {} +
sh make_deb.sh
+ python setup.py sdist
running sdist
[pbr] Writing ChangeLog
[pbr] Generating ChangeLog
[pbr] ChangeLog complete (0.0s)
[pbr] Generating AUTHORS
[pbr] AUTHORS complete (0.0s)
running egg_info
writing pbr to cassandra_range_repair.egg-info/pbr.json
writing requirements to cassandra_range_repair.egg-info/requires.txt
writing cassandra_range_repair.egg-info/PKG-INFO
writing top-level names to cassandra_range_repair.egg-info/top_level.txt
writing dependency_links to cassandra_range_repair.egg-info/dependency_links.txt
[pbr] Processing SOURCES.txt
[pbr] In git context, generating filelist from git
warning: no previously-included files found matching '.gitreview'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
writing manifest file 'cassandra_range_repair.egg-info/SOURCES.txt'
warning: sdist: standard file not found: should have one of README, README.rst, README.txt

running check
warning: check: missing required meta-data: url

creating cassandra_range_repair-0.0.1.dev124
creating cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
creating cassandra_range_repair-0.0.1.dev124/debian
creating cassandra_range_repair-0.0.1.dev124/src
creating cassandra_range_repair-0.0.1.dev124/tests
making hard links in cassandra_range_repair-0.0.1.dev124...
hard linking .travis.yml -> cassandra_range_repair-0.0.1.dev124
hard linking AUTHORS -> cassandra_range_repair-0.0.1.dev124
hard linking ChangeLog -> cassandra_range_repair-0.0.1.dev124
hard linking LICENSE.md -> cassandra_range_repair-0.0.1.dev124
hard linking Makefile -> cassandra_range_repair-0.0.1.dev124
hard linking README.md -> cassandra_range_repair-0.0.1.dev124
hard linking make_deb.sh -> cassandra_range_repair-0.0.1.dev124
hard linking requirements.txt -> cassandra_range_repair-0.0.1.dev124
hard linking setup.cfg -> cassandra_range_repair-0.0.1.dev124
hard linking setup.py -> cassandra_range_repair-0.0.1.dev124
hard linking cassandra_range_repair.egg-info/PKG-INFO -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking cassandra_range_repair.egg-info/SOURCES.txt -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking cassandra_range_repair.egg-info/dependency_links.txt -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking cassandra_range_repair.egg-info/not-zip-safe -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking cassandra_range_repair.egg-info/pbr.json -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking cassandra_range_repair.egg-info/requires.txt -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking cassandra_range_repair.egg-info/top_level.txt -> cassandra_range_repair-0.0.1.dev124/cassandra_range_repair.egg-info
hard linking debian/changelog -> cassandra_range_repair-0.0.1.dev124/debian
hard linking debian/compat -> cassandra_range_repair-0.0.1.dev124/debian
hard linking debian/control -> cassandra_range_repair-0.0.1.dev124/debian
hard linking debian/files -> cassandra_range_repair-0.0.1.dev124/debian
hard linking debian/rangerepair.install -> cassandra_range_repair-0.0.1.dev124/debian
hard linking debian/rangerepair.postinst -> cassandra_range_repair-0.0.1.dev124/debian
hard linking debian/rules -> cassandra_range_repair-0.0.1.dev124/debian
hard linking src/__init__.py -> cassandra_range_repair-0.0.1.dev124/src
hard linking src/range_repair.py -> cassandra_range_repair-0.0.1.dev124/src
hard linking src/repair_failed_ranges.py -> cassandra_range_repair-0.0.1.dev124/src
hard linking tests/__init__.py -> cassandra_range_repair-0.0.1.dev124/tests
hard linking tests/mock_nodetool_script -> cassandra_range_repair-0.0.1.dev124/tests
hard linking tests/rangerepair_test.py -> cassandra_range_repair-0.0.1.dev124/tests
hard linking tests/test_execution_counts.py -> cassandra_range_repair-0.0.1.dev124/tests
hard linking tests/test_ranges.py -> cassandra_range_repair-0.0.1.dev124/tests
hard linking tests/test_retry.py -> cassandra_range_repair-0.0.1.dev124/tests
copying setup.cfg -> cassandra_range_repair-0.0.1.dev124
Writing cassandra_range_repair-0.0.1.dev124/setup.cfg
Creating tar archive
removing 'cassandra_range_repair-0.0.1.dev124' (and everything under it)
+ grep PACKAGE.*:= Makefile
+ sed -e s/[ \t]*//g -e s/.*:=//
+ PACKAGE=rangerepair
+ ls dist/rangerepair-*.tar.gz
ls: cannot access 'dist/rangerepair-*.tar.gz': No such file or directory
+ VERSION_BUILDER=
Makefile:44: recipe for target 'debian' failed
make: *** [debian] Error 2

Notice I've added set -ex at the top of make_deb.sh to get more verbose output. From that output, the sdist tarball is named cassandra_range_repair-*.tar.gz (see the "creating" lines above), while the script globs for dist/rangerepair-*.tar.gz based on PACKAGE=rangerepair from the Makefile; that name mismatch looks like the culprit.

feature request: Handle control-c (KeyboardInterrupt) gracefully

Currently, hitting Ctrl-C to stop the script prints the following error.

Process PoolWorker-1:
Traceback (most recent call last):
File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python2.6/multiprocessing/pool.py", line 68, in worker
result = (True, func(*args, **kwds))
File "/root/bin/range_repair.py", line 221, in repair_range
success, cmd, _, stderr = run_command(*cmd)
File "/root/bin/range_repair.py", line 194, in run_command
stdout, stderr = proc.communicate()
File "/usr/lib64/python2.6/subprocess.py", line 732, in communicate
stdout, stderr = self._communicate(input, endtime)
File "/usr/lib64/python2.6/subprocess.py", line 1316, in _communicate
stdout, stderr = self._communicate_with_poll(input, endtime)
File "/usr/lib64/python2.6/subprocess.py", line 1388, in _communicate_with_poll
ready = poller.poll(self._remaining_time(endtime))
KeyboardInterrupt

The process then hangs until the pid is killed from a different session.
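A common pattern for this with multiprocessing.Pool, sketched here against names visible in the traceback rather than the script's actual structure (work_items is hypothetical), is to have workers ignore SIGINT and tear the pool down in the parent:

import signal
from multiprocessing import Pool

def init_worker():
    # Workers ignore SIGINT so only the parent sees KeyboardInterrupt.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

pool = Pool(options.workers, initializer=init_worker)
try:
    results = [pool.apply_async(repair_range, args) for args in work_items]
    for result in results:
        result.get(0xFFFF)  # a finite timeout keeps get() interruptible on Python 2
except KeyboardInterrupt:
    pool.terminate()  # stop outstanding workers instead of hanging
else:
    pool.close()
pool.join()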

Add authentication functionality

It would be nice to add authentication support to the nodetool calls, like this:

cmd = [self.options.nodetool, "-h", self.options.host, "ring", "-u", self.options.username, "-pw", self.options.password]

Many Thanks!
Sebastian
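The option side of that might look like this (names hypothetical, optparse-style to match the script's Python 2 vintage):

parser.add_option("--username", dest="username", default=None,
                  help="JMX username, passed to nodetool as -u")
parser.add_option("--password", dest="password", default=None,
                  help="JMX password, passed to nodetool as -pw")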

Multi-DC repair giving errors about imprecise repairs

command I used: python range_repair.py -H 127.0.0.1 -s 1 --datacenter DC2

$ nodetool ring | grep -B1 $(facter ipaddress) | tail -n 2
10.2.0.1   R1          Up     Normal  15.57 GiB       ?                   9099366847329376090
10.2.0.2  R2         Up     Normal  14.53 GiB       ?                   9124888514323768492

$ nodetool repair -st 9099366847329376090 -et 9124888514323768492 -pr system_auth
[2016-09-27 20:56:31,822] Starting repair command #4071, repairing keyspace system_auth with repair options (parallelism: parallel, primary range: true, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 1)
[2016-09-27 20:56:31,884] Requested range intersects a local range but is not fully contained in one; this would lead to imprecise repair
[2016-09-27 20:56:31,885] null

system_auth is: CREATE KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '8', 'DC2': '8'} AND durable_writes = true;

The two tokens (9099366847329376090, 9124888514323768492) are also the ones used by range-repair. Those tokens are in DC2, but another DC1 token, 9108060243154565075, sits between them. When I trigger two individual nodetool repair commands for the sub-ranges ((9099366847329376090, 9108060243154565075] and (9108060243154565075, 9124888514323768492]), they work fine. It fails only when the two ranges are merged.

Ironically, not passing --datacenter to the script allows repairs to complete.
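What the manual workaround amounts to is splitting each candidate range at every ring token from any datacenter; a hypothetical helper (ignoring ring wraparound):

def split_at_ring_tokens(start, end, all_tokens):
    # Cut (start, end] at each intervening ring token so every
    # nodetool repair stays fully inside one local range.
    cuts = sorted(t for t in all_tokens if start < t < end)
    edges = [start] + cuts + [end]
    return zip(edges, edges[1:])

# split_at_ring_tokens(9099366847329376090, 9124888514323768492, all_tokens)
# -> [(9099366847329376090, 9108060243154565075),
#     (9108060243154565075, 9124888514323768492)]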

Multi Datacenter Repair

Hi

https://github.com/BrianGallew/cassandra_range_repair#multiple-datacenters
mentions-

"If you have multiple datacenters in your ring, then you MUST specify the name of the datacenter containing the node you are repairing as part of the command-line options (--datacenter=DCNAME). Failure to do so will result in only a subset of your data being repaired (approximately data/number-of-datacenters). This is because nodetool has no way to determine the relevant DC on its own, which in turn means it will use the tokens from every ring member in every datacenter."

So, if we are running repair on every node in a multi-DC Cassandra cluster, do we need to specify --datacenter=name on every node, or are we fine without specifying the datacenter?
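Reading the quoted README text, the answer appears to be yes: in a multi-DC ring the flag is required on every node you repair, e.g. (flags as used elsewhere on this page; names are placeholders):

./range_repair.py -k myks -H <node_ip> -D <local_dc_name>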

Not sure if incremental repair option is getting passed accidentally

On a 3.x cluster, even without asking for incremental repair, using

./range_repair.py -k odin -l -v

I still keep seeing this warning in the logs:

RepairOption.java:148 - Incremental repair can't be requested with subrange repair because each subrange repair would generate an anti-compacted table. The repair will occur but without anti-compaction.
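A plausible explanation (an assumption, not something confirmed in this thread): from Cassandra 2.2 onward, nodetool repair is incremental by default, and this script always issues subrange commands of the form

nodetool -h <host> repair odin -pr -st <start_token> -et <end_token>

so Cassandra downgrades each one to a full repair and logs that warning. Appending nodetool's --full flag to the generated command should silence it.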

datacenter search_value not 2.1 compatible

# This is a really well-specified value.  If the format of the
# output of 'nodetool gossipinfo' changes, this will have to be
# revisited.

The time has come: the output has changed in version 2.1 ;)

Cassandra added numbers to the gossipinfo output, so the search value no longer works:

cassandra 2.0.x output

/<ip_address_node_x>
  generation:1447001651
  heartbeat:1474563
  HOST_ID:7728d6dc-81f5-4b65-8f6f-56154a699aa9
  STATUS:NORMAL,-1256802229644094855
  SEVERITY:6.938893903907228E-18
  NET_VERSION:6
  LOAD:4.42385489232E11
  DC:datacenter1
  SCHEMA:5b0d09a2-5a55-3307-9349-c9c6a7af4581
  RACK:rack1
  RPC_ADDRESS:<ip_address_node_x>
  RELEASE_VERSION:1.2.19
/<ip_address_node_x>
  generation:1448029831
  heartbeat:427997
  HOST_ID:5d11e4e6-219e-4a22-8c6b-245a5cbd6d3a
  STATUS:NORMAL,6214099238388467859
  SEVERITY:0.0
  NET_VERSION:6
  LOAD:3.2200086618E11
  DC:datacenter1
  SCHEMA:5b0d09a2-5a55-3307-9349-c9c6a7af4581
  RACK:rack1
  RPC_ADDRESS:<ip_address_node_x>
  RELEASE_VERSION:1.2.19

cassandra 2.1.x output

/<ip_address_node_x>
  generation:1445591070
  heartbeat:8677350
  NET_VERSION:1:8
  HOST_ID:2:3bd77948-2dad-45d2-9a39-46fca7498641
  STATUS:14:NORMAL,-107043648109545801
  RPC_ADDRESS:3:<ip_address_node_x>
  SEVERITY:8677352:0.0
  SCHEMA:8316519:ab586e11-897e-30ff-8732-391b15786f79
  LOAD:8677214:2.9703713869E10
  RELEASE_VERSION:4:2.1.10
  DC:6:datacenter1
  RACK:8:rack1
  TOKENS:13:<hidden>
/<ip_address_node_x>
  generation:1445591100
  heartbeat:8677298
  NET_VERSION:1:8
  HOST_ID:2:72f794c7-6066-4117-888c-92e91c779dfb
  STATUS:14:NORMAL,-1002667987248120783
  RPC_ADDRESS:3:<ip_address_node_x>
  SEVERITY:8677297:0.0
  SCHEMA:8316467:ab586e11-897e-30ff-8732-391b15786f79
  LOAD:8677249:2.7910465723E10
  RELEASE_VERSION:4:2.1.10
  DC:6:datacenter1
  RACK:8:rack1
  TOKENS:13:<hidden>

We only use Cassandra 2.1.x, so I changed the search_value to

search_value = "\n  DC:6:{datacenter}\n".format(datacenter=self.options.datacenter)

I'm not a strong Python developer, so I hope someone can create a better solution with proper version handling for search_value.
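A version-tolerant variant could make the numeric field optional (a sketch; gossipinfo_output is a hypothetical name for the captured nodetool gossipinfo text):

import re

# Matches both "  DC:datacenter1" (2.0.x) and "  DC:6:datacenter1" (2.1.x).
dc_pattern = re.compile(r"^  DC:(?:\d+:)?%s$" % re.escape(self.options.datacenter),
                        re.MULTILINE)
found = dc_pattern.search(gossipinfo_output)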

In Cassandra 2.1, nodetool repair -st / -et doesn't pass the "column family" option

On 2.0.x, I usually run repair with range_repair -k keyspace -c columnfamily, which runs:

 nodetool repair -st (start token) -et (end token) $keyspace $columnfamily

The Cassandra log was usually something like:

StorageService.java (line 2496) starting user-requested repair of range [(300005707039417874,300156110175996167]] for keyspace $keyspace and column families [$columnfamily]

In 2.1, nodetool doesn't pass the "column family" (cfnames) option when startToken/endToken are used (as seen in the NodeTool.java code), so you repair all tables in the range. See https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob;f=src/java/org/apache/cassandra/tools/NodeTool.java;h=1d4a42024dd406f0c90fc6b99f599b4ab4a0948e;hb=refs/heads/cassandra-2.1#l1917

if (!startToken.isEmpty() || !endToken.isEmpty())
    probe.forceRepairRangeAsync(System.out, keyspace, parallelismDegree, dataCenters,hosts, startToken, endToken, !incrementalRepair);
else
    probe.forceRepairAsync(System.out, keyspace, parallelismDegree, dataCenters, hosts, primaryRange, !incrementalRepair, cfnames);

And the log :

StorageService.java:2846 - starting user-requested repair of range [(9187601611781349802,9189958813088431561]] for keyspace $keyspace and column families []

The "column family" is emtpy, so we are reparing all column families between theses 2 tokens.

Is this a bug (in Cassandra?), is it expected, or am I missing something? :)
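If forceRepairRangeAsync accepts a trailing cfnames varargs the way forceRepairAsync does (an assumption worth verifying in NodeTool.java for your version), the fix would be a one-liner in the first branch:

if (!startToken.isEmpty() || !endToken.isEmpty())
    // Hypothetical: pass the requested column families to subrange repair too.
    probe.forceRepairRangeAsync(System.out, keyspace, parallelismDegree, dataCenters, hosts, startToken, endToken, !incrementalRepair, cfnames);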
