Comments (10)
Thank you for your time.
Team RabbitMQ uses GitHub issues for specific actionable items engineers can work on. This assumes two things:
- GitHub issues are not used for questions, investigations, root cause analysis, discussions of potential issues, etc (as defined by this team)
- We have a certain amount of information to work with
We get at least a dozen questions through various venues every single day, often quite light on details.
At that rate GitHub issues can very quickly turn into something impossible to navigate and make sense of, even for our team. Because of that, questions, investigations, root cause analysis, and discussions of potential features are all considered to be mailing list material by our team. Please post this to rabbitmq-users.
Getting all the details necessary to reproduce an issue, make a conclusion or even form a hypothesis about what's happening can take a fair amount of time. Our team is multiple orders of magnitude smaller than the RabbitMQ community. Please help others help you by providing a way to reproduce the behavior you're observing, or at least sharing as much relevant information as possible on the list:
- Server, client library and plugin (if applicable) versions used
- Server logs
- A code example or terminal transcript that can be used to reproduce
- Full exception stack traces (not a single line message)
- rabbitmqctl status (and, if possible, rabbitmqctl environment) output
- Other relevant things about the environment and workload, e.g. a traffic capture
Feel free to edit out hostnames and other potentially sensitive information.
When/if we have enough details and evidence we'd be happy to file a new issue.
Thank you.
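The checklist above can be gathered in one pass. What follows is a minimal sketch, not an official RabbitMQ tool: the function names redact and collect_report and the @REDACTED placeholder are assumptions made for illustration. Only the rabbitmqctl status and rabbitmqctl environment commands come from the list above.

```shell
#!/bin/sh
# Sketch: gather the command outputs requested above and mask hostnames
# before sharing. Function names and the @REDACTED placeholder are hypothetical.

# Replace the host part of node names (rabbit@some-host -> rabbit@REDACTED).
redact() {
    sed -E 's/@[A-Za-z0-9._-]+/@REDACTED/g'
}

# Print a report to stdout; requires rabbitmqctl on PATH and a running node.
collect_report() {
    for sub in status environment; do
        echo "== rabbitmqctl $sub =="
        rabbitmqctl "$sub" 2>&1
    done | redact
}
```

Running collect_report > report.txt on each node would produce a file to attach to a mailing list post. The redact filter only masks hostnames that appear in node@host form, so server logs and stack traces still need a manual pass.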
from rabbitmq-autocluster.
Node '[email protected]' thinks it's clustered with node '[email protected]', but '[email protected]' disagrees
appears in this repository's issues as well as many other places. It means one node was reset and another one wasn't, so A thinks it is not already clustered with B and thus can join it, but B disagrees. Resetting B will help. How exactly you can end up in this situation with various provisioning tools, I cannot know.
Hey, thanks @michaelklishin, and sorry for not including more details. I am also asking in the #autocluster channel in the RabbitMQ Slack. It seems that even trying to join manually still fails with the same error. Here is some more debugging I have done:
/ # rabbitmqctl reset
Resetting node '[email protected]'
Error: Mnesia is still running on node '[email protected]'. Please stop the node with rabbitmqctl stop_app first.
/ # rabbitmqctl stop_app
Stopping rabbit application on node '[email protected]'
/ # rabbitmqctl reset
Resetting node '[email protected]'
It appears the solution is to make the cluster forget the node remotely (from another node, not from the node itself):
/ # rabbitmqctl join_cluster [email protected]
Clustering node '[email protected]' with '[email protected]'
Error: {inconsistent_cluster,"Node '[email protected]' thinks it's clustered with node '[email protected]', but '[email protected]' disagrees"}
/ # rabbitmqctl join_cluster [email protected]
Clustering node '[email protected]' with '[email protected]'
However, when I run the suggested cluster status:
/ # rabbitmqctl cluster_status
Cluster status of node '[email protected]'
[{nodes,[{disc,['[email protected]','[email protected]']}]},
 {running_nodes,['[email protected]']},
 {cluster_name,<<"rabbit@rabbitmq-statefulset-development-0.rabbitmq.default.svc.cluster.local">>},
 {partitions,[]},
 {alarms,[{'[email protected]',[]}]}]
I see two disc nodes but only one running, so I start the app on the node that is down:
~/ops-tools/build-files/rabbitmq$ ./test_status.sh
Cluster status of node '[email protected]'
[{nodes,[{disc,['[email protected]','[email protected]']}]},
 {running_nodes,['[email protected]','[email protected]']},
 {cluster_name,<<"rabbit@rabbitmq-statefulset-development-0.rabbitmq.default.svc.cluster.local">>},
 {partitions,[]},
 {alarms,[{'[email protected]',[]},{'[email protected]',[]}]}]
Then it works ^^. I thought this would be handled by the plugin with the default settings in my StatefulSet and Service, which I took from this repo.
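The check done by eye above (two disc nodes listed, only one running) can be scripted. This is a rough sketch that assumes the Erlang-term output format shown in the transcripts above, with each tuple on its own line; other RabbitMQ versions may print cluster_status differently. The function names list_nodes and down_members are hypothetical.

```shell
#!/bin/sh
# Sketch: list cluster members that are not currently running, by comparing
# the {nodes,...} and {running_nodes,...} lines of `rabbitmqctl cluster_status`.
# Assumes the Erlang-term output format shown above.

# $1 = nodes | running_nodes; reads cluster_status output on stdin and
# prints one quoted node name per line, with the quotes stripped.
list_nodes() {
    grep -E "\{$1," | grep -oE "'[^']+'" | tr -d "'" | sort -u
}

# Reads cluster_status output on stdin; prints members missing from running_nodes.
down_members() {
    status="$(cat)"
    running="$(printf '%s\n' "$status" | list_nodes running_nodes)"
    printf '%s\n' "$status" | list_nodes nodes | while read -r n; do
        printf '%s\n' "$running" | grep -qxF "$n" || echo "$n"
    done
}
```

Usage would be rabbitmqctl cluster_status | down_members. An empty result means every listed member is running, a non-empty one names the nodes that still need start_app.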
Mnesia is still running on node '[email protected]'. Please stop the node with rabbitmqctl stop_app first
has a hint.
As the README for this plugin states, it is not a replacement for understanding of the basics of cluster formation. Please follow the clustering 101 transcript on rabbitmq.com and the meaning of the message(s) will be clearer.
@michaelklishin I understand, but I feel it's a little more than that. The issue is that on boot one broker starts fine; however, the second cannot cluster with the first. Both are started using the defaults in the k8s examples.
I had even attempted cleaning up the mnesia stuff as I had found elsewhere:
```
# Ignore failures from rm so the node still starts
rm -rf /var/lib/rabbitmq/* || true
rm -rf /rabbitmq/var/lib/rabbitmq/* || true
rabbitmq-server -detached
```
The second broker always starts, fails to join, and crashes with the generic "node disagrees" error.
Removing a data directory without first stopping the node won’t get you where you want to be. There is only one scenario which produces the error message in question.
This is not a support forum. Please post step by step instructions to reproduce to rabbitmq-users or we won’t be able to help you.
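To illustrate the ordering point made above (stop the node first, then remove its data), here is a minimal sketch. The helper names run and wipe_stopped_node, the DRY_RUN switch and the example data directory are assumptions for illustration; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: wipe a node's data directory only after the node has been stopped.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = data directory (hypothetical example: /var/lib/rabbitmq/mnesia)
wipe_stopped_node() {
    dir="$1"
    case "$dir" in
        ""|"/") echo "refusing to wipe '$dir'" >&2; return 1 ;;
    esac
    run rabbitmqctl stop            # stops the whole node, not just the rabbit app
    run rm -rf "$dir"
    run rabbitmq-server -detached   # start a fresh node, as in the script above
}
```

Called as wipe_stopped_node /var/lib/rabbitmq/mnesia, this prints the three commands in order. Note that this alone does not clear the error discussed in this thread: a peer that was clustered with the wiped node still lists it as a member until the peer is reset or the node is removed from the cluster.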
Alternatively, nodes can be reset without restarting with rabbitmqctl reset. It’s a good idea to reset both nodes before trying further.
Rough steps to get into the state @srflaxu40's nodes are in:
- Start node A
- Start node B
- rabbitmqctl stop_app on node A
- rabbitmqctl -n node-A join_cluster node-B
- Now both nodes are clustered
- rabbitmqctl stop_app on node A (or stop it any other way without resetting)
- Wipe node A's data directory
- Start node A
- Try to rabbitmqctl -n node-A join_cluster node-B again: now A thinks it is not a member of a cluster with B, but B thinks A is an existing member since A was never removed from the cluster (or reset, which notifies any running cluster peers)
How do we get out of this state? Reset node B or stop it and wipe its data directory, then restart.
This really isn't rocket science.
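The way out described above (reset node B, then join A to it again) can be sketched as a short script. This is only an illustration: the helper names run, reset_node, join_node and recover and the DRY_RUN switch are hypothetical, and running rabbitmqctl -n against another node assumes both nodes share an Erlang cookie and can reach each other. By default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of the recovery described above: reset node B, then join A to B again.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Reset a node in place: stop the rabbit app, reset, start it again.
reset_node() {
    run rabbitmqctl -n "$1" stop_app
    run rabbitmqctl -n "$1" reset
    run rabbitmqctl -n "$1" start_app
}

# Join node $1 to the cluster that node $2 belongs to.
join_node() {
    run rabbitmqctl -n "$1" stop_app
    run rabbitmqctl -n "$1" join_cluster "$2"
    run rabbitmqctl -n "$1" start_app
}

# $1 = node A (wiped), $2 = node B (still lists A as a member)
recover() {
    reset_node "$2"
    join_node "$1" "$2"
}
```

For example, recover rabbit@node-a rabbit@node-b prints the six rabbitmqctl invocations in order. Keep in mind that rabbitmqctl reset discards the data on the node being reset, so this only suits clusters whose contents can be recreated.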