
RabbitMQ Autocluster

What it Does

This plugin provides a mechanism for peer node discovery in RabbitMQ clusters. It also supports a few opinionated features around cluster formation and "permanently unavailable" node detection.

Note for RabbitMQ 3.7.x Users

Starting with RabbitMQ 3.7.0 this plugin was superseded by a new peer discovery subsystem built on the same ideas and supporting the same backends via separate plugins.

This plugin therefore is deprecated and should not be used by those running RabbitMQ 3.7.0 or a later version.

Supported Discovery Backends

Nodes using this plugin will discover their peers on boot and (optionally) register with one of the supported backends: AWS (EC2 tags or Autoscaling Group membership), Consul, DNS (round-robin A records), etcd, or Kubernetes.

If at least one peer node has been discovered, cluster formation proceeds as usual, otherwise the node is considered to be the first one to come up and becomes the seed node.

To avoid a natural race condition around seed node "election" when a newly formed cluster first boots, peer discovery backends use either randomized delays or a locking mechanism.

Some backends support node health checks. Nodes not reporting their status periodically are considered to be in an errored state. If the user opts in, such nodes can be automatically removed from the cluster. This is useful for deployments that use AWS autoscaling groups or similar IaaS features, for example.

This plugin only covers cluster formation and does not change how RabbitMQ clusters operate once formed.

Note: This plugin is not a replacement for first-hand knowledge of how to manually create a RabbitMQ cluster. If you run into issues using the plugin, try to manually create the cluster in the same environment in which you intend to use the plugin. For information on how to cluster RabbitMQ manually, please see the RabbitMQ documentation.

Current Maintainers

This plugin was originally developed by Gavin Roy at AWeber and is now co-maintained by several RabbitMQ core contributors. Parts of it were adopted into RabbitMQ core (as of 3.7.0).

Supported RabbitMQ Versions

There are three branches in this repository that target different RabbitMQ release series:

  • v3.6.x targets RabbitMQ 3.6.x (current stable RabbitMQ branch)
  • v3.7.x is compatible with RabbitMQ 3.7.x but this plugin was superseded by a new peer discovery subsystem built on the same ideas.
  • master is a development branch that's not of much use at the moment.

Please take this into account when building this plugin from source.

Please also note that key ideas of this plugin have been incorporated into the RabbitMQ master branch and will be included in 3.7.0. This plugin will therefore become a collection of backends (e.g. AWS and etcd) rather than a wholesale alternative cluster formation implementation.

Supported Erlang Versions

This plugin requires Erlang/OTP 18.3 or later. Also see the RabbitMQ Erlang version requirements guide.

Binary Releases

Binary releases of autocluster can be found on the GitHub Releases page.

The most recent release is 0.10.0, which targets RabbitMQ 3.6.12 or later.

See release notes for details.

Installation

This plugin is installed the same way as other RabbitMQ plugins.

  1. Place both autocluster-{version}.ez and the rabbitmq_aws-{version}.ez plugin files in the RabbitMQ plugins directory.
  2. Enable the plugin, e.g. with rabbitmq-plugins enable autocluster --offline.
  3. Configure the plugin.
  4. Start the node.
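
For example, on a typical Linux installation the first two steps might look like this (a sketch only; the plugins directory path varies between installations and packagings):

# copy both plugin archives into the node's plugins directory, then enable the plugin
cp autocluster-*.ez rabbitmq_aws-*.ez /usr/lib/rabbitmq/plugins/
rabbitmq-plugins enable autocluster --offline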

Alternatively, there is a pre-built Docker image available on Docker Hub as pivotalrabbitmq/rabbitmq-autocluster.

Note that the plugin does not have a default backend configured. A little bit of configuration is therefore mandatory regardless of the backend used.

Configuration

General settings

Configuration for the plugin can be set in two places: operating system environment variables or the rabbitmq.config file under the autocluster section.

Available Settings

The following settings are generic and used by most (or all) service discovery backends:

Backend Type
Which type of service discovery backend to use. One of aws, consul, dns, etcd or k8s.
Startup Delay
To prevent a race condition when creating a new cluster for the first time, the startup delay makes each node sleep for a random amount of time so that nodes start at slightly different offsets from each other. This setting controls the maximum value of that delay.
Failure Mode
What behavior to use when the node fails to cluster with an existing RabbitMQ cluster or during initialization of the autocluster plugin. The two valid options are ignore and stop.
Log Level
You can set the log level via the environment variable AUTOCLUSTER_LOG_LEVEL or the autocluster.autocluster_log_level key (see below).
Longname (FQDN) Support
This is a RabbitMQ environment variable setting that is used by the autocluster plugin as well. When set to true this will cause RabbitMQ and the autocluster plugin to use fully qualified names to identify nodes. For more information about the RABBITMQ_USE_LONGNAME environment variable, see the RabbitMQ documentation
Node Name
Like long node name support, the node name is a RabbitMQ server setting that can be used together with this plugin. The RABBITMQ_NODENAME environment variable explicitly sets the node name that is used to identify the node with RabbitMQ. The autocluster plugin will use this value when constructing the local part/name/prefix for all nodes in this cluster. For example, if RABBITMQ_NODENAME is set to bunny@rabbit1, bunny will be prefixed to all nodes discovered by the various backends. For more information about the RABBITMQ_NODENAME environment variable, see the RabbitMQ documentation. Note that some backends offer ways to dynamically compute the node name (e.g. AWS, Consul), while others assume that node names are preconfigured out-of-band and provided by the discovery service (e.g. DNS). In those cases it may or may not be possible (or recommended) to use RABBITMQ_NODENAME.
Node Type
Define the type of node to join the cluster as. One of disc or ram. See the RabbitMQ Clustering Guide for more information.
Cluster Cleanup
Enables a periodic check that removes any nodes that are not alive in the cluster and no longer listed by the service discovery backend. This is a destructive action that removes nodes from the cluster. Nodes that are flapping and get removed will be re-added as if they were new, and their database, including any persisted messages, will be lost. To use this feature, you must not only enable it with this flag but also disable the "Cleanup Warn Only" flag. Added in v0.5

Note: This is an experimental feature and should be used with caution.

Cleanup Interval
If cluster cleanup is enabled, this is the interval that specifies how often to look for dead nodes to remove (in seconds). Added in v0.5
Cleanup Warn Only
If set, the plugin will only warn about nodes that it would cleanup and will not perform any destructive actions on the cluster. Added in v0.5
HTTP Proxy
If set, the given HTTP URL will be used as a proxy to connect to the service discovery backend.
HTTPS Proxy
If set, the given HTTPS URL will be used as a proxy to connect to the service discovery backend.
Proxy Exclusions
List of host names which shouldn't use any proxy.
When using environment variables, the exclusion list must be provided as a comma-separated string: PROXY_EXCLUSIONS="localhost, 127.0.0.1"

How to Configure Settings

The autocluster plugin can be configured via environment variables or in the rabbitmq.config file.

Note: RabbitMQ reads environment variables from its own configuration file, rabbitmq-env.conf, but you can't easily reuse it for autocluster configuration. If you absolutely want to do so, use export VAR_NAME=var_value instead of a plain assignment to VAR_NAME.
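
For example, a minimal sketch of a rabbitmq-env.conf that configures the Consul backend this way (the file path and host value are placeholders):

# /etc/rabbitmq/rabbitmq-env.conf
# plain VAR=value assignments are not visible to the plugin; use export
export AUTOCLUSTER_TYPE=consul
export CONSUL_HOST=consul.example.local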

The following chart details each general setting, with the environment variable name, rabbitmq.config setting key and data type, and the default value if there is one.

Setting            Environment Variable   Setting Key            Type     Default
Backend Type       AUTOCLUSTER_TYPE       backend                atom     unconfigured
Startup Delay      AUTOCLUSTER_DELAY      startup_delay          integer  5
Failure Mode       AUTOCLUSTER_FAILURE    autocluster_failure    atom     ignore
Log Level          AUTOCLUSTER_LOG_LEVEL  autocluster_log_level  atom     info
Longname           RABBITMQ_USE_LONGNAME  N/A                    bool     false
Node Name          RABBITMQ_NODENAME      N/A                    string   rabbit@$HOSTNAME
Node Type          RABBITMQ_NODE_TYPE     node_type              atom     disc
Cluster Cleanup    AUTOCLUSTER_CLEANUP    cluster_cleanup        bool     false
Cleanup Interval   CLEANUP_INTERVAL       cleanup_interval       integer  60
Cleanup Warn Only  CLEANUP_WARN_ONLY      cleanup_warn_only      bool     true
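
Putting a few of the general settings together, the following is a minimal rabbitmq.config sketch (the values are illustrative; backend-specific settings are covered in their own sections below):

[
  {autocluster, [
    %% which peer discovery backend to use
    {backend,             consul},
    %% maximum randomized startup delay, in seconds
    {startup_delay,       10},
    %% stop the node if clustering fails instead of continuing standalone
    {autocluster_failure, stop},
    %% join the cluster as a disc node
    {node_type,           disc},
    %% opt into (destructive) removal of dead nodes
    {cluster_cleanup,     true},
    {cleanup_interval,    60},
    {cleanup_warn_only,   false}
  ]}
].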

Logging Configuration

To configure logging level used by this plugin, use the AUTOCLUSTER_LOG_LEVEL environment variable or autocluster.autocluster_log_level setting.

Here's a very minimalistic example that enables debug logging:

[
  {autocluster, [
    {autocluster_log_level, debug}
  ]}
].

Valid log levels are debug, info, warning, and error. For more information on RabbitMQ configuration please refer to RabbitMQ documentation.

AWS Configuration

The AWS backend for the autocluster plugin supports two node discovery mechanisms: Autoscaling Group membership and EC2 tags.

The following settings impact the behavior of the AWS backend. See the AWS API Credentials section below for additional settings.

Autoscaling
Cluster based upon membership in an Autoscaling Group. Set to true to enable.
EC2 Tags
Filter the cluster node list with the specified tags. Use a comma delimiter for multiple tags when specifying as an environment variable.
Use private IP
Use the private IP address returned by autoscaling as the hostname, instead of the private DNS name.

NOTE: If this is your first time setting up RabbitMQ with autoscaling-based clustering and you are doing so for R&D purposes, you may want to check out the gavinmroy/alpine-rabbitmq-autocluster Docker image repository for a working example of the plugin using a CloudFormation template that creates everything required for an Autoscaling Group based cluster.

Details

Environment Variable Setting Key Type Default
AWS_AUTOSCALING aws_autoscaling atom false
AWS_EC2_TAGS aws_ec2_tags [string()]
AWS_USE_PRIVATE_IP aws_use_private_ip atom false

Notes

If aws_autoscaling is enabled, the EC2 backend will dynamically determine the autoscaling group that the node is a member of and attempt to join the other nodes in the autoscaling group.

If aws_autoscaling is disabled, you must specify EC2 tags to use to filter the nodes that the backend should cluster with.

AWS API Configuration and Credentials

As with the AWS CLI, the autocluster plugin configures the AWS API requests by attempting to resolve the values in a number of steps.

The configuration values are discovered in the following order:

  1. Explicitly configured in the autocluster configuration.
  2. Environment variables
  3. Configuration file
  4. EC2 Instance Metadata Service (for Region)

The credentials values are discovered in the following order:

  1. Explicitly configured in the autocluster configuration.
  2. Environment variables
  3. Credentials file
  4. EC2 Instance Metadata Service
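
As an illustration of step 3 in both lists, the plugin follows the same file conventions as the AWS CLI; the default paths can be overridden with AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE (the key values below are the same placeholders used later in this document):

# ~/.aws/credentials
[default]
aws_access_key_id = AKIDEXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY

# ~/.aws/config
[default]
region = us-west-2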

AWS Credentials and Configuration Settings

The following settings and environment variables impact the configuration and credentials behavior. For more information see the Amazon AWS CLI documentation.

Environment Variable Setting Key Type Default
AWS_ACCESS_KEY_ID aws_access_key string
AWS_SECRET_ACCESS_KEY aws_secret_key string
AWS_DEFAULT_REGION aws_ec2_region string us-east-1
AWS_DEFAULT_PROFILE N/A string
AWS_CONFIG_FILE N/A string
AWS_SHARED_CREDENTIALS_FILE N/A string

IAM Policy

If you intend to use the EC2 Instance Metadata Service along with an IAM Role that is assigned to EC2 instances, you will need a policy that allows the plugin to discover the node list. The following is an example of such a policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingInstances",
        "ec2:DescribeInstances"
      ],
      "Resource": ["*"]
    }
  ]
}

Example Configuration

The following configuration example enables the autoscaling based cluster discovery and sets the EC2 region to us-west-2:

[
  {autocluster, [
    {autocluster_log_level, debug},
    {backend, aws},
    {aws_autoscaling, true},
    {aws_ec2_region, "us-west-2"}
  ]}
].

For non-autoscaling group based clusters, the following configuration demonstrates how to limit EC2 instances in the cluster to nodes with the tags region=us-west-2 and service=rabbitmq. It also specifies the AWS access key and AWS secret key.

[
  {autocluster, [
    {autocluster_log_level, debug},
    {backend, aws},
    {aws_ec2_tags, [
      {"region", "us-west-2"},
      {"service", "rabbitmq"}
    ]},
    {aws_ec2_region, "us-east-1"},
    {aws_access_key, "AKIDEXAMPLE"},
    {aws_secret_key, "wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY"}
  ]}
].

When using environment variables, the tags must be provided in JSON format:

AWS_EC2_TAGS="{\"region\": \"us-west-2\",\"service\": \"rabbitmq\"}"

Example Cloud-Init

The following is an example cloud-init that was tested with Ubuntu Trusty for use with an Autoscaling Group:

#cloud-config
apt_update: true
apt_upgrade: true
apt_sources:
  - source: deb https://apt.dockerproject.org/repo ubuntu-trusty main
    keyid: 58118E89F3A912897C070ADBF76221572C52609D
    filename: docker.list
packages:
  - docker-engine
runcmd:
  - docker run -d --name rabbitmq --net=host -p 4369:4369 -p 5672:5672 -p 15672:15672 -p 25672:25672 gavinmroy/rabbitmq-autocluster

Consul configuration

The following settings impact the configuration of the Consul backend for the autocluster plugin:

Consul Scheme
The URI scheme to use when connecting to Consul
Consul Host
The hostname to use when connecting to Consul's API
Consul Port
The port to use when connecting to Consul's API
Consul ACL Token
The Consul access token to use when registering the node with Consul (optional)
Service Name
The name of the service to register with Consul for automatic clustering
Service Address
An IP address or host name to use when registering the service. If this is specified, the value will automatically be appended to the service ID. This is useful when you are testing with a single Consul server instead of having an agent for every RabbitMQ node. (optional)
Service Auto Address
Use the hostname of the current machine (retrieved with `gethostname(2)`) for the service address when registering the service with Consul. If this is enabled, the hostname will automatically be appended to the service ID. This is useful when you are testing with a single Consul server instead of having an agent for every RabbitMQ node. (optional)
Service Auto Address by NIC
Use the IP address of the specified network interface controller (NIC) as the service address when registering with Consul. (optional)
Service Port
Used to set a port for the service in Consul, allowing for the automatic clustering service registration to double as a general RabbitMQ service registration.

Note: Set the CONSUL_SVC_PORT to an empty value to disable port announcement and health checking. For example: CONSUL_SVC_PORT=""

Consul Use Longname
When the node names registered with Consul are not fully qualified (FQDN) addresses, this option makes it possible to append a .node.<Consul domain> suffix (see Consul Domain below) to the node names retrieved from Consul.
Consul Domain
The domain suffix appended to peer node hostname when long node names are used (see above).
Service TTL
Used to specify the Consul health check interval that is used to let Consul know that RabbitMQ is alive and healthy.
Service Tags
Used to specify the Consul service tags. If a cluster name is specified, the tags specified here are added to the cluster name tag
Service unregistration timeout
How soon should Consul unregister a node that is failing its health check? The value is in seconds and cannot be lower than 60.
Include nodes that fail Consul health checks?
If set to `true`, nodes that fail their health checks with Consul will still be included into discovery results.

Configuration Details

Setting Environment Variable Setting Key Type Default
Consul Scheme CONSUL_SCHEME consul_scheme string http
Consul Host CONSUL_HOST consul_host string localhost
Consul Port CONSUL_PORT consul_port integer 8500
Consul ACL Token CONSUL_ACL_TOKEN consul_acl_token string
Service Name CONSUL_SVC consul_svc string rabbitmq
Service Address CONSUL_SVC_ADDR consul_svc_addr string
Service Auto Address CONSUL_SVC_ADDR_AUTO consul_svc_addr_auto boolean false
Service Auto Address by NIC CONSUL_SVC_ADDR_NIC consul_svc_addr_nic string
Service Port CONSUL_SVC_PORT consul_svc_port integer 5672
Service TTL CONSUL_SVC_TTL consul_svc_ttl integer 30
Service Tags CONSUL_SVC_TAGS consul_svc_tags list []
Service unregistration timeout CONSUL_DEREGISTER_AFTER consul_deregister_after integer 60
Consul Use Longname CONSUL_USE_LONGNAME consul_use_longname boolean false
Consul Domain CONSUL_DOMAIN consul_domain string consul
Include nodes that fail Consul health checks? CONSUL_INCLUDE_NODES_WITH_WARNINGS consul_include_nodes_with_warnings boolean false

Example rabbitmq.config

An example that configures an ACL token and contacts a local Consul agent:

[
  {rabbit,      []},
  {autocluster, [
            {backend, consul},
            {consul_host, "localhost"},
            {consul_port, 8500},
            {consul_acl_token, "example-acl-token"},
            {consul_svc, "rabbitmq-test"},
            {cluster_name, "test"}
  ]}
].
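
Roughly the same configuration expressed through environment variables (cluster_name is only shown as a configuration key in this document, so it is omitted here):

export AUTOCLUSTER_TYPE=consul
export CONSUL_HOST=localhost
export CONSUL_PORT=8500
export CONSUL_ACL_TOKEN=example-acl-token
export CONSUL_SVC=rabbitmq-test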

The following example can be used for a cluster of N nodes, one running on a development machine (my-laptop.local) and N - 1 running in VMs or containers with access to host networking.

Node names will be [email protected], [email protected], and [email protected].

[
  {rabbit,      []},
  {autocluster, [
            {backend, consul},
            {consul_host, "my-laptop.local"},
            {consul_port, 8500},
            {consul_use_longname, true},
            {consul_svc, "rabbitmq"},
            {consul_svc_addr_auto, true},
            {consul_svc_addr_nodename, true}
  ]}
].

In the following example, the service address reported to Consul is hardcoded to hostname1.messaging.dev.local instead of being computed automatically from the environment:

[
  {rabbit,      []},
  {autocluster, [
            {backend, consul},
            {consul_host, "my-laptop.local"},
            {consul_port, 8500},
            {consul_use_longname, true},
            {consul_svc, "rabbitmq"},
            {consul_svc_addr_auto, false},
            {consul_svc_addr, "hostname1.messaging.dev.local"}
  ]}
].

Example Docker Compose File

The repository also includes an example Docker Compose file that demonstrates how to create a dynamic RabbitMQ cluster; a rough sketch of such a setup is shown below.
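
The Compose file itself is not reproduced here. The following is only a rough sketch of what such a setup could look like, assuming the pivotalrabbitmq/rabbitmq-autocluster image, the Consul backend, and the environment variables documented above (the Erlang cookie value is a placeholder; all cluster members must share it):

# docker-compose.yml (sketch only; names and values are illustrative)
version: "2"

services:
  consul:
    image: consul
    ports:
      - "8500:8500"

  rabbit1:
    image: pivotalrabbitmq/rabbitmq-autocluster
    depends_on:
      - consul
    environment:
      RABBITMQ_ERLANG_COOKIE: "placeholder-cookie"
      AUTOCLUSTER_TYPE: "consul"
      CONSUL_HOST: "consul"
      CONSUL_PORT: "8500"
      CONSUL_SVC_ADDR_AUTO: "true"

  rabbit2:
    image: pivotalrabbitmq/rabbitmq-autocluster
    depends_on:
      - consul
    environment:
      RABBITMQ_ERLANG_COOKIE: "placeholder-cookie"
      AUTOCLUSTER_TYPE: "consul"
      CONSUL_HOST: "consul"
      CONSUL_PORT: "8500"
      CONSUL_SVC_ADDR_AUTO: "true"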

DNS configuration

The following setting applies only to the DNS backend:

DNS Hostname

The FQDN to use when the backend type is dns for looking up the RabbitMQ nodes to cluster via a DNS A record round-robin.

Environment Variable AUTOCLUSTER_HOST
Setting Key autocluster_host
Data type string
Default Value consul

Example Configuration

The following configuration example enables the DNS based cluster discovery and sets the autocluster_host variable to your DNS Round-Robin A record:

[
  {autocluster, [
    {backend, dns},
    {autocluster_host, "YOUR_ROUND_ROBIN_A_RECORD"}
  ]}
].
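
The equivalent configuration using environment variables:

export AUTOCLUSTER_TYPE=dns
export AUTOCLUSTER_HOST=YOUR_ROUND_ROBIN_A_RECORD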

Troubleshooting

If you are having issues getting your RabbitMQ cluster formed, please check that Erlang can resolve:

  • The DNS Round-Robin A Record. Imagine having 3 nodes with IPs 10.0.0.2, 10.0.0.3 and 10.0.0.4:
> inet_res:lookup("YOUR_ROUND_ROBIN_A_RECORD", in, a).
[{10,0,0,2},{10,0,0,3},{10,0,0,4}]
  • All the nodes have reverse lookup entries in your DNS server. You should get something similar to this:
> inet_res:gethostbyaddr({10,0,0,2}).
{ok,{hostent,"YOUR_REVERSE_LOOKUP_ENTRY",[],
inet,4,
[{10,0,0,2}]}}
  • Erlang will always receive lowercase DNS names, so be careful if you use your /etc/hosts file to resolve the other nodes in the cluster: if you use uppercase names there, RabbitMQ will get confused and the cluster will not form (see the example below).
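
For example, a hypothetical /etc/hosts fragment for the three nodes above, using lowercase names only (the host names are placeholders):

# /etc/hosts -- keep host names lowercase so they match what Erlang receives
10.0.0.2  rabbit-node-1.example.local
10.0.0.3  rabbit-node-2.example.local
10.0.0.4  rabbit-node-3.example.local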

etcd configuration

The following settings apply to the etcd backend only:

etcd Scheme
The URI scheme to use when connecting to etcd
etcd Host
The hostname to use when connecting to etcd's API
etcd Port
The port to use when connecting to etcd's API
etcd Key Prefix
The prefix used when storing cluster membership keys in etcd
etcd Node TTL
Used to specify how long a node can be down before it is removed from etcd's list of RabbitMQ nodes in the cluster

Setting Environment Variable Setting Key Type Default
etcd Scheme ETCD_SCHEME etcd_scheme list http
etcd Host ETCD_HOST etcd_host list localhost
etcd Port ETCD_PORT etcd_port int 2379
etcd Key Prefix ETCD_PREFIX etcd_prefix list rabbitmq
etcd Node TTL ETCD_TTL etcd_ttl integer 30

NOTE The etcd backend supports etcd v2 and v3.
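
As an illustration, a minimal rabbitmq.config sketch for the etcd backend using the keys from the table above (the etcd host is a placeholder; value types follow the table):

[
  {autocluster, [
    {backend,     etcd},
    {etcd_scheme, "http"},
    {etcd_host,   "etcd.example.local"},
    {etcd_port,   2379},
    {etcd_prefix, "rabbitmq"},
    {etcd_ttl,    30}
  ]}
].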

K8S configuration

The following settings impact the configuration of the Kubernetes backend for the autocluster plugin:

K8S Scheme
The URI scheme to use when connecting to the Kubernetes API server
K8S Host
The hostname of the Kubernetes API server
K8S Port
The port to use when connecting to the Kubernetes API server
K8S Token Path
The token path of the Pod's service account
K8S Cert Path
The path of the service account certificate used to authenticate with the Kubernetes API server
K8S Namespace Path
The path of the service account namespace file
K8S Service Name
The RabbitMQ service name in Kubernetes
K8S Address Type
The address type, either ip or hostname
K8S Hostname Suffix
The suffix to append to the hostname

Setting Environment Variable Setting Key Type Default
K8S Scheme K8S_SCHEME k8s_scheme string https
K8S Host K8S_HOST k8s_host string kubernetes.default.svc.cluster.local
K8S Port K8S_PORT k8s_port integer 443
K8S Token Path K8S_TOKEN_PATH k8s_token_path string /var/run/secrets/kubernetes.io/serviceaccount/token
K8S Cert Path K8S_CERT_PATH k8s_cert_path string /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
K8S Namespace Path K8S_NAMESPACE_PATH k8s_namespace_path string /var/run/secrets/kubernetes.io/serviceaccount/namespace
K8S Service Name K8S_SERVICE_NAME k8s_service_name string rabbitmq
K8S Address Type K8S_ADDRESS_TYPE k8s_address_type string ip
K8S Hostname Suffix K8S_HOSTNAME_SUFFIX k8s_hostname_suffix string

Kubernetes Setup

In order for this plugin to work, your nodes need to use fully qualified names (FQDNs), i.e. set RABBITMQ_USE_LONGNAME=true in your pod definition.
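
A minimal rabbitmq.config sketch for the Kubernetes backend follows; the hostname suffix is a placeholder and assumes a headless service named rabbitmq in the default namespace:

[
  {autocluster, [
    {backend,             k8s},
    {k8s_service_name,    "rabbitmq"},
    %% use pod hostnames rather than IP addresses for node names
    {k8s_address_type,    "hostname"},
    %% placeholder suffix so that the resulting node names are FQDNs
    {k8s_hostname_suffix, ".rabbitmq.default.svc.cluster.local"}
  ]}
].

As noted above, RABBITMQ_USE_LONGNAME=true must also be set in the pod definition so that the resulting node names are treated as fully qualified.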

Development

WIP Notes for dev environment

Requirements

  • erlang 17.5
  • docker-machine
  • docker-compose
  • make

Setup

Start docker-machine:

docker-machine create --driver virtualbox default
eval $(docker-machine env)

Start client containers:

docker-compose up -d

Development environment

Work in Progress

Make Commands

  • tests
  • run-broker
  • shell
  • dist
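
For instance, assuming these targets behave like the standard RabbitMQ plugin build targets:

make tests       # run the test suite
make run-broker  # start a local RabbitMQ node with the plugin enabled
make dist        # build the distributable .ez plugin archives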

Docker

Building the container:

docker build -t rabbitmq-autocluster .

Testing Consul behaviors

Here's the base pattern for how I test against Consul when developing:

make dist
docker build -t rabbitmq-autocluster .

docker network create rabbitmq_network

docker run --rm -t -i --net=rabbitmq_network --name=consul -p 8500:8500 consul

docker run --rm -t -i --net=rabbitmq_network --name=node0 -e AUTOCLUSTER_TYPE=consul -e CONSUL_HOST=consul -e CONSUL_PORT=8500 -e CONSUL_SERVICE_TTL=60  -e AUTOCLUSTER_CLEANUP=true -e CLEANUP_WARN_ONLY=false -e CONSUL_SVC_ADDR_AUTO=true -p 15672:15672 rabbitmq-autocluster

docker run --rm -t -i --net=rabbitmq_network --name=node1 -e RABBITMQ_NODE_TYPE=ram -e AUTOCLUSTER_TYPE=consul -e CONSUL_HOST=consul -e CONSUL_PORT=8500 -e CONSUL_SERVICE_TTL=60  -e AUTOCLUSTER_CLEANUP=true -e CLEANUP_WARN_ONLY=false -e CONSUL_SVC_ADDR_AUTO=true rabbitmq-autocluster

docker run --rm -t -i --net=rabbitmq_network --name=node2 -e RABBITMQ_NODE_TYPE=ram -e AUTOCLUSTER_TYPE=consul -e CONSUL_HOST=consul -e CONSUL_PORT=8500 -e CONSUL_SERVICE_TTL=60  -e AUTOCLUSTER_CLEANUP=true -e CLEANUP_WARN_ONLY=false -e CONSUL_SVC_ADDR_AUTO=true rabbitmq-autocluster

- Consul management: http://localhost:8500/ui
- RabbitMQ cluster: http://localhost:15672/

License

BSD 3-Clause

rabbitmq-autocluster's People

Contributors

a1dutch, akissa, alanprot, alexeyraga, avvs, binarin, cap10morgan, dcorbacho, dumbbell, elifa, gigablah, gmr, jcarr-sailthru, luxflux, michaelklishin, noxdafox, repl-andrew-ovens


rabbitmq-autocluster's Issues

[enhancement] add cleanup_failures for pruning dead nodes

I have enabled cluster_cleanup/cleanup_interval on my system for removing dead RabbitMQ nodes if they are failing health checks in a Consul cluster (backend=consul)

One issue I have experienced is that the cleanup occurs immediately, as soon as the check fails. This turns it into a kind of Russian roulette for brief outages, where cleanup_interval doesn't matter if a 5-second outage happens to occur at the moment the check runs.

Suggest addition of cleanup_failures for the minimum number of consecutive failures before the node is pruned. While this won't remove the possibility of short outages causing early pruning (there could always be two short outages that happen right at consecutive checks) this does reduce the possibility of them happening.

Examples:

  • cleanup_interval=60,cleanup_failures=0 - check every 60sec, drop immediately (0=current behavior)
  • cleanup_interval=60,cleanup_failures=1 - check every 60sec, drop if failed two checks (60s+ outage)
  • cleanup_interval=60,cleanup_failures=2 - check every 60sec, drop if failed three checks (120s+ outage)
  • cleanup_interval=600,cleanup_failures=2 - check every 10min, drop if failed three checks (20min+ outage)

FYI for me: I am not running this in a production environment. I have increased cleanup_interval to 600 to reduce the chance of this occurring (and my needing to manually recycle nodes). I am aware I have the option to disable experimental features.

rabbitmq-autocluster failing

Have been at this for a few days:

/ # rabbitmqctl join_cluster [email protected]
Clustering node '[email protected]' with '[email protected]'
Error: {inconsistent_cluster,"Node '[email protected]' thinks it's clustered with node '[email protected]', but '[email protected]' disagrees"}

Whatever I do, I cannot get the second node to join properly, nor have it join as a "disc" node. It continually joins as a ram node, even when performing the join manually.

Here is the environment:

/ # env | grep RABBITMQ
RABBITMQ_USE_LONGNAME=true
RABBITMQ_DISK_FREE_LIMIT="8GiB"
RABBITMQ_PORT_15672_TCP=tcp://10.110.213.110:15672
RABBITMQ_PORT_25672_TCP=tcp://10.110.213.110:25672
RABBITMQ_LOGS=-
RABBITMQ_MANAGER_PORT_NUMBER=15672
[email protected]
RABBITMQ_SERVICE_PORT_HTTP=15672
RABBITMQ_PLUGINS_EXPAND_DIR=/var/lib/rabbitmq/plugins
RABBITMQ_PASSWORD=abc123
RABBITMQ_VERSION=3.6.14
RABBITMQ_PLUGINS_DIR=/usr/lib/rabbitmq/plugins
RABBITMQ_SERVICE_HOST=10.110.213.110
RABBITMQ_SASL_LOGS=-
RABBITMQ_NODE_TYPE=stats
RABBITMQ_BASE=/rabbitmq
RABBITMQ_PORT_5672_TCP_ADDR=10.110.213.110
RABBITMQ_PORT_4369_TCP_ADDR=10.110.213.110
RABBITMQ_SERVICE_PORT_EPMD=4369
RABBITMQ_SERVICE_PORT=15672
RABBITMQ_PORT=tcp://10.110.213.110:15672
RABBITMQ_PORT_5672_TCP_PORT=5672
RABBITMQ_PORT_5672_TCP_PROTO=tcp
RABBITMQ_VHOST=/
RABBITMQ_PORT_4369_TCP_PORT=4369
RABBITMQ_PORT_4369_TCP_PROTO=tcp
RABBITMQ_NODE_PORT_NUMBER=5672
RABBITMQ_PID_FILE=/var/lib/rabbitmq/rabbitmq.pid
RABBITMQ_SERVER_ERL_ARGS=+K true +A128 +P 1048576 -kernel inet_default_connect_options [{nodelay,true}]
RABBITMQ_PORT_15672_TCP_ADDR=10.110.213.110
RABBITMQ_SERVICE_PORT_AMQP=5672
RABBITMQ_PORT_25672_TCP_ADDR=10.110.213.110
RABBITMQ_MNESIA_DIR=/var/lib/rabbitmq/mnesia
RABBITMQ_PORT_5672_TCP=tcp://10.110.213.110:5672
RABBITMQ_PORT_15672_TCP_PORT=15672
RABBITMQ_USERNAME=user
RABBITMQ_HOME=/rabbitmq
RABBITMQ_PORT_4369_TCP=tcp://10.110.213.110:4369
RABBITMQ_PORT_15672_TCP_PROTO=tcp
RABBITMQ_PORT_25672_TCP_PORT=25672
RABBITMQ_PORT_25672_TCP_PROTO=tcp
RABBITMQ_DIST_PORT=25672
RABBITMQ_SERVICE_PORT_DIST=25672

Mirroring

Hi,
This plugin appears to work only with the pivotal image and not the base rabbitmq 3.6.x image.
I also need to enable mirroring , but the pods ends up "PostHookError".
This is what I have in the kube spec YML file , can you please help me on this :-
( the yml formatting is all messed up with this editor, but otherwise it is well-formed )
[....]
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        - >
          sleep 1m && rabbitmqctl set_policy queue-mirror-ha ".*" '{"ha-mode":"all","ha-sync-mode": "automatic"}' --apply-to queues
[....]

Unable to configure using environment variables

I am not able to configure the RMQ default user name, password and vhost name using environment variables. The container starts with the default user name and password (guest).

This is the command I used to run the container:
docker run -d --name rabbitmq --net=host -p 4369:4369 -p 5672:5672 -p 15672:15672 -p 25672:25672 -e RABBITMQ_DEFAULT_PASS=<MyPassWord> -e RABBITMQ_DEFAULT_USER=<MyUserName> -e RABBITMQ_DEFAULT_VHOST=<MyVHost> -e AUTOCLUSTER_TYPE=aws -e AWS_AUTOSCALING=true -e AUTOCLUSTER_CLEANUP=true -e CLEANUP_WARN_ONLY=false -e AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION pivotalrabbitmq/rabbitmq-autocluster

Can someone help me if I'm missing something?

plugin not detected on ubuntu 18.04, rabbitmq 3.6.10

Hi,

I'm running Ubuntu 18.04 with RabbitMQ 3.6.10.

I have tried copying the *.ez files to /usr/lib/rabbitmq/plugins (this folder does not exist on a default apt install). I have also tried copying the *.ez files to the /usr/lib/rabbitmq/plugins/rabbitmq_server-3.6.10/plugins folder.

In both instances, when I run rabbitmq-plugins list the plugins don't show up in the list, which means that when I run the rabbitmq-plugins enable command it fails.

Is there something I'm missing? I have tried v0.10.0 and v0.8.0 of the plugins.

autocluster: Step maybe_cluster failed with failure: inconsistent_cluster

While I was doing load testing against a 3-node RabbitMQ cluster in a Kubernetes environment, one RabbitMQ pod (10.244.1.40) got restarted and failed to rejoin the cluster.

Below are the logs it reported, which complain that "Node '[email protected]' thinks it's clustered with node '[email protected]', but '[email protected]' disagrees".

=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Running step find_best_node_to_join
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: GET https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/wxgigo/endpoints/rabbitmq
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Response: [{ok,{{"HTTP/1.1",200,"OK"},
                             [{"date","Wed, 15 Nov 2017 15:16:55 GMT"},
                              {"content-length","1024"},
                              {"content-type","application/json"}],
                             "{\"kind\":\"Endpoints\",\"apiVersion\":\"v1\",\"metadata\":{\"name\":\"rabbitmq\",\"namespace\":\"wxgigo\",\"selfLink\":\"/api/v1/namespaces/wxgigo/endpoints/rabbitmq\",\"uid\":\"211b434a-ca14-11e7-8c24-080027aacdc9\",\"resourceVersion\":\"196162\",\"creationTimestamp\":\"2017-11-15T14:49:06Z\",\"labels\":{\"app\":\"rabbitmq\"}},\"subsets\":[{\"addresses\":[{\"ip\":\"10.244.2.37\",\"hostname\":\"rabbitmq-0\",\"nodeName\":\"kube-node1\",\"targetRef\":{\"kind\":\"Pod\",\"namespace\":\"wxgigo\",\"name\":\"rabbitmq-0\",\"uid\":\"21508476-ca14-11e7-8c24-080027aacdc9\",\"resourceVersion\":\"193957\"}},{\"ip\":\"10.244.2.38\",\"hostname\":\"rabbitmq-2\",\"nodeName\":\"kube-node1\",\"targetRef\":{\"kind\":\"Pod\",\"namespace\":\"wxgigo\",\"name\":\"rabbitmq-2\",\"uid\":\"3a7c7d39-ca14-11e7-8c24-080027aacdc9\",\"resourceVersion\":\"194047\"}}],\"notReadyAddresses\":[{\"ip\":\"10.244.1.40\",\"hostname\":\"rabbitmq-1\",\"nodeName\":\"kube-node2\",\"targetRef\":{\"kind\":\"Pod\",\"namespace\":\"wxgigo\",\"name\":\"rabbitmq-1\",\"uid\":\"2dd9f567-ca14-11e7-8c24-080027aacdc9\",\"resourceVersion\":\"196160\"}}],\"ports\":[{\"name\":\"amqp\",\"port\":5672,\"protocol\":\"TCP\"}]}]}\n"}}]
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: k8s endpoint listing returned nodes not yet ready: 10.244.1.40
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: List of registered nodes retrieved from the backend: ['[email protected]',
                                                                   '[email protected]']

=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Fetching node details. Unreachable nodes (or nodes that responded with an error): []
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Fetching node details. Responses: [{candidate_seed_node,
                                                 '[email protected]',1620526,
                                                 true,
                                                 ['[email protected]',
                                                  '[email protected]',
                                                  '[email protected]'],
                                                 ['[email protected]',
                                                  '[email protected]'],
                                                 [],[]},
                                                {candidate_seed_node,
                                                 '[email protected]',1663139,
                                                 true,
                                                 ['[email protected]',
                                                  '[email protected]',
                                                  '[email protected]'],
                                                 ['[email protected]',
                                                  '[email protected]'],
                                                 [],[]}]

=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Asked to choose preferred node from the list of: [{candidate_seed_node,
                                                                '[email protected]',
                                                                1620526,true,
                                                                ['[email protected]',
                                                                 '[email protected]',
                                                                 '[email protected]'],
                                                                ['[email protected]',
                                                                 '[email protected]'],
                                                                [],[]},
                                                               {candidate_seed_node,
                                                                '[email protected]',
                                                                1663139,true,
                                                                ['[email protected]',
                                                                 '[email protected]',
                                                                 '[email protected]'],
                                                                ['[email protected]',
                                                                 '[email protected]'],
                                                                [],[]}]
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Filtered node list (does not include us and non-running/reachable nodes): [{candidate_seed_node,
                                                                                         '[email protected]',
                                                                                         1663139,
                                                                                         true,
                                                                                         ['[email protected]',
                                                                                          '[email protected]',
                                                                                          '[email protected]'],
                                                                                         ['[email protected]',
                                                                                          '[email protected]'],
                                                                                         [],
                                                                                         []},
                                                                                        {candidate_seed_node,
                                                                                         '[email protected]',
                                                                                         1620526,
                                                                                         true,
                                                                                         ['[email protected]',
                                                                                          '[email protected]',
                                                                                          '[email protected]'],
                                                                                         ['[email protected]',
                                                                                          '[email protected]'],
                                                                                         [],
                                                                                         []}]

=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Picked node as the preferred choice for joining: '[email protected]'
=INFO REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Running step maybe_cluster
=ERROR REPORT==== 15-Nov-2017::15:16:55 ===
Node '[email protected]' thinks it's clustered with node '[email protected]', but '[email protected]' disagrees
=ERROR REPORT==== 15-Nov-2017::15:16:55 ===
autocluster: Step maybe_cluster failed, will conitnue nevertheless. Failure reason: Failed to cluster with [email protected]: {inconsistent_cluster,[78,111,100,101,32,39,114,97,98,98,105,116,64,49,48,46,50,52,52,46,50,46,51,55,39,32,116,104,105,110,107,115,32,105,116,39,115,32,99,108,117,115,116,101,114,101,100,32,119,105,116,104,32,110,111,100,101,32,39,114,97,98,98,105,116,64,49,48,46,50,52,52,46,49,46,52,48,39,44,32,98,117,116,32,39,114,97,98,98,105,116,64,49,48,46,50,52,52,46,49,46,52,48,39,32,100,105,115,97,103,114,101,101,115]}.

Later, I clustered 10.244.1.40 with 10.244.2.37 manually and it succeeded. So what could be the possible reason? Is it possible the RabbitMQ cluster had still not kicked the bad node out of its cluster info (via cleanup) when the node tried to rejoin? Would reducing the value of CLEANUP_INTERVAL help?

AWS: Use PTR records for discovered instances

The AWS plugin returns the privateDNS field or IP address as the hostname for discovered instances in the cluster. PrivateDNS always has the ip-xx format. It would be nice to have a way to use our own hostnames for instances discovered with the AWS plugin. The best way I see is a reverse (PTR) lookup on the private IP instead of using the privateDNS fields.

[error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404

Describe the bug:
I've used the configuration in minikube, and I have this problem:

2021-02-18 08:14:52.827 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:52.831 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 9 retries left...
2021-02-18 08:14:53.337 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:53.340 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 8 retries left...
2021-02-18 08:14:53.847 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:53.851 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 7 retries left...
2021-02-18 08:14:54.357 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:54.360 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 6 retries left...
2021-02-18 08:14:54.867 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:54.870 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 5 retries left...
2021-02-18 08:14:55.375 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:55.378 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 4 retries left...
2021-02-18 08:14:55.885 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:55.888 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 3 retries left...
2021-02-18 08:14:56.395 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:56.398 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 2 retries left...
2021-02-18 08:14:56.905 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:56.907 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 1 retries left...
2021-02-18 08:14:57.414 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:57.416 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 0 retries left...

BOOT FAILED
===========
Exception during startup:

    rabbit_boot_steps:run_boot_steps/1 line 20
    rabbit_boot_steps:'-run_boot_steps/1-lc$^0/1-0-'/1 line 19
    rabbit_boot_steps:run_step/2 line 46
    rabbit_boot_steps:'-run_step/2-lc$^0/1-0-'/2 line 41
    rabbit_mnesia:init/0 line 76
    rabbit_mnesia:init_with_lock/3 line 111
    rabbit_mnesia:run_peer_discovery_with_retries/2 line 145
    rabbit_mnesia:run_peer_discovery_with_retries/2 line 138
error:{badmatch,ok}

2021-02-18 08:14:57.920 [info] <0.44.0> Application mnesia exited with reason: stopped
2021-02-18 08:14:57.921 [error] <0.272.0> 
2021-02-18 08:14:57.921 [info] <0.44.0> Application mnesia exited with reason: stopped
2021-02-18 08:14:57.921 [error] <0.272.0> BOOT FAILED
2021-02-18 08:14:57.921 [error] <0.272.0> ===========
2021-02-18 08:14:57.921 [error] <0.272.0> Exception during startup:
2021-02-18 08:14:57.922 [error] <0.272.0> 
2021-02-18 08:14:57.922 [error] <0.272.0>     rabbit_boot_steps:run_boot_steps/1 line 20
2021-02-18 08:14:57.922 [error] <0.272.0>     rabbit_boot_steps:'-run_boot_steps/1-lc$^0/1-0-'/1 line 19
2021-02-18 08:14:57.922 [error] <0.272.0>     rabbit_boot_steps:run_step/2 line 46
2021-02-18 08:14:57.922 [error] <0.272.0>     rabbit_boot_steps:'-run_step/2-lc$^0/1-0-'/2 line 41
2021-02-18 08:14:57.923 [error] <0.272.0>     rabbit_mnesia:init/0 line 76
2021-02-18 08:14:57.923 [error] <0.272.0>     rabbit_mnesia:init_with_lock/3 line 111
2021-02-18 08:14:57.923 [error] <0.272.0>     rabbit_mnesia:run_peer_discovery_with_retries/2 line 145
2021-02-18 08:14:57.923 [error] <0.272.0>     rabbit_mnesia:run_peer_discovery_with_retries/2 line 138
2021-02-18 08:14:57.923 [error] <0.272.0> error:{badmatch,ok}
2021-02-18 08:14:57.923 [error] <0.272.0> 
2021-02-18 08:14:58.925 [info] <0.271.0> [{initial_call,{application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{pid,<0.271.0>},{registered_name,[]},{error_info,{exit,{{badmatch,ok},{rabbit,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,138}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}},{ancestors,[<0.270.0>]},{message_queue_len,1},{messages,[{'EXIT',<0.272.0>,normal}]},{links,[<0.270.0>,<0.44.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,376},{stack_size,28},{reductions,354}], []
2021-02-18 08:14:58.925 [error] <0.271.0> CRASH REPORT Process <0.271.0> with 0 neighbours exited with reason: {{badmatch,ok},{rabbit,start,[normal,[]]}} in application_master:init/4 line 138
2021-02-18 08:14:58.926 [info] <0.44.0> Application rabbit exited with reason: {{badmatch,ok},{rabbit,start,[normal,[]]}}
2021-02-18 08:14:58.926 [info] <0.44.0> Application rabbit exited with reason: {{badmatch,ok},{rabbit,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{badmatch,ok},{rabbit,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{badmatch,ok},{rabbit,start,[normal,[]]}}})

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done

How is RABBITMQ_NODENAME value used by rabbitmq-autocluster?

Problem description

I have started two rabbitmq docker containers on two different AWS EC2 instances in the sa-east-1 region, using AWS ECS to manage them. I've built the image and I'm pretty sure that this plugin was correctly installed (this can be confirmed by the logs below).

I need to use RABBITMQ_NODENAME to set the RabbitMQ node name to a defined hostname because I can't use a docker container with host networking.

For testing, you can start a rabbitmq 3.6.x container with rabbitmq-autocluster plugin version 0.10.0 using the following run command on two EC2 instances tagged with env=socialbase:

docker run -ti --name rmq --hostname rabbitmq \
    -e AUTOCLUSTER_CLEANUP=true \
    -e AUTOCLUSTER_DELAY=30 \
    -e AUTOCLUSTER_LOG_LEVEL=debug \
    -e AUTOCLUSTER_TYPE=aws \
    -e AWS_DEFAULT_REGION=sa-east-1 \
    -e AWS_EC2_TAGS='{"env": "socialbase"}' \
    -e CLEANUP_WARN_ONLY=false \
    -e RABBITMQ_ERLANG_COOKIE=xxx \
    -e RABBITMQ_NODENAME=rabbit@rabbitmq \
  rabbitmq-image-name

Here are the details of my env:

Environment variables

/ # printenv
AUTOCLUSTER_LOG_LEVEL=debug
HOSTNAME=rabbitmq
AWS_EC2_TAGS={"env": "socialbase"}
SHLVL=2
HOME=/var/lib/rabbitmq
RABBITMQ_LOGS=-
RABBITMQ_NODENAME=rabbit@rabbitmq
RABBITMQ_ERLANG_COOKIE=xxx
AUTOCLUSTER_TYPE=aws
S6_FILE=s6-overlay-amd64.tar.gz
AWS_DEFAULT_REGION=sa-east-1
RABBITMQ_GPG_KEY=0A9AF2115F4687BD29803A206B73A36E6026DFCA
RABBITMQ_VERSION=3.6.12
TERM=xterm
AUTOCLUSTER_CLEANUP=true
AUTOCLUSTER_DELAY=30
RABBITMQ_SASL_LOGS=-
PATH=/opt/rabbitmq/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
CONFD_FILE=confd-0.11.0-linux-amd64.gz
BIN_PATH=/usr/local/bin
PWD=/
RABBITMQ_HOME=/opt/rabbitmq
CLEANUP_WARN_ONLY=false
DOCKERIZE_FILE=dockerize-linux-amd64-v0.2.0.tar.gz
RABBITMQ_GITHUB_TAG=rabbitmq_v3_6_12

RabbitMQ/Erlang Version

/ # rabbitmqctl status
Status of node rabbit@rabbitmq
[{pid,2222},
 {running_applications,
     [{autocluster,
          "Forms RabbitMQ clusters using a variety of backends (AWS EC2, DNS, Consul, Kubernetes, etc)",
          "0.9.0+4.g0e7899d"},
      {rabbitmq_management,"RabbitMQ Management Console","3.6.12"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.12"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.12"},
      {rabbitmq_delayed_message_exchange,"RabbitMQ Delayed Message Exchange",
          "0.0.1"},
      {rabbit,"RabbitMQ","3.6.12"},
      {mnesia,"MNESIA  CXC 138 12","4.13.4"},
      {amqp_client,"RabbitMQ AMQP Client","3.6.12"},
      {rabbit_common,
          "Modules shared by rabbitmq-server and rabbitmq-erlang-client",
          "3.6.12"},
      {cowboy,"Small, fast, modular HTTP server.","1.0.4"},
      {xmerl,"XML parser","1.3.10"},
      {os_mon,"CPO  CXC 138 46","2.4"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.3.0"},
      {rabbitmq_aws,
          "A minimalistic AWS API interface used by rabbitmq-autocluster (3.6.x) and other RabbitMQ plugins",
          "3.6.13.milestone1+2.g946e794"},
      {ssl,"Erlang/OTP SSL application","7.3.1"},
      {public_key,"Public key infrastructure","1.1.1"},
      {asn1,"The Erlang ASN1 compiler version 4.0.2","4.0.2"},
      {compiler,"ERTS  CXC 138 10","6.0.3"},
      {cowlib,"Support library for manipulating Web protocols.","1.0.2"},
      {crypto,"CRYPTO","3.6.3"},
      {syntax_tools,"Syntax tools","1.7"},
      {inets,"INETS  CXC 138 49","6.2.2"},
      {sasl,"SASL  CXC 138 11","2.7"},
      {stdlib,"ERTS  CXC 138 10","2.8"},
      {kernel,"ERTS  CXC 138 10","4.2"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 18 [erts-7.3.1] [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true]\n"},
 {memory,
     [{connection_readers,0},
      {connection_writers,0},
      {connection_channels,0},
      {connection_other,2712},
      {queue_procs,2712},
      {queue_slave_procs,0},
      {plugins,1895896},
      {other_proc,23448304},
      {metrics,194360},
      {mgmt_db,142288},
      {mnesia,68456},
      {other_ets,2247272},
      {binary,123032},
      {msg_index,40864},
      {code,28026822},
      {atom,1033377},
      {other_system,14351505},
      {total,71577600}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{http,15672,"::"}]},
 {vm_memory_calculation_strategy,rss},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,429496729},
 {disk_free_limit,50000000},
 {disk_free,6930915328},
 {file_descriptors,
     [{total_limit,924},{total_used,2},{sockets_limit,829},{sockets_used,0}]},
 {processes,[{limit,1048576},{used,334}]},
 {run_queue,0},
 {uptime,187},
 {kernel,{net_ticktime,60}}]

RabbitMQ server and client application log files

=INFO REPORT==== 31-Oct-2017::10:21:16 ===
Starting RabbitMQ 3.6.12 on Erlang 18.3.2
Copyright (C) 2007-2017 Pivotal Software, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/

              RabbitMQ 3.6.12. Copyright (C) 2007-2017 Pivotal Software, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: tty
  ######  ##        tty
  ##########
              Starting broker...

=INFO REPORT==== 31-Oct-2017::10:21:16 ===
node           : rabbit@rabbitmq
home dir       : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash    : Q8goAtoH0kfw25pdUQxb2Q==
log            : tty
sasl log       : tty
database dir   : /var/lib/rabbitmq/mnesia/rabbit@rabbitmq

=INFO REPORT==== 31-Oct-2017::10:21:18 ===
autocluster: log level set to debug

=INFO REPORT==== 31-Oct-2017::10:21:18 ===
autocluster: Running discover/join step

=INFO REPORT==== 31-Oct-2017::10:21:18 ===
    application: mnesia
    exited: stopped
    type: temporary

=INFO REPORT==== 31-Oct-2017::10:21:18 ===
autocluster: Apps 'rabbit' and 'mnesia' successfully stopped

=INFO REPORT==== 31-Oct-2017::10:21:18 ===
autocluster: Running step initialize_backend

=INFO REPORT==== 31-Oct-2017::10:21:18 ===
autocluster: Using AWS backend

=INFO REPORT==== 31-Oct-2017::10:21:18 ===
autocluster: Starting dependencies of backend aws: [rabbitmq_aws]

=INFO REPORT==== 31-Oct-2017::10:21:18 ===
autocluster: Running step acquire_startup_lock

=INFO REPORT==== 31-Oct-2017::10:21:18 ===
autocluster: Delaying startup for 27737ms.

=INFO REPORT==== 31-Oct-2017::10:21:45 ===
autocluster: Running step find_best_node_to_join

=INFO REPORT==== 31-Oct-2017::10:21:45 ===
autocluster: Setting region: "sa-east-1"

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: AWS request: /?Action=DescribeInstances&Filter.1.Name=tag%3Aenv&Filter.1.Value.1=socialbase&Version=2015-10-01
Response: [{"DescribeInstancesResponse",
            [{"requestId","f9c2ee3f-4292-4515-a1c5-735021b18161"},
             {"reservationSet",
              [{"item",
                [{"reservationId","r-0c26f5305400263de"},
                 {"ownerId","832266673134"},
                 {"groupSet",[]},
                 {"instancesSet",
                  [{"item",
                    [{"instanceId","i-01b5692949213d3c9"},
                     {"imageId","ami-ae0971c2"},
                     {"instanceState",[{"code","16"},{"name","running"}]},
                     {"privateDnsName",
                      "ip-10-0-3-20.sa-east-1.compute.internal"},
                     {"dnsName",
                      "ec2-54-233-190-88.sa-east-1.compute.amazonaws.com"},
                     {"reason",[]},
                     {"keyName","mysshkey"},
                     {"amiLaunchIndex","0"},
                     {"productCodes",[]},
                     {"instanceType","c3.xlarge"},
                     {"launchTime","2017-10-31T09:11:59.000Z"},
                     {"placement",
                      [{"availabilityZone","sa-east-1c"},
                       {"groupName",[]},
                       {"tenancy","default"}]},
                     {"monitoring",[{"state","disabled"}]},
                     {"subnetId","subnet-b90123ff"},
                     {"vpcId","vpc-656bd800"},
                     {"privateIpAddress","10.0.3.20"},
                     {"ipAddress","54.233.190.88"},
                     {"sourceDestCheck","true"},
                     {"groupSet",
                      [{"item",
                        [{"groupId","sg-88d9a5ef"},{"groupName","SG_ECS"}]}]},
                     {"architecture","x86_64"},
                     {"rootDeviceType","ebs"},
                     {"rootDeviceName","/dev/xvda"},
                     {"blockDeviceMapping",
                      [{"item",
                        [{"deviceName","/dev/xvda"},
                         {"ebs",
                          [{"volumeId","vol-05afac15ca8df17b4"},
                           {"status","attached"},
                           {"attachTime","2017-10-31T09:12:00.000Z"},
                           {"deleteOnTermination","true"}]}]},
                       {"item",
                        [{"deviceName","/dev/xvdcz"},
                         {"ebs",
                          [{"volumeId","vol-0be274acfd288e5b6"},
                           {"status","attached"},
                           {"attachTime","2017-10-31T09:12:00.000Z"},
                           {"deleteOnTermination","true"}]}]}]},
                     {"instanceLifecycle","spot"},
                     {"spotInstanceRequestId","sir-2m4rdafm"},
                     {"virtualizationType","hvm"},
                     {"clientToken","e308a29e-27fc-4d9d-8f78-c4dd35f38d2d"},
                     {"tagSet",
                      [{"item",
                        [{"key","weave:peerGroupName"},
                         {"value","socialbase"}]},
                       {"item",
                        [{"key","aws:ec2spot:fleet-request-id"},
                         {"value",
                          "sfr-25ac2bb7-6c67-42b5-b2b0-26683c7f6496"}]},
                       {"item",[{"key","ssh_user"},{"value","ec2-user"}]},
                       {"item",[{"key","ssh_port"},{"value","22"}]},
                       {"item",[{"key","Name"},{"value","ecs"}]},
                       {"item",
                        [{"key","ssh_key"},{"value","mysshkey.pem"}]},
                       {"item",[{"key","env"},{"value","socialbase"}]}]},
                     {"hypervisor","xen"},
                     {"networkInterfaceSet",
                      [{"item",
                        [{"networkInterfaceId","eni-33bdd52b"},
                         {"subnetId","subnet-b90123ff"},
                         {"vpcId","vpc-656bd800"},
                         {"description",[]},
                         {"ownerId","832266673134"},
                         {"status","in-use"},
                         {"macAddress","0a:57:74:cc:cf:4c"},
                         {"privateIpAddress","10.0.3.20"},
                         {"privateDnsName",
                          "ip-10-0-3-20.sa-east-1.compute.internal"},
                         {"sourceDestCheck","true"},
                         {"groupSet",
                          [{"item",
                            [{"groupId","sg-88d9a5ef"},
                             {"groupName","SG_ECS"}]}]},
                         {"attachment",
                          [{"attachmentId","eni-attach-6ef50185"},
                           {"deviceIndex","0"},
                           {"status","attached"},
                           {"attachTime","2017-10-31T09:11:59.000Z"},
                           {"deleteOnTermination","true"}]},
                         {"association",
                          [{"publicIp","54.233.190.88"},
                           {"publicDnsName",
                            "ec2-54-233-190-88.sa-east-1.compute.amazonaws.com"},
                           {"ipOwnerId","amazon"}]},
                         {"privateIpAddressesSet",
                          [{"item",
                            [{"privateIpAddress","10.0.3.20"},
                             {"privateDnsName",
                              "ip-10-0-3-20.sa-east-1.compute.internal"},
                             {"primary","true"},
                             {"association",
                              [{"publicIp","54.233.190.88"},
                               {"publicDnsName",
                                "ec2-54-233-190-88.sa-east-1.compute.amazonaws.com"},
                               {"ipOwnerId","amazon"}]}]}]}]}]},
                     {"iamInstanceProfile",
                      [{"arn",
                        "arn:aws:iam::832266673134:instance-profile/ecsInstanceRole"},
                       {"id","AIPAJOCNFNPLL25WXF3DQ"}]},
                     {"ebsOptimized","false"},
                     {"enaSupport","true"}]}]},
                 {"requesterId","AIDAJKWGGSFI5CGBVYWOY"}]},
               {"item",
                [{"reservationId","r-038d132ce42f84dec"},
                 {"ownerId","832266673134"},
                 {"groupSet",[]},
                 {"instancesSet",
                  [{"item",
                    [{"instanceId","i-03972fdcc1fef7506"},
                     {"imageId","ami-ae0971c2"},
                     {"instanceState",[{"code","16"},{"name","running"}]},
                     {"privateDnsName",
                      "ip-10-0-1-162.sa-east-1.compute.internal"},
                     {"dnsName",
                      "ec2-54-233-153-55.sa-east-1.compute.amazonaws.com"},
                     {"reason",[]},
                     {"keyName","mysshkey"},
                     {"amiLaunchIndex","0"},
                     {"productCodes",[]},
                     {"instanceType","c3.xlarge"},
                     {"launchTime","2017-10-31T09:43:08.000Z"},
                     {"placement",
                      [{"availabilityZone","sa-east-1a"},
                       {"groupName",[]},
                       {"tenancy","default"}]},
                     {"monitoring",[{"state","disabled"}]},
                     {"subnetId","subnet-9018def5"},
                     {"vpcId","vpc-656bd800"},
                     {"privateIpAddress","10.0.1.162"},
                     {"ipAddress","54.233.153.55"},
                     {"sourceDestCheck","true"},
                     {"groupSet",
                      [{"item",
                        [{"groupId","sg-88d9a5ef"},{"groupName","SG_ECS"}]}]},
                     {"architecture","x86_64"},
                     {"rootDeviceType","ebs"},
                     {"rootDeviceName","/dev/xvda"},
                     {"blockDeviceMapping",
                      [{"item",
                        [{"deviceName","/dev/xvda"},
                         {"ebs",
                          [{"volumeId","vol-01a8d67bf15d79b4c"},
                           {"status","attached"},
                           {"attachTime","2017-10-31T09:43:08.000Z"},
                           {"deleteOnTermination","true"}]}]},
                       {"item",
                        [{"deviceName","/dev/xvdcz"},
                         {"ebs",
                          [{"volumeId","vol-0864bd115f76fc03d"},
                           {"status","attached"},
                           {"attachTime","2017-10-31T09:43:08.000Z"},
                           {"deleteOnTermination","true"}]}]}]},
                     {"instanceLifecycle","spot"},
                     {"spotInstanceRequestId","sir-j8sgctnm"},
                     {"virtualizationType","hvm"},
                     {"clientToken","a8fe9bf3-069a-4075-b8ea-bb8e3391aa7e"},
                     {"tagSet",
                      [{"item",[{"key","env"},{"value","socialbase"}]},
                       {"item",[{"key","Name"},{"value","ecs"}]},
                       {"item",
                        [{"key","aws:ec2spot:fleet-request-id"},
                         {"value",
                          "sfr-25ac2bb7-6c67-42b5-b2b0-26683c7f6496"}]},
                       {"item",[{"key","ssh_user"},{"value","ec2-user"}]},
                       {"item",
                        [{"key","weave:peerGroupName"},
                         {"value","socialbase"}]},
                       {"item",
                        [{"key","ssh_key"},{"value","mysshkey.pem"}]},
                       {"item",[{"key","ssh_port"},{"value","22"}]}]},
                     {"hypervisor","xen"},
                     {"networkInterfaceSet",
                      [{"item",
                        [{"networkInterfaceId","eni-0b052c07"},
                         {"subnetId","subnet-9018def5"},
                         {"vpcId","vpc-656bd800"},
                         {"description",[]},
                         {"ownerId","832266673134"},
                         {"status","in-use"},
                         {"macAddress","02:ab:16:6e:be:2e"},
                         {"privateIpAddress","10.0.1.162"},
                         {"privateDnsName",
                          "ip-10-0-1-162.sa-east-1.compute.internal"},
                         {"sourceDestCheck","true"},
                         {"groupSet",
                          [{"item",
                            [{"groupId","sg-88d9a5ef"},
                             {"groupName","SG_ECS"}]}]},
                         {"attachment",
                          [{"attachmentId","eni-attach-e8929901"},
                           {"deviceIndex","0"},
                           {"status","attached"},
                           {"attachTime","2017-10-31T09:43:08.000Z"},
                           {"deleteOnTermination","true"}]},
                         {"association",
                          [{"publicIp","54.233.153.55"},
                           {"publicDnsName",
                            "ec2-54-233-153-55.sa-east-1.compute.amazonaws.com"},
                           {"ipOwnerId","amazon"}]},
                         {"privateIpAddressesSet",
                          [{"item",
                            [{"privateIpAddress","10.0.1.162"},
                             {"privateDnsName",
                              "ip-10-0-1-162.sa-east-1.compute.internal"},
                             {"primary","true"},
                             {"association",
                              [{"publicIp","54.233.153.55"},
                               {"publicDnsName",
                                "ec2-54-233-153-55.sa-east-1.compute.amazonaws.com"},
                               {"ipOwnerId","amazon"}]}]}]}]}]},
                     {"iamInstanceProfile",
                      [{"arn",
                        "arn:aws:iam::832266673134:instance-profile/ecsInstanceRole"},
                       {"id","AIPAJOCNFNPLL25WXF3DQ"}]},
                     {"ebsOptimized","false"},
                     {"enaSupport","true"}]}]},
                 {"requesterId","AIDAJKWGGSFI5CGBVYWOY"}]}]}]}]


=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: List of registered nodes retrieved from the backend: ['rabbit@ip-10-0-1-162',
                                                                   'rabbit@ip-10-0-3-20']

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: Fetching node details. Unreachable nodes (or nodes that responded with an error): ['rabbit@ip-10-0-1-162',
                                                                                                'rabbit@ip-10-0-3-20']

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: Fetching node details. Responses: []

=ERROR REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: No nodes to choose the preferred from!

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: Picked node as the preferred choice for joining: undefined

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: Running step maybe_cluster

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: We are the first node in the cluster, starting up unconditionally.

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: Starting back 'rabbit' application

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
Memory high watermark set to 409 MiB (429496729 bytes) of 1024 MiB (1073741824 bytes) total

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
Enabling free disk space monitoring

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
Disk free limit set to 50MB

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
Limiting to approx 924 file handles (829 sockets)

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
FHC read buffering:  OFF
FHC write buffering: ON

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
Waiting for Mnesia tables for 30000 ms, 9 retries left

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
Waiting for Mnesia tables for 30000 ms, 9 retries left

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
Priority queues enabled, real BQ is rabbit_variable_queue

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
Starting rabbit_node_monitor

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
Management plugin: using rates mode 'basic'

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
started TCP Listener on [::]:5672

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: Running registeration step

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: Running step register_with_backend

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: Running lock release step

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: Running step release_startup_lock

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
Management plugin started. Port: 15672

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
Statistics database started.

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
autocluster: (cleanup) Timer started {60,false}
 completed with 9 plugins.

=INFO REPORT==== 31-Oct-2017::10:21:46 ===
Server startup complete; 9 plugins started.
 * autocluster
 * rabbitmq_management
 * rabbitmq_web_dispatch
 * rabbitmq_management_agent
 * rabbitmq_delayed_message_exchange
 * amqp_client
 * cowboy
 * rabbitmq_aws
 * cowlib

RabbitMQ plugin information via rabbitmq-plugins list

/ # rabbitmq-plugins list
 Configured: E = explicitly enabled; e = implicitly enabled
 | Status:   * = running on rabbit@rabbitmq
 |/
[e*] amqp_client                       3.6.12
[E*] autocluster                       0.9.0+4.g0e7899d
[e*] cowboy                            1.0.4
[e*] cowlib                            1.0.2
[  ] rabbitmq_amqp1_0                  3.6.12
[  ] rabbitmq_auth_backend_ldap        3.6.12
[  ] rabbitmq_auth_mechanism_ssl       3.6.12
[e*] rabbitmq_aws                      3.6.13.milestone1+2.g946e794
[  ] rabbitmq_consistent_hash_exchange 3.6.12
[E*] rabbitmq_delayed_message_exchange 0.0.1
[  ] rabbitmq_event_exchange           3.6.12
[  ] rabbitmq_federation               3.6.12
[  ] rabbitmq_federation_management    3.6.12
[  ] rabbitmq_jms_topic_exchange       3.6.12
[E*] rabbitmq_management               3.6.12
[e*] rabbitmq_management_agent         3.6.12
[  ] rabbitmq_management_visualiser    3.6.12
[  ] rabbitmq_mqtt                     3.6.12
[  ] rabbitmq_recent_history_exchange  3.6.12
[  ] rabbitmq_sharding                 3.6.12
[  ] rabbitmq_shovel                   3.6.12
[  ] rabbitmq_shovel_management        3.6.12
[  ] rabbitmq_stomp                    3.6.12
[  ] rabbitmq_top                      3.6.12
[  ] rabbitmq_tracing                  3.6.12
[  ] rabbitmq_trust_store              3.6.12
[e*] rabbitmq_web_dispatch             3.6.12
[  ] rabbitmq_web_mqtt                 3.6.12
[  ] rabbitmq_web_mqtt_examples        3.6.12
[  ] rabbitmq_web_stomp                3.6.12
[  ] rabbitmq_web_stomp_examples       3.6.12
[  ] sockjs                            0.3.4

Operating system, version, and patch level

Alpine Linux 3.4

rabbitmq-collect-env

https://drive.google.com/file/d/0B7odw6Q-9mFdSzFZWDl5eFBRZ1k/view?usp=sharing

The rabbitmq state has only the current server

[root@k8smaster k8s_statefulsets]# FIRST_POD=$(kubectl get pods --namespace test-rabbitmq -l 'app=rabbitmq' -o jsonpath='{.items[0].metadata.name }')
[root@k8smaster k8s_statefulsets]# kubectl exec --namespace=test-rabbitmq $FIRST_POD rabbitmqctl cluster_status
Cluster status of node '[email protected]'
[{nodes,[{disc,['[email protected]']}]},
{running_nodes,['[email protected]']},
{cluster_name,<<"[email protected]">>},
{partitions,[]},
{alarms,[{'[email protected]',[]}]}]

The RabbitMQ state shows only the current server and does not show the state of the entire cluster. What's the problem? Thank you.

Plugin not compatible to consul 1.0.0

Hi, I've updated Consul and noticed that it was rejecting the service registration with this message:

[ERR] http: Request POST /v1/agent/service/register, error: method POST not allowed from=127.0.0.1:34213

I've checked the Consul changelog (https://github.com/hashicorp/consul/blob/master/CHANGELOG.md) and they now accept only PUT (not POST) for this endpoint.

As per https://github.com/aweber/rabbitmq-autocluster/blob/a8271e8d71b38dd917957aee0f4bd35d055f43f6/src/autocluster_consul.erl#L88 - rabbitmq-autocluster is sending a POST.

Can we change this to PUT to make it compatible?
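
For what it's worth, a minimal sketch of registering a service with a local Consul agent using PUT via OTP's httpc. This is not the plugin's own HTTP helper, and the service name, ID, port and agent address are placeholders:

-module(consul_put_register).
-export([register_service/0]).

%% Minimal sketch, not the plugin's code: register a service with a local
%% Consul agent using PUT, as Consul 1.0.0 requires. Service name, ID, port
%% and agent address are placeholders.
register_service() ->
    {ok, _} = application:ensure_all_started(inets),
    Url  = "http://127.0.0.1:8500/v1/agent/service/register",
    Body = <<"{\"ID\":\"rabbitmq\",\"Name\":\"rabbitmq\",\"Port\":5672}">>,
    case httpc:request(put, {Url, [], "application/json", Body}, [], []) of
        {ok, {{_, 200, _}, _Headers, _RespBody}} -> ok;
        {ok, {{_, Code, _}, _Headers, RespBody}} -> {error, {Code, RespBody}};
        {error, Reason}                          -> {error, Reason}
    end.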

Autocluster attempts to create a session in Consul before session endpoint ready

We're seeing this when bringing up a RabbitMQ cluster on the same nodes that act as Consul servers. When the autocluster plugin attempts to connect to the session endpoint to create a lock (in order to overcome the startup race condition) and the session endpoint is not yet available, Consul returns a 500, and as a result the RabbitMQ server does not start at all.

We've been able to work around this by creating a script which polls the session endpoint in Consul until it's available; in our Puppet manifests we then ensure this script runs before the RabbitMQ server is started.

Ideally, we'd expect the autocluster plugin to poll for the availability of the session endpoint before creating a session/lock, and to retry. Without this, the Rabbit daemon doesn't start properly, so making sure Consul is completely ready before attempting to create the lock/session feels like something the autocluster plugin should be doing.
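
A sketch of the kind of retry the plugin could perform internally, polling GET /v1/session/list until Consul answers before creating the session. The agent address, retry count and interval are illustrative, and this is not the plugin's implementation:

-module(consul_session_wait).
-export([wait_for_session_endpoint/2]).

%% Poll GET /v1/session/list until Consul answers with a 2xx status, sleeping
%% between attempts. Gives up after the given number of attempts. The agent
%% address and retry parameters are illustrative.
wait_for_session_endpoint(0, _IntervalMs) ->
    {error, timeout};
wait_for_session_endpoint(AttemptsLeft, IntervalMs) ->
    {ok, _} = application:ensure_all_started(inets),
    Url = "http://127.0.0.1:8500/v1/session/list",
    case httpc:request(get, {Url, []}, [], []) of
        {ok, {{_, Code, _}, _, _}} when Code >= 200, Code < 300 ->
            ok;
        _NotReadyYet ->
            timer:sleep(IntervalMs),
            wait_for_session_endpoint(AttemptsLeft - 1, IntervalMs)
    end.

For example, wait_for_session_endpoint(30, 2000) would keep trying for roughly a minute before giving up.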

Nodes fail to communicate with peers in AWS Autoscaling Group

I've been trying to get this plugin working for a few days now and cannot seem to get it to create a cluster. Please let me know what I'm doing wrong or if there's a legit bug in the plugin.

I'm running RabbitMQ within a Docker container, hosted on EC2 instances in an AutoScaling Group. There is only one container running on each server.

The attached zip file has the Dockerfile and resources it needs to build.

rabbit-autocluster-docker.zip

My instances use the following User Data script to configure Rabbit as a systemd service (on CentOS 7).

#!/bin/bash
mkdir /root/.docker
chmod 700 /root/.docker
aws configure set default.s3.signature_version s3v4
aws s3 cp s3://my-config-bucket/docker-config-for-private-registry.json /root/.docker/config.json
chmod 600 /root/.docker/config.json

cat >> /etc/systemd/system/rabbit-docker.service <<EOF
[Unit]
Description=RabbitMQ Docker Container
Requires=docker.service
After=docker.service

[Service]
Restart=always
ExecStartPre=/usr/bin/docker volume create --name=rabbit-data
ExecStartPre=/usr/bin/docker pull my.private.registry/myco/rabbit:dev
ExecStart=/usr/bin/docker run --name rabbitmq \
                              --log-driver=awslogs \
                              --log-opt awslogs-region=us-east-1 \
                              --log-opt awslogs-group=/rabbit \
                              --log-opt awslogs-stream=${HOSTNAME} \
                              -p 4369:4369 \
                              -p 5671:5671 \
                              -p 5672:5672 \
                              -p 15672:15672 \
                              -p 25672:25672 \
                              -v rabbit-data:/var/lib/rabbitmq \
                              -e ERLANG_COOKIE=BananaChocolateChip_TryIt_Really_ItsGood \
                              -e RABBITMQ_NODENAME=rabbit@${HOSTNAME}.mydomain.com \
                              -e RABBITMQ_USE_LONGNAME=true \
                              -e AUTOCLUSTER_DELAY=10 \
                              -e AUTOCLUSTER_LOG_LEVEL=debug \
                              -e AUTOCLUSTER_CLEANUP=true \
                              -e CLEANUP_WARN_ONLY=false \
                              -e AWS_AUTOSCALING=true \
                              -e AWS_EC2_TAGS={\"Name\":\"rabbit-autocluster-test\"} \
                              -e AWS_USE_PRIVATE_IP=false \
                              --network host \
                              my.private.registry/myco/rabbit:dev
ExecStop=/usr/bin/docker stop -t 2 rabbitmq
ExecStopPost=/usr/bin/docker rm -f rabbitmq

[Install]
WantedBy=default.target
EOF

systemctl daemon-reload
systemctl start rabbit-docker.service
systemctl enable rabbit-docker.service

The instances are launched in a VPC, with a private subnet connected to a NAT Gateway. The mydomain.com DNS is managed in Route53 and has both forward and reverse lookup entries for all the IP addresses in the subnet (e.g. ip-192-168-205-21.mydomain.com. A 192.168.205.21, 21.205.168.192.in-addr.arpa. PTR ip-192-168-205-21.mydomain.com).

The security group for the nodes allows access to the following ports to all members of the security group:

  • 1883
  • 4369
  • 5671
  • 5672
  • 8883
  • 15672
  • 15674
  • 15675
  • 25672
  • 61613
  • 61614

It also allows traffic on 5672 and 15672 from an ELB (classic) used by clients to connect to the cluster and management ports.

I'm seeing the following error log after the plugin retrieves the Nodes list from AWS:

=INFO REPORT==== 27-Apr-2017::18:24:07 ===
autocluster: Fetching autoscaling = DNS: ["ip-192-168-205-21.ec2.internal"]
=INFO REPORT==== 27-Apr-2017::18:24:07 ===
autocluster: Registering node with aws.
=INFO REPORT==== 27-Apr-2017::18:24:07 ===
autocluster: Registered node with aws.
=INFO REPORT==== 27-Apr-2017::18:24:07 ===
autocluster: Discovered ['[email protected]']
=ERROR REPORT==== 27-Apr-2017::18:24:07 ===
autocluster: Can not communicate with cluster nodes.

This occurs when looking up nodes by autoscaling group, tag only, or a combination of the two.
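
One way to narrow this down (not a definitive fix) is to check from an Erlang shell on one of the nodes, for example via rabbitmqctl eval, whether the discovered peer resolves and answers over Erlang distribution. The hostname and node name below are the ones from the log and would need adjusting:

%% 1. Does the discovered hostname resolve from inside the container?
inet:gethostbyname("ip-192-168-205-21.ec2.internal").

%% 2. Is epmd on the peer reachable (port 4369), and does it list a rabbit node?
net_adm:names("ip-192-168-205-21.ec2.internal").

%% 3. Can this node connect over Erlang distribution (same cookie, same
%%    long-name mode, distribution port 25672 open)?
net_adm:ping('rabbit@ip-192-168-205-21.ec2.internal').

Note also that the backend discovered rabbit@ip-192-168-205-21.ec2.internal while RABBITMQ_NODENAME uses the mydomain.com suffix; if the discovered names and the actual node names do not agree, the ping above will fail even when the network path is fine.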

Please support microsoft service fabric on linux and maybe windows too

Service Fabric supports running on Linux, it can run Docker containers, and it has a service discovery service called the Naming Service, much like Consul. It can even run Docker Compose files and has its own DNS service that can facilitate service discovery based on the state in the Naming Service. Please add it as a supported backend; it would be amazing.

https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-docker-compose
https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-dnsservice
https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-connect-and-communicate-with-services

0.9.0 is incompatible with official rabbitmq Docker image

  • RabbitMQ version: 3.6.12
  • Erlang version: 19.2.1

When running version 0.9.0 in the official rabbitmq:3.6.12-management Docker image, it fails to start with this error:

Failed to enable plugin "autocluster": it may have been built with an incompatible (more recent?) version of Erlang

Would it be possible to release a version of 0.9.0 that is compatible with this Docker image?

Add startup lock support for Consul backend

I actually began working on this one on the back of #6, and it is sort of coming together, hence opening an issue to check if there is any work lined up or done already to avoid duplicated effort. Otherwise happy to submit for an early review in a few days.
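
For context, a rough sketch of the usual Consul lock flow (create a session, then try to acquire a KV key with ?acquire=) using plain httpc. The key name, TTL and the naive JSON handling are placeholders, and this is not the plugin's implementation:

-module(consul_lock_sketch).
-export([acquire_startup_lock/0]).

%% Rough sketch of Consul's session/lock flow: create a session, then try to
%% acquire a KV key with that session. Key name and TTL are placeholders; a
%% real implementation would use a proper JSON library.
acquire_startup_lock() ->
    {ok, _} = application:ensure_all_started(inets),
    Base = "http://127.0.0.1:8500",
    %% 1. Create a session with a TTL.
    {ok, {{_, 200, _}, _, SessionJson}} =
        httpc:request(put, {Base ++ "/v1/session/create", [],
                            "application/json", <<"{\"TTL\":\"30s\"}">>},
                      [], []),
    %% Naive extraction of the session "ID" field.
    {match, [SessionId]} =
        re:run(SessionJson, "\"ID\":\"([^\"]+)\"", [{capture, [1], list}]),
    %% 2. Try to acquire the lock key; Consul answers "true" or "false".
    LockUrl = Base ++ "/v1/kv/rabbitmq/startup_lock?acquire=" ++ SessionId,
    {ok, {{_, 200, _}, _, Acquired}} =
        httpc:request(put, {LockUrl, [], "application/json", <<>>}, [], []),
    case string:trim(Acquired) of
        "true"  -> {ok, SessionId};
        "false" -> {error, lock_not_acquired}
    end.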

AWS instance cannot create cluster with other nodes within the same AWS autoscaling group

Hi all,
I face a problem while trying to cluster two nodes that belong to the same autoscaling group.

I have two AWS instances (CentOS 7) within the same AWS autoscaling group, and each instance has RabbitMQ 3.6.10 with Erlang/OTP 20 installed. I also installed and enabled the rabbitmq-autocluster plugin 0.8.0.

Here's the rabbitmq.config file in both instances:

[
{rabbit, [
{autocluster_log_level, info}
]},
{autocluster, [
{backend, aws},
{aws_autoscaling, true},
{aws_ec2_region, "eu-west-1"},
{aws_access_key, "my_access_key"},
{aws_secret_key, "my_secret_access_key"}
]}
].

I start the first RMQ server in the first instance (rabbit@ip-172-31-20-113). It creates its own single-node cluster as expected.

BUT, when I start the RMQ server in the second instance (rabbit@ip-172-31-16-139) it does not get clustered with the first instance although it recognizes that both of them belong to the same autoscaling group.
Here's the rabbitmq log from the second RMQ server (rabbit@ip-172-31-16-139):

=INFO REPORT==== 28-Sep-2017::08:32:30 ===
autocluster: List of registered nodes retrieved from the backend: ['rabbit@ip-172-31-20-113', 'rabbit@ip-172-31-16-139'] -----> As you can see, the autocluster plugin retrieved the nodes from the scaling group.

=ERROR REPORT==== 28-Sep-2017::08:32:30 ===
autocluster: No nodes to choose the preferred from!

=INFO REPORT==== 28-Sep-2017::08:32:30 ===
autocluster: Picked node as the preferred choice for joining: undefined

=INFO REPORT==== 28-Sep-2017::08:32:30 ===
autocluster: Running step maybe_cluster

=INFO REPORT==== 28-Sep-2017::08:32:30 ===
autocluster: We are the first node in the cluster, starting up unconditionally.

Why doesn't the 2nd instance join the 1st instance's cluster?

I would appreciate any help!
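
For what it's worth, the log suggests the backend returned both node names but neither could be contacted over Erlang distribution, so the list of join candidates ended up empty and the node started on its own. A minimal sketch of that filtering step (not the plugin's actual code) shows the effect:

%% Keep only discovered peers that answer an Erlang distribution ping. An
%% empty result means there is nothing to join, so the node starts standalone.
Discovered = ['rabbit@ip-172-31-20-113', 'rabbit@ip-172-31-16-139'],
Reachable  = [N || N <- Discovered, N =/= node(), net_adm:ping(N) =:= pong],
case Reachable of
    []    -> start_standalone;   %% "We are the first node in the cluster"
    [_|_] -> {join_one_of, Reachable}
end.

If both peers come back as pang, the usual suspects are a mismatched Erlang cookie, closed EPMD (4369) or distribution (25672) ports between the instances, or hostname resolution, rather than the discovery backend itself.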

Cannot get list of discovered service from consul

I have a setup where the Consul agent is running on my host machine, and I am using the pivotalrabbitmq/rabbitmq-autocluster Docker image with autocluster enabled.

My docker-compose file looks like the one below:

version: "2"
services:
   rabbit:
    environment:
      - TCP_PORTS=15672, 5672,25672,4369,8500
      - AUTOCLUSTER_TYPE=consul
      - AUTOCLUSTER_DELAY=60
      - CONSUL_HOST=localhost
      - CONSUL_ACL_TOKEN=b7862315-05fe-4dda-8b13-4f533ec4e205
      - CONSUL_SVC=terracotta_rabbitmq
      - AUTOCLUSTER_CLEANUP=true
      - CLEANUP_WARN_ONLY=false
      - CONSUL_DEREGISTER_AFTER=60
      - AUTOCLUSTER_LOG_LEVEL=debug
      - CONSUL_SVC_ADDR_AUTO=true
      - CONSUL_SVC_ADDR_NODENAME=true
    image: pivotalrabbitmq/rabbitmq-autocluster
    expose:
      - 15672
      - 5672
      - 5671
      - 15671
      - 33431
      - 8300
      - 4369
      - 25672
      - 8500
    tty: true
    network_mode: host
    command:  sh -c "sleep 20; rabbitmq-server;"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Strangely, at the step where the plugin checks Consul for passing services, it gets an empty list when the query is made from inside the Docker container by the autocluster plugin.

rabbit_1_303964dae0b0 | autocluster: GET http://localhost:8500/v1/health/service/terracotta_rabbitmq
rabbit_1_303964dae0b0 | 
rabbit_1_303964dae0b0 | =INFO REPORT==== 9-Jan-2019::07:35:00 ===
rabbit_1_303964dae0b0 | autocluster: Response: [{ok,{{"HTTP/1.1",200,"OK"},
rabbit_1_303964dae0b0 |                              [{"date","Wed, 09 Jan 2019 07:35:00 GMT"},
rabbit_1_303964dae0b0 |                               {"content-length","2"},
rabbit_1_303964dae0b0 |                               {"content-type","application/json"},
rabbit_1_303964dae0b0 |                               {"x-consul-index","26252192"},
rabbit_1_303964dae0b0 |                               {"x-consul-knownleader","true"},
rabbit_1_303964dae0b0 |                               {"x-consul-lastcontact","0"}],
rabbit_1_303964dae0b0 |                              "[]"}}]

But when I curl from inside the Docker shell, I get the list of services correctly. Because the plugin is not able to identify passing services from Consul, the cluster isn't forming.

RabbitMQ server takes about 3-4 minutes to start with autocluster enabled

I am trying to set up an RMQ cluster with the autocluster plugin. This is what I am observing:

  • With version 0.7.0 : rmq server takes about 3-4 minutes to start with this plugin enabled.
  • With version 0.6.1 : Works perfectly fine!

The only difference I see between 0.6.1 and 0.7.0 is that 0.7.0 requires the autocluster_aws plugin as a dependency. With 0.7.0 (and the 3-4 minute delay), I also see the following error log:

=ERROR REPORT==== 19-May-2017::02:22:26 ===
Failed to retrieve AWS credentials: undefined

This is with the backend configured as etcd; I'm not sure why AWS credentials are being checked at all.

Below is my config (along with RABBITMQ_USE_LONGNAME=true ):

[
  {rabbit, [
    {tcp_listen_options, [
                          {backlog,       128},
                          {nodelay,       true},
                          {linger,        {true,0}},
                          {exit_on_close, false},
                          {sndbuf,        12000},
                          {recbuf,        12000}
                         ]},
    {loopback_users, []}
  ]},

  {autocluster, [
    {dummy_param_without_comma, true},
    {backend, etcd},
    {autocluster_failure, stop},
    {cleanup_interval, 30},
    {cluster_cleanup, true},
    {cleanup_warn_only, false},
    {etcd_ttl, 30},
    {etcd_scheme, http},
    {etcd_host, "etcd.kube-system.svc.cluster.local"},
    {etcd_port, 2379}
   ]}
].

rabbitmq-autocluster does not function with consul 1.0.0 due to API changes

rabbitmq-autocluster does not function with Consul 1.0.0 due to API changes; rolling back to Consul 0.9.3 makes it work as expected.

docker stack deploy -c rabbit-cluster.yml rabbit-cluster

rabbit-cluster.yml

version: '3'

#customize this with options from
#https://www.consul.io/docs/agent/options.html

services:
  consul:
    hostname: consul
    # image: consul:0.9.3 #Works
    image: consul:1.0.0 # Does not Work
    deploy:
      replicas: 1
    environment:
      - "CONSUL_LOCAL_CONFIG={\"disable_update_check\": true}"
      - "CONSUL_BIND_INTERFACE=eth0"
      - "CONSUL_HTTP_ADDR=0.0.0.0"
    entrypoint:
      - consul
      - agent
      - -server
      - -bootstrap-expect=1
      - -data-dir=/tmp/consuldata
      - -bind={{ GetInterfaceIP "eth0" }}
      - -client=0.0.0.0
      - -ui
    networks:
      - "consul"
    ports:
      - "8500:8500"
      - "8600:8600"

  rabbit:
    depends_on:
      - "consul"
    environment:
      - AUTOCLUSTER_TYPE=consul
      - CONSUL_HOST=consul
      - CONSUL_PORT=8500
      - CONSUL_SERVICE_TTL=60
      - AUTOCLUSTER_CLEANUP=true
      - CLEANUP_WARN_ONLY=false
      - CONSUL_SVC_ADDR_AUTO=true
      - CONSUL_DEREGISTER_AFTER=60
    networks:
      - "consul"
    image: rabbitmq-autocluster
    ports:
      - "15672:15672"
    tty: true
    command:  sh -c "sleep 20; rabbitmq-server;"

networks:
  consul:
    driver: overlay

The RabbitMQ log reports:
“autocluster: Step register_with_backend failed, will conitnue nevertheless. Failure reason: Failed to register in backend: 405.”

The Consul log reports:
[ERR] http: Request POST /v1/agent/service/register, error: method POST not allowed from=10.0.1.5:39340

From https://www.consul.io/docs/upgrade-specific.html it appears Consul now requires PUT not POST for /v1/agent/service/register as well as a number of other endpoints.

Handle "notReadyAddresses" in kubernetes

Using StatefulSets, the following API call:

curl -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/test-rabbitmq/endpoints/rabbitmq

gets this result:

 "subsets": [
    {
      "notReadyAddresses": [
        {
          "ip": "172.17.0.2",
          "hostname": "rabbitmq-0",
          "nodeName": "minikube",
          "targetRef": {
            "kind": "Pod",
            "namespace": "test-rabbitmq",
            "name": "rabbitmq-0",
            "uid": "3523b6ad-3a17-11e7-9fac-080027cbdcae",
            "resourceVersion": "108775"
          }
        }
      ],

notReadyAddresses means that the pod is starting and is not ready yet.

During startup it causes:

=INFO REPORT==== 16-May-2017::07:23:26 ===
Error description:
   {could_not_start,rabbit,
       {function_clause,
           [{autocluster_k8s,'-extract_node_list/1-lc$^1/1-1-',
                [undefined],
                [{file,"src/autocluster_k8s.erl"},{line,85}]},
            {autocluster_k8s,'-extract_node_list/1-lc$^0/1-0-',1,
                [{file,"src/autocluster_k8s.erl"},{line,85}]},
            {autocluster_k8s,extract_node_list,1,
                [{file,"src/autocluster_k8s.erl"},{line,85}]},
            {autocluster_k8s,nodelist,0,
                [{file,"src/autocluster_k8s.erl"},{line,30}]},
            {autocluster,ensure_registered,3,
                [{file,"src/autocluster.erl"},{line,107}]},
            {autocluster,init,0,[{file,"src/autocluster.erl"},{line,33}]},
            {rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,
                [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
            {rabbit_boot_steps,run_step,2,
                [{file,"src/rabbit_boot_steps.erl"},{line,49}]}]}}

Log files (may contain more information):
   tty
   tty

init terminating in do_boot ()

{"init terminating in do_boot",{could_not_start,rabbit,{function_clause,[{autocluster_k8s,'-extract_node_list/1-lc$^1/1-1-',[undefined],[{file,"src/autocluster_k8s.erl"},{line,85}]},{autocluster_k8s,'-extract_node_list/1-lc$^0/1-0-',1,[{file,"src/autocluster_k8s.erl"},{line,85}]},{autocluster_k8s,extract_node_list,1,[{file,"src/autocluster_k8s.erl"},{line,85}]},{autocluster_k8s,nodelist,0,[{file,"src/autocluster_k8s.erl"},{line,30}]},{autocluster,ensure_registered,3,[{file,"src/autocluster.erl"},{line,107}]},{autocluster,init,0,[{file,"src/autocluster.erl"},{line,33}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,"src/rabbit_boot_steps.erl"},{line,49}]},{rabbit_boot_steps,run_step,2,[{file,"src/rabbit_boot_steps.erl"},{line,49}]}]}}}

Because proplists:get_value(<<"addresses">>, Subset) is undefined
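
A minimal illustration of why this crashes: a subset that only carries notReadyAddresses has no <<"addresses">> key at all (the IP is the one from the response above), so the lookup yields undefined and extract_node_list/1 fails with function_clause:

%% A subset with only notReadyAddresses has no <<"addresses">> key, so the
%% lookup returns undefined and the comprehension fails with function_clause.
Subset = [{<<"notReadyAddresses">>, [{struct, [{<<"ip">>, <<"172.17.0.2">>}]}]}],
undefined = proplists:get_value(<<"addresses">>, Subset).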

A possible solution is to return an empty list of nodes when the value is undefined:

get_address(Subset) ->
  case proplists:get_value(<<"addresses">>, Subset) of 
      undefined -> autocluster_log:info("No nodes ready yet!"), []; 
      Address -> Address  
  end.

%% @spec extract_node_list(k8s_endpoints()) -> list()
%% @doc Return a list of nodes
%%    see http://kubernetes.io/docs/api-reference/v1/definitions/#_v1_endpoints
%% @end
%%
extract_node_list({struct, Response}) ->
    IpLists = [[proplists:get_value(list_to_binary(autocluster_config:get(k8s_address_type)), Address)
		|| {struct, Address} <- get_address(Subset)]
	       || {struct, Subset} <- proplists:get_value(<<"subsets">>, Response)],
    sets:to_list(sets:union(lists:map(fun sets:from_list/1, IpLists))).


When serviceName is changed, the cluster doesn't work.

I modified the value of "serviceName" in the StatefulSet YAML and "name" in the Service YAML together.
For example, both are set to "rabbitmq2", like this:

metadata:
  name: rabbitmq
  namespace: test-rabbitmq
spec:
  serviceName: rabbitmq2
  replicas: 3
  template:
    metadata:
      labels:
        app: rabbitmq

kind: Service
apiVersion: v1
metadata:
  namespace: test-rabbitmq
  name: rabbitmq2
  labels:
    app: rabbitmq
    type: LoadBalancer

Then the RabbitMQ nodes cannot find each other. Only when the value is "rabbitmq" does the cluster work fine.
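
If I am reading the plugin's Kubernetes settings correctly, the backend queries the endpoints object whose name comes from the k8s_service_name setting (K8S_SERVICE_NAME environment variable), which defaults to "rabbitmq", so renaming the Service should also require pointing the plugin at the new name. A sketch of the config change, assuming that setting name:

[
  {autocluster, [
    {backend, k8s},
    %% Assumed setting name: point the k8s backend at the endpoints of the
    %% renamed Service instead of the default "rabbitmq".
    {k8s_service_name, "rabbitmq2"}
  ]}
].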

Docker image tag

Hey guys, can we please have the docker image tagged with the version of the release on GitHub? We don't want to keep pulling the latest image in case it breaks.

K8S: Unauthorized access while connecting to api server

I'm trying to configure the autocluster plugin with the k8s backend.
I use the recipes from the YAML files in the examples folder to deploy the cluster.
I also use the configuration file from the same git repository.

In my installation I use rabbitmq 3.6.10 and the latest plugin 0.8.0 to deploy a cluster.

While deploying I get the following output in the log files of every node, and the nodes remain independent from each other:

=INFO REPORT==== 4-Aug-2017::18:07:42 ===
autocluster: (cleanup) Timer started {60,false}

=INFO REPORT==== 4-Aug-2017::18:07:42 ===
Starting RabbitMQ 3.6.10 on Erlang 19.3.6.1
Copyright (C) 2007-2017 Pivotal Software, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/

=INFO REPORT==== 4-Aug-2017::18:07:42 ===
node           : [email protected]
home dir       : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash    : JJZMksGwEn9ChnNLQbQTdw==
log            : /var/log/rabbitmq/[email protected]
sasl log       : /var/log/rabbitmq/[email protected]
database dir   : /var/lib/rabbitmq/mnesia/[email protected]

=INFO REPORT==== 4-Aug-2017::18:07:43 ===
autocluster: Running discover/join step

=INFO REPORT==== 4-Aug-2017::18:07:43 ===
autocluster: Apps 'rabbit' and 'mnesia' successfully stopped

=INFO REPORT==== 4-Aug-2017::18:07:43 ===
autocluster: Running step initialize_backend

=INFO REPORT==== 4-Aug-2017::18:07:43 ===
autocluster: Using k8s backend

=INFO REPORT==== 4-Aug-2017::18:07:43 ===
autocluster: Running step acquire_startup_lock

=INFO REPORT==== 4-Aug-2017::18:07:43 ===
autocluster: Delaying startup for 7380ms.

=INFO REPORT==== 4-Aug-2017::18:07:50 ===
autocluster: Running step find_best_node_to_join

=INFO REPORT==== 4-Aug-2017::18:07:50 ===
autocluster: GET https://kubernetes.default:443/api/v1/namespaces/rabbitmq/endpoints/rabbitmq

=INFO REPORT==== 4-Aug-2017::18:07:50 ===
autocluster: Response: [{ok,{{"HTTP/1.1",401,"Unauthorized"},
                             [{"date","Fri, 04 Aug 2017 12:07:50 GMT"},
                              {"content-length","13"},
                              {"content-type","text/plain; charset=utf-8"},
                              {"x-content-type-options","nosniff"}],
                             "Unauthorized\n"}}]

=INFO REPORT==== 4-Aug-2017::18:07:50 ===
autocluster: HTTP Response (401) Unauthorized


=INFO REPORT==== 4-Aug-2017::18:07:50 ===
autocluster: Failed to get nodes from k8s - 401

=ERROR REPORT==== 4-Aug-2017::18:07:50 ===
autocluster: Step find_best_node_to_join failed, will conitnue nevertheless. Failure reason: Failed to fetch list of nodes from the backend: "401".

=INFO REPORT==== 4-Aug-2017::18:07:50 ===
autocluster: Starting back 'rabbit' application

=INFO REPORT==== 4-Aug-2017::18:07:50 ===
Memory limit set to 12577MB of 15721MB total.

=INFO REPORT==== 4-Aug-2017::18:07:50 ===
Enabling free disk space monitoring

=INFO REPORT==== 4-Aug-2017::18:07:50 ===
Disk free limit set to 50MB

=INFO REPORT==== 4-Aug-2017::18:07:50 ===
Limiting to approx 1048476 file handles (943626 sockets)

=INFO REPORT==== 4-Aug-2017::18:07:50 ===
FHC read buffering:  OFF
FHC write buffering: OFF

=INFO REPORT==== 4-Aug-2017::18:07:50 ===
Database directory at /var/lib/rabbitmq/mnesia/[email protected] is empty. Initialising from scratch...

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Waiting for Mnesia tables for 30000 ms, 9 retries left

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Waiting for Mnesia tables for 30000 ms, 9 retries left

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Waiting for Mnesia tables for 30000 ms, 9 retries left

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Priority queues enabled, real BQ is rabbit_variable_queue

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Starting rabbit_node_monitor

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Management plugin: using rates mode 'basic'

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=WARNING REPORT==== 4-Aug-2017::18:07:51 ===
msg_store_persistent: rebuilding indices from scratch

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Adding vhost '/'

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Creating user 'guest'

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Setting user tags for user 'guest' to [administrator]

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Setting permissions for 'guest' in '/' to '.*', '.*', '.*'

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
started TCP Listener on 0.0.0.0:5672

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
autocluster: Running registeration step

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
autocluster: Running step register_with_backend

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
autocluster: Running lock release step

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
autocluster: Running step release_startup_lock

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Management plugin started. Port: 15672

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Statistics database started.

=INFO REPORT==== 4-Aug-2017::18:07:51 ===
Server startup complete; 11 plugins started.
 * rabbitmq_shovel_management
 * rabbitmq_management
 * rabbitmq_management_agent
 * rabbitmq_web_dispatch
 * rabbitmq_sharding
 * rabbitmq_shovel
 * cowboy
 * amqp_client
 * autocluster
 * rabbitmq_aws
 * cowlib

=INFO REPORT==== 4-Aug-2017::18:08:42 ===
autocluster: (cleanup) checking cluster

=INFO REPORT==== 4-Aug-2017::18:08:42 ===
autocluster: (cleanup) Checking for partitioned nodes.

=INFO REPORT==== 4-Aug-2017::18:08:42 ===
autocluster: (cleanup) No partitioned nodes found.

=INFO REPORT==== 4-Aug-2017::18:09:42 ===
autocluster: (cleanup) checking cluster

=INFO REPORT==== 4-Aug-2017::18:09:42 ===
autocluster: (cleanup) Checking for partitioned nodes.

=INFO REPORT==== 4-Aug-2017::18:09:42 ===
autocluster: (cleanup) No partitioned nodes found.

I get a message from the autocluster plugin about unauthorized access:

autocluster: HTTP Response (401) Unauthorized

Also, when I curl this address I get the same message from the Kubernetes API server:

$ curl https://kubernetes.default:443/api/v1/namespaces/rabbitmq/endpoints/rabbitmq --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Unauthorized

I use the following environment variables when deploying the StatefulSet:

          - name: K8S_CERT_PATH
            value: "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
          - name: K8S_TOKEN_PATH
            value: "/var/run/secrets/kubernetes.io/serviceaccount/token"

The ca.crt and token files definitely exist and have valid content!

Is there any way to also provide a client certificate and its private key to authenticate successfully against the Kubernetes API? Or is there perhaps another way to resolve this issue?
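
One thing worth noting: the curl above only passes --cacert and no bearer token, so a 401 from curl is expected regardless of what the plugin does. To reproduce the request the plugin is supposed to make, include the mounted service account token, roughly like this (a sketch using httpc directly; the URL and file paths are the ones already shown above):

%% Reproduce the authenticated request using the mounted service account
%% token and CA certificate. Run from an Erlang shell inside the pod.
{ok, _} = application:ensure_all_started(inets),
{ok, _} = application:ensure_all_started(ssl),
{ok, Token} = file:read_file("/var/run/secrets/kubernetes.io/serviceaccount/token"),
Url = "https://kubernetes.default:443/api/v1/namespaces/rabbitmq/endpoints/rabbitmq",
Headers = [{"Authorization", "Bearer " ++ binary_to_list(Token)}],
SslOpts = [{cacertfile, "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"}],
httpc:request(get, {Url, Headers}, [{ssl, SslOpts}], []).

If this request also returns 401, the service account token itself is not authorised to read the endpoints resource, which would need to be fixed on the Kubernetes side (for example with RBAC) rather than in the plugin.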

Plugin re-activation can fail

When enabling the plugin on the first node in a cluster the following error is generated.

[root@db3 ~]# rabbitmq-plugins enable autocluster
The following plugins have been enabled:
  rabbitmq_aws
  autocluster

Applying plugin configuration to [email protected]... failed.
Error: {{badmatch,false},
        [{autocluster_periodic,start_delayed,3,
                               [{file,"src/autocluster_periodic.erl"},
                                {line,47}]},
         {autocluster_consul,register,0,
                             [{file,"src/autocluster_consul.erl"},{line,135}]},
         {autocluster,register_with_backend,1,
                      [{file,"src/autocluster.erl"},{line,307}]},
         {autocluster,run_steps,1,[{file,"src/autocluster.erl"},{line,131}]},
         {rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,
                            [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
         {rabbit_boot_steps,run_step,2,
                            [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
         {rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,
                            [{file,"src/rabbit_boot_steps.erl"},{line,26}]},
         {rabbit_boot_steps,run_boot_steps,1,
                            [{file,"src/rabbit_boot_steps.erl"},{line,26}]}]}

Investigating this seems to point to a race condition in the logic. The plugin runs through the steps initially, acquires a lock and possibly inserts into ets; then, on determining that it is the only node, it goes through the steps again while still holding the initial lock.

Retaining the initial lock itself could be a problem. After the initial lock is released, the process proceeds to register with Consul; after registration, the issue arises when setting up the delayed task.

It seems the initial run may have already created this, so ets:insert_new returns false; its documentation[1] says false is returned if the key already exists.
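
The ets:insert_new/2 behaviour is easy to reproduce in isolation (a standalone illustration, not the plugin's code; the table and key names are made up):

%% The second insert_new/2 for the same key returns false, which is what the
%% badmatch in the traceback above appears to hit.
Tab = ets:new(example_timers, [set, public]),
true  = ets:insert_new(Tab, {cleanup_timer, first_ref}),
false = ets:insert_new(Tab, {cleanup_timer, second_ref}).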

In this state, if you then disable the plugin, it keeps running and trying to recreate the node in consul. I suspect the delayed task is not removed.

A spiral loop is then entered because Consul returns a 500 result code when a request is made to check the state of a service that does not exist.

https://github.com/rabbitmq/rabbitmq-autocluster/blob/stable/src/autocluster_consul.erl#L194
https://github.com/rabbitmq/rabbitmq-autocluster/blob/stable/src/autocluster_consul.erl#L212

The only way to get out of this is to restart the rabbitmq-server process.

I do not have much time at the moment to dig through this, so I have created a workaround[2] to stop the enable error until I have time later. If someone else is able to look into this, it would be great.

[1] http://erlang.org/doc/man/ets.html#insert_new-2
[2] akissa@39e0cb4

Docker image build

Hi, I was following the steps in
https://github.com/rabbitmq/rabbitmq-autocluster/tree/master/examples/k8s_statefulsets
to build the Docker image but I get the following error when starting the service on k8s:

14:36:44.684 [error] Failed to enable plugin "rabbitmq_autocluster": it may have been built with an incompatible (more recent?) version of Erlang

BOOT FAILED
===========

Error description:
    init:start_em/1
    init:start_it/1
    rabbit:start_it/1 line 454
    rabbit:broker_start/0 line 326
    rabbit_plugins:prepare_plugins/1 line 285
    rabbit_plugins:'-prepare_plugins/1-lc$^1/1-1-'/1 line 285
    rabbit_plugins:prepare_dir_plugin/1 line 449
throw:{plugin_built_with_incompatible_erlang,"rabbitmq_autocluster"}
Log file(s) (may contain more information):
   <stdout>

{"init terminating in do_boot",{plugin_built_with_incompatible_erlang,"rabbitmq_autocluster"}}

I built it using Erlang 20.0 and Elixir 1.4.5. I saw afterwards that the dev requirements mention Erlang 17.5, but I wasn't able to find an Elixir version that works with Erlang 17.5, so I'm not sure how you are building it.

Is the Docker image at https://hub.docker.com/r/gsantomaggio/rabbitmq-autocluster/ built using the same steps from the master branch?

0.8.0 Release Timeline

Hello,

I saw this comment indicating that a 0.8.0 release of this plugin was coming soon:

Do you have an ETA for when a binary build of 0.8.0 may be available? We're looking to use functionality introduced in that version. Also, will 3.6.9 be supported in the 0.8.0 release?

Cheers!
