
opsviz's People

Contributors

alexlovelltroy, buglione, invalidusrname, jmetzmeier, jonathandietz, jondowdle, pkoistin, slb350, taylorludwig


opsviz's Issues

CloudFormation setup fails at EC2 instance creation

The CloudFormation setup fails at the EC2 instance creation step. Looking at OpsWorks, the instances are still running the setup phase of the Chef run and eventually finish setup with no errors.

There should probably be a wait condition added to the CloudFormation JSON that waits for the EC2 instances to finish their setup, rather than failing the CloudFormation run prematurely.

WaitCondition: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-waitcondition.html

CreationPolicy Attribute: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-creationpolicy.html
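A minimal sketch of what that could look like in the stack template, assuming a hypothetical `Monitor1` instance resource (the resource names here are illustrative, not taken from cloudformation.json):

```json
{
  "SetupWaitHandle": {
    "Type": "AWS::CloudFormation::WaitConditionHandle"
  },
  "SetupWaitCondition": {
    "Type": "AWS::CloudFormation::WaitCondition",
    "DependsOn": "Monitor1",
    "Properties": {
      "Handle": { "Ref": "SetupWaitHandle" },
      "Timeout": "1800",
      "Count": "1"
    }
  }
}
```

The instance would then signal the handle's presigned URL (e.g. with cfn-signal or curl) at the end of its setup recipes, so the stack only proceeds once setup actually completes.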

Parameterize instance sizes and number of nodes

For testing purposes we don't really need c3.large instances, and might want to customize the number of nodes for elasticsearch in particular.

This is a feature request for customizing instance sizes (per role) and number of nodes for elasticsearch for now.
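One possible shape for this, sketched as CloudFormation parameters (the parameter names, defaults, and allowed values are illustrative, not existing template entries):

```json
{
  "Parameters": {
    "MonitorInstanceType": {
      "Type": "String",
      "Default": "c3.large",
      "AllowedValues": ["t2.small", "t2.medium", "c3.large", "c3.xlarge"],
      "Description": "Instance type for the monitor layer"
    },
    "ElasticsearchNodeCount": {
      "Type": "Number",
      "Default": "2",
      "MinValue": "1",
      "Description": "Number of elasticsearch nodes"
    }
  }
}
```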

foxycoder chef-logstash dependency doesn't work on centos because of upstart

The current logstash cookbook that we rely on in bb_external is https://github.com/foxycoder/chef-logstash

The readme claims that it should work on other platforms, but it has only been tested on Ubuntu/Debian systems.

The problem is that the startup script relies on 'upstart', which as far as I know is Ubuntu-only (unless we depend on the upstart cookbook: http://upstart.ubuntu.com/cookbook/#id416).

I'm not sure we should be requiring upstart jobs, but if we don't find a different provider, we should at least fork the foxycoder repo with fixes to allow an init.d script. Or systemd, since RHEL 7 and Ubuntu both seem to be going that route: http://www.markshuttleworth.com/archives/1316
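For reference, a minimal systemd unit of the kind a fork could ship (paths, user, and flags below are assumptions, not taken from the cookbook):

```ini
# /etc/systemd/system/logstash.service -- sketch only
[Unit]
Description=Logstash
After=network.target

[Service]
User=logstash
ExecStart=/opt/logstash/bin/logstash agent -f /etc/logstash/conf.d
Restart=on-failure

[Install]
WantedBy=multi-user.target
```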

Restarting rabbitmq1 node results in failed setup on 'set_policy ha-all'

I had stopped all instances in my opswork stack and upon restarting the rabbitmq1 instance, received the following error:

[2014-12-12T14:36:27+00:00] INFO: Enabling RabbitMQ plugin 'rabbitmq_management'.
[2014-12-12T14:36:27+00:00] INFO: rabbitmq_plugin[rabbitmq_management] not queuing delayed action restart on service[rabbitmq-server] (delayed), as it's already been queued
[2014-12-12T14:36:27+00:00] INFO: Processing execute[rabbitmq-plugins enable rabbitmq_management] action run (/var/lib/aws/opsworks/cache.stage2/cookbooks/rabbitmq/providers/plugin.rb line 39)
[2014-12-12T14:36:28+00:00] INFO: execute[rabbitmq-plugins enable rabbitmq_management] ran successfully
[2014-12-12T14:36:28+00:00] INFO: Processing rabbitmq_plugin[rabbitmq_management_visualiser] action enable (rabbitmq::mgmt_console line 27)
[2014-12-12T14:36:28+00:00] INFO: Enabling RabbitMQ plugin 'rabbitmq_management_visualiser'.
[2014-12-12T14:36:28+00:00] INFO: rabbitmq_plugin[rabbitmq_management_visualiser] not queuing delayed action restart on service[rabbitmq-server] (delayed), as it's already been queued
[2014-12-12T14:36:28+00:00] INFO: Processing execute[rabbitmq-plugins enable rabbitmq_management_visualiser] action run (/var/lib/aws/opsworks/cache.stage2/cookbooks/rabbitmq/providers/plugin.rb line 39)
[2014-12-12T14:36:29+00:00] INFO: execute[rabbitmq-plugins enable rabbitmq_management_visualiser] ran successfully
[2014-12-12T14:36:29+00:00] INFO: Processing execute[chown -R rabbitmq:rabbitmq /var/lib/rabbitmq] action run (rabbitmq_cluster::default line 10)
[2014-12-12T14:36:29+00:00] INFO: execute[chown -R rabbitmq:rabbitmq /var/lib/rabbitmq] ran successfully
[2014-12-12T14:36:29+00:00] INFO: Processing rabbitmq_user[guest] action delete (rabbitmq_cluster::default line 12)
[2014-12-12T14:36:29+00:00] INFO: Processing rabbitmq_policy[ha-all] action set (rabbitmq_cluster::default line 16)
[2014-12-12T14:36:29+00:00] INFO: Done setting RabbitMQ policy 'ha-all'.
[2014-12-12T14:36:29+00:00] INFO: Processing execute[set_policy ha-all] action run (/var/lib/aws/opsworks/cache.stage2/cookbooks/rabbitmq/providers/policy.rb line 66)

================================================================================
Error executing action `run` on resource 'execute[set_policy ha-all]'
================================================================================


Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '2'
---- Begin output of rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --priority 1 ----
STDOUT: Setting policy "ha-all" for pattern "^(?!amq\\.).*" to "{\"ha-mode\":\"all\",\"ha-sync-mode\":\"automatic\"}" with priority "1" ...
STDERR: Error: unable to connect to node rabbit@rabbitmq1: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@rabbitmq1]

rabbit@rabbitmq1:
* connected to epmd (port 4369) on rabbitmq1
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq1
* suggestion: start the node

current node details:
- node name: rabbitmqctl11602@rabbitmq1
- home dir: /var/lib/rabbitmq
- cookie hash: FUWzw5ayMo2aD4GJFavYFA==
---- End output of rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --priority 1 ----
Ran rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --priority 1 returned 2


Resource Declaration:
---------------------
# In /var/lib/aws/opsworks/cache.stage2/cookbooks/rabbitmq/providers/policy.rb

66:     execute "set_policy #{new_resource.policy}" do
67:       command cmd
68:     end
69: 



Compiled Resource:
------------------
# Declared in /var/lib/aws/opsworks/cache.stage2/cookbooks/rabbitmq/providers/policy.rb:66:in `block in class_from_file'

execute("set_policy ha-all") do
action "run"
retries 0
retry_delay 2
command "rabbitmqctl set_policy ha-all \"^(?!amq\\.).*\" '{\"ha-mode\":\"all\",\"ha-sync-mode\":\"automatic\"}' --priority 1"
backup 5
returns 0
cookbook_name "rabbitmq_cluster"
end



[2014-12-12T14:36:29+00:00] INFO: Running queued delayed notifications before re-raising exception
[2014-12-12T14:36:29+00:00] INFO: template[/etc/rabbitmq/rabbitmq-env.conf] sending restart action to service[rabbitmq-server] (delayed)
[2014-12-12T14:36:29+00:00] INFO: Processing service[rabbitmq-server] action restart (rabbitmq::default line 79)
[2014-12-12T14:36:33+00:00] INFO: service[rabbitmq-server] restarted
[2014-12-12T14:36:33+00:00] ERROR: Running exception handlers
[2014-12-12T14:36:33+00:00] ERROR: Exception handlers complete
[2014-12-12T14:36:33+00:00] FATAL: Stacktrace dumped to /var/lib/aws/opsworks/cache.stage2/chef-stacktrace.out
[2014-12-12T14:36:33+00:00] ERROR: execute[set_policy ha-all] (/var/lib/aws/opsworks/cache.stage2/cookbooks/rabbitmq/providers/policy.rb line 66) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '2'
---- Begin output of rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --priority 1 ----
STDOUT: Setting policy "ha-all" for pattern "^(?!amq\\.).*" to "{\"ha-mode\":\"all\",\"ha-sync-mode\":\"automatic\"}" with priority "1" ...
STDERR: Error: unable to connect to node rabbit@rabbitmq1: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@rabbitmq1]

rabbit@rabbitmq1:
* connected to epmd (port 4369) on rabbitmq1
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq1
* suggestion: start the node

current node details:
- node name: rabbitmqctl11602@rabbitmq1
- home dir: /var/lib/rabbitmq
- cookie hash: FUWzw5ayMo2aD4GJFavYFA==
---- End output of rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --priority 1 ----
Ran rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --priority 1 returned 2
[2014-12-12T14:36:33+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)

I assume it's a timing issue, but I'm not 100% sure.
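If it is a timing issue, one possible mitigation is a guard before the policy is set, sketched here as a Chef resource (this is not the actual rabbitmq_cluster cookbook code):

```ruby
# Sketch only -- block until the local rabbit node answers before the
# ha-all policy resource runs; retry counts are arbitrary.
execute 'wait_for_rabbitmq' do
  command 'rabbitmqctl status'
  retries 10
  retry_delay 5
end
```

Alternatively, adding `retries`/`retry_delay` to the existing `execute[set_policy ha-all]` resource (it currently compiles with `retries 0`) would give the node time to come up without failing the whole Chef run.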

Logstash exchange is not created on stack setup.

The logstash rabbitmq input plugin is not currently able to create an exchange if it doesn't exist. Producers (logstash output plugins) will create exchanges if they don't exist.

This means that when the opsviz stack is spun up, the rabbitmq1 and logstash1 instances generate many log errors, and connections aren't established properly until the first RabbitMQ logstash producer is created.

As a quick fix, the opsviz recipes should support creating the logstash exchange on the default vhost '/' during the logstash install recipe.

Here's an excerpt from the chat discussion:

[2:31 PM] Derek Downey: hmm, it doesn't do what I expected then :( They don't have an 'exchange_type' option on the rabbitmq input http://logstash.net/docs/1.4.2/inputs/rabbitmq
my thought was that 'type' was the same thing, and that it only would create the exchange if a type was also specified
[2:34 PM] Taylor Ludwig: oh gotcha, since the output has an "exchange_type", you were expecting there to be an "exchange_type" option on the input too
[2:35 PM] Derek Downey: yes
[2:35 PM] Taylor Ludwig: that's only relevant when creating the exchange, right? So maybe since the output is the only one that actually creates the exchange if it doesn't exist, that's why it's absent from the input
[2:36 PM] Alex Lovell-Troy: depends on how the code is structured, but that would make sense
[2:36 PM] Taylor Ludwig: But yeah, the "types" option is logstash's message type, so its a setting outside of rabbitmq stuff
[2:36 PM] Derek Downey: I'm still not used to the pattern that only the output (producer) is creating exchanges/queues. To me this sounds like an easy exploit.
plus the stack starts up with a bunch of errors in logstash/rabbitmq until there's at least one producer :)
[2:39 PM] Taylor Ludwig: yeah that alone seems like a good reason to create it.
snip
[2:46 PM] Alex Lovell-Troy: actually...
this should be in logstash
[2:46 PM] Derek Downey: logstash input?
[2:46 PM] Alex Lovell-Troy: yeah
[2:46 PM] Taylor Ludwig: yeah that seems more logical,
[2:46 PM] Derek Downey: that's where I would do it
snip
[2:51 PM] Taylor Ludwig: oh I thought you were talking about the logstash install recipe. I thought we determined only the output creates the exchange, not the input. Sorry, I'm getting lost going through 3 chats right now
[2:54 PM] Derek Downey: we determined that is how it currently works. I think something needs to create the exchange without requiring producers to avoid the errors, but the logstash input doesn't support it (this still is crazy to me!) We had discussed using rabbitmq management plugin to create the exchange as an alternative
that's my understanding of the discussion, anyway
[2:56 PM] Taylor Ludwig: yeah so my thinking is just use the rest api to create the exchange and do it either on the logstash server install recipe or the rabbitmq install recipe
[3:04 PM] Derek Downey: if doing that, I'd have a preference towards the logstash server install recipe
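Following the chat's conclusion, here is a sketch of declaring the exchange through the RabbitMQ management REST API from the install recipe. The host, exchange name, and guest credentials below are illustrative defaults, not values taken from the opsviz recipes.

```python
import base64
import json
import urllib.parse
import urllib.request

def exchange_put_request(host, vhost, name, exchange_type="topic",
                         user="guest", password="guest", port=15672):
    """Build the management-API request that declares an exchange.

    Exchange name/type and credentials are illustrative assumptions.
    """
    # The vhost must be percent-encoded; the default vhost "/" becomes %2F.
    url = "http://%s:%d/api/exchanges/%s/%s" % (
        host, port, urllib.parse.quote(vhost, safe=""), name)
    body = json.dumps({"type": exchange_type, "durable": True}).encode()
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("Content-Type", "application/json")
    creds = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + creds)
    return req

def create_exchange(host, vhost, name, **kwargs):
    # PUT is idempotent: re-declaring an existing exchange with the same
    # properties succeeds, so this is safe to run on every Chef converge.
    urllib.request.urlopen(exchange_put_request(host, vhost, name, **kwargs))
```

e.g. `create_exchange("rabbitmq1", "/", "logstash")` on stack setup would remove the dependency on the first producer.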

Add RabbitMQ management plugin to dashboard

RabbitMQ has a dashboard https://www.rabbitmq.com/management.html

It runs on the RabbitMQ server over port 15672. The RabbitMQ ELB has a listener for port 15672, but the external security group for that ELB only uses port 5671. https://github.com/pythian/opsviz/blob/master/cloudformation.json#L1182

We should make sure this dashboard plugin is installed and available on the dashboard. It probably makes sense to use the Dashboard ELB and Nginx to forward /rabbitmq to the RabbitMQ ELB on port 15672. This puts the RMQ dashboard behind Doorman's authentication.
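A sketch of the proposed Nginx forwarding on the dashboard (the upstream ELB hostname is a placeholder):

```nginx
# Forward /rabbitmq to the RabbitMQ ELB's management listener, keeping
# the RMQ dashboard behind Doorman's authentication.
location /rabbitmq/ {
    proxy_pass http://internal-rabbitmq-elb.example.com:15672/;
}
```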

t2 instance sizes not supported

This is a known issue, but the Bastion instance type parameter indicates it should support t2.small and t2.medium.

These instance types do not work with the default AMI. We should support the smaller instances for testing purposes.

Creation of cert in 'create_stack' fails if stack of same name was previously created

If you use create_stack, and it fails at any point after uploading the cert, then a re-run of the script will result in the following error:

writing RSA key
Traceback (most recent call last):
  File "create_stack", line 166, in <module>
    main()
  File "create_stack", line 163, in main
    stack_creator.create_stack()
  File "create_stack", line 128, in create_stack
    self.prepare_cert()
  File "create_stack", line 72, in prepare_cert
    self.cert_arn = self.upload_cert()
  File "create_stack", line 63, in upload_cert
    private_key=self.ssl_key)
  File "/home/vagrant/.virtualenvs/opvis/lib/python2.7/site-packages/boto/iam/connection.py", line 799, in upload_server_cert
    verb='POST')
  File "/home/vagrant/.virtualenvs/opvis/lib/python2.7/site-packages/boto/iam/connection.py", line 102, in get_response
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.BotoServerError: BotoServerError: 409 Conflict
<ErrorResponse xmlns="https://iam.amazonaws.com/doc/2010-05-08/">
  <Error>
    <Type>Sender</Type>
    <Code>EntityAlreadyExists</Code>
    <Message>The Server Certificate with name opsvistest1_cert already exists.</Message>
  </Error>
  <RequestId>e9c7caab-b2bf-11e4-9420-5bb7c3142238</RequestId>
</ErrorResponse>

I would expect this to be handled by either removing the previous certificate and re-attempting the upload, or ignoring the error and using the existing certificate.
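A sketch of the "ignore and reuse" option. This is proposed behaviour, not the current create_stack code; `iam` stands in for the boto IAM connection.

```python
def upload_cert_idempotent(iam, name, cert_body, private_key):
    """Upload a server cert, tolerating one left over from a failed run.

    `iam` is a boto IAM connection (or anything exposing the same two
    methods). Sketch of the proposed behaviour, not create_stack itself.
    """
    try:
        resp = iam.upload_server_cert(name, cert_body, private_key=private_key)
    except Exception as e:
        # boto raises BotoServerError with error_code 'EntityAlreadyExists'
        # on the 409 Conflict shown above.
        if getattr(e, "error_code", None) != "EntityAlreadyExists":
            raise
        # Reuse the certificate uploaded by the previous attempt.
        resp = iam.get_server_certificate(name)
    return resp
```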

Fix NGINX proxy for the sensu healthcheck

Uchiwa makes an API call to this address:

(dashboard_url)/sensu/health/sensu

This is supposed to return an object similar to this:

{"Sensu":{"output":"ok"}}

However, nginx redirects this call to the events page, which causes the datacenter alerts to show as undefined because of how the JavaScript parses the response.

(screenshot of the Uchiwa datacenter error, 2015-03-17)
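One way to fix this is a location block that proxies the health check to the Sensu API instead of falling through to the Uchiwa catch-all. The upstream address is an assumption; 4567 is the Sensu API's default port.

```nginx
# Sketch only -- route /sensu/ to the Sensu API so /sensu/health/sensu
# returns the JSON health object rather than the events page.
location /sensu/ {
    proxy_pass http://localhost:4567/;
}
```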

Agile health checks

I have a different approach to creating Sensu checks. It's not really an issue, just a different way of doing things, so I thought I'd mention it here.
I usually install a Graphite client on all the servers I'm monitoring (when possible; I prefer Diamond).
I then create the Sensu checks to verify data against Graphite metrics. You could say I'm hammering Graphite with a bunch of queries, and I'm aware of that, but so far this hasn't been an issue in environments of around 150 boxes.
This is why I call it more agile: I can leverage Graphite's math functions to flatten anomalies and pull only the relevant signal out of the noise.
I then use the check-data.rb script from sensu-community-plugins this way:

/etc/sensu/plugins/check-data.rb -a 120 -s ${graphite_host} -t 'minSeries($graphite_prefix.hostname -s.diskspace.*.byte_percentfree)' -w :::params.graphite.diskspace.bytes.free.warning|20::: -c :::params.graphite.diskspace.bytes.free.critical|10:::

minSeries() might not be the best option here, but this is just an example.

Doorman authentication with optional parameters

Currently, there are two conditionals on the doorman config file that create modules: app_id and password.

If these are left empty in the parameters, the config blocks still get generated, which at best prevents Doorman from starting up and at worst opens security holes with empty passwords.

I'll submit a patch later if it hasn't been done.
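The shape of the fix would be to guard each block on its parameter, roughly like this (the attribute names are assumptions about the opsviz Doorman template, not the real file):

```erb
<%# Sketch only -- emit the block only when the parameter is non-empty. %>
<% unless node['doorman']['app_id'].to_s.empty? %>
app_id: <%= node['doorman']['app_id'] %>
<% end %>
<% unless node['doorman']['password'].to_s.empty? %>
password: <%= node['doorman']['password'] %>
<% end %>
```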

Resolution of ELBs can be cached by nginx

Today I found Kibana broken on an opsviz stack. This appeared to be caused by the IP of the proxy_pass ELB changing while nginx kept using the old IP. It looks like we can change the config to force DNS resolution every time rather than caching the resolution:

http://serverfault.com/a/593003/105633

Another option may be to give the ELB an EIP.
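The linked answer's approach, sketched for the Kibana proxy (the resolver IP and ELB hostname are placeholders):

```nginx
# Putting the ELB name in a variable forces nginx to re-resolve it per
# request (honouring the DNS TTL) instead of caching the IP at startup.
resolver 10.0.0.2 valid=60s;        # VPC DNS server -- assumption
set $kibana_elb "internal-kibana-elb.example.com";
location /kibana/ {
    proxy_pass http://$kibana_elb:80;
}
```

Note that using a variable in `proxy_pass` changes how the URI is passed upstream, so the location blocks would need re-testing.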

sensu cluster

I think it would be interesting to have at least two sensu-server nodes, for several reasons: this server is the core of notifications, and a second node would also allow easier maintenance of this part of the stack. Sensu works well in a cluster; it can spread health check events across multiple nodes automatically, distributing load in large installations.
Graphite and RabbitMQ are already clustered, and I also saw some efforts toward switching from Redis to ElastiCache (I really like using "redishappy" to manage Redis clusters in non-AWS environments).
I think the sensu-server deserves its own cluster too!

Multiple AZ

"Highly Available within one availability zone". For now I have just gone through some of the documentation, but it looks like support for multiple AZs would make sense, specifically where services are clustered.

Update recipes to be more FHS compliant

Currently we have a big mix of file locations in the opsviz stack. Things are much easier to troubleshoot when things like logfiles and config files are easy to find. Logs should go to /var/log and config files should be somewhere under /etc/. One example is elasticsearch, which logs to /usr/local/var/log.

Extract bb_external into opviz-client repo

We want to move the client installation into a separate repository and have migrated the code into pythian/opviz-client.

However, so as not to break installs using master/HEAD, we have not yet deleted it from pythian/opsviz.

Until that happens, changes to the bb_external cookbook will need to be maintained in two places.

Nginx is appending a slash to many paths

Nginx is appending a slash to the end of some of the URLs, and it shouldn't be. I think this is causing issues for several endpoints.

/elasticsearch/_nodes is actually this when it hits the ES server: [elasticsearch]:9200//_nodes (notice the double slash).

Marvel needs to be accessed like this: [doorman_elb]/elasticsearch_plugin/marvel, and may redirect you on the first visit.

I can see there is a trailing slash here: https://github.com/pythian/opsviz/blob/master/site-cookbooks/bb_monitor/templates/default/nginx/dashboard.erb#L22. Removing that trailing slash should be the proper way to set this up. I also notice that many of the other routes do the same thing. If there isn't a specific reason they are set up like that, I'll go ahead and fix this.
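For illustration, this is how the double slash can arise, assuming the location block looks roughly like the following (the upstream name is a placeholder):

```nginx
# If the location has no trailing slash but proxy_pass does, the part of
# the URI after the matched prefix ("/_nodes") is appended to "/":
location /elasticsearch {
    proxy_pass http://es-elb:9200/;   # /elasticsearch/_nodes -> //_nodes
}

# Matching the trailing slashes replaces the prefix cleanly:
location /elasticsearch/ {
    proxy_pass http://es-elb:9200/;   # /elasticsearch/_nodes -> /_nodes
}
```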

Restart of stack resulted in dashboard1 failing setup execution

This might be tied to issue #7, but the dashboard1 instance also failed on restart because the sensu-api service would not start. Re-running the setup step completed successfully, but I'm logging the issue in case recipe changes can prevent this from happening, or it may just be unfortunate luck!

[2014-12-12T14:39:47+00:00] INFO: Processing sensu_service[sensu-api] action start (sensu::api_service line 20)
[2014-12-12T14:39:47+00:00] INFO: Processing service[sensu-api] action start (/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb line 46)
[2014-12-12T14:39:49+00:00] INFO: Retrying execution of service[sensu-api], 2 attempt(s) left
[2014-12-12T14:39:56+00:00] INFO: Retrying execution of service[sensu-api], 1 attempt(s) left
[2014-12-12T14:40:02+00:00] INFO: Retrying execution of service[sensu-api], 0 attempt(s) left

================================================================================
Error executing action `start` on resource 'service[sensu-api]'
================================================================================


Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of /etc/init.d/sensu-api start ----
STDOUT: * Starting sensu-api
...fail!
STDERR: 
---- End output of /etc/init.d/sensu-api start ----
Ran /etc/init.d/sensu-api start returned 1


Cookbook Trace:
---------------
/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb:127:in `block in class_from_file'


Resource Declaration:
---------------------
# In /var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb

46:     service new_resource.service do
47:       provider service_provider
48:       supports :status => true, :restart => true
49:       retries 3
50:       retry_delay 5
51:       action :nothing
52:       subscribes :restart, resources("ruby_block[sensu_service_trigger]"), :delayed
53:     end
54:   when "runit"



Compiled Resource:
------------------
# Declared in /var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb:46:in `load_current_resource'

service("sensu-api") do
provider Chef::Provider::Service::Debian
action [:nothing]
updated true
supports {:status=>true, :restart=>true}
retries 0
retry_delay 5
service_name "sensu-api"
enabled true
pattern "sensu-api"
cookbook_name "sensu"
end




================================================================================
Error executing action `start` on resource 'sensu_service[sensu-api]'
================================================================================


Mixlib::ShellOut::ShellCommandFailed
------------------------------------
service[sensu-api] (/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb line 46) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /etc/init.d/sensu-api start ----
STDOUT: * Starting sensu-api
...fail!
STDERR: 
---- End output of /etc/init.d/sensu-api start ----
Ran /etc/init.d/sensu-api start returned 1


Cookbook Trace:
---------------
/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb:127:in `block in class_from_file'


Resource Declaration:
---------------------
# In /var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/recipes/api_service.rb

20: sensu_service "sensu-api" do
21:   init_style node.sensu.init_style
22:   action [:enable, :start]
23: end



Compiled Resource:
------------------
# Declared in /var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/recipes/api_service.rb:20:in `from_file'

sensu_service("sensu-api") do
action [:enable, :start]
updated true
retries 0
retry_delay 2
cookbook_name "sensu"
recipe_name "api_service"
init_style "sysv"
service "sensu-api"
end



[2014-12-12T14:40:09+00:00] INFO: Running queued delayed notifications before re-raising exception
[2014-12-12T14:40:09+00:00] INFO: package[sensu] sending create action to ruby_block[sensu_service_trigger] (delayed)
[2014-12-12T14:40:09+00:00] INFO: Processing ruby_block[sensu_service_trigger] action create (sensu::default line 20)
[2014-12-12T14:40:09+00:00] INFO: ruby_block[sensu_service_trigger] called
[2014-12-12T14:40:09+00:00] INFO: cookbook_file[/etc/sensu/extensions/graphite.rb] sending restart action to sensu_service[sensu-server] (delayed)
[2014-12-12T14:40:09+00:00] INFO: Processing sensu_service[sensu-server] action restart (sensu::server_service line 20)
[2014-12-12T14:40:09+00:00] INFO: Processing service[sensu-server] action restart (/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb line 46)
[2014-12-12T14:40:16+00:00] INFO: service[sensu-server] restarted
[2014-12-12T14:40:16+00:00] INFO: ruby_block[sensu_service_trigger] sending restart action to service[sensu-server] (delayed)
[2014-12-12T14:40:16+00:00] INFO: Processing service[sensu-server] action restart (/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb line 46)
[2014-12-12T14:40:22+00:00] INFO: service[sensu-server] restarted
[2014-12-12T14:40:22+00:00] INFO: ruby_block[sensu_service_trigger] sending restart action to service[sensu-api] (delayed)
[2014-12-12T14:40:22+00:00] INFO: Processing service[sensu-api] action restart (/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb line 46)

================================================================================
Error executing action `restart` on resource 'service[sensu-api]'
================================================================================


Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of /etc/init.d/sensu-api restart ----
STDOUT: * Stopping sensu-api
...done.
* Starting sensu-api
...fail!
STDERR: /sbin/start-stop-daemon: warning: failed to kill 4038: No such process
---- End output of /etc/init.d/sensu-api restart ----
Ran /etc/init.d/sensu-api restart returned 1


Resource Declaration:
---------------------
# In /var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb

46:     service new_resource.service do
47:       provider service_provider
48:       supports :status => true, :restart => true
49:       retries 3
50:       retry_delay 5
51:       action :nothing
52:       subscribes :restart, resources("ruby_block[sensu_service_trigger]"), :delayed
53:     end
54:   when "runit"



Compiled Resource:
------------------
# Declared in /var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb:46:in `load_current_resource'

service("sensu-api") do
provider Chef::Provider::Service::Debian
action [:nothing]
updated true
supports {:status=>true, :restart=>true}
retries 0
retry_delay 5
service_name "sensu-api"
enabled true
pattern "sensu-api"
cookbook_name "sensu"
end



[2014-12-12T14:40:24+00:00] ERROR: Running exception handlers
[2014-12-12T14:40:24+00:00] ERROR: Exception handlers complete
[2014-12-12T14:40:24+00:00] FATAL: Stacktrace dumped to /var/lib/aws/opsworks/cache.stage2/chef-stacktrace.out
[2014-12-12T14:40:24+00:00] ERROR: Chef::Exceptions::MultipleFailures
[2014-12-12T14:40:24+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)

Create script to perform current manual work

Currently, there is some manual work that needs to be done, such as generating secrets and the RabbitMQ SSL cert. A wrapper script that does this generation and then calls create_stack would be nice to have.
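A sketch of such a wrapper. The file names and the create_stack flags below are assumptions, not the script's real interface.

```shell
#!/bin/sh
# Sketch of the proposed wrapper -- generates a secret and a self-signed
# RabbitMQ cert, then hands everything to create_stack.
set -e

STACK_NAME="${1:-opsviz-test}"

# Generate a random secret (e.g. for Doorman).
SECRET="$(openssl rand -hex 32)"

# Generate a self-signed cert/key pair for RabbitMQ SSL.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=${STACK_NAME}" \
    -keyout "${STACK_NAME}.key" -out "${STACK_NAME}.crt" 2>/dev/null

# Flag names here are hypothetical; skipped if the script is absent.
if [ -x ./create_stack ]; then
    ./create_stack --name "$STACK_NAME" --secret "$SECRET" \
        --ssl-cert "${STACK_NAME}.crt" --ssl-key "${STACK_NAME}.key"
fi
```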

Add statsd recipes back to the logstash layer

This will likely involve fixing the upstream cookbook so it doesn't signal a restart when installing for the first time, as well as checking the path for node before setting it in the attributes file.

create_stack script should accept config file

I was going to raise an issue to accept a default instance size for all the layers, which is useful for testing the stack.

The driving factor is that passing all the --param options to customize can lead to mistakes.

However, I think a cleaner solution is to read from a config file.
