opsviz's Issues
CloudFormation setup fails at EC2 instance creation
The CloudFormation setup fails at the EC2 instance creation step. Looking at OpsWorks, the instances are still running the setup phase of the Chef run at that point, and they eventually get set up fine, with no errors.
There should probably be a wait condition added to the CloudFormation JSON that waits for the EC2 instances to finish being set up before the CloudFormation run is marked as failed.
WaitCondition: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-waitcondition.html
CreationPolicy Attribute: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-creationpolicy.html
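A minimal sketch of the WaitCondition approach (resource names, count, and timeout are illustrative; a late recipe or cfn-signal on the instance would POST success to the presigned handle URL once setup finishes):

```json
{
  "SetupWaitHandle": {
    "Type": "AWS::CloudFormation::WaitConditionHandle"
  },
  "SetupWaitCondition": {
    "Type": "AWS::CloudFormation::WaitCondition",
    "DependsOn": "MonitorInstance1",
    "Properties": {
      "Handle": { "Ref": "SetupWaitHandle" },
      "Count": "1",
      "Timeout": "3600"
    }
  }
}
```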
Parameterize instance sizes and number of nodes
For testing purposes we don't really need c3.large instances, and might want to customize the number of nodes for elasticsearch in particular.
This is a feature request for customizing instance sizes (per role) and number of nodes for elasticsearch for now.
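As a sketch, the template's Parameters block could gain entries like these (parameter names, defaults, and allowed values are hypothetical):

```json
{
  "Parameters": {
    "ElasticsearchInstanceType": {
      "Type": "String",
      "Default": "c3.large",
      "AllowedValues": ["t2.small", "t2.medium", "c3.large", "c3.xlarge"],
      "Description": "Instance type for the elasticsearch layer"
    },
    "ElasticsearchNodeCount": {
      "Type": "Number",
      "Default": "3",
      "MinValue": "1",
      "Description": "Number of elasticsearch nodes"
    }
  }
}
```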
Add a client installer by piping a script through bash
Add Google Apps authentication
Doorman supports Google Apps-based authentication. Let's make this a configuration option in the CloudFormation script.
Here are the Google Apps config options: https://github.com/movableink/doorman/blob/master/conf.example.js#L65
Make the StatsD flush time configurable
This has implications for Grafana
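For reference, the flush interval lives in statsd's config file (a JS object literal); values here are illustrative. The Grafana implication: if flushInterval doesn't line up with graphite's finest retention bucket, graphs show gaps or nulls between flushes.

```js
{
  graphiteHost: "graphite1",
  graphitePort: 2003,
  flushInterval: 60000  // ms; keep in step with graphite's storage schema
}
```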
Update logstash cookbook to support conditionals
The current version of the logstash cookbook supports conditionals for filters: lusis/chef-logstash#175. We should update the version in our Berksfile.lock.
foxycoder chef-logstash dependency doesn't work on CentOS because of upstart
The current logstash cookbook that we rely on in bb_external is https://github.com/foxycoder/chef-logstash
The readme claims that it should work on other platforms, but it has only been tested on Ubuntu/Debian systems.
The problem is that the startup script relies on 'upstart', which as far as I know is Ubuntu-only (without perhaps depending on the upstart cookbook: http://upstart.ubuntu.com/cookbook/#id416).
I'm not sure we should be requiring upstart jobs, but if we don't find a different provider, we should at least fork the foxycoder repo with fixes to allow an init.d script. Or systemd, as RHEL 7 and Ubuntu seem to be going that route: http://www.markshuttleworth.com/archives/1316
Support mysql slow query log dashboards
Add the required logstash inputs for parsing the mysql slow query log, and create a Kibana dashboard to visualize it.
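A hedged sketch of what the logstash side might look like (log path, type name, and grok patterns are illustrative and untested against real slow-log output):

```
input {
  file {
    path => "/var/log/mysql/mysql-slow.log"
    type => "mysql-slow"
    # slow-log entries span multiple lines; fold them into one event
    codec => multiline {
      pattern => "^# User@Host:"
      negate => true
      what => "previous"
    }
  }
}
filter {
  if [type] == "mysql-slow" {
    grok {
      match => [ "message", "Query_time: %{NUMBER:query_time:float} +Lock_time: %{NUMBER:lock_time:float} +Rows_sent: %{NUMBER:rows_sent:int} +Rows_examined: %{NUMBER:rows_examined:int}" ]
    }
  }
}
```

With query_time and rows_examined as typed fields, a Kibana dashboard can chart them directly.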
Restarting rabbitmq1 node results in failed setup on 'set_policy ha-all'
I had stopped all instances in my opswork stack and upon restarting the rabbitmq1 instance, received the following error:
[2014-12-12T14:36:27+00:00] INFO: Enabling RabbitMQ plugin 'rabbitmq_management'.
[2014-12-12T14:36:27+00:00] INFO: rabbitmq_plugin[rabbitmq_management] not queuing delayed action restart on service[rabbitmq-server] (delayed), as it's already been queued
[2014-12-12T14:36:27+00:00] INFO: Processing execute[rabbitmq-plugins enable rabbitmq_management] action run (/var/lib/aws/opsworks/cache.stage2/cookbooks/rabbitmq/providers/plugin.rb line 39)
[2014-12-12T14:36:28+00:00] INFO: execute[rabbitmq-plugins enable rabbitmq_management] ran successfully
[2014-12-12T14:36:28+00:00] INFO: Processing rabbitmq_plugin[rabbitmq_management_visualiser] action enable (rabbitmq::mgmt_console line 27)
[2014-12-12T14:36:28+00:00] INFO: Enabling RabbitMQ plugin 'rabbitmq_management_visualiser'.
[2014-12-12T14:36:28+00:00] INFO: rabbitmq_plugin[rabbitmq_management_visualiser] not queuing delayed action restart on service[rabbitmq-server] (delayed), as it's already been queued
[2014-12-12T14:36:28+00:00] INFO: Processing execute[rabbitmq-plugins enable rabbitmq_management_visualiser] action run (/var/lib/aws/opsworks/cache.stage2/cookbooks/rabbitmq/providers/plugin.rb line 39)
[2014-12-12T14:36:29+00:00] INFO: execute[rabbitmq-plugins enable rabbitmq_management_visualiser] ran successfully
[2014-12-12T14:36:29+00:00] INFO: Processing execute[chown -R rabbitmq:rabbitmq /var/lib/rabbitmq] action run (rabbitmq_cluster::default line 10)
[2014-12-12T14:36:29+00:00] INFO: execute[chown -R rabbitmq:rabbitmq /var/lib/rabbitmq] ran successfully
[2014-12-12T14:36:29+00:00] INFO: Processing rabbitmq_user[guest] action delete (rabbitmq_cluster::default line 12)
[2014-12-12T14:36:29+00:00] INFO: Processing rabbitmq_policy[ha-all] action set (rabbitmq_cluster::default line 16)
[2014-12-12T14:36:29+00:00] INFO: Done setting RabbitMQ policy 'ha-all'.
[2014-12-12T14:36:29+00:00] INFO: Processing execute[set_policy ha-all] action run (/var/lib/aws/opsworks/cache.stage2/cookbooks/rabbitmq/providers/policy.rb line 66)
================================================================================
Error executing action `run` on resource 'execute[set_policy ha-all]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '2'
---- Begin output of rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --priority 1 ----
STDOUT: Setting policy "ha-all" for pattern "^(?!amq\\.).*" to "{\"ha-mode\":\"all\",\"ha-sync-mode\":\"automatic\"}" with priority "1" ...
STDERR: Error: unable to connect to node rabbit@rabbitmq1: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit@rabbitmq1]
rabbit@rabbitmq1:
* connected to epmd (port 4369) on rabbitmq1
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq1
* suggestion: start the node
current node details:
- node name: rabbitmqctl11602@rabbitmq1
- home dir: /var/lib/rabbitmq
- cookie hash: FUWzw5ayMo2aD4GJFavYFA==
---- End output of rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --priority 1 ----
Ran rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --priority 1 returned 2
Resource Declaration:
---------------------
# In /var/lib/aws/opsworks/cache.stage2/cookbooks/rabbitmq/providers/policy.rb
66: execute "set_policy #{new_resource.policy}" do
67: command cmd
68: end
69:
Compiled Resource:
------------------
# Declared in /var/lib/aws/opsworks/cache.stage2/cookbooks/rabbitmq/providers/policy.rb:66:in `block in class_from_file'
execute("set_policy ha-all") do
action "run"
retries 0
retry_delay 2
command "rabbitmqctl set_policy ha-all \"^(?!amq\\.).*\" '{\"ha-mode\":\"all\",\"ha-sync-mode\":\"automatic\"}' --priority 1"
backup 5
returns 0
cookbook_name "rabbitmq_cluster"
end
[2014-12-12T14:36:29+00:00] INFO: Running queued delayed notifications before re-raising exception
[2014-12-12T14:36:29+00:00] INFO: template[/etc/rabbitmq/rabbitmq-env.conf] sending restart action to service[rabbitmq-server] (delayed)
[2014-12-12T14:36:29+00:00] INFO: Processing service[rabbitmq-server] action restart (rabbitmq::default line 79)
[2014-12-12T14:36:33+00:00] INFO: service[rabbitmq-server] restarted
[2014-12-12T14:36:33+00:00] ERROR: Running exception handlers
[2014-12-12T14:36:33+00:00] ERROR: Exception handlers complete
[2014-12-12T14:36:33+00:00] FATAL: Stacktrace dumped to /var/lib/aws/opsworks/cache.stage2/chef-stacktrace.out
[2014-12-12T14:36:33+00:00] ERROR: execute[set_policy ha-all] (/var/lib/aws/opsworks/cache.stage2/cookbooks/rabbitmq/providers/policy.rb line 66) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '2'
---- Begin output of rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --priority 1 ----
STDOUT: Setting policy "ha-all" for pattern "^(?!amq\\.).*" to "{\"ha-mode\":\"all\",\"ha-sync-mode\":\"automatic\"}" with priority "1" ...
STDERR: Error: unable to connect to node rabbit@rabbitmq1: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit@rabbitmq1]
rabbit@rabbitmq1:
* connected to epmd (port 4369) on rabbitmq1
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq1
* suggestion: start the node
current node details:
- node name: rabbitmqctl11602@rabbitmq1
- home dir: /var/lib/rabbitmq
- cookie hash: FUWzw5ayMo2aD4GJFavYFA==
---- End output of rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --priority 1 ----
Ran rabbitmqctl set_policy ha-all "^(?!amq\.).*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --priority 1 returned 2
[2014-12-12T14:36:33+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
I am assuming it's a timing issue, but not 100% sure.
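If it is a timing issue, one fix is to block until the local rabbit node answers before running set_policy. In the cookbook this would more likely be `retries` on the execute resource or a ruby_block guard; the polling logic itself is sketched below (the `rabbitmqctl` invocation in the comment is an example, not the cookbook's actual guard):

```python
import subprocess
import time

def wait_for(cmd, timeout=300, interval=5):
    """Poll cmd until it exits 0, or give up after timeout seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if subprocess.call(cmd) == 0:
            return True
        time.sleep(interval)
    return False

# e.g. block until the local rabbit node is up before setting the policy:
# wait_for(["rabbitmqctl", "-q", "status"])
```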
VPC creation should be optional
We should add a param for a VPC id and only create the VPC if that param is not set.
Logstash exchange is not created on stack setup.
The logstash rabbitmq input plugin is not currently able to create an exchange if it doesn't exist. Producers (logstash output plugins) will create exchanges if they don't exist.
This means that when the opsviz stack is spun up, the rabbitmq1 and logstash1 instances generate many error log messages, and connections aren't established properly until the first rabbitmq logstash producer is created.
As a quick fix, the opsviz recipes should support creating the logstash exchange on the default vhost '/' during the logstash install recipe.
Here's an excerpt from the chat discussion:
[2:31 PM] Derek Downey: hmm, it doesn't do what I expected then :( They don't have an 'exchange_type' option on the rabbitmq input http://logstash.net/docs/1.4.2/inputs/rabbitmq
my thought was that 'type' was the same thing, and that it only would create the exchange if a type was also specified
[2:34 PM] Taylor Ludwig: oh gotcha, since the output has an "exchange_type", you were expecting there to be an "exchange_type" option on the input too
[2:35 PM] Derek Downey: yes
[2:35 PM] Taylor Ludwig: that's only relevant on creating the exchange, right? So maybe since the output is the only one that is actually creating the exchange if it doesn't exist, that's why it's absent from the input
[2:36 PM] Alex Lovell-Troy: depends on how the code is structured, but that would make sense
[2:36 PM] Taylor Ludwig: But yeah, the "types" option is logstash's message type, so its a setting outside of rabbitmq stuff
[2:36 PM] Derek Downey: I'm still not used to the pattern that only the output (producer) is creating exchanges/queues. To me this sounds like an easy exploit.
plus the stack starts up with a bunch of errors in logstash/rabbitmq until there's at least one producer :)
[2:39 PM] Taylor Ludwig: yeah that alone seems like a good reason to create it.
snip
[2:46 PM] Alex Lovell-Troy: actually...
this should be in logstash
[2:46 PM] Derek Downey: logstash input?
[2:46 PM] Alex Lovell-Troy: yeah
[2:46 PM] Taylor Ludwig: yeah that seems more logical,
[2:46 PM] Derek Downey: that's where I would do it
snip
[2:51 PM] Taylor Ludwig: oh i thought you were talking about the logstash install recipe. I thought we determined only the output creates the exchange not the input, sorry im getting lost going through 3 chats right now
[2:54 PM] Derek Downey: we determined that is how it currently works. I think something needs to create the exchange without requiring producers to avoid the errors, but the logstash input doesn't support it (this still is crazy to me!) We had discussed using rabbitmq management plugin to create the exchange as an alternative
that's my understanding of the discussion, anyway
[2:56 PM] Taylor Ludwig: yeah so my thinking is just use the rest api to create the exchange and do it either on the logstash server install recipe or the rabbitmq install recipe
[3:04 PM] Derek Downey: if doing that, I'd have a preference towards the logstash server install recipe
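One way to implement the quick fix agreed on above: declare the exchange via the RabbitMQ management REST API from the logstash server install recipe. A Python 3 sketch (host, credentials, and the "topic" exchange type are assumptions; the type must match whatever the logstash output would create):

```python
import base64
import json
import urllib.parse
import urllib.request

def declare_exchange_request(host, user, password, exchange,
                             vhost="/", exchange_type="topic"):
    """Build a PUT request for the RabbitMQ management API
    (PUT /api/exchanges/<vhost>/<name>), which creates the exchange
    if it does not already exist and is a no-op if it does."""
    vhost_enc = urllib.parse.quote(vhost, safe="")  # "/" -> "%2F"
    url = "http://%s:15672/api/exchanges/%s/%s" % (host, vhost_enc, exchange)
    body = json.dumps({"type": exchange_type, "durable": True}).encode()
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("Content-Type", "application/json")
    auth = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + auth)
    return req

# urllib.request.urlopen(
#     declare_exchange_request("rabbitmq1", "guest", "guest", "logstash"))
```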
Add the Galera-specific Nagios checks
Add RabbitMQ management plugin to dashboard
RabbitMQ has a dashboard https://www.rabbitmq.com/management.html
It runs on the RabbitMQ server over port 15672. The RabbitMQ ELB has a listener for port 15672, but the external security group for that ELB only uses port 5671. https://github.com/pythian/opsviz/blob/master/cloudformation.json#L1182
We should make sure this dashboard plugin is installed and available on the dashboard. It probably makes sense to use the Dashboard ELB and Nginx to forward /rabbitmq to the RabbitMQ ELB on port 15672. This puts the RMQ dashboard behind Doorman's authentication.
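A hedged sketch of the nginx side of that forwarding (the ELB hostname is a placeholder, and the location name is illustrative):

```nginx
# forward /rabbitmq to the RabbitMQ ELB's management listener so the
# RMQ dashboard sits behind Doorman's authentication
location /rabbitmq/ {
    proxy_pass http://<rabbitmq-elb-dns>:15672/;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```

The internal security group for the RabbitMQ ELB would also need to allow 15672 from the dashboard layer.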
Add the MongoDB Nagios checks to Sensu
Cert creation in create_stack should be optional
Cert creation should be optional if a cert is passed in as a --param.
t2 instance sizes not supported
This is a known issue, but the Bastion instance type parameter indicates it should support t2.small and t2.medium.
These instance types do not work with the default AMI. We should support the smaller instances for testing purposes.
Make sure any created instance/service role has permissions to do the ec2 clustering for elasticsearch
Add flapjack support
Between sensu and pagerduty, flapjack provides a customizable way to roll up, group, and escalate alerts which makes both pieces better. Plus, we use it at Pythian.
Horizontally scalable layers should have load-based instances
Creation of cert in 'create_stack' fails if stack of same name was previously created
If you use create_stack, and it fails at any point after uploading the cert, then a re-run of the script will result in the following error:
writing RSA key
Traceback (most recent call last):
File "create_stack", line 166, in <module>
main()
File "create_stack", line 163, in main
stack_creator.create_stack()
File "create_stack", line 128, in create_stack
self.prepare_cert()
File "create_stack", line 72, in prepare_cert
self.cert_arn = self.upload_cert()
File "create_stack", line 63, in upload_cert
private_key=self.ssl_key)
File "/home/vagrant/.virtualenvs/opvis/lib/python2.7/site-packages/boto/iam/connection.py", line 799, in upload_server_cert
verb='POST')
File "/home/vagrant/.virtualenvs/opvis/lib/python2.7/site-packages/boto/iam/connection.py", line 102, in get_response
raise self.ResponseError(response.status, response.reason, body)
boto.exception.BotoServerError: BotoServerError: 409 Conflict
<ErrorResponse xmlns="https://iam.amazonaws.com/doc/2010-05-08/">
<Error>
<Type>Sender</Type>
<Code>EntityAlreadyExists</Code>
<Message>The Server Certificate with name opsvistest1_cert already exists.</Message>
</Error>
<RequestId>e9c7caab-b2bf-11e4-9420-5bb7c3142238</RequestId>
</ErrorResponse>
I would expect to handle this by either removing the previous one and re-attempting, or ignoring and letting it use the existing one.
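A sketch of the "ignore and reuse" variant of upload_cert (function names mirror the traceback but are illustrative; boto's BotoServerError exposes an `error_code` attribute this relies on):

```python
def upload_cert_idempotent(iam, name, cert_body, private_key):
    """Upload a server certificate via a boto-style IAM connection,
    falling back to the existing certificate on EntityAlreadyExists
    instead of letting the 409 abort the stack creation."""
    try:
        return iam.upload_server_cert(name, cert_body,
                                      private_key=private_key)
    except Exception as e:
        # boto.exception.BotoServerError carries the IAM error code
        if getattr(e, "error_code", None) != "EntityAlreadyExists":
            raise
        return iam.get_server_certificate(name)
```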
Fix NGINX proxy for the sensu healthcheck
Uchiwa makes an API call to this address:
(dashboard_url)/sensu/health/sensu
This is supposed to return an object similar to this:
{"Sensu":{"output":"ok"}}
However, nginx is redirecting it to the events page. This causes the datacenter to show up as undefined in the alerts, due to the way the JavaScript parses the response.
Agile health checks
I have a different approach when it comes to creating sensu checks. It's not really an issue, just a different way of doing things, so I thought I'd mention it here.
I usually install a graphite client on all the servers I am monitoring (when possible, and I prefer Diamond).
I then create the sensu checks to verify data against graphite metrics. You could say that I am hammering graphite with a bunch of queries, and I am aware of this, but so far it hasn't been an issue in environments with around 150 boxes.
But this is why I say it is more agile: I can leverage graphite math functions to try and flatten anomalies and try to find only the relevant signal in the noise.
I then use the check-data.rb script for sensu-community-plugins this way:
/etc/sensu/plugins/check-data.rb -a 120 -s ${graphite_host} -t 'minSeries($graphite_prefix.$(hostname -s).diskspace.*.byte_percentfree)' -w :::params.graphite.diskspace.bytes.free.warning|20::: -c :::params.graphite.diskspace.bytes.free.critical|10:::
minSeries() might not be the best option here, but this is just an example.
Doorman authentication with optional parameters
Currently, there are two conditionals on the doorman config file that create modules: app_id and password.
If these are left empty in the parameters, the config blocks still get generated; in the best case Doorman fails to start, and in the worst case you get security holes from empty passwords.
I'll submit a patch later if it hasn't been done.
Resolution of ELBs can be cached by nginx
Today I found Kibana broken on an opsviz stack. This appeared to be caused by the IP of the ELB behind the proxy_pass changing while nginx kept using the old IP. It looks like we can change the config to force DNS resolution every time rather than caching the resolution:
http://serverfault.com/a/593003/105633
Another option may be to give the ELB an EIP.
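The linked answer boils down to declaring a resolver and putting the upstream name in a variable, which makes nginx re-resolve the DNS name at request time instead of caching the IP from startup. A sketch (the resolver address and ELB hostname are placeholders):

```nginx
# VPC DNS resolver; re-resolve the name at most every 30s
resolver 10.0.0.2 valid=30s;
set $kibana_upstream "<kibana-elb-dns>";

location /kibana {
    # using a variable in proxy_pass forces runtime DNS resolution
    proxy_pass http://$kibana_upstream;
}
```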
Add flapjack extension and flapjack.json to sensu
Doorman config changes don't restart service
Changing doorman configuration doesn't restart the doorman service
sensu cluster
I think it would be interesting to have at least two sensu-server nodes, for several reasons: this server is the core of notifications, and a second node would also allow for easier maintenance of this part of the stack. Sensu works well in a cluster; it's able to spread health check events across multiple nodes automatically, spreading out load in large installations.
Graphite and rabbitmq are already clustered, and I also saw some efforts toward switching from redis to elasticache (I really like using "redishappy" to manage redis clusters in non-AWS environments).
I think the sensu-server deserves its own cluster too!
Multiple AZ
"Highly Available within one availability zone". For now I have just gone through some of the documentation, but it looks like support for multiple AZ would make sense, specifically when services are clustered.
Create ability/method to override sensu check thresholds
It would be great if we could override the sensu check thresholds set up by the custom_json on a per-client basis. I found this as an example: https://gist.github.com/piavlo/5774621
Is this the best way to do it?
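If that approach holds, per-client thresholds can live as client attributes and be picked up by the `:::params...|default:::` command tokens the checks already use (as in the check-data.rb example in the "Agile health checks" issue). A hypothetical client definition; the attribute path must match the token in the check command, and the `|20` / `|10` defaults apply when a client omits it:

```json
{
  "client": {
    "name": "db1",
    "address": "10.0.1.15",
    "subscriptions": ["mysql"],
    "params": {
      "graphite": {
        "diskspace": {
          "bytes": {
            "free": { "warning": 30, "critical": 15 }
          }
        }
      }
    }
  }
}
```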
Update recipes to be more FHS compliant
Currently we have a big mix of file locations in the opsviz stack. Troubleshooting is much easier when logfiles and config files are easy to find. Logs should go to /var/log and config files should live somewhere under /etc/. One example is elasticsearch, which logs to /usr/local/var/log.
Review sensu plugin es-node-metrics and modify it to handle missing fielddata_breaker info
After upgrading elasticsearch to 1.4.4 the sensu check can no longer find fielddata_breaker info in the node report which causes the whole plugin to fail. The likely fix is to add exception handling around the checks that use fielddata_breaker at line 72. Should we also log or fail silently? Should we attempt to get this metric somewhere else?
Extract bb_external into opviz-client repo
We want to move the client installation into a separate repository and have migrated the code into pythian/opviz-client.
However, so as not to break installs using master/HEAD, we have not deleted it from pythian/opsviz.
Until that happens, changes to the bb_external cookbook will need to be maintained in two places.
Add a longer ping timeout to ec2 elasticsearch discovery and limit it to our internal security group
The elasticsearch cloud-aws plugin supports a few options that would make clustering more reliable. We should turn them on by default.
https://github.com/elastic/elasticsearch-cloud-aws#ec2-discovery
I think limiting clustering to our internal security group should be sufficient, but we might want to think about increasing the ping timeout as well.
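A sketch of the relevant elasticsearch.yml settings (the security group name, timeout, and region are illustrative):

```yaml
# use the cloud-aws plugin's EC2 unicast discovery
discovery.type: ec2
# only consider instances in our internal security group as cluster peers
discovery.ec2.groups: opsviz-internal-sg
# give slower-starting nodes more time to answer discovery pings
discovery.ec2.ping_timeout: 30s
cloud.aws.region: us-east-1
```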
Nginx is appending a slash to many paths
Nginx is appending a slash to the end of some of the URLs, and it shouldn't be. I think this is causing issues with a few things.
/elasticsearch/_nodes
is actually this when it hits the ES server: [elasticsearch]:9200//_nodes
(notice the double slash).
Marvel needs to be accessed like this: [doorman_elb]/elasticsearch_plugin/marvel
and may redirect you on the first run.
I can see there is a trailing slash here: https://github.com/pythian/opsviz/blob/master/site-cookbooks/bb_monitor/templates/default/nginx/dashboard.erb#L22. Removing that trailing slash should be the proper way to set this up. I also notice that many of the other routes are doing the same thing. If there isn't a specific reason those are set up like that, I'll go ahead and fix this.
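For reference, this is nginx's prefix-replacement rule: when proxy_pass carries a URI part (even just "/"), nginx replaces the portion of the request path that matched the location prefix with that URI, which is how /elasticsearch/_nodes becomes //_nodes. A sketch (hostnames are placeholders, and the exact fix depends on whether the upstream expects the /elasticsearch prefix):

```nginx
# broken combination: /elasticsearch/_nodes -> //_nodes, because the
# matched prefix "/elasticsearch" is replaced by the trailing "/"
location /elasticsearch {
    proxy_pass http://<es-host>:9200/;
}

# one combination that strips the prefix cleanly:
# /elasticsearch/_nodes -> /_nodes
location /elasticsearch/ {
    proxy_pass http://<es-host>:9200/;
}
```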
Incorporate the Percona Nagios checks into Sensu
erlang cookie needs to be unique per installation
We're hardcoding an erlang cookie here that could be used to circumvent RMQ security. It should be generated by CloudFormation when processing the template.
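CloudFormation itself has no built-in random generator, so the value would likely be produced by create_stack and injected as a template parameter or via custom JSON. A sketch of the generation step (the injection mechanism is left open):

```python
import base64
import os

def generate_erlang_cookie(nbytes=20):
    """Generate a random per-installation erlang cookie. Erlang
    cookies are plain alphanumeric strings, so base32-encode random
    bytes to stay within a safe character set."""
    return base64.b32encode(os.urandom(nbytes)).decode().rstrip("=")
```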
Restart of stack resulted in dashboard1 failing setup execution
This might be tied to issue #7, but the dashboard1 instance failed restart as well, due to an inability to start the sensu-api service. Re-running the setup step resulted in dashboard1 completing setup successfully, but I'm logging the issue to see if there are any recipe changes that could prevent this from happening, or if it's just unfortunate luck!
[2014-12-12T14:39:47+00:00] INFO: Processing sensu_service[sensu-api] action start (sensu::api_service line 20)
[2014-12-12T14:39:47+00:00] INFO: Processing service[sensu-api] action start (/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb line 46)
[2014-12-12T14:39:49+00:00] INFO: Retrying execution of service[sensu-api], 2 attempt(s) left
[2014-12-12T14:39:56+00:00] INFO: Retrying execution of service[sensu-api], 1 attempt(s) left
[2014-12-12T14:40:02+00:00] INFO: Retrying execution of service[sensu-api], 0 attempt(s) left
================================================================================
Error executing action `start` on resource 'service[sensu-api]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of /etc/init.d/sensu-api start ----
STDOUT: * Starting sensu-api
...fail!
STDERR:
---- End output of /etc/init.d/sensu-api start ----
Ran /etc/init.d/sensu-api start returned 1
Cookbook Trace:
---------------
/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb:127:in `block in class_from_file'
Resource Declaration:
---------------------
# In /var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb
46: service new_resource.service do
47: provider service_provider
48: supports :status => true, :restart => true
49: retries 3
50: retry_delay 5
51: action :nothing
52: subscribes :restart, resources("ruby_block[sensu_service_trigger]"), :delayed
53: end
54: when "runit"
Compiled Resource:
------------------
# Declared in /var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb:46:in `load_current_resource'
service("sensu-api") do
provider Chef::Provider::Service::Debian
action [:nothing]
updated true
supports {:status=>true, :restart=>true}
retries 0
retry_delay 5
service_name "sensu-api"
enabled true
pattern "sensu-api"
cookbook_name "sensu"
end
================================================================================
Error executing action `start` on resource 'sensu_service[sensu-api]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
service[sensu-api] (/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb line 46) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /etc/init.d/sensu-api start ----
STDOUT: * Starting sensu-api
...fail!
STDERR:
---- End output of /etc/init.d/sensu-api start ----
Ran /etc/init.d/sensu-api start returned 1
Cookbook Trace:
---------------
/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb:127:in `block in class_from_file'
Resource Declaration:
---------------------
# In /var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/recipes/api_service.rb
20: sensu_service "sensu-api" do
21: init_style node.sensu.init_style
22: action [:enable, :start]
23: end
Compiled Resource:
------------------
# Declared in /var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/recipes/api_service.rb:20:in `from_file'
sensu_service("sensu-api") do
action [:enable, :start]
updated true
retries 0
retry_delay 2
cookbook_name "sensu"
recipe_name "api_service"
init_style "sysv"
service "sensu-api"
end
[2014-12-12T14:40:09+00:00] INFO: Running queued delayed notifications before re-raising exception
[2014-12-12T14:40:09+00:00] INFO: package[sensu] sending create action to ruby_block[sensu_service_trigger] (delayed)
[2014-12-12T14:40:09+00:00] INFO: Processing ruby_block[sensu_service_trigger] action create (sensu::default line 20)
[2014-12-12T14:40:09+00:00] INFO: ruby_block[sensu_service_trigger] called
[2014-12-12T14:40:09+00:00] INFO: cookbook_file[/etc/sensu/extensions/graphite.rb] sending restart action to sensu_service[sensu-server] (delayed)
[2014-12-12T14:40:09+00:00] INFO: Processing sensu_service[sensu-server] action restart (sensu::server_service line 20)
[2014-12-12T14:40:09+00:00] INFO: Processing service[sensu-server] action restart (/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb line 46)
[2014-12-12T14:40:16+00:00] INFO: service[sensu-server] restarted
[2014-12-12T14:40:16+00:00] INFO: ruby_block[sensu_service_trigger] sending restart action to service[sensu-server] (delayed)
[2014-12-12T14:40:16+00:00] INFO: Processing service[sensu-server] action restart (/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb line 46)
[2014-12-12T14:40:22+00:00] INFO: service[sensu-server] restarted
[2014-12-12T14:40:22+00:00] INFO: ruby_block[sensu_service_trigger] sending restart action to service[sensu-api] (delayed)
[2014-12-12T14:40:22+00:00] INFO: Processing service[sensu-api] action restart (/var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb line 46)
================================================================================
Error executing action `restart` on resource 'service[sensu-api]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of /etc/init.d/sensu-api restart ----
STDOUT: * Stopping sensu-api
...done.
* Starting sensu-api
...fail!
STDERR: /sbin/start-stop-daemon: warning: failed to kill 4038: No such process
---- End output of /etc/init.d/sensu-api restart ----
Ran /etc/init.d/sensu-api restart returned 1
Resource Declaration:
---------------------
# In /var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb
46: service new_resource.service do
47: provider service_provider
48: supports :status => true, :restart => true
49: retries 3
50: retry_delay 5
51: action :nothing
52: subscribes :restart, resources("ruby_block[sensu_service_trigger]"), :delayed
53: end
54: when "runit"
Compiled Resource:
------------------
# Declared in /var/lib/aws/opsworks/cache.stage2/cookbooks/sensu/providers/service.rb:46:in `load_current_resource'
service("sensu-api") do
provider Chef::Provider::Service::Debian
action [:nothing]
updated true
supports {:status=>true, :restart=>true}
retries 0
retry_delay 5
service_name "sensu-api"
enabled true
pattern "sensu-api"
cookbook_name "sensu"
end
[2014-12-12T14:40:24+00:00] ERROR: Running exception handlers
[2014-12-12T14:40:24+00:00] ERROR: Exception handlers complete
[2014-12-12T14:40:24+00:00] FATAL: Stacktrace dumped to /var/lib/aws/opsworks/cache.stage2/chef-stacktrace.out
[2014-12-12T14:40:24+00:00] ERROR: Chef::Exceptions::MultipleFailures
[2014-12-12T14:40:24+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
Allow Sensu to use elasticache instead of local redis
Create script to perform current manual work
Currently, there is some manual work that needs to be done such as generate secrets and RabbitMQ SSL cert. A wrapper script which does this generation and then calls createStack would be nice to have.
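A sketch of the cert-generation half of such a wrapper (file names and the certificate subject are illustrative; the wrapper would generate this plus any secrets, then invoke create_stack with the resulting paths and values as --param arguments):

```python
import subprocess

def generate_rabbitmq_cert(basename, cn="rabbitmq.internal"):
    """Generate a self-signed cert/key pair for RabbitMQ with openssl.
    Returns the (key, cert) file paths."""
    key_path = basename + ".key"
    crt_path = basename + ".crt"
    subprocess.check_call([
        "openssl", "req", "-x509", "-newkey", "rsa:2048", "-nodes",
        "-keyout", key_path, "-out", crt_path,
        "-days", "365", "-subj", "/CN=" + cn,
    ])
    return key_path, crt_path
```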
Add statsd recipes back to the logstash layer
This will likely involve fixing the upstream cookbook to not signal a restart when installing for the first time, as well as checking the path for node before setting it in the attributes file.
create_stack script should accept config file
I was going to raise an issue to accept a default instance size for all the layers, which is useful for testing the stack.
The driving factor is that passing all the --param options to customize can lead to mistakes.
However, I think a cleaner solution is to read from a config file.
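A sketch of the config-file approach (the file format and key names are hypothetical; explicit --param values on the command line still win):

```python
import json

def load_params(config_path, cli_params):
    """Read stack parameter defaults from a JSON config file, then
    overlay any --param values supplied on the command line."""
    with open(config_path) as f:
        params = json.load(f)
    params.update(cli_params)
    return params
```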