datadog / puppet-datadog-agent
Puppet module to install the Datadog agent
License: Other
Some older versions of Amazon Linux return $operatingsystem="Linux".
case: 18699
You can't use unordered hashes/arrays in templates; otherwise you get unneeded resource applies when the keys are evaluated in a different order.
Currently there is no .pp file for rabbitmq.
Asked by customer here: https://datadog.desk.com/agent/case/20165
When reporting is enabled, events are created but metrics seem to be missing. We should check where we should be pulling the expected metrics from, something may have changed in the newer versions of puppet.
If using Puppet open source (POSS) with puppetserver, or Puppet Enterprise, the 'dogapi' gem must be installed into the puppetserver (JRuby) context:
# /opt/puppetlabs/bin/puppetserver gem install dogapi
Fetching: multi_json-1.11.2.gem (100%)
Successfully installed multi_json-1.11.2
Fetching: dogapi-1.21.0.gem (100%)
Successfully installed dogapi-1.21.0
2 gems installed
# /etc/init.d/pe-puppetserver restart
Each time Puppet runs this module on CentOS, it re-enables the service, which by itself triggers a false "changed" Puppet report on the node:
Debug: Executing: '/bin/systemctl is-active datadog-agent'
Debug: Executing: '/bin/systemctl show datadog-agent --property LoadState --property UnitFileState --no-pager'
Debug: Executing: '/bin/systemctl unmask datadog-agent'
Debug: Executing: '/bin/systemctl enable datadog-agent'
Notice: /Stage[main]/Datadog_agent::Redhat/Service[datadog-agent]/enable: enable changed 'false' to 'true'
I find that setting provider => 'redhat' in redhat.pp solves this problem.
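A minimal sketch of that workaround (the surrounding attributes are assumed; only provider => 'redhat' is the actual change):

```puppet
# In manifests/redhat.pp; a sketch, not the module's actual resource.
# Forcing the 'redhat' provider keeps Puppet from running
# 'systemctl enable' on every agent run, which is what produced the
# spurious "enable changed 'false' to 'true'" report.
service { 'datadog-agent':
  ensure   => running,
  enable   => true,
  provider => 'redhat',
}
```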
On some hosts, I run multiple redis server instances. Can the agent support that? And, can the puppet template support it?
metadata.json indicates that any 4.x version of stdlib is sufficient, but the use of validate_integer means it's not.
We have 4.5.1 installed currently which doesn't have the validate_integer function.
Please correct your metadata.json file to reflect the minimum version of stdlib that includes validate_integer. It looks like v4.6 is what you want.
At the moment a lot of the options within datadog.conf.erb aren't set as parameters. This leads to some headaches when trying to set up things such as dogstatsd.
Ideally all options within datadog.conf.erb should be able to be defined by parameters provided to the datadog_agent class. This would enable them to be declared where required without having to use a modified version of the puppet-datadog-agent module or override the file in a later module.
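As a sketch of the direction (every parameter name below is hypothetical; none of them exists in the class today):

```puppet
class { 'datadog_agent':
  api_key            => 'YOUR_API_KEY',
  # Hypothetical parameters that would be rendered into datadog.conf.erb,
  # covering the dogstatsd headaches mentioned above:
  dogstatsd_port     => 8125,
  dogstatsd_interval => 10,
  non_local_traffic  => true,
}
```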
Additionally, it would be potentially worthwhile having all the parameters covered by the rspec-tests in order to validate the default values and the custom ones. It would just provide a bit more confidence when introducing new parameters.
This would probably need to happen in three parts.
I'm not sure if this DD feature previously only allowed a single instance, but certainly as of today that isn't the case. Unnecessary limitation.
Related to #1
Hosts may show up as duplicates in the infrastructure list, with Puppet metrics reporting only intermittently. Some cases have included a case-specific workaround, but these workarounds must be re-applied each time the puppet module is updated.
For Redis servers configured with a non-default slowlog-max-len, the collector issues a warning. Also, users who have configured their Redis instance with a higher length probably want access to that data. Error message from sudo service datadog-agent info:
[WARNING]:Redis slowlog-max-len is higher than 128. Defaulting to 128.If you need a higher value, please set slowlog-max-len in your check config
Puppet agent template doesn't have this as an option, nor does the manifest accept it as a setting.
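A sketch of what accepting the setting could look like (both the class path and the slowlog_max_len parameter are assumptions, not current module API):

```puppet
class { 'datadog_agent::integrations::redis':
  host            => 'localhost',
  port            => 6379,
  # Hypothetical parameter: would render slowlog-max-len into the
  # check's yaml and stop the collector from capping it at 128.
  slowlog_max_len => 1024,
}
```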
Resources defined in manifests/ubuntu.pp are not guaranteed to run in a consistent order.
I saw this when I tried to deploy the Datadog agent to a handful of Ubuntu boxes. The expected order in which resources execute is:
1. Exec['datadog_key']
2. File['/etc/apt/sources.list.d/datadog.list']
3. Exec['datadog_apt-get_update'] (via refresh)
4. Package['datadog-agent']
5. Service['datadog-agent']
... but looking at reports from failed puppet agents in Puppet Enterprise, I see that 1 and 2 ran, but 3 didn't and 4 failed. Unfortunately PE doesn't say in what order resources ran, and I'm not 100% positive it reports refresh events, but I think one of two things is going on.
Either way, once the system gets into this state, 2 is up-to-date, so 3 will never run, and the system cannot recover by itself. I had to go in and run "apt-get update" manually to get them unstuck.
I recommend you collapse 1 and 3 into a single exec resource, and define dependencies as 2 -> (1+3) -> 4 -> 5.
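A sketch of the collapsed resources, assuming the resource titles above (the combined command is illustrative; the actual key-import command ships in ubuntu.pp):

```puppet
# Collapse steps 1 and 3 so the key import and the cache refresh always
# happen together, letting a half-applied run recover on the next pass.
exec { 'datadog_key_and_update':
  command     => '/usr/bin/apt-key add /path/to/datadog.key && /usr/bin/apt-get update',
  refreshonly => true,
  subscribe   => File['/etc/apt/sources.list.d/datadog.list'],
}

# 2 -> (1+3) -> 4 -> 5, with the service restarting on package changes.
File['/etc/apt/sources.list.d/datadog.list']
  -> Exec['datadog_key_and_update']
  -> Package['datadog-agent']
  ~> Service['datadog-agent']
```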
For the sake of getting things up and running on my end, I've put up a separate module that does Datadog installs and integrations (SQL Server, IIS) for Windows.
https://github.com/vchan2002/datadog_agent_windows
The goal is to hopefully merge this back to the regular datadog-agent install....
https://forge.puppetlabs.com/puppetlabs/concat#overview. Might clean up some of the code we use to create conf.d files.
notify {'Unsupported OS'} is defined twice, which causes an error and prevents the notify from executing.
Puppet reporting to Datadog is not working:
puppet-master[27201]: Report datadog_reports failed: undefined method `title' for #<Puppet::Resource::Status:0x7f1c22d30820>
Puppet version: 2.6.2
We've received reports of customers trying it out and having issues, primarily around "Class Datadog_reports is already defined in Puppet::Reports", despite confirming correct configuration.
In general, we should also strive to pass puppet lint, and any other testing that makes sense.
Every second execution of Puppet removes the datadog-agent package and fails.
We use commit 3cffbcf plus #78 on top of it, but that pull request does not touch the affected area.
So, half of the time, we don't even have the datadog-agent package installed.
This part in the redhat.pp causes every second execution of puppet to fail:
package { 'datadog-agent-base':
ensure => absent,
before => Package['datadog-agent'],
}
We don't have datadog-agent-base, but for some reason this code deletes the datadog-agent package, including dd_user. Here is the relevant portion of the puppet log:
Info: Applying configuration version '1427837407'
Debug: Prefetching yum resources for package
Debug: Executing '/bin/rpm --version'
Debug: Executing '/bin/rpm -qa --nosignature --nodigest --qf '%{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n''
Debug: Executing '/bin/rpm -q datadog-agent-base --nosignature --nodigest --qf %{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n'
Debug: Executing '/bin/rpm -q datadog-agent-base --nosignature --nodigest --qf %{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n --whatprovides'
Debug: Executing '/bin/rpm -e datadog-agent-5.2.2-1.x86_64'
Notice: /Stage[main]/Datadog_agent::Redhat/Package[datadog-agent-base]/ensure: removed
Debug: /Stage[main]/Datadog_agent::Redhat/Package[datadog-agent-base]: The container Class[Datadog_agent::Redhat] will propagate my refresh event
After that, puppet fails at init.pp:
Error: Could not find user dd-agent
Error: /Stage[main]/Datadog_agent/File[/etc/dd-agent/datadog.conf]/owner: change from 498 to dd-agent failed: Could not find user dd-agent
With the next puppet execution, the datadog-agent package gets reinstalled and execution completes normally:
Debug: Prefetching inifile resources for yumrepo
Debug: Executing '/bin/rpm -q datadog-agent --nosignature --nodigest --qf %{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n'
Debug: Executing '/bin/rpm -q datadog-agent --nosignature --nodigest --qf %{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n --whatprovides'
Debug: Package[datadog-agent]: Ensuring => latest
Debug: Executing '/usr/bin/yum -d 0 -e 0 -y install datadog-agent'
Notice: /Stage[main]/Datadog_agent::Redhat/Package[datadog-agent]/ensure: created
Since I've upgraded to 3.0.2, I've been getting an error message from the datadog report processor. Here's the output from puppet with --debug --trace:
Notice: Finished catalog run in 16.60 seconds
Debug: Using settings: adding file resource 'rrddir': 'File[/var/lib/puppet/rrd]{:links=>:follow, :group=>"puppet", :backup=>false, :ensure=>:directory, :owner=>"puppet", :mode=>"750", :loglevel=>:debug, :path=>"/var/lib/puppet/rrd"}'
Debug: Finishing transaction 69891384984120
Debug: Received report to process from instance5.zicasso.com
Debug: Processing report from instance5.zicasso.com with processor Puppet::Reports::Store
Debug: Processing report from instance5.zicasso.com with processor Puppet::Reports::Datadog_reports
Debug: Sending metrics for instance5.zicasso.com to Datadog
Debug: Sending events for instance5.zicasso.com to Datadog
undefined method `[]' for :@aggregation_key:Symbol
Seems to stem from this commit; the problem is $name being reassigned.
Also, I really think the whole comment of "we do integrations as classes, not defines" doesn't make a lot of sense. In this case, if you wanted to use multiple directories (which the integration seems to support, given instances; I'll admit I have not used the integration myself, so I'm just guessing), you'd ... what?
Anywho, I'd like to fix this, but I'm not certain what the best approach is. What is $name trying to do there and why? And since it's currently non-functional, how much backward compatibility do I need to worry about?
Again, I haven't used this integration, I am working on writing tests for all of the integration classes so I can factor out a generic integration type, and ran into this issue. Please let me know how I can help :)
It's a pain to have to specify every key of the YAML file in a different variable.
We could just get a dictionary representing the instances and convert it to YAML.
class { 'datadog_agent::integrations::example' :
instances => [
{ 'host' => 'localhost', 'port' => 42, 'tags' => [] }
]
}
# in the template
<%= require 'yaml'; {'init_config'=>@init_config, 'instances'=>@instances}.to_yaml %>
Moreover, it would be easy to add new integrations; the Puppet user just has to stick to the format described by the agent's YAML file.
cc @alq666
That is potentially a non-backward-compatible change for people's manifests, though.
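For what it's worth, the rendering side is tiny. A quick Ruby sketch of what the proposed template body would emit, using the instances hash from the example above:

```ruby
require 'yaml'

# Mirror of the proposed ERB body: dump init_config and instances
# straight to YAML instead of enumerating every key by hand.
init_config = nil
instances   = [{ 'host' => 'localhost', 'port' => 42, 'tags' => [] }]

config = { 'init_config' => init_config, 'instances' => instances }
puts config.to_yaml
```

Any structure the user passes in round-trips through YAML unchanged, which is what makes new integrations cheap to support.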
By default apt-key will attempt to access a keyserver on 11371/tcp which can be problematic for environments with firewalls or network ACLs.
The Ubuntu keyserver also listens on port 80/tcp which is much more firewall friendly.
warn_on_missing_keys was added to the dd-agent integration July 2014. I'd like fewer warnings.
https://github.com/DataDog/dd-agent/blame/master/checks.d/redisdb.py#L242
Because it's a class, you can only define a single http check per agent. It should be changed to a defined resource combined with maybe the puppetlabs/concat module to create a yaml with multiple instances of http checks. Having a single http check per host is pretty useless unless you're checking strictly for localhost.
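A rough shape of the defined-type version (the define name and parameters here are illustrative, not the module's current API):

```puppet
# Hypothetical define: each declaration contributes one instance to
# http_check.yaml, e.g. via puppetlabs/concat fragments.
define datadog_agent::integrations::http_check_instance (
  $url,
  $timeout = 5,
  $tags    = [],
) {
  # ... emit a concat::fragment for this instance here ...
}

# Multiple checks per host become possible:
datadog_agent::integrations::http_check_instance { 'frontend':
  url => 'https://www.example.com',
}
datadog_agent::integrations::http_check_instance { 'api':
  url => 'https://api.example.com',
}
```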
Now that Consul is supported in Datadog, it would be nice if this puppet module exposed that fact.
I am running Puppet 4.2.2, open source. I followed the instructions for this project, but the Puppet integration on the dashboard says "No Data Received". After looking into puppetserver.log, I found the following error message:
[puppet-server] Puppet You need the `dogapi` gem to use the Datadog report (run puppet with puppet_run_reports on your master)
Reporting is now working well since I installed the dogapi gem by hand:
sudo puppetserver gem install dogapi
TL;DR: revamp integrations to allow multiple instances and standard parameter validation on all integrations.
Hey folks! I'm looking for some feedback on the current state of integrations vs where you'd like to go with things.
I see in the history there is at least one commit that references "integrations are classes now, not defines", and I can understand why that is to a degree. However, I don't think that hard "this is the only way it can be" model makes a lot of sense, and I think there is a better approach that can satisfy everyone's needs.
Because of the way integration configuration files are structured, the 'classes only' model makes a lot of sense. There is one section, init_config, that is common to the file itself, and then there are multiple instances that can be associated, as requested in #130. This maps decently to a class whose parameters cover the init_config section, plus an array of instances as one of the class parameters. Great.
However there are some significant downsides to this model:
The problem of course is that this is all modifying one file, and that file can only be declared once. And part of the file is "class-like" and part of the file is "type-like". IMO the best way to resolve this would actually be to change the configuration parser to allow multiple files to configure the same plugin, and maybe each config file supports an integration parameter which specifies which integration the configs are for. This model is used very well with collectd and the puppet/collectd module (disclaimer: I have made many contributions to that module).
However, that's a fairly major change which depends on major changes in the datadog agent itself, which is a rather large dependency for this :)
I'd like to propose a different model.
There would be 2 types that define the underlying file:
datadog_agent::integration::init_config
datadog_agent::integration::instance
These two would be used to define the init_config section of an integration config file, and the instances section. They would expose a pretty generic API which would pretty much just take some 'standard' parts (like tags, on the instance) and then a hash of other parameters to simply YAML.dump into the config file.
With these we could build types and classes to define specific integrations.
As an example, for the mongo integration it would look something like this:
class datadog_agent::integrations::mongo::init_config (params) {
# validate params
datadog_agent::integration::init_config { 'mongo':
params => 'here',
}
}
define datadog_agent::integrations::mongo (params) {
include datadog_agent::integrations::mongo::init_config
# validate params
datadog_agent::integration::instance { "mongo-${name}":
integration => 'mongo',
params => 'here',
tags => [ 'tags', 'go', 'here' ],
}
}
Then using the integration would be as simple as:
datadog_agent::integrations::mongo { 'localhost':
params => 'here',
tags => 'here',
}
And then you get instances for free. If you need to specify additional parameters to the init_config section, you can declare it separately yourself, or specify them via hiera.
As far as having 'generic' integrations that people can use (maybe their own integrations they've written, maybe an integration provided by the datadog agent itself which doesn't have puppet support yet), they can use those 2 building blocks in their own classes/defines.
Some of the integrations don't make sense to have "instances" of (disk, agent_metrics) even though they use the instances section. These can just be classes:
class datadog_agent::integrations::disk (params) {
datadog_agent::integration::init_config { 'disk':
params => 'here',
}
datadog_agent::integration::instance { 'disk':
params => 'here',
tags => ['whatever'],
}
}
The underlying implementation of the two core types could use puppetlabs/concat or richardc/datacat to actually build the files. The nice thing about datacat is that since we're generating a yaml file it's a nice mapping to build up a hash and YAML.dump in the template and call it good, but I've always thought datacat, while great, was the wrong approach to the problem. Concat is nice because it's a puppetlabs supported module, but building a yaml data structure out of file fragments has also always been a bad smell for me. However, Because Puppet™, these are basically the best solutions we have to this problem.
As far as supporting every-feature-under-the-sun of the integrations, I believe that if we make the underlying types (init_config and instance) generic enough, then arbitrary integrations are possible. And for the 'official' integrations, assuming they are defined well enough in the Python code, we could probably generate the Puppet module from them (instance/init_config parameters, etc.). This may require a fair amount of work on the Python code (adding metadata to each integration type), but could make supporting the Puppet module a much easier prospect for you folks in general.
Anywho, I'd like to get cracking on these changes as soon as possible, and I'd love to have some feedback if there are any concerns y'all have or changes you'd like to request. I can also do a proof of concept to show what I'm talking about if you'd like before I get cracking on changing all of the current integrations.
Thanks, and I look forward to your feedback!
Running dd-agent info gives me the following error.
...
Collector (v 5.6.3)
...
mesos
-----
- instance #0 [WARNING]
Warning: This check is deprecated in favor of Mesos master and slave specific checks. It will be removed in a future version of the Datadog Agent.
Warning: ('Connection aborted.', error(101, 'Network is unreachable'))
Warning: ('Connection aborted.', error(101, 'Network is unreachable'))
Warning: ('Connection aborted.', error(101, 'Network is unreachable'))
- Collected 0 metrics, 0 events & 4 service checks
I noticed that the integration instructions in the webapp are inconsistent with the puppet module. To fix this, I had to rename the mesos.yaml file to mesos_slave.yaml. Please resolve this either in the agent or the Puppet module. Changing the agent is preferred because it probably isn't good practice to use the name of a configuration file for application logic. After changing the name of the YAML file and restarting the collector, I get:
mesos_slave
-----------
- instance #0 [OK]
- Collected 30 metrics, 0 events & 2 service checks
To summarize: the mesos integration does not work out of the box using this Puppet module. The problem lies within the agent, but it can be worked around by adding a slave/master parameter to the Puppet module to dictate what the configuration file should be named.
puppet-datadog-agent/manifests/integrations/mesos.pp
Puppet master reporting requires the dogapi gem. It seems to fail silently without it.
I was unable to start the agent installed with this module on a Fedora 20 box, since the tornado Python module was missing. Once I'd installed the module, the agent started working. The problem is that this particular box intentionally has very few packages installed, so installing tornado by hand produced error messages complaining about the inability to compile some speedups.
The second issue is the pip installer; I had to install it before installing tornado. I'd like to avoid that, too.
The last problem is adding the installation to the Puppet scripts. Either it has to be handcrafted or the puppet-python module has to be used. I use the latter option, but I don't feel it's the preferred one. The best approach would be to avoid external dependencies.
Just about every check provided by the dd-agent package extends AgentCheck and supports the min_collection_interval setting at the instance or check level. None of the integrations provided in the puppet module offers a way to set min_collection_interval.
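For reference, this is roughly the YAML the module would need to render (the check shown is illustrative; placement of min_collection_interval follows the instance-level form mentioned above):

```yaml
init_config:

instances:
  - host: localhost
    port: 6379
    # Run this check at most once every 60 seconds:
    min_collection_interval: 60
```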
Could we make the requirement of puppetlabs-ruby a little less strict?
PR #110 changes metadata.json to support 0.2.0 and higher, until the next major version.
Cheers,
Otto
In "http_check.yaml.example", it states that we can specify a "name" attribute.
# - name: My second service
# url: https://another.url.example.com
However, the "name" attribute is not being parsed in the "http_check.pp" script:

class datadog_agent::integrations::http_check (
  $url = undef,
  $username = undef,
Potentially, this can be fixed by modifying the manifest and the .erb template as below.

In http_check.pp, add a parameter:

class datadog_agent::integrations::http_check (
+ $svcname = undef,
  $url = undef,

In http_check.yaml.erb, change the template from:

instances:
  - name: <%= @name %>
    url: <%= @url %>
<% if @timeout -%>
    timeout: <%= @timeout %>

to:

init_config:
instances:
  - name: <%= @svcname %>
    url: <%= @url %>
<% if @timeout -%>
    timeout: <%= @timeout %>
Basically, having a way to say "I want these nodes to report but not those" is sometimes useful when managing a mix of prod and not-so-prod nodes in Puppet.
Or even to roll out Datadog reporting slowly across all the nodes.
Maybe more of a hint or reminder: the kafka and kafka consumer integrations are missing.
As of Ruby 1.9, what used to be in the rubygems package is now included in the base ruby package.
When using this puppet script on Ubuntu 12.04 (and presumably any distro that is on Ruby 1.9), the output includes the error
E: Package 'rubygems' has no installation candidate
Since the needed code is included in the ruby 1.9 package, the agent will work correctly.
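A minimal guard, assuming the rubyversion fact is available on the node (version boundary per the issue above):

```puppet
# Only manage the rubygems package on Ruby < 1.9; from 1.9 on, rubygems
# ships in the base ruby package and 'rubygems' has no install candidate.
if versioncmp($::rubyversion, '1.9.0') < 0 {
  package { 'rubygems':
    ensure => installed,
  }
}
```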
Customer reached out with an issue where they upgraded the module to 1.7.0, and they immediately stopped seeing their statsd metrics.
Looks like this is the culprit. https://github.com/DataDog/puppet-datadog-agent/blob/master/manifests/init.pp#L219
Hi,
I've seen on several issues/requests that you are going to push a major release of integrations.
The problem is that at the moment there are very few supported services from those we need.
I would gladly help contribute new integrations but from what I've seen, things are delayed until the major release is out.
My question is, what is next? should I start working on integrations or do you have an ETA?
Anyway, thanks for the great job you did so far!
Eliran
Is there a way to configure the module to report when running in a masterless puppet configuration?
The HTTP check here is declared as a class, this means that only one can be declared on a host. Could you please convert this to a define as I have in my fork (which you cherry-picked):
https://github.com/ordenull/puppet-datadog-agent/blob/master/manifests/defines/http_check.pp
https://github.com/ordenull/puppet-datadog-agent/blob/master/templates/http_check.yaml.erb
The pattern of declaring resources as defines is much more Puppet friendly and really should apply to all other checks as well. There is also another pattern in use by the process check. It does allow multiple process checks to be defined, but because it wraps them all in the same class, it prevents the use of if statements in node definitions such as the following:
if ( defined( Class['Apache'] ) ) {
  datadog::process { 'apache2':
    name          => 'apache2',
    search_string => 'apache2',
  }
}
if ( defined( Class['Varnish'] ) ) {
  datadog::process { 'varnish':
    name          => 'varnish',
    search_string => 'varnish',
  }
}
It seems that SSL repo support has not yet been deployed by Datadog, meaning that our AMIs were failing and we didn't know why. I found that on the 7th, the repo URL was changed to https. Here is what port 443 returns from this URL as of today (10-09-15):
nick.parry@nparry-laptop:~$ nc -vzw2 yum.datadoghq.com 443
nc: connect to yum.datadoghq.com port 443 (tcp) failed: Connection refused
Just wanted to provide some visibility on this issue.
It would be useful to have facts available in the report class, to customize the report event we send to Datadog.
The following needs to be changed; as it currently is, it doesn't work when passing in a value that isn't nil:
<% if @hostname_extraction_regex.nil? -%>
:hostname_extraction_regex: '<%= @hostname_extraction_regex %>'
<% end -%>
to:
<% if !@hostname_extraction_regex.nil? -%>
:hostname_extraction_regex: '<%= @hostname_extraction_regex %>'
<% end -%>
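The flipped guard behaves as intended; a quick ERB sketch (using the same template fragment) shows the line is only emitted when the variable is set:

```ruby
require 'erb'

# The corrected guard: emit the setting only when the regex is non-nil.
TEMPLATE = <<~'ERB'
  <% if !@hostname_extraction_regex.nil? -%>
  :hostname_extraction_regex: '<%= @hostname_extraction_regex %>'
  <% end -%>
ERB

def render(regex)
  @hostname_extraction_regex = regex
  ERB.new(TEMPLATE, trim_mode: '-').result(binding)
end

puts render('^(?<hostname>.*?)\.example\.com')
puts render(nil).inspect  # empty string: the line is omitted entirely
```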
This project shows up as a fork of jamtur01/puppet-datadog-agent.
As it is now 249 commits ahead of and 3 commits behind jamtur01:master, it might be time to change this to a standalone repository? GitHub support should be able to do this easily.
It would be great to have a generic integration manifest. This allows us to pass larger, more complicated configuration files to integrations that need to be rendered with many variables, and are too site specific.
It would be great if the API key could be defined in hiera