
puppet-datadog-agent's Issues

Missing Metrics on PE/POSS

When reporting is enabled, events are created but metrics appear to be missing. We should check where the expected metrics should be pulled from; something may have changed in newer versions of Puppet.

When using puppetserver, install dogapi into the puppetserver context

If using POSS with puppetserver, or Puppet Enterprise, the 'dogapi' gem must be installed into the context of puppetserver (JRuby):

# /opt/puppetlabs/bin/puppetserver gem install dogapi
Fetching: multi_json-1.11.2.gem (100%)
Successfully installed multi_json-1.11.2
Fetching: dogapi-1.21.0.gem (100%)
Successfully installed dogapi-1.21.0
2 gems installed
# /etc/init.d/pe-puppetserver restart

Service enable on CentOS

Each time Puppet runs this module on CentOS, it re-enables the service, which by itself triggers a false "changed" Puppet report on the node:

Debug: Executing: '/bin/systemctl is-active datadog-agent'
Debug: Executing: '/bin/systemctl show datadog-agent --property LoadState --property UnitFileState --no-pager'
Debug: Executing: '/bin/systemctl unmask datadog-agent'
Debug: Executing: '/bin/systemctl enable datadog-agent'
Notice: /Stage[main]/Datadog_agent::Redhat/Service[datadog-agent]/enable: enable changed 'false' to 'true'

I find setting provider => 'redhat' in redhat.pp solves this problem.
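
A minimal sketch of that workaround (the surrounding resource parameters are assumptions based on typical usage of the module):

service { 'datadog-agent':
  ensure   => running,
  enable   => true,
  provider => 'redhat',
}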

module has wrong dependency for stdlib in metadata.json

metadata.json indicates that any 4.x version of stdlib is sufficient, but the use of validate_integer means it's not.

We currently have 4.5.1 installed, which doesn't have the validate_integer function.

Please correct your metadata.json file to reflect the minimum version of stdlib that includes validate_integer. It looks like v4.6 is what you want.
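
A sketch of the corrected dependency entry, assuming stdlib 4.6.0 (where validate_integer appears to have been introduced) as the floor:

{
  "dependencies": [
    {
      "name": "puppetlabs/stdlib",
      "version_requirement": ">= 4.6.0 < 5.0.0"
    }
  ]
}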

Improve the parameters for datadog.conf.erb

At the moment a lot of the options within datadog.conf.erb aren't set as parameters. This leads to some headaches when trying to set up things such as dogstatsd.

Ideally all options within datadog.conf.erb should be able to be defined by parameters provided to the datadog_agent class. This would enable them to be declared where required without having to use a modified version of the puppet-datadog-agent module or override the file in a later module.

Additionally, it would potentially be worthwhile to have all the parameters covered by rspec tests in order to validate both the default values and the custom ones (a sketch follows the list below). That would provide a bit more confidence when introducing new parameters.

This would probably need to happen in three parts:

  • Existing parameters get tests added to the appropriate spec
  • Tests are written up for the new parameters
  • datadog_agent class and datadog.conf.erb are modified in order to pass the new parameters, with defaults matching the current setup to ensure the change is backward compatible.
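
A minimal hypothetical spec sketch for validating a default (the parameter name, default value, and api_key requirement here are assumptions for illustration, not the module's actual API):

require 'spec_helper'

describe 'datadog_agent' do
  context 'with default parameters' do
    # api_key is assumed to be the only required parameter
    let(:params) { { :api_key => 'notakey' } }

    it 'renders the assumed default dogstatsd port into datadog.conf' do
      is_expected.to contain_file('/etc/dd-agent/datadog.conf')
        .with_content(%r{^dogstatsd_port: 8125$})
    end
  end
end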

Allow modifications to reported hostnames

Related to #1

Hosts may show up as duplicates in the infrastructure list, with Puppet metrics only reporting intermittently. Some cases have included a workaround specific to that case, but these workarounds need to be re-applied each time the puppet module is updated.

Redis slowlog_max_len not configurable in module but raises warning in collector

For Redis servers configured with a non-default slowlog-max-len, the collector issues a warning. Also, users who have configured their Redis instance with a higher length probably want access to that data. Error message from sudo service datadog-agent info:

[WARNING]:Redis slowlog-max-len is higher than 128. Defaulting to 128.If you need a higher value, please set slowlog-max-len in your check config

The module's template doesn't offer this as an option, nor does the manifest accept it as a setting.
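
A hypothetical sketch of exposing the setting through the module (the class and parameter names are assumptions, not the current API):

class { 'datadog_agent::integrations::redis':
  host            => 'localhost',
  port            => 6379,
  slowlog_max_len => 1000,
}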

Installation issue in Ubuntu

Resources defined in manifests/ubuntu.pp are not guaranteed to run in a consistent order.

I saw this when I tried to deploy the Datadog agent to a handful of Ubuntu boxes. The expected order in which resources execute is:

  1. Exec['datadog_key']
  2. File['/etc/apt/sources.list.d/datadog.list']
  3. Exec['datadog_apt-get_update'] via refresh
  4. Package['datadog-agent']
  5. Service['datadog-agent']

... but looking at reports from failed puppet agents in Puppet Enterprise, I see that 1 and 2 ran, but 3 didn't and 4 failed. Unfortunately, PE doesn't show the order in which resources ran, and I'm not 100% positive it reports refresh events, but I think one of two things is going on.

  • it tried to run 1->2->4->3 in that order. 4 requires 3, but 3 is a refreshonly resource, and there are some dragons in this part of Puppet.
  • it tried to run 2->3->4->1 in that order, because no other resources actually explicitly require 1.

Either way, once the system gets into this state, 2 is up-to-date, so 3 will never run, and the system cannot recover from this state by itself. I had to go in and manually run "apt-get update" to get them unstuck.

I recommend you collapse 1 and 3 into a single exec resource, and define dependencies in 2->(1+3)->4->5.
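
A minimal sketch of that collapse (resource titles and the key ID are illustrative, not the module's actual names):

exec { 'datadog_apt_setup':
  # import the key and refresh apt in one idempotent step
  command  => '/usr/bin/apt-key adv --recv-keys --keyserver keyserver.ubuntu.com C7A7DA52 && /usr/bin/apt-get update',
  unless   => '/usr/bin/apt-key list | /bin/grep -q C7A7DA52',
  provider => shell,
  require  => File['/etc/apt/sources.list.d/datadog.list'],
}

package { 'datadog-agent':
  ensure  => installed,
  require => Exec['datadog_apt_setup'],
}

service { 'datadog-agent':
  ensure  => running,
  require => Package['datadog-agent'],
}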

Ensure puppet module works well on puppet 3.x

We've received reports of customers trying it out and having issues, primarily around "Class Datadog_reports is already defined in Puppet::Reports", despite confirming correct configuration.

In general, we should also strive to pass puppet lint, and any other testing that makes sense.

Every second execution of puppet fails

Every second execution of Puppet removes the datadog-agent package and fails.
We use commit 3cffbcf plus #78 on top of it, but that pull request does not touch the affected area.
So, half of the time, we don't even have the datadog-agent package installed.
This part of redhat.pp causes every second execution of Puppet to fail:

package { 'datadog-agent-base':
  ensure => absent,
  before => Package['datadog-agent'],
}

We don't have datadog-agent-base, but for some reason this code deletes the datadog-agent package, including the dd-agent user. Judging by the debug log below, rpm -q datadog-agent-base --whatprovides resolves to the installed datadog-agent package, so ensure => absent removes the real agent. Here is the relevant portion of the puppet log:

Info: Applying configuration version '1427837407'
Debug: Prefetching yum resources for package
Debug: Executing '/bin/rpm --version'
Debug: Executing '/bin/rpm -qa --nosignature --nodigest --qf '%{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n''
Debug: Executing '/bin/rpm -q datadog-agent-base --nosignature --nodigest --qf %{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n'
Debug: Executing '/bin/rpm -q datadog-agent-base --nosignature --nodigest --qf %{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n --whatprovides'
Debug: Executing '/bin/rpm -e datadog-agent-5.2.2-1.x86_64'
Notice: /Stage[main]/Datadog_agent::Redhat/Package[datadog-agent-base]/ensure: removed
Debug: /Stage[main]/Datadog_agent::Redhat/Package[datadog-agent-base]: The container Class[Datadog_agent::Redhat] will propagate my refresh event

After that, puppet fails at init.pp:
Error: Could not find user dd-agent
Error: /Stage[main]/Datadog_agent/File[/etc/dd-agent/datadog.conf]/owner: change from 498 to dd-agent failed: Could not find user dd-agent

On the next puppet execution, the datadog-agent package gets reinstalled and execution completes normally:
Debug: Prefetching inifile resources for yumrepo
Debug: Executing '/bin/rpm -q datadog-agent --nosignature --nodigest --qf %{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n'
Debug: Executing '/bin/rpm -q datadog-agent --nosignature --nodigest --qf %{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n --whatprovides'
Debug: Package[datadog-agent]: Ensuring => latest
Debug: Executing '/usr/bin/yum -d 0 -e 0 -y install datadog-agent'
Notice: /Stage[main]/Datadog_agent::Redhat/Package[datadog-agent]/ensure: created
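
One possible workaround (sketched with a hypothetical custom fact, not something the module currently provides) is to attempt the removal only when the legacy package is really installed, so the --whatprovides lookup never resolves to datadog-agent:

# $::datadog_agent_base_installed would be a custom fact; illustrative only
if $::datadog_agent_base_installed == 'true' {
  package { 'datadog-agent-base':
    ensure => absent,
    before => Package['datadog-agent'],
  }
}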

On puppet 3.0.x, I get an error message from the datadog report processor.

Since I've upgraded to 3.0.2, I've been getting an error message from the datadog report processor. Here's the output from puppet with --debug --trace:

Notice: Finished catalog run in 16.60 seconds
Debug: Using settings: adding file resource 'rrddir': 'File[/var/lib/puppet/rrd]{:links=>:follow, :group=>"puppet", :backup=>false, :ensure=>:directory, :owner=>"puppet", :mode=>"750", :loglevel=>:debug, :path=>"/var/lib/puppet/rrd"}'
Debug: Finishing transaction 69891384984120
Debug: Received report to process from instance5.zicasso.com
Debug: Processing report from instance5.zicasso.com with processor Puppet::Reports::Store
Debug: Processing report from instance5.zicasso.com with processor Puppet::Reports::Datadog_reports
Debug: Sending metrics for instance5.zicasso.com to Datadog
Debug: Sending events for instance5.zicasso.com to Datadog
undefined method `[]' for :@aggregation_key:Symbol

datadog_agent::integrations::directory is not functional?

This seems to stem from this commit.

The problem is $name being reassigned.

Also, I really think the whole comment of "we do integrations as classes, not defines" doesn't make a lot of sense. In this case, if you wanted to monitor multiple directories (which the integration seems to support via instances; I'll admit I have not used the integration myself, so I'm just guessing), you'd ... what?

Anywho, I'd like to fix this, but I'm not certain what the best approach is. What is $name trying to do there and why, and since it's currently non-functional, how much backward compatibility do I need to worry about?

Again, I haven't used this integration, I am working on writing tests for all of the integration classes so I can factor out a generic integration type, and ran into this issue. Please let me know how I can help :)

Make the integration manifests 'parameter agnostic'

It's a pain to have to specify every key of the YAML file as a different variable.
We could just take a dictionary representing the instances and convert it to YAML.

class { 'datadog_agent::integrations::example' :
  instances => [
    { 'host' => 'localhost', 'port' => 42, 'tags' => [] }
  ]
}

# in the template
<%= require 'yaml'; {'init_config'=>@init_config, 'instances'=>@instances}.to_yaml %>

Moreover, it would make it easy to add new integrations: the puppet user just has to stick to the format of the agent's YAML file.

cc @alq666

That is potentially a non-backward-compatible change for people's manifests, though.

Use port 80 when fetching the DataDog APT key

By default apt-key will attempt to access a keyserver on 11371/tcp, which can be problematic for environments with firewalls or network ACLs.

The Ubuntu keyserver also listens on port 80/tcp which is much more firewall friendly.
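
A sketch of the change, assuming the module fetches the key via apt-key (the key ID is illustrative):

exec { 'datadog_key':
  command  => '/usr/bin/apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 C7A7DA52',
  unless   => '/usr/bin/apt-key list | /bin/grep -q C7A7DA52',
  provider => shell,
}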

datadog_agent::integrations::http_check should be a defined resource

Because it's a class, you can only define a single HTTP check per agent. It should be changed to a defined resource, combined perhaps with the puppetlabs/concat module, to create a YAML file with multiple http_check instances. Having a single HTTP check per host is pretty useless unless you're checking strictly for localhost.
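
A hypothetical sketch of http_check as a defined type (names, parameters, and the template path are illustrative, and a concat target for the YAML file is assumed to be declared elsewhere):

define datadog_agent::integrations::http_check (
  $url,
  $timeout = 10,
) {
  # each declared check becomes one fragment of the shared YAML file
  concat::fragment { "http_check-${name}":
    target  => '/etc/dd-agent/conf.d/http_check.yaml',
    content => template('datadog_agent/agent-conf.d/http_check_instance.yaml.erb'),
    order   => '10',
  }
}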

`dogapi` is required to use the Datadog report

I am running the open source Puppet 4.2.2. I followed the instructions for this project, but the Puppet integration on the dashboard says "No Data Received". After looking into puppetserver.log, I found the following error message:

[puppet-server] Puppet You need the `dogapi` gem to use the Datadog report (run puppet with puppet_run_reports on your master)

Reporting has been working well since I installed the dogapi gem by hand:

sudo puppetserver gem install dogapi

revamp of integrations

TL;DR: revamp integrations to allow multiple instances and standard parameter validation on all integrations.

Hey folks! I'm looking for some feedback on the current state of integrations vs where you'd like to go with things.

I see in the history there is at least one commit that references "integrations are classes now, not defines", and I can understand why that is to a degree. However, I don't think that hard "this is the only way it can be" model makes a lot of sense, and I think there is a better approach that can satisfy everyone's needs.

current state

Because of the way the configuration files for integrations are structured, the 'classes only' model makes a lot of sense. There is one init_config section that is common to the file itself, and then there can be multiple associated instances, as requested in #130. This maps decently to a class whose parameters cover the init_config section, with an array of instances as one of the class parameters. Great.

However there are some significant downsides to this model:

  1. Since we're using class params, it's hard for a class to say "hey, I need one of these too" the way you can with a type. For instance, with the role/profile pattern, if you have 2 postgres databases in different profiles and want to monitor them both, you can't simply declare the integration class in both, because those profiles then can't exist within the same catalog. Now you have to factor the declaration out into another class which can be included in both profiles, but that's not always the easiest thing to do. Also, consider the case where you later want to separate these profiles onto different machines: now you have to un-factor this, and ugh.
  2. Semantically, this is actually not the right way to do it with puppet. These are instances, they should be a type so you can declare multiple of them.
  3. Currently, the module implements this pattern very sporadically. Some have it (nginx, mongo, zk), while most do not. There does appear to be demand for multiple instances from people other than me (see #130, #64, #56), and I know we (Stripe) will be needing it shortly as we start setting up datadog monitoring for our RDS instances (among other things).
  4. Deep validation of the hashes and arrays used to declare the instances is difficult, and as a result, almost completely missing.

The problem of course is that this is all modifying one file, and that file can only be declared once. And part of the file is "class-like" and part of the file is "type-like". IMO the best way to resolve this would actually be to change the configuration parser to allow multiple files to configure the same plugin, where each config file might support an integration parameter specifying which integration the config is for. This model works very well for collectd and the puppet/collectd module (disclaimer: I have made many contributions to that module).

However, that's a fairly major change which depends on major changes in the datadog agent itself, which is a rather large dependency for this :)

new hotness

I'd like to propose a different model.

There would be 2 types that define the underlying file:

  • datadog_agent::integration::init_config
  • datadog_agent::integration::instance

These two would be used to define the init_config section of an integration config file, and the instances section. They would expose a fairly generic API which would pretty much just take some 'standard' parts (like tags, on the instance) and a hash of other parameters to simply YAML.dump into the config file.

With these we could build types and classes to define specific integrations.

As an example, for the mongo integration it would look something like this:

class datadog_agent::integrations::mongo::init_config ($params = {}) {
    # validate params
    datadog_agent::integration::init_config { 'mongo':
        params => $params,
    }
}
define datadog_agent::integrations::mongo ($params = {}, $tags = []) {
    include datadog_agent::integrations::mongo::init_config
    # validate params
    datadog_agent::integration::instance { "mongo-${name}":
        integration => 'mongo',
        params      => $params,
        tags        => $tags,
    }
}

Then using the integration would be as simple as:

datadog_agent::integrations::mongo { 'localhost':
    params => { 'params' => 'here' },
    tags   => [ 'tags', 'here' ],
}

And then you get instances for free. If you need to specify additional parameters for the init_config section, you can declare the init_config class separately yourself, or specify them via hiera.

As far as having 'generic' integrations that people can use (maybe their own integrations they've written, maybe an integration provided by the datadog agent itself which doesn't have puppet support yet), they can use those 2 building blocks in their own classes/defines.

Some of the integrations don't make sense to have "instances" of (disk, agent_metrics) even though they use the instances section. These can just be classes:

class datadog_agent::integrations::disk ($params = {}) {
    datadog_agent::integration::init_config { 'disk':
        params => $params,
    }
    datadog_agent::integration::instance { 'disk':
        params => $params,
        tags   => ['whatever'],
    }
}

The underlying implementation of the two core types could use puppetlabs/concat or richardc/datacat to actually build the files. The nice thing about datacat is that, since we're generating a YAML file, it maps nicely to building up a hash, calling YAML.dump in the template, and calling it good; but I've always thought datacat, while great, was the wrong approach to the problem. Concat is nice because it's a puppetlabs-supported module, but building a YAML data structure out of file fragments has also always been a bad smell to me. However, Because Puppet™, these are basically the best solutions we have to this problem.

As far as supporting every feature under the sun of the integrations, I believe that if we make the underlying types (the init_config and instance types) generic enough, then arbitrary integrations are possible. And for the 'official' integrations, assuming they are defined well enough in the python code, we could probably generate the puppet module from them (instance/init_config parameters, etc.). This may require a fair amount of work on the python code (adding metadata to each integration type), but could make supporting the puppet module a much easier prospect for you folks in general.

Anywho, I'd like to get cracking on these changes as soon as possible, and I'd love to hear any concerns y'all have or changes you'd like to request. I can also do a proof of concept to show what I'm talking about before I start changing all of the current integrations.

Thanks, and I look forward to your feedback!

Mesos Integration Problems

Running dd-agent info gives me the following error.

...
Collector (v 5.6.3)
...

mesos
-----
  - instance #0 [WARNING]
      Warning: This check is deprecated in favor of Mesos master and slave specific checks. It will be removed in a future version of the Datadog Agent.
      Warning: ('Connection aborted.', error(101, 'Network is unreachable'))
      Warning: ('Connection aborted.', error(101, 'Network is unreachable'))
      Warning: ('Connection aborted.', error(101, 'Network is unreachable'))
  - Collected 0 metrics, 0 events & 4 service checks

I noticed that the integration instructions in the web app are inconsistent with the puppet module. To fix this, I had to rename the mesos.yaml file to mesos_slave.yaml. Please resolve this either in the agent or in the Puppet module; changing the agent is preferred, because it probably isn't good practice to use the name of a configuration file for application logic. After renaming the YAML file and restarting the collector, I get:

mesos_slave
-----------
  - instance #0 [OK]
  - Collected 30 metrics, 0 events & 2 service checks

To summarize: the mesos integration does not work out of the box with this Puppet module. The problem lies within the agent, but it can be worked around by adding a slave/master parameter to the Puppet module that dictates what the configuration file should be named.

puppet-datadog-agent/manifests/integrations/mesos.pp
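
A hypothetical sketch of that workaround (the parameter names and template path are illustrative):

class datadog_agent::integrations::mesos (
  $url  = 'http://localhost:5050',
  $role = 'master',  # or 'slave'; dictates the config file name
) {
  file { "/etc/dd-agent/conf.d/mesos_${role}.yaml":
    ensure  => file,
    content => template('datadog_agent/agent-conf.d/mesos.yaml.erb'),
    notify  => Service['datadog-agent'],
  }
}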

Missing tornado package causes failing to start on Fedora 20

I was unable to start the agent installed by this module on a Fedora 20 box because the tornado python module was missing. Once I'd installed the module, the agent started working. The problem is that the box intentionally has very few packages installed (and that's the desired state), so installing tornado by hand produced error messages complaining about the inability to compile some speedups.

The second issue is the pip installer: I had to install it prior to the tornado installation. I'd like to avoid that, too.

The last problem is adding the installation to the puppet scripts. Either it has to be handcrafted or the puppet-python module has to be used. I use the latter option, but I don't feel it's the preferred one. The best approach would be to avoid external dependencies altogether.
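
For reference, a sketch of the puppet-python route (assuming the stankevich/puppet-python module's API; not necessarily the preferred fix, as noted above):

class { 'python':
  pip => true,
}

python::pip { 'tornado':
  ensure => present,
  before => Service['datadog-agent'],
}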

Support puppetlabs-ruby > 0.2.0

Could we make the requirement on puppetlabs-ruby a little less strict?

PR #110 changes metadata.json to support 0.2.0 and higher, until the next major version.

Cheers,

Otto

"name" attribute is not parsed in by the http_check.pp script

In "http_check.yaml.example", it states that we can specify a "name" attribute.

# - name: My second service 
#   url: https://another.url.example.com 

However, the "name" attribute is not being parsed in "http_check.pp" script.

class datadog_agent::integrations::http_check (
  $url       = undef,
  $username  = undef,

Potentially, this can be fixed by modifying the script and the .erb file as below. (Within a class template, @name would resolve to the class name, so a separate parameter such as $svcname is needed.)

  1. https://github.com/DataDog/puppet-datadog-agent/blob/master/manifests/integrations/http_check.pp

  class datadog_agent::integrations::http_check (
+   $svcname   = undef,
    $url       = undef,

  2. https://github.com/DataDog/puppet-datadog-agent/blob/master/templates/agent-conf.d/http_check.yaml.erb

  init_config:

  instances:
-     -   name: <%= @name %>
+     -   name: <%= @svcname %>
          url: <%= @url %>
  <% if @timeout -%>
          timeout: <%= @timeout %>

Include/exclude rules in the report class

Basically, having a way to say "I want these nodes to report, but not those" is sometimes useful when managing a mix of prod and not-so-prod nodes in puppet.
It would also allow rolling Datadog reporting out slowly across all the nodes.

rubygems installation error on machines with ruby 1.9

As of Ruby 1.9, what used to be in the rubygems package is now included in the base ruby package.

When using this puppet module on Ubuntu 12.04 (and presumably any distro that ships Ruby 1.9), the output includes the error

E: Package 'rubygems' has no installation candidate

Since the needed code is included in the ruby 1.9 package, the agent will work correctly.

New integrations anytime soon?

Hi,

I've seen in several issues/requests that you are going to push a major release of the integrations.
The problem is that, at the moment, very few of the services we need are supported.

I would gladly help contribute new integrations, but from what I've seen, things are on hold until the major release is out.

My question is: what's next? Should I start working on integrations, or do you have an ETA?

Anyway, thanks for the great job you've done so far!

Eliran

http_check doesn't support multiple instances

The HTTP check here is declared as a class, which means that only one can be declared on a host. Could you please convert it to a define, as I have in my fork (which you cherry-picked):

https://github.com/ordenull/puppet-datadog-agent/blob/master/manifests/defines/http_check.pp
https://github.com/ordenull/puppet-datadog-agent/blob/master/templates/http_check.yaml.erb

The pattern of declaring resources as defines is much more puppet-friendly and really should apply to all the other checks as well. There is also another pattern in use by the process check: it does allow multiple process checks to be defined, but because it wraps them all in the same class, it prevents the use of if statements in node definitions such as the following:

if defined(Class['Apache']) {
  datadog::process { 'apache2':
    name          => 'apache2',
    search_string => 'apache2',
  }
}

if defined(Class['Varnish']) {
  datadog::process { 'varnish':
    name          => 'varnish',
    search_string => 'varnish',
  }
}

HTTPS is not yet a thing for the datadog repo

It seems that SSL support has not yet been deployed for the Datadog yum repo, which meant our AMIs were failing and we didn't know why. I found that on the 7th, the repo URL was changed to https. Here is what port 443 returns for this URL as of today (10-09-15):

nick.parry@nparry-laptop:~$ nc -vzw2 yum.datadoghq.com 443
nc: connect to yum.datadoghq.com port 443 (tcp) failed: Connection refused

Just wanted to provide some visibility on this issue.

Logic in datadog.yaml is incorrect

The following needs to be changed. As currently written, the line is only rendered when the value is nil, so it doesn't work when passing in a value that isn't nil:

<% if @hostname_extraction_regex.nil? -%>
:hostname_extraction_regex: '<%= @hostname_extraction_regex %>'
<% end -%>

to:

<% if !@hostname_extraction_regex.nil? -%>
:hostname_extraction_regex: '<%= @hostname_extraction_regex %>'
<% end -%>

Add generic integration

It would be great to have a generic integration manifest. It would allow us to pass larger, more complicated configuration files to integrations that need to be rendered with many variables and are too site-specific.
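
A hypothetical sketch of such a define (names are illustrative; it reuses the to_yaml rendering idea suggested in the "parameter agnostic" issue above):

define datadog_agent::integration (
  $init_config = {},
  $instances   = [],
) {
  # render the whole integration config straight from the supplied data
  file { "/etc/dd-agent/conf.d/${name}.yaml":
    ensure  => file,
    content => inline_template('<%= require "yaml"; {"init_config" => @init_config, "instances" => @instances}.to_yaml %>'),
    notify  => Service['datadog-agent'],
  }
}

datadog_agent::integration { 'redisdb':
  instances => [ { 'host' => 'localhost', 'port' => 6379 } ],
}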

hiera support?

It would be great if the API key could be defined in Hiera.
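
A sketch of what that could look like in a Hiera YAML file, assuming the main class is datadog_agent and Puppet's automatic parameter lookup is in effect:

datadog_agent::api_key: 'your_api_key_here'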
