Datadog Agent Ansible Role

The Datadog Agent Ansible role installs and configures the Datadog Agent and integrations.

Ansible role versus Ansible collection

The Datadog Agent Ansible role is available through 2 different channels:

  • As part of the Datadog collection, accessible under the datadog.dd name on Ansible Galaxy (recommended).
  • As a standalone role, accessible under the datadog.datadog name on Ansible Galaxy (legacy).

Version 4 of the role and version 5 of the collection install the Datadog Agent v7 by default.

Setup

Note that the install instructions in this document describe installation of the standalone Datadog role. For installation instructions for the Datadog collection, refer to the collection README file. The configuration variables are the same for both the standalone role and the role accessed through the collection.

Requirements

  • Requires Ansible v2.6+.

  • Supports most Debian and RHEL-based Linux distributions, macOS, and Windows.

  • When using with Ansible 2.10+ to manage Windows hosts, requires the ansible.windows collection to be installed:

    ansible-galaxy collection install ansible.windows
  • When using with Ansible 2.10+ to manage openSUSE/SLES hosts, requires the community.general collection to be installed:

    ansible-galaxy collection install community.general

Installation

Install the Datadog role from Ansible Galaxy on your Ansible server:

ansible-galaxy install datadog.datadog

To deploy the Datadog Agent on hosts, add the Datadog role and your API key to your playbook:

- hosts: servers
  roles:
    - { role: datadog.datadog, become: yes }
  vars:
    datadog_api_key: "<YOUR_DD_API_KEY>"

The API key is required, and its absence causes the role to fail. If you want to provide it another way, outside of Ansible's control, specify a placeholder key and substitute the real key at a later point.
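
For example, a minimal sketch of the placeholder approach (the substitution step and its timing are illustrative, not part of the role):

- hosts: servers
  roles:
    - { role: datadog.datadog, become: yes }
  vars:
    datadog_api_key: "00000000000000000000000000000000"  # placeholder, replaced outside of Ansible

# Later, for example in a first-boot script, substitute the real key in the Agent configuration:
#   sed -i 's/^api_key:.*/api_key: <REAL_API_KEY>/' /etc/datadog-agent/datadog.yaml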

Role variables

These variables provide additional configuration during the installation of the Datadog Agent. They should be specified in the vars section of your playbook.

Variable Description
datadog_api_key Your Datadog API key. This variable is mandatory starting from version 4.21.
datadog_site The site of the Datadog intake to send Agent data to. Defaults to datadoghq.com, set to datadoghq.eu to send data to the EU site. This option is only available with Agent version >= 6.6.0.
datadog_agent_flavor Override the default Debian / RedHat package for IoT installations on Raspberry Pi. Defaults to "datadog-agent"; use "datadog-iot-agent" for Raspberry Pi.
datadog_agent_version The pinned version of the Agent to install (optional, but recommended), for example: 7.16.0. Setting datadog_agent_major_version is not needed if datadog_agent_version is used.
datadog_agent_major_version The major version of the Agent to install. The possible values are 5, 6, or 7 (default). If datadog_agent_version is set, it takes precedence; otherwise the latest version of the specified major is installed. Setting datadog_agent_major_version is not needed if datadog_agent_version is used.
datadog_checks YAML configuration for Agent checks to drop into:
- /etc/datadog-agent/conf.d/<check_name>.d/conf.yaml for Agent v6 and v7,
- /etc/dd-agent/conf.d for Agent v5.
datadog_disable_untracked_checks Set to true to remove all checks not present in datadog_checks and datadog_additional_checks.
datadog_additional_checks List of additional checks that are not removed if datadog_disable_untracked_checks is set to true.
datadog_disable_default_checks Set to true to remove all default checks.
datadog_config Set configuration for the Datadog Agent. The role writes the config to the correct location based on the operating system. For a full list of config options, see the datadog.yaml template file in the datadog-agent GitHub repository.
datadog_config_ex (Optional) Extra INI sections to go in /etc/dd-agent/datadog.conf (Agent v5 only).
datadog_apt_repo Override the default Datadog apt repository. Make sure to use the signed-by option if repository metadata is signed using Datadog's signing keys: deb [signed-by=/usr/share/keyrings/datadog-archive-keyring.gpg] https://yourrepo.
datadog_apt_cache_valid_time Override the default apt cache expiration time (defaults to 1 hour).
datadog_apt_key_url_new Override the location from which to obtain the Datadog apt key (the deprecated datadog_apt_key_url variable refers to an expired key that has been removed from the role). The URL is expected to be a GPG keyring containing the keys 382E94DE, F14F620E, and C0962C7D.
datadog_yum_repo_config_enabled Set to false to prevent the configuration of a Datadog yum repository (defaults to true). WARNING: this also deactivates the automatic update of GPG keys.
datadog_yum_repo Override the default Datadog yum repository.
datadog_yum_repo_proxy Set a proxy URL to use in the Datadog yum repo configuration.
datadog_yum_repo_proxy_username Set a proxy username to use in the Datadog yum repo configuration.
datadog_yum_repo_proxy_password Set a proxy password to use in the Datadog yum repo configuration.
datadog_yum_repo_gpgcheck Override the default repo_gpgcheck value (empty). If empty, value is dynamically set to yes when custom datadog_yum_repo is not used and system is not RHEL/CentOS 8.1 (due to a bug in dnf), otherwise it's set to no. Note: repodata signature verification is always turned off for Agent 5.
datadog_yum_gpgcheck Override the default gpgcheck value (yes) - use no to turn off package GPG signature verification.
datadog_yum_gpgkey Removed in version 4.18.0 Override the default URL to the Datadog yum key used to verify Agent v5 and v6 (up to 6.13) packages (key ID 4172A230).
datadog_yum_gpgkey_e09422b3 Override the default URL to the Datadog yum key used to verify Agent v6.14+ packages (key ID E09422B3).
datadog_yum_gpgkey_e09422b3_sha256sum Override the default checksum of the datadog_yum_gpgkey_e09422b3 key.
datadog_zypper_repo Override the default Datadog zypper repository.
datadog_zypper_repo_gpgcheck Override the default repo_gpgcheck value (empty). If empty, value is dynamically set to yes when custom datadog_zypper_repo is not used, otherwise it's set to no. Note: repodata signature verification is always turned off for Agent 5.
datadog_zypper_gpgcheck Override the default gpgcheck value (yes) - use no to turn off package GPG signature verification.
datadog_zypper_gpgkey Removed in version 4.18.0 Override the default URL to the Datadog zypper key used to verify Agent v5 and v6 (up to 6.13) packages (key ID 4172A230).
datadog_zypper_gpgkey_sha256sum Removed in version 4.18.0 Override the default checksum of the datadog_zypper_gpgkey key.
datadog_zypper_gpgkey_e09422b3 Override the default URL to the Datadog zypper key used to verify Agent v6.14+ packages (key ID E09422B3).
datadog_zypper_gpgkey_e09422b3_sha256sum Override the default checksum of the datadog_zypper_gpgkey_e09422b3 key.
datadog_agent_allow_downgrade Set to yes to allow Agent downgrade (use with caution, see defaults/main.yml for details). Note: Downgrades are not supported on Windows platforms.
datadog_enabled Set to false to prevent datadog-agent service from starting (defaults to true).
datadog_additional_groups Either a list, or a string containing a comma-separated list of additional groups for the datadog_user (Linux only).
datadog_windows_ddagentuser_name The name of the Windows user to create/use, in the format <domain>\<user> (Windows only).
datadog_windows_ddagentuser_password The password used to create the user and/or register the service (Windows only).
datadog_apply_windows_614_fix Whether or not to download and apply the file referenced by datadog_windows_614_fix_script_url (Windows only). See https://dtdg.co/win-614-fix for more details. You can set this to false if your hosts are not running Datadog Agent 6.14.*.
datadog_macos_user The name of the user to run the Agent under. The user has to exist; it is not created automatically. Defaults to ansible_user (macOS only).
datadog_macos_download_url Override the URL to download the DMG installer from (macOS only).
datadog_apm_instrumentation_enabled Configure APM instrumentation. Possible values are:
- host: Both the Agent and your services are running on a host.
- docker: The Agent and your services are running in separate Docker containers on the same host.
- all: Supports all the previous scenarios for host and docker at the same time.
datadog_apm_instrumentation_libraries List of APM libraries to install if host or docker injection is enabled (defaults to ["java", "js", "dotnet", "python", "ruby"]). You can find the available values in Inject Libraries Locally.
datadog_apm_instrumentation_docker_config Override Docker APM configuration. Read configure Docker injection for more details.
datadog_remote_updates Enable remote installation and updates through the datadog-installer.
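
As an illustration, the check-tracking variables above (datadog_disable_untracked_checks, datadog_checks, datadog_additional_checks) can be combined as follows; the check names are examples only:

datadog_disable_untracked_checks: true
datadog_checks:
  process:
    init_config:
    instances:
      - name: ssh
        search_string: ['ssh', 'sshd']
datadog_additional_checks:
  - disk  # this check's existing config file is kept even though it is not in datadog_checks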

Integrations

To configure a Datadog integration (check), add an entry to the datadog_checks section. The first level key is the name of the check, and the value is the YAML payload to write to the check's configuration file. Examples are provided below.

To install or remove an integration, refer to the datadog_integration section below.

Process check

To define two instances for the process check, use the configuration below. This creates the corresponding configuration files:

  • Agent v6 & v7: /etc/datadog-agent/conf.d/process.d/conf.yaml
  • Agent v5: /etc/dd-agent/conf.d/process.yaml

    datadog_checks:
      process:
        init_config:
        instances:
          - name: ssh
            search_string: ['ssh', 'sshd']
          - name: syslog
            search_string: ['rsyslog']
            cpu_check_interval: 0.2
            exact_match: true
            ignore_denied_access: true

Custom check

To configure a custom check, use the configuration below. This creates the corresponding configuration files:

  • Agent v6 & v7: /etc/datadog-agent/conf.d/my_custom_check.d/conf.yaml
  • Agent v5: /etc/dd-agent/conf.d/my_custom_check.yaml

    datadog_checks:
      my_custom_check:
        init_config:
        instances:
          - some_data: true

Custom Python Checks

To pass a Python check to the playbook, use the configuration below.

This configuration requires the Datadog play and role to be part of a larger playbook, where the value passed in is the file path to the check, relative to the actual task running on Linux or Windows.

This is only available for Agent v6 or later.

The key should be the name of the file created in the checks directory checks.d/{{ item }}.py:

    datadog_checks:
      my_custom_check:
        init_config:
        instances:
          - some_data: true
    datadog_custom_checks:
      my_custom_check: '../../../custom_checks/my_custom_check.py'
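
For reference, a minimal sketch of what custom_checks/my_custom_check.py itself might contain, assuming the Agent v6+ Python check base class (the metric name and tag are illustrative):

    from datadog_checks.base import AgentCheck

    class MyCustomCheck(AgentCheck):
        def check(self, instance):
            # 'some_data' comes from the instance entry defined under datadog_checks above
            if instance.get('some_data'):
                self.gauge('my_custom_check.some_metric', 1, tags=['origin:ansible'])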

Autodiscovery

When using Autodiscovery, there is no pre-processing or post-processing of the YAML. This means every YAML section is added to the final configuration file, including Autodiscovery identifiers.

The example below configures the PostgreSQL check through Autodiscovery:

    datadog_checks:
      postgres:
        ad_identifiers:
          - db-master
          - db-slave
        init_config:
        instances:
          - host: "%%host%%"
            port: "%%port%%"
            username: username
            password: password

Learn more about Autodiscovery in the Datadog documentation.

Tracing

To enable trace collection with Agent v6 or v7 use the following configuration:

datadog_config:
  apm_config:
    enabled: true

To enable trace collection with Agent v5 use the following configuration:

datadog_config:
  apm_enabled: "true" # has to be a string

Live processes

To enable live process collection with Agent v6 or v7 use the following configuration:

datadog_config:
  process_config:
    enabled: "true" # type: string

The possible values for enabled are: "true", "false" (only container collection), or "disabled" (disable live processes entirely).

Variables

The following variables are available for live processes:

  • scrub_args: Enables the scrubbing of sensitive arguments from a process command line (defaults to true).
  • custom_sensitive_words: Expands the default list of sensitive words used by the command line scrubber.

System probe

The system probe is configured under the system_probe_config variable. Any variables nested underneath are written to the system-probe.yaml, in the system_probe_config section.

Network Performance Monitoring (NPM) is configured under the network_config variable. Any variables nested underneath are written to the system-probe.yaml, in the network_config section.

Cloud Workload Security is configured under the runtime_security_config variable. Any variables nested underneath are written to the system-probe.yaml and security-agent.yaml, in the runtime_security_config section.

Universal Service Monitoring (USM) is configured under the service_monitoring_config variable. Any variables nested underneath are written to the system-probe.yaml, in the service_monitoring_config section.

Compliance is configured under the compliance_config variable. Any variables nested underneath are written to the security-agent.yaml, in the compliance_config section.

Note for Windows users: NPM is supported on Windows with Agent v6.27+ and v7.27+. It ships as an optional component that is only installed if network_config.enabled is set to true when the Agent is installed or upgraded. Because of this, existing installations might need to uninstall and reinstall the Agent once to get the NPM component, unless the Agent is upgraded at the same time.

Example configuration

datadog_config:
  process_config:
    enabled: "true" # type: string
    scrub_args: true
    custom_sensitive_words: ['consul_token','dd_api_key']
system_probe_config:
  sysprobe_socket: /opt/datadog-agent/run/sysprobe.sock
network_config:
  enabled: true
service_monitoring_config:
  enabled: true
runtime_security_config:
  enabled: true

Note: This configuration works with Agent 6.24.1+ and 7.24.1+. For older Agent versions, see the Network Performance Monitoring documentation on how to enable system-probe.

On Linux, once this modification is complete, follow the steps below if you installed an Agent version older than 6.18.0 or 7.18.0:

  1. Start the system-probe: sudo service datadog-agent-sysprobe start. Note: If the service wrapper is not available on your system, run this command instead: sudo initctl start datadog-agent-sysprobe.
  2. Restart the Agent: sudo service datadog-agent restart.
  3. Enable the system-probe to start on boot: sudo systemctl enable datadog-agent-sysprobe.

For manual setup, see the NPM documentation.

Agent v5

To enable live process collection with Agent v5, use the following configuration:

datadog_config:
  process_agent_enabled: true
datadog_config_ex:
  process.config:
    scrub_args: true
    custom_sensitive_words: "<FIRST_WORD>,<SECOND_WORD>"

Versions

By default, the current major version of the Datadog Ansible role installs Agent v7. The variables datadog_agent_version and datadog_agent_major_version are available to control the Agent version installed.

For v4+ of this role, when datadog_agent_version is used to pin a specific Agent version, the role derives per-OS version names to comply with the version naming schemes of the supported operating systems, for example:

  • 1:7.16.0-1 for Debian and SUSE based
  • 7.16.0-1 for RedHat-based
  • 7.16.0-1 for macOS
  • 7.16.0 for Windows.

This makes it possible to target hosts running different operating systems in the same Ansible run, for example:

Provided Installs System
datadog_agent_version: 7.16.0 1:7.16.0-1 Debian and SUSE-based
datadog_agent_version: 7.16.0 7.16.0-1 RedHat-based
datadog_agent_version: 7.16.0 7.16.0-1 macOS
datadog_agent_version: 7.16.0 7.16.0 Windows
datadog_agent_version: 1:7.16.0-1 1:7.16.0-1 Debian and SUSE-based
datadog_agent_version: 1:7.16.0-1 7.16.0-1 RedHat-based
datadog_agent_version: 1:7.16.0-1 7.16.0 Windows

Note: If the version is not provided, the role uses 1 as the epoch and 1 as the release number.

Agent v5 (older version):

The Datadog Ansible role includes support for Datadog Agent v5 for Linux only. To install Agent v5, use datadog_agent_major_version: 5 to install the latest version of Agent v5 or set datadog_agent_version to a specific version of Agent v5. Note: The datadog_agent5 variable is obsolete and has been removed.
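
For example, a playbook installing the latest Agent v5, mirroring the Agent v6 example shown later:

- hosts: servers
  roles:
    - { role: datadog.datadog, become: yes }
  vars:
    datadog_agent_major_version: 5
    datadog_api_key: "<YOUR_DD_API_KEY>"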

Repositories

Linux

When the variables datadog_apt_repo, datadog_yum_repo, and datadog_zypper_repo are not set, the official Datadog repositories for the major version set in datadog_agent_major_version are used:

Version Default apt repository Default yum repository Default zypper repository
5 deb https://apt.datadoghq.com stable main https://yum.datadoghq.com/rpm https://yum.datadoghq.com/suse/rpm
6 deb https://apt.datadoghq.com stable 6 https://yum.datadoghq.com/stable/6 https://yum.datadoghq.com/suse/stable/6
7 deb https://apt.datadoghq.com stable 7 https://yum.datadoghq.com/stable/7 https://yum.datadoghq.com/suse/stable/7

To override the default behavior, set these variables to something other than an empty string.
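
For example, to point apt at an internal mirror (the mirror URL and suite are illustrative), following the signed-by format described in the role variables above:

datadog_apt_repo: "deb [signed-by=/usr/share/keyrings/datadog-archive-keyring.gpg] https://mirror.example.com/datadog stable 7"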

If you previously used the Agent v5 variables, use the new variables below with datadog_agent_major_version set to 5 or datadog_agent_version pinned to a specific Agent v5 version.

Old New
datadog_agent5_apt_repo datadog_apt_repo
datadog_agent5_yum_repo datadog_yum_repo
datadog_agent5_zypper_repo datadog_zypper_repo

Since version 4.9.0, the use_apt_backup_keyserver variable has been removed, as APT keys are obtained from https://keys.datadoghq.com.

Windows

When the variable datadog_windows_download_url is not set, the official Windows MSI package corresponding to the datadog_agent_major_version is used:

Agent version Default Windows MSI package URL
6 https://s3.amazonaws.com/ddagent-windows-stable/datadog-agent-6-latest.amd64.msi
7 https://s3.amazonaws.com/ddagent-windows-stable/datadog-agent-7-latest.amd64.msi

To override the default behavior, set this variable to something other than an empty string.

macOS

When the variable datadog_macos_download_url is not set, the official macOS DMG package corresponding to the datadog_agent_major_version is used:

Agent version Default macOS DMG package URL
6 https://install.datadoghq.com/datadog-agent-6-latest.dmg
7 https://install.datadoghq.com/datadog-agent-7-latest.dmg

To override the default behavior, set this variable to something other than an empty string.

Upgrade

To upgrade from Agent v6 to v7, use datadog_agent_major_version: 7 to install the latest version or set datadog_agent_version to a specific version of Agent v7. Use similar logic to upgrade from Agent v5 to v6.
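
For example, to move hosts to the latest Agent v7:

- hosts: servers
  roles:
    - { role: datadog.datadog, become: yes }
  vars:
    datadog_agent_major_version: 7
    datadog_api_key: "<YOUR_DD_API_KEY>"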

Integration installation

Available for Agent v6.8+

Use the datadog_integration resource to install a specific version of a Datadog integration. Keep in mind that the Agent comes with the core integrations already installed. This resource is useful for upgrading a specific integration without upgrading the whole Agent. For more details, see integration management.

If you want to configure an integration, refer to the datadog_checks section above.

Available actions:

  • install: Installs a specific version of the integration.
  • remove: Removes an integration.

Third party integrations

Datadog community and Datadog Marketplace integrations can be installed with the datadog_integration resource. Note: These integrations are considered to be "third party" and thus need third_party: true to be set; see the example below.

Syntax

  datadog_integration:
    <INTEGRATION_NAME>:
      action: <ACTION>
      version: <VERSION_TO_INSTALL>

To install third party integrations, set third_party to true:

  datadog_integration:
    <INTEGRATION_NAME>:
      action: <ACTION>
      version: <VERSION_TO_INSTALL>
      third_party: true

Example

This example installs version 1.11.0 of the ElasticSearch integration and removes the postgres integration.

 datadog_integration:
   datadog-elastic:
     action: install
     version: 1.11.0
   datadog-postgres:
     action: remove

To see the available versions of Datadog integrations, see their CHANGELOG.md file in the integrations-core repository.

Downgrade

To downgrade to a prior version of the Agent:

  1. Set datadog_agent_version to a specific version, for example: 5.32.5.
  2. Set datadog_agent_allow_downgrade to yes.

Notes:

  • Downgrades are not supported for Windows platforms.
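
Putting the two steps together, an example vars block (the version shown is illustrative):

datadog_agent_version: "5.32.5"
datadog_agent_allow_downgrade: yes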

Playbooks

Below are some sample playbooks to assist you with using the Datadog Ansible role.

The following example sends data to Datadog US (the default), enables log collection and NPM, and configures a few checks.

- hosts: servers
  roles:
    - { role: datadog.datadog, become: yes }
  vars:
    datadog_api_key: "<YOUR_DD_API_KEY>"
    datadog_agent_version: "7.16.0"
    datadog_config:
      tags:
        - "<KEY>:<VALUE>"
        - "<KEY>:<VALUE>"
      log_level: INFO
      apm_config:
        enabled: true
      logs_enabled: true  # available with Agent v6 and v7
    datadog_checks:
      process:
        init_config:
        instances:
          - name: ssh
            search_string: ['ssh', 'sshd' ]
          - name: syslog
            search_string: ['rsyslog' ]
            cpu_check_interval: 0.2
            exact_match: true
            ignore_denied_access: true
      ssh_check:
        init_config:
        instances:
          - host: localhost
            port: 22
            username: root
            password: <YOUR_PASSWORD>
            sftp_check: True
            private_key_file:
            add_missing_keys: True
      nginx:
        init_config:
        instances:
          - nginx_status_url: http://example.com/nginx_status/
            tags:
              - "source:nginx"
              - "instance:foo"
          - nginx_status_url: http://example2.com:1234/nginx_status/
            tags:
              - "source:nginx"
              - "<KEY>:<VALUE>"

        # Log collection is available with Agent v6 and v7
        logs:
          - type: file
            path: /var/log/access.log
            service: myapp
            source: nginx
            sourcecategory: http_web_access
          - type: file
            path: /var/log/error.log
            service: nginx
            source: nginx
            sourcecategory: http_web_access
    # datadog_integration is available on Agent 6.8+
    datadog_integration:
      datadog-elastic:
        action: install
        version: 1.11.0
      datadog-postgres:
        action: remove
    network_config:
      enabled: true

Agent v6

This example installs the latest Agent v6:

- hosts: servers
  roles:
    - { role: datadog.datadog, become: yes }
  vars:
    datadog_agent_major_version: 6
    datadog_api_key: "<YOUR_DD_API_KEY>"

Configuring the site

If using a site other than the default datadoghq.com, set the datadog_site var to the appropriate URL (eg: datadoghq.eu, us3.datadoghq.com).

This example sends data to the EU site:

- hosts: servers
  roles:
    - { role: datadog.datadog, become: yes }
  vars:
    datadog_site: "datadoghq.eu"
    datadog_api_key: "<YOUR_DD_API_KEY>"

Windows

On Windows, remove the become: yes option so the role does not fail. Below are two methods to make the example playbooks work with Windows hosts:

Inventory file

Using the inventory file is the recommended approach. Set the ansible_become option to no in the inventory file for each Windows host:

[servers]
linux1 ansible_host=127.0.0.1
linux2 ansible_host=127.0.0.2
windows1 ansible_host=127.0.0.3 ansible_become=no
windows2 ansible_host=127.0.0.4 ansible_become=no

To avoid repeating the same configuration for all Windows hosts, group them and set the variable at the group level:

[linux]
linux1 ansible_host=127.0.0.1
linux2 ansible_host=127.0.0.2

[windows]
windows1 ansible_host=127.0.0.3
windows2 ansible_host=127.0.0.4

[windows:vars]
ansible_become=no

Playbook file

Alternatively, if your playbook only runs on Windows hosts, use the following in the playbook file:

- hosts: servers
  roles:
    - { role: datadog.datadog }
  vars:
    ...

Note: This configuration fails on Linux hosts. Only use it if the playbook is specific to Windows hosts. Otherwise, use the inventory file method.

Uninstallation

On Windows it's possible to uninstall the Agent by using the following code in your Ansible role:

- name: Check If Datadog Agent is installed
  win_shell: |
    (@(Get-ChildItem -Path "HKLM:SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall" -Recurse) | Where {$_.GetValue("DisplayName") -like "Datadog Agent" }).PSChildName
  register: agent_installed_result
- name: Set Datadog Agent installed fact
  set_fact:
    agent_installed: "{{ agent_installed_result.stdout | trim }}"
- name: Uninstall the Datadog Agent
  win_package:
    product_id: "{{ agent_installed }}"
    state: absent
  when: agent_installed != ""

Troubleshooting

Debian stretch

Note: this information applies to versions of the role prior to 4.9.0. Since 4.9.0, the role no longer uses the apt_key module.

On Debian Stretch, the apt_key module used by the role requires an additional system dependency (dirmngr) to work correctly, and the dependency is not installed by the module. Add the following configuration to your playbooks to use this role:

---
- hosts: all
  pre_tasks:
    - name: Debian Stretch requires the dirmngr package to use apt_key
      become: yes
      apt:
        name: dirmngr
        state: present
  roles:
    - { role: datadog.datadog, become: yes }
  vars:
    datadog_api_key: "<YOUR_DD_API_KEY>"

CentOS 6/7 with Python 3 interpreter and Ansible 2.10.x or below

The yum Python module, which this role uses to install the Agent on CentOS-based hosts, is only available for Python 2 when Ansible 2.10.x or below is used. In such cases, the dnf package manager has to be used instead.

However, dnf and the dnf Python module are not installed by default on CentOS-based hosts before CentOS 8. In this case, it is not possible to install the Agent when a Python 3 interpreter is used.

This role fails early when this situation is detected to indicate that Ansible 2.11+ or a Python 2 interpreter is needed when installing the Agent on CentOS / RHEL < 8.

To bypass this early failure detection (for instance, if dnf and the python3-dnf package are available on your host), set the datadog_ignore_old_centos_python3_error variable to true.
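
For example, assuming dnf and the python3-dnf package are indeed available on the host:

datadog_ignore_old_centos_python3_error: true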

Windows

Due to a critical bug in Agent versions 6.14.0 and 6.14.1 on Windows, installation of these versions is blocked (starting with version 3.3.0 of this role).

NOTE: Ansible fails on Windows if datadog_agent_version is set to 6.14.0 or 6.14.1. Use 6.14.2 or above.

If you are updating from 6.14.0 or 6.14.1 on Windows, use the following steps:

  1. Upgrade this datadog.datadog Ansible role to the latest version (>=3.3.0).
  2. Set the datadog_agent_version to 6.14.2 or above (defaults to latest).

For more details, see Critical Bug in Uninstaller for Datadog Agent 6.14.0 and 6.14.1 on Windows.

Ubuntu 20.04 broken by service_facts

Running the service_facts module on Ubuntu 20.04 causes the following error:

localhost | FAILED! => {
    "changed": false,
    "msg": "Malformed output discovered from systemd list-unit-files: accounts-daemon.service                    enabled         enabled      "
}

To fix this, update Ansible to v2.9.8 or above.

Missing API key

Starting from role version 4.21, the API key is mandatory for the role to proceed.

If you need to install the Agent through Ansible but don't want to specify an API key (for instance, when baking it into a container or VM image), you can:

  • Specify a dummy API key and replace it afterward
  • Disable config management by the role (datadog_manage_config: false)
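
For example, a sketch combining both options when baking an image (the dummy key is illustrative):

datadog_api_key: "00000000000000000000000000000000"  # dummy key, real key supplied outside of Ansible
datadog_manage_config: false  # the role installs the Agent but leaves datadog.yaml unmanaged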


ansible-datadog's Issues

restart datadog-handler permission denied

After updating a basic check, the role fails to restart the agent.

fatal: [hostname]: FAILED! => {"changed": false, "failed": true, "msg": "start-stop-daemon: warning: failed to kill 23378: Operation not permitted\nTraceback (most recent call last):\n File \"/opt/datadog-agent/bin/supervisord\", line 11, in <module>\n load_entry_point('supervisor==3.3.0', 'console_scripts', 'supervisord')()\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/supervisor/supervisord.py\", line 365, in main\n go(options)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/supervisor/supervisord.py\", line 375, in go\n d.main()\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/supervisor/supervisord.py\", line 78, in main\n info_messages)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/supervisor/options.py\", line 1398, in make_logger\n stdout = self.nodaemon,\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/supervisor/loggers.py\", line 346, in getLogger\n handlers.append(RotatingFileHandler(filename,'a',maxbytes,backups))\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/supervisor/loggers.py\", line 172, in __init__\n FileHandler.__init__(self, filename, mode)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/supervisor/loggers.py\", line 98, in __init__\n self.stream = open(filename, mode)\nIOError: [Errno 13] Permission denied: '/var/log/datadog/supervisord.log'\nstart-stop-daemon: warning: failed to kill 23378: Operation not permitted\n"}

Can't set thresholds on process checks

Hi,

I'm trying to set thresholds on process checks but the following yaml doesn't result in output datadog-agent can use (regardless of any quoting I tried to introduce):

process:
    init_config:
      pid_cache_duration: 10
    instances:
      - name: sshd
        collect_children: true
        pid_file: /run/sshd.pid
        thresholds:
          critical: [1,150]

The 'to_nice_yaml' filter used in checks.yaml.j2 interprets [1,150] as a list and results in the following output:

init_config:
    pid_cache_duration: 10
instances:
-   collect_children: true
    name: sshd
    pid_file: /run/sshd.pid
    thresholds:
    - 1
    - 150

What I expect to see is this:

init_config:
    pid_cache_duration: 10
instances:
-   collect_children: true
    name: sshd
    pid_file: /run/sshd.pid
    thresholds: [1,150]

just as described at https://github.com/DataDog/integrations-core/blob/master/process/conf.yaml.example#L46

Long mysql string breaks .yml config file

The following long query statement in a mysql check causes Datadog to not start.
Check definition

datadog_checks:
    mysql:
        init_config: null
        instances:
        -   options:
                disable_innodb_metrics: false
                extra_innodb_metrics: true
                extra_performance_metrics: true
                extra_status_metrics: true
                galera_cluster: false
                replication: true
                schema_size_metrics: false
            pass: ***********
            queries:
            -   field: services_completed_in_prev_5_mins
                metric: app.services.completed_in_prev_5_mins
                query: SELECT COUNT(*) AS services_completed_in_prev_5_mins FROM companies.services WHERE created_at > DATE_SUB(NOW(), INTERVAL 5 MINUTE);
                type: guage

The result in datadog.yml:

            -   field: bookings_started_in_prev_5_mins
                metric: app.bookings.started_in_prev_5_mins
                query: SELECT COUNT(*) AS bookings_started_in_prev_5_mins FROM travelgroup.abandonedbookings
                    WHERE created_at > DATE_SUB(NOW(), INTERVAL 5 MINUTE);
                type: guage

The error:
Couldn't initialize logging: File contains parsing errors: <???> [line 26]: 'WHERE created_at > DATE_SUB(NOW(), INTERVAL 5 MINUTE);\n'

Replacing to_nice_yaml on https://github.com/DataDog/ansible-datadog/blob/master/templates/datadog.conf.j2#L17 with to_yaml resolves it.

Issues with Centos 7

Running this role on a Centos 7 instance produces the following error:

IOError: [Errno 13] Permission denied: '/var/log/datadog/supervisord.log'

Tried applying appropriate permissions for the "dd-agent" user but that didn't work.

Solution for me was to update the user to root under the following config file / section:

file:/etc/dd-agent/supervisor.conf
section: [supervisord]

Anyone else have similar issues?

Removing entries from `datadog_checks` doesn't delete the file

Resulting in the Datadog agent attempting to run checks which are likely to fail.

Steps to reproduce

  1. Configure playbook w/ datadog role & an arbitrary check
  2. Run playbook
  3. Remove entry from datadog_checks and replace it with a different one
  4. Run playbook again

Expected result

Check is removed and agent is restarted, only running the new check.

Actual result

Old check is still run, as the /etc/dd-agent/checks.d/<check>.yaml file is still present.

Ideally we'd be doing immutable deploys and this wouldn't be an issue, but c'est la vie.

logs check not generating valid yaml

OS: Ubuntu 16.04
Ansible: 2.5.0

datadog.yml (ansible playbook)

- hosts: all
  roles:
    - { role: Datadog.datadog, become: yes }  # On Ansible < 1.9, use `sudo: yes` instead of `become: yes`
  vars:
    datadog_api_key: "REDACTED"
    datadog_agent_version: "1:6.1.3-1" # for apt-based platforms, use a `6.0.0-1` format on yum-based platforms
    datadog_config:
      tags: "ddtag"
      log_level: INFO
      apm_enabled: "false" # has to be set as a string
      logs_enabled: true  # log collection is available on agent 6
    datadog_checks:
      logs:
        - type: file
          path: /var/log/alternatives.log
          service: syslog
          source: syslog
        - type: file
          path: /var/log/apt/history.log
          service: syslog
          source: syslog
        - type: file
          path: /var/log/auth.log
          service: syslog
          source: syslog
        - type: file
          path: /var/log/cloud-init.log
          service: syslog
          source: syslog
        - type: file
          path: /var/log/dpkg.log
          service: syslog
          source: syslog
        - type: file
          path: /var/log/kern.log
          service: syslog
          source: syslog
        - type: file
          path: /var/log/syslog
          service: syslog
          source: syslog
        - type: file
          path: /var/log/upstart/systemd-logind.log
          service: syslog
          source: syslog

generates:
/etc/datadog-agent/conf.d/logs.yml (datadog-agent config file)

-   path: /var/log/alternatives.log
    service: syslog
    source: syslog
    type: file
-   path: /var/log/apt/history.log
    service: syslog
    source: syslog
    type: file
-   path: /var/log/auth.log
    service: syslog
    source: syslog
    type: file
-   path: /var/log/cloud-init.log
    service: syslog
    source: syslog
    type: file
-   path: /var/log/dpkg.log
    service: syslog
    source: syslog
    type: file
-   path: /var/log/kern.log
    service: syslog
    source: syslog
    type: file
-   path: /var/log/syslog
    service: syslog
    source: syslog
    type: file
-   path: /var/log/upstart/systemd-logind.log
    service: syslog
    source: syslog
    type: file

errors: /var/log/datadog/agent.log

2018-04-17 17:51:21 UTC | INFO | (start.go:156 in StartAgent) | Starting Datadog Agent v6.1.3
2018-04-17 17:51:21 UTC | INFO | (start.go:167 in StartAgent) | pid '117229' written to pid file '/opt/datadog-agent/run/agent.pid'
2018-04-17 17:51:21 UTC | INFO | (start.go:174 in StartAgent) | Hostname is: REDACTED
2018-04-17 17:51:22 UTC | INFO | (start.go:191 in StartAgent) | GUI server port -1 specified: not starting the GUI.
2018-04-17 17:51:22 UTC | INFO | (forwarder.go:235 in Start) | DefaultForwarder started (1 workers), sending to 1 endpoint(s): "https://6-1-
3-app.agent.datadoghq.com" (1 api key(s))
2018-04-17 17:51:22 UTC | ERROR | (integration_config.go:89 in buildLogSourcesFromDirectory) | While parsing config: yaml: unmarshal errors:
  line 1: cannot unmarshal !!seq into map[string]interface {}
2018-04-17 17:51:22 UTC | ERROR | (start.go:228 in StartAgent) | Could not start logs-agent: could not find any valid logs configuration

This file is missing logs: on the first line. Adding it manually works.

Add Datadog user to Docker user group, if applicable

It seems like the Datadog agent process doesn't have enough permissions to inspect /var/run/docker.sock. We've had to add a manual step to add the Datadog agent process user to the Docker group in order to enable using the Docker checks.

Adding this capability to the role would prove useful to us, and perhaps others trying to get their checks working properly.

Is there support for using dogstatsd?

I don't see an option to add these to the datadog.yml file if you are using dogstatsd:

use_dogstatsd: yes
dogstatsd_port: 8125

Or am I missing something?

Thanks,
Jon.

{} placed at end of /etc/datadog-agent/datadog.yaml

Using the latest version of the Ansible role from the Galaxy repo, I have the following issue.

Replication:

Set the API key variable
Enable dd agent 6 (I believe this same error may happen regardless of agent 6)
Run the role.

Result:

The service will fail to start and a {} is placed at the end of the datadog.yaml file.
Removing the {} causes the service to start properly.

Invalid yaml from `datadog_checks`

Trying to generate docker daemon config with:

...blah-blah-blah...
  vars:                                       
    datadog_checks:                           
      docker:                                 
        init_config:                          
        instances:                            
          - url: "unix://var/run/docker.sock" 

and getting docker.yaml, which does not work:

init_config: null
instances:
-   url: unix://var/run/docker.sock

Likely related to this issue with jinja's to_nice_yaml

include has been deprecated

Use of the include: statement is deprecated since Ansible 2.4.0:

[DEPRECATION WARNING]: The use of 'include' for tasks has been deprecated. Use 'import_tasks' for static inclusions or 'include_tasks' for dynamic inclusions. This feature will be removed in a future release. Deprecation warnings can
 be disabled by setting deprecation_warnings=False in ansible.cfg.
[DEPRECATION WARNING]: include is kept for backwards compatibility but usage is discouraged. The module documentation details page may explain more about this rationale.. This feature will be removed in a future release. Deprecation
warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.

Apt key has expired

$ apt-key list
/etc/apt/trusted.gpg
--------------------
...

pub   2048R/C7A7DA52 2011-03-16 [expired: 2018-03-04]
uid                  Datadog Packages <[email protected]>

As a result, the "Ensure ubuntu apt-key server is present" Debian step is failing.

Issues with CentOS 5

We have a few legacy servers that are still running on CentOS 5. This brought to light a couple of issues:

  1. It appears CentOS 5 does not support sha256
TASK [Datadog.datadog : Download new RPM key] **********************************
fatal: [west01]: FAILED! => {"changed": false, "failed": true, "msg": "Could not hash file '/tmp/DATADOG_RPM_KEY_E09422B3.public' with algorithm 'sha256'. Available algorithms: sha1, md5"}
	to retry, use: --limit @/home/holmser/code/systems/ansible-files/playbooks/ccs-servers.retry
  2. CentOS 5 does not support TLS. SSLv3 is the latest.
httplib.py\", line 685, in _send_output\n    self.send(msg)\n  File \"/usr/lib64/python2.4/httplib.py\", line 652, in send\n    self.connect()\n  File \"/usr/lib64/python2.4/site-packages/M2Crypto/httpslib.py\", line 47, in connect\n    self.sock.connect((self.host, self.port))\n  File \"/usr/lib64/python2.4/site-packages/M2Crypto/SSL/Connection.py\", line 156, in connect\n    ret = self.connect_ssl()\n  File \"/usr/lib64/python2.4/site-packages/M2Crypto/SSL/Connection.py\", line 149, in connect_ssl\n    return m2.ssl_connect(self.ssl, self._timeout)\nM2Crypto.SSL.SSLError: sslv3 alert handshake failure\n", "rc": 1, "results": []}

[DEPRECATION WARNING]: Skipping task due to undefined attribute

With ansible 2.0 we get this warning while using the role:

TASK [Datadog.datadog : service] ***********************************************
ok: [atlas2.do.citrite.net] => {"changed": false, "enabled": true, "name": "datadog-agent", "state": "started"}

TASK [Datadog.datadog : service] ***********************************************
skipping: [atlas2.do.citrite.net] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

TASK [Datadog.datadog : Create a configuration file for each Datadog check] ****
[DEPRECATION WARNING]: Skipping task due to undefined attribute, in the future this will be a fatal error.. This feature will be removed in a future release. Deprecation warnings
can be disabled by setting deprecation_warnings=False in ansible.cfg.

Add possibility to change YUM repo url

Can you provide some default variables for the yum repos?
Just like with apt, so that we can override these with the URL and key from our local mirror.

These vars should be sufficient:

  • datadog_yum_url
  • datadog_yum_key
  • datadog_yum_enabled

[DEPRECATION WARNING]: ...datadog_checks.keys()

Hi guys,

Using Ansible 2.0.2.0 to deploy to Debian 8.4.
Getting this:

[DEPRECATION WARNING]: Using bare variables is deprecated. Update your playbooks so that the environment value uses the full variable syntax
('{{datadog_checks.keys()}}').
This feature will be removed in a future release. Deprecation warnings can be disabled by setting deprecation_warnings=False in
ansible.cfg.

Thank you!

/etc/datadog-agent/conf.d does not exist

I'm noticing that when doing a fresh install of the DataDog agent (v6 via the ansible-datadog v2.0.1 role), we get an error due to a non-existent directory. It's easy to fix manually with a mkdir -p, but I wanted to surface the issue.

TASK [monitoring/Datadog.datadog : Create a configuration file for each Datadog check] 
*******************************
failed: [our.ansiblized.host] (item=postfix) => {
    "changed": false,
    "checksum": "fc33569545c6669cbea509271db144ec81ddc13f",
    "item": "postfix"
}
MSG:
Destination directory /etc/datadog-agent/conf.d does not exist

Permission denied on apt-get when using datadog role

I am seeing an issue with the datadog role where it is failing due to apt-get not being available. Below is the output from ansible:

TASK [Datadog.datadog : apt] ***************************************************
ok: [52.56.74.57]

TASK [Datadog.datadog : apt_key] ***********************************************
changed: [52.56.74.57]

TASK [Datadog.datadog : apt_key] ***********************************************
skipping: [52.56.74.57]

TASK [Datadog.datadog : apt_key] ***********************************************
changed: [52.56.74.57]

TASK [Datadog.datadog : apt_key] ***********************************************
skipping: [52.56.74.57]

TASK [Datadog.datadog : apt_repository] ****************************************
changed: [52.56.74.57]

TASK [Datadog.datadog : apt] ***************************************************
fatal: [52.56.74.57]: FAILED! => {"cache_update_time": 1482931915, "cache_updated": false, "changed": false, "failed": true, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"     install 'datadog-agent'' failed: E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)\nE: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?\n", "stderr": "E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)\nE: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?\n", "stdout": "", "stdout_lines": []}

It definitely has sudo permission as the same playbook performs other tasks.
Could it be because the previous apt_repository task hasn't completed?

I have just manually added a pause task (http://docs.ansible.com/ansible/pause_module.html) in between apt_repository and apt (for 30 seconds, just to test) and everything goes through correctly and installs. Looks like that was it!

For your reference I am running Ubuntu 16.04.1 LTS.

Hope that helps?

Check datadog agent `info` for warnings & errors, failing the play if any are found

As a user, I expect an attempt to configure the datadog-agent improperly would result in a play failure. Instead, it fails silently and I don't realize monitoring is broken until after the fact.

It seems this could be accomplished with something along these lines:

- name: Get datadog-agent status
  command: /etc/init.d/datadog-agent info
  register: agent_info

- fail:
    msg: "Something seems wrong with your configuration, please check the docs"
  when: "'WARNING' in agent_info.stdout or 'ERROR' in agent_info.stdout"

Checks YML files are not generated correctly

When using checks other than process, the http_check for example, the YAML file does not get generated correctly and is ignored by the dd-agent.

Example

Source playbook yaml:

  - hosts: all
    roles:
      - { role: Datadog.datadog, become: yes }  # On Ansible < 1.9, use `sudo: yes` instead of `become: yes`
    vars:
      datadog_api_key: "xxxxxxxxxxxxxxxxxxxxxxxxxxx"
      datadog_config:
        tags: "environment:production, moretags"
        log_level: INFO
      datadog_checks:
        process:
          init_config:
          instances:
            - name: ssh
              search_string: ['ssh', 'sshd' ]
        http:
           init_config:

           instances:
             - name: somewebsite Login Page
               url: https://someurl.com/dispatcher/login.jsp
               timeout: 5

               content_match: 'Login'
               collect_response_time: true
               skip_event: true

               tags:
                 - service:somewebsite
                 - url:someurl.com

Generated check yaml:

init_config: null
instances:
-   collect_response_time: true
    content_match: Login
    name: somewebsite Login Page
    skip_event: true
    tags:
    - service:somewebsite
    - url:someurl.com
    timeout: 5
    url: https://someurl.com/dispatcher/login.jsp

The keys are put in the destination file alphabetically instead of in the order they were specified. (- name: check_name is missing)

Why install python-psutil?

Why does your task use apt to explicitly install python-psutil? Is it required by datadog, or merely optional? Should it have been included as a dependency of the datadog-agent package?

Ansible dry-run failed on pkg-redhat.yml

The "Import new RPM key" task failed when I ran "tasks/pkg-redhat.yml" in dry-run mode, because dry-run mode had not created the file "/tmp/DATADOG_RPM_KEY_E09422B3.public".
Also, files in the /tmp directory may sometimes be deleted depending on system settings.
So, I added "always_run: yes" to the get_url task (in Ansible 2.x this must be changed to check_mode: no).

- name: Download new RPM key
  get_url:
    url: "http://yum.datadoghq.com/DATADOG_RPM_KEY_E09422B3.public"
    dest: /tmp/DATADOG_RPM_KEY_E09422B3.public
    sha256sum: 694a2ffecff85326cc08e5f1a619937999a5913171e42f166e13ec802c812085

- name: Import new RPM key
  rpm_key: key=/tmp/DATADOG_RPM_KEY_E09422B3.public state=present

Check Mode ("Dry Run") fails in pkg-debian.yml because the deb repo hasn't been added to sources

Hi, thanks for creating the role! This is very straightforward to fix, but I'm not sure which approach you would prefer. I'm happy to open a PR, just let me know what route you would like to take.

Issue:

If you have never run pkg-debian.yml, the system will not have the datadog deb repo added to sources. This causes this task to fail, as the package cannot be found.

Version: 1.4.0
File: Datadog.datadog/tasks/pkg-debian.yml:40
Error:

TASK [Datadog.datadog : Ensure Datadog agent is installed] *********************
fatal: [10.1.2.220]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'dat
adog-agent' is available"}

Resolution:

Two ways to fix this.

  1. Add when: not ansible_check_mode conditional to this step so that it is skipped when running ansible-playbook --check
  2. Add check_mode: no to the prior tasks that set up the apt-key and source for the datadog deb repo which will force the key/deb repo to be added to sources on the system when running with --check

I usually go for option two on my systems because I don't care about a few keys being added even if not yet being used and would prefer to let the apt module be dry-run for testing. However, I suspect that with a public role such as this that people may want to avoid adding the key cruft so option one may be better.

Let me know, thanks!

Creation of munged rabbitmq.yaml files -- but it works fine for other integrations

When I use this role to configure agent integrations for Postgres, Mongo, and Docker it works just fine. With RabbitMQ the generated YAML file gets munged inexplicably.

I send code like this to the role to configure RabbitMQ:

  roles:
    - { role: Datadog.datadog, sudo: true }
    - datadog
  vars:
    datadog_api_key: "0000000000000"
    datadog_checks:
      rabbitmq:
        init_config:
        instances:
        - rabbitmq_api_url: https://hostname1.com:15672/api/
          rabbitmq_user: ********
          rabbitmq_pass: ********
          tags:
            - env:prod
            - prod_rmq4
            - prod_rabbitmq
          queues_regexes:
            - .*WebReportJobQueue.*
            - .*tx_history_queue.*
          vhosts:
            - prod
            - prodjob

And when I run it in a playbook, it generates an /etc/dd-agent/conf.d/rabbitmq.yaml like this:

init_config: null
instances:
-   queues_regexes:
    - .*WebReportJobQueue.*
    - .*tx_history_queue.*
    rabbitmq_api_url: https://hostname1.com:15672/api/
    rabbitmq_pass: ********
    rabbitmq_user: ********
    tags:
    - env:prod
    - prod_rmq4
    - prod_rabbitmq
    vhosts:
    - prod
    - prodjob

Is there a bug or am I just stupidly screwing up the config YAML?

Allow multiple config file per check

Right now a check will be configured in conf.d/<check>.d/conf.yaml. This behavior might be an issue when people are using autodiscovery. Customers might have multiple containers for one check needing different configurations.

Bottom line: this role should allow customers to override the file name of the configuration.

Be able to unset "use_mount" from datadog.conf

Using the use_mount variable in the main config is deprecated, but when using this module we have no way to remove it, and we get this warning all the time when restarting the agent:

dd.collector[5182]: WARNING (disk.py:84): Using use_mount in datadog.conf has been deprecated in favor of use_mount in disk.yaml

Would it be possible to remove it somehow? Thanks!

ansible fails to install datadog on debian stretch

TASK [Datadog.datadog : Install ubuntu apt-key server] *************************************************
fatal: [manager0]: FAILED! => 
{
  "changed": false,
  "cmd": "/usr/bin/apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv A2923DFF56EDA6E76E55E492D3A80E30382E94DE",
  "failed": true,
  "msg": "Error fetching key A2923DFF56EDA6E76E55E492D3A80E30382E94DE from keyserver: hkp://keyserver.ubuntu.com:80",
  "rc": 2,
  "stderr": "Warning: apt-key output should not be parsed (stdout is not a terminal)\ngpg: failed to start the dirmngr '/usr/bin/dirmngr': No such file or directory\ngpg: connecting dirmngr at '/tmp/apt-key-gpghome.YVCEPPnGZk/S.dirmngr' failed: No such file or directory\ngpg: keyserver receive failed: No dirmngr\n",
  "stderr_lines": [
    "Warning: apt-key output should not be parsed (stdout is not a terminal)",
    "gpg: failed to start the dirmngr '/usr/bin/dirmngr': No such file or directory",
    "gpg: connecting dirmngr at '/tmp/apt-key-gpghome.YVCEPPnGZk/S.dirmngr' failed: No such file or directory",
    "gpg: keyserver receive failed: No dirmngr"
  ],
  "stdout": "Executing: /tmp/apt-key-gpghome.YVCEPPnGZk/gpg.1.sh --keyserver hkp://keyserver.ubuntu.com:80 --recv A2923DFF56EDA6E76E55E492D3A80E30382E94DE\n",
  "stdout_lines": [
    "Executing: /tmp/apt-key-gpghome.YVCEPPnGZk/gpg.1.sh --keyserver hkp://keyserver.ubuntu.com:80 --recv A2923DFF56EDA6E76E55E492D3A80E30382E94DE"
  ]
}

"failed determining service state, possible typo of service name?"

I'm getting this error

TASK: [bakins.datadog | service name=datadog-agent state=started enabled=yes] *** 
failed: [web] => {"failed": true}
msg: failed determining service state, possible typo of service name?

Any idea why?... This is my playbook


---
- hosts: web
  sudo: yes

  roles:
    - bakins.datadog
  vars:    
    datadog_api_key: <datadog api key>

kafka.yml file not generated correctly

Copying and pasting the yaml from the kafka.yaml.example produces a yaml file that fails to work with the agent.

This is the var I'm passing to the module

datadog_config:
      tags: "dcos, {{ ec2_tag_env }}, {{ ec2_tag_Name }}"
      log_level: WARN
    datadog_checks:
          docker_daemon:
              init_config:
              instances:
                  - url: "unix://var/run/docker.sock"
                    collect_images_stats: true
                    collect_image_size: true
                    container_tags: ["image_name", "image_tag", "docker_image"]
          mesos_slave:
              init_config:
                  default_timeout: 10
              instances:
                  - url: "http://{{ ec2_private_ip_address }}:5051"
          kafka:
              instances:
                  - tools_jar_path: /opt/mesosphere/active/java/usr/java/lib/tools.jar
                    process_name_regex: .*Kafka.*
                    java_bin_path: /opt/mesosphere/bin/java
              init_config:
                is_jmx: true

                conf:
                    - include:
                        domain: 'kafka.producer'
                        bean_regex: 'kafka\.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=.*'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.producer.request_rate
                    - include:
                        domain: 'kafka.producer'
                        bean_regex: 'kafka\.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=.*'
                        attribute:
                        Mean:
                            metric_type: gauge
                            alias: kafka.producer.request_latency_avg
                    - include:
                        domain: 'kafka.producer'
                        bean_regex: 'kafka\.producer:type=ProducerTopicMetrics,name=BytesPerSec,clientId=.*'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.producer.bytes_out
                    - include:
                        domain: 'kafka.producer'
                        bean_regex: 'kafka\.producer:type=ProducerTopicMetrics,name=MessagesPerSec,clientId=.*'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.producer.message_rate


                    - include:
                        domain: 'kafka.producer'
                        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
                        attribute:
                        response-rate:
                            metric_type: gauge
                            alias: kafka.producer.response_rate
                    - include:
                        domain: 'kafka.producer'
                        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
                        attribute:
                        request-rate:
                            metric_type: gauge
                            alias: kafka.producer.request_rate
                    - include:
                        domain: 'kafka.producer'
                        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
                        attribute:
                        request-latency-avg:
                            metric_type: gauge
                            alias: kafka.producer.request_latency_avg
                    - include:
                        domain: 'kafka.producer'
                        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
                        attribute:
                        outgoing-byte-rate:
                            metric_type: gauge
                            alias: kafka.producer.bytes_out
                    - include:
                        domain: 'kafka.producer'
                        bean_regex: 'kafka\.producer:type=producer-metrics,client-id=.*'
                        attribute:
                        io-wait-time-ns-avg:
                            metric_type: gauge
                            alias: kafka.producer.io_wait


                    - include:
                        domain: 'kafka.consumer'
                        bean_regex: 'kafka\.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=.*'
                        attribute:
                        Value:
                            metric_type: gauge
                            alias: kafka.consumer.max_lag
                    - include:
                        domain: 'kafka.consumer'
                        bean_regex: 'kafka\.consumer:type=ConsumerFetcherManager,name=MinFetchRate,clientId=.*'
                        attribute:
                        Value:
                            metric_type: gauge
                            alias: kafka.consumer.fetch_rate
                    - include:
                        domain: 'kafka.consumer'
                        bean_regex: 'kafka\.consumer:type=ConsumerTopicMetrics,name=BytesPerSec,clientId=.*'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.consumer.bytes_in
                    - include:
                        domain: 'kafka.consumer'
                        bean_regex: 'kafka\.consumer:type=ConsumerTopicMetrics,name=MessagesPerSec,clientId=.*'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.consumer.messages_in

                    - include:
                        domain: 'kafka.consumer'
                        bean_regex: 'kafka\.consumer:type=ZookeeperConsumerConnector,name=ZooKeeperCommitsPerSec,clientId=.*'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.consumer.zookeeper_commits
                    - include:
                        domain: 'kafka.consumer'
                        bean_regex: 'kafka\.consumer:type=ZookeeperConsumerConnector,name=KafkaCommitsPerSec,clientId=.*'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.consumer.kafka_commits

                    - include:
                        domain: 'kafka.consumer'
                        bean_regex: 'kafka\.consumer:type=consumer-fetch-manager-metrics,client-id=.*'
                        attribute:
                        bytes-consumed-rate:
                            metric_type: gauge
                            alias: kafka.consumer.bytes_in
                    - include:
                        domain: 'kafka.consumer'
                        bean_regex: 'kafka\.consumer:type=consumer-fetch-manager-metrics,client-id=.*'
                        attribute:
                        records-consumed-rate:
                            metric_type: gauge
                            alias: kafka.consumer.messages_in

                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.net.bytes_out.rate
                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.net.bytes_in.rate
                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.messages_in.rate
                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.net.bytes_rejected.rate

                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.request.fetch.failed.rate
                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.request.produce.failed.rate
                    - include:
                        domain: 'kafka.network'
                        bean: 'kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.request.produce.rate
                    - include:
                        domain: 'kafka.network'
                        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce'
                        attribute:
                        Mean:
                            metric_type: gauge
                            alias: kafka.request.produce.time.avg
                        99thPercentile:
                            metric_type: gauge
                            alias: kafka.request.produce.time.99percentile
                    - include:
                        domain: 'kafka.network'
                        bean: 'kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.request.fetch_consumer.rate
                    - include:
                        domain: 'kafka.network'
                        bean: 'kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.request.fetch_follower.rate
                    - include:
                        domain: 'kafka.network'
                        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer'
                        attribute:
                        Mean:
                            metric_type: gauge
                            alias: kafka.request.fetch_consumer.time.avg
                        99thPercentile:
                            metric_type: gauge
                            alias: kafka.request.fetch_consumer.time.99percentile
                    - include:
                        domain: 'kafka.network'
                        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower'
                        attribute:
                        Mean:
                            metric_type: gauge
                            alias: kafka.request.fetch_follower.time.avg
                        99thPercentile:
                            metric_type: gauge
                            alias: kafka.request.fetch_follower.time.99percentile
                    - include:
                        domain: 'kafka.network'
                        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata'
                        attribute:
                        Mean:
                            metric_type: gauge
                            alias: kafka.request.update_metadata.time.avg
                        99thPercentile:
                            metric_type: gauge
                            alias: kafka.request.update_metadata.time.99percentile
                    - include:
                        domain: 'kafka.network'
                        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Metadata'
                        attribute:
                        Mean:
                            metric_type: gauge
                            alias: kafka.request.metadata.time.avg
                        99thPercentile:
                            metric_type: gauge
                            alias: kafka.request.metadata.time.99percentile
                    - include:
                        domain: 'kafka.network'
                        bean: 'kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Offsets'
                        attribute:
                        Mean:
                            metric_type: gauge
                            alias: kafka.request.offsets.time.avg
                        99thPercentile:
                            metric_type: gauge
                            alias: kafka.request.offsets.time.99percentile
                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.request.handler.avg.idle.pct.rate
                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=ProducerRequestPurgatory,name=PurgatorySize'
                        attribute:
                        Value:
                            metric_type: gauge
                            alias: kafka.request.producer_request_purgatory.size
                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=FetchRequestPurgatory,name=PurgatorySize'
                        attribute:
                        Value:
                            metric_type: gauge
                            alias: kafka.request.fetch_request_purgatory.size

                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions'
                        attribute:
                        Value:
                            metric_type: gauge
                            alias: kafka.replication.under_replicated_partitions
                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=ReplicaManager,name=IsrShrinksPerSec'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.replication.isr_shrinks.rate
                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=ReplicaManager,name=IsrExpandsPerSec'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.replication.isr_expands.rate
                    - include:
                        domain: 'kafka.controller'
                        bean: 'kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.replication.leader_elections.rate
                    - include:
                        domain: 'kafka.controller'
                        bean: 'kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.replication.unclean_leader_elections.rate
                    - include:
                        domain: 'kafka.controller'
                        bean: 'kafka.controller:type=KafkaController,name=OfflinePartitionsCount'
                        attribute:
                        Value:
                            metric_type: gauge
                            alias: kafka.replication.offline_partitions_count
                    - include:
                        domain: 'kafka.controller'
                        bean: 'kafka.controller:type=KafkaController,name=ActiveControllerCount'
                        attribute:
                        Value:
                            metric_type: gauge
                            alias: kafka.replication.active_controller_count
                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=ReplicaManager,name=PartitionCount'
                        attribute:
                        Value:
                            metric_type: gauge
                            alias: kafka.replication.partition_count
                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=ReplicaManager,name=LeaderCount'
                        attribute:
                        Value:
                            metric_type: gauge
                            alias: kafka.replication.leader_count
                    - include:
                        domain: 'kafka.server'
                        bean: 'kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica'
                        attribute:
                        Value:
                            metric_type: gauge
                            alias: kafka.replication.max_lag

                    - include:
                        domain: 'kafka.log'
                        bean: 'kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs'
                        attribute:
                        Count:
                            metric_type: rate
                            alias: kafka.log.flush_rate.rate

And this is what I'm getting out:

init_config:
    conf:
    -   include:
            Count:
                alias: kafka.producer.request_rate
                metric_type: rate
            attribute: null
            bean_regex: kafka\.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=.*
            domain: kafka.producer
    -   include:
            Mean:
                alias: kafka.producer.request_latency_avg
                metric_type: gauge
            attribute: null
            bean_regex: kafka\.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=.*
            domain: kafka.producer
    -   include:
            Count:
                alias: kafka.producer.bytes_out
                metric_type: rate
            attribute: null
            bean_regex: kafka\.producer:type=ProducerTopicMetrics,name=BytesPerSec,clientId=.*
            domain: kafka.producer
    -   include:
            Count:
                alias: kafka.producer.message_rate
                metric_type: rate
            attribute: null
            bean_regex: kafka\.producer:type=ProducerTopicMetrics,name=MessagesPerSec,clientId=.*
            domain: kafka.producer
    -   include:
            attribute: null
            bean_regex: kafka\.producer:type=producer-metrics,client-id=.*
            domain: kafka.producer
            response-rate:
                alias: kafka.producer.response_rate
                metric_type: gauge
    -   include:
            attribute: null
            bean_regex: kafka\.producer:type=producer-metrics,client-id=.*
            domain: kafka.producer
            request-rate:
                alias: kafka.producer.request_rate
                metric_type: gauge
    -   include:
            attribute: null
            bean_regex: kafka\.producer:type=producer-metrics,client-id=.*
            domain: kafka.producer
            request-latency-avg:
                alias: kafka.producer.request_latency_avg
                metric_type: gauge
    -   include:
            attribute: null
            bean_regex: kafka\.producer:type=producer-metrics,client-id=.*
            domain: kafka.producer
            outgoing-byte-rate:
                alias: kafka.producer.bytes_out
                metric_type: gauge
    -   include:
            attribute: null
            bean_regex: kafka\.producer:type=producer-metrics,client-id=.*
            domain: kafka.producer
            io-wait-time-ns-avg:
                alias: kafka.producer.io_wait
                metric_type: gauge
    -   include:
            Value:
                alias: kafka.consumer.max_lag
                metric_type: gauge
            attribute: null
            bean_regex: kafka\.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=.*
            domain: kafka.consumer
    -   include:
            Value:
                alias: kafka.consumer.fetch_rate
                metric_type: gauge
            attribute: null
            bean_regex: kafka\.consumer:type=ConsumerFetcherManager,name=MinFetchRate,clientId=.*
            domain: kafka.consumer
    -   include:
            Count:
                alias: kafka.consumer.bytes_in
                metric_type: rate
            attribute: null
            bean_regex: kafka\.consumer:type=ConsumerTopicMetrics,name=BytesPerSec,clientId=.*
            domain: kafka.consumer
    -   include:
            Count:
                alias: kafka.consumer.messages_in
                metric_type: rate
            attribute: null
            bean_regex: kafka\.consumer:type=ConsumerTopicMetrics,name=MessagesPerSec,clientId=.*
            domain: kafka.consumer
    -   include:
            Count:
                alias: kafka.consumer.zookeeper_commits
                metric_type: rate
            attribute: null
            bean_regex: kafka\.consumer:type=ZookeeperConsumerConnector,name=ZooKeeperCommitsPerSec,clientId=.*
            domain: kafka.consumer
    -   include:
            Count:
                alias: kafka.consumer.kafka_commits
                metric_type: rate
            attribute: null
            bean_regex: kafka\.consumer:type=ZookeeperConsumerConnector,name=KafkaCommitsPerSec,clientId=.*
            domain: kafka.consumer
    -   include:
            attribute: null
            bean_regex: kafka\.consumer:type=consumer-fetch-manager-metrics,client-id=.*
            bytes-consumed-rate:
                alias: kafka.consumer.bytes_in
                metric_type: gauge
            domain: kafka.consumer
    -   include:
            attribute: null
            bean_regex: kafka\.consumer:type=consumer-fetch-manager-metrics,client-id=.*
            domain: kafka.consumer
            records-consumed-rate:
                alias: kafka.consumer.messages_in
                metric_type: gauge
    -   include:
            Count:
                alias: kafka.net.bytes_out.rate
                metric_type: rate
            attribute: null
            bean: kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
            domain: kafka.server
    -   include:
            Count:
                alias: kafka.net.bytes_in.rate
                metric_type: rate
            attribute: null
            bean: kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
            domain: kafka.server
    -   include:
            Count:
                alias: kafka.messages_in.rate
                metric_type: rate
            attribute: null
            bean: kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
            domain: kafka.server
    -   include:
            Count:
                alias: kafka.net.bytes_rejected.rate
                metric_type: rate
            attribute: null
            bean: kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec
            domain: kafka.server
    -   include:
            Count:
                alias: kafka.request.fetch.failed.rate
                metric_type: rate
            attribute: null
            bean: kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec
            domain: kafka.server
    -   include:
            Count:
                alias: kafka.request.produce.failed.rate
                metric_type: rate
            attribute: null
            bean: kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec
            domain: kafka.server
    -   include:
            Count:
                alias: kafka.request.produce.rate
                metric_type: rate
            attribute: null
            bean: kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce
            domain: kafka.network
    -   include:
            99thPercentile:
                alias: kafka.request.produce.time.99percentile
                metric_type: gauge
            Mean:
                alias: kafka.request.produce.time.avg
                metric_type: gauge
            attribute: null
            bean: kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce
            domain: kafka.network
    -   include:
            Count:
                alias: kafka.request.fetch_consumer.rate
                metric_type: rate
            attribute: null
            bean: kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer
            domain: kafka.network
    -   include:
            Count:
                alias: kafka.request.fetch_follower.rate
                metric_type: rate
            attribute: null
            bean: kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower
            domain: kafka.network
    -   include:
            99thPercentile:
                alias: kafka.request.fetch_consumer.time.99percentile
                metric_type: gauge
            Mean:
                alias: kafka.request.fetch_consumer.time.avg
                metric_type: gauge
            attribute: null
            bean: kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer
            domain: kafka.network
    -   include:
            99thPercentile:
                alias: kafka.request.fetch_follower.time.99percentile
                metric_type: gauge
            Mean:
                alias: kafka.request.fetch_follower.time.avg
                metric_type: gauge
            attribute: null
            bean: kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower
            domain: kafka.network
    -   include:
            99thPercentile:
                alias: kafka.request.update_metadata.time.99percentile
                metric_type: gauge
            Mean:
                alias: kafka.request.update_metadata.time.avg
                metric_type: gauge
            attribute: null
            bean: kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata
            domain: kafka.network
    -   include:
            99thPercentile:
                alias: kafka.request.metadata.time.99percentile
                metric_type: gauge
            Mean:
                alias: kafka.request.metadata.time.avg
                metric_type: gauge
            attribute: null
            bean: kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Metadata
            domain: kafka.network
    -   include:
            99thPercentile:
                alias: kafka.request.offsets.time.99percentile
                metric_type: gauge
            Mean:
                alias: kafka.request.offsets.time.avg
                metric_type: gauge
            attribute: null
            bean: kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Offsets
            domain: kafka.network
    -   include:
            Count:
                alias: kafka.request.handler.avg.idle.pct.rate
                metric_type: rate
            attribute: null
            bean: kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent
            domain: kafka.server
    -   include:
            Value:
                alias: kafka.request.producer_request_purgatory.size
                metric_type: gauge
            attribute: null
            bean: kafka.server:type=ProducerRequestPurgatory,name=PurgatorySize
            domain: kafka.server
    -   include:
            Value:
                alias: kafka.request.fetch_request_purgatory.size
                metric_type: gauge
            attribute: null
            bean: kafka.server:type=FetchRequestPurgatory,name=PurgatorySize
            domain: kafka.server
    -   include:
            Value:
                alias: kafka.replication.under_replicated_partitions
                metric_type: gauge
            attribute: null
            bean: kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
            domain: kafka.server
    -   include:
            Count:
                alias: kafka.replication.isr_shrinks.rate
                metric_type: rate
            attribute: null
            bean: kafka.server:type=ReplicaManager,name=IsrShrinksPerSec
            domain: kafka.server
    -   include:
            Count:
                alias: kafka.replication.isr_expands.rate
                metric_type: rate
            attribute: null
            bean: kafka.server:type=ReplicaManager,name=IsrExpandsPerSec
            domain: kafka.server
    -   include:
            Count:
                alias: kafka.replication.leader_elections.rate
                metric_type: rate
            attribute: null
            bean: kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs
            domain: kafka.controller
    -   include:
            Count:
                alias: kafka.replication.unclean_leader_elections.rate
                metric_type: rate
            attribute: null
            bean: kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec
            domain: kafka.controller
    -   include:
            Value:
                alias: kafka.replication.offline_partitions_count
                metric_type: gauge
            attribute: null
            bean: kafka.controller:type=KafkaController,name=OfflinePartitionsCount
            domain: kafka.controller
    -   include:
            Value:
                alias: kafka.replication.active_controller_count
                metric_type: gauge
            attribute: null
            bean: kafka.controller:type=KafkaController,name=ActiveControllerCount
            domain: kafka.controller
    -   include:
            Value:
                alias: kafka.replication.partition_count
                metric_type: gauge
            attribute: null
            bean: kafka.server:type=ReplicaManager,name=PartitionCount
            domain: kafka.server
    -   include:
            Value:
                alias: kafka.replication.leader_count
                metric_type: gauge
            attribute: null
            bean: kafka.server:type=ReplicaManager,name=LeaderCount
            domain: kafka.server
    -   include:
            Value:
                alias: kafka.replication.max_lag
                metric_type: gauge
            attribute: null
            bean: kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica
            domain: kafka.server
    -   include:
            Count:
                alias: kafka.log.flush_rate.rate
                metric_type: rate
            attribute: null
            bean: kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
            domain: kafka.log
    is_jmx: true
instances:
-   java_bin_path: /opt/mesosphere/bin/java
    process_name_regex: .*Kafka.*
    tools_jar_path: /opt/mesosphere/active/java/usr/java/lib/tools.jar

And I'm getting these errors in the logs:

2016-07-12 00:20:14,049 | ERROR| Instance | Error while trying to match attributeInfo configuration with the Attribute: kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica : javax.management.MBeanAttributeInfo[description=Attribute exposed for management, name=Value, type=java.lang.Object, read-only, descriptor={}]
java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to java.lang.String
        at org.datadog.jmxfetch.Filter$1.<init>(Filter.java:58)
        at org.datadog.jmxfetch.Filter.toStringArrayList(Filter.java:57)
        at org.datadog.jmxfetch.Filter.getParameterValues(Filter.java:140)
        at org.datadog.jmxfetch.JMXAttribute.matchBeanName(JMXAttribute.java:281)
        at org.datadog.jmxfetch.JMXAttribute.matchBean(JMXAttribute.java:318)
        at org.datadog.jmxfetch.JMXSimpleAttribute.match(JMXSimpleAttribute.java:42)
        at org.datadog.jmxfetch.Instance.getMatchingAttributes(Instance.java:254)
        at org.datadog.jmxfetch.Instance.init(Instance.java:133)
        at org.datadog.jmxfetch.App.init(App.java:354)
        at org.datadog.jmxfetch.App.main(App.java:92)
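
The stack trace and the generated file above point at the same root cause: in the input, the attribute names (Count, Mean, response-rate, and so on) are indented at the same level as attribute:, so YAML parses attribute as null and the attribute names as extra keys of include. JMXFetch then finds a map where it expects a string and throws the ClassCastException. Indenting the attribute names one level deeper under attribute: should fix it; a corrected sketch of the first rule (the remaining rules follow the same pattern):

datadog_checks:
  kafka:
    init_config:
      is_jmx: true
      conf:
        - include:
            domain: 'kafka.producer'
            bean_regex: 'kafka\.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=.*'
            attribute:
              Count:
                metric_type: rate
                alias: kafka.producer.request_rate
    instances:
      - tools_jar_path: /opt/mesosphere/active/java/usr/java/lib/tools.jar
        process_name_regex: .*Kafka.*
        java_bin_path: /opt/mesosphere/bin/java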

Running jmxfetch.py manually, I get:

 /opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/jmxfetch.py
Traceback (most recent call last):
  File "/opt/datadog-agent/agent/jmxfetch.py", line 7, in <module>
    from config import initialize_logging  # noqa
  File "/opt/datadog-agent/agent/config.py", line 26, in <module>
    from util import check_yaml, get_os
  File "/opt/datadog-agent/agent/util.py", line 22, in <module>
    import yaml  # noqa, let's guess, probably imported somewhere
  File "/opt/mesosphere/lib/python3.4/site-packages/yaml/__init__.py", line 284
    class YAMLObject(metaclass=YAMLObjectMetaclass):
                              ^
SyntaxError: invalid syntax
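
The manual jmxfetch.py run is failing for an unrelated reason: the traceback shows the agent importing PyYAML from /opt/mesosphere/lib/python3.4/site-packages, and class YAMLObject(metaclass=...) is Python 3-only syntax, so the agent's embedded Python 2 raises a SyntaxError. This suggests PYTHONPATH on the host points at the Mesosphere Python 3 packages. Running with PYTHONPATH cleared should confirm it:

env -u PYTHONPATH /opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/jmxfetch.py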

PHP-FPM check being created as php-fpm.yaml and so not being picked up

Hello!

I have a php-fpm check which has been added as below:

datadog_checks:
      php-fpm:
        init_config:
        instances:
          - status_url: http://127.0.0.1/php-status
            ping_url: http://127.0.0.1/ping
            ping_reply: pong

The weird thing was that the check wasn't being picked up by Datadog. I had a quick look in the agent's conf.d folder and noticed that there were two php-fpm related files: one called php_fpm.yaml.example and one called php-fpm.yaml (note the hyphen rather than an underscore).
I ran the info command on the agent and confirmed that the php-fpm check wasn't being picked up.

I noticed that your integration docs say the file should be php_fpm.yaml, so I renamed php-fpm.yaml to php_fpm.yaml, restarted the agent, and everything works fine!

Could it be something to do with how the to_nice_yaml filter is used in this template when a check has a hyphen in its name? https://github.com/DataDog/ansible-datadog/blob/master/templates/checks.yaml.j2

I can of course work around this, but it means adding an extra task in Ansible specifically to rename the file.

Running Ansible 2.2.0.0
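
A workaround that avoids the rename task: key the check by the integration's config file name (with the underscore) rather than the service name, since the role writes the file as <key>.yaml verbatim. A sketch using the same values as above:

datadog_checks:
  php_fpm:
    init_config:
    instances:
      - status_url: http://127.0.0.1/php-status
        ping_url: http://127.0.0.1/ping
        ping_reply: pong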

Hostname not complying with RFC 1123 should raise an error

I struggled to understand why I had no data on my Datadog dashboard. The issue was that my hostname did not comply with RFC 1123. I suggest that the playbook fail in this case. I also think dd-agent should not be silent about this error (it is only a warning on version 1:5.12.3-1: dd.collector[6935]: WARNING (hostname.py:35): Hostname: xxx is not complying with RFC 1123).
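
A sketch of the suggested guard as a pre-task, assuming Ansible >= 2.7 for fail_msg (the regex covers a single RFC 1123 label; extend it with dot-separated labels for FQDNs):

- hosts: all
  pre_tasks:
    - name: Fail early if the hostname does not comply with RFC 1123
      assert:
        that:
          - ansible_hostname is match('^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$')
        fail_msg: "Hostname {{ ansible_hostname }} does not comply with RFC 1123"
  roles:
    - datadog.datadog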

Datadog checks enabled from the example playbook in this repo create a YAML file at /etc/datadog-agent/conf.d/<whatever-name>.yaml. Is that the standard behavior?

I am asking because my assumption is that checks should reside in a /etc/datadog-agent/conf.d/<relevant-directory>.d/conf.yaml file.
For example, with the variables file below, a new YAML file called process.yaml is created in the /etc/datadog-agent/conf.d directory.

# cat main.yml 
---

# The following are the site wide default variables
# for datadog

datadog_enabled: yes
datadog_agent_version: 6.1.0
datadog_api_key: xxxxxxx

datadog_checks:
  process:
    init_config:
    instances:
     - name: ssh
       search_string: ['ssh', 'sshd']

My understanding of best practice is that checks should go in /etc/datadog-agent/conf.d/process.d/conf.yaml. Is that not the case?

Adding config files to conf.d

Is there an easy way to add/enable specific YAML configs in conf.d? For example, to enable haproxy I would usually just copy haproxy.yaml.example to haproxy.yaml. See the sketch below.
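
With this role, the equivalent of copying haproxy.yaml.example into place is declaring the check under datadog_checks, and the role renders conf.d/haproxy.yaml for you. A minimal sketch (the stats URL is an assumption; point it at your HAProxy stats endpoint):

datadog_checks:
  haproxy:
    init_config:
    instances:
      - url: http://localhost/admin?stats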
