Rackspace Monitoring Agent

The monitoring agent is the first agent to use the infrastructure provided by virgo-base-agent.

Installing The Agent

Make sure the packages required for building are installed on your system. The Dockerfile lists the development dependencies.

Please note, we provide binaries for many platforms. See the article on Installing the Monitoring Agent for instructions.

Otherwise, continue reading this section.

Satisfy pre-requisites:

If you're on Windows, you may first have to install certain utilities, or find them and add them to your path. These are:

  • cmake - Downloadable from the CMake website
  • nmake - Included in Visual Studio/VC/bin but may need to be added to your path
  • signtool - Included in Microsoft SDKs/windows/v7.1a/bin but may need to be inserted into your path

On Linux from a fresh install:

apt-get install make cmake

Get the source:

git clone https://github.com/virgo-agent-toolkit/rackspace-monitoring-agent

Go into the directory that you just created:

cd rackspace-monitoring-agent

Build:

make

Now install the Virgo client by running this final command:

make install

After installing on Unix systems, there is a new binary called rackspace-monitoring-agent. To get the client running on your system, please follow the documented setup procedure.

Host Info Runner

The agent has a built-in host information runner (similar to OHAI).

rackspace-monitoring-agent -e hostinfo_runner -x [type]

Further documentation for the host info types can be found in the hostinfo readme.
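For scripted use, the runner can be wrapped in a small helper. This is a minimal sketch in Python, not part of the agent; the function names are hypothetical, and the type name MEMORY used below is only a placeholder (the accepted type names are listed in the hostinfo readme).

```python
import subprocess


def hostinfo_command(info_type, binary="rackspace-monitoring-agent"):
    """Build the argument list for the host info runner shown above."""
    return [binary, "-e", "hostinfo_runner", "-x", info_type]


def run_hostinfo(info_type):
    """Run the agent binary and return whatever it prints on stdout.

    Assumes the binary is installed and on PATH; raises
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        hostinfo_command(info_type),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# MEMORY is a placeholder type name, not confirmed by this README.
print(" ".join(hostinfo_command("MEMORY")))
```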

License

The Monitoring Agent is distributed under the Apache License 2.0.

Building for Rackspace Cloud Monitoring

Rackspace customers: Virgo is the open source project for the Rackspace Cloud Monitoring agent. Feel free to build your own copy from this source.

But! Please don't contact Rackspace Support about issues you encounter with your custom build.

Versioning

The agent is versioned with a three-part, dot-separated "semantic version" of the form x.y.z, for example 1.4.2. The rough meaning of each part is:

  • major version numbers change when we make a backwards-incompatible change to the bundle format. Binaries can only run bundles with an identical major version number, e.g. a binary of version 2.3.1 can only run bundles whose version starts with 2.

  • minor version numbers change when we make backwards-compatible changes to the bundle format. Binaries can only run bundles whose minor version is greater than or equal to the binary's minor version, e.g. a binary of version 2.3.1 can run a 2.3.4 bundle but not a 2.2.1 bundle.

  • patch version numbers change every time a new bundle is released. They carry no semantic meaning.
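The compatibility rules above can be written as a small check. This is an illustrative sketch in Python rather than the agent's own code (the agent is written in Lua), and the function names are hypothetical. It follows the 2.3.1 examples: major versions must match, and the bundle's minor version must be at least the binary's.

```python
def parse_version(text):
    """Split an 'x.y.z' version string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def binary_can_run(binary_version, bundle_version):
    """Return True if a binary may run a bundle under the rules above.

    The patch number is ignored because it carries no semantic meaning.
    """
    bin_major, bin_minor, _ = parse_version(binary_version)
    bun_major, bun_minor, _ = parse_version(bundle_version)
    return bin_major == bun_major and bun_minor >= bin_minor


print(binary_can_run("2.3.1", "2.3.4"))  # True
print(binary_can_run("2.3.1", "2.2.1"))  # False
print(binary_can_run("2.3.1", "3.0.0"))  # False
```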

Running tests

Virgo supplies infrastructure for running tests. Calling make test will launch Virgo with command line flags set to feed it the testing bundle and with the -e flag set to tests.

make test

You can also run an individual test module:

TEST_MODULE=net make test

Running tests on Docker

This only needs to be done once per terminal session:

docker-machine create agent
eval $(docker-machine env agent)

Use docker-compose to build and run the tests:

docker-compose run build make clean
docker-compose run build make
docker-compose run build make test

Configuration File Parameters

monitoring_token [token]         - (required) The authentication token.
monitoring_id [agent_id]         - (optional) The Agent's monitoring_id
                                   (default: Instance ID (Xen) or Cloud-Init ID)
monitoring_snet_region [region]  - (optional) Enable Service Net endpoints
                                   (region: dfw, ord, lon, syd, hkg, iad)
monitoring_endpoints             - (optional) Force IP and Port, comma
                                   delimited
monitoring_proxy_url [url]       - (optional) Use an HTTP proxy.
                                   Must support CONNECT on port 443.
                                   Additionally, HTTP_PROXY and HTTPS_PROXY
                                   are honored.
monitoring_query_endpoints [queries] - (optional) SRV queries comma
                                        delimited
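As an illustration of how these parameters fit together, here is a minimal sketch of a config reader in Python. It is not the agent's parser. It assumes one "key value" pair per line, and it treats blank lines and lines starting with # as ignorable, which is an assumption rather than documented behaviour. The token value is a placeholder.

```python
def parse_agent_config(text):
    """Parse 'key value' lines into a dict and check the required token."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and '#' comments are skipped.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    # monitoring_token is the only parameter marked (required) above.
    if not settings.get("monitoring_token"):
        raise ValueError("monitoring_token is required")
    return settings


example = """
monitoring_token EXAMPLE_TOKEN
monitoring_snet_region dfw
"""
print(parse_agent_config(example))
```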

Exit Codes

1 unknown error
2 config file fail
3 already running
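A supervising script can translate these codes into messages. The sketch below is in Python with a hypothetical helper name; treating 0 as success is the usual process convention and is an assumption here, since the table above does not list it.

```python
# Exit codes as documented in the table above.
EXIT_CODES = {
    1: "unknown error",
    2: "config file fail",
    3: "already running",
}


def describe_exit(code):
    """Return a human-readable description of an agent exit code."""
    if code == 0:
        # Assumption: 0 means a clean exit, as is conventional.
        return "success"
    return EXIT_CODES.get(code, "undocumented exit code %d" % code)


print(describe_exit(3))
```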

Signals

SIGUSR1: Force GC
SIGUSR2: Toggle Debug
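These signals can be sent from a script as well as with the kill command. A minimal Python sketch follows (Unix only). The helper and action names are hypothetical, and finding the agent's process ID is left to the caller.

```python
import os
import signal

# Signal meanings as listed above.
AGENT_SIGNALS = {
    "force_gc": signal.SIGUSR1,
    "toggle_debug": signal.SIGUSR2,
}


def send_agent_signal(pid, action):
    """Send the signal for the named action to the process with this pid."""
    os.kill(pid, AGENT_SIGNALS[action])
```

For example, send_agent_signal(pid, "toggle_debug") would toggle debug logging in a running agent whose process ID is pid.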

rackspace-monitoring-agent's People

Contributors

adityacb15, bravelittlescientist, cfarquhar, cloudnull, creationix, danhickox, fourk, gdusbabek, harageth, hub-cap, inflatador, itzg, jirwin, jjbuchan, jordane, kami, kans, kaustavha, mkandrashoff, philips, pquerna, richarxt, rjemanuele, robert-chiniquy, rphillips, russellhaering, songgao, wirehead, ynachiket, zzantozz

rackspace-monitoring-agent's Issues

Debian Squeeze Agent Won't Run

I'd love some instructions that work for Debian squeeze. I know people say it can work, but I can't get it to work.

This is what I did:

echo "deb http://stable.packages.cloudmonitoring.rackspace.com/debian-squeeze-x86_64 cloudmonitoring main" > /etc/apt/sources.list.d/rackspace-monitoring-agent.list

curl https://monitoring.api.rackspacecloud.com/pki/agent/linux.asc | sudo apt-key add -

sudo apt-get update

sudo apt-get install rackspace-monitoring-agent

I get the following error:

insserv: warning: script 'K01nova-agent' missing LSB tags and overrides
insserv: script mongoshard: service mongoshard already provided!
insserv: warning: script 'nova-agent' missing LSB tags and overrides
insserv: warning: script 'mongos' missing LSB tags and overrides
Stopping rackspace-monitoring-agent: rackspace-monitoring-agent failed!

A little help?

Document: Server-side configuration

Create documentation in the README for server-side configuration.

Explain features, attributes, parameters, and requirements for server-side configuration.

Documentation for agent project

Goal: Create a website for the Virgo Agent Project to explain what this project is about, how it can be valuable for other agent writers, and how to contribute to the project.

The target audience for the documentation is developers who wish to learn about the project, use it to build other agents, or contribute to it. It does not include documentation on how the features are surfaced through any particular product, like Rackspace Cloud Monitoring; that documentation should be managed by the vendors themselves.

Repo for doc: https://github.com/virgo-agent-toolkit/docs

Suggested Layout:
A. Documentation

  1. Introduction (getting started)
  2. Advanced topics (Troubleshooting and debugging)
  3. Automation (how to use automation with the agent)

B. Version Information (see current and past version info)

  1. Release notes

C. Resources

  1. Link to List of Agent Check Types on Rackspace Documentation - documentation SPECIFIC to the project and its developers here.
    * API docs (arguments, return values) of functions that live in the code base, including checks.
    * Differences or 'features' that the rackspace-monitoring-agent adds to virgo
    * Common pitfalls/gotchas while developing a new check type
    * Generic troubleshooting for connecting and using versions of rackspace-monitoring-agent that
  2. List of Agent Plug-ins (not supported)
  3. Link to other agent related features, like the capability to pick up configuration files and push to the server

D. Get Involved

  1. How to Add example configs and YAML files
  2. How to Improve Documentation
  3. How to Add a plug-in

E. Developer Resources

  1. Architecture Diagram
  2. Plug-in Architecture
  3. Coding style
  4. About Luvit (likely links to other resources)
  5. About Lua (likely links to other resources)

Inspiration: https://collectd.org/wiki/index.php/Main_Page

Agent checks should all support a common method for returning their metadata

Currently agent checks don't make it easy to get the metrics out with type and unit information. This metadata is stored in various ways and there's no way to programmatically extract it from all checks afaik.
https://github.com/racker/virgo/blob/master/check/apache.lua#L97
https://github.com/racker/virgo/blob/master/check/disk.lua#L59
It would be cool if all checks supported a method to return the metric metadata in a consistent format.
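One possible shape for such a method, sketched in Python for brevity (the real checks are Lua modules). The class names, metric names, types and units below are all made up for illustration; the point is only that every check declares its metrics in one table and exposes them through the same call.

```python
class Check:
    """Hypothetical base class: each check declares its metrics in METRICS."""

    METRICS = {}

    @classmethod
    def metric_metadata(cls):
        """Return the declared metrics in one consistent format."""
        return [
            {"name": name, "type": spec["type"], "unit": spec["unit"]}
            for name, spec in sorted(cls.METRICS.items())
        ]


class DiskCheck(Check):
    # Illustrative entries only, not the agent's real metric list.
    METRICS = {
        "read_bytes": {"type": "uint64", "unit": "bytes"},
        "reads": {"type": "uint64", "unit": "count"},
    }


for entry in DiskCheck.metric_metadata():
    print(entry)
```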

Windows Agent Configuration GUI

Create a GUI for Windows to perform the same task as the --setup option when running the agent from the command-line.

Ideally this can be accessed from a Start Menu item if reconfiguration is needed and will be run during the MSI installer.

Fallback to ServiceNet for Rackspace Hosts

In order to ease configuration and help customers who do not have access to the public Internet, we could fall back to the service net endpoints for API and AEP.

Initial thoughts are to use the xenstore vm-data to determine if the agent is on a Rax cloud host.

Refactoring lua files

TL;DR: the division of code in virgo has some design issues to address. I'm gonna proceed with a slightly hacky approach for now, but we need to think about the design for the long run.

Currently all lua scripts, monitoring-related or not, are in virgo instead of virgo-base. This is fine for people who just wanna deploy their own monitoring system, but not good for people who wanna develop their own agent based on virgo-base, e.g., a cloudkeep (https://github.com/cloudkeep) agent that uses FUSE to provide keys to applications. The lua files in virgo need to be divided into a monitoring-related part (which stays in virgo) and a general part (which should be moved into virgo-base).

Here's a diagram I drew while reading:
[diagram: old]

Red nodes are the ones with monitoring logic. Some of them are easy to deal with, like /check/*.lua and /schedule/scheduler.lua, since they are independent modules that can simply be isolated from the core part. The challenging part is the MonitoringAgent -> ConnectionStream -> AgentClient -> AgentProtocolConnection path. Each of them more or less has something to do with monitoring.

@robert-chiniquy and I thought about providing a base type for each of them, which has core logic such as handshake.hello (authentication) and heartbeat, and using dependency injection to determine what type to use. When users need to extend something, they simply sub-class the base type and inject the type into the proper place.

However, the idea above has issues since there are too many levels. Suppose we need to add a new RPC call, helloworld, which uses a new message type helloworld.param. The message type is not a big issue. But we also need to add the handler for the helloworld RPC method, which is defined in /protocol/connection.lua. However, it's at the bottom level. In order to support that, the type needs to be specified somehow from the top down.

So for now, we are gonna add an argument to constructors that holds the types that need to be overridden. Then when instantiating objects, if a type is defined for that object, use that type; otherwise, use the default one. But this is not a good design in the long term. We probably still need to refactor those scripts in the future for a cleaner design.
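To make the interim approach concrete, here is a rough sketch in Python rather than Lua. Only three of the levels are shown, the class bodies are stand-ins, and the `types` argument name is an assumption; the real constructors in virgo may look different.

```python
class AgentProtocolConnection:
    """Stand-in for the bottom-level type; knows only the core RPC methods."""

    def handle(self, method):
        if method == "handshake.hello":
            return "core handshake"
        raise KeyError("no handler for " + method)


class AgentClient:
    """Middle level: builds its connection from the type table it is given."""

    def __init__(self, types=None):
        types = types or {}
        # Use the override if one was supplied, otherwise the default type.
        connection_type = types.get(
            "AgentProtocolConnection", AgentProtocolConnection
        )
        self.connection = connection_type()


class Agent:
    """Top level: accepts the overrides once and passes them down."""

    def __init__(self, types=None):
        types = types or {}
        client_type = types.get("AgentClient", AgentClient)
        self.client = client_type(types=types)


class HelloWorldConnection(AgentProtocolConnection):
    """User extension that adds the helloworld RPC handler."""

    def handle(self, method):
        if method == "helloworld":
            return "hello, world"
        return super().handle(method)


agent = Agent(types={"AgentProtocolConnection": HelloWorldConnection})
print(agent.client.connection.handle("helloworld"))       # hello, world
print(agent.client.connection.handle("handshake.hello"))  # core handshake
```

The sketch also shows the drawback mentioned above: every intermediate constructor has to know about the type table and forward it, even when it has nothing to override itself.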

Uploading crash dump fails with "socket hang up" or "ECANCELED" error

This is a second issue from #489.

I probably wasn't clear enough in my original pull request description, but there are actually two issues in play here. The first one is a logging issue, which has been fixed in #489, and the second one is the actual crash dump upload issue, which I haven't been able to track down yet.

Every time I restart the agent it tries to upload some crash dump files, but it fails with either a "socket hang up" or an ECANCELED error.

The second (ECANCELED) error seems to be related to the retried request.

Mon Jan  6 18:42:05 2014 WRN: POST to nil:nil failed for /var/lib/rackspace-monitoring-agent/rackspace-monitoring-agent-crash-report-ponies.dmp with status: ? and error: socket hang up.
Mon Jan  6 18:42:05 2014 DBG: retrying download 1 more times.
Mon Jan  6 18:42:05 2014 WRN: POST to nil:nil failed for /var/lib/rackspace-monitoring-agent/rackspace-monitoring-agent-crash-report-ponies.dmp with status: ? and error: ECANCELED, operation canceled.

If I manually try to send a crash dump from that server using cURL it works just fine.

Could it be some weird socket pooling / re-use issue going on in Luvit since it posts crash dump to the same IP address which is also used to talk to the agent endpoint?

Edit: Here is a screenshot from a tcpdump capture. All those RSTs at the end look kinda weird... Might be related?

[screenshot: selection_002]

Move the host info logic into a subprocess

When constructing these objects, sigar is called to populate the object. This should actually be populated within the run function to allow everything to be async. A better fix would be to run these in a subprocess.

Simplify Handshake Message

In #636 and https://github.com/racker/ele/pull/2630 we've been talking about expanding the handshake message to better accommodate the new and growing feature list field. The suggestions tend toward simplifying the handshake and breaking out the features.

The current handshake (handshake.hello), https://github.com/virgo-agent-toolkit/virgo-base-agent/blob/master/libs/connection.lua#L262, contains the auth token, monitoring id, agent (application) name, versions, and the feature set.

We could reduce the handshake down (again) to more of a plain authenticate if we move the monitoring id and the features into a secondary message (something akin to handshake.identify). This way the hello authenticates and versions the connection while the identify binds and sets up features for this agent.
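A sketch of what the two messages might carry, written in Python with plain dictionaries. Every field name here is hypothetical, handshake.identify is only the working name suggested above, and the values in the usage lines are placeholders.

```python
import json


def build_hello(token, agent_name, process_version, bundle_version):
    """First message: authenticates and versions the connection."""
    return {
        "method": "handshake.hello",
        "params": {
            "token": token,
            "agent_name": agent_name,
            "process_version": process_version,
            "bundle_version": bundle_version,
        },
    }


def build_identify(agent_id, features):
    """Second message: binds the monitoring id and declares features."""
    return {
        "method": "handshake.identify",
        "params": {"agent_id": agent_id, "features": list(features)},
    }


# Placeholder values for illustration only.
hello = build_hello("EXAMPLE_TOKEN", "example-agent", "1.4.2", "1.4.2")
identify = build_identify("example-agent-id", ["example_feature"])
print(json.dumps(hello, sort_keys=True))
print(json.dumps(identify, sort_keys=True))
```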

Remove Virgo.config

Move the config logic into lua land. Right now it's in C and exported as virgo.config. This does not fit in the newer model.

The task is to read config files in lua and make them available within virgo-base.

Improve the prompt about entity creation

Now that Cloud Intelligence is fully functional, should we change the following prompt (the line after "Select Option") in agent setup to mention it?

In order to execute checks, the agent must be associated with a Cloud Monitoring Entity.

Please select the Entity that corresponds to this server:
  1. Create a new Entity for this server (not supported by Rackspace Cloud Control Panel)
  2. Do not associate with an Entity

Select Option (e.g., 1, 2): 1
Creating an entity does not work with the Rackspace Cloud Control Panel. Really create an entity? (yes/no) yes

ReadMe update

This is for the README update. The website update is discussed here: #551

The target audience for the documentation is developers who wish to learn about the project, use it to build other agents, or contribute to it. It does not include documentation on how the features are surfaced through any particular product, like Rackspace Cloud Monitoring; that documentation should be managed by the vendors themselves.

Topics to address:
A. What is (the agent)
B. Key Concepts
C. How to Install and Configure
D. How to use it (the agent)
E. Supported OSes
F. Sub-section of specific topics link to:

  1. Check Types and their metrics
  2. Plug-ins

G. Debugging and Troubleshooting
H. Tests
I. Contacts (e.g. IRC and mailing list)

Inspiration:
https://github.com/etsy/statsd

Test-related files should not be bundled into deployment

Currently both rackspace-monitoring-agent and the make test target use the same bundle (rackspace-monitoring-agent-bundle.zip) built by the make_bundle target. It would be nice to have a test bundle target just for testing, which contains the test-related files, and a release bundle that only has the lua scripts required to run. This way we can keep the released bundle clean and minimal.

remove all.gyp?

When I was playing with ninja it was complaining that "all" was an ambiguous target. Paul mentioned virgo has an all.gyp, which is a bit weird anyway. Perhaps we are doing something wrong with gyp?

Add the ability to use HostInfo data as a Check

There is a wide variety of data available using the HostInfo system within the agent. People have expressed interest in using HostInfo data as check data.

Pros:

  • HostInfo data can then be treated as metrics and stored long-term
  • Changes to HostInfo data could then be alarmed on

Cons:

  • Some HostInfo data does not map well to a check format

--prefix is completely wrong

Using

./configure --prefix=/usr

causes the make install to fail. Using a normal

make install

with no DESTDIR set works just fine, but for some reason the Makefile wants to put the prefix before the DESTDIR, so running

make DESTDIR=/home/wgiokas/pkg/virgo-git install

when the prefix is set to /usr causes it to try

install -d /usr//home/wgiokas/pkg/virgo-git//usr/bin

Also, when the prefix is set to / and I run the same make install command as before, it still tries to install to

///home/wgiokas/pkg/virgo-git//usr/bin

To conclude (tl;dr):

  1. the default prefix doesn't match what the help says (it defaults to a prefix of /usr, not /usr/local)
  2. the prefix should be after the DESTDIR
  3. the prefix should overwrite whatever is setting /usr

Thank you,
kaictl
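For reference, the ordering requested in point 2 is the usual staged-install convention: DESTDIR comes first, then the prefix, then the subdirectory. A small Python sketch of the expected path computation follows. The function name is hypothetical and this is not the project's Makefile logic.

```python
import posixpath


def install_dir(prefix, subdir="bin", destdir=""):
    """Return the directory a staged install should write to."""
    target = posixpath.join(prefix, subdir)
    if destdir:
        # DESTDIR is prepended to the final path, never inserted after
        # the prefix.
        return destdir.rstrip("/") + "/" + target.lstrip("/")
    return target


print(install_dir("/usr", destdir="/home/wgiokas/pkg/virgo-git"))
# /home/wgiokas/pkg/virgo-git/usr/bin
print(install_dir("/usr"))
# /usr/bin
```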

Add custom plugins to targets API.

The agent should be able to report which custom plugin files are available for it to run. This would be a list of executable files in /usr/lib/rackspace-monitoring-agent/plugins
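Collecting that list is straightforward. A sketch in Python follows (the agent itself would do this in Lua); the function name is hypothetical, and the default directory is the one named above.

```python
import os

PLUGIN_DIR = "/usr/lib/rackspace-monitoring-agent/plugins"


def list_plugins(plugin_dir=PLUGIN_DIR):
    """Return the sorted names of executable regular files in plugin_dir."""
    if not os.path.isdir(plugin_dir):
        return []
    names = []
    for name in sorted(os.listdir(plugin_dir)):
        path = os.path.join(plugin_dir, name)
        # Skip subdirectories and files without an execute permission.
        if os.path.isfile(path) and os.access(path, os.X_OK):
            names.append(name)
    return names


print(list_plugins())
```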

Create Apache VHost Check

The Apache check does not display the status of the virtual hosts. http://blog.e-shell.org/132
An alternative would be to enhance the apache check to have the virtual hosts status. The risk is the variation of the number of metrics, and how to map the host name as part of the metrics locator.

Limit a maximum number of established connections per endpoint

I encountered a condition today where the agent had more than 500 established connections to the London endpoint.

We should add some kind of sanity check to the agent to make sure there is at most 1 established connection per endpoint.

On a side note, we should also put a limit inside the endpoint.
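The agent-side sanity check could be as simple as a per-endpoint counter consulted before each connect. A Python sketch of the idea follows; the class name and the endpoint string are hypothetical.

```python
class EndpointConnections:
    """Tracks established connections and refuses to exceed a limit."""

    def __init__(self, max_per_endpoint=1):
        self.max_per_endpoint = max_per_endpoint
        self._open = {}

    def try_open(self, endpoint):
        """Record a new connection; return False if the limit is reached."""
        count = self._open.get(endpoint, 0)
        if count >= self.max_per_endpoint:
            return False
        self._open[endpoint] = count + 1
        return True

    def close(self, endpoint):
        """Record that a connection to this endpoint has gone away."""
        count = self._open.get(endpoint, 0)
        if count > 0:
            self._open[endpoint] = count - 1


tracker = EndpointConnections()
print(tracker.try_open("endpoint-lon.example:443"))  # True
print(tracker.try_open("endpoint-lon.example:443"))  # False
```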

[FREEBSD] Build fails because of GCC dependency

The FreeBSD 10 release has ditched GCC (completely - it's not even in the base system anymore) in favour of LLVM/Clang. Because of that, the build on this target OS fails with this error (after the build has been running for some time):

--- CUT ---
COPY /opt/virgo-0.1.9/out/Debug/jit
ACTION _opt_virgo_0_1_9_base_bundle_gyp_bundle_h_target_bundle /opt/virgo-0.1.9/out/Debug/obj/gen/bundle.h
TOUCH /opt/virgo-0.1.9/out/Debug/obj.target/base/bundle.zip.embed.stamp
TOUCH /opt/virgo-0.1.9/out/Debug/obj.target/base/bundle.zip.stamp
AR(target) /opt/virgo-0.1.9/out/Debug/obj.target/base/deps/libarchive.a
AR(target) /opt/virgo-0.1.9/out/Debug/obj.target/base/deps/liblua_sigar.a
LINK(target) /opt/virgo-0.1.9/out/Debug/minilua
lockf: g++: No such file or directory
gmake[1]: *** [/opt/virgo-0.1.9/out/Debug/minilua] Error 1
gmake[1]: *** Waiting for unfinished jobs....
gmake[1]: Leaving directory `/opt/virgo-0.1.9/out'
gmake: *** [all] Error 2
--- CUT ---

Installing GCC is not a proper solution. There is a reason why the FreeBSD 10 release ditched GCC, and there is a reason why you get a whole pile of warnings while trying to install GCC on this distro from ports.

reload lua/luvit state when poked

Add code to load a new monitoring.zip file and load it while running.

Config option to exit instead of loading new monitoring.zip

Add process check

We should add a native agent check that inspects resource use by a program (1 or more processes).
It should:

  1. Take a regex (or similar pattern) that can be used to match a command as a parameter to the check
  2. Find all processes matching the pattern supplied, and aggregate metrics across them (i.e., we don't report metrics for each individual process).

In terms of metrics, we should ideally report at least CPU and memory use, as well as how many processes were aggregated.
Let's try to make this work on Windows too.
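The aggregation described in this issue could look roughly like the Python sketch below. It works on an already collected process list, so how the list is gathered on each platform is left open. The record fields and metric names are hypothetical.

```python
import re


def aggregate_processes(pattern, processes):
    """Sum CPU and memory use over all processes whose command matches."""
    matcher = re.compile(pattern)
    matched = [p for p in processes if matcher.search(p["command"])]
    return {
        "process_count": len(matched),
        "cpu_percent": sum(p["cpu_percent"] for p in matched),
        "memory_bytes": sum(p["memory_bytes"] for p in matched),
    }


# Made-up sample data for illustration.
sample = [
    {"command": "/usr/sbin/apache2 -k start", "cpu_percent": 1.5, "memory_bytes": 1000},
    {"command": "/usr/sbin/apache2 -k start", "cpu_percent": 2.5, "memory_bytes": 3000},
    {"command": "/usr/sbin/sshd -D", "cpu_percent": 0.5, "memory_bytes": 500},
]
print(aggregate_processes(r"apache2", sample))
```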
