stefanprodan / dockprom
Docker hosts and containers monitoring with Prometheus, Grafana, cAdvisor, NodeExporter and AlertManager
License: MIT License
Values on some graphs keep changing inconsistently, and most of the time they are just N/A or 0.
Executing the expressions behind these graphs in the Prometheus dashboard is in line with what the Grafana dashboard shows, so this could be a Prometheus problem.
Here, the container memory usage graph has some broken points, but I don't think it should.
Memory load seems fine though.
From my observation, the three somewhat broken graphs have one thing in common: they all use container_memory_usage_bytes{image!=""}.
Hoping someone can confirm that this does not only happen to me.
Hi,
We would like to use Dockprom for our company's internal needs, but the Caddy licensing is a no-go.
Is there a way to pull Caddy out of Dockprom?
Strong authentication is not a requirement for us because we will only use it inside private networks.
Thanks
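(Not an answer from the maintainers, just a hedged sketch of what dropping Caddy could look like: remove the caddy service from docker-compose.yml and publish each UI port directly. The service name and network follow this repo's compose file; the change itself is an untested assumption.)

grafana:
  image: grafana/grafana
  ports:
    - "3000:3000"   # reach Grafana directly instead of through the caddy reverse proxy
  networks:
    - monitor-net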
I upgraded to Docker 18.01-ce today and it appears that I do not get any container statistics showing up from the point where I restarted the dockprom containers.
I have attempted to recreate these containers, with no luck. Everything starts up correctly, cAdvisor and the other containers do not seem to throw any errors alluding to a specific problem.
If I downgrade to Docker 17.11 this seems to work (I haven't tried 17.12, though can if required).
I am also using a zfs dataset, so I had to make sure to include the following in the docker-compose.yml:

devices:
  - /dev/zfs:/dev/zfs

This was to prevent zfs errors cAdvisor was spitting out on launch (for both version 17.11 and 18.01).
# docker info
Containers: 38
 Running: 38
 Paused: 0
 Stopped: 0
Images: 41
Server Version: dev
Storage Driver: zfs
 Zpool: nerv
 Zpool Health: ONLINE
 Parent Dataset: nerv/ROOT/void
 Space Used By Parent: 10682224640
 Space Available: 471921729536
 Parent Quota: no
 Compression: on
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.14.12_4
Operating System: void
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 31.4GiB
Name: nerv
ID: 2J4W:CXSO:LGMT:S5YB:FZQ7:UMO6:JGPB:G2YF:IWZF:C4EO:A2SF:BV5L
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

# docker version
Client:
 Version: 17.06.0-ce
 API version: 1.30
 Go version: go1.8.3
 Git commit: 02c1d87
 Built: Fri Jun 23 21:15:15 2017
 OS/Arch: linux/amd64

Server:
 Version: dev
 API version: 1.35 (minimum version 1.12)
 Go version: go1.9.2
 Git commit: v18.01.0-ce
 Built: Tue Nov 28 17:25:15 2017
 OS/Arch: linux/amd64
 Experimental: false
https://github.com/stefanprodan/dockprom/blob/master/docker-compose.yml#L15
I had to add

prometheus:
  user: root
  privileged: true

to make it work; maybe someone else needs this information.
I am a bit confused about the different default dashboards. Does "Docker Host" show live stats for the actual machine the containers are running on? Same for Prometheus? And does "Docker Containers" just show stats for all Docker processes running on the system?
The RC1 of label-schema.org does not mention this label. Any reason/background for having it in the 1.3 docker-compose.yml?
RC1 version at: http://label-schema.org/rc1/
Hi!
Very nice work! Can you suggest how to use grafana templates with multiple hosts?
Thx
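(Not part of this repo, but a common pattern, sketched under the assumption that node_exporter metrics such as node_load1 are being scraped: add a Grafana template variable of type Query against the Prometheus datasource, then filter every panel by it.)

label_values(node_load1, instance)

and in each panel expression something like:

node_load1{instance=~"$instance"}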
Hello,
My Host: Ubuntu Mini 16.04 LTS x64
The Network Usage graph for the Host in Grafana does not update.
I tested by downloading a 1 GB file using wget on the host.
I thought changing the Ubuntu network interface naming from enp0s3 to eth0 on the host would help, but it didn't. Any idea how to troubleshoot?
Thanks
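(One hedged thing to check: if the dashboard's network panels filter on device="eth0", a renamed interface matches nothing. A device regex covering both naming schemes, assuming a node_exporter older than 0.16 where the metric is still named node_network_receive_bytes, would be:)

rate(node_network_receive_bytes{device=~"eth0|enp.*"}[1m])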
When I try to monitor an application, for example Redis, I'm having issues.
My config:
prometheus:
  image: stefanprodan/swarmprom-prometheus
  environment:
    - JOBS=redis-exporter:9121

version: '3'
networks:
  mon_net:
    external: true
services:
  redis:
    image: redis
    networks:
      - mon_net
    ports:
      - "6379:6379"
    deploy:
      mode: global
  redis-exporter:
    image: oliver006/redis_exporter
    networks:
      - mon_net
    ports:
      - "9121:9121"
    deploy:
      mode: global
When I run the monitoring stack and then compose-redis:
Prometheus goes up and down all the time.
Log shows:
level=error ts=2018-02-19T16:49:15.594740858Z caller=main.go:582 err="Error loading config couldn't load configuration (--config.file=/etc/prometheus/prometheus.yml): parsing YAML file /etc/prometheus/prometheus.yml: unknown fields in alertmanager config: job_name"
I have no idea how to fix this or what I did wrong.
Any help would be appreciated.
Thanks
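(For comparison, a plain prometheus.yml scrape job for the exporter, sketched with a made-up job name rather than the swarmprom JOBS environment mechanism:)

scrape_configs:
  - job_name: 'redis-exporter'   # hypothetical job name
    scrape_interval: 15s
    static_configs:
      - targets: ['redis-exporter:9121']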
Shouldn't this be sum(machine_cpu_cores)? Otherwise I'm seeing a "Multiple Series Error" when monitoring across multiple hosts. I believe several other metrics could use this change as well.
I'm trying to get the Prometheus instance configured here to scrape the metrics endpoints of the containers themselves, not just the stats coming from cAdvisor. Maybe by finding them via a label?
Just wondering if you've seen or done something similar.
Cheers,
E.
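(One hedged sketch for containers on the same overlay network is DNS-based discovery against the service name; tasks.my-app and port 8080 are hypothetical:)

scrape_configs:
  - job_name: 'app-metrics'        # hypothetical
    dns_sd_configs:
      - names: ['tasks.my-app']    # Docker DNS returns one A record per task
        type: 'A'
        port: 8080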
Is there a Dockerfile for this repo or is it missing?
For Used Storage under the Docker Containers dashboard I get a figure which is clearly not true, because I only have a 500GB drive.
Then, for Free Storage under the Docker Host dashboard, I am not sure what my fstype is; querying node_filesystem_free in Prometheus gave a lot of output. So I tried aufs and then ext4, and the result is still incorrect, because df -h shows that I have 68G free.
Hi,
On dashboard docker containers, the storage load does not work and I have those errors:
Error: Multiple Series Error
at e.setValues (http://localhost:3000/public/build/0.be20b78823b4c9d93a84.js:7:277367)
at e.onDataReceived (http://localhost:3000/public/build/0.be20b78823b4c9d93a84.js:7:274881)
at o.emit (http://localhost:3000/public/build/vendor.2305a8e1d478628b1297.js:15:520749)
at t.emit (http://localhost:3000/public/build/app.5331f559bd9a1bed9a93.js:1:29217)
at e.handleQueryResult (http://localhost:3000/public/build/0.be20b78823b4c9d93a84.js:7:19860)
Container Memory usage, Sample Ingested 5M rate and container cached memory usage do not show anything.
I am using Debian as the host OS for the Docker containers.
Thank you,
Ionut
Hi
After issuing docker-compose up -d on a freshly cloned dockprom repo, with $DOCKER_HOST set to a new Debian install running a few containers, I see prometheus and alertmanager failing to start with similar errors:
time="2017-04-03T16:03:43Z" level=info msg="Starting prometheus (version=1.5.2, branch=master, revision=bd1182d29f462c39544f94cc822830e1c64cf55b)" source="main.go:75"
time="2017-04-03T16:03:43Z" level=info msg="Build context (go=go1.7.5, user=root@1a01c5f68840, date=20170210-16:23:28)" source="main.go:76"
time="2017-04-03T16:03:43Z" level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:248"
time="2017-04-03T16:03:43Z" level=error msg="Error loading config: couldn't load configuration (-config.file=/etc/prometheus/prometheus.yml): open /etc/prometheus/prometheus.yml: no such file or directory" source="main.go:150"
And also:
time="2017-04-03T16:20:09Z" level=info msg="Starting alertmanager (version=0.5.1, branch=master, revision=0ea1cac51e6a620ec09d053f0484b97932b5c902)" source="main.go:101"
time="2017-04-03T16:20:09Z" level=info msg="Build context (go=go1.7.3, user=root@fb407787b8bf, date=20161125-08:14:40)" source="main.go:102"
time="2017-04-03T16:20:09Z" level=info msg="Loading configuration file" file="/etc/alertmanager/config.yml" source="main.go:195"
time="2017-04-03T16:20:09Z" level=error msg="Loading configuration file failed: open /etc/alertmanager/config.yml: no such file or directory" file="/etc/alertmanager/config.yml" source="main.go:198"`
Other info:
# apt show docker-ce
[...]
Package: docker-ce
Version: 17.03.1~ce-0~debian-jessie
[...]
# lsb_release -d
Description: Debian GNU/Linux 8.7 (jessie)
Have I misunderstood the instructions?
Thank you
Any thoughts on using this with Kubernetes?
Got this error:
nodeexporter | time="2017-10-12T07:19:45Z" level=error msg="Error on statfs() system call for "/rootfs/var/lib/docker/containers/3ba4123c2ff67826a1869c0c3e2ac7e36beea1601b97ff3075e117448af39300/shm": permission denied" source="filesystem_linux.go:57"
Is it ok?
Hi @stefanprodan: with node_exporter 0.15.0 I am getting the following error message:
node_exporter: error: unknown short flag '-c', try --help
I have to add the following patch:
diff --git a/docker-compose.yml b/docker-compose.yml
index 6a65bff..9a1403a 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -56,9 +56,9 @@ services:
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- - '-collector.procfs=/host/proc'
- - '-collector.sysfs=/host/sys'
- - '-collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
+ - '--path.procfs=/host/proc'
+ - '--path.sysfs=/host/sys'
+ - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
If you also confirm this as an issue, maybe allow me to make a PR to fix it.
I will also modify docker-compose.exporters.yml as necessary.
I think it'd be good to pin node_exporter to a specific version, say 0.14.0; what do you think?
Grafana dashboards supplied by dockprom don't work with Prometheus 2.0
Hello,
I think I have a problem with the CPU stats in the Container and Service Monitor dashboards, while the Host CPU stats seem to be OK.
For example, if I run stress -c 1 in a given container, I get this data:
As you can see, I have no stats (and sometimes stats stuck at 0) for my container (postgres) in the CPU Usage panel, but the System Load looks right given the stress test.
I have the same dashboard configuration as defined in this repository, and all the Dockprom containers are alive.
That's pretty strange and I don't know how to solve it, so if you have some tips that would be great!
Hi,
First of all, thanks for this repo. It's great work and I had been looking for something like this for a long time. Appreciate it!
I have no data points in the Nginx dashboard. Only CPU usage.
Any idea why and what should I do to get the data?
I don't have them with the latest version of node-exporter and prometheus. Do you have some recording rules?
I'm new to the wonderful world of containers and am having difficulty deploying this to monitor external hosts/nodes. How can I monitor additional hosts/containers beyond what this is deployed on? Maybe a more in-depth version of this comment.
Hey again,
I think it would be best practice to use memory/CPU reservations & limits. I know how to do this using a stack file, but I don't know the syntax for compose 2.1 (a rough 2.1 equivalent is sketched after the example below).
Here is an example :)
version: "3.1"
services:
home:
image: abiosoft/caddy
networks:
- ntw_front
volumes:
- ./www/home/srv/:/srv/
deploy:
mode: replicated
replicas: 2
#placement:
# constraints: [node.role==manager]
restart_policy:
condition: on-failure
resources:
limits:
cpus: '0.20'
memory: 9M
reservations:
cpus: '0.05'
memory: 9M
labels:
- "traefik.backend=home"
- "traefik.frontend.rule=PathPrefixStrip:/"
- "traefik.port=2015"
- "traefik.enable=true"
- "traefik.backend.loadbalancer.method=drr"
- "traefik.frontend.entryPoints=http"
- "traefik.docker.network=ntw_front"
- "traefik.weight=10"
who1:
image: nginx:alpine
networks:
- ntw_front
volumes:
- ./www/who1/html/:/usr/share/nginx/html/
deploy:
mode: replicated
replicas: 2
#placement:
# constraints: [node.role==manager]
restart_policy:
condition: on-failure
resources:
limits:
cpus: '0.20'
memory: 9M
reservations:
cpus: '0.05'
memory: 9M
labels:
- "traefik.backend=who1"
- "traefik.frontend.rule=PathPrefixStrip:/who1"
- "traefik.port=80"
- "traefik.enable=true"
- "traefik.backend.loadbalancer.method=drr"
- "traefik.frontend.entryPoints=http"
- "traefik.docker.network=ntw_front"
- "traefik.weight=10"
who2:
image: emilevauge/whoami
networks:
- ntw_front
deploy:
mode: replicated
replicas: 2
#placement:
# constraints: [node.role==manager]
restart_policy:
condition: on-failure
resources:
limits:
cpus: '0.20'
memory: 9M
reservations:
cpus: '0.05'
memory: 9M
labels:
- "traefik.backend=who2"
- "traefik.frontend.rule=PathPrefixStrip:/who2"
- "traefik.port=80"
- "traefik.enable=true"
- "traefik.backend.loadbalancer.method=drr"
- "traefik.frontend.entryPoints=http"
- "traefik.docker.network=ntw_front"
- "traefik.weight=10"
networks:
ntw_front:
external: true
# With a real domain name you will need "traefik.frontend.rule=Host:mydummysite.tk"
#
# by Pascal Andy | # https://twitter.com/askpascalandy
# https://github.com/pascalandy/docker-stack-this
#
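(As asked above, a hedged sketch of roughly equivalent limits in compose 2.1 syntax; my assumption is that cpu_quota is measured against the default 100000us CFS period, so 20000 approximates cpus: '0.20'.)

version: '2.1'
services:
  home:
    image: abiosoft/caddy
    mem_limit: 9M          # hard cap, like resources.limits.memory
    mem_reservation: 9M    # soft floor, like resources.reservations.memory
    cpu_quota: 20000       # ~0.20 CPUs against the default 100000us period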
It makes sense to open the Grafana port to the public.
But I'm not sure I understand why prometheus and alertmanager have their ports public. Any particular reason for this behaviour?
Many cheers!
Are you interested to add memory limits?
Something like:
deploy:
  mode: replicated
  replicas: 2
  placement:
    constraints: [node.role==manager]
  restart_policy:
    condition: on-failure
  resources:
    limits:
      cpus: '0.25'
      memory: 192M
    reservations:
      memory: 96M
If yes, I'll do a PR.
Cheers!
I am trying to get the data written to the local disk so that it can be retained.
Can you please guide me on how to achieve this kind of setup?
I am trying with the configuration below; however, every time the container fails to start if I remove the hash for the PROMETHEUS_DATA volume.
services:
  prometheus:
    image: prom/prometheus:v2.0.0
    container_name: Prometheus-Monitoring
    volumes:
      - ./PROMETHEUS:/etc/prometheus/
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--web.enable-lifecycle'
      - '--web.console.templates=consoles'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--storage.tsdb.retention=15d'
      - '--log.level=debug'
      - '--web.enable-admin-api'
    restart: unless-stopped
    expose:
      - 9090
    ports:
      - 9090:9090
    networks:
      - monitoring
    labels:
      app: monitoring
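(For reference, the stock dockprom compose file persists the TSDB with a named volume; a minimal sketch, assuming the default data path /prometheus:)

volumes:
  prometheus_data: {}

services:
  prometheus:
    volumes:
      - ./PROMETHEUS:/etc/prometheus/
      - prometheus_data:/prometheus   # named volume, survives container re-creation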
This is a great combination for monitoring our Docker environment. However, we are trying to get swarm-mode services to show up, without any luck. Any suggestions?
Thanks!
First: VERY nicely put together! Thank you very much for this!
I must admit I don't have much experience with Prometheus, but how would a swarm-mode setup work with this project? Would it be as easy as setting up collectors on all nodes and putting the monitor network on an overlay?
PS: Sorry for submitting this as an issue; it is sort of a feature request.
I am trying to get the free space graph working. I am using btrfs, so I set that entry in docker_host.json. I went and edited the dashboard panel, but when I set it to btrfs I get a "Multiple Series Error", because the response I get back from node_exporter is an array and not a single object.
I am not sure how to filter down to the device I want. Do you have any suggestions?
my current config is just the default
(node_filesystem_size{fstype="btrfs"} - node_filesystem_free{fstype="btrfs"}) / node_filesystem_size{fstype="btrfs"} * 100
here is the JSON it returns (condensed; the full response contains one series per btrfs mountpoint, and the label quoting was garbled in the original paste):

device=/dev/loop0  mountpoints /, /rootfs/var/lib/docker, /rootfs/var/lib/docker/btrfs, /rootfs/var/lib/docker/btrfs/subvolumes/b25055e2...  value 23.06
device=/dev/md1    mountpoint /rootfs/mnt/disk1   value 87.31
device=/dev/md2    mountpoint /rootfs/mnt/disk2   value 87.71
device=/dev/md3    mountpoint /rootfs/mnt/disk3   value 86.93
device=/dev/md4    mountpoint /rootfs/mnt/disk4   value 86.97
device=/dev/sdf1   mountpoint /rootfs/mnt/cache   value 48.82
Hi,
Sometimes I can't get any data in Prometheus, and although I restart the containers (Grafana, Nodeexporter, cAdvisor, Prometheus) nothing happens. So I ran docker logs prometheus and the result is:
@[1486967534.497] source="scrape.go:579"
time="2017-02-13T06:32:15Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=554 source="scrape.go:517"
time="2017-02-13T06:32:15Z" level=warning msg="Scrape health sample discarded" error="sample timestamp out of order" sample=up{instance="localhost:9090", job="prometheus"} => 1 @[1486967535.088] source="scrape.go:570"
time="2017-02-13T06:32:15Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="localhost:9090", job="prometheus"} => 0.020098991 @[1486967535.088] source="scrape.go:573"
time="2017-02-13T06:32:15Z" level=warning msg="Scrape sample count sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="localhost:9090", job="prometheus"} => 0.020098991 @[1486967535.088] source="scrape.go:576"
time="2017-02-13T06:32:15Z" level=warning msg="Scrape sample count post-relabeling sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="localhost:9090", job="prometheus"} => 0.020098991 @[1486967535.088] source="scrape.go:579"
time="2017-02-13T06:32:18Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=852 source="scrape.go:517"
time="2017-02-13T06:32:18Z" level=warning msg="Scrape health sample discarded" error="sample timestamp out of order" sample=up{instance="cadvisor:8080", job="cadvisor"} => 1 @[1486967538.095] source="scrape.go:570"
time="2017-02-13T06:32:18Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="cadvisor:8080", job="cadvisor"} => 0.058924147 @[1486967538.095] source="scrape.go:573"
time="2017-02-13T06:32:18Z" level=warning msg="Scrape sample count sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="cadvisor:8080", job="cadvisor"} => 0.058924147 @[1486967538.095] source="scrape.go:576"
time="2017-02-13T06:32:18Z" level=warning msg="Scrape sample count post-relabeling sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="cadvisor:8080", job="cadvisor"} => 0.058924147 @[1486967538.095] source="scrape.go:579"
time="2017-02-13T06:32:19Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=978 source="scrape.go:517"
time="2017-02-13T06:32:19Z" level=warning msg="Scrape health sample discarded" error="sample timestamp out of order" sample=up{instance="nodeexporter:9100", job="nodeexporter"} => 1 @[1486967539.499] source="scrape.go:570"
time="2017-02-13T06:32:19Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="nodeexporter:9100", job="nodeexporter"} => 0.021957840000000003 @[1486967539.499] source="scrape.go:573"
time="2017-02-13T06:32:19Z" level=warning msg="Scrape sample count sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="nodeexporter:9100", job="nodeexporter"} => 0.021957840000000003 @[1486967539.499] source="scrape.go:576"
time="2017-02-13T06:32:19Z" level=warning msg="Scrape sample count post-relabeling sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="nodeexporter:9100", job="nodeexporter"} => 0.021957840000000003 @[1486967539.499] source="scrape.go:579"
time="2017-02-13T06:32:22Z" level=warning msg="Error on ingesting out-of-order result from rule evaluation" numDropped=1 source="manager.go:296"
time="2017-02-13T06:32:23Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=852 source="scrape.go:517"
After some Googling I found this might be related to collecting data from multiple hosts, but I'm not sure.
Could you help me?
[UPDATE] It was working perfectly last week, but when I started my computer I got these errors. Also, sometimes it syncs the data for a while and then stops.
Following the README to the letter, I get no datapoints in Prometheus.
All three targets are "UP".
/graph on any basic metric reports "No Datapoints".
Grafana reports the datasource "is working".
The Grafana dashboard for "Docker Containers" is empty of data.
OSX, Docker for Mac
Version 17.03.1-ce-mac5 (16048)
Channel: stable
b18e2a50cc
When running the project as-is on an Ubuntu 16.04 host, cAdvisor fails to get most of the data for the "Docker Containers" dashboard; "Container Memory Usage" et al. remain blank.
This is fixed by adding a /cgroup mount to the docker-compose.yml files.
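(For concreteness, the mount described above might look like this in the cadvisor service; the host-side path is an assumption, since some distros expose cgroups at /sys/fs/cgroup instead:)

cadvisor:
  volumes:
    - /cgroup:/cgroup:ro   # host cgroup hierarchy, read-only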
This looks great btw, fantastic work and great use of grafana.
re: "all you need to do is to deploy a node-exporter and a cAdvisor container on each host and point the Prometheus server to scrape those"
It's not clear in the docs how to do this. After deploying the node-exporter and cAdvisor containers on a new host, do we simply add something like this?
- job_name: 'nodeexporter'
  scrape_interval: 5s
  static_configs:
    - targets: ['nodeexporter:9100', 'new.host.ip.address:9100']

- job_name: 'cadvisor'
  scrape_interval: 5s
  static_configs:
    - targets: ['cadvisor:8080', 'new.host.ip.address:8080']
Or do we need to create new - job_name: entries for each host (with the host's IP:9100 | 8080 as the targets)?
Hi,
I'm trying to make a docker-compose.yml without Grafana, to deploy on other machines, and have my central one collect the info and display it on a graph. So, to summarize, I want to create a docker-compose.yml that exports the data of the other machines.
To do that, I think we only need Prometheus, cAdvisor, NodeExporter and AlertManager.
So I tried to remove the Grafana parts of the .yml, but it doesn't work. Prometheus can't run, and in the logs I have:
level=info msg="Starting prometheus (version=1.5.2, branch=master, revision=bd1182d29f462c39544f94cc822830e1c64cf55b)" source="main.go:75"
level=info msg="Build context (go=go1.7.5, user=root@1a01c5f68840, date=20170220-07:00:00)" source="main.go:76"
level=info msg="Loading configuration file /etc/prometheus/prometheus.yml" source="main.go:248"
level=error msg="Error opening memory series storage: leveldb: manifest corrupted (field 'comparer'): missing [file=MANIFEST-000009]" source="main.go:182"
This is the .yml that I have made :
version: '2'

networks:
  monitor-net:
    driver: bridge

volumes:
  prometheus_data: {}

services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    volumes:
      - ./prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '-config.file=/etc/prometheus/prometheus.yml'
      - '-storage.local.path=/prometheus'
      - '-alertmanager.url=http://alertmanager:9093'
      - '-storage.local.memory-chunks=100000'
    restart: unless-stopped
    expose:
      - 9090
    ports:
      - 9090:9090
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"

  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    volumes:
      - ./alertmanager/:/etc/alertmanager/
    command:
      - '-config.file=/etc/alertmanager/config.yml'
      - '-storage.path=/alertmanager'
    restart: unless-stopped
    expose:
      - 9093
    ports:
      - 9093:9093
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"

  nodeexporter:
    image: prom/node-exporter
    container_name: nodeexporter
    restart: unless-stopped
    expose:
      - 9100
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"

  cadvisor:
    image: google/cadvisor
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    restart: unless-stopped
    expose:
      - 8080
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"
Thank you for the help
PS: I know you have already made something like this, but yours doesn't include the Prometheus part and I need it.
Hi,
I want to change the password "changeme" to something like "admin", so I did this in the config file:
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=admin
GF_USERS_ALLOW_SIGN_UP=false
Then I run docker-compose up -d, and when I go to "localhost:3000" I can still only log in with "changeme" (I tried with a private browser window).
Hi Stefan,
I'm trying to set up SMTP by using the config.yml in alertmanager.
global:
  smtp_smarthost: 'x.xx.xx.xxx:25'
  smtp_from: '[email protected]'
  require_tls: false

route:
  receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: '[email protected]'
        require_tls: false
The alertmanager container is always in a restart loop and never comes up. Could this be due to the changes in the config.yml file?
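(If I'm reading the Alertmanager docs right, the global option is spelled smtp_require_tls; bare require_tls is only valid inside email_configs, which could explain a config parse failure and the restart loop. A hedged sketch with placeholder addresses:)

global:
  smtp_smarthost: 'mail.example.com:25'   # placeholder host
  smtp_from: 'alertmanager@example.com'   # placeholder sender
  smtp_require_tls: false                 # note the smtp_ prefix at the global level

route:
  receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: 'ops@example.com'             # placeholder recipient
        require_tls: false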
When calling the below
curl -X POST http://admin:admin@<host-ip>:9090/-/reload
it returns: Lifecycle APIs are not enabled
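(In Prometheus 2.x the reload endpoint is opt-in; the flag below, which this repo's compose file passes elsewhere on this page, must be present in the container command:)

command:
  - '--config.file=/etc/prometheus/prometheus.yml'
  - '--web.enable-lifecycle'   # enables POST /-/reload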
Hi,
First, I'd like to thank you for providing these great tools!
I configured this suite on my Mac client; now I would like to move these containers to my monitoring server (CentOS 7, with docker-ce already deployed), where I have also installed other exporters such as snmp_exporter and blackbox_exporter. Do you know how I can migrate the suite to my server?
Fresh install: I've noticed that the top row (uptime, cpu, memory, etc.) all display N/A unless the panels are deleted and re-created.
Has anyone seen this before, and is there a fix/workaround other than re-creating everything?
Edit: this happens on other tabs as well.
Hello, here you can see the error trace:
{"status":"error","errorType":"bad_data","error":"start time must be before end time"}
As I understood from prometheus/prometheus#3543, this is an issue between Prometheus 2.1.0 and Alertmanager 0.13.0: Prometheus sends invalid data to Alertmanager, which can't process it. So what should we do to resolve this? Maybe wait for Prometheus 2.2 and Alertmanager 0.14, or how can we downgrade to Prometheus 2.0 (which resolves this issue) with Docker Compose?
Thanks a lot for your work, your project is amazing!
P.S. In the Alertmanager 0.14 release notes I see: [BUGFIX] Don't count alerts with EndTime in the future as resolved.
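(Downgrading with Compose is just pinning the image tag and re-running docker-compose up -d; a sketch using the v2.0.0 tag mentioned in this thread:)

prometheus:
  image: prom/prometheus:v2.0.0   # pinned tag instead of a floating one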
It seems this is not an issue; I am just following your blog post to deploy this project, and it's very nice.
I have a case where I need to install Prometheus on only one server. Is it possible to monitor all the containers inside another server, maybe by adding the other server's IP as a source, after previously installing a Prometheus client on that server?
Thank you
I have docker version:
Docker version 17.06.1-ce, build 874a737
and got this error:
ERROR: Version in "./docker-compose.yml" is unsupported. You might be seeing this error because you're using the wrong Compose file version. Either specify a version of "2" (or "2.0") and place your service definitions under the services key, or omit the version key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/
I had to add this line in docker-compose.yml:

devices:
  - "/dev/zfs:/dev/zfs"

in the cadvisor: section.
Otherwise I got this error:
cadvisor | Try running 'udevadm trigger' and 'mount -t proc proc /proc' as root.
cadvisor | E0308 23:14:11.066788 1 fs.go:418] Stat fs failed. Error: exit status 1: "/usr/sbin/zfs zfs list -Hp -o name,origin,used,available,mountpoint,compression,type,volsize,quota,referenced,written,logicalused,usedbydataset myzfspool/home/root" => /dev/zfs and /proc/self/mounts are required.
Thanks, this is a great project, really helps to see all those components into action.
I'm not sure I understand how metrics are collected from the node, though, as nodeexporter does not use any mounts in the compose file.
Is there anything I'm missing?
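(For context, the project's compose file does bind-mount the host for nodeexporter; see the node_exporter diff earlier on this page. A sketch of the relevant service, with the /proc mount inferred from the --path.procfs flag:)

nodeexporter:
  image: prom/node-exporter
  volumes:
    - /proc:/host/proc:ro   # host process tree
    - /sys:/host/sys:ro     # host sysfs
    - /:/rootfs:ro          # host root, for filesystem metrics
  command:
    - '--path.procfs=/host/proc'
    - '--path.sysfs=/host/sys'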
Why has the prometheus container been configured not to auto-restart after rebooting the host, while all other containers are configured to auto-restart?
Hello,
Newer versions of Ubuntu will often have different network interface names (instead of eth0, it'll be something like enp1s0f0).
By default, if this is the case, the network monitor doesn't show any traffic and I'm afraid I can't figure out how to change this.
Could you shed some light please?
Thanks.