reconnoiter's Issues

segfault in feed-driven transient check removal

mdb core

mdb: core file data for mapping at 100000000 not saved: Bad address
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]

::stack
libc.so.1`kill+0xa()
libc.so.1`__sighndlr+6()
libc.so.1`call_user_handler+0x1db(b, 0, fffffd7fffdf7770)
libc.so.1`sigacthandler+0x10e(b, 0, fffffd7fffdf7770)
noit_hash_delete_all+0x49()
noit_hash_destroy+0x2b(107c676f8, 0, 465123)
noit_poller_free_check+0x338()
noit_check_transient_remove_feed+0x295()
handle_extra_feeds+0xac()
noit_check_log_bundle+0x1d(107c67610)
noit_check_set_stats+0x3e5()
dns.so`dns_check_log_results+0x232()
dns.so`dns_cb+0x733()
dns_end_query+0x114()
dns_ioevent+0x7e8()
dns.so`dns_module_eventer_callback+0x45()
eventer_ports_impl_trigger+0x158()
eventer_ports_impl_loop+0x3e1()
child_main+0x455()
noit_watchdog_start_child+0x297()
noit_main+0xb76()
main+0x68()
_start+0x6c()

Want M-only output from noit_b2sm

In the context of building a replay tool for raw log records, it would be helpful if one could request only M records from noit_b2sm, since status/state-change records are not useful for metric replay. This would eliminate a second filtering step on the output of noit_b2sm.

I suppose there might be a case for S-only too, though I don't anticipate needing it.
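
Until such an option exists, the second filtering step could look roughly like the sketch below. It assumes the output is one record per line, with the record type as the first tab-delimited field (so metric records start with "M" followed by a tab); that layout is an assumption here, and is_metric_record and filter_metric_records are hypothetical names, not part of noit_b2sm.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 when the first tab-delimited field of the record is exactly "M".
 * Assumption: the record type is the first field and fields are tab-separated. */
int is_metric_record(const char *line) {
  return line[0] == 'M' && line[1] == '\t';
}

/* Copies only M records from in to out and returns how many were kept.
 * Lines longer than the buffer are handled by remembering the decision
 * made at the start of the line. */
long filter_metric_records(FILE *in, FILE *out) {
  char buf[8192];
  long kept = 0;
  int at_line_start = 1, keep = 0;
  while (fgets(buf, sizeof(buf), in)) {
    if (at_line_start) {
      keep = is_metric_record(buf);
      if (keep) kept++;
    }
    if (keep) fputs(buf, out);
    /* A chunk without a newline means the same line continues. */
    at_line_start = (strchr(buf, '\n') != NULL);
  }
  return kept;
}
```

Calling filter_metric_records(stdin, stdout) from a small main() would turn this into a pipe filter; an S-only variant would only need a different type check.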

Experiencing lag in the console of noit and stratcon

On another note, I have been experiencing some lag in the noit and stratcon consoles for some time now, but haven't had the time to create an issue before today.

I have just tested it on two different Ubuntu 14.04 systems, and both show the same lag when working inside the console.

When I type a character, it only appears on the console command line after the next character has been typed, and typing as a whole feels slow.

Any ideas?

Scheduling issue(s)

I implemented my own DNS probe (based on the libudns one) and have found some issues with scheduling that I cannot explain. Things seem fine when I only run a measurement or two, but when I set up, say, 10-15 probes per second (still not a lot), things start going nuts.

The first thing I see is that a probe that should fire every 10 seconds is firing sporadically. Here are the first two times one particular measurement runs. The first message is logged right after I set the NP_RUNNING flag and the second shortly after that, in the dns_check_send call:

[2016-06-23 15:48:35.070069] [debug/amaas_dns] recdns-soa_1 is running...
[2016-06-23 15:48:35.070104] [debug/amaas_dns] dns_check_send called
...
[2016-06-23 15:48:58.700329] [debug/amaas_dns] recdns-soa_1 is running...
[2016-06-23 15:48:58.700354] [debug/amaas_dns] dns_check_send called

You can see from the timestamps that the second run came about 23.6 seconds after the first, nowhere near the 10-second period it should have run on.

I also see a lot of these errors in the noitd log:
[2016-06-23 15:48:51.140765] [error] ctld-chnull_50 might not finish in 312ms (timeout 2000ms)
That particular measurement returned and reported a 15 ms latency, but the log entry for clearing the NP_RUNNING flag came very late (again, according to the timestamps):

[2016-06-23 15:49:02.213172] [debug] ctld-chnull_50 <- [latency_us: 15796]
[2016-06-23 15:49:02.213178] [debug] ctld-chnull_50 -> [available:good]
[2016-06-23 15:49:02.213460] [debug/amaas_dns] ctld-chnull_50 is no longer running...

Next, unlike the libudns DNS probe, my probe does not implement timeout functionality. Instead, I'm using the same code from libudns to register the timeout with the eventer, and I'm relying on it to tell me when the measurement has timed out. My current configuration has all timeouts set to 2 seconds, and for the most part the callback is triggered the instant the timeout is reached. However, I've seen it take 6+ seconds, and on occasion 10+ seconds, to trigger the callback. When it takes more than 10 seconds, the next measurement doesn't fire because NP_RUNNING is still set on it and the BAIL_ON_RUNNING macro bails out. Here's my eventer registration:

    newe = eventer_alloc();
    newe->mask = EVENTER_TIMER;
    gettimeofday(&now, NULL);
    p_int.tv_sec = check->timeout / 1000;
    p_int.tv_usec = (check->timeout % 1000) * 1000;
    add_timeval(now, p_int, &newe->whence);
    newe->closure = ci;
    newe->callback = dns_module_check_timeout;
    ci->timeout_event = newe;
    eventer_add(newe);
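
To put numbers on how late the timeout callback fires, the deadline arithmetic from the registration above can be reproduced in isolation and compared with the time at which the callback actually ran. This is only a sketch: deadline_after_ms and lateness_ms are hypothetical helpers, not eventer API, and the first one mirrors what the p_int/add_timeval lines are assumed to compute.

```c
#include <sys/time.h>

/* Hypothetical helper: deadline = now + timeout_ms, keeping tv_usec
 * normalized to the range [0, 1000000). */
struct timeval deadline_after_ms(struct timeval now, long timeout_ms) {
  struct timeval d;
  d.tv_sec = now.tv_sec + timeout_ms / 1000;
  d.tv_usec = now.tv_usec + (timeout_ms % 1000) * 1000;
  if (d.tv_usec >= 1000000) {
    d.tv_sec += 1;
    d.tv_usec -= 1000000;
  }
  return d;
}

/* Milliseconds by which a callback fired after its deadline
 * (negative if it fired early). */
long lateness_ms(struct timeval deadline, struct timeval fired) {
  return (long)(fired.tv_sec - deadline.tv_sec) * 1000 +
         (long)(fired.tv_usec - deadline.tv_usec) / 1000;
}
```

Logging lateness_ms(deadline, now) at the top of the timeout callback would show whether the 6+ and 10+ second delays accumulate gradually or appear in bursts.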

Last, if I leave this setup running long enough (about 15 measurements per second, triggering the above errors fairly frequently), eventually all my probes get scheduled in negative time. In the CLI's "show check" output, both the last run and the next run are negative, and the next run becomes a larger and larger negative number. When this happens, noitd continues to run, but no measurements are sent.

Any help with this would be greatly appreciated.
-Dave

I can't get snmp to work after upgrading..

I have some simple SNMP version 1 checks, polling some interfaces of my NAS, working just fine in master.4d6754ffc5b8ab1a54d6fc9d19819dfedfaf9ced.1359043503, but after upgrading to master.57d42b3ced2b3293f9e3fee35bb4f7bcdabd23e2.1364969173, I don't get any results:

noit# show check f21ba4a6-0e63-45cd-8d55-b0bbb296d38d
==== f21ba4a6-0e63-45cd-8d55-b0bbb296d38d ====
name: eth0::1
module: snmp [inherited from dc1/synology/@module]
target: nas [inherited from dc1/synology/DS213/@target]
resolve_rtype: prefer-ipv4 [inherited from @resolve_rtype]
period: 60000 [inherited from dc1/synology/@period]
timeout: 30000 [inherited from dc1/@timeout]
oncheck: [undef]
filterset: default [inherited from @FilterSet]
disable: [undef]
config::community: public
config::version: 1
target_ip: 192.168.69.7
currently: 00000060 idle
next run: 40.554 seconds
last run: 17.852 seconds ago
availability/state: available/bad
status: results=0
feeds: 0
metrics:
noit#

A tcpdump on port 161 looks like this:

13:30:31.029494 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 57)
192.168.69.30.32810 > 192.168.69.7.161: [bad udp cksum 0x0bad -> 0x08e0!] { SNMPv1 { GetRequest(14) R=1324089835 } }
0x0000: 4500 0039 0000 4000 4011 2f3e c0a8 451e E..9..@.@./>..E.
0x0010: c0a8 4507 802a 00a1 0025 0bad 301b 0201 ..E..*...%..0...
0x0020: 0004 0670 7562 6c69 63a0 0e02 044e ec01 ...public....N..
0x0030: eb02 0100 0201 0030 00 .......0.
13:30:31.030243 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 57)
192.168.69.7.161 > 192.168.69.30.32810: [udp sum ok] { SNMPv1 { GetResponse(14) R=1324089835 } }
0x0000: 4500 0039 0000 4000 4011 2f3e c0a8 4507 E..9..@.@./>..E.
0x0010: c0a8 451e 00a1 802a 0025 08de 301b 0201 ..E....*.%..0...
0x0020: 0004 0670 7562 6c69 63a2 0e02 044e ec01 ...public....N..
0x0030: eb02 0100 0201 0030 00 .......0.

And when I look on localhost port 8083, I get the following:

13:34:31.021497 IP (tos 0x0, ttl 64, id 1737, offset 0, flags [DF], proto TCP (6), length 430)
127.0.0.1.46057 > 127.0.0.1.8083: Flags [P.], cksum 0xffa2 (incorrect -> 0x1bd5), seq 1:379, ack 1, win 2050, options [nop,nop,TS val 20039615 ecr 20039615], length 378
0x0000: 4500 01ae 06c9 4000 4006 347f 7f00 0001 E.....@.@.4.....
0x0010: 7f00 0001 b3e9 1f93 40f3 c179 0a47 929c ........@..y.G..
0x0020: 8018 0802 ffa2 0000 0101 080a 0131 c7bf .............1..
0x0030: 0131 c7bf 504f 5354 202f 6469 7370 6174 .1..POST./dispat
0x0040: 6368 2f73 6e6d 7020 4854 5450 2f31 2e31 ch/snmp.HTTP/1.1
0x0050: 0d0a 486f 7374 3a20 3132 372e 302e 302e ..Host:.127.0.0.
0x0060: 310d 0a55 7365 722d 4167 656e 743a 2052 1..User-Agent:.R
0x0070: 6563 6f6e 6e6f 6974 6572 2f30 2e39 0d0a econnoiter/0.9..
0x0080: 436f 6e74 656e 742d 4c65 6e67 7468 3a20 Content-Length:.
0x0090: 3234 370d 0a41 6363 6570 742d 456e 636f 247..Accept-Enco
0x00a0: 6469 6e67 3a20 677a 6970 2c20 6465 666c ding:.gzip,.defl
0x00b0: 6174 650d 0a0d 0a3c 3f78 6d6c 2076 6572 ate....<?xml.ver
0x00e0: 6563 6b20 7461 7267 6574 3d22 6e61 732e eck.target="nas.
0x00f0: 6272 6f73 7469 6e67 2e6e 6574 2220 7461 yyyyyyy.xxx".ta
0x0100: 7267 6574 5f69 703d 2231 3932 2e31 3638 rget_ip="192.168
0x0110: 2e36 392e 3722 206d 6f64 756c 653d 2273 .69.7".module="s
0x0120: 6e6d 7022 206e 616d 653d 2274 756e 3a3a nmp".name="tun::
0x0130: 3222 2070 6572 696f 643d 2236 3030 3030 2".period="60000
0x0140: 2220 7469 6d65 6f75 743d 2233 3030 3030 ".timeout="30000
0x0150: 223e 0a20 203c 636f 6e66 6967 3e0a 2020 ">...<config>...
0x0160: 2020 3c63 6f6d 6d75 6e69 7479 3e70 7562 ..<community>pub
0x0170: 6c69 633c 2f63 6f6d 6d75 6e69 7479 3e0a lic</community>.
0x0180: 2020 2020 3c76 6572 7369 6f6e 3e31 3c2f ....<version>1</
0x0190: 7665 7273 696f 6e3e 0a20 203c 2f63 6f6e version>...</con
0x01a0: 6669 673e 0a3c 2f63 6865 636b 3e0a fig>.</check>.
13:34:31.021534 IP (tos 0x0, ttl 64, id 21795, offset 0, flags [DF], proto TCP (6), length 52)
127.0.0.1.8083 > 127.0.0.1.46057: Flags [.], cksum 0xfe28 (incorrect -> 0x6a92), seq 1, ack 379, win 2048, options [nop,nop,TS val 20039615 ecr 20039615], length 0
0x0000: 4500 0034 5523 4000 4006 e79e 7f00 0001 E..4U#@.@.......
0x0010: 7f00 0001 1f93 b3e9 0a47 929c 40f3 c2f3 .........G..@...
0x0020: 8010 0800 fe28 0000 0101 080a 0131 c7bf .....(.......1..
0x0030: 0131 c7bf .1..
13:34:31.040931 IP (tos 0x0, ttl 64, id 21796, offset 0, flags [DF], proto TCP (6), length 127)
127.0.0.1.8083 > 127.0.0.1.46057: Flags [P.], cksum 0xfe73 (incorrect -> 0xfeee), seq 1:76, ack 379, win 2048, options [nop,nop,TS val 20039620 ecr 20039615], length 75
0x0000: 4500 007f 5524 4000 4006 e752 7f00 0001 E...U$@.@..R....
0x0010: 7f00 0001 1f93 b3e9 0a47 929c 40f3 c2f3 .........G..@...
0x0020: 8018 0800 fe73 0000 0101 080a 0131 c7c4 .....s.......1..
0x0030: 0131 c7bf 4854 5450 2f31 2e31 2032 3030 .1..HTTP/1.1.200
0x0040: 204f 4b0d 0a54 7261 6e73 6665 722d 456e .OK..Transfer-En
0x0050: 636f 6469 6e67 3a20 6368 756e 6b65 640d coding:.chunked.
0x0060: 0a53 6572 7665 723a 204a 6574 7479 2836 .Server:.Jetty(6
0x0070: 2e31 2e32 3029 0d0a 0d0a 3130 370d 0a .1.20)....107..
13:34:31.040994 IP (tos 0x0, ttl 64, id 1738, offset 0, flags [DF], proto TCP (6), length 52)
127.0.0.1.46057 > 127.0.0.1.8083: Flags [.], cksum 0xfe28 (incorrect -> 0x6a3d), seq 379, ack 76, win 2048, options [nop,nop,TS val 20039620 ecr 20039620], length 0
0x0000: 4500 0034 06ca 4000 4006 35f8 7f00 0001 E..4..@.@.5.....
0x0010: 7f00 0001 b3e9 1f93 40f3 c2f3 0a47 92e7 ........@....G..
0x0020: 8010 0800 fe28 0000 0101 080a 0131 c7c4 .....(.......1..
0x0030: 0131 c7c4 .1..
13:34:31.043311 IP (tos 0x0, ttl 64, id 21797, offset 0, flags [DF], proto TCP (6), length 317)
127.0.0.1.8083 > 127.0.0.1.46057: Flags [P.], cksum 0xff31 (incorrect -> 0x7469), seq 76:341, ack 379, win 2048, options [nop,nop,TS val 20039621 ecr 20039620], length 265
0x0000: 4500 013d 5525 4000 4006 e693 7f00 0001 E..=U%@.@.......
0x0010: 7f00 0001 1f93 b3e9 0a47 92e7 40f3 c2f3 .........G..@...
0x0020: 8018 0800 ff31 0000 0101 080a 0131 c7c5 .....1.......1..
0x0030: 0131 c7c4 3c3f 786d 6c20 7665 7273 696f .1..<?xml.versio
0x00c0: 756c 7473 3e0a 3c52 6573 6d6f 6e52 6573 ults>.<ResmonRes
0x00d0: 756c 7420 6d6f 6475 6c65 3d22 736e 6d70 ult.module="snmp
0x00e0: 2220 7365 7276 6963 653d 2274 756e 3a3a ".service="tun::
0x00f0: 3222 3e0a 3c6c 6173 745f 7570 6461 7465 2">.<last_update
0x0100: 3e31 3336 3530 3735 3237 313c 2f6c 6173 >1365075271</las
0x0110: 745f 7570 6461 7465 3e0a 3c2f 5265 736d t_update>.</Resm
0x0120: 6f6e 5265 7375 6c74 3e0a 3c2f 5265 736d onResult>.</Resm
0x0130: 6f6e 5265 7375 6c74 733e 0a0d 0a onResults>...

I run jezebel with -f and nothing else. Am I using jezebel correctly? Is there anything I have missed?

I have made sure to update noit.conf with the changes. Any ideas would be welcome :)

support a tagset in the check configuration that will augment metric names

The <check> configuration should support a new <tagset> field. The contents of this field should be a validated set of stream tags (see noit_metric.h). If a <tagset> field is present on the check, all incoming metrics for the check have these additional <tagset> tags added to the metric name as stream tags.

If a check has <tagset>a:b,c:d,foo:bar</tagset> and a metric called quux arrives on this check, the name should be transformed to quux|ST[a:b,c:d,foo:bar]. If a metric called baz|ST[region:us-east-1] arrives on this check, the name should be transformed to baz|ST[a:b,c:d,foo:bar,region:us-east-1]. If a metric arrives with the name blarg|ST[a:b], the new name should be blarg|ST[a:b,c:d,foo:bar] (by the tag property that there are no duplicate tags in a name).
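
The three transformations above can be sketched as a merge, de-duplicate, and join step. This is a minimal illustration, not the noit_metric.h API: augment_metric_name is a hypothetical helper, tags are compared as whole category:value strings, no validation is done, and the merged tags are sorted lexically, which matches the examples but is an assumption about the intended ordering.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TAGS 64

/* qsort comparator over an array of char* tags. */
static int cmp_tag(const void *a, const void *b) {
  return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Splits a comma-separated list in place, appending non-empty tags to tags[]. */
static int split_tags(char *list, char **tags, int n) {
  while (list && *list && n < MAX_TAGS) {
    char *comma = strchr(list, ',');
    if (comma) *comma = '\0';
    if (*list) tags[n++] = list;
    list = comma ? comma + 1 : NULL;
  }
  return n;
}

/* Hypothetical helper: writes the augmented name into out.
 * Returns 0 on success, -1 on malformed input or insufficient space. */
int augment_metric_name(const char *name, const char *check_tagset,
                        char *out, size_t outlen) {
  char namebuf[512], setbuf[512];
  char *tags[MAX_TAGS];
  int n = 0;

  if (strlen(name) >= sizeof(namebuf) || strlen(check_tagset) >= sizeof(setbuf))
    return -1;
  strcpy(namebuf, name);
  strcpy(setbuf, check_tagset);

  /* Separate "base|ST[tags]" into the base name and its existing tags. */
  char *st = strstr(namebuf, "|ST[");
  if (st) {
    char *end = strrchr(st + 4, ']');
    if (!end || end[1] != '\0') return -1;
    *end = '\0';
    *st = '\0';
    n = split_tags(st + 4, tags, n);
  }
  n = split_tags(setbuf, tags, n);
  qsort(tags, (size_t)n, sizeof(tags[0]), cmp_tag);

  int w = snprintf(out, outlen, "%s", namebuf);
  if (w < 0 || (size_t)w >= outlen) return -1;
  size_t used = (size_t)w;
  const char *prev = NULL;
  for (int i = 0; i < n; i++) {
    if (prev && strcmp(prev, tags[i]) == 0) continue; /* drop exact duplicates */
    w = snprintf(out + used, outlen - used, "%s%s", prev ? "," : "|ST[", tags[i]);
    if (w < 0 || (size_t)w >= outlen - used) return -1;
    used += (size_t)w;
    prev = tags[i];
  }
  if (prev) {
    if (used + 2 > outlen) return -1;
    out[used++] = ']';
    out[used] = '\0';
  }
  return 0;
}
```

A real implementation would also need to decide what happens when the check and the metric carry the same tag category with different values; this sketch keeps both.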

DNS check does incorrect PTR lookups for IPv6

If I create a DNS check using the PTR record type, and enter an IPv6 address, the DNS module does not construct a valid ip6.arpa. query. Consider these tcpdumps, the first from a broker trying to execute a PTR check for 2607:f8b0:4004:802::2004, the second from dig @8.8.4.4 -x 2607:f8b0:4004:802::2004 on the broker's host OS:

broker:

14:46:00.728043 IP x.x.x.x.50877 > 8.8.4.4.domain: 9346+ [1au] PTR? 2607:f8b0:4004:802::2004. (53)
14:46:00.739805 IP 8.8.4.4.domain > x.x.x.x.50877: 9346 NXDomain 0/1/1 (128)

dig:

14:48:03.308405 IP x.x.x.x.37313 > 8.8.4.4.domain: 16813+ [1au] PTR? 4.0.0.2.0.0.0.0.0.0.0.0.0.0.0.0.2.0.8.0.4.0.0.4.0.b.8.f.7.0.6.2.ip6.arpa. (101)
14:48:03.319944 IP 8.8.4.4.domain > x.x.x.x.37313: 16813 1/0/1 PTR iad23s58-in-x04.1e100.net. (140)

A cursory glance at https://github.com/circonus-labs/reconnoiter/blob/master/src/modules/dns.c shows that we anticipate doing reverse lookups on IPv6, but dns_interpolate_inaddr_arpa() seems to only consider dot-delimited strings (i.e., IPv4 only).
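
For reference, the name that dig sends can be derived from the address with inet_pton and a nibble-by-nibble reversal. The sketch below is independent of dns.c; ipv6_to_arpa is a hypothetical helper name and is not the existing dns_interpolate_inaddr_arpa() code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: writes the ip6.arpa. name for addr into out.
 * Needs 32 nibbles * 2 chars + "ip6.arpa." + NUL = 74 bytes.
 * Returns 0 on success, -1 on a bad address or a short buffer. */
int ipv6_to_arpa(const char *addr, char *out, size_t outlen) {
  static const char hex[] = "0123456789abcdef";
  struct in6_addr a;
  if (outlen < 74) return -1;
  if (inet_pton(AF_INET6, addr, &a) != 1) return -1;
  char *p = out;
  /* Walk the 16 bytes from last to first, low nibble before high nibble. */
  for (int i = 15; i >= 0; i--) {
    *p++ = hex[a.s6_addr[i] & 0x0f];
    *p++ = '.';
    *p++ = hex[a.s6_addr[i] >> 4];
    *p++ = '.';
  }
  memcpy(p, "ip6.arpa.", 10); /* includes the terminating NUL */
  return 0;
}
```

For 2607:f8b0:4004:802::2004 this yields the same 32-label name that appears in the dig capture above.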

psql:scaffolding.sql:11: ERROR: permission denied for database operation "alter user stratcon set search_path to noit,public;"

System Information:

Fedora Linux 15

[akshatha@localhost reconnoiter]$ postgres --version
postgres (PostgreSQL) 9.0.6

Earlier I tried to run scaffolding.sql using psql and it gave me an error, so I reordered the contents of scaffolding.sql as follows:

Snippet of the scaffolding.sql file:

create user reconnoiter;
create database reconnoiter with owner = reconnoiter;
create user stratcon with encrypted password 'stratcon';
create user prism with encrypted password 'prism';
\c reconnoiter reconnoiter;

create language plpgsql;
create schema noit;
create schema stratcon;
create schema prism;
alter user stratcon set search_path to noit,public;
alter user prism set search_path to noit,public;

begin;

grant usage on schema stratcon to stratcon;
grant usage on schema stratcon to prism;
grant usage on schema noit to stratcon;
grant usage on schema noit to prism;
grant usage on schema prism to prism;


I am getting psql:scaffolding.sql:11: ERROR: permission denied on lines 11 and 12, which are:
alter user stratcon set search_path to noit,public;
alter user prism set search_path to noit,public;

Except for those two errors, the rest of scaffolding.sql installs properly.

Error Output:

-bash-4.2$ psql
psql (9.0.6)
Type "help" for help.

postgres=# \i scaffolding.sql
psql:scaffolding.sql:1: ERROR: role "reconnoiter" already exists
psql:scaffolding.sql:2: ERROR: database "reconnoiter" already exists
psql:scaffolding.sql:3: ERROR: role "stratcon" already exists
psql:scaffolding.sql:4: ERROR: role "prism" already exists
You are now connected to database "reconnoiter" as user "reconnoiter".
psql:scaffolding.sql:7: ERROR: language "plpgsql" already exists
CREATE SCHEMA
CREATE SCHEMA
CREATE SCHEMA
psql:scaffolding.sql:11: ERROR: permission denied
psql:scaffolding.sql:12: ERROR: permission denied
BEGIN
GRANT
GRANT
GRANT
GRANT
GRANT
CREATE FUNCTION

State of the project

Hi,

There have been no commits for five years.

What is the state of the project?

I am curious, since the overall picture looks good.

eventer_SSL_fd_opset.c:660: undefined reference to `SSLv2_server_method'

Hi,

I get the following errors on an Ubuntu 14.04.1 LTS system, with the latest code:

make[2]: Leaving directory `/home/vrou/work/reconnoiter/src/LuaJIT/src'
- making private noit-objs/LuaJIT/src/lib_init.o
- linking noitd
noit-objs/eventer/eventer_SSL_fd_opset.o: In function `eventer_ssl_ctx_new':
/home/vrou/work/reconnoiter/src/eventer/eventer_SSL_fd_opset.c:660: undefined reference to `SSLv2_server_method'
/home/vrou/work/reconnoiter/src/eventer/eventer_SSL_fd_opset.c:660: undefined reference to `SSLv2_client_method'
collect2: error: ld returned 1 exit status
make[1]: *** [noitd] Error 1
rm udns/udns_jran.o noit_reverse_socket.o noit_events_rest.o
make[1]: Leaving directory `/home/vrou/work/reconnoiter/src'
make: *** [all] Error 2

I also see the following warnings:

make[2]: Leaving directory `/home/vrou/work/reconnoiter/src/eventer'
- making private noit-objs/eventer/eventer_POSIX_fd_opset.o
make[2]: Entering directory `/home/vrou/work/reconnoiter/src/eventer'
eventer_SSL_fd_opset.c: In function ‘eventer_ssl_ctx_new’:
eventer_SSL_fd_opset.c:660:34: warning: implicit declaration of function ‘SSLv2_server_method’ [-Wimplicit-function-declaration]
                                  SSLv2_server_method() : SSLv2_client_method());
                                  ^
eventer_SSL_fd_opset.c:660:34: warning: implicit declaration of function ‘SSLv2_client_method’ [-Wimplicit-function-declaration]
eventer_SSL_fd_opset.c:660:34: warning: passing argument 1 of ‘SSL_CTX_new’ makes pointer from integer without a cast [enabled by default]
In file included from ../../src/eventer/eventer_SSL_fd_opset.h:39:0,
                 from ../../src/eventer/eventer.h:82,
                 from eventer_SSL_fd_opset.c:34:
/usr/include/openssl/ssl.h:1681:10: note: expected ‘const struct SSL_METHOD *’ but argument is of type ‘int’
 SSL_CTX *SSL_CTX_new(const SSL_METHOD *meth);
          ^
eventer_SSL_fd_opset.c: In function ‘eventer_SSL_close’:
eventer_SSL_fd_opset.c:1001:7: warning: variable ‘rv’ set but not used [-Wunused-but-set-variable]
   int rv;
       ^
- compiling eventer_SSL_fd_opset.c
make[2]: Leaving directory `/home/vrou/work/reconnoiter/src/eventer'

The version currently running on this system is: master.1d5af2e662b96a9cc4ace14177d4e623562feeff.1409682109

Any ideas as to what could be the issue?

status: zlib: data error in master.57ac1eb735b4c8ba5716469dbc7666a0da073659.1385155891

Hi,

I just updated a noit in a test setup and saw the following. A curl from the host to the check's URL works.

The odd thing is that the other HTTP check, for a different URL, works as it should with no issues.

noit# show check ccafe7f5-925f-4be9-975e-46c3722bd624
==== ccafe7f5-925f-4be9-975e-46c3722bd624 ====
name: http [from module]
module: http [inherited from dk01/web/@module]
target: www.dr.dk
resolve_rtype: prefer-ipv4 [inherited from @resolve_rtype]
period: 60000 [inherited from dk01/@period]
timeout: 30000 [inherited from dk01/@timeout]
oncheck: [undef]
filterset: default [inherited from @FilterSet]
disable: [undef]
config::code: 200
config::url: http://www.dr.dk/
target_ip: 159.20.6.22
currently: 00000060 idle
next run: 23.567 seconds
last run: 36.354 seconds ago
availability/state: unavailable/bad
status: zlib: data error
feeds: 0

noit# show version
build sysname: SunOS
build nodename: recon01
build release: 5.11
build version: omnios-b281e50
build machine: i86pc
run sysname: SunOS
run nodename: recon01
run release: 5.11
run version: omnios-b281e50
run machine: i86pc
bitwidth: 64bit
version: master.57ac1eb735b4c8ba5716469dbc7666a0da073659.1385155891

noitd dies of no memory when encountering big resmon result

So, I had a bug in my app. This app exposes some metrics as JSON and reconnoiter eats them via the resmon module. The bug made one of the metric values grow to gigantic proportions (instead of replacing one of the values, it kept appending, don't ask...). This led to noitd running out of memory, CPU usage going through the roof, and the noitd parent process sacrificing the child (what a parent that is).

I have an strace of it, if anyone is interested; an excerpt is below.

mremap(0x40f39000, 507904, 512000, 0)   = 0x40f39000
mmap(NULL, 512000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x40365000
munmap(0x401c1000, 483328)              = 0
munmap(0x4025f000, 499712)              = 0
read(15, "222232142332223146332131132361231123472141514232117722115231412112211133231122212721111214145321311242212113421132322232115133111163313133111126133132323233133911313222131131212132122111312233
33112111"..., 4096) = 4096
read(15, "211232251162122111211223111123311125452123221112413333341213131631222111132112425321104145228641114313334322211123333233321141111131121232121361113151313121241122121211511441231222112611131124
36293121"..., 4096) = 4096
mremap(0x40f39000, 512000, 520192, 0)   = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 520192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x4177f000
munmap(0x40f39000, 512000)              = 0
mmap(NULL, 520192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x404cc000
mremap(0x4177f000, 520192, 262144, 0)   = 0x4177f000
read(15, "21141312531111321311322135221121223213111121222512332331114141231741211252653222362311217111512122111231121131113233713336321111211111141161132122122111122113812211332332117322313211161121131113111331"..., 4096) = 4096
read(15, "41112113111324426111214121123161111124113113312711113125111213112221341112322111311112213111921223242211316411312221131331511111623111131211111121117136247221111112241811111114111111921433212134223216"..., 4096) = 4096

and it ends like this:

mremap(0x459a4000, 10059776, 10067968, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 10067968, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x46cd4000
+++ killed by SIGKILL +++
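
One way to guard against this kind of runaway response is to cap how many bytes a check will buffer and fail the check once the cap is hit. The sketch below only illustrates the idea; bounded_buf and its functions are hypothetical and are not taken from the resmon module or noitd.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
  char *data;
  size_t len;  /* bytes stored so far */
  size_t cap;  /* bytes allocated */
  size_t max;  /* hard cap on total bytes accepted */
} bounded_buf;

void bounded_buf_init(bounded_buf *b, size_t max) {
  b->data = NULL;
  b->len = 0;
  b->cap = 0;
  b->max = max;
}

/* Appends n bytes. Returns 0 on success, -1 if the chunk would push the
 * buffer past max or if allocation fails; the buffer is left unchanged. */
int bounded_buf_append(bounded_buf *b, const char *chunk, size_t n) {
  if (n == 0) return 0;
  if (n > b->max - b->len) return -1; /* len <= max always holds */
  if (b->len + n > b->cap) {
    size_t ncap = b->cap ? b->cap : 4096;
    while (ncap < b->len + n) ncap *= 2;
    if (ncap > b->max) ncap = b->max; /* still >= len + n, checked above */
    char *p = realloc(b->data, ncap);
    if (!p) return -1;
    b->data = p;
    b->cap = ncap;
  }
  memcpy(b->data + b->len, chunk, n);
  b->len += n;
  return 0;
}

void bounded_buf_free(bounded_buf *b) {
  free(b->data);
  bounded_buf_init(b, b->max);
}
```

A read loop like the one in the strace would call bounded_buf_append for each 4096-byte chunk and mark the check bad on the first -1 instead of continuing to grow the buffer.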

out of order target_ip ordering in polls_by_name

The polls_by_name skiplist in noit_check has two secondary indices (one on target and one on target_ip) that are used for passive checks (like statsd).

We've seen in production (under yet-to-be-diagnosed circumstances) that the target_ip index is ordered incorrectly, causing the software to malfunction. Interestingly, the ordering usually looks like the target ordering, but sometimes it is just all fouled up.

enabling the ssh2 module crashes noitd on ubuntu 12.04.03

Just after enabling an ssh2 check in the console, noitd crashes with the following message:

/opt/local/sbin/noitd: symbol lookup error: /opt/local/libexec/noit/ssh2.so: undefined symbol: libssh2_session_set_timeout

I stumbled over commit 87ed3da, and after reverting its changes I can now configure the ssh2 check and get metrics.

error: variable ‘cn_expected’,'feedtype' set but not used [-Werror=unused-but-set-variable]

system info:
—————–
[akshatha@localhost src]$ uname -a
Linux localhost.localdomain 2.6.41.4-1.fc15.x86_64 #1 SMP Tue Nov 29 11:53:48 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
[akshatha@localhost src]$ cat /etc/issue
Fedora release 15 (Lovelock)

I am getting the warnings below, which are treated as errors:
———————————————————————–
[akshatha@localhost reconnoiter]$ make
(cd src && make)
make[1]: Entering directory `/home/akshatha/Documents/reconnoiter/src'
Building version info from git

  • version -> 1f06389
  • symbolic -> branches/master
  • version unchanged
    make[2]: Entering directory `/home/akshatha/Documents/reconnoiter/src/udns'
    …..
    …..
  • compiling stratcon_realtime_http.c
    stratcon_jlog_streamer.c: In function ‘noit_connection_schedule_reattempt’:
    stratcon_jlog_streamer.c:244:30: error: variable ‘cn_expected’ set but not used [-Werror=unused-but-set-variable]
    stratcon_jlog_streamer.c:244:19: error: variable ‘feedtype’ set but not used [-Werror=unused-but-set-variable]
    stratcon_jlog_streamer.c: In function ‘stratcon_jlog_recv_handler’:
    stratcon_jlog_streamer.c:395:29: error: variable ‘feedtype’ set but not used [-Werror=unused-but-set-variable]
    stratcon_jlog_streamer.c:395:15: error: variable ‘cn_expected’ set but not used [-Werror=unused-but-set-variable]
    stratcon_jlog_streamer.c: In function ‘noit_connection_ssl_upgrade’:
    stratcon_jlog_streamer.c:560:44: error: variable ‘feedtype’ set but not used [-Werror=unused-but-set-variable]
    stratcon_jlog_streamer.c: In function ‘noit_connection_complete_connect’:
    stratcon_jlog_streamer.c:612:70: error: variable ‘feedtype’ set but not used [-Werror=unused-but-set-variable]
    stratcon_jlog_streamer.c:612:56: error: variable ‘cn_expected’ set but not used [-Werror=unused-but-set-variable]
    stratcon_jlog_streamer.c: In function ‘noit_connection_initiate_connection’:
    stratcon_jlog_streamer.c:690:29: error: variable ‘feedtype’ set but not used [-Werror=unused-but-set-variable]
    stratcon_jlog_streamer.c:690:15: error: variable ‘cn_expected’ set but not used [-Werror=unused-but-set-variable]
    cc1: all warnings being treated as errors

make[1]: *** [stratcon_jlog_streamer.o] Error 1
make[1]: Leaving directory `/home/akshatha/Documents/reconnoiter/src'
make: *** [all] Error 2

However, if I remove all the lines that trigger those warnings, it compiles successfully.

Is it recommended to remove those lines? A web search suggests that most solutions end up removing the lines that cause the warning.

Thanks,
Kiran Patil.
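
For context, the pattern GCC is complaining about and one common way to silence it without deleting the assignment look roughly like this. The function below is a hypothetical stand-in, not code from stratcon_jlog_streamer.c.

```c
/* Hypothetical stand-in for a function that assigns a variable it never reads.
 * Without the (void) cast, GCC's -Wunused-but-set-variable warns about
 * feedtype, and -Werror turns that warning into a build failure. */
int describe_connection(int is_storage) {
  const char *feedtype = "unknown";
  if (is_storage) feedtype = "storage"; /* set, but never read afterwards */
  (void)feedtype;                       /* marks the variable as intentionally unused */
  return is_storage ? 1 : 0;
}
```

Another option that leaves the source untouched is to add -Wno-error=unused-but-set-variable to the compiler flags, so the diagnostic stays a warning. How those flags are passed through this project's configure and Makefiles is not shown here.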

noitd crashes every now and then.

In the last week, I have seen a couple of noitd crashes. After setting up the gimli glider, I produced several dumps, and one of those is pasted into the gist below:
https://gist.github.com/vrou/46bf5bad0959046a95d2

noitd starts up a new process, but it once hung for a couple of hours before I started using the glider, losing a couple of hours of metrics. I haven't experienced this since using the glider.

No other processes are crashing, and I have tested the memory for the last 10 hours to see whether anything was wrong there :)

Has anyone seen this before?

Ubuntu 12.04.03
noit# show version
build sysname: Linux
build nodename: de01
build release: 3.2.0-41-generic
build version: #66-Ubuntu SMP Thu Apr 25 03:27:11 UTC 2013
build machine: x86_64
run sysname: Linux
run nodename: de01
run release: 3.2.0-41-generic
run version: #66-Ubuntu SMP Thu Apr 25 03:27:11 UTC 2013
run machine: x86_64
bitwidth: 64bit
version: master.7e697ae768723eb6412090b34208d79f5fcfcbc0.1388430135

Java processes takes up all CPU resources, after upgrading to OmniOS r151008

I don't know if this is an OmniOS or a reconnoiter issue, but after upgrading my zones to OmniOS r151008, both the run-iep.sh and jezebel processes take up all CPU resources.

Has anyone else seen this behavior?

noit# show version
build sysname: SunOS
build nodename: recon01
build release: 5.11
build version: omnios-6de5e81
build machine: i86pc
run sysname: SunOS
run nodename: recon01
run release: 5.11
run version: omnios-6de5e81
run machine: i86pc
bitwidth: 64bit
version: master.32d791199ea3658c4cd01ae2aca57ca7e6972467.1386309700

Java packages installed:

pkg://omnios/developer/java/[email protected],5.11-0.151008:20131204T201044Z i--
pkg://omnios/runtime/[email protected],5.11-0.151008:20131204T201013Z i--

Convert test suite to mtevbusted.

From perl to nodejs, it has been a journey. libmtev ships with a busted-assisted testing runtime within luamtev. We should be using that to do our testing.

Confusing cluster boot messages.

Restarted CAQL1, got:

[2018-09-06 14:11:49.416959] [notice] cluster noit:caql-broker-caqlbroker1-gcp-ia -> (re)booted
[2018-09-06 14:11:49.553198] [notice] cluster noit:caql-broker-caqlbroker2-gcp-ia -> (re)booted
Expected only to see the first line.

Likely just a confusing message.

Test Lua probe doesn't Timeout

I am trying to write a new Lua probe. I was able to load it successfully and run it on a configured interval:

module(..., package.seeall)

function onload(image)
  image.xml_description([=[
<module>
  <name>test1</name>
  <description><para>Test Lua probe.</para></description>
  <loader>lua</loader>
  <object>noit.module.test1</object>
  <moduleconfig />
  <examples>
    <example>
      <title>Test1</title>
      <para>Test Lua probe </para>
      <programlisting><![CDATA[
      <noit>
        <modules>
          <loader image="lua" name="lua">
            <config><directory>/opt/reconnoiter/libexec/modules-lua/?.lua</directory></config>
          </loader>
          <module loader="lua" name="test1" object="noit.module.test1"/>
        </modules>
        <checks>
          <check uuid="4ee1a1e2-1e60-11df-8e99-bf796ca462ab" module="test1" target="8.8.8.8" period="10000" timeout="500"/>
        </checks>
      </noit>
      ]]></programlisting>
    </example>
  </examples>
</module>]=])
  return 0
end

function init(module)
  return 0
end

function config(module, options)
  return 0
end

function initiate(module, check)
     check.bad()
     check.unavailable()
     check.status("unknown error")
     sleep(30)
     check.good()
     check.available()
end

local clock = os.clock
function sleep(n)  -- seconds
   local t0 = clock()
   while clock() - t0 <= n do
   end
end

I introduced a sleep to test the timeout functionality. The eventer doesn't seem to time out, and the probe continues to run until it finishes. All the other Lua probes that ship with the reconnoiter platform do time out. Am I missing something obvious?
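
One possible explanation, offered as an assumption and not something confirmed in this report: the sleep() above is a busy-wait, so it never hands control back to the event loop, and a timeout that the loop dispatches cannot run until the handler returns. The toy simulation below illustrates that effect with a simulated clock. All names are hypothetical and none of this is eventer or Lua module code.

```c
/* Toy model of a single-threaded loop. The clock is simulated: it only
 * advances inside handlers or while the loop idles waiting for a timer. */
typedef struct {
  long now_ms;        /* simulated current time */
  long timeout_at_ms; /* when the timeout is due */
} sim_loop;

typedef void (*handler_fn)(sim_loop *);

/* A handler that busy-waits for 30 seconds before returning. */
void blocking_handler(sim_loop *l) { l->now_ms += 30000; }

/* A handler that returns to the loop almost immediately. */
void quick_handler(sim_loop *l) { l->now_ms += 1; }

/* Runs one handler, then dispatches the timeout once it is due.
 * Returns the simulated time at which the timeout callback ran. */
long run_once(handler_fn h, long timeout_ms) {
  sim_loop l = {0, timeout_ms};
  h(&l); /* nothing else can be dispatched until this returns */
  if (l.now_ms < l.timeout_at_ms) l.now_ms = l.timeout_at_ms; /* idle until due */
  return l.now_ms;
}
```

In this model, a 500 ms timeout runs at 500 ms after the quick handler but only at 30000 ms after the blocking one, which resembles the behavior described above. If that is the cause, a wait that yields back to the loop would behave differently from the os.clock() spin.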

License GPLv2 issue

I think the following two entries were closed prematurely. It may be best to keep discussing the question and close them when a conclusion is reached.

#168
#167

Building b90582cf27dccb8638f1e0b7a90aa1f214c98238 fails, with maven build error

Trying to build latest version on OmniOS stable, with the following configure line:

$ autoconf && ./configure CPPFLAGS="-I/opt/omni/include -I/opt/omni/include/amd64/mysql -I/opt/omni/include/libxml2" LDFLAGS="-m64 -L/opt/omni/lib/amd64 -R/opt/omni/lib/amd64 -L/opt/omni/lib/amd64/mysql -R/opt/omni/lib/amd64/mysql -L/usr/local/lib -R/usr/local/lib" CFLAGS="-g -m64" SHCFLAGS="-g -m64" --prefix=/opt/local/

This worked perfectly a couple of weeks ago, which I think was just a couple of days after the change to riemann. It looks like it can't find the POM files:

gmake[2]: Entering directory `/shared/reconnoiter-riemann/src/java'
- lib/reconnoiter.jar compiling files
- creating lib/reconnoiter.jar
- lib/jezebel.jar compiling files
- creating lib/jezebel.jar
- building a maven-like repo layout in lib
- building reconnoiter-riemann
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building reconnoiter-riemann 1.0
[INFO] ------------------------------------------------------------------------
Downloading: http://clojars.org/repo/reconnoiter/reconnoiter/0.1/reconnoiter-0.1.pom
Downloading: file:///shared/reconnoiter-riemann/src/java/reconnoiter-riemann/../lib/reconnoiter/reconnoiter/0.1/reconnoiter-0.1.pom
Downloading: http://repo.maven.apache.org/maven2/reconnoiter/reconnoiter/0.1/reconnoiter-0.1.pom
[WARNING] The POM for reconnoiter:reconnoiter:jar:0.1 is missing, no dependency information available
Downloading: http://clojars.org/repo/fqclient/fqclient/0.1/fqclient-0.1.pom
Downloading: file:///shared/reconnoiter-riemann/src/java/reconnoiter-riemann/../lib/fqclient/fqclient/0.1/fqclient-0.1.pom
Downloading: http://repo.maven.apache.org/maven2/fqclient/fqclient/0.1/fqclient-0.1.pom
[WARNING] The POM for fqclient:fqclient:jar:0.1 is missing, no dependency information available
Downloading: http://clojars.org/repo/rabbitmq-client/rabbitmq-client/2.4.1/rabbitmq-client-2.4.1.pom
Downloading: file:///shared/reconnoiter-riemann/src/java/reconnoiter-riemann/../lib/rabbitmq-client/rabbitmq-client/2.4.1/rabbitmq-client-2.4.1.pom
Downloading: http://repo.maven.apache.org/maven2/rabbitmq-client/rabbitmq-client/2.4.1/rabbitmq-client-2.4.1.pom
[WARNING] The POM for rabbitmq-client:rabbitmq-client:jar:2.4.1 is missing, no dependency information available
Downloading: http://clojars.org/repo/activemq-all/activemq-all/5.2.0/activemq-all-5.2.0.pom
Downloading: file:///shared/reconnoiter-riemann/src/java/reconnoiter-riemann/../lib/activemq-all/activemq-all/5.2.0/activemq-all-5.2.0.pom
Downloading: http://repo.maven.apache.org/maven2/activemq-all/activemq-all/5.2.0/activemq-all-5.2.0.pom
[WARNING] The POM for activemq-all:activemq-all:jar:5.2.0 is missing, no dependency information available
Downloading: http://clojars.org/repo/spring-context/spring-context/2.5.5/spring-context-2.5.5.pom
Downloading: file:///shared/reconnoiter-riemann/src/java/reconnoiter-riemann/../lib/spring-context/spring-context/2.5.5/spring-context-2.5.5.pom
Downloading: http://repo.maven.apache.org/maven2/spring-context/spring-context/2.5.5/spring-context-2.5.5.pom
[WARNING] The POM for spring-context:spring-context:jar:2.5.5 is missing, no dependency information available
Downloading: http://clojars.org/repo/spring-beans/spring-beans/2.5.5/spring-beans-2.5.5.pom
Downloading: file:///shared/reconnoiter-riemann/src/java/reconnoiter-riemann/../lib/spring-beans/spring-beans/2.5.5/spring-beans-2.5.5.pom
Downloading: http://repo.maven.apache.org/maven2/spring-beans/spring-beans/2.5.5/spring-beans-2.5.5.pom
[WARNING] The POM for spring-beans:spring-beans:jar:2.5.5 is missing, no dependency information available
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ reconnoiter-riemann ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ reconnoiter-riemann ---
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Compiling 2 source files to /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/target/classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/EventHandler.java:[40,36] cannot find symbol
symbol  : class IMQMQ
location: package com.omniti.reconnoiter.broker
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/EventHandler.java:[62,10] cannot find symbol
symbol  : class IMQMQ
location: class com.omniti.reconnoiter.EventHandler
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/EventHandler.java:[80,22] cannot find symbol
symbol  : class IMQMQ
location: class com.omniti.reconnoiter.EventHandler
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/EventHandler.java:[105,9] cannot find symbol
symbol  : class IMQMQ
location: class com.omniti.reconnoiter.EventHandler
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/IEPRiemann.java:[41,36] cannot find symbol
symbol  : class MQFactory
location: package com.omniti.reconnoiter.broker
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/IEPRiemann.java:[49,36] cannot find symbol
symbol  : class IMQMQ
location: package com.omniti.reconnoiter.broker
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/IEPRiemann.java:[77,4] cannot find symbol
symbol  : class IMQMQ
location: class com.omniti.reconnoiter.IEPRiemann
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/IEPRiemann.java:[77,15] cannot find symbol
symbol  : variable MQFactory
location: class com.omniti.reconnoiter.IEPRiemann
[INFO] 8 errors 
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.889s
[INFO] Finished at: Mon Oct 21 10:29:41 CEST 2013
[INFO] Final Memory: 15M/93M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project reconnoiter-riemann: Compilation failure: Compilation failure:
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/EventHandler.java:[40,36] cannot find symbol
[ERROR] symbol  : class IMQMQ
[ERROR] location: package com.omniti.reconnoiter.broker
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/EventHandler.java:[62,10] cannot find symbol
[ERROR] symbol  : class IMQMQ
[ERROR] location: class com.omniti.reconnoiter.EventHandler
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/EventHandler.java:[80,22] cannot find symbol
[ERROR] symbol  : class IMQMQ
[ERROR] location: class com.omniti.reconnoiter.EventHandler
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/EventHandler.java:[105,9] cannot find symbol
[ERROR] symbol  : class IMQMQ
[ERROR] location: class com.omniti.reconnoiter.EventHandler
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/IEPRiemann.java:[41,36] cannot find symbol
[ERROR] symbol  : class MQFactory
[ERROR] location: package com.omniti.reconnoiter.broker
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/IEPRiemann.java:[49,36] cannot find symbol
[ERROR] symbol  : class IMQMQ
[ERROR] location: package com.omniti.reconnoiter.broker
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/IEPRiemann.java:[77,4] cannot find symbol
[ERROR] symbol  : class IMQMQ
[ERROR] location: class com.omniti.reconnoiter.IEPRiemann
[ERROR] /shared/reconnoiter-riemann/src/java/reconnoiter-riemann/src/main/java/com/omniti/reconnoiter/IEPRiemann.java:[77,15] cannot find symbol
[ERROR] symbol  : variable MQFactory
[ERROR] location: class com.omniti.reconnoiter.IEPRiemann
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
gmake[2]: *** [reconnoiter-riemann/target/reconnoiter-riemann-1.0-jar-with-dependencies.jar] Error 1
gmake[2]: Leaving directory `/shared/reconnoiter-riemann/src/java'
gmake[1]: [java-bits] Error 2 (ignored)
gmake[2]: Entering directory `/shared/reconnoiter-riemann/src/modules'

libssh2 can hang forever in _libssh2_wait_socket()

root@us1-5:~# pstack 2503/6
2503: /opt/noit/prod/sbin/noitd -M
----------------- lwp# 6 / thread# 6 --------------------
fffffd7fff1d7e9a pollsys (fffffd7ffdc30910, 1, 0, 0)
fffffd7fff16e196 poll (fffffd7ffdc30910, 1, ffffffff) + 56
fffffd7ffe0de518 _libssh2_wait_socket () + 1a0
fffffd7ffe0df6a6 libssh2_session_disconnect_ex () + 66
fffffd7ffc8f2b4a ssh2_drive_session () + 5a4
00000000004b2a9f eventer_jobq_consumer () + cd2
00000000004b1419 eventer_jobq_consumer_pthreadentry (556a00) + 18
fffffd7fff1d0dba _thrp_setup (fffffd7ffefb1a80) + 8a
fffffd7fff1d10d0 _lwp_start ()
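
In the trace above, poll() is called with a timeout argument of ffffffff, i.e. -1, which means "wait indefinitely". The following is only a sketch of what a bounded wait looks like, not libssh2's code and not a proposed patch; `wait_readable` and its millisecond budget are hypothetical names chosen for illustration.

```c
#include <poll.h>

/* Hypothetical helper: wait until fd is readable, but for at most
 * timeout_ms milliseconds.
 * Returns >0 if the fd is ready, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms) {
  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;
  /* A non-negative timeout makes poll() return 0 once the budget is
   * spent instead of blocking forever, as a timeout of -1 would. */
  return poll(&pfd, 1, timeout_ms);
}
```

A caller that gets 0 back can give up on the peer and tear the session down instead of leaving the jobq thread parked in poll().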

run as root failed

When the root user runs noitd like this:
noitd -u xxx -g xxx

it fails with:
[2011-12-08 15:12:33.688739] Cannot find user '0'
[2011-12-08 15:12:33.688751] Failed to regain privileges, exiting.

I found the reason. In src/util/noit_security.c:

static inline int isuinteger(const char *user) {
  char *cp;
  if (user == NULL || *user == '\0' || isblank(*user)) return 0;
  if (strtol (user, &cp, 10) < 0) return 0;
  return (cp == '\0');
}

Maybe cp == '\0' should be *cp == '\0'
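
A minimal sketch of that change, not a patch against the tree. It assumes the intent is to accept only strings that consist entirely of a non-negative decimal number; `isuinteger_fixed` is a hypothetical name used to keep it apart from the original. Because strtol() always stores a non-NULL pointer in `cp`, the comparison `cp == '\0'` can never be true, so a numeric ID would always be treated as a user name, which would be consistent with the "Cannot find user '0'" message.

```c
#include <ctype.h>
#include <stdlib.h>

/* Sketch of the suggested fix (hypothetical name). Returns 1 when `user`
 * is a non-negative decimal integer with nothing after it, else 0. */
static int isuinteger_fixed(const char *user) {
  char *cp;
  /* The unsigned char cast keeps isblank() well defined for any byte. */
  if (user == NULL || *user == '\0' || isblank((unsigned char)*user)) return 0;
  if (strtol(user, &cp, 10) < 0) return 0;
  /* Dereference cp: the string is numeric only if parsing consumed it
   * all. The original compared the pointer itself against '\0'. */
  return (*cp == '\0');
}
```

With this version "0" and "1000" are recognised as numeric IDs, while "nobody" and "12ab" are not.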

Thanks

snmp checks can get stuck

Not sure of the cause, but this is a state achieved on a noitd instance:

noit#  show check 1bda5e7a-b473-11e4-a9de-7cd1c3dcddf7
==== 1bda5e7a-b473-11e4-a9de-7cd1c3dcddf7 ====
 name: c_0_0::snmp
 module: snmp
 target: code.engine.sourcefire.com
 resolve_rtype: [undef]
 period: 60000
 timeout: 10000
 oncheck: [undef]
 filterset: filterset_0
 disable: [undef]
 config::oid_ram_avail_kb: .1.3.6.1.4.1.2021.4.11.0
 config::version: 2c
 config::oid_loadavg_1m: .1.3.6.1.4.1.2021.10.1.3.1
 config::oid_disk_repos_avail_kb: .1.3.6.1.4.1.2021.9.1.7.2
 config::security_level: authPriv
 config::oid_cpu_idle_percent: .1.3.6.1.4.1.2021.11.11.0
 config::separate_queries: false
 config::oid_swap_avail_kb: .1.3.6.1.4.1.2021.4.4.0
 config::oid_disk_root_avail_kb: .1.3.6.1.4.1.2021.9.1.7.1
 config::auth_protocol: MD5
 config::privacy_protocol: DES
 config::port: 161
 config::community: elided
 target_ip: 10.0.0.1
 currently: 00000060 idle
 next run: 11.477 seconds
 last run: never
 metrics (inprogress):
   cpu_idle_percent[i] = 99
   disk_repos_avail_kb[i] = 383412192
   disk_root_avail_kb[i] = 34414832
   loadavg_1m[s] = 0.00
   ram_avail_kb[i] = 33692976
   swap_avail_kb[i] = 0

Fix New Relic Check

The New Relic check creation is not working as expected.

I registered for a free New Relic account, created a new Infrastructure product with the 30-day trial, installed the app on my Windows VM, which successfully pushed metrics to the account, and generated an API key.

I then configured the reconnoiter check using the API key.

The check did not work as expected. There might be a communication issue, as the expected data was not returned as it had been before.

crash - epoll_ctl(4,add,1141,6) -> -1 (17: File exists)

#0 0x00000038eca328a5 in raise () from /lib64/libc.so.6
#1 0x00000038eca34085 in abort () from /lib64/libc.so.6
#2 0x0000000000463234 in eventer_epoll_impl_add (e=0x1e5a0c0)
    at eventer_epoll_impl.c:122
#3 0x00007f11b777fc95 in noit_lua_socket_write (L=0x41e0e990)
    at lua_noit.c:1230
#4 0x0000000000485f2b in lj_BC_FUNCC ()
#5 0x00007f11b777afc6 in noit_lua_check_resume (ri=0x423cb70,
    nargs=<value optimized out>) at lua.c:882
#6 0x00007f11b777bac8 in noit_lua_socket_connect_complete (e=0x1e55990,
    mask=<value optimized out>, vcl=0x1e558f0, now=<value optimized out>)
    at lua_noit.c:216
#7 0x0000000000462b8f in eventer_epoll_impl_trigger (e=,
    mask=2) at eventer_epoll_impl.c:208
#8 0x0000000000462e3b in eventer_epoll_impl_loop ()
    at eventer_epoll_impl.c:279
#9 0x000000000041906a in child_main () at noitd.c:233
#10 0x0000000000419e17 in noit_main (appname=0x4d2d8c "noit",
    config_filename=<value optimized out>, debug=<value optimized out>,
    foreground=1, _glider=<value optimized out>,
    drop_to_user=<value optimized out>, drop_to_group=0x0,
    passed_child_main=0x418ed0 <child_main>) at noit_main.c:220
#11 0x0000000000419558 in main (argc=,
    argv=<value optimized out>) at noitd.c:239

collectd errors in the log

Hi Theo,

After trying out the latest version, I get a huge number of the errors below in the noit logs:

[2015-02-14 18:28:19.485952] [error] Copying 'fork_rate","type_instance":""},{"values":[3262],"dstypes":["derive"],"dsnames":["value"],"time":1423934897.691,"interval":60.000,"host":"strauss.catsolutions.be","plugin":"mysql","plugin_instance":"mysql","type":"mysql_commands","type_instance":"unlock_tables"}]":"JAVA_Profitplus","type":"ps_data","type_instance":""}]"type_instance":"del"}]]:"steal"}]' into type limit 10
[2015-02-14 18:28:19.485999] [error] Copying '"},{"values":[3262],"dstypes":["derive"],"dsnames":["value"],"time":1423934897.691,"interval":60.000,"host":"strauss.catsolutions.be","plugin":"mysql","plugin_instance":"mysql","type":"mysql_commands","type_instance":"unlock_tables"}]":"JAVA_Profitplus","type":"ps_data","type_instance":""}]"type_instance":"del"}]]:"steal"}]' into type_instance limit 1
[2015-02-14 18:28:19.486100] [error] Copying 'strauss.catsolutions.be","plugin":"mysql","plugin_instance":"mysql","type":"mysql_commands","type_instance":"unlock_tables"}]":"JAVA_Profitplus","type":"ps_data","type_instance":""}]"type_instance":"del"}]]:"steal"}]' into host limit 24
[2015-02-14 18:28:19.486137] [error] Copying 'mysql","plugin_instance":"mysql","type":"mysql_commands","type_instance":"unlock_tables"}]":"JAVA_Profitplus","type":"ps_data","type_instance":""}]"type_instance":"del"}]]:"steal"}]' into plugin limit 6
[2015-02-14 18:28:19.486165] [error] Copying 'mysql","type":"mysql_commands","type_instance":"unlock_tables"}]":"JAVA_Profitplus","type":"ps_data","type_instance":""}]"type_instance":"del"}]]:"steal"}]' into plugin_instance limit 6
[2015-02-14 18:28:19.486191] [error] Copying 'mysql_commands","type_instance":"unlock_tables"}]":"JAVA_Profitplus","type":"ps_data","type_instance":""}]"type_instance":"del"}]]:"steal"}]' into type limit 15

Are they safe to ignore? And can I turn off the log? :)

stratcond can get stuck connecting.

From stratcond's console:

stratcon# show noit 10.198.67.69:43191
10.198.67.69:43191 [connecting]:
    Last connect: 2013-07-01 13:51:20 UTC
    Local address is 192.168.13.62:58647
    JLog event streamer [transient/iep]

Pfiles:

# pfiles `pgrep -n stratcond` | ggrep -A1 -B4 'port: 58647'
  21: S_IFSOCK mode:0666 dev:530,0 ino:44891 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
        SOCK_STREAM
        SO_KEEPALIVE,SO_SNDBUF(49152),SO_RCVBUF(128872)
        sockname: AF_INET 192.168.13.62  port: 58647
        peername: AF_INET 10.198.67.69  port: 43191

event debugging:

stratcon# show eventer debug sockets
...
  [51ba120] fd: 21 [r-e-] -> noit_connection_ssl_upgrade(15f7010)
...

Support optional higher frequency histogram collection

This might necessitate a rewrite of the histogram module, because 1-minute collection may be baked into the assumptions of the existing module.

We should allow module configuration to increase the frequency at which histogram data is submitted (that is, shrink the histogram aggregation window) down to some absolute minimum (1 second?).
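
As a rough illustration of the "down to some absolute minimum" part, a configured window could be clamped as below. This is a sketch under assumptions: the function name, the constants, and the idea of expressing the window in milliseconds are all hypothetical and not taken from the existing module. The 60000 ms default reflects the 1-minute collection mentioned above, and the 1000 ms floor is the tentative 1-second minimum.

```c
/* Hypothetical constants: current 1-minute window and a proposed floor. */
#define HIST_WINDOW_DEFAULT_MS 60000
#define HIST_WINDOW_MIN_MS 1000

/* Hypothetical helper: turn a configured aggregation window into the one
 * actually used. Unset or invalid values (<= 0) fall back to the default,
 * and anything below the floor is raised to the floor. */
static int histogram_window_ms(int configured_ms) {
  if (configured_ms <= 0) return HIST_WINDOW_DEFAULT_MS;
  if (configured_ms < HIST_WINDOW_MIN_MS) return HIST_WINDOW_MIN_MS;
  return configured_ms;
}
```

For example, a configured value of 500 would be raised to 1000, while 10000 would be used as given.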

Fix Google Analytics Check

The underlying interface to Google Analytics seemingly has changed with the introduction of v4. New checks result in a "data feed not found" error from Google.

Additionally it seems this would be a good opportunity to reimplement this particular check, as the way it works is a bit different than the other checks that we support. Specifically:

  • The existing check relies on inline creation of an oauth2 token for authentication. A more flexible approach would be to use the Google UI to generate the token manually, which is more consistent with what we do elsewhere.
  • The existing check appears to be composed of 7 sub-checks, each for a specific set of metrics associated with Google Analytics (for example, visitors or goals). This effectively means you need to configure multiple checks to get all the information, instead of having it all available under one umbrella.

collectd module write_http

Hi,

I can't seem to get the write_http plugin working. I have added the following to my collectd.conf file:

<URL "https://xx.yy.xx.yy.xx:43191/module/collectd/">
User "username"
Password "s3cr3t"
VerifyPeer false
VerifyHost false
CACert "/usr/local/etc/ca.crt"
Format "JSON"
StoreRates false

This is my collectd noit config:

<module image="collectd" name="collectd">
  <config>
    <security_level>2</security_level>
    <username>username</username>
    <password>s3cr3t</password>
  </config>
</module>

It works great with the normal collectd network plugin.

I have copied the ca.crt over from the reconnoiter install to the server that runs the collectd daemon.

I can see encrypted traffic between the two hosts on destination port 43191/TCP, and there are no errors on the server running collectd.

This is what the failing check looks like:

==== 52d12d4b-5834-44a7-b87a-d3821ce96035 ====
name: collectd [from module]
module: collectd [inherited from NL/collectd/@module]
target: node02.xxxxxxxx.net
resolve_rtype: prefer-ipv4 [inherited from @resolve_rtype]
period: 60000 [inherited from NL/collectd/@period]
timeout: 30000 [inherited from NL/collectd/@timeout]
oncheck: [undef]
filterset: default [inherited from @FilterSet]
disable: false
target_ip: xxx.yyy.xxx.yyy
currently: 00012060 idle
next run: 27.939 seconds
last run: 92.061 seconds ago
availability/state: unavailable/bad
status: dur=120,run=1,stats=0,ntfy=0
feeds: 0
metrics:

Any idea why it's not working? Do I need some other configuration option in noit to enable the write_http module?

I am using the default noit.conf file, so I haven't fiddled with the rest rules.
