didi / falcon-log-agent
A log-collection agent for monitoring systems that integrates seamlessly with open-falcon.
License: MIT License
The body of the POST request is:
[{"endpoint":"mycomputer","metric":"log-error-monitor","timestamp":1536565260,"step":60,"value":150,"counterType":"GAUGE","tags":"syslog=Error"}]
If syslog=Error is changed to syslog=error, the POST data is reported successfully.
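Below is a minimal Go sketch of how a body like the one above could be built. FalconPoint here is a hypothetical stand-in that mirrors the JSON fields, not a copy of the agent's own type. The lowerTags option only mirrors the observation that syslog=error was accepted where syslog=Error was not. The root cause is not established in this report, so treat it as a workaround to try, not a fix.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// FalconPoint mirrors the JSON fields of the push body (hypothetical stand-in).
type FalconPoint struct {
	Endpoint    string  `json:"endpoint"`
	Metric      string  `json:"metric"`
	Timestamp   int64   `json:"timestamp"`
	Step        int64   `json:"step"`
	Value       float64 `json:"value"`
	CounterType string  `json:"counterType"`
	Tags        string  `json:"tags"`
}

// buildBody marshals points into the JSON array sent in the POST request.
// When lowerTags is true, tag strings are lower-cased in place (assumed workaround).
func buildBody(points []FalconPoint, lowerTags bool) ([]byte, error) {
	if lowerTags {
		for i := range points {
			points[i].Tags = strings.ToLower(points[i].Tags)
		}
	}
	return json.Marshal(points)
}

func main() {
	p := FalconPoint{"mycomputer", "log-error-monitor", 1536565260, 60, 150, "GAUGE", "syslog=Error"}
	body, err := buildBody([]FalconPoint{p}, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```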
Why is memory usage measured from heap memory? Is there a way to query swap usage? Heavy swap usage can degrade the machine's overall performance.
make pack did not generate a tar file, and running control start directly fails with: ./falcon-log-agent: 1: ./falcon-log-agent: Syntax error: word unexpected (expecting ")")
// PosterLoop starts the post loop
// Pushes in a loop, once every 10s
func PosterLoop() {
	dlog.Info("PosterLoop Start")
	go func() {
		for {
			select {
			case p := <-pushQueue:
				points := make([]*FalconPoint, 0)
				points = append(points, p)
				// Drain everything already queued without blocking
			DONE:
				for {
					select {
					case tmp := <-pushQueue:
						points = append(points, tmp)
						continue
					default:
						break DONE
					}
				}
				// First push the points into the cache
				PostToCache(points)
				// Start a goroutine to send them asynchronously to odin-agent
				go postToFalconAgent(points)
			}
			time.Sleep(10 * time.Second)
		}
	}()
}
This code might be better with a preallocated capacity, for example:
points := make([]*FalconPoint, 0, len(pushQueue))
or with len(pushQueue) + x as the capacity.
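Here is a minimal, self-contained sketch of the same drain-then-batch idea with a capacity hint. It uses a generic helper over a plain channel rather than the agent's pushQueue. The name drainBatch is hypothetical, and generics need Go 1.18+. Note that make([]T, n) creates n zero-valued elements, so appending afterwards would leave n nil entries at the front. The form make([]T, 0, n) only reserves capacity.

```go
package main

import "fmt"

// drainBatch blocks for the first item, then takes whatever else is already
// buffered in ch without blocking.
func drainBatch[T any](ch <-chan T) []T {
	first := <-ch
	batch := make([]T, 0, len(ch)+1) // length 0, capacity = buffered items + first
	batch = append(batch, first)
	for {
		select {
		case v := <-ch:
			batch = append(batch, v)
		default:
			return batch
		}
	}
}

func main() {
	ch := make(chan int, 8)
	for i := 1; i <= 3; i++ {
		ch <- i
	}
	fmt.Println(drainBatch(ch)) // [1 2 3]
}
```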
How are multi-line logs handled?
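A common generic approach, not confirmed to be supported by falcon-log-agent, is to treat a line that begins with a timestamp as the start of a new entry. Every other line, such as a Java stack frame, is appended to the previous entry. The sketch assumes a "yyyy-mm-dd HH:MM:SS" prefix. The helper name and the stack-trace lines in the example are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// entryStart is an assumed rule: a logical entry starts with a
// "yyyy-mm-dd HH:MM:SS" timestamp; any other line continues the previous entry.
var entryStart = regexp.MustCompile(`^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}`)

// mergeMultiline groups physical lines into logical entries.
func mergeMultiline(lines []string) []string {
	var entries []string
	for _, l := range lines {
		if entryStart.MatchString(l) || len(entries) == 0 {
			entries = append(entries, l)
			continue
		}
		// Continuation line (for example a stack frame): attach it to the current entry.
		entries[len(entries)-1] += "\n" + l
	}
	return entries
}

func main() {
	lines := []string{
		"2018-09-06 10:31:10.094 [pool-4-thread-1] ERROR TimerServiceImpl - storage failed",
		"java.lang.IllegalStateException: example",
		"\tat com.example.Foo.bar(Foo.java:42)",
		"2018-09-06 10:31:11.001 [pool-4-thread-1] INFO next entry",
	}
	for _, e := range mergeMultiline(lines) {
		fmt.Printf("%q\n", e)
	}
}
```

Matching and counting would then run once per merged entry instead of once per physical line.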
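Why not merge falcon-log-agent's functionality into the open-falcon agent service? As it stands, every instance needs two agents installed, one for regular metrics and one for logs, which feels like a waste of resources.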
ll /home/manzz/tomcat_all/pro/hxc_cloud/log/
total 316M
-rw-r--r-- 1 root root 40M Aug 29 23:59 all.log.2018-08-29.log
-rw-r--r-- 1 root root 39M Aug 30 23:59 all.log.2018-08-30.log
-rw-r--r-- 1 root root 39M Aug 31 23:59 all.log.2018-08-31.log
cat cfg/strategy.json
[
{
"id":1,
"name":"information",
"file_path":"/home/manzz/tomcat_all/pro/hxc_cloud/log/all.log.${%Y-%m-%d}.log",
"time_format":"yyyy-mm-dd HH:MM:SS",
"pattern":"info",
"exclude":"",
"step":5,
"tags":{
},
"func":"cnt",
"degree":6,
"comment":"我是备注"
}
]
I tested matching "info", the most frequent string in the log, and still get no data:
falcon-log-agent git:(master) ✗ curl localhost:8003/strategy
[{"id":1,"name":"information","file_path":"/home/manzz/tomcat_all/pro/hxc_cloud/log/all.log.${%Y-%m-%d}.log","time_format":"yyyy-mm-dd HH:MM:SS","pattern":"info","exclude":"","step":5,"tags":{},"func":"cnt","degree":6,"comment":"我是备注","parse_succ":false}
falcon-log-agent git:(master) ✗ curl localhost:8003/cached
{"counters":{}}
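This plugin can scan log files at a specified location and report to falcon.
My question: if a file contains a large number of distinct metrics, strategy.json needs an entry for every metric, which is a lot of work and doesn't seem reasonable.
Is there a way to report each metric as soon as it is scanned? Also, name here corresponds to the falcon metric but still has to be configured by hand; could the falcon metric be reported directly instead?
Concretely: whenever a metric is found while scanning the file, report that metric (the metrics are numerous and all distinct).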
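The request could be prototyped outside the agent roughly as follows. The sketch assumes the log carries metrics as name=number pairs, takes the metric name from the line itself, and emits one open-falcon-style point per pair. The regex, the Point type, and pointsFromLine are all hypothetical. This is not a falcon-log-agent feature.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
)

// kv matches "name=number" pairs such as "num=10" (an assumed log convention).
var kv = regexp.MustCompile(`([A-Za-z_][\w.]*)=(-?\d+(?:\.\d+)?)`)

// Point mirrors the fields of an open-falcon push item (hypothetical type).
type Point struct {
	Endpoint    string  `json:"endpoint"`
	Metric      string  `json:"metric"`
	Timestamp   int64   `json:"timestamp"`
	Step        int64   `json:"step"`
	Value       float64 `json:"value"`
	CounterType string  `json:"counterType"`
	Tags        string  `json:"tags"`
}

// pointsFromLine emits one point per name=value pair, taking the metric
// name from the log line itself rather than from a per-metric strategy.
func pointsFromLine(line, endpoint string, ts, step int64) []Point {
	var pts []Point
	for _, m := range kv.FindAllStringSubmatch(line, -1) {
		v, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			continue
		}
		pts = append(pts, Point{endpoint, m[1], ts, step, v, "GAUGE", ""})
	}
	return pts
}

func main() {
	pts := pointsFromLine("2017/12/01 12:12:06 service error 505, num=10 province=33", "mycomputer", 1536565260, 60)
	body, _ := json.Marshal(pts)
	fmt.Println(string(body))
}
```

Because metric names come straight from the data, a real deployment would probably want an allow-list or prefix so that unexpected keys do not flood open-falcon with new series.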
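When a pattern matches, data is pushed to open-falcon; it would help if a default value could be pushed to open-falcon when the configured pattern does not match.
Otherwise some alerts can never recover: the condition has actually cleared, but because no value is reported, the recovery condition is never met. For example, with (#3) > 3, recovery is triggered only once the metric reports again with a value below 3.
Of course, nodata could be used to fill in a default value, but with so many log rules that is tedious to configure.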
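The requested behaviour can be sketched as a zero-fill step at flush time. Every configured strategy gets a value for the step, either its match count or 0, so an alert on a cnt metric has a data point to recover on. The function name and the map-based counters are hypothetical, not the agent's internals. Zero-filling only makes sense for counting; for other aggregation functions a missing value should not silently become 0.

```go
package main

import (
	"fmt"
	"sort"
)

// flushWithZeroFill returns one value per configured strategy for the current
// step: the counted matches if any, otherwise 0.
func flushWithZeroFill(strategies []string, counts map[string]float64) map[string]float64 {
	out := make(map[string]float64, len(strategies))
	for _, name := range strategies {
		out[name] = counts[name] // a missing key yields 0
	}
	return out
}

func main() {
	strategies := []string{"log-error-monitor", "max_hour_ERROR"}
	counts := map[string]float64{"log-error-monitor": 5} // only one strategy matched this step
	out := flushWithZeroFill(strategies, counts)
	names := make([]string, 0, len(out))
	for n := range out {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Println(n, out[n])
	}
}
```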
root@sinpedx00028:/home/work/open-falcon/falcon-log-agent# curl -s -XPOST localhost:8003/check -d 'log=Jan 30 14:10:49 sinpedx00028 ntpd[95200]: Soliciting pool server 45.125.1.20' | python -m json.tool
{
"body": [
{
"detail": {
"code": "sinpedx00028",
"pattern_": "sinpedx00028",
"time_": "Jan 30 14:10:49"
},
"strategy": {
"comment": "\u6211\u662f\u5907\u6ce8",
"degree": 6,
"exclude": "",
"file_path": "/var/log/syslog",
"func": "cnt",
"id": 1,
"name": "\u6d41\u91cf500\u9519\u8bef\u6570",
"parse_succ": true,
"pattern": "sinpedx00028",
"step": 10,
"tags": {
"code": "sinpedx00028"
},
"time_format": "mmm dd HH:MM:SS"
}
}
],
"matched": true
}
root@sinpedx00028:/home/work/open-falcon/falcon-log-agent# curl localhost:8003/cached
{"counters":{}}
Running check manually produces a match, but in actual operation no data is ever reported. Could the author please offer some guidance? Thanks.
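1. Is this suitable for collecting business application logs? Can it replace Flume-style log collectors?
2. Regarding regex matching: with large log volumes and very complex regular expressions, will collection performance suffer and drive CPU usage very high?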
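Go's regexp package uses an RE2-style engine whose matching time grows linearly with input length, so it avoids catastrophic backtracking. Cost still scales with line volume and pattern size, though. Two common mitigations are to compile each pattern once and to skip the regex when a required literal substring is absent. The sketch below shows both. Whether falcon-log-agent already prefilters this way is not stated here, and the helper name is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Compile once at startup; compiling per line would waste CPU.
var errRe = regexp.MustCompile(`ERROR .*Thread:(\d+)`)

// matchWithPrefilter skips the regex entirely when a required literal is absent,
// which is cheap for the (usually many) lines that cannot match.
func matchWithPrefilter(line, literal string, re *regexp.Regexp) []string {
	if !strings.Contains(line, literal) {
		return nil
	}
	return re.FindStringSubmatch(line)
}

func main() {
	line := "2018-09-06 10:31:10.094 [pool-4-thread-1] ERROR com.hxc_cloud.hxc_cloud.core.impl.TimerServiceImpl - Thread:40:---ERROR---分钟学时数据存储异常---ERROR- "
	if m := matchWithPrefilter(line, "ERROR", errRe); m != nil {
		fmt.Println("thread:", m[1]) // thread: 40
	}
}
```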
file dimensionworker module, instead using the timestamp parsed directly when reading the file.
➜ falcon-log-agent git:(master) ✗ curl -s -XPOST localhost:8003/check -d 'log=2018-09-06 10:31:10.094 [pool-4-thread-1] ERROR com.hxc_cloud.hxc_cloud.core.impl.TimerServiceImpl - Thread:40:---ERROR---分钟学时数据存储异常---ERROR- ' | python -m json.tool
{
"body": [
{
"detail": {
"pattern_": "\u5206\u949f\u5b66\u65f6\u6570\u636e\u5b58\u50a8\u5f02\u5e38",
"time_": "2018-09-06 10:31:10"
},
"strategy": {
"comment": "\u6211\u662f\u5907\u6ce8",
"degree": 6,
"exclude": "",
"file_path": "/home/manzz/application/hxc_cloud_pro/log/error.log.2018-09-06.log",
"func": "cnt",
"id": 1,
"name": "max_hour_ERROR",
"parse_succ": false,
"pattern": "\u5206\u949f\u5b66\u65f6\u6570\u636e\u5b58\u50a8\u5f02\u5e38",
"step": 10,
"tags": {},
"time_format": "yyyy-mm-dd HH:MM:SS"
}
}
],
"matched": true
}
➜ falcon-log-agent git:(master) ✗ cat cfg/strategy.json
[
{
"id":1,
"name":"max_hour_ERROR",
"file_path":"/home/manzz/application/hxc_cloud_pro/log/error.log.2018-09-06.log",
"time_format":"yyyy-mm-dd HH:MM:SS",
"pattern":"分钟学时数据存储异常",
"exclude":"dial",
"step":10,
"tags":{
},
"func":"cnt",
"degree":6,
"comment":"我是备注"
}
]
➜ falcon-log-agent git:(master) ✗ curl localhost:8003/cached
{"counters":{}}
app.log contains this error:
fatal error: concurrent map read and map write
After a while, ./control status reports the agent as stopped.
fatal error: concurrent map read and map write
goroutine 15 [running]:
runtime.throw(0xc6dda0, 0x21)
	/usr/lib/golang/src/runtime/panic.go:547 +0x90 fp=0xc8214015f0 sp=0xc8214015d8
runtime.mapaccess2_faststr(0x9c02e0, 0xc8202176b0, 0xc8201a2280, 0x32, 0x2, 0x2)
	/usr/lib/golang/src/runtime/hashmap_fast.go:307 +0x5b fp=0xc821401650 sp=0xc8214015f0
// Start http api
func Start() {
	router := gin.Default()
	// Liveness check
	router.GET("/health", func(c *gin.Context) {
		c.JSON(http.StatusOK, "ok")
	})
	// Strategies currently loaded
	router.GET("/strategy", func(c *gin.Context) {
		c.JSON(http.StatusOK, strategy.GetListAll())
	})
	// Dump the worker's cached counters
	router.GET("/cached", func(c *gin.Context) {
		c.String(http.StatusOK, worker.GetCachedAll())
	})
	// Test a single log line (form field "log") against the strategies
	router.POST("/check", func(c *gin.Context) {
		log := c.PostForm("log")
		c.JSON(http.StatusOK, CheckLogByStrategy(log))
	})
	router.Run(fmt.Sprintf("0.0.0.0:%d", g.Conf().Http.HTTPPort))
}
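When the regex matches no lines, the cnt function cannot record a value of 0.
This matters because falcon-server raises an alert when a value exceeds the threshold and recovers when it falls below it.
When we scan logs for error keywords, under normal conditions no line contains the keyword. We would therefore like cnt to record 0, so that falcon-server's alerting strategies can work on it.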
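The log format looks like this. Could someone who knows how to write matching rules give an example? I want to count requests with status 200. Thanks.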
192.168.0.254 - - [14/Aug/2018:17:25:19 +0800] "GET /admin.php HTTP/1.1" 200 34 "http://test.svsse.dev/admin.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0" "103.254.65.210"
strategy.dev.json is as follows:
[
{
"id":1,
"name":"test_nginx_500",
"file_path":"/home/lanyulei/go/go_projects/open-falcon/log_agent/log/test.log",
"time_format":"yyyy-mm-dd HH:MM:SS",
"pattern":"error",
"exclude":"",
"step":10,
"tags":{
"error":"test_error"
},
"func":"cnt",
"degree":6,
"comment":"我是备注"
}
]
The related log output is:
2018-08-14 18:30:00.442322 DEBUG metric/metric.go:96 self monit [metric:log.agent.push.err.cnt][tms:1534242600][value:0]
2018-08-14 18:30:00.442331 DEBUG metric/metric.go:97 self monit [metric:log.agent.read.line.cnt][tms:1534242600][value:&{{{0 0} 0 0 0 0} map[/home/lanyulei/go/go_projects/open-falcon/log_agent/log/test.log:50]}]
2018-08-14 18:30:00.442416 DEBUG metric/metric.go:98 self monit [metric:log.agent.drop.line.cnt][tms:1534242600][value:&{{{0 0} 0 0 0 0} map[/home/lanyulei/go/go_projects/open-falcon/log_agent/log/test.log:0]}]
2018-08-14 18:30:00.442433 DEBUG metric/metric.go:99 self monit [metric:log.agent.analysis.cnt][tms:1534242600][value:&{{{0 0} 0 0 0 0} map[/home/lanyulei/go/go_projects/open-falcon/log_agent/log/test.log:50]}]
2018-08-14 18:30:00.442448 DEBUG metric/metric.go:100 self monit [metric:log.agent.analysis.succ][tms:1534242600][value:&{{{0 0} 0 0 0 0} map[]}]
Any guidance from the experts would be much appreciated.
[
{
"id":1,
"name":"servervice",
"file_path":"/opt/xcloud-cm.016-10-17_0.info.log",
"time_format":"dd/mmm/yyyy:HH:MM:SS",
"pattern":"log,num=(\\d+)",
"exclude":"unimport-request",
"step":10,
"tags":{
"error":"服务器下线: null"
},
"func":"cnt",
"degree":6,
"comment":"我是备注"
}
]
tail -f /var/log/log-agent/INFO.log
2018-07-23 11:32:54.403516 INFO strategy/update.go:22 [1532316774]Update Strategy start
2018-07-23 11:32:54.403769 INFO strategy/get_config.go:22 load config success from cfg/strategy.json
2018-07-23 11:32:54.403929 INFO strategy/update.go:31 [1532316774]Get my Strategy success, num : [1]
2018-07-23 11:32:54.403944 INFO strategy/update.go:38 [1532316774]Update Strategy end
2018-07-23 11:32:54.408669 INFO worker/counter.go:254 Updating global count
2018-07-23 11:32:54.408698 INFO worker/counter.go:285 Update global count done, [del:0][update:0]
2018-07-23 11:32:54.435036 INFO patrol/patrol.go:21 agent mem used : 35MB, percent : 3%
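Because strategy.json is JSON, a regex backslash has to be written twice. "log,num=(\\d+)" in the file becomes the regex log,num=(\d+) after decoding. A single backslash, as in "Total=(\d+)", is not a valid JSON escape and fails to decode. The sketch below decodes a strategy-like object with Go's encoding/json, compiles the pattern, and returns the first capture group. captureFromConfig is a hypothetical helper, not the agent's code path.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// captureFromConfig decodes a strategy-like JSON object, compiles its pattern
// and returns the first capture group found in line.
func captureFromConfig(rawJSON, line string) (string, error) {
	var s struct {
		Pattern string `json:"pattern"`
	}
	if err := json.Unmarshal([]byte(rawJSON), &s); err != nil {
		return "", err // e.g. a single backslash: `\d` is not a valid JSON escape
	}
	re, err := regexp.Compile(s.Pattern)
	if err != nil {
		return "", err
	}
	m := re.FindStringSubmatch(line)
	if len(m) < 2 {
		return "", fmt.Errorf("no match for %q", s.Pattern)
	}
	return m[1], nil
}

func main() {
	v, err := captureFromConfig(`{"pattern":"log,num=(\\d+)"}`, "2017/12/01 12:12:06 log,num=10 province=33")
	fmt.Println(v, err) // 10 <nil>
}
```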
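@GaoJiasheng Hello, our production logs must be kept as one file and cannot be split (rotated). Can the agent handle this case?
The log format looks like this: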
10.1.0.124 - - [24/Apr/2020:07:06:11 +0000] "PATCH /api/DN/checkdns. HTTP/1.1" 200 1600 "-" "python-requests/2.23.0"
@GaoJiasheng Could you please help confirm?
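First, with the agent running, I modified the collection strategy in strategy.json and then restarted the agent. When I then fetched the active strategy from the exposed /strategy endpoint, it had not changed.
Also, if the strategy in strategy.json is misconfigured, does it simply not take effect? In other words, a misconfiguration produces no warning at all?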
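A standalone check can at least surface obvious misconfigurations before the agent loads the file. The sketch below decodes strategy.json into a hypothetical Strategy struct mirroring the keys used in the configs here. It reports missing fields, a non-positive step, and patterns that do not compile, instead of skipping them silently. The specific rules are assumptions about what a valid entry needs. errors.Join requires Go 1.20+.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"regexp"
)

// Strategy mirrors the strategy.json keys seen in this thread (hypothetical type).
type Strategy struct {
	ID         int               `json:"id"`
	Name       string            `json:"name"`
	FilePath   string            `json:"file_path"`
	TimeFormat string            `json:"time_format"`
	Pattern    string            `json:"pattern"`
	Exclude    string            `json:"exclude"`
	Step       int               `json:"step"`
	Tags       map[string]string `json:"tags"`
	Func       string            `json:"func"`
	Degree     int               `json:"degree"`
	Comment    string            `json:"comment"`
}

// validateStrategies decodes strategy.json content and reports every problem found.
func validateStrategies(data []byte) ([]Strategy, error) {
	var list []Strategy
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, fmt.Errorf("strategy.json: %w", err)
	}
	var errs []error
	for _, s := range list {
		if s.Name == "" || s.FilePath == "" || s.Pattern == "" {
			errs = append(errs, fmt.Errorf("id %d: name, file_path and pattern are required", s.ID))
		}
		if s.Step <= 0 {
			errs = append(errs, fmt.Errorf("id %d: step must be > 0", s.ID))
		}
		if _, err := regexp.Compile(s.Pattern); err != nil {
			errs = append(errs, fmt.Errorf("id %d: bad pattern: %w", s.ID, err))
		}
		if _, err := regexp.Compile(s.Exclude); err != nil {
			errs = append(errs, fmt.Errorf("id %d: bad exclude: %w", s.ID, err))
		}
	}
	return list, errors.Join(errs...) // nil when there are no problems
}

func main() {
	data, err := os.ReadFile("cfg/strategy.json")
	if err != nil {
		fmt.Println(err)
		return
	}
	list, err := validateStrategies(data)
	fmt.Println(len(list), "strategies; problems:", err)
}
```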
"file_path":"/home/manzz/tomcat_all/pro/hxc_cloud/log/all.log.{%Y-%m-%d}.log"
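After a dynamic log path is configured, does it match only the file for the current date, or every earlier file that matches the path pattern?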
Hello experts,
I ran into a problem using falcon-log-agent. The configuration in strategy.json is:
"id":1,
"name":"agent-jm-log-INFO",
"file_path":"xxxxxxxxxxxxxx/jm.log",
"time_format":"yyyy/mm/dd HH:MM:SS",
"pattern":"INFO",
"exclude":"",
"step":10,
"tags":{
},
"func":"cnt",
"degree":6,
"comment":""
However, no data is uploaded. Checking the logs, I found these errors:
2019-05-27 15:05:16.078068 ERROR sample_log/sample_log.go:83 [worker][file:/opt/dmeeting/dm-project/logs/jm-manager/jm.log][num:10][id:6][producer error][sid:1] : cannot get timestamp:[sname:agent-jm-log-Error][sid:1][timeFormat:2006/01/02 15:04:05]. log_num : 1
2019-05-27 15:05:16.078084 ERROR sample_log/sample_log.go:83 [worker][file:/opt/dmeeting/dm-project/logs/jm-manager/jm.log][num:10][id:9][producer error][sid:1] : cannot get timestamp:[sname:agent-jm-log-Error][sid:1][timeFormat:2006/01/02 15:04:05]. log_num : 1
It complains about a producer error, yet running curl -s -XPOST localhost:8003/check -d 'log=2019/05/18 12:12:12 INFO, num=10 province= ' | python -m json.tool
returns data normally:
"body": [
{
"detail": {
"pattern_": "INFO",
"time_": "2019/05/18 12:12:12"
},
"strategy": {
"comment": "",
"degree": 6,
"exclude": "",
"file_path": "/xxxxxxxxxx/jm.log",
"func": "cnt",
"id": 1,
"name": "agent-jm-log-Error",
"parse_succ": true,
"pattern": "INFO",
"step": 10,
"tags": {},
"time_format": "yyyy/mm/dd HH:MM:SS"
}
}
],
"matched": true
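At that point I was at a loss.
While troubleshooting I changed the configuration above, replacing "pattern":"INFO" with "pattern":"Total=(\d+)" and pointing file_path at the agent's own log. With that, everything worked: no error logs, and the graphs appeared normally. So I suspected "pattern":"INFO" was the problem, but switching to "pattern":"ERROR" did not work either.
How should pattern actually be configured? I want to count INFO log lines (a simple cnt is enough) and also count ERROR lines. Any pointers would be much appreciated!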
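The error log above reports timeFormat:2006/01/02 15:04:05, the Go layout equivalent of yyyy/mm/dd HH:MM:SS. So "cannot get timestamp" suggests that the lines being read do not contain a timestamp of exactly that shape. That is a plausible cause, not a confirmed one. The sketch converts a time_format string into a Go layout plus a regex, then shows a line in that format parsing and a yyyy-mm-dd line failing. The token table and helper names are hypothetical, not the agent's implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// token maps a time_format token to a Go layout fragment and a regex fragment.
type token struct{ fmt, layout, re string }

// Longer tokens first so "mmm" wins over "mm".
var tokens = []token{
	{"yyyy", "2006", `\d{4}`},
	{"mmm", "Jan", `[A-Z][a-z]{2}`},
	{"mm", "01", `\d{2}`},
	{"dd", "02", `\d{2}`},
	{"HH", "15", `\d{2}`},
	{"MM", "04", `\d{2}`},
	{"SS", "05", `\d{2}`},
}

// compileTimeFormat turns e.g. "yyyy/mm/dd HH:MM:SS" into a Go layout and a
// regexp that locates such a timestamp anywhere in a line.
func compileTimeFormat(f string) (string, *regexp.Regexp) {
	var layout, re strings.Builder
	for i := 0; i < len(f); {
		matched := false
		for _, t := range tokens {
			if strings.HasPrefix(f[i:], t.fmt) {
				layout.WriteString(t.layout)
				re.WriteString(t.re)
				i += len(t.fmt)
				matched = true
				break
			}
		}
		if !matched { // literal separator such as '/', ':' or ' '
			layout.WriteByte(f[i])
			re.WriteString(regexp.QuoteMeta(f[i : i+1]))
			i++
		}
	}
	return layout.String(), regexp.MustCompile(re.String())
}

// extractTimestamp finds and parses the first timestamp in line.
func extractTimestamp(timeFormat, line string) (time.Time, error) {
	layout, re := compileTimeFormat(timeFormat)
	s := re.FindString(line)
	if s == "" {
		return time.Time{}, fmt.Errorf("cannot get timestamp (layout %q)", layout)
	}
	return time.Parse(layout, s)
}

func main() {
	layout, _ := compileTimeFormat("yyyy/mm/dd HH:MM:SS")
	fmt.Println(layout) // 2006/01/02 15:04:05
	_, err := extractTimestamp("yyyy/mm/dd HH:MM:SS", "2019/05/18 12:12:12 INFO, num=10")
	fmt.Println(err) // <nil>
	_, err = extractTimestamp("yyyy/mm/dd HH:MM:SS", "2019-05-27 15:05:16 INFO started")
	fmt.Println(err != nil) // true
}
```

If the real lines in jm.log use a different date layout, changing time_format to match them would be the first thing to try.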
cat cfg/strategy.json
[
{
"id":1,
"name":"流量500错误数",
"file_path":"/root/log/access.log",
"time_format":"yyyy-mm-dd HH:MM:SS",
"pattern":"error",
"exclude":"",
"step":5,
"tags":{
},
"func":"cnt",
"degree":6,
"comment":"我是Error 500"
}
]
curl -s -XPOST localhost:8003/check -d 'log=2017/12/01 12:12:06 service error 505, num=10 province=33' | python -m json.tool
{
"body": [],
"matched": false
}