
falcon-log-agent's People

Contributors

1feng, gaojiasheng, mdh67899, monkeywithacupcake, tianyanli, wcc526

falcon-log-agent's Issues

POST data fails to upload when a tag value is Error

The body of the POST request is:
[{"endpoint":"mycomputer","metric":"log-error-monitor","timestamp":1536565260,"step":60,"value":150,"counterType":"GAUGE","tags":"syslog=Error"}]
If syslog=Error is changed to syslog=error, the data is reported successfully.

PosterLoop: should the slice be pre-sized with make?

// PosterLoop to start post loop
// Push in a loop, once every 10s
func PosterLoop() {
	dlog.Info("PosterLoop Start")
	go func() {
		for {
			select {
			case p := <-pushQueue:
				points := make([]*FalconPoint, 0)
				points = append(points, p)
			DONE:
				for {
					select {
					case tmp := <-pushQueue:
						points = append(points, tmp)
						continue
					default:
						break DONE
					}
				}
				// Push to the cache first
				PostToCache(points)
				// Start a goroutine to send to odin-agent asynchronously
				go postToFalconAgent(points)
			}
			time.Sleep(10 * time.Second)
		}
	}()
}

This might be better written as:

points := make([]*FalconPoint, len(pushQueue))

or with len(pushQueue) + x.
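
A note on the suggestion above: make([]*FalconPoint, len(pushQueue)) sets the slice length, so the slice would start with that many nil entries and append would add new points after them. Pre-sizing the capacity while keeping the length at 0 avoids this. The following is a minimal sketch, not the agent's code: FalconPoint is reduced to a stand-in struct and drainQueue is a hypothetical helper.

```go
package main

import "fmt"

// FalconPoint is a reduced stand-in for the agent's type (assumption).
type FalconPoint struct {
	Metric string
	Value  float64
}

// drainQueue collects first plus everything currently buffered in queue
// without blocking. Capacity is pre-sized from len(queue); the length starts
// at 0 so that append does not leave nil entries at the front.
func drainQueue(first *FalconPoint, queue <-chan *FalconPoint) []*FalconPoint {
	points := make([]*FalconPoint, 0, len(queue)+1)
	points = append(points, first)
	for {
		select {
		case p := <-queue:
			points = append(points, p)
		default:
			return points
		}
	}
}

func main() {
	queue := make(chan *FalconPoint, 8)
	for i := 0; i < 3; i++ {
		queue <- &FalconPoint{Metric: "log-error-monitor", Value: float64(i)}
	}
	first := <-queue
	points := drainQueue(first, queue)
	fmt.Println(len(points)) // 3
}
```

Since len(queue) is only a snapshot, the capacity is a hint; append still grows the slice if more points arrive while draining.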

Dynamic log path matching

ll /home/manzz/tomcat_all/pro/hxc_cloud/log/
total 316M
-rw-r--r-- 1 root root 40M Aug 29 23:59 all.log.2018-08-29.log
-rw-r--r-- 1 root root 39M Aug 30 23:59 all.log.2018-08-30.log
-rw-r--r-- 1 root root 39M Aug 31 23:59 all.log.2018-08-31.log

cat cfg/strategy.json
[
{
"id":1,
"name":"information",
"file_path":"/home/manzz/tomcat_all/pro/hxc_cloud/log/all.log.${%Y-%m-%d}.log",
"time_format":"yyyy-mm-dd HH:MM:SS",
"pattern":"info",
"exclude":"",
"step":5,
"tags":{
},
"func":"cnt",
"degree":6,
"comment":"我是备注"
}
]

I tested by matching info, the most frequent string in the log, but no data ever shows up.

Dynamic log path matching problem

ll /home/manzz/tomcat_all/pro/hxc_cloud/log/
total 316M
-rw-r--r-- 1 root root 40M Aug 29 23:59 all.log.2018-08-29.log
-rw-r--r-- 1 root root 39M Aug 30 23:59 all.log.2018-08-30.log
-rw-r--r-- 1 root root 39M Aug 31 23:59 all.log.2018-08-31.log

cat cfg/strategy.json
[
{
"id":1,
"name":"information",
"file_path":"/home/manzz/tomcat_all/pro/hxc_cloud/log/all.log.${%Y-%m-%d}.log",
"time_format":"yyyy-mm-dd HH:MM:SS",
"pattern":"info",
"exclude":"",
"step":5,
"tags":{
},
"func":"cnt",
"degree":6,
"comment":"我是备注"
}
]

I tested by matching info, the most frequent string in the log, but no data ever shows up.

falcon-log-agent git:(master) ✗ curl localhost:8003/strategy
[{"id":1,"name":"information","file_path":"/home/manzz/tomcat_all/pro/hxc_cloud/log/all.log.${%Y-%m-%d}.log","time_format":"yyyy-mm-dd HH:MM:SS","pattern":"info","exclude":"","step":5,"tags":{},"func":"cnt","degree":6,"comment":"我是备注","parse_succ":false}

falcon-log-agent git:(master) ✗ curl localhost:8003/cached
{"counters":{}}#
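
For readers unsure what a date placeholder such as ${%Y-%m-%d} is meant to expand to, here is a minimal sketch. It is an illustration under assumptions, not the agent's implementation: expandPath, the placeholder regex, the token table, and sampleDay are all hypothetical, and only a few strftime tokens are handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// placeholder matches a ${...} segment such as ${%Y-%m-%d}.
var placeholder = regexp.MustCompile(`\$\{([^}]*)\}`)

// strftimeToGo maps a few strftime tokens to Go reference-time layouts.
var strftimeToGo = strings.NewReplacer(
	"%Y", "2006", "%m", "01", "%d", "02", "%H", "15",
)

// sampleDay is a fixed date used for the demonstration.
var sampleDay = time.Date(2018, time.August, 31, 12, 0, 0, 0, time.UTC)

// expandPath replaces every ${...} segment with t formatted accordingly.
func expandPath(path string, t time.Time) string {
	return placeholder.ReplaceAllStringFunc(path, func(seg string) string {
		inner := placeholder.FindStringSubmatch(seg)[1]
		return t.Format(strftimeToGo.Replace(inner))
	})
}

func main() {
	path := "/home/manzz/tomcat_all/pro/hxc_cloud/log/all.log.${%Y-%m-%d}.log"
	fmt.Println(expandPath(path, sampleDay))
	// /home/manzz/tomcat_all/pro/hxc_cloud/log/all.log.2018-08-31.log
}
```

With this interpretation only the file for the given day is selected, so earlier files such as all.log.2018-08-29.log would not be read.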

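Reporting a large number of metrics

This plugin scans log files at configured locations and reports to falcon.
My question: if a single file contains a large number of distinct metrics, strategy.json needs one entry per metric. That is a lot of work and does not seem reasonable.
Is there a way to report each metric as it is found during scanning? Also, name here corresponds to the falcon metric but still has to be configured by hand; could the falcon metric be reported directly?
Concretely: when a metric is found in the file, report that metric (the metrics are numerous and distinct).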

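How to set a default value when the pattern is not matched

When the pattern matches, data is pushed to open-falcon. When the pattern does not match, it would be useful to push a default value to open-falcon instead.

Otherwise some alarms cannot recover: the problem is actually gone, but because the value is empty the recovery condition is never met. For example, with (#3) > 3, recovery is triggered only when the metric shows up again with a value < 3.

Of course nodata can be used to fill in a default value, but with so many log rules, filling them all in is tedious.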
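
The requested behaviour can be sketched as follows. This is a hypothetical illustration, not an existing agent feature: point, flush, and the defaultValue parameter are made-up names. At the end of each step every configured metric yields a point, and metrics with no match in that step get the default.

```go
package main

import "fmt"

// point is a simplified stand-in for what would be pushed to open-falcon.
type point struct {
	Metric string
	Value  float64
}

// flush builds one point per configured metric for the step that just ended.
// Metrics without any match get defaultValue instead of being skipped.
func flush(metrics []string, counts map[string]float64, defaultValue float64) []point {
	points := make([]point, 0, len(metrics))
	for _, m := range metrics {
		v, ok := counts[m]
		if !ok {
			v = defaultValue
		}
		points = append(points, point{Metric: m, Value: v})
	}
	return points
}

func main() {
	metrics := []string{"log-error-monitor", "test_nginx_500"}
	counts := map[string]float64{"log-error-monitor": 150}
	for _, p := range flush(metrics, counts, 0) {
		fmt.Printf("%s=%v\n", p.Metric, p.Value)
	}
}
```

Pushing an explicit 0 gives the alarm expression a value to evaluate, which is what lets a threshold alarm recover.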

[Question] About data reporting

root@sinpedx00028:/home/work/open-falcon/falcon-log-agent# curl -s -XPOST localhost:8003/check -d 'log=Jan 30 14:10:49 sinpedx00028 ntpd[95200]: Soliciting pool server 45.125.1.20' | python -m json.tool
{
"body": [
{
"detail": {
"code": "sinpedx00028",
"pattern_": "sinpedx00028",
"time_": "Jan 30 14:10:49"
},
"strategy": {
"comment": "\u6211\u662f\u5907\u6ce8",
"degree": 6,
"exclude": "",
"file_path": "/var/log/syslog",
"func": "cnt",
"id": 1,
"name": "\u6d41\u91cf500\u9519\u8bef\u6570",
"parse_succ": true,
"pattern": "sinpedx00028",
"step": 10,
"tags": {
"code": "sinpedx00028"
},
"time_format": "mmm dd HH:MM:SS"
}
}
],
"matched": true
}
root@sinpedx00028:/home/work/open-falcon/falcon-log-agent# curl localhost:8003/cached
{"counters":{}}root@sinpedx00028:/home/work/open-falcon/falcon-log-agent#

Running check by hand matches, but in actual operation no reported data ever shows up. Could the author please advise? Thanks.

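A few questions

1. Is this suitable for collecting business application logs? Can it replace a log collector such as flume?
2. On regex matching: with high log volume and very complex regular expressions, will collection performance suffer and CPU usage become very high?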

Support multi-line log analysis

  • pre-check during strategy load
  • generate a time format configuration per file
  • parse the timestamp when reading logs and save it for further analysis
  • drop timestamp parsing from the worker module and use the timestamp parsed while reading the file directly
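
The idea behind the list above can be sketched as grouping lines at read time: a line that starts with a timestamp opens a new entry, and any other line is treated as a continuation of the previous one. This is an assumption-laden illustration, not the proposed patch; startOfEntry and groupEntries are hypothetical names and the timestamp shape is fixed to yyyy-mm-dd HH:MM:SS.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// startOfEntry matches lines beginning with a "yyyy-mm-dd HH:MM:SS" timestamp.
var startOfEntry = regexp.MustCompile(`^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}`)

// groupEntries joins continuation lines (no leading timestamp) onto the
// preceding entry. Leading lines without a timestamp are dropped.
func groupEntries(lines []string) []string {
	var entries []string
	for _, line := range lines {
		if startOfEntry.MatchString(line) {
			entries = append(entries, line)
		} else if len(entries) > 0 {
			entries[len(entries)-1] += "\n" + line
		}
	}
	return entries
}

func main() {
	lines := []string{
		"2018-09-06 10:31:10.094 ERROR something failed",
		"java.lang.IllegalStateException: boom",
		"\tat com.example.Service.run(Service.java:42)",
		"2018-09-06 10:31:11.000 INFO recovered",
	}
	for i, e := range groupEntries(lines) {
		fmt.Printf("entry %d has %d line(s)\n", i, strings.Count(e, "\n")+1)
	}
}
```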

Problem matching Chinese text

➜ falcon-log-agent git:(master) ✗ curl -s -XPOST localhost:8003/check -d 'log=2018-09-06 10:31:10.094 [pool-4-thread-1] ERROR com.hxc_cloud.hxc_cloud.core.impl.TimerServiceImpl - Thread:40:---ERROR---分钟学时数据存储异常---ERROR- ' | python -m json.tool
{
"body": [
{
"detail": {
"pattern_": "\u5206\u949f\u5b66\u65f6\u6570\u636e\u5b58\u50a8\u5f02\u5e38",
"time_": "2018-09-06 10:31:10"
},
"strategy": {
"comment": "\u6211\u662f\u5907\u6ce8",
"degree": 6,
"exclude": "",
"file_path": "/home/manzz/application/hxc_cloud_pro/log/error.log.2018-09-06.log",
"func": "cnt",
"id": 1,
"name": "max_hour_ERROR",
"parse_succ": false,
"pattern": "\u5206\u949f\u5b66\u65f6\u6570\u636e\u5b58\u50a8\u5f02\u5e38",
"step": 10,
"tags": {},
"time_format": "yyyy-mm-dd HH:MM:SS"
}
}
],
"matched": true
}
➜ falcon-log-agent git:(master) ✗ cat cfg/strategy.json
[
{
"id":1,
"name":"max_hour_ERROR",
"file_path":"/home/manzz/application/hxc_cloud_pro/log/error.log.2018-09-06.log",
"time_format":"yyyy-mm-dd HH:MM:SS",
"pattern":"分钟学时数据存储异常",
"exclude":"dial",
"step":10,
"tags":{
},
"func":"cnt",
"degree":6,
"comment":"我是备注"
}
]
➜ falcon-log-agent git:(master) ✗ curl localhost:8003/cached
{"counters":{}}#

Error in app.log

app.log contains this error:
fatal error: concurrent map read and map write

After a while, ./control status reports the state as "stoped".

A beginner question about this race: fatal error: concurrent map read and map write

fatal error: concurrent map read and map write

goroutine 15 [running]:
runtime.throw(0xc6dda0, 0x21)
/usr/lib/golang/src/runtime/panic.go:547 +0x90 fp=0xc8214015f0 sp=0xc8214015d8
runtime.mapaccess2_faststr(0x9c02e0, 0xc8202176b0, 0xc8201a2280, 0x32, 0x2, 0x2)
/usr/lib/golang/src/runtime/hashmap_fast.go:307 +0x5b fp=0xc821401650 sp=0xc8214015f0
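
The Go runtime raises this fatal error when one goroutine reads a plain map while another writes to it. A common remedy is to guard the map with a sync.RWMutex. The sketch below shows the pattern in isolation; safeCounters is a hypothetical type, and which map in the agent is affected cannot be determined from the trace shown here.

```go
package main

import (
	"fmt"
	"sync"
)

// safeCounters wraps a map with a RWMutex so concurrent readers and writers
// do not trigger "concurrent map read and map write".
type safeCounters struct {
	mu sync.RWMutex
	m  map[string]int64
}

func newSafeCounters() *safeCounters {
	return &safeCounters{m: make(map[string]int64)}
}

// Add takes the write lock because it mutates the map.
func (c *safeCounters) Add(key string, delta int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key] += delta
}

// Get takes the read lock, so many readers can proceed in parallel.
func (c *safeCounters) Get(key string) (int64, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[key]
	return v, ok
}

func main() {
	c := newSafeCounters()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); c.Add("log-error-monitor", 1) }()
		go func() { defer wg.Done(); c.Get("log-error-monitor") }()
	}
	wg.Wait()
	v, _ := c.Get("log-error-monitor")
	fmt.Println(v) // 100
}
```

Building with go build -race (or running tests with go test -race) helps locate the unguarded access.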

Security risk on internal machines from listening on 0.0.0.0

// Start http api
func Start() {
	router := gin.Default()
	router.GET("/health", func(c *gin.Context) {
		c.JSON(http.StatusOK, "ok")
	})
	router.GET("/strategy", func(c *gin.Context) {
		c.JSON(http.StatusOK, strategy.GetListAll())
	})

	router.GET("/cached", func(c *gin.Context) {
		c.String(http.StatusOK, worker.GetCachedAll())
	})

	router.POST("/check", func(c *gin.Context) {
		log := c.PostForm("log")
		c.JSON(http.StatusOK, CheckLogByStrategy(log))
	})

	router.Run(fmt.Sprintf("0.0.0.0:%d", g.Conf().Http.HTTPPort))
}
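
One way to address this is to make the bind host configurable and default to loopback. The sketch below only builds the address string; listenAddr and the idea of a host setting are hypothetical, since the configuration shown above exposes only HTTPPort.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// listenAddr builds the listen address. An empty host falls back to loopback
// so the API is not exposed on every interface by default.
func listenAddr(host string, port int) string {
	if host == "" {
		host = "127.0.0.1"
	}
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(listenAddr("", 8003))         // 127.0.0.1:8003
	fmt.Println(listenAddr("10.0.0.5", 8003)) // 10.0.0.5:8003
}
```

The result could then be passed to router.Run in place of the hard-coded 0.0.0.0 address.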

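When the regex matches no lines, the cnt function cannot record 0

When the regex matches no lines, the cnt function cannot record a value of 0.
falcon-server raises an alarm when a threshold is exceeded and recovers when the value drops below it.
When we scan logs for error keywords, under normal conditions no line contains the keyword. So we would like cnt to be able to record 0, which makes follow-up alarm strategies in falcon-server easier.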

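I cannot get the match rule right no matter how I write it

The log format is shown below. Could someone who knows how to write match rules give an example? Thanks. I want to count requests with status 200.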
192.168.0.254 - - [14/Aug/2018:17:25:19 +0800] "GET /admin.php HTTP/1.1" 200 34 "http://test.svsse.dev/admin.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0" "103.254.65.210"
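
A candidate pattern can be tried out locally before it goes into strategy.json. The sketch below assumes the agent evaluates pattern with Go's regexp syntax; status200 is a hypothetical name, and the pattern anchors on the end of the quoted request line so that a 200 elsewhere in the line (for example in a byte count) is not counted.

```go
package main

import (
	"fmt"
	"regexp"
)

// status200 matches the status code that follows the quoted request line.
var status200 = regexp.MustCompile(`HTTP/\d\.\d" 200 `)

// sample is the access-log line quoted in the question.
const sample = `192.168.0.254 - - [14/Aug/2018:17:25:19 +0800] "GET /admin.php HTTP/1.1" 200 34 "http://test.svsse.dev/admin.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0" "103.254.65.210"`

func main() {
	fmt.Println(status200.MatchString(sample)) // true
}
```

Inside strategy.json the backslashes and the double quote have to be escaped for JSON, in the same way another strategy on this page writes (\\d+). The timestamp in this log has the shape dd/mmm/yyyy:HH:MM:SS, a time_format value that also appears in another strategy on this page.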

The log shows no errors, so why is no data reported to the agent?

strategy.dev.json is as follows:
[
{
"id":1,
"name":"test_nginx_500",
"file_path":"/home/lanyulei/go/go_projects/open-falcon/log_agent/log/test.log",
"time_format":"yyyy-mm-dd HH:MM:SS",
"pattern":"error",
"exclude":"",
"step":10,
"tags":{
"error":"test_error"
},
"func":"cnt",
"degree":6,
"comment":"我是备注"
}
]

The relevant log output is as follows:
2018-08-14 18:30:00.442322 DEBUG metric/metric.go:96 self monit [metric:log.agent.push.err.cnt][tms:1534242600][value:0]
2018-08-14 18:30:00.442331 DEBUG metric/metric.go:97 self monit [metric:log.agent.read.line.cnt][tms:1534242600][value:&{{{0 0} 0 0 0 0} map[/home/lanyulei/go/go_projects/open-falcon/log_agent/log/test.log:50]}]
2018-08-14 18:30:00.442416 DEBUG metric/metric.go:98 self monit [metric:log.agent.drop.line.cnt][tms:1534242600][value:&{{{0 0} 0 0 0 0} map[/home/lanyulei/go/go_projects/open-falcon/log_agent/log/test.log:0]}]
2018-08-14 18:30:00.442433 DEBUG metric/metric.go:99 self monit [metric:log.agent.analysis.cnt][tms:1534242600][value:&{{{0 0} 0 0 0 0} map[/home/lanyulei/go/go_projects/open-falcon/log_agent/log/test.log:50]}]
2018-08-14 18:30:00.442448 DEBUG metric/metric.go:100 self monit [metric:log.agent.analysis.succ][tms:1534242600][value:&{{{0 0} 0 0 0 0} map[]}]

Any advice from the experts would be appreciated.

StrategyConfig is configured as follows, but no collected data is uploaded to the agent

[

{
    "id":1,
    "name":"servervice",
    "file_path":"/opt/xcloud-cm.016-10-17_0.info.log",
    "time_format":"dd/mmm/yyyy:HH:MM:SS",
    "pattern":"log,num=(\\d+)",
    "exclude":"unimport-request",
    "step":10,
    "tags":{
        "error":"服务器下线: null"
    },
    "func":"cnt",
    "degree":6,
    "comment":"我是备注"
}

]
tail -f /var/log/log-agent/INFO.log
2018-07-23 11:32:54.403516 INFO strategy/update.go:22 [1532316774]Update Strategy start
2018-07-23 11:32:54.403769 INFO strategy/get_config.go:22 load config success from cfg/strategy.json
2018-07-23 11:32:54.403929 INFO strategy/update.go:31 [1532316774]Get my Strategy success, num : [1]
2018-07-23 11:32:54.403944 INFO strategy/update.go:38 [1532316774]Update Strategy end
2018-07-23 11:32:54.408669 INFO worker/counter.go:254 Updating global count
2018-07-23 11:32:54.408698 INFO worker/counter.go:285 Update global count done, [del:0][update:0]
2018-07-23 11:32:54.435036 INFO patrol/patrol.go:21 agent mem used : 35MB, percent : 3%

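Collection strategy unchanged after restart

First, with the agent running, I edited the collection strategy in strategy.json and then restarted the agent. When I fetched the effective strategy from the exposed /strategy endpoint, it had not changed.
Also, is it the case that a misconfigured strategy in strategy.json simply does not take effect? In other words, there is no warning when the configuration is wrong.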

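About dynamic paths

"file_path":"/home/manzz/tomcat_all/pro/hxc_cloud/log/all.log.{%Y-%m-%d}.log"
Once a dynamic log path is set, does it match only the current day's date, or every earlier file that fits the path pattern?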

Log error (producer error) and no data uploaded, please help!

Hello,
I ran into a problem while using falcon-log-agent. strategy.json is configured as follows:
"id":1,
"name":"agent-jm-log-INFO",
"file_path":"xxxxxxxxxxxxxx/jm.log",
"time_format":"yyyy/mm/dd HH:MM:SS",
"pattern":"INFO",
"exclude":"",
"step":10,
"tags":{
},
"func":"cnt",
"degree":6,
"comment":""

However, no data is uploaded. Looking at the log, I found these errors:
2019-05-27 15:05:16.078068 ERROR sample_log/sample_log.go:83 [worker][file:/opt/dmeeting/dm-project/logs/jm-manager/jm.log][num:10][id:6][producer error][sid:1] : cannot get timestamp:[sname:agent-jm-log-Error][sid:1][timeFormat:2006/01/02 15:04:05]. log_num : 1
2019-05-27 15:05:16.078084 ERROR sample_log/sample_log.go:83 [worker][file:/opt/dmeeting/dm-project/logs/jm-manager/jm.log][num:10][id:9][producer error][sid:1] : cannot get timestamp:[sname:agent-jm-log-Error][sid:1][timeFormat:2006/01/02 15:04:05]. log_num : 1

It says producer error, but when I run curl -s -XPOST localhost:8003/check -d 'log=2019/05/18 12:12:12 INFO, num=10 province= ' | python -m json.tool
data is returned normally:
"body": [
{
"detail": {
"pattern_": "INFO",
"time_": "2019/05/18 12:12:12"
},
"strategy": {
"comment": "",
"degree": 6,
"exclude": "",
"file_path": "/xxxxxxxxxx/jm.log",
"func": "cnt",
"id": 1,
"name": "agent-jm-log-Error",
"parse_succ": true,
"pattern": "INFO",
"step": 10,
"tags": {},
"time_format": "yyyy/mm/dd HH:MM:SS"
}
}
],
"matched": true

At this point I am stumped.

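While troubleshooting I changed the configuration: I replaced "pattern":"INFO" in the config above with "pattern":"Total=(\d+)" and pointed file_path at the agent's own log. That works perfectly: no error logs, and the graph appears as expected. So I suspect "pattern":"INFO" is the problem. I also tried "pattern":"ERROR", which does not work either.
How should pattern be configured? I want to count INFO log lines (cnt is enough) and count ERROR log lines. Any pointers would be appreciated!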

The match fails. What is the reason?

cat cfg/strategy.json
[
{
"id":1,
"name":"流量500错误数",
"file_path":"/root/log/access.log",
"time_format":"yyyy-mm-dd HH:MM:SS",
"pattern":"error",
"exclude":"",
"step":5,
"tags":{
},
"func":"cnt",
"degree":6,
"comment":"我是Error 500"
}
]

curl -s -XPOST localhost:8003/check -d 'log=2017/12/01 12:12:06 service error 505, num=10 province=33' | python -m json.tool

{
"body": [],
"matched": false
}
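
One thing worth checking here: the strategy's time_format uses dashes (yyyy-mm-dd) while the test line uses slashes (2017/12/01). If the agent requires the leading timestamp to parse before it evaluates the pattern, that mismatch alone could explain matched: false. The sketch below is an assumption-based illustration, not the agent's code: toGoLayout and leadingTime are hypothetical, and the token mapping is inferred from an error log on this page that shows yyyy/mm/dd HH:MM:SS as timeFormat:2006/01/02 15:04:05.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// toGoLayout converts a time_format such as "yyyy-mm-dd HH:MM:SS" into a Go
// layout. "mmm" is listed before "mm" so that month names win over numbers.
var toGoLayout = strings.NewReplacer(
	"yyyy", "2006", "mmm", "Jan", "mm", "01", "dd", "02",
	"HH", "15", "MM", "04", "SS", "05",
)

// leadingTime tries to parse the start of line using timeFormat.
// It assumes fixed-width formats, where layout and value have equal length.
func leadingTime(timeFormat, line string) (time.Time, error) {
	layout := toGoLayout.Replace(timeFormat)
	if len(line) < len(layout) {
		return time.Time{}, fmt.Errorf("line shorter than layout %q", layout)
	}
	return time.Parse(layout, line[:len(layout)])
}

func main() {
	line := "2017/12/01 12:12:06 service error 505, num=10 province=33"

	_, err := leadingTime("yyyy-mm-dd HH:MM:SS", line)
	fmt.Println(err != nil) // true: the line uses slashes, the format uses dashes

	_, err = leadingTime("yyyy/mm/dd HH:MM:SS", line)
	fmt.Println(err == nil) // true
}
```

If this is the cause, either changing time_format to yyyy/mm/dd HH:MM:SS or testing with a line that starts with 2017-12-01 12:12:06 should make the check succeed.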
