Comments (18)

robotneo commented on August 10, 2024

The version is v0.3.63-cd87ed55ee9611a1208d801ae12f2d2fa481fb12.

from categraf.

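kongfei605 commented on August 10, 2024

No, that won't happen. Each target has its own timeout and retry count. Once that time is up, the targets that are working normally will still report their metrics.

Try setting max repetition to 10.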

robotneo commented on August 10, 2024

Sorry, I was too hasty. There was no data for a while, but now everything looks normal. I'll keep observing for a bit.

robotneo commented on August 10, 2024

After observing for a while, the sample data points do get interrupted. Here is a screenshot.
image

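robotneo commented on August 10, 2024

The data is fine for a while and then drops out for a while. Is polling across the collection interval causing the gaps, so that the other healthy targets are also interrupted for a while?
image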

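UlricQin commented on August 10, 2024

VictoriaMetrics (VM) may have gap-filling logic. To see the real data, you need to query a range vector and use the Table view, something like this:

image

That shows clearly at which timestamps data was actually reported.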

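robotneo commented on August 10, 2024

For example, in the normal state the collected value for this sample is 3. When polling reaches 172.16.42.3 and collection fails, does that cause gaps in the graphs of all the other healthy IPs as well?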

robotneo commented on August 10, 2024

So VM also shows gaps for the metrics of targets that can be collected normally? One bad apple spoils the whole barrel?

robotneo commented on August 10, 2024

image
I want to determine whether this is a problem with categraf's SNMP plugin or with VM. I'll test with telegraf now to see whether it reproduces.

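robotneo commented on August 10, 2024

1. With data collected by categraf's SNMP plugin written to Prometheus, and every target under agents actually existing, reachable over the network, and collectable, Graph queries show no gaps. Test duration: 15 to 20 minutes. Narrowing the time range still shows no gaps.
2. After adding a target that is unreachable over the network or cannot be collected, and writing to Prometheus, the gaps appear again, the same as with VictoriaMetrics. The gaps and the logged errors are shown below:
image
image
image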

kongfei605 commented on August 10, 2024

This is expected. Everything in the collection cycle in which the timeout occurs will show a gap.

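robotneo commented on August 10, 2024

timeout = "5s"
retries = 3

Is that 15 seconds? If one target cannot be collected, do all the targets wait together until it finishes? That doesn't seem reasonable to me. Rather than waiting for all samples to return before exposing the data, shouldn't it be handled per instance, so that the other healthy targets don't get gaps too? That would be more reasonable.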
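
The thread does not settle whether the worst case is 15 seconds, because that depends on how the SNMP client counts retries. The arithmetic can be sketched as follows, assuming each attempt waits for the full timeout and that `retries` excludes the first attempt. Both are assumptions, not confirmed behavior of categraf or its SNMP library, and `worstCaseWait` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// worstCaseWait returns an upper bound on how long one request can block.
// ASSUMPTIONS: every attempt waits for the full timeout, and retries counts
// only the attempts made after the first one.
func worstCaseWait(timeout time.Duration, retries int) time.Duration {
	attempts := retries + 1
	return timeout * time.Duration(attempts)
}

func main() {
	timeout, retries := 5*time.Second, 3

	// If retries excludes the first attempt: 4 attempts in total.
	fmt.Println(worstCaseWait(timeout, retries))

	// If retries is instead the total number of attempts: 3 attempts.
	fmt.Println(worstCaseWait(timeout, retries-1))
}
```

Under the first reading the bound is 20 seconds, under the second it is 15 seconds. Either way it is per request, so a target that needs several requests can block for longer.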

robotneo commented on August 10, 2024

func (ins *Instance) Gather(slist *types.SampleList) {
	for i, agent := range ins.Agents {
		var wg sync.WaitGroup
		wg.Add(1)
		go func(i int, agent string) {
			defer wg.Done()
			// First is the top-level fields. We treat the fields as table prefixes with an empty index.
			t := Table{
				Name:   ins.Name,
				Fields: ins.Fields,

				DebugMode: ins.DebugMod,
			}
			for idx, f := range t.Fields {
				t.Fields[idx].Oid = strings.TrimSpace(f.Oid)
			}
			topTags := map[string]string{}
			for k, v := range ins.GetLabels() {
				topTags[k] = v
			}
			extraTags := map[string]string{}
			if m, ok := ins.Mappings[agent]; ok {
				extraTags = m
			}
			if !ins.DisableUp {
				ins.up(slist, i)
			}

			gs, err := ins.getConnection(i)
			if err != nil {
				log.Printf("agent %s ins: %s", agent, err)
				return
			}
			if err := ins.gatherTable(slist, gs, t, topTags, extraTags, false); err != nil {
				log.Printf("agent %s ins: %s", agent, err)
			}

			// Now is the real tables.
			for _, t := range ins.Tables {
				if err := ins.gatherTable(slist, gs, t, topTags, extraTags, true); err != nil {
					log.Printf("agent %s ins: gathering table %s error: %s", agent, t.Name, err)
				}
			}
		}(i, agent)
		wg.Wait() // Wait for a single agent's collection to finish, then store its result immediately. Change it like this?
	}
}

This should be the relevant section: it waits for all agents to finish collecting, and only then are the sample values output.

kongfei605 commented on August 10, 2024

What would you like it changed to?

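robotneo commented on August 10, 2024

The idea is to keep one agent from affecting the data flush of all the other agents. Data could be flushed as it is collected, so that if one agent is down (network unreachable) the others still flush. Then the charts would not show gaps caused by every agent waiting while one agent retries.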
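
The "flush while collecting" idea can be sketched with a channel: each agent runs in its own goroutine and hands its samples over as soon as it finishes. This is a minimal illustration, not categraf's implementation. `sample`, `pollFunc`, `fakePoll`, and `gatherStreaming` are hypothetical names, and the two healthy IPs are made up for the demo. Note that calling `wg.Wait()` inside the loop, as in the snippet quoted earlier in the thread, would make the agents run one after another, so here the wait happens once, outside the loop.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errUnreachable = errors.New("agent unreachable")

// sample is a hypothetical stand-in for one collected metric point.
type sample struct {
	Agent string
	Value int
}

// pollFunc is a hypothetical stand-in for polling one SNMP agent.
type pollFunc func(agent string) ([]sample, error)

// gatherStreaming polls all agents concurrently and passes each agent's
// samples to sink as soon as that agent finishes, instead of holding them
// until every agent is done. sink is only ever called from the caller's
// goroutine, so it needs no locking of its own.
func gatherStreaming(agents []string, poll pollFunc, sink func([]sample)) {
	results := make(chan []sample)
	var wg sync.WaitGroup
	for _, agent := range agents {
		wg.Add(1)
		go func(agent string) {
			defer wg.Done()
			samples, err := poll(agent)
			if err != nil {
				return // a failed agent contributes nothing
			}
			results <- samples
		}(agent)
	}
	// Close the channel once every agent has finished, so the loop below ends.
	go func() {
		wg.Wait()
		close(results)
	}()
	for s := range results {
		sink(s)
	}
}

// fakePoll simulates two healthy agents and one that times out.
func fakePoll(agent string) ([]sample, error) {
	if agent == "172.16.42.3" {
		time.Sleep(200 * time.Millisecond) // simulated timeout plus retries
		return nil, errUnreachable
	}
	return []sample{{Agent: agent, Value: 3}}, nil
}

func main() {
	start := time.Now()
	agents := []string{"172.16.42.1", "172.16.42.2", "172.16.42.3"}
	gatherStreaming(agents, fakePoll, func(s []sample) {
		fmt.Printf("flushed %v after %v\n", s, time.Since(start).Round(time.Millisecond))
	})
	fmt.Printf("round finished after %v\n", time.Since(start).Round(time.Millisecond))
}
```

In this sketch the healthy agents are flushed almost immediately, but the call itself still returns only when the slowest agent has given up. It therefore does not by itself prevent rounds from overlapping when the timeout and retries exceed the collection interval.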

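kongfei605 commented on August 10, 2024

Within the same instances block the collection interval is the same, so the timeout logic should be the same too. There is no need at all to set such a large timeout and retry count.

If they are not controlled together, probe goroutines pile up for devices that are down. The larger the timeout and retry settings, the more they pile up, because the next probing round starts while the timeout and retries are still in progress.

For this case, the best approach is to add a target-IP marker plus side-channel probing logic. Compared with collecting non accessible OIDs, this has lower priority.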
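
One possible reading of "target-IP marker plus side-channel probing" is sketched below: a target whose poll fails is marked down and skipped by the regular rounds, while a separate cheap probe re-checks only the marked targets and clears the marker on recovery. The thread does not describe the planned design, so everything here (`targetState`, `gatherRound`, `probeRound`, the demo IPs other than 172.16.42.3) is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errUnreachable = errors.New("target unreachable")

// targetState is a hypothetical marker table recording which targets are
// currently considered down. The mutex makes it safe for concurrent use.
type targetState struct {
	mu   sync.Mutex
	down map[string]bool
}

func newTargetState() *targetState {
	return &targetState{down: make(map[string]bool)}
}

func (s *targetState) setDown(target string, down bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.down[target] = down
}

func (s *targetState) isDown(target string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.down[target]
}

// gatherRound runs the full poll only against targets that are not marked
// down, marks a target down when its poll fails, and returns the targets
// that were polled successfully.
func gatherRound(targets []string, st *targetState, poll func(string) error) []string {
	var ok []string
	for _, t := range targets {
		if st.isDown(t) {
			continue // left to the side-channel probe
		}
		if err := poll(t); err != nil {
			st.setDown(t, true)
			continue
		}
		ok = append(ok, t)
	}
	return ok
}

// probeRound is the side channel: it re-checks only the targets marked down
// and clears the marker once a target answers again.
func probeRound(targets []string, st *targetState, probe func(string) error) {
	for _, t := range targets {
		if st.isDown(t) && probe(t) == nil {
			st.setDown(t, false)
		}
	}
}

func main() {
	targets := []string{"172.16.42.1", "172.16.42.2", "172.16.42.3"}
	unreachable := true
	check := func(t string) error {
		if t == "172.16.42.3" && unreachable {
			return errUnreachable
		}
		return nil
	}
	st := newTargetState()

	// Round 1: the third target fails and is marked down.
	fmt.Println(gatherRound(targets, st, check))
	// Round 2: the third target is skipped, so no timeout is paid.
	fmt.Println(gatherRound(targets, st, check))

	// The target recovers and the side-channel probe notices.
	unreachable = false
	probeRound(targets, st, check)

	// Round 3: all three targets are polled again.
	fmt.Println(gatherRound(targets, st, check))
}
```

The rounds are sequential here to keep the sketch short. With this shape, a dead device costs one timeout when it is first detected rather than one per round, which is also what limits the goroutine pile-up described above.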

robotneo commented on August 10, 2024

Thanks for clearing that up. Is there a plan to work on this?

kongfei605 commented on August 10, 2024

Yes, it is a long-term optimization item.
