Comments (8)

robcowart commented on June 6, 2024

It looks like you have a device that is not RFC-compliant and is using ifIndex numbers that are too large. In theory an ifIndex number is 32 bits. However, the most significant bit would indicate a negative value in a signed integer, so you are limited to 31 bits. The sFlow v5 spec (http://sflow.org/sflow_version_5.txt) is stricter still: it states that the maximum value for ifIndex is 0x3FFFFFFF (which is actually 30 bits).

Further the spec states:

While the theoretical range of ifIndex numbers is 2^32, RFC 2863 recommends that ifIndex numbers are allocated using small integer values starting at 1. For most agent implementations the 2^24 range of values for ifIndex supported by the compact encoding is more than adequate and its use saves bandwidth.

In the latter case, 2147483648 is 2^31, one more than the largest signed 32-bit integer (2147483647), so it is too big for an integer. It looks like Logstash tries to deal with it, but Elasticsearch rejects it. In the first error the value is significantly beyond 32 bits, and it already fails at the input.
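
The arithmetic behind that error can be checked with a few lines of Python. This is only a sketch: it assumes the `sflow.output_interface` field is mapped as an Elasticsearch `integer` (a signed 32-bit type, as the error message suggests), and the helper name `fits_int32` is made up for illustration.

```python
# Bounds of a signed 32-bit integer, which is what an Elasticsearch
# `integer` field can hold (assumption: the field uses that mapping).
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """Return True if value can be stored in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


reported = 2147483648  # value taken from the Elasticsearch error message

print(reported == 2**31)      # True: exactly 2^31
print(reported - INT32_MAX)   # 1: one past the signed 32-bit maximum
print(fits_int32(reported))   # False
print(fits_int32(1000011))    # True: an ordinary ifIndex from the same agent
```

If the field were mapped as `long` instead, the value would be accepted, although that would hide the odd value rather than explain it.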

So the initial indication is that the device sending the flows (what kind of device is it?) is sending bad flows, which may be related to it using non-RFC compliant ifIndex values. If you send me a packet capture of an sFlow record that is causing these issues I can confirm this is the problem, but based on the logs that seems to be the case.

from elastiflow.

robcowart commented on June 6, 2024

Can you please open the second problem as a separate issue so I can address each individually? Thanks.

robcowart commented on June 6, 2024

Actually these may be related... leave it for now.

lyepustin commented on June 6, 2024

Thanks for your feedback!
I'm using two devices to export sFlow to my host: Arista switches (DCS-7504 and 7280SR).

Why do more than five minutes pass between each warning? Shouldn't it happen more often?

Example (compare the timestamps of the two warnings):
[2018-02-07T16:59:07,299][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"elastiflow-2018.02.07", :_type=>"doc", :_routing=>nil}, #LogStash::Event:0x3eb34f63], :response=>{"index"=>{"_index"=>"elastiflow-2018.02.07", "_type"=>"doc", "_id"=>"lx_9cGEBEAOT3Lqa9r4f", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [sflow.output_interface]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Value [2147483648] is out of range for an integer"}}}}}

[2018-02-07T16:59:16,484][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T16:59:16,578][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T16:59:16,580][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T16:59:20,443][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T16:59:20,677][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T16:59:20,819][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T16:59:21,243][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T17:04:16,488][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T17:04:16,589][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T17:04:16,590][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T17:04:20,504][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T17:04:20,760][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T17:04:20,920][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-02-07T17:04:21,358][INFO ][logstash.filters.translate] refreshing dictionary file

[2018-02-07T17:06:02,862][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"elastiflow-2018.02.07", :_type=>"doc", :_routing=>nil}, #LogStash::Event:0x30c18ac0], :response=>{"index"=>{"_index"=>"elastiflow-2018.02.07", "_type"=>"doc", "_id"=>"9iEEcWEBEAOT3LqaTVxp", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [sflow.output_interface]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Value [2147483648] is out of range for an integer"}}}}}
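
For reference, the gap between the two warnings can be computed directly from their timestamps. This is a small sketch using only the Python standard library; the format string is an assumption based on how the timestamps appear in the log lines above, and `parse_ts` is an illustrative helper name.

```python
from datetime import datetime

# Layout of the Logstash timestamps as they appear in the log above:
# date, "T", time, then a comma and milliseconds.
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S,%f"


def parse_ts(text: str) -> datetime:
    """Parse one Logstash log timestamp into a datetime."""
    return datetime.strptime(text, LOG_TS_FORMAT)


first_warning = parse_ts("2018-02-07T16:59:07,299")
second_warning = parse_ts("2018-02-07T17:06:02,862")

gap = second_warning - first_warning
print(gap)                  # elapsed time between the two warnings
print(gap.total_seconds())  # the same gap in seconds (a little under 7 minutes)
```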

P.S. Is it possible to differentiate traffic/flows by interface (using ifIndex)?

robcowart commented on June 6, 2024

Are you getting good data between those two warnings? Perhaps there is a periodic message being sent that is at fault. For example, flows for normal traffic may be OK, but flows for traffic to the switch itself (to/from its mgmt IP) are at fault.

I can only speculate with the info I have. To troubleshoot this properly a packet capture of the flow records which are in error would be necessary.

lyepustin commented on June 6, 2024

Using https://github.com/fooelisa/perl-net-sflow I have captured the following info:

===Datagram===
sFlowVersion => 5
AgentIpVersion => 1
AgentIp => 10.50.0.134
subAgentId => 0
datagramSequenceNumber => 790931294
agentUptime => 2115643520
samplesInPacket => 7

---Sample---
sampleTypeEnterprise => 0
sampleTypeFormat => 1
sampleLength => 140
sampleSequenceNumber => 1275391139
sourceIdType => 0
sourceIdIndex => 1000011
samplingRate => 16384
samplePool => 3349792512
drops => 0
inputInterface => 1000011
outputInterface => 2147483648
flowRecordsCount => 2
HEADERDATA => HEADERDATA
HeaderProtocol => 1
HeaderFrameLength => 64
HeaderStrippedLength => 4
HeaderSizeByte => 60
HeaderSizeBit => 480
HeaderEtherSrcMac => 6487885bfcf9
HeaderEtherDestMac => ffffffffffff
HeaderType => 0806
HeaderDatalen => 64
SWITCHDATA => SWITCHDATA
SwitchSrcVlan => 6
SwitchSrcPriority => 0
SwitchDestVlan => 6
SwitchDestPriority => 0

---Sample---
sampleTypeEnterprise => 0
sampleTypeFormat => 1
sampleLength => 208
sampleSequenceNumber => 1275391140
sourceIdType => 0
sourceIdIndex => 1000011
samplingRate => 16384
samplePool => 3349808896
drops => 0
inputInterface => 1000011
outputInterface => 1000001
flowRecordsCount => 2
HEADERDATA => HEADERDATA
HeaderProtocol => 1
HeaderFrameLength => 1484
HeaderStrippedLength => 4
HeaderSizeByte => 128
HeaderSizeBit => 1024
HeaderEtherSrcMac => f8c0019d77f3
HeaderEtherDestMac => f4a739d0c0c1
HeaderType => 0800
HeaderDatalen => 1480
SWITCHDATA => SWITCHDATA
SwitchSrcVlan => 6
SwitchSrcPriority => 0
SwitchDestVlan => 6
SwitchDestPriority => 0

---Sample---
sampleTypeEnterprise => 0
sampleTypeFormat => 1
sampleLength => 160
sampleSequenceNumber => 1275391141
sourceIdType => 0
sourceIdIndex => 1000011
samplingRate => 16384
samplePool => 3349825280
drops => 0
inputInterface => 1000011
outputInterface => 1000001
flowRecordsCount => 2
HEADERDATA => HEADERDATA
HeaderProtocol => 1
HeaderFrameLength => 82
HeaderStrippedLength => 4
HeaderSizeByte => 78
HeaderSizeBit => 624
HeaderEtherSrcMac => 0819a6f205d9
HeaderEtherDestMac => f4a739d0c0c1
HeaderType => 0800
HeaderDatalen => 78
SWITCHDATA => SWITCHDATA
SwitchSrcVlan => 6
SwitchSrcPriority => 0
SwitchDestVlan => 6
SwitchDestPriority => 0

---Sample---
sampleTypeEnterprise => 0
sampleTypeFormat => 1
sampleLength => 208
sampleSequenceNumber => 1275391142
sourceIdType => 0
sourceIdIndex => 1000011
samplingRate => 16384
samplePool => 3349841664
drops => 0
inputInterface => 1000011
outputInterface => 1000001
flowRecordsCount => 2
HEADERDATA => HEADERDATA
HeaderProtocol => 1
HeaderFrameLength => 1518
HeaderStrippedLength => 4
HeaderSizeByte => 128
HeaderSizeBit => 1024
HeaderEtherSrcMac => 6c3b6bf00a7a
HeaderEtherDestMac => f4a739d0c0c1
HeaderType => 0800
HeaderDatalen => 1514
SWITCHDATA => SWITCHDATA
SwitchSrcVlan => 6
SwitchSrcPriority => 0
SwitchDestVlan => 6
SwitchDestPriority => 0

---Sample---
sampleTypeEnterprise => 0
sampleTypeFormat => 1
sampleLength => 148
sampleSequenceNumber => 1275391143
sourceIdType => 0
sourceIdIndex => 1000011
samplingRate => 16384
samplePool => 3349858048
drops => 0
inputInterface => 1000011
outputInterface => 1000003
flowRecordsCount => 2
HEADERDATA => HEADERDATA
HeaderProtocol => 1
HeaderFrameLength => 70
HeaderStrippedLength => 4
HeaderSizeByte => 66
HeaderSizeBit => 528
HeaderEtherSrcMac => ac4e9166ea73
HeaderEtherDestMac => 0014f6b0aff0
HeaderType => 0800
HeaderDatalen => 66
SWITCHDATA => SWITCHDATA
SwitchSrcVlan => 6
SwitchSrcPriority => 0
SwitchDestVlan => 6
SwitchDestPriority => 0

---Sample---
sampleTypeEnterprise => 0
sampleTypeFormat => 1
sampleLength => 208
sampleSequenceNumber => 1275391144
sourceIdType => 0
sourceIdIndex => 1000011
samplingRate => 16384
samplePool => 3349874432
drops => 0
inputInterface => 1000011
outputInterface => 1000001
flowRecordsCount => 2
HEADERDATA => HEADERDATA
HeaderProtocol => 1
HeaderFrameLength => 1418
HeaderStrippedLength => 4
HeaderSizeByte => 128
HeaderSizeBit => 1024
HeaderEtherSrcMac => 444ca88d95d3
HeaderEtherDestMac => f4a739d0c0c1
HeaderType => 0800
HeaderDatalen => 1414
SWITCHDATA => SWITCHDATA
SwitchSrcVlan => 6
SwitchSrcPriority => 0
SwitchDestVlan => 6
SwitchDestPriority => 0

---Sample---
sampleTypeEnterprise => 0
sampleTypeFormat => 1
sampleLength => 208
sampleSequenceNumber => 1275391145
sourceIdType => 0
sourceIdIndex => 1000011
samplingRate => 16384
samplePool => 3349890816
drops => 0
inputInterface => 1000011
outputInterface => 1000001
flowRecordsCount => 2
HEADERDATA => HEADERDATA
HeaderProtocol => 1
HeaderFrameLength => 1518
HeaderStrippedLength => 4
HeaderSizeByte => 128
HeaderSizeBit => 1024
HeaderEtherSrcMac => 66649bb79535
HeaderEtherDestMac => f4a739d0c0c1
HeaderType => 0800
HeaderDatalen => 1514
SWITCHDATA => SWITCHDATA
SwitchSrcVlan => 6
SwitchSrcPriority => 0
SwitchDestVlan => 6
SwitchDestPriority => 0
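
A dump in this `key => value` layout can be scanned mechanically for samples whose interface fields exceed 0x3FFFFFFF. The following is a minimal sketch, not part of perl-net-sflow: the function names `parse_samples` and `out_of_range` are hypothetical, the parser assumes exactly the text layout shown above, and the embedded example is a shortened copy of the first two samples.

```python
# Largest value the sFlow v5 spec allows for a single-interface ifIndex.
MAX_SINGLE_IFINDEX = 0x3FFFFFFF


def parse_samples(dump_text):
    """Split a 'key => value' dump into one dict per ---Sample--- block."""
    samples = []
    current = None
    for line in dump_text.splitlines():
        line = line.strip()
        if line == "---Sample---":
            current = {}
            samples.append(current)
        elif " => " in line and current is not None:
            # Datagram-level lines before the first sample are skipped.
            key, value = line.split(" => ", 1)
            current[key] = value
    return samples


def out_of_range(samples, field="outputInterface"):
    """Return the samples whose field is above the single-interface maximum."""
    return [s for s in samples if int(s.get(field, 0)) > MAX_SINGLE_IFINDEX]


example = """\
===Datagram===
sFlowVersion => 5
samplesInPacket => 2

---Sample---
sampleSequenceNumber => 1275391139
inputInterface => 1000011
outputInterface => 2147483648

---Sample---
sampleSequenceNumber => 1275391140
inputInterface => 1000011
outputInterface => 1000001
"""

for sample in out_of_range(parse_samples(example)):
    print(sample["sampleSequenceNumber"], sample["outputInterface"])
```

Run against the full capture, this flags only the first sample (sequence number 1275391139).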

robcowart commented on June 6, 2024

That first sample definitely indicates a problem. outputInterface => 2147483648 shows that the ifIndex value is beyond the allowable range. This is why you are seeing errors in Logstash.

Based on the MAC addresses, it is a broadcast packet from a Juniper device. This is likely something like a protocol-related heartbeat/keep-alive, which is why you see them at a somewhat regular interval. My guess is that the high ifIndex values are for the internal/mgmt interfaces of the switch itself. From the sFlow spec about ifIndex...

The maximum value, 0x3FFFFFFF, indicates that there is no input or output interface (according to which field it appears in). This is used in describing traffic which is not bridged, routed, or otherwise sent through the device being monitored by the agent, but which rather originates or terminates in the device itself.

So according to the sFlow spec, 0x3FFFFFFF (1073741823) is the maximum valid ifIndex. 0x80000000 (2147483648) is above the maximum allowed ifIndex value. This is a bug in the Arista devices. I recommend contacting their support team to open a ticket.
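
One point worth checking alongside this: the same section of the sFlow v5 spec describes the interface field as two format bits followed by a 30-bit value, where format 0 is a single interface (the value is the ifIndex), format 1 is a discarded packet (the value is a reason code), and format 2 is multiple destination interfaces (the value is the count, with 0 meaning an unknown number). Read that way, 0x80000000 would decode as format 2 with value 0, which would be consistent with the broadcast destination MAC in the sample. Whether the collector applies this decoding is an assumption to verify. A minimal sketch of the bit layout, with `decode_interface` as an illustrative name:

```python
# Format codes for the top two bits of an sFlow v5 interface field,
# paraphrased from the spec's description of input/output port information.
FORMAT_NAMES = {
    0: "single interface (value is the ifIndex)",
    1: "packet discarded (value is a reason code)",
    2: "multiple destination interfaces (value is the count, 0 = unknown)",
}


def decode_interface(raw: int):
    """Split a 32-bit interface field into (format, 30-bit value)."""
    fmt = (raw >> 30) & 0x3
    value = raw & 0x3FFFFFFF
    return fmt, value


for raw in (1000001, 0x3FFFFFFF, 0x80000000):
    fmt, value = decode_interface(raw)
    print(hex(raw), fmt, value, FORMAT_NAMES.get(fmt, "unassigned"))
```

A consumer that stores the raw 32-bit field as a plain number, without splitting off the format bits, would see 2147483648 for such a sample.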

lyepustin commented on June 6, 2024

Ok, Thank you.
