netbase's Introduction

Background

One of the biggest challenges we face as network traffic analysts is determining whether or not the traffic we are currently looking at is normal. Normal is inherently difficult to define in a network environment. What can be considered normal varies widely across devices: servers behave differently from workstations, domain controllers behave differently from database systems, and so on. Normal can also vary over time. What is normal for a given host in the middle of a workday is usually very different from the behavior expected in the middle of the night.

So, as we try to determine whether some traffic is normal, we ask ourselves questions like: Is this normal for this network? For this type of host? For this specific host? For this time of day? For this particular day of the week? Answering these questions requires a lot of contextual knowledge, experience in the environment, and access to data that provides the right insights. Obtaining these things is not trivial.

One way to accomplish this is to make running observations that can then be compared. A baseline, by definition, is a "minimum or starting point used for comparison". Perfect: baselining sounds like a great fit, but how do you create one?

Netbase

Netbase, short for Network Baseliner, is a Zeek framework aimed at helping you do just that. By creating a running record of quantitative observations about network device activity, it provides data points that can be compared to one another across several dimensions and analyzed manually, visually, or statistically.

Netbase uses a device-centric approach to capturing observations; specifically, the observations it logs describe activity from the perspective of each active, monitored host (more on monitored hosts below). When an IP address is active on the network, Netbase begins recording a wide variety of observations over a finite time interval. At the end of the interval, an entry containing the metrics that describe the device's activity in that timeframe is written to the Netbase log stream, and the interval timer resets.
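The per-host interval behavior described above can be illustrated with a small Python sketch. It is not Netbase's Zeek implementation. The class name `HostIntervals`, its `observe`/`expire` methods, and the choice to restart a host's timer only at its next activity are all assumptions made for illustration. Calling `expire` stands in for a Zeek timer firing.

```python
from collections import Counter


class HostIntervals:
    """Hypothetical sketch of per-host interval recording (not Netbase's code)."""

    def __init__(self, interval_secs: float):
        self.interval_secs = interval_secs
        self.start = {}    # ip -> start time of that host's current interval
        self.metrics = {}  # ip -> Counter of observations in the current interval
        self.log = []      # stands in for the Netbase log stream

    def observe(self, ts: float, ip: str, metric: str) -> None:
        # The first activity from an idle IP starts that host's interval timer.
        if ip not in self.start:
            self.start[ip] = ts
            self.metrics[ip] = Counter()
        self.metrics[ip][metric] += 1

    def expire(self, now: float) -> None:
        # Emulates timers firing: for each host whose interval has ended, write one
        # summary entry and drop its state (its timer restarts on its next activity).
        expired = [ip for ip, s in self.start.items() if now >= s + self.interval_secs]
        for ip in expired:
            self.log.append({"ip": ip, "start": self.start[ip],
                             "metrics": dict(self.metrics[ip])})
            del self.start[ip], self.metrics[ip]


rec = HostIntervals(interval_secs=60)
rec.observe(0.0, "10.0.0.5", "conn")
rec.observe(10.0, "10.0.0.5", "dns_query")
rec.observe(30.0, "10.0.0.7", "conn")
rec.expire(now=65.0)  # only 10.0.0.5's interval (0 to 60 s) has ended
```

After `expire(65.0)`, the log holds one entry for 10.0.0.5. The interval for 10.0.0.7 started at 30 s and is still open.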

Netbase Structure

Netbase is designed to work best in Zeek clusters, although it functions just fine on a stand-alone instance. In a cluster, multiple worker nodes perform traffic analysis and categorization tasks and record the results in the form of observables (more on those below).

When a worker finishes its analysis of a given connection, it sends any recorded observables to the Proxy node(s) using Zeek's data partitioning API, which allows keys in a table to be spread evenly across multiple nodes in the cluster. The Proxies process the observables, associate them with the monitored IP address to which they apply, and, on a set interval, log a summary of observations to the Netbase log stream.

High-level depiction of Netbase's structure and data flow.
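Zeek's cluster framework provides pool-based publishing helpers, including one that selects a node using rendezvous hashing, also called Highest Random Weight (HRW) hashing. The Python sketch below illustrates that general technique only. It is not Zeek's implementation, the proxy names are hypothetical, and whether Netbase relies on this particular strategy is an assumption. Each key (here, a monitored IP) is scored against every node, and the highest-scoring node owns it. As a result, every worker independently routes a given IP's observables to the same proxy.

```python
import hashlib


def hrw_owner(key: str, nodes: list[str]) -> str:
    """Pick the node that owns `key` using rendezvous (highest random weight) hashing."""
    def weight(node: str) -> int:
        # Any stable hash works; SHA-256 gives the same answer in every process.
        digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=weight)


proxies = ["proxy-1", "proxy-2", "proxy-3"]  # hypothetical proxy node names
owner = hrw_owner("10.0.0.5", proxies)

# Spread 600 hypothetical IPs across the proxies and count each proxy's share.
ips = [f"10.0.{i // 256}.{i % 256}" for i in range(600)]
load = {p: 0 for p in proxies}
for ip in ips:
    load[hrw_owner(ip, proxies)] += 1
```

A useful property of this design is that removing a proxy only reassigns the keys that proxy owned. Every other IP keeps its current owner.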

Observables

Netbase's primary goal is to turn interesting network device activity into quantitative metrics that can be analyzed and compared at scale; these metrics are referred to as observables. What is considered interesting activity is highly subjective, though. There are many, many inferences one can make by analyzing any one of Zeek's native logs. The approach here is simple: try to be comprehensive. Cover device behaviors that apply to all types of hosts, with the understanding that not all observables will apply to every host, and that's ok.

There are three fundamental types of observables:

  • Counts of specific device behaviors
  • Summary statistics describing numerical data properties, e.g. sum, average, minimum and maximum
  • Cardinality counts, i.e. the number of unique values observed for a given property
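The three observable types above map naturally onto three kinds of running accumulators. The Python sketch below is illustrative only, and the class and method names are hypothetical rather than Netbase's own. For simplicity, it tracks cardinality with an exact set. At scale, an implementation might instead use a probabilistic estimator such as HyperLogLog to bound memory.

```python
from dataclasses import dataclass, field


@dataclass
class SummaryStat:
    """Running sum, minimum, maximum and average of a numeric property."""
    n: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.n += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def average(self) -> float:
        return self.total / self.n if self.n else 0.0


@dataclass
class HostObservables:
    counts: dict = field(default_factory=dict)   # behavior -> number of occurrences
    stats: dict = field(default_factory=dict)    # property -> SummaryStat
    uniques: dict = field(default_factory=dict)  # property -> set of distinct values

    def count(self, behavior: str) -> None:
        self.counts[behavior] = self.counts.get(behavior, 0) + 1

    def stat(self, prop: str, value: float) -> None:
        self.stats.setdefault(prop, SummaryStat()).add(value)

    def unique(self, prop: str, value) -> None:
        self.uniques.setdefault(prop, set()).add(value)

    def cardinality(self, prop: str) -> int:
        return len(self.uniques.get(prop, ()))


obs = HostObservables()
obs.count("outbound_conn")
obs.count("outbound_conn")
obs.stat("orig_ip_bytes", 1200)
obs.stat("orig_ip_bytes", 300)
for port in (443, 443, 53):
    obs.unique("resp_port", port)
```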

Tons of great observables can be extracted from Zeek's Conn events alone; in fact, Netbase currently includes 52 of them (visible in the flow module). It also includes other protocol-specific observables, which can be found in their respective modules.

Monitored Hosts

Netbase generates observations for monitored hosts, that is, hosts the user is specifically concerned with. In smaller networks it might be practical to apply this methodology to all hosts, but in larger networks it's usually prudent to narrow the scope.

By default, Netbase considers any IP address that belongs to a subnet defined in Zeek's Site::local_nets variable a monitored host. This is customizable using the Netbase::monitoring_mode variable (default = LOCAL_NETS). Other monitoring mode options include:

  • PRIVATE_NETS - Record observations for any IP within a non-routable RFC 1918 address range
  • LOCAL_AND_NEIGHBORS - Record observations for any IP within a Site::local_nets or Site::local_neighbors subnet

In addition, or alternatively, you can define specific subnets that contain monitored hosts using the Netbase::critical_assets variable. Any IP address belonging to a subnet defined in Netbase::critical_assets will always be monitored, regardless of the monitoring mode selected.
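The monitoring-mode rules above can be expressed as a single membership check. This Python sketch is an illustration, not Netbase's Zeek code. The function `is_monitored`, the `MonitoringMode` enum, and the practice of passing subnets in as lists are assumptions. The list arguments stand in for Site::local_nets, Site::local_neighbors and Netbase::critical_assets. Critical assets are checked first because they are monitored regardless of mode.

```python
import ipaddress
from enum import Enum


class MonitoringMode(Enum):
    LOCAL_NETS = "LOCAL_NETS"
    PRIVATE_NETS = "PRIVATE_NETS"
    LOCAL_AND_NEIGHBORS = "LOCAL_AND_NEIGHBORS"


# Non-routable RFC 1918 ranges used by PRIVATE_NETS.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def in_any(ip, nets) -> bool:
    # A membership test across IP versions (v4 address, v6 network) returns False.
    return any(ip in net for net in nets)


def is_monitored(addr, mode, local_nets, neighbor_nets=(), critical_assets=()) -> bool:
    """Hypothetical check mirroring the monitoring rules described above."""
    ip = ipaddress.ip_address(addr)
    if in_any(ip, critical_assets):  # always monitored, whatever the mode
        return True
    if mode is MonitoringMode.LOCAL_NETS:
        return in_any(ip, local_nets)
    if mode is MonitoringMode.PRIVATE_NETS:
        return in_any(ip, RFC1918)
    if mode is MonitoringMode.LOCAL_AND_NEIGHBORS:
        return in_any(ip, local_nets) or in_any(ip, neighbor_nets)
    raise ValueError(f"unknown monitoring mode: {mode}")


# Hypothetical site configuration.
local = [ipaddress.ip_network("192.168.1.0/24")]
neighbors = [ipaddress.ip_network("203.0.113.0/24")]
critical = [ipaddress.ip_network("198.51.100.10/32")]
```

For example, `is_monitored("10.1.2.3", MonitoringMode.PRIVATE_NETS, local)` is true, but the same address is not monitored under LOCAL_NETS with this configuration.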

Analyzing Netbase Observations

There are many ways to work with data generated by Netbase. Here are a few of the most useful approaches:

  • Compare new observations for a specific IP to its own historical observations
  • Compare new observations for a given IP to historical observations for other, similar hosts
  • Compare observations for all monitored hosts at once
  • Compare observations across other categorical dimensions such as OS, service, function and location
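As a minimal example of the first two approaches, a new observation can be scored against a history of the same metric, taken either from the host itself or from a group of similar hosts. The sketch below uses a standard z-score. The function name, the sample values, and the threshold of 3 are illustrative assumptions rather than anything Netbase prescribes.

```python
import math
from statistics import mean, stdev


def zscore(history: list[float], new_value: float) -> float:
    """Standard deviations between new_value and the mean of the historical values."""
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    mu, sd = mean(history), stdev(history)
    if sd == 0:
        # Flat history: any change is infinitely surprising in this simple model.
        return 0.0 if new_value == mu else math.copysign(math.inf, new_value - mu)
    return (new_value - mu) / sd


# Hypothetical history: unique responder ports per interval for one host,
# restricted to the same hour of day so the comparison respects daily rhythm.
own_history = [12, 15, 11, 14, 13]
score = zscore(own_history, 40)
is_anomalous = abs(score) > 3  # illustrative threshold
```

To compare against similar hosts instead, pass the pooled history of that peer group as `history`.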

netbase's People

Contributors

pmphry

netbase's Issues

use_conn_size_analyzer and size or orig/resp_bytes is not accurate for PCR

Hello,

Great project. I would like to report a known issue with using orig_bytes/resp_bytes or the use_conn_size_analyzer for tracking tx/rx. The issue is that, for TCP, these are calculated from sequence numbers. For long or large connections the TCP sequence numbers can wrap, which leads to unreliable results:

From https://docs.zeek.org/en/current/scripts/base/protocols/conn/main.zeek.html

orig_bytes: count &log &optional
The number of payload bytes the originator sent. For TCP this is taken from sequence numbers and might be inaccurate (e.g., due to large connections).

resp_bytes: count &log &optional
The number of payload bytes the responder sent. See orig_bytes.

In my testing, and that of others, this has been confirmed to cause ridiculously large flow tx/rx reports. Instead, it is recommended to use orig_ip_bytes/resp_ip_bytes, which are taken from the total length field of the IP header.

orig_ip_bytes: count &log &optional
Number of IP level bytes that the originator sent (as seen on the wire, taken from the IP total_length header field). Only set if use_conn_size_analyzer = T.

Using the *_ip_bytes fields on our sensors to calculate PCR, and comparing the result to the PCR calculated from the tx/rx byte counts reported in firewall logs, has confirmed their accuracy for me.
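For reference, the producer-consumer ratio (PCR) is commonly defined as (sent - received) / (sent + received). It ranges from +1.0 for a host that only sends to -1.0 for a host that only receives. The sketch below aggregates PCR per host from conn.log-style records using the orig_ip_bytes/resp_ip_bytes fields recommended above. The helper names and sample records are hypothetical, and this is not necessarily how Netbase computes PCR.

```python
def pcr(sent_bytes: int, recv_bytes: int) -> float:
    """Producer-consumer ratio: +1.0 = pure producer, -1.0 = pure consumer."""
    total = sent_bytes + recv_bytes
    if total == 0:
        return 0.0  # assumption: treat a host with no traffic as balanced
    return (sent_bytes - recv_bytes) / total


def host_pcr(conns: list[dict], host: str) -> float:
    """Aggregate one host's PCR over conn.log-style records using *_ip_bytes."""
    sent = recv = 0
    for c in conns:
        # *_ip_bytes are only set when use_conn_size_analyzer = T; treat missing as 0.
        orig_b = c.get("orig_ip_bytes", 0)
        resp_b = c.get("resp_ip_bytes", 0)
        if c["id.orig_h"] == host:
            sent += orig_b
            recv += resp_b
        elif c["id.resp_h"] == host:
            sent += resp_b
            recv += orig_b
    return pcr(sent, recv)


conns = [
    {"id.orig_h": "10.0.0.5", "id.resp_h": "203.0.113.20",
     "orig_ip_bytes": 1000, "resp_ip_bytes": 9000},
    {"id.orig_h": "198.51.100.7", "id.resp_h": "10.0.0.5",
     "orig_ip_bytes": 500, "resp_ip_bytes": 1500},
]
```

In this sample, 10.0.0.5 sends 2,500 bytes and receives 9,500, so its PCR is about -0.58. That value marks it as a net consumer.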

Issue when using "netbase" Framework with Zeek 4

Hi!
First, thanks a lot for this project, which I need to use with Zeek 4.
However, when I put this framework in the following directory, I get the error below:

zeek -C -r /home/mohammad/Downloads/mypackets.trace /opt/zeek/share/zeek/base/frameworks/netbase-master/main.bro

error in /opt/zeek/share/zeek/base/frameworks/netbase-master/main.bro, line 112: &default is not valid for global variables except for tables (&default=set())

Can you please help me solve this issue?
Thanks a lot :)
