netflowlabeler's Introduction

NetflowLabeler

Authors: Sebastian Garcia and Veronica Valeros, Stratosphere Laboratory, CTU in Prague

NetflowLabeler is a Python tool that adds labels to text-based network flow files. To label a netflow file, add the labels and their conditions to a configuration file, then run the tool to assign them. Labels follow our own label ontology, which is defined in a customizable configuration file and supports both generic and detailed labels. Currently, the tool supports TAB-delimited Zeek files. Support is planned for Zeek files in JSON and CSV formats, Argus files in CSV and TAB formats, Nfdump files in CSV format, and Suricata files in JSON format.

  • netflowlabeler.py can label conn.log files based on a configuration file.
  • zeek-files-labeler.py can label the rest of the Zeek log files, using the labels in the conn.log file.

Usage

To label a conn.log file from a configuration file:

netflowlabeler.py -c <configFile> [-v <verbose>] [-d DEBUG] -f <netflowFile> [-h]

To label the rest of the Zeek files using an already labeled conn.log file (conn.log.labeled):

zeek-files-labeler.py -l conn.log.labeled -f folder-with-zeek-log-files

Features

  • You can have AND and OR conditions
  • You can have generic labels and detailed labels
  • You can use negative conditions
  • All columns that can be interpreted as numbers can be compared with <, >, <= and >=
  • You can add comments in any place
  • You can use CIDR notation for IP ranges
  • You can label all the Zeek log files, by using the labels you put in the conn.log file
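
The CIDR and numeric-comparison features listed above can be illustrated with a minimal sketch built on the Python standard library. The helper names `ip_matches` and `number_matches` are hypothetical and are not taken from netflowlabeler.py; this only shows one way such checks could be written.

```python
import ipaddress
import operator

# The four numeric comparison operators named in the feature list.
NUMERIC_OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}


def ip_matches(flow_ip, rule_value):
    """Return True if flow_ip equals rule_value or falls inside its CIDR range."""
    # strict=False accepts values such as 10.0.0.34/8 that have host bits set.
    network = ipaddress.ip_network(rule_value, strict=False)
    return ipaddress.ip_address(flow_ip) in network


def number_matches(flow_value, op_symbol, rule_value):
    """Compare two values numerically with one of <, >, <= or >=."""
    return NUMERIC_OPERATORS[op_symbol](float(flow_value), float(rule_value))
```

A single address without a prefix length is treated as a network containing only that address, so the same helper covers both `srcIP=10.0.0.34` and `srcIP=10.0.0.0/8`.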

Example Configuration File of Labels

An example of the configuration file syntax is shown below:

Background:
    - srcIP=all
# Here the generic label is Background and the detailed label is ARP
Background, ARP: 
    - Proto=ARP
Malicious, From_Malware:
    - srcIP=10.0.0.34
Malicious-More, From_Other_Malware:
    - srcIP!=10.0.0.34 & dstPort=23
Malicious-HEre, From_This_Malware:
    - srcIP=10.0.0.34 & State=SF
Malicious, From_Local_Link_IPv6:
    - srcIP=fe80::1dfe:6c38:93c9:c808
Test-State:
    - srcIP=10.0.0.34 & State=S0
Test-largebytes:
   - Bytes>=100
Test-smallbytes:
   - Bytes<=100
Benign, FromWindows:
    - Proto=UDP & srcIP=147.32.84.165 & dstPort=53     # (AND conditions go in one line)
    - Proto=TCP & dstIP=1.1.1.1 & dstPort=53           # (all new lines are OR conditions)

  1. The first part of the label is the generic label (Benign); the part after the comma is the detailed label (FromWindows). We recommend not using :, spaces, commas or TABs in the detailed label.
  2. If there is no comma, then the detailed label is empty.
  3. Don't use quotes for the text.
  4. Labels are assigned from top to bottom.
  5. Each new label supersedes and overwrites the previous match.

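The position of a rule is its priority. Rules are checked in order: the first rule is checked and, if it matches, its label is assigned; then the second rule is checked, and so on. Because each match overwrites the previous one, the last matching rule determines the final label.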
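
The evaluation order can be sketched as follows, assuming the rules have already been loaded into a list of (generic label, detailed label, OR-list of AND-groups) tuples. The names `RULES`, `matches` and `assign_label` are hypothetical and not taken from netflowlabeler.py, and only the = and != operators are handled here.

```python
# Hypothetical in-memory form of a labels config: each rule is
# (generic_label, detailed_label, [AND-groups]); the groups are ORed together.
RULES = [
    ("Background", "", [[("srcIP", "=", "all")]]),
    ("Malicious", "From_Malware", [[("srcIP", "=", "10.0.0.34")]]),
    ("Benign", "FromWindows", [
        [("Proto", "=", "UDP"), ("srcIP", "=", "147.32.84.165"), ("dstPort", "=", "53")],
        [("Proto", "=", "TCP"), ("dstIP", "=", "1.1.1.1"), ("dstPort", "=", "53")],
    ]),
]


def matches(flow, condition):
    """Evaluate one (column, operator, value) condition against a flow dict."""
    column, operator, value = condition
    if value == "all":
        # Assumption: 'all' matches every value of the column.
        return operator == "="
    equal = str(flow.get(column, "")).lower() == value.lower()
    return equal if operator == "=" else not equal


def assign_label(flow, rules):
    """Check rules top to bottom; every later match overwrites the earlier one."""
    label = ("", "")
    for generic, detailed, groups in rules:
        # A rule matches if any line (OR) has all of its conditions true (AND).
        if any(all(matches(flow, cond) for cond in group) for group in groups):
            label = (generic, detailed)
    return label
```

With these rules, a flow from 10.0.0.34 first receives Background and is then overwritten with Malicious, From_Malware, because that rule appears later in the file.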

These are the possible fields that you can use in a configuration file to create the rules used for labeling.

  • Date
  • start
  • Duration
  • Proto
  • srcIP
  • srcPort
  • dstIP
  • dstPort
  • State
  • Tos
  • Packets
  • Bytes
  • Flows

The fields 'Bytes', 'Packets' and 'IPBytes' are computed in Zeek from the fields for the src and dst values. For example, Bytes=srcbytes + dstbytes
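
As a rough illustration of the sum above, the sketch below computes Bytes from a Zeek conn.log record. It assumes that srcbytes and dstbytes correspond to the Zeek fields `orig_bytes` and `resp_bytes`, and that an unset value (written as `-` in TAB-separated logs) counts as 0; both are assumptions, and the helper names are hypothetical.

```python
def to_int(value):
    """Convert a Zeek field to int, treating a missing or unset ('-') value as 0."""
    if value in (None, "", "-"):
        return 0
    return int(value)


def total_bytes(flow):
    """Bytes = bytes sent by the originator + bytes sent by the responder."""
    return to_int(flow.get("orig_bytes")) + to_int(flow.get("resp_bytes"))
```

For a record with `orig_bytes` 62 and `resp_bytes` 141, `total_bytes` returns 203, so a rule such as `Bytes>=100` would match it.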

Docker Image

Netflow labeler has a public docker image with the latest version.

To test that the labeler is working correctly, run the following command. It runs the netflow labeler on an example Zeek conn.log file and then prints the labeled file to standard output. You should see the new labels in the output (e.g. search for the string 'Test-smallbytes').

docker run --tty -it stratosphereips/netflowlabeler:latest /bin/bash -c 'python3 netflowlabeler.py -c labels.config  -f examples/conn.tab.log ; cat examples/conn.tab.log.labeled'

To mount your logs path to the container and run the netflow labeler interactively:

docker run -v /full/path/to/logs/:/netflowlabeler/data --rm -it stratosphereips/netflowlabeler:latest /bin/bash

To mount your logs path to the container and automatically run the netflow labeler on it with your own labels.config file:

docker run -v /full/path/to/logs/:/netflowlabeler/data --rm -it stratosphereips/netflowlabeler:latest python3 netflowlabeler.py -c data/labels.config -f data/conn.log

Netflow Labeler High Level Diagram

flowchart LR;
    NetFlow["Netflow File"]-->labeler;
    Config["Labels Config"]-->labeler;
    subgraph ONE["Interpret Input File"]
        labeler-->load_conditions;
        load_conditions-->process_netflow;
        process_netflow-->define_type;
        define_type-->define_columns;
    end
    subgraph TWO["Label NetFlow File"]
        define_columns-.->process_argus;
        define_columns-.->process_nfdump;
        define_columns-->process_zeek;
        process_argus-.->output_netflow_line_to_file;
        process_nfdump-.->output_netflow_line_to_file;
        process_zeek-->output_netflow_line_to_file;
    end
    output_netflow_line_to_file-->Output["Labeled NetFlow File"];

netflowlabeler's People

Contributors

eldraco, verovaleros

netflowlabeler's Issues

F821 undefined name 'headers' in process_argus()

This is also likely a bug, as the variable is referenced but never defined in this function. It affects the process_argus function, which is marked as deprecated.

netflowlabeler.py:953:35: F821 undefined name 'headers'

Found by running flake8 on netflowlabeler.py.

W0101: Unreachable code on process_argus()

The pylint tool threw this warning, indicating there is unreachable code in the netflow labeler:

netflowlabeler.py:957:8: W0101: Unreachable code (unreachable)

There appears to be a return statement at the beginning of the function process_argus, which prevents the rest of the code from executing:

 947 def process_argus(column_idx, output_file, labelmachine, filetype):
 948     """
 949     DEPRECATED!! NEEDS UPDATE COMPLETELY
 950     Process an Argus file
 951     """
 952     try:
 953         print(column_idx)
 954         return 0
 955
 956         # This is argus files...
 957         amount_lines_processed = 0

While this function needs updating, I think it's good to remember this is there. Just in case.

Inconsistent naming convention on functions and variable names

Some functions use camelCase names, such as loadConditions(), while others use snake_case names, such as process_netflow().

I think following a single naming convention would improve code readability and clarity.

[BUG] zeek-files-labeler checks headers on Zeek JSON files when it should not

Describe the bug
The JSON conn.log files have no headers. Each line is a JSON line:

zeek@zeek:~/zeek-test/json$ cat conn.log
{"ts":1591367999.305988,"uid":"CMdzit1AMNsmfAIiQc","id.orig_h":"192.168.4.76","id.orig_p":36844,"id.resp_h":"192.168.4.1","id.resp_p":53,"proto":"udp","service":"dns","duration":0.06685185432434082,"orig_bytes":62,"resp_bytes":141,"conn_state":"SF","missed_bytes":0,"history":"Dd","orig_pkts":2,"orig_ip_bytes":118,"resp_pkts":2,"resp_ip_bytes":197}
{"ts":1591367999.430166,"uid":"C5bLoe2Mvxqhawzqqd","id.orig_h":"192.168.4.76","id.orig_p":46378,"id.resp_h":"31.3.245.133","id.resp_p":80,"proto":"tcp","service":"http","duration":0.25411510467529297,"orig_bytes":77,"resp_bytes":295,"conn_state":"SF","missed_bytes":0,"history":"ShADadFf","orig_pkts":6,"orig_ip_bytes":397,"resp_pkts":4,"resp_ip_bytes":511}

When running the zeek-files-labeler.py tool on a Zeek JSON file, the following error is shown:

Zeek Files labeler from labeled conn.log.labeled file. Version 0.1
https://stratosphereips.org
[+] Labeled file to use: dataset/test.log
The labeled file has not headers. Please add them.

To Reproduce
Sample Zeek JSON conn.log to test:

{"ts":1684227625.431351,"uid":"CMWzya3GuKbV6hku2l","id.orig_h":"192.168.1.243","id.orig_p":35358,"id.resp_h":"172.31.0.2","id.resp_p":12654,"proto":"tcp","service":"http","duration":6.867906093597412,"orig_bytes":531,"resp_bytes":0,"conn_state":"SH","local_orig":false,"local_resp":true,"missed_bytes":0,"history":"ScADFa","orig_pkts":6,"orig_ip_bytes":855,"resp_pkts":1,"resp_ip_bytes":52}
{"ts":1684227665.237912,"uid":"CRm5Me2cKCMEbBrDJc","id.orig_h":"192.168.1.18","id.orig_p":50118,"id.resp_h":"172.31.0.2","id.resp_p":12654,"proto":"tcp","service":"http","duration":5.09847617149353,"orig_bytes":625,"resp_bytes":0,"conn_state":"SH","local_orig":false,"local_resp":true,"missed_bytes":0,"history":"ScADFa","orig_pkts":8,"orig_ip_bytes":957,"resp_pkts":1,"resp_ip_bytes":40}
{"ts":1684227665.237911,"uid":"CXzhcA3CmZhXikjFQk","id.orig_h":"192.168.1.18","id.orig_p":50117,"id.resp_h":"172.31.0.2","id.resp_p":12654,"proto":"tcp","service":"http","duration":5.098514080047607,"orig_bytes":626,"resp_bytes":0,"conn_state":"SH","local_orig":false,"local_resp":true,"missed_bytes":0,"history":"ScADFa","orig_pkts":8,"orig_ip_bytes":958,"resp_pkts":1,"resp_ip_bytes":40}
{"ts":1684227664.630489,"uid":"ChtLqc1NJMnOu1maA7","id.orig_h":"192.168.1.18","id.orig_p":50110,"id.resp_h":"172.31.0.2","id.resp_p":12654,"proto":"tcp","service":"http","duration":5.712158918380737,"orig_bytes":1252,"resp_bytes":0,"conn_state":"SH","local_orig":false,"local_resp":true,"missed_bytes":0,"history":"ScADcFa","orig_pkts":45,"orig_ip_bytes":3064,"resp_pkts":1,"resp_ip_bytes":40}
{"ts":1684227665.23784,"uid":"C5l2rP1L5skfzoAZaa","id.orig_h":"192.168.1.18","id.orig_p":50119,"id.resp_h":"172.31.0.2","id.resp_p":12654,"proto":"tcp","service":"http","duration":5.12164306640625,"orig_bytes":946,"resp_bytes":0,"conn_state":"SH","local_orig":false,"local_resp":true,"missed_bytes":0,"history":"ScADcFa","orig_pkts":14,"orig_ip_bytes":1518,"resp_pkts":1,"resp_ip_bytes":40}
{"ts":1684227665.236117,"uid":"CYp05kkOOJ3pCe8Vb","id.orig_h":"192.168.1.18","id.orig_p":50116,"id.resp_h":"172.31.0.2","id.resp_p":12654,"proto":"tcp","service":"http","duration":5.1245081424713135,"orig_bytes":939,"resp_bytes":0,"conn_state":"SH","local_orig":false,"local_resp":true,"missed_bytes":0,"history":"ScADcFa","orig_pkts":11,"orig_ip_bytes":1391,"resp_pkts":1,"resp_ip_bytes":40}
{"ts":1684227664.630591,"uid":"CBghhD3zflPexGTckj","id.orig_h":"192.168.1.18","id.orig_p":50111,"id.resp_h":"172.31.0.2","id.resp_p":12654,"proto":"tcp","service":"http","duration":5.730525016784668,"orig_bytes":1253,"resp_bytes":0,"conn_state":"SH","local_orig":false,"local_resp":true,"missed_bytes":0,"history":"ScADcF","orig_pkts":21,"orig_ip_bytes":2105,"resp_pkts":0,"resp_ip_bytes":0}
{"ts":1684227665.255085,"uid":"CtWt0LXsI5Wv8gp71","id.orig_h":"192.168.1.18","id.orig_p":50121,"id.resp_h":"172.31.0.2","id.resp_p":12654,"proto":"tcp","service":"http","duration":5.215744972229004,"orig_bytes":674,"resp_bytes":0,"conn_state":"SH","local_orig":false,"local_resp":true,"missed_bytes":0,"history":"ScADFa","orig_pkts":9,"orig_ip_bytes":1046,"resp_pkts":1,"resp_ip_bytes":40}
{"ts":1684227665.25312,"uid":"CaCebC3q1DDGFKUNch","id.orig_h":"192.168.1.18","id.orig_p":50120,"id.resp_h":"172.31.0.2","id.resp_p":12654,"proto":"tcp","service":"http","duration":5.284203052520752,"orig_bytes":1017,"resp_bytes":0,"conn_state":"SH","local_orig":false,"local_resp":true,"missed_bytes":0,"history":"ScADcFa","orig_pkts":14,"orig_ip_bytes":1589,"resp_pkts":1,"resp_ip_bytes":40}
{"ts":1684227665.421915,"uid":"CYQlIQoh90Anqd2ae","id.orig_h":"192.168.1.18","id.orig_p":50123,"id.resp_h":"172.31.0.2","id.resp_p":12654,"proto":"tcp","duration":5.478057861328125,"orig_bytes":0,"resp_bytes":0,"conn_state":"SH","local_orig":false,"local_resp":true,"missed_bytes":0,"history":"ScAF","orig_pkts":4,"orig_ip_bytes":172,"resp_pkts":0,"resp_ip_bytes":0}

Expected behaviour
The tool should either parse this type of log without error or print an unsupported-type message.

Resources
Check documentation here: https://docs.zeek.org/en/master/log-formats.html
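
One possible way to obtain the expected behaviour is to inspect the first line of the file before checking for headers. The sketch below is an assumption about how such a check could look, not the tool's current logic; the function name `detect_zeek_format` is hypothetical.

```python
import json


def detect_zeek_format(first_line):
    """Guess whether a Zeek log is JSON or TAB-separated from its first line."""
    stripped = first_line.strip()
    if stripped.startswith("{"):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            return "unknown"
    if stripped.startswith("#"):
        # TAB-separated Zeek logs begin with header lines such as "#separator \x09".
        return "tab"
    return "unknown"
```

A caller could then skip the header check for "json", keep it for "tab", and print an unsupported-type message for "unknown".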

Typo in column name never matches label condition

In the class labeler, in the function getLabel(), there's a search for columns that are numeric. The search is done matching column names against a given labeling condition:

if ('bytes' in condColumn) or ('packets' in condColumn) or ('srcport' in condColumn) or ('dstport' in condColumn) or ('sbytes' in condColumn) or ('dbyets' in condColumn) or ('spkts' in condColumn) or ('dpkts' in condColumn) or ('ip_orig_bytes' in condColumn) or ('ip_resp_bytes' in condColumn):

As can be observed, in the sixth comparison, there's a typo: dbyets --> dbytes.

I suggest improving this comparison with the following code, which should be clearer and easier to change in the future:

 column_num_keywords = ['bytes', 'packets', 'srcport', 'dstport', 'sbytes', 'dbytes', 'spkts', 'dpkts', 'ip_orig_bytes', 'ip_resp_bytes']
 if any(keyword in condColumn for keyword in column_num_keywords):
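
The suggestion above can be wrapped in a small, testable helper. This is a sketch only: the function name `is_numeric_column` is hypothetical, and lower-casing the column name before the check is an assumption about how condColumn is normalized.

```python
# Keywords taken from the suggestion above; a column is treated as numeric
# when its name contains any of them.
COLUMN_NUM_KEYWORDS = ['bytes', 'packets', 'srcport', 'dstport', 'sbytes',
                       'dbytes', 'spkts', 'dpkts', 'ip_orig_bytes', 'ip_resp_bytes']


def is_numeric_column(cond_column):
    """Return True if the lower-cased column name contains a numeric keyword."""
    name = cond_column.lower()
    return any(keyword in name for keyword in COLUMN_NUM_KEYWORDS)
```

Keeping the keywords in one list means that a typo shows up in a single place and can be covered by a unit test.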

A comment between two rules triggers an infinite loop and maxes out on memory consumption

Comments are allowed in the netflowlabeler labels configuration. They are usually placed at the beginning of a section, as shown below:

# Valid and ok comment
Malicious, Network-service-discovery-telnet:
    - srcIP=192.168.100.103 & Proto=tcp & dstPort=23
    - srcIP=192.168.100.103 & Proto=tcp & dstPort=2323
    - srcIP=192.168.100.103 & Proto=tcp & dstPort=9527

If, however, a comment is placed between the rules, netflowlabeler enters an infinite loop, consumes all available memory and then crashes:

# Valid and ok comment
Malicious, Network-service-discovery-telnet:
    - srcIP=192.168.100.103 & Proto=tcp & dstPort=23
    - srcIP=192.168.100.103 & Proto=tcp & dstPort=2323
# this is a comment that is not caught properly and will trigger an unwanted behavior
    - srcIP=192.168.100.103 & Proto=tcp & dstPort=9527
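
A single-pass parser that drops comments before looking at the line content avoids this class of problem, since every input line is visited exactly once. The following is a minimal sketch under that assumption, not the loader used by netflowlabeler; `parse_labels_config` is a hypothetical name, and the sketch assumes that `#` never appears inside a label or condition.

```python
def parse_labels_config(text):
    """Parse label rules, ignoring comment lines wherever they appear.

    Returns a list of (generic_label, detailed_label, [condition lines]).
    """
    rules = []
    for raw_line in text.splitlines():
        # Drop everything after '#', so full-line and trailing comments vanish.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("-"):
            if not rules:
                raise ValueError(f"condition before any label: {raw_line!r}")
            rules[-1][2].append(line[1:].strip())
        elif line.endswith(":"):
            generic, _, detailed = line[:-1].partition(",")
            rules.append((generic.strip(), detailed.strip(), []))
        else:
            raise ValueError(f"unrecognised line: {raw_line!r}")
    return rules
```

Applied to the second configuration above, this returns one rule with three conditions, and the comment between the rules is skipped.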

IndexError exception when Bytes condition is used

When using a condition such as the following:

Malicious, C&C-FileDownload:
    - dstIP=104.248.160.24 & Bytes>0

I get the following exception:

Problem in main() function at load_conditions 
<class 'IndexError'>
('list index out of range',)
list index out of range

If the Bytes>0 condition is replaced with another one, such as State=S0, no exception occurs.
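
The root cause is not identified in this report. As an illustration only, the sketch below shows one way to split a condition into column, operator and value with a regular expression, so that a malformed condition raises a descriptive error rather than an IndexError. The names `CONDITION_RE`, `parse_condition` and `parse_rule_line` are hypothetical.

```python
import re

# Two-character operators come first so ">=" is not read as ">" followed by "=".
CONDITION_RE = re.compile(r"^\s*(\w+)\s*(!=|<=|>=|=|<|>)\s*(.+?)\s*$")


def parse_condition(text):
    """Split 'Bytes>0' into ('Bytes', '>', '0'); raise ValueError if malformed."""
    match = CONDITION_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse condition: {text!r}")
    return match.groups()


def parse_rule_line(line):
    """Split an AND line such as 'dstIP=1.2.3.4 & Bytes>0' into conditions."""
    return [parse_condition(part) for part in line.split("&")]
```

Each condition comes back as a (column, operator, value) tuple, so the caller never has to index into the result of a string split.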

[BUG] Zeek-files-labeler does not finish the new files with a newline

Describe the bug
The original Zeek files end with a newline. When zeek-files-labeler creates the labeled version, the output does not end with one. This makes it difficult to compare the content of the files before and after labeling.

To Reproduce
Run zeek-files-labeler on any zeek file and run wc -l on the output.

Expected behaviour
Expect to see the same number of lines on the original and labeled files.
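
A minimal sketch of a writer that always terminates the last line is shown below. The function name `write_labeled_lines` is hypothetical, and the sketch assumes the labeled lines are available as an iterable of strings.

```python
def write_labeled_lines(lines, output_path):
    """Write lines to output_path, ending every line, including the last, with a newline."""
    with open(output_path, "w", encoding="utf-8") as output_file:
        for line in lines:
            # Normalise so each line ends with exactly one newline character.
            output_file.write(line.rstrip("\n") + "\n")
```

With this approach, wc -l reports the same count for the original and the labeled file as long as both contain the same number of lines.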

zeek-files-labeler.py does not work for x509.log

The script empties the contents of the x509.log file. This is most probably because x509.log uses a different identifier, one that references ssl.log rather than conn.log.

The easiest approach would be to just avoid labeling the x509.log.
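
Skipping the file could look like the sketch below, which selects the logs to label from a folder. The names `SKIPPED_LOGS` and `files_to_label` are hypothetical, and the assumption that the logs use a .log extension comes from the file names mentioned in this document.

```python
from pathlib import Path

# Logs that are left untouched. x509.log is the only one named in this issue;
# keeping the names in a set is an assumption made for this sketch.
SKIPPED_LOGS = {"x509.log"}


def files_to_label(folder):
    """Return the .log files in folder that should be labeled, sorted by name."""
    return sorted(
        path for path in Path(folder).glob("*.log")
        if path.name not in SKIPPED_LOGS
    )
```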

Undefined name 'f' in process_argus()

I believe this may be a bug, as the 'f' variable is used but doesn't exist in the function that is called. This affects the function process_argus(), which is marked as deprecated.

netflowlabeler.py:1017:16: F821 undefined name 'f'
netflowlabeler.py:1214:20: F821 undefined name 'f'

This was highlighted by running flake8 on netflowlabeler.py.
