
DNS Log Parser

Overview

This Python script parses DNS log files and reports on the activity they record: the number of records processed, a ranking of clients (IPs) by number of queries, and the most queried hosts.

Features

  • Accepts a log file name as a parameter for analysis.
  • Generates a summary of the processed records.
  • Ranks clients based on the number of queries made.
  • Identifies the most queried hosts.
  • Presents rankings with both total hits and the percentage they represent from the total records analyzed.
  • Sends the parsed data to the Lumu API.

Usage

To use the script, first you'll need to install the required dependencies:

Setup Virtual Environment and Install Requirements

To ensure a clean and isolated environment for running the DNS Log Parser, it is recommended to use a virtual environment. Follow the steps below to set up a virtual environment and install the required dependencies:

  1. Create a Virtual Environment:
python -m venv venv

  2. Activate the Virtual Environment:

  • On Windows:
.\venv\Scripts\activate
  • On macOS/Linux:
source venv/bin/activate
  3. Install Dependencies:
  • On Windows:
pip install -r requirements.txt
  • On macOS/Linux:
pip3 install -r requirements.txt

Now, your virtual environment is set up, and the required dependencies are installed.

After the dependencies are installed, run the script with the desired DNS log file as a parameter. You can also pass a collector ID and an API key, in which case the parsed data is sent to the Lumu API (a hedged sketch of that upload follows the commands below); if these parameters are not provided, the script only generates the statistics report. Both the collector ID and API key are UUID strings.

  • On Windows:
python main.py -f <log_file_path> -c <collector_id> -k <api_key>
  • On macOS/Linux:
python3 main.py -f <log_file_path> -c <collector_id> -k <api_key>
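
When a collector ID and API key are supplied, the script forwards the parsed queries to Lumu. As a rough, unverified sketch of what that upload could look like (the endpoint URL, payload shape, chunk size, and the use of the requests library are assumptions for illustration, not taken from this repository):

```python
import requests  # assumed dependency, for illustration only

def send_to_lumu(records: list[dict], collector_id: str, api_key: str) -> None:
    """Hypothetical upload of parsed DNS records to a Lumu collector endpoint."""
    # Assumed endpoint; consult Lumu's Custom Collector API docs for the real contract.
    url = f"https://api.lumu.io/collectors/{collector_id}/dns/queries?key={api_key}"
    for start in range(0, len(records), 500):         # arbitrary chunk size
        response = requests.post(url, json=records[start:start + 500], timeout=30)
        response.raise_for_status()                    # fail loudly on HTTP errors
```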

Sample Output

Upon execution, the script will provide output similar to the following:

Parsed File Statistics:
Total records: 16967

Client IPs Rank
---------------  ----  ------
111.90.159.121   3375  19.89%
45.231.61.2      1251  7.37%
187.45.191.2     1089  6.42%
190.217.123.244   738  4.35%
5.63.14.45        634  3.74%
---------------  ----  ------

Host Rank
--------------------------------------------  ----  ------
pizzaseo.com                                  4626  27.26%
sl                                            3408  20.09%
MNZ-efz.ms-acdc.office.com                      67  0.39%
global.asimov.events.data.trafficmanager.net    31  0.18%
www.google.com                                  30  0.18%
--------------------------------------------  ----  ------

Computational Complexity of the Implemented Algorithms

Parsing Algorithm

The algorithm used to parse and store the log file data is based on a dictionary, a data structure with average constant-time insertion and retrieval. The dictionary maps each unique host/client to its number of queries. The algorithm iterates over the log file and, for each line, extracts the needed fields via regex (O(len), where len is the length of the line). It then checks whether the host/client is already in the dictionary: if it is, the query count is incremented; if it is not, the host/client is added with a value of 1. This way, the parsing algorithm performs O(n) dictionary operations, where n is the number of lines in the log file.
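
As a rough illustration of this single-pass approach (a minimal sketch, not the repository's actual implementation; the regex and log line layout are assumptions based on a BIND-style query log):

```python
import re
from collections import Counter

# Assumed BIND-style query log line, e.g.:
# 18-May-2021 16:34:13 client 45.231.61.2#80: query: pizzaseo.com IN A ...
LINE_RE = re.compile(r"client (?P<client>[\d.]+)#\d+.*query: (?P<host>\S+)")

def parse_log(path: str):
    """Single O(n) pass: count queries per client IP and per queried host."""
    clients, hosts = Counter(), Counter()
    total = 0
    with open(path) as fh:
        for line in fh:
            match = LINE_RE.search(line)          # regex extraction: O(len) per line
            if not match:
                continue
            total += 1
            clients[match.group("client")] += 1   # dict update: O(1) average
            hosts[match.group("host")] += 1
    return clients, hosts, total
```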

Ranking Algorithm

The ranking algorithm is based on the built-in Python class collections.Counter, a subclass of dict that provides a convenient way to keep track of the number of occurrences of elements. Counter provides a method called most_common, which returns a list of the n most common elements and their respective counts. Called without an argument, it sorts every element, which is O(n log n), where n is the number of elements in the dict. However, since we are only interested in the top 5 elements of each dict, most_common is called with k > 1 elements to return; in that case it internally employs heapq.nlargest, which has a time complexity of O(n log k), where n is the number of elements in the dict and k is the number of elements to return.
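
For example, a top-5 ranking with hit counts and percentages can be produced directly from a Counter (a minimal sketch; the function name, toy data, and output formatting are illustrative, not taken from the script):

```python
from collections import Counter

def top_n(counter: Counter, total: int, n: int = 5):
    """Return the n most common keys with their hit counts and share of all records."""
    # most_common(n) uses heapq.nlargest internally: O(c log n) for c unique keys.
    return [(key, hits, f"{hits / total:.2%}") for key, hits in counter.most_common(n)]

# Toy data for illustration only.
hosts = Counter({"pizzaseo.com": 4626, "sl": 3408, "www.google.com": 30})
for key, hits, share in top_n(hosts, total=16967):
    print(f"{key:<30} {hits:>6} {share:>8}")
```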

In the case of the DNS Parser, k = 5, so the ranking step runs in O(c log 5), where c is the number of unique clients. Since this operation is performed twice (once for clients and once for hosts), the total time complexity of the ranking algorithm is O(c log 5) + O(h log 5), where h is the number of unique hosts. Dropping the constant factor log 5, this reduces to O(c) + O(h), which is essentially linear in the number of unique keys.

Total Time Complexity

The total time complexity of the DNS Parser is then O(n · len) + O(c) + O(h), where n is the number of lines in the log file, len is the length of a line, c is the number of unique clients, and h is the number of unique hosts. Since the numbers of unique clients and hosts are never larger than the number of lines in the log file, the total time complexity is dominated by the parsing step, O(n · len).
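
In symbols, with k = 5 for the ranking step:

```latex
T \;=\; \underbrace{O(n \cdot \mathrm{len})}_{\text{parsing}}
   \;+\; \underbrace{O(c \log k) + O(h \log k)}_{\text{ranking},\; k = 5}
   \;\subseteq\; O(n \cdot \mathrm{len}),
   \qquad \text{since } c, h \le n \text{ and } k \text{ is a constant.}
```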
