Code for the Group 66 Python implementation of Cyber Data Analytics (CS4035) assignment 3.
Team members:
The structure of the project is presented per task:
reservoir_sampling.py
- implementation for reservoir sampling with testing for multiple reservoir sizes
countminsketch.py
- contains the CountMinSketch class
CountMinSketch.ipynb
- the actual analysis and plots
Flow visualization.ipynb
- notebook for visualizing different features for the infected host
flow_visualization_utils.py
- helper functions for generating the plots from the notebook
flow_discretize.py
- implementation of the discretization of flags and bytes, followed by combining them into a single discrete feature
Profiling.ipynb
- contains the full analysis. Note that it takes some time and RAM to run
flow_classification.py
- trains and tests a Random Forest classifier that estimates the probability that a netflow belongs to a botnet
bonus.py
- implementation of the generation method for adversarial data
Note: the actual testing using adversarial data is in the files corresponding to the profiling and classification tasks
logger.py
- logging system for generating the initial folder structure and saving application logs to HTML files
utils.py
- helper functions used for multiple tasks
config.txt
- configuration file
data\
- for storing data files with BATADAL datasets
output\
- for storing plots at high resolution (better to inspect these if the ones in the report are too small due to the page limit)
logs\
- for storing a couple of log files referred to in the report
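To illustrate the technique behind reservoir_sampling.py, here is a minimal sketch of reservoir sampling (Algorithm R) tried with several reservoir sizes. It is not the code from this repository: the function name reservoir_sample, its parameters, and the toy stream are assumptions made for the example.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Draw up to k items uniformly at random from an iterable of unknown length.

    Hypothetical helper for illustration only; not the repository's API.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Toy stream standing in for netflow records.
    for size in (10, 100, 1000):
        sample = reservoir_sample(range(100000), size, seed=42)
        print(size, len(sample))
```

The stream is read once and only k items are kept in memory, which is why the method suits large netflow files.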
If unable to clone the repository, download the CTU-13 dataset as follows:
- for Tasks 1 and 2, the Scenario 6 file
capture20110816.pcap.netflow.labeled
- for the other tasks, the Scenario 10 file
capture20110818.pcap.netflow.labeled
Note: after downloading the files, place them into the data\ folder.
The data files are over 100 MB, so they were uploaded using Git LFS. Git LFS is therefore needed to clone the repository. Install it manually or try the downlopad_data_files.sh script.
The scripts can be run in an Anaconda environment on Windows or Linux.
You need to create an Anaconda Python 3.6 environment named cyber3.
Inside that environment, some additional packages need to be installed. Run the following commands in the Anaconda Prompt:
(base) conda create -n cyber3 python=3.6 anaconda
(base) conda activate cyber3
(cyber3) conda install -c conda-forge tqdm
(cyber3) conda install -c conda-forge mmh3
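After running the commands above, a quick check like the following can confirm that the extra packages are visible from the cyber3 environment. This is only a sketch: the helper name missing_packages is an assumption and the script is not part of the repository.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment.

    Hypothetical helper for illustration only.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two packages installed from conda-forge in the commands above.
    missing = missing_packages(["tqdm", "mmh3"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All extra packages are available.")
```

Run it with the cyber3 environment activated; an empty result means both packages were found.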