This deep learning model uses a CNN-LSTM architecture to predict whether a given domain name is genuine or was artificially generated by a DGA.

Home Page: http://www.dgaintel.com/

License: MIT License

DGA Intel

The Problem

Many forms of malware use domain generation algorithms (DGAs) to connect with a command-and-control (C&C) server, which enables them to receive instructions and perform malicious activities. There have been many attempts to detect whether a given domain name corresponds to a genuine domain or a fake domain generated by a DGA. Some machine learning methods have used clustering based on WHOIS data and similar information to this end. This model builds on past work by using a deep learning architecture to achieve higher accuracy than other methods.

The Model

This model is based on an architecture from [2] and implemented in TensorFlow. It embeds the characters of a domain name, feeds the embeddings through a convolutional network, feeds the result through an LSTM, and passes the LSTM output through a dense layer for classification. The convolutional layers capture the local character patterns inherent in genuine domains, while the LSTM captures longer-range relationships between characters.
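The pipeline described above (embedding, convolution, LSTM, dense classifier) can be sketched in Keras roughly as follows. This is a minimal illustration, not the architecture from [2] or the trained model in this repository: the vocabulary size, input length, layer widths, and kernel size are all assumed values chosen for the example.

```python
# Assumed hyperparameters; none of these values come from the repository.
VOCAB_SIZE = 39   # assumed: 38 characters (a-z, 0-9, '-', '.') plus padding index 0
MAX_LEN = 64      # assumed fixed length of an encoded domain name


def build_model(vocab_size=VOCAB_SIZE, max_len=MAX_LEN):
    """Build an embedding -> Conv1D -> LSTM -> dense binary classifier."""
    # Imported here so the constants above can be inspected without TensorFlow.
    import tensorflow as tf

    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_len,)),
        # Map each character id to a learned 32-dimensional vector.
        layers.Embedding(input_dim=vocab_size, output_dim=32),
        # Detect local character patterns (here, windows of 3 characters).
        layers.Conv1D(filters=64, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        # Summarise the sequence of convolutional features.
        layers.LSTM(64),
        # Single sigmoid unit: estimated probability that the domain is fake (label 1).
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


if __name__ == "__main__":
    build_model().summary()
```

The single sigmoid output matches the 0 (genuine) / 1 (fake) labelling used for the data; everything else about the layer configuration is a placeholder.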

The Data

The training data was a set of 1.5 million domain names, each labelled as either 0 (genuine) or 1 (fake), drawn from the Splunk DGA app, Alexa's top 1 million domains, and the Bambenek DGA feed. Of these, 10% were stripped of their TLD and subdomain before being fed through the model. The test data was a set of 100,000 domains from a different slice of this data.
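To make the preprocessing concrete, here is a small sketch of how domain names might be turned into fixed-length integer sequences, with a fraction of them stripped of their TLD and subdomain first. The character vocabulary, the maximum length, the helper names, and the naive "second-to-last label" stripping rule are assumptions for illustration; the repository's notebooks may do this differently.

```python
import random
import string

# Assumed vocabulary: a-z, 0-9, hyphen and dot. Id 0 is reserved for padding
# (and, in this sketch, also used for any character outside the vocabulary).
VOCAB = string.ascii_lowercase + string.digits + "-."
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(VOCAB)}
MAX_LEN = 64  # assumed fixed input length


def strip_to_core(domain):
    """Naively drop subdomains and the TLD: 'mail.example.com' -> 'example'.

    Multi-part suffixes such as 'co.uk' are not handled by this rule.
    """
    labels = domain.lower().strip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def encode(domain, max_len=MAX_LEN):
    """Map characters to integer ids, then truncate or right-pad to max_len."""
    ids = [CHAR_TO_ID.get(ch, 0) for ch in domain.lower()][:max_len]
    return ids + [0] * (max_len - len(ids))


def prepare(domains, strip_fraction=0.10, seed=0):
    """Encode every domain, stripping roughly strip_fraction of them first."""
    rng = random.Random(seed)
    encoded = []
    for domain in domains:
        if rng.random() < strip_fraction:
            domain = strip_to_core(domain)
        encoded.append(encode(domain))
    return encoded


if __name__ == "__main__":
    rows = prepare(["wikipedia.com", "mail.example.com"])
    print(len(rows), len(rows[0]))
```

Mixing stripped and unstripped names in this way presumably lets the model handle both full domains and bare names at prediction time.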

Results

The model was trained for twenty epochs with the Adam optimizer. It was then evaluated on the 100,000 domains from the shuffled test dataset, where it achieved 98.8% accuracy.
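For reference, accuracy here is the fraction of test domains whose predicted label matches the true label, so 98.8% of 100,000 domains corresponds to roughly 98,800 correct predictions. The sketch below shows that calculation for a model that outputs a probability of the "fake" class; the 0.5 decision threshold and the sample numbers are assumptions, not values taken from the repository.

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction equals the label.

    probabilities: model outputs, read as P(domain is fake), i.e. label 1.
    labels: ground-truth labels, 0 (genuine) or 1 (fake).
    """
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty dataset")
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(int(pred == y) for pred, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Made-up outputs for four domains; the third one is misclassified.
    probs = [0.02, 0.97, 0.40, 0.81]
    truth = [0, 1, 1, 1]
    print(accuracy(probs, truth))  # 0.75
```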

Website Usage

You can query whether a given domain is legit or fake through this model at http://dgaintel.com/.

Development

The model can be loaded through TensorFlow's Keras API from the domain_classifier_model.h5 file. To further experiment with the code:

  1. Go to Google Colab
  2. Go to File > Open Notebook... > Github
  3. Search for https://github.com/sudo-rushil/dga-intel
  4. Open domain_data.ipynb or domain_model.ipynb
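Loading the saved model outside Colab might look like the sketch below. It assumes domain_classifier_model.h5 sits in the current working directory and that TensorFlow is installed; the load_classifier helper is a hypothetical name introduced for this example, not a function from the repository.

```python
from pathlib import Path

# File name taken from this README; its location is an assumption.
MODEL_PATH = Path("domain_classifier_model.h5")


def load_classifier(path=MODEL_PATH):
    """Load the saved Keras model, failing early if the file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"model file not found: {path}")
    # Imported late so the missing-file check works without TensorFlow.
    import tensorflow as tf

    # compile=False skips restoring the optimizer, which inference does not need.
    return tf.keras.models.load_model(str(path), compile=False)


if __name__ == "__main__":
    model = load_classifier()
    model.summary()  # prints the layer stack stored in the file
```

Inputs must be encoded the same way as during training before calling model.predict; see the notebooks for the exact preprocessing.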

Code Usage

$ git clone https://github.com/sudo-rushil/dga-intel
$ cd dga-intel
$ python predict_domain.py [domain name]

Example

$ python predict_domain.py wikipedia.com

The domain wikipedia.com is genuine with probability 1.0

Contact

If you run into any problems, file an issue at https://github.com/sudo-rushil/dga-intel/issues.

My LinkedIn page can be found here.

References

[1] Abadi et al. "TensorFlow: Large-scale machine learning on heterogeneous systems". 2015. Software available from tensorflow.org.

[2] Yu, Bin; Pan, Jie; Hu, Jiaming; Nascimento, Anderson; De Cock, Martine. "Character Level based Detection of DGA Domain Names". 2018 International Joint Conference on Neural Networks (IJCNN).

Contributors

ravi-mallarapu, sudo-rushil
