model-api-sequence

Models API call sequences using an LSTM.

Requirements

  • Debian 10 64-bit

Clone repo

$ git clone --recurse-submodules git@github.com:evandowning/model-api-sequence.git

Install dependencies

$ sudo ./setup.sh

Usage

# Extract sequences from nvmtrace dumps (https://github.com/evandowning/nvmtrace/tree/kvm)
$ cd cuckoo-headless/extract_raw
$ python2.7 extract-sequence.py

# Parse sequences into pickle files
$ python3 preprocess.py api-sequences/ cuckoo-headless/extract_raw/api.txt \
          label.txt malware_label.txt features/ windowSize {binary_classification | multi_classification | regression}

# Model data over 10-fold cross-validation & save models to file
$ python3 lstm.py cuckoo-headless/extract_raw/api.txt features/ models/ \
          save_model[True|False] save_data[True|False] \
          {binary_classification | multi_classification | regression} \
          convert_classes.txt

# Evaluate model
$ python3 evaluation.py models/model.json models/weight.h5 features/ \
          malware_label.txt labels.txt predictions.csv convert_classes.txt
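
The windowSize argument to preprocess.py suggests that each sequence is cut into fixed-length windows before it reaches the LSTM. The exact scheme is not documented here, so the following is only a sketch under assumptions: non-overlapping windows, with the final window right-padded by a hypothetical pad value of 0. The function name split_into_windows is illustrative and does not come from the repository.

```python
def split_into_windows(encoded_calls, window_size, pad_value=0):
    """Cut a list of integer-encoded API calls into fixed-length windows.

    Assumption: windows do not overlap and the last one is right-padded
    with pad_value. preprocess.py may use a different scheme.
    """
    if window_size <= 0:
        raise ValueError("window_size must be a positive integer")

    windows = []
    for start in range(0, len(encoded_calls), window_size):
        window = list(encoded_calls[start:start + window_size])
        # Pad the final, shorter window up to the full length.
        window.extend([pad_value] * (window_size - len(window)))
        windows.append(window)
    return windows
```

For example, a five-call sequence with a window size of 2 yields three windows, the last of which holds one real call and one pad value.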

Measure similarity of sequences (both inter- and intra-family)

$ python3 sim_stats.py /data/arsa/api-sequences /data/arsa/api-sequences.labels numSamplesPerClass outfile.txt

Create attack config for patchPE

$ python3 attack-config.py

Create PNG images of sequences

$ python3 color.py /data/arsa/api-sequences-features/

NOTES

preprocess.py writes a file called errors.txt, which lists the samples that contained no sequences or that raised errors during processing. These samples are not transferred to the features/ folder.

The commands above reference a folder api-sequences/, which contains one file per malware sample. Each file is named by the sample's SHA256 hash to uniquely identify it, and contains the sequence of API calls observed while executing the malware.

E.g.:

Open
Read
Write
Connect
Send
Receive
...
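
As a rough illustration of how such a file could be turned into model input, the sketch below maps each API name to an integer ID. It assumes that api.txt lists one API name per line and that ID 0 is free to stand for unknown calls; both are assumptions, and the helper names load_vocabulary and encode_sequence are hypothetical rather than taken from preprocess.py.

```python
from pathlib import Path


def load_vocabulary(api_file):
    """Map each API name in api_file to an integer ID.

    Assumption: the file holds one API name per line. IDs start at 1 so
    that 0 can represent an unknown call (or padding).
    """
    lines = Path(api_file).read_text().splitlines()
    names = [line.strip() for line in lines if line.strip()]
    return {name: idx for idx, name in enumerate(names, start=1)}


def encode_sequence(sequence_file, vocabulary):
    """Read one sample's API calls (one per line) and return their IDs.

    Calls missing from the vocabulary are encoded as 0.
    """
    lines = Path(sequence_file).read_text().splitlines()
    calls = [line.strip() for line in lines if line.strip()]
    return [vocabulary.get(call, 0) for call in calls]
```

A call such as encode_sequence("api-sequences/<sha256>", load_vocabulary("api.txt")) would then return one list of integers per sample.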

malware_label.txt is a file in which each line specifies a sample name and its malware family label separated by a tab character.

E.g.:

2413FB3709B05939F04CF2E92F7D0897FC2596F9AD0B8A9EA855C7BFEBAAE892    familyA
F6FE187982FD924333B446C5FB9B96F328AC8994F88FA34007DBDF4D0FFFBE60    familyA
9711F36C1743E55CB0514C43DF74D981DC7775E11FC3465ADF0F80A2A07AB141    familyB
28E60DBEF52D6C2A2FA385FA04C2CD0880A71517B5C4F0E2ED28EFC393A9E9CE    familyC
E59905D5305FCC54909A875F4CC3F426A3A27584A461E1B16530FC2AD85A0693    familyA
...
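
A file in this format can be read with a few lines of Python. The sketch below is not the parser used by preprocess.py; the function name load_labels is hypothetical, and it assumes exactly one tab per line, as described above.

```python
def load_labels(label_file):
    """Return {sample_hash: family} parsed from a tab-separated label file."""
    labels = {}
    with open(label_file) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            parts = line.split("\t")
            if len(parts) != 2:
                raise ValueError(
                    f"line {line_number}: expected 'hash<TAB>family', got {line!r}"
                )
            sample_hash, family = (part.strip() for part in parts)
            labels[sample_hash] = family
    return labels
```

Raising on malformed lines, rather than skipping them silently, makes it easier to notice a label file that was saved with spaces instead of tabs.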

If your data is sorted or formatted in any other way, you can modify preprocess.py accordingly.

If you want to track a new API call, add it to api.txt. If you want to add a new malware family, add it to label.txt.
