

Microsoft-Malware-Classification-Challenge

Beating the benchmark for the Microsoft Malware Classification Challenge (BIG 2015)

Hi Kagglers,

Here is my GitHub repository for the solution that scored 0.1826662 on the leaderboard. The solution is quite simple; the tiresome part is data preparation. It uses only the .byte files to predict the category. It calculates the frequency of each byte code (the two-character hex values 00 to FF), along with the ?? token, and uses those frequencies for prediction.
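The byte-frequency feature described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: it assumes each line of a .byte file is an address followed by space-separated two-character tokens (for example `00401000 56 8D 44 ??`), and the function names `byte_frequencies` and `features_from_gz` are hypothetical.

```python
import gzip
from collections import Counter

# 256 byte values written as two uppercase hex digits, plus the "??" token.
TOKENS = ["%02X" % i for i in range(256)] + ["??"]


def byte_frequencies(lines):
    """Return a 257-element count vector (00..FF, then ??) for one sample.

    Assumes every line is '<address> <tok> <tok> ...'; the address column
    is skipped and tokens are upper-cased so 'ff' and 'FF' count together.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        counts.update(tok.upper() for tok in parts[1:])  # skip the address
    return [counts[t] for t in TOKENS]


def features_from_gz(path):
    """Read one gzipped .byte file (Python 3 text mode) and count its tokens."""
    with gzip.open(path, "rt") as f:
        return byte_frequencies(f)
```

One row per file from `features_from_gz` gives a 257-column feature table that any multiclass classifier can consume.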

Before using these scripts, you have to follow these steps:

  1. Extract the .byte files from the train and test 7z archives.
  2. Gzip the .byte files into .byte.gz format and move them to the train_gz / test_gz folders.
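Step 2 can be scripted instead of done by hand. The sketch below is a Python 3 illustration, not part of this repository: the function name `gzip_byte_files` is hypothetical, the directory arguments are placeholders, and the `suffix` parameter should be adjusted if your extracted files use a different extension. Extracting the 7z archives (step 1) still needs an external 7z tool.

```python
import gzip
import os
import shutil


def gzip_byte_files(src_dir, dst_dir, suffix=".byte"):
    """Compress every file in src_dir ending in `suffix` to dst_dir/<name>.gz.

    Returns the number of files written. Other files (e.g. .asm) are skipped.
    """
    os.makedirs(dst_dir, exist_ok=True)
    written = 0
    for name in sorted(os.listdir(src_dir)):
        if not name.endswith(suffix):
            continue
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name + ".gz")
        # Stream the file through gzip so large samples are not loaded into memory.
        with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written += 1
    return written
```

For example, `gzip_byte_files("train", "train_gz")` followed by `gzip_byte_files("test", "test_gz")`, with the source folder names adjusted to wherever the archives were extracted.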

I know these two steps take a hell of a lot of time; for me it was 6 hours. :)

Once you have 10868 train files and 10873 test files in gz format, run the following commands:

python data_consolidation.py

python solution.py

Use it, tune it and score as low as you can.
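"Score as low as you can" implies a loss metric where lower is better. Assuming the leaderboard metric is multi-class logarithmic loss with clipped, row-normalized probabilities (the usual Kaggle definition), a local validation score could be computed with a sketch like this; `multiclass_log_loss` and the default `eps` are illustrative choices, not taken from this repository.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to each sample's true class.

    y_true: 0-based class indices; probs: one probability row per sample.
    Each row is clipped to [eps, 1 - eps] and renormalized before scoring,
    so a confidently wrong prediction costs a large but finite penalty.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        clipped = [min(max(p, eps), 1 - eps) for p in row]
        norm = sum(clipped)
        total += -math.log(clipped[label] / norm)
    return total / len(y_true)
```

A uniform guess over the nine malware families scores about ln 9 ≈ 2.197, which gives a reference point for judging any tuned model.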

These scripts should run with both Python 2 and Python 3. Let me know if you face any problems.
