APT Malware Dataset

This dataset contains over 3,500 malware samples related to 12 APT groups that are allegedly sponsored by 5 different nation-states. It was used to benchmark different machine learning approaches to authorship attribution, and it can be used for future benchmarks or malware research.

Data Characteristics

The samples in the dataset are distributed as follows:

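| Country     | APT Group      | Family         | Requested | Downloaded |
|-------------|----------------|----------------|-----------|------------|
| China       | APT 1          |                | 1007      | 405        |
| China       | APT 10         | i.a. PlugX     | 300       | 244        |
| China       | APT 19         | Derusbi        | 33        | 32         |
| China       | APT 21         | TravNet        | 118       | 106        |
| Russia      | APT 28         | Bears          | 230       | 214        |
| Russia      | APT 29         | Dukes          | 281       | 281        |
| China       | APT 30         |                | 164       | 164        |
| North-Korea | DarkHotel      | DarkHotel      | 298       | 273        |
| Russia      | Energetic Bear | Havex          | 132       | 132        |
| USA         | Equation Group | Fannyworm      | 395       | 395        |
| Pakistan    | Gorgon Group   | Different RATs | 1085      | 961        |
| China       | Winnti         |                | 406       | 387        |
| Total       |                |                | 4449      | 3594       |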
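
The class balance matters for attribution benchmarks, so it can help to recompute the totals and the per-country share from the table. The following is a minimal sketch that only uses the numbers listed above; the function names are illustrative and not part of the dataset or its accompanying code.

```python
from collections import defaultdict

# (country, group, requested, downloaded), copied from the table above
ROWS = [
    ("China", "APT 1", 1007, 405),
    ("China", "APT 10", 300, 244),
    ("China", "APT 19", 33, 32),
    ("China", "APT 21", 118, 106),
    ("Russia", "APT 28", 230, 214),
    ("Russia", "APT 29", 281, 281),
    ("China", "APT 30", 164, 164),
    ("North-Korea", "DarkHotel", 298, 273),
    ("Russia", "Energetic Bear", 132, 132),
    ("USA", "Equation Group", 395, 395),
    ("Pakistan", "Gorgon Group", 1085, 961),
    ("China", "Winnti", 406, 387),
]


def totals(rows):
    """Return (total requested, total downloaded)."""
    return sum(r[2] for r in rows), sum(r[3] for r in rows)


def downloaded_by_country(rows):
    """Sum the downloaded samples per attributed country."""
    counts = defaultdict(int)
    for country, _group, _requested, downloaded in rows:
        counts[country] += downloaded
    return dict(counts)


def download_rate(rows):
    """Fraction of requested samples that could be downloaded, per group."""
    return {group: downloaded / requested
            for _country, group, requested, downloaded in rows}


requested, downloaded = totals(ROWS)
print(f"requested={requested} downloaded={downloaded}")  # requested=4449 downloaded=3594

by_country = downloaded_by_country(ROWS)
for country, count in sorted(by_country.items(), key=lambda kv: -kv[1]):
    print(f"{country:12s} {count:5d} {count / downloaded:6.1%}")
```

The per-group rate shows, for example, that only about 40% of the requested APT 1 samples could be downloaded, while China as a whole still accounts for 1,338 of the 3,594 downloaded samples.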

Remarks

All samples are named after their SHA-256 hash and grouped by APT group. The samples are stored in separate password-protected ZIP archives (.zip). The password for all archives is "infected".
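
Because every sample is named after its SHA-256 hash, an archive can be checked for integrity without writing live malware to disk. The sketch below reads each member into memory and compares its hash with its file name. It assumes the archives use the legacy ZIP encryption that Python's `zipfile` module can decrypt (AES-encrypted archives would need a third-party tool); the function names and the archive path in the usage comment are hypothetical.

```python
import hashlib
import zipfile
from pathlib import Path

PASSWORD = b"infected"  # password stated in the README


def sha256_of(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def check_archive(zip_path, password=PASSWORD):
    """Return the members whose content hash does not match their file name.

    Members are only read into memory, never extracted to disk.
    """
    mismatches = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            data = archive.read(info, pwd=password)
            # Assumption: the file name (without any extension) is the hash.
            expected = Path(info.filename).stem.lower()
            if sha256_of(data) != expected:
                mismatches.append(info.filename)
    return mismatches


# Hypothetical usage; the archive name is an assumption:
# print(check_archive("APT28.zip"))
```

Handle the archives only in an isolated analysis environment, since the members are real malware.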

Source

The malware samples were collected using open-source threat intelligence reports from multiple vendors. From these reports, a list of all file hashes used as indicators of compromise (IoC) was compiled. These hashes were then used to obtain the malware samples from VirusTotal.

The file overview.csv lists all malware samples together with the reports in which their hash values were found.
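
A quick way to explore overview.csv is to count rows per value of one column, for example per APT group or per report. The sketch below is only an illustration: the README does not document the column names, so the column passed in is an assumption and the function raises an error that lists the columns actually present.

```python
import csv
from collections import Counter


def count_by_column(csv_path, column, delimiter=","):
    """Count how often each value occurs in the given column of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(
                f"column {column!r} not found; available: {reader.fieldnames}"
            )
        return Counter(row[column] for row in reader)


# Hypothetical usage; "APT-group" is an assumed column name:
# print(count_by_column("overview.csv", "APT-group").most_common())
```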

Code Used for Authorship Attribution

The source code of the experiments performed for benchmarking authorship attribution performance can be found at GitHub: APT Attribution Code.

License

Open Database License

This APT Malware Dataset is made available under the Open Database License, whose full text can be found at http://opendatacommons.org/licenses/odbl/. Any rights in individual contents of the database are licensed under the Database Contents License, whose text can be found at http://opendatacommons.org/licenses/dbcl/.

aptmalware's Issues

Two possible misattributions & wrong country for DarkHotel

Two samples were incorrectly assigned to FancyBear (APT28), although they were attributed to CozyBear (APT29) according to the CrowdStrike article (see table below) that served as ground truth for the blog post referenced in the cyber-research dataset.

[Image: table from the CrowdStrike article]

In fact, the blog post incorrectly lists CozyBear (APT29) as creator of all file hashes mentioned in the CrowdStrike article including the hash of the threat intelligence report itself (see image below), even though only two samples belong to CozyBear (APT29) and three to FancyBear (APT28).

[Image: screenshot of the blog post]

The file hash of the threat intelligence report was erroneously included in the cyber-research dataset, presumably because the report was also uploaded to VirusTotal as a PDF and was therefore assumed to be a real malware sample (even though no AV engine labels it as malicious).

In the same blog post, nine more threat intelligence reports could be found, eight of which are included in the dataset although they have no or only one detection (false positive) on VirusTotal.

Furthermore, cyber-research assigned the DarkHotel APT to North Korea, although the campaign has been attributed to South Korea by various security researchers (see references on Malpedia).

I created pull request #2, which proposes the following changes:

  • move the two samples from the APT28 to the APT29 folder
  • remove the threat intelligence reports from the dataset
  • change the attributed country of DarkHotel APT from North Korea to South Korea
  • update the expired URLs in the overview.csv
  • update numbers in the README.md
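
Threat intelligence reports like the ones described above are PDF files, so one way to find further candidates is to look for the PDF signature at the start of each archive member. This is a rough sketch, not the method used for the pull request: it assumes the archives can be decrypted by Python's `zipfile` module, only checks for `%PDF-` at offset 0, and the function name is hypothetical. Malicious PDFs exist as well, so a hit is a candidate for manual review (for example against its VirusTotal detections), not proof that the file is a report.

```python
import zipfile

PDF_MAGIC = b"%PDF-"


def find_pdf_members(zip_path, password=b"infected"):
    """Return the names of archive members that start with the PDF signature."""
    hits = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Read only the first bytes; nothing is extracted to disk.
            with archive.open(info, pwd=password) as member:
                if member.read(len(PDF_MAGIC)) == PDF_MAGIC:
                    hits.append(info.filename)
    return hits


# Hypothetical usage; the archive name is an assumption:
# print(find_pdf_members("APT29.zip"))
```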
