Text2TTP

Cyber Threat Intelligence Report to MITRE ATT&CK Matrix

This is the reproduction material for our work: "Semantic Ranking for Automated Adversarial Technique Annotation in Security Text" published in AsiaCCS'24.

If you use our tool, models, or dataset, please cite our work:

@inproceedings{10.1145/3634737.3645000,
    author = {Kumarasinghe, Udesh and Lekssays, Ahmed and Sencar, Husrev Taha and Boughorbel, Sabri and Elvitigala, Charitha and Nakov, Preslav},
    title = {Semantic Ranking for Automated Adversarial Technique Annotation in Security Text},
    year = {2024},
    isbn = {9798400704826},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3634737.3645000},
    doi = {10.1145/3634737.3645000},
    abstract = {We introduce a novel approach for mapping attack behaviors described in threat analysis reports to entries in an adversarial techniques knowledge base. Our method leverages a multi-stage ranking architecture to efficiently rank the most related techniques based on their semantic relevance to the input text. Each ranker in our pipeline uses a distinct design for text representation. To enhance relevance modeling, we leverage pretrained language models, which we fine-tune for the technique annotation task. While generic large language models are not yet capable of fully addressing this challenge, we obtain very promising results. We achieve a recall rate improvement of +35\% compared to the previous state-of-the-art results. We further create new public benchmark datasets for training and validating methods in this domain, which we release to the research community aiming to promote future research in this important direction.},
    booktitle = {Proceedings of the 19th ACM Asia Conference on Computer and Communications Security},
    pages = {49–62},
    numpages = {14},
    keywords = {threat intelligence, TTP annotation, text ranking, text attribution},
    location = {Singapore, Singapore},
    series = {ASIA CCS '24}
}

Core Maintainer and Developer: Udesh Kumarasinghe (mail @ udesh . xyz)

Directory Overview

  • data - Datasets created in this work.
    • sentences.csv - Aggregated threat behavior dataset.
    • sentences_ioc.csv - Sentence dataset annotated with IOCs.
    • sentences_santitized.csv - Sentence dataset with IOCs sanitized.
  • libs - Python packages to load, preprocess, run the pipeline, and evaluate.
  • preprocessing - Notebooks demonstrating the preprocessing steps.
  • models - Pre-trained models used in our experiments. They are available on HuggingFace: Models
  • Pipeline.ipynb - Demonstrates usage of the proposed pipeline.

How to Use

Refer to the environment.yml file to install the required Python packages.

Using the dataset

Load the aggregated dataset using the resources module.

from libs import resources as res

# Import all the sentences in the aggregated dataset
sentences = res.load_annotated()

# Filter to get the sentences of a specific dataset
# Options available:
# 'manual' - manually annotated sentences in this work
# 'tram' - sentences from the training dataset of TRAM
# 'cisa' - annotated sentences extracted from CISA reports
# 'eset' - annotated sentences extracted from WeLiveSecurity reports
man_sentences = sentences[sentences.datasets == 'manual']
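
The filtering step can be sketched on a toy frame. This assumes that `res.load_annotated()` returns a pandas DataFrame with a `datasets` column, as the boolean indexing above suggests. The `tech_id` column name is taken from the pipeline example in this README; the `text` column name and every row value are invented for illustration and may not match the real files.

```python
import pandas as pd

# Toy stand-in for the frame returned by res.load_annotated().
# `text` is a hypothetical column name and all rows are invented.
sentences = pd.DataFrame(
    {
        "text": [
            "The malware creates a scheduled task for persistence.",
            "The actor sent spearphishing emails with malicious attachments.",
            "The implant uses PowerShell to download a payload.",
        ],
        "tech_id": ["T1053", "T1566", "T1059"],
        "datasets": ["manual", "tram", "cisa"],
    }
)

# Keep the sentences of one source dataset.
man_sentences = sentences[sentences["datasets"] == "manual"]

# Keep the sentences of several source datasets at once.
report_sentences = sentences[sentences["datasets"].isin(["cisa", "eset"])]

# Count the sentences contributed by each source dataset.
counts = sentences["datasets"].value_counts()
print(counts.to_dict())
```

Bracket access (`sentences["datasets"]`) behaves the same as attribute access (`sentences.datasets`) here, and it also works for column names that are not valid Python identifiers.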

Running the pipeline

Our implementation of the proposed pipeline uses the pygaggle framework. For simplicity, we expose the rank module, which provides functionality to preprocess queries, initialize re-rankers, and handle caching.

from libs import rank

# Preprocess the MITRE ATT&CK Knowledge Base and report sentences
# `corpus` (the knowledge base) and `sentences` are assumed to be loaded beforehand
texts, _ = rank.get_texts(corpus)
queries = rank.get_queries(sentences, label_col='tech_id')

# Initialize the reranking models for the pipeline
stage1_reranker = rank.construct_bm25()
stage2_reranker = rank.construct_sentsecbert()
stage3_reranker = rank.construct_monot5()

Refer to Pipeline.ipynb for a detailed example of the pipeline. Additional resources on how to run it can be found here.
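
To make the idea of a multi-stage ranking cascade concrete, here is a minimal, self-contained sketch. It is not the authors' implementation and does not use the pygaggle or `rank` APIs. Stage 1 is a plain Okapi BM25 scorer written from scratch, and stage 2 uses Jaccard token overlap as a placeholder for the fine-tuned neural re-rankers. The real pipeline initializes three re-rankers; this sketch uses two stages for brevity. All function names, parameters, and technique descriptions below are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real pipeline would use a proper one."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def jaccard(query, doc):
    """Placeholder for a neural relevance model: token-set overlap."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def cascade(query, corpus, keep_stage1=3, keep_stage2=2):
    """Rank techniques in stages, shrinking the candidate list each time."""
    ids = list(corpus)
    docs = [corpus[i] for i in ids]
    # Stage 1: cheap lexical scoring over the whole knowledge base.
    stage1 = sorted(zip(ids, bm25_scores(query, docs)),
                    key=lambda pair: pair[1], reverse=True)
    candidates = [i for i, _ in stage1[:keep_stage1]]
    # Stage 2: a costlier scorer sees only the stage-1 survivors.
    stage2 = sorted(candidates,
                    key=lambda i: jaccard(query, corpus[i]), reverse=True)
    return stage2[:keep_stage2]


# Invented technique descriptions keyed by ATT&CK-style IDs.
corpus = {
    "T1053": "adversaries may abuse task scheduling to execute malicious code",
    "T1566": "adversaries may send phishing messages to gain access",
    "T1059": "adversaries may abuse command and script interpreters to execute commands",
    "T1027": "adversaries may obfuscate files to make them difficult to analyze",
}
query = "the malware registers a scheduled task to execute its payload"
ranked = cascade(query, corpus)
print(ranked)
```

The design point is that each stage only scores what the previous stage kept, so the expensive models run on a short candidate list rather than on every technique in the knowledge base.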

Third Party Frameworks

This work utilizes modified and extended versions of the following open-source works.

PyGaggle (https://github.com/castorini/pygaggle) - Used as a framework to build the re-ranking pipeline.
ioc_parser (https://github.com/armbues/ioc_parser) - Used for parsing Indicators of Compromise.

text2ttp's Issues

Request for Requirements File for Text2TTP

Hi Lekssays,

I hope this message finds you well. I am very interested in your Text2TTP on GitHub and would like to replicate it on my local machine. However, I noticed that the repository does not include a requirements file listing the necessary dependencies.

Could you please provide a requirements.txt file or a list of the required packages?

Thank you very much for your assistance.

Best regards,
Emily Watson
