TypeDB Data - CTI
Overview
TypeDB Data - CTI is an open source knowledge graph for organisations to store and manage their cyber threat intelligence (CTI) knowledge. It enables CTI professionals to bring together their disparate CTI information into one knowledge graph and find new insights about cyber threats.
The benefits of using TypeDB for CTI:
- TypeDB enables data to be modelled based on logical and object-oriented principles. This makes it easy to create complex schemas and ingest disparate and heterogeneous networks of CTI data, through concepts such as type hierarchies, nested relations and n-ary relations.
- TypeDB's ability to perform logical inference at query runtime enables the discovery of new insights from existing CTI data. For example, inferred transitive relations can indicate the attribution of a particular attack pattern to a state-owned entity.
- TypeDB enables links between hash values, IP addresses, or indeed any data value that is shared to be made by default, as uniqueness of attribute values is a database guarantee. When attributes are inserted, unique values for any data type are only stored once, and all other uses of that value are connected by relations.
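The value-uniqueness point in the last bullet can be illustrated with a small toy model. This is a plain-Python sketch, not TypeDB's storage engine: the `AttributeStore` class, the owner names, and the `ipv4-address` label are all hypothetical, and the addresses come from documentation-reserved ranges.

```python
from collections import defaultdict


class AttributeStore:
    """Toy model: each (attribute type, value) pair is stored exactly once."""

    def __init__(self):
        # Maps one stored attribute to the set of things that own it.
        self._owners = defaultdict(set)

    def insert(self, owner, attr_type, value):
        # Re-inserting an existing value adds an owner, not a second copy.
        self._owners[(attr_type, value)].add(owner)

    def owners_of(self, attr_type, value):
        # .get avoids creating an empty entry for values never inserted.
        return sorted(self._owners.get((attr_type, value), set()))

    def attribute_count(self):
        return len(self._owners)


store = AttributeStore()
store.insert("report-1", "ipv4-address", "203.0.113.7")
store.insert("malware-sample-9", "ipv4-address", "203.0.113.7")
store.insert("report-2", "ipv4-address", "198.51.100.4")

shared = store.owners_of("ipv4-address", "203.0.113.7")
print(store.attribute_count())  # number of distinct stored values
print(shared)                   # every owner of the shared address
```

Because the shared address exists only once, finding everything connected to it is a single lookup rather than a search for matching strings across records.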
This repository provides a schema that is based on STIX2, and contains MITRE ATT&CK as an example dataset to start exploring this CTI knowledge graph. In the future, we plan to incorporate other CTI standards and data sources, in order to create an industry-wide data specification in TypeQL that can be used to ingest any type of CTI data.
STIX
Structured Threat Information Expression (STIX™) is a language and serialization format used to exchange cyber threat intelligence (CTI).
STIX enables organizations to share CTI with one another in a consistent and machine readable manner, allowing security communities to better understand what computer-based attacks they are most likely to see and to anticipate and/or respond to those attacks faster and more effectively.
STIX is designed to improve many different capabilities, such as collaborative threat analysis, automated threat exchange, automated detection and response, and more.
The data model in TypeDB Data - CTI is currently based on STIX (specifically STIX 2.1), offering a unified and consistent data model for CTI information from an operational to strategic level. This enables the ingestion of heterogeneous CTI data to provide analysts with a single common language to describe the data they work with.
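To make the shape of STIX data concrete, here is a minimal sketch of a STIX 2.1 bundle and how its relationship objects tie other objects together. The bundle is a trimmed, hypothetical example: the identifiers and names are placeholders, and required properties such as `created` and `modified` are omitted for brevity. The `describe` helper is illustrative only and is not part of this repository.

```python
import json

# A trimmed, hypothetical STIX 2.1 bundle: two domain objects and one
# relationship object that links them via source_ref and target_ref.
bundle_json = """
{
  "type": "bundle",
  "id": "bundle--00000000-0000-4000-8000-000000000001",
  "objects": [
    {"type": "malware", "spec_version": "2.1",
     "id": "malware--00000000-0000-4000-8000-000000000002",
     "name": "example-malware", "is_family": true},
    {"type": "attack-pattern", "spec_version": "2.1",
     "id": "attack-pattern--00000000-0000-4000-8000-000000000003",
     "name": "example-technique"},
    {"type": "relationship", "spec_version": "2.1",
     "id": "relationship--00000000-0000-4000-8000-000000000004",
     "relationship_type": "uses",
     "source_ref": "malware--00000000-0000-4000-8000-000000000002",
     "target_ref": "attack-pattern--00000000-0000-4000-8000-000000000003"}
  ]
}
"""

bundle = json.loads(bundle_json)

# Index every object by its STIX id so references can be resolved.
by_id = {obj["id"]: obj for obj in bundle["objects"]}


def describe(rel):
    """Render a relationship object as a readable statement."""
    src = by_id[rel["source_ref"]]
    tgt = by_id[rel["target_ref"]]
    return (f'{src["type"]} "{src["name"]}" '
            f'{rel["relationship_type"]} {tgt["type"]} "{tgt["name"]}"')


statements = [describe(o) for o in bundle["objects"]
              if o["type"] == "relationship"]
print(statements)
```

The STIX type names (`malware`, `attack-pattern`) match the entity types used in the example queries later in this document, which is what lets STIX objects and relationships map naturally onto entities and relations in the knowledge graph.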
To learn more about STIX, this introduction and explanation is a good place to start learning how STIX works and why TypeDB Data - CTI uses it.
An in-depth overview of how the STIX2 model has been implemented in TypeDB will follow.
MITRE ATT&CK STIX Data
MITRE ATT&CK is a globally-accessible knowledge base of adversary tactics and techniques based on real-world observations. The ATT&CK knowledge base is used as a foundation for the development of specific threat models and methodologies in the private sector, in government, and in the cybersecurity product and service community.
TypeDB Data - CTI includes a migrator to load MITRE ATT&CK STIX data, which serves as an example dataset to quickly start exploring the knowledge graph.
Installation
Prerequisites:
Clone this repo:
git clone https://github.com/typedb-osi/typedb-data-cti
Set up a virtual environment and install the dependencies:
cd <path/to/typedb-data-cti>/
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Start TypeDB
typedb server
Start the migrator script
python migrate.py
This will create a new database called cti, insert the schema file and ingest the MITRE ATT&CK datasets; it will take under one minute to complete.
Examples
Once the data is loaded, these queries can be used to explore the data.
- Does the "Restrict File and Directory Permissions" course of action mitigate the "BlackTech" intrusion set, and if so, how?
match
$course isa course-of-action, has name "Restrict File and Directory Permissions";
$in isa intrusion-set, has name "BlackTech";
$mit (mitigating: $course, mitigated: $in) isa mitigation;
This query returns a relation of type inferred-mitigation between the two entities.
However, the inferred-mitigation relation does not actually exist in the database; it was inferred at query runtime by TypeDB's reasoner. Double-clicking the inferred relation shows the explanation: the course-of-action actually mitigates an attack-pattern named Indicator Blocking, which has a use relation with the intrusion-set.
That use relation (between the intrusion-set and the attack-pattern) is also inferred. Double-clicking it shows that the attack-pattern is not directly used by the intrusion-set. Instead, it is used by a malware called Waterbear, which is used by the intrusion-set.
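The chain of reasoning described above can be sketched in plain Python. This is only an illustration under stated assumptions: the two rules below are a guess at the logic implied by the explanation, not the rules defined in this repository's schema, and the sketch derives facts by forward chaining over a toy fact set, whereas TypeDB's reasoner infers them at query time.

```python
def infer(uses, mitigates):
    """Apply two assumed rules repeatedly until no new facts appear.

    Rule 1 (transitive use):    x uses y, y uses z       => x uses z
    Rule 2 (derived mitigation): c mitigates y, x uses y => c mitigates x
    """
    uses, mitigates = set(uses), set(mitigates)
    changed = True
    while changed:
        new_uses = {(x, z)
                    for (x, y) in uses
                    for (y2, z) in uses if y == y2} - uses
        new_mitigates = {(c, x)
                         for (c, y) in mitigates
                         for (x, y2) in uses if y == y2} - mitigates
        uses |= new_uses
        mitigates |= new_mitigates
        changed = bool(new_uses or new_mitigates)
    return uses, mitigates


# Stored facts, taken from the explanation above.
stored_uses = {
    ("BlackTech", "Waterbear"),            # intrusion-set uses malware
    ("Waterbear", "Indicator Blocking"),   # malware uses attack-pattern
}
stored_mitigates = {
    ("Restrict File and Directory Permissions", "Indicator Blocking"),
}

all_uses, all_mitigates = infer(stored_uses, stored_mitigates)

target = ("Restrict File and Directory Permissions", "BlackTech")
print(target in stored_mitigates)  # the fact is not stored
print(target in all_mitigates)     # but it can be derived
```

Neither the BlackTech to Indicator Blocking link nor the mitigation of BlackTech is in the stored facts; both are derived from them, which mirrors the two levels of explanation shown in TypeDB Studio.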
- What attack patterns are used by the malware that the intrusion set APT28 uses?
match
$intrusion isa intrusion-set, has name "APT28";
$malware isa malware, has name $n1;
$attack-pattern isa attack-pattern, has name $n2;
$rel1 (used-by: $intrusion, used: $malware) isa use;
$rel2 (used-by: $malware, used: $attack-pattern) isa use;
This query asks for the entity of type intrusion-set with name APT28. It then looks for all the malware entities that are connected to this intrusion-set through the relation use. The query also fetches all the attack-patterns that are connected to that malware through the relation use.
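What this pattern computes is essentially a two-hop join over use relations, with a type constraint at each hop. The sketch below shows that shape in plain Python. The data is entirely hypothetical (placeholder names such as `malware-a` and `technique-1` are not taken from the ATT&CK dataset), and the function is an illustration, not how TypeDB evaluates the query.

```python
from collections import defaultdict

# Hypothetical toy data: the type of each thing, and (used-by, used) pairs.
kind = {
    "APT28": "intrusion-set",
    "malware-a": "malware",
    "malware-b": "malware",
    "tool-x": "tool",
    "technique-1": "attack-pattern",
    "technique-2": "attack-pattern",
}
use = [
    ("APT28", "malware-a"),
    ("APT28", "malware-b"),
    ("APT28", "tool-x"),
    ("malware-a", "technique-1"),
    ("malware-a", "technique-2"),
    ("malware-b", "technique-2"),
    ("tool-x", "technique-1"),
]


def malware_attack_patterns(intrusion_name):
    """Return one (malware, attack-pattern) row per matching combination."""
    if kind.get(intrusion_name) != "intrusion-set":
        return []
    used = defaultdict(list)
    for user, target in use:
        used[user].append(target)
    rows = []
    for malware in used.get(intrusion_name, []):
        if kind[malware] != "malware":
            continue  # first hop must end at a malware
        for pattern in used.get(malware, []):
            if kind[pattern] == "attack-pattern":
                rows.append((malware, pattern))
    return rows


rows = malware_attack_patterns("APT28")
distinct_patterns = {pattern for _, pattern in rows}
print(len(rows), len(distinct_patterns))
```

Each row is one combination of malware and attack pattern, so the number of rows can be larger than the number of distinct attack patterns when several malware use the same one.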
The full answer returns 207 results. Two of those results can be visualised in TypeDB Studio like this:
- What are the attack patterns used by the malware "FakeSpy"?
match
$malware isa malware, has name "FakeSpy";
$attack-pattern isa attack-pattern, has name $apn;
$use (used-by: $malware, used: $attack-pattern) isa use;
Running this query will return 15 different attack-patterns, all of which have a relation of type use to the malware. This is how it is visualised in TypeDB Studio:
Community
If you need any technical support or want to engage with this community, you can join the #typedb-data-cti channel in the TypeDB Discord server.