mitre-attack / attack-datasources
This content is analysis and research of the data sources currently listed in ATT&CK.
License: Apache License 2.0
Thank you for this! We love ATT&CK, but the data sources sections have always felt a bit "loose" and left mostly as an exercise for the reader. The blog series and this repo prompted a couple questions I hoped you could discuss:
Why not use/extend an existing schema for the abstractions?
For example, STIX Cyber-observable Objects (SCO) cover some of the same ground and link nicely with STIX-formatted intel ... like ATT&CK itself. The spec for the objects and their relationships reads a bit like your YAML data sources, and they can be reified with real data. STIX SCO seems like a natural fit, plus it has a well-thought-out relationship model, serialization format, extensions, etc.
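As a rough illustration of "reified with real data", the sketch below turns one process-creation record into two STIX 2.1 SCOs: a `file` for the executable and a `process` that points to it through `image_ref`. It assumes a Sysmon Event ID 1 record already parsed into a dict with `Image`, `CommandLine` and `ProcessId` keys; the function name `sysmon1_to_stix` and the sample values are hypothetical.

```python
import json
import ntpath
import uuid


def sysmon1_to_stix(event: dict) -> list[dict]:
    """Map a parsed Sysmon Event ID 1 record (hypothetical input) to STIX 2.1 SCOs."""
    file_obj = {
        "type": "file",
        "spec_version": "2.1",
        # Simplification: STIX 2.1 prefers deterministic UUIDv5 IDs for SCOs
        # such as file; a random uuid4 keeps this sketch short.
        "id": f"file--{uuid.uuid4()}",
        "name": ntpath.basename(event["Image"]),  # Windows path -> file name
    }
    process_obj = {
        "type": "process",
        "spec_version": "2.1",
        "id": f"process--{uuid.uuid4()}",
        "pid": int(event["ProcessId"]),
        "command_line": event["CommandLine"],
        "image_ref": file_obj["id"],  # relationship: process -> executable file
    }
    return [file_obj, process_obj]


sample = {  # hypothetical event values
    "Image": "C:\\Windows\\System32\\cmd.exe",
    "CommandLine": "cmd.exe /c whoami",
    "ProcessId": "4242",
}
print(json.dumps(sysmon1_to_stix(sample), indent=2))
```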
The Elastic Common Schema (ECS) is great too - it's permissively licensed, available for collaboration on github, has abstractions for many of the examples you provide (users, processes, etc.), and is already powering a lot of searches, visualizations, and analytics. We see it more in ops contexts, and it's perhaps a bit more flexible than SCOs. For example, you see it frequently merged with existing event data so you get the benefit of the abstractions without sacrificing the specificity of the original event.
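The merge pattern described above can be sketched like this: ECS-named fields are added next to an untouched copy of the source event, so queries can use the shared abstraction while the original detail stays available. The input is assumed to be a parsed Sysmon Event ID 1 record, the fields shown are only a small subset of ECS, and the `source_event` key and `add_ecs_fields` name are hypothetical.

```python
import copy


def add_ecs_fields(raw_event: dict) -> dict:
    """Return a new document with ECS-style process fields plus the raw event."""
    return {
        "event": {"code": "1", "category": ["process"], "type": ["start"]},
        "process": {
            "executable": raw_event["Image"],
            "command_line": raw_event["CommandLine"],
            "pid": int(raw_event["ProcessId"]),
            "parent": {"pid": int(raw_event["ParentProcessId"])},
        },
        # Hypothetical key: keep the original fields so no specificity is lost.
        "source_event": copy.deepcopy(raw_event),
    }


raw = {  # hypothetical Sysmon Event ID 1 values
    "Image": "C:\\Windows\\System32\\cmd.exe",
    "CommandLine": "cmd.exe /c whoami",
    "ProcessId": "4242",
    "ParentProcessId": "1000",
}
doc = add_ecs_fields(raw)
print(doc["process"]["pid"], doc["source_event"]["Image"])
```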
One of the beautiful things about ATT&CK is that it reduced bike-shedding over terminology and helped the infosec community focus. STIX and ECS represent a lot of similar work, so it seems good to stand on the shoulders of giants. Naming things is hard, and it takes time to overcome intuitions (even at the top level: e.g., to my ear the phrase "data source" connotes the place you get the data, rather than an abstraction of the observable, but I'm just one guy).
In any case, if ATT&CK leveraged one of these for the abstract entities, it seems you could save energy for more ATT&CK-specific work, like mapping those to (sub-)techniques or to the actual concrete logs/artifacts.
Are there plans to be more specific about mappings to artifacts?
Presumably the idea is that (sub-)techniques would eventually use these new abstract data sources to replace or augment the text in the current "Data Sources" section. Unfortunately, unless I'm missing something, the proposed model doesn't seem to have a way to capture links to the concrete logs/artifacts.
For example, the mapping example in figure 13 in part 2 of the blog series illustrates this last step:
That is, it shows links from the data components to specific event logs on the right, and that last step is really useful ... but it doesn't actually live anywhere in this repo's proposed approach. For many teams that last leg is the hard part! If we took your schema, for example, we might add something like:
```yaml
- name: Service
  definition: Information about software programs that run in the background ...
  example_artifacts:
    - {os: windows, artifact: Security Audit Event 4688}
    - {os: windows, artifact: Sysmon Event 1}
    - {os: windows, artifact: Prefetch file}
    - {os: linux, artifact: auditd SYSCALL event}
    - {os: linux, artifact: auditd EXECVE event}
    # etc
```
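If entries carried a field like the proposed `example_artifacts`, tooling could answer "what should I collect on this platform?" directly. A minimal sketch, assuming the YAML above has already been loaded into plain Python dicts; the `artifacts_by_os` helper is hypothetical.

```python
from collections import defaultdict

# The proposed entry, as a YAML loader would return it (values from the example above)
SERVICE = {
    "name": "Service",
    "definition": "Information about software programs that run in the background ...",
    "example_artifacts": [
        {"os": "windows", "artifact": "Security Audit Event 4688"},
        {"os": "windows", "artifact": "Sysmon Event 1"},
        {"os": "windows", "artifact": "Prefetch file"},
        {"os": "linux", "artifact": "auditd SYSCALL event"},
        {"os": "linux", "artifact": "auditd EXECVE event"},
    ],
}


def artifacts_by_os(entry: dict) -> dict[str, list[str]]:
    """Group a data source entry's example_artifacts by operating system."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in entry.get("example_artifacts", []):
        grouped[item["os"]].append(item["artifact"])
    return dict(grouped)


print(artifacts_by_os(SERVICE)["linux"])
```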
Perhaps this is considered out of scope, but hopefully not; it'd be great to see something as authoritative as ATT&CK pointing folks to specific useful artifacts rather than just the abstraction. I'd love to hear your thoughts.
Thanks again for your hard work on this and all the related projects, I look forward to learning more!
Hello.
With the new data source (DS) structure, NIDS and WAF are no longer available. A new relationship could be created to improve the mapping of alert-related events:
```yaml
- source_data_element: network traffic
  relationship: triggered
  target_data_element: alert
```
Thanks in advance.
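Expressed as source/relationship/target triples, the proposed entry would let analysts ask which data elements can trigger an alert, the kind of mapping this issue says was lost with NIDS and WAF. A small sketch; the second triple and the `sources_of` helper are hypothetical illustrations, not part of the repository.

```python
RELATIONSHIPS = [
    # The relationship proposed above
    {"source_data_element": "network traffic", "relationship": "triggered",
     "target_data_element": "alert"},
    # Hypothetical extra triple, for illustration only
    {"source_data_element": "process", "relationship": "created",
     "target_data_element": "process"},
]


def sources_of(target: str, relationship: str, relationships: list[dict]) -> list[str]:
    """Return the source data elements linked to `target` by `relationship`."""
    return sorted({
        r["source_data_element"]
        for r in relationships
        if r["target_data_element"] == target and r["relationship"] == relationship
    })


print(sources_of("alert", "triggered", RELATIONSHIPS))
```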
The code that generates techniques_to_components_mapping.yaml seems to write .nan when there is no data source. Perhaps these entries should be left blank or omitted.
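One way to omit those fields is to filter missing values out before the records are dumped to YAML. A minimal sketch, assuming the rows come from something like pandas' `DataFrame.to_dict("records")`, where empty cells appear as float NaN (which PyYAML writes as `.nan`); the helper names and sample rows are hypothetical.

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """True for None and float NaN (how pandas represents empty cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_fields(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the records with None/NaN fields omitted."""
    return [
        {key: value for key, value in record.items() if not is_missing(value)}
        for record in records
    ]


rows = [  # hypothetical rows
    {"technique_id": "T0001", "data_component": "Process Creation"},
    {"technique_id": "T0002", "data_component": float("nan")},
]
print(drop_missing_fields(rows))
```

To keep the key with an empty value instead, NaN could be replaced with None, which PyYAML writes as `null`.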
There's a minor typo ("linkiable") in the definition text, and I think the overall definition can be modified a little for accuracy, since PE/ELF/Mach-O encompass both executables and libraries. I would suggest:
Information about module files consisting of one or more classes and interfaces, such as portable executable (PE) format executables/dynamic link libraries (DLL), executable and linkable format (ELF) executables/shared libraries, and Mach-O format executables/shared libraries.
Thanks for the project. It's a very good idea.
It could be useful for many projects to reuse the same data source descriptions or to create relationships on a permanent basis (just like we do in CyCAT.org).
Hello,
While working with the new data sources you made, I found some duplicates and some unreferenced types in the technique descriptions.
If this is not intentional, I can try to find all occurrences and report them to you: a PR for the missing data components and a list of the duplicates.
Thanks,
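Both checks reduce to simple counting and set operations once the names are extracted. A sketch, assuming the data component names defined in the data source files and the names referenced by techniques have already been collected into two lists; the sample names and helper functions are hypothetical, and names are compared case- and whitespace-insensitively.

```python
from collections import Counter


def normalize(name: str) -> str:
    """Case- and whitespace-insensitive form used for comparisons."""
    return " ".join(name.split()).lower()


def find_duplicates(names: list[str]) -> list[str]:
    """Names defined more than once."""
    counts = Counter(normalize(n) for n in names)
    return sorted(name for name, count in counts.items() if count > 1)


def compare(defined: list[str], referenced: list[str]) -> tuple[list[str], list[str]]:
    """Return (referenced but not defined, defined but never referenced)."""
    defined_set = {normalize(n) for n in defined}
    referenced_set = {normalize(n) for n in referenced}
    return sorted(referenced_set - defined_set), sorted(defined_set - referenced_set)


defined = ["Process Creation", "Process Termination", "process creation"]
referenced = ["Process Creation", "Process Access"]
print(find_duplicates(defined))
print(compare(defined, referenced))
```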
When I tried to load logon_session.yml, I got the error
mapping values are not allowed here
at line 22, column 72.
The offending line is
description: Data and information that describe a logon session (ex: logon type) and activity within it.
It can be solved by removing the space between the colon and "logon":
description: Data and information that describe a logon session (ex:logon type) and activity within it.
Best regards,
Sascha90
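YAML does not allow ": " (a colon followed by a space) inside an unquoted plain value, because it reads it as a mapping indicator. Besides deleting the space, wrapping the value in single quotes also fixes it and keeps the text unchanged. Below is a rough heuristic checker plus a quoting helper; it is a regular-expression sketch, not a YAML parser, and the function names are hypothetical.

```python
import re

# "key: value" lines whose value is an unquoted plain scalar (not a quoted
# string, block scalar, flow mapping or flow sequence).
PLAIN_VALUE = re.compile(r"""^\s*(?:-\s+)?[\w.-]+:\s+(?P<value>[^\s'"|>{\[].*)$""")


def find_ambiguous_colons(yaml_text: str) -> list[tuple[int, str]]:
    """Heuristically flag (line number, line) pairs whose plain value contains ': '."""
    flagged = []
    for line_no, line in enumerate(yaml_text.splitlines(), start=1):
        match = PLAIN_VALUE.match(line)
        if match and ": " in match.group("value"):
            flagged.append((line_no, line))
    return flagged


def quote_value(line: str) -> str:
    """Wrap the value of a 'key: value' line in single quotes ('' escapes ')."""
    key, _, value = line.partition(": ")
    escaped = value.replace("'", "''")
    return f"{key}: '{escaped}'"


text = (
    "name: Logon Session\n"
    "description: Data and information that describe a logon session "
    "(ex: logon type) and activity within it.\n"
)
for line_no, line in find_ambiguous_colons(text):
    print(line_no, quote_value(line))
```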
The commands below, run in the .ipynb notebook, reproduce this error:
```python
attack = get_attack_dataframe()
attack.head()
```
Output:
```
KeyError                                  Traceback (most recent call last)
Input In [32], in
----> 1 attack = get_attack_dataframe()
      2 attack.head()

File D:\Dec-\attack-datasources-main\docs\scripts\notebook_functions.py:57, in get_attack_dataframe(matrix)
     53 attck = json_normalize(attck)
     54 # view available columns - my line
     55 #print(attck.columns)
     56 # selecting columns
---> 57 attck = attck[['technique_id','x_mitre_is_subtechnique','technique','tactic','platform','data_sources']]
     59 # Splitting data_sources field
     60 attck = attck.explode('data_sources').reset_index(drop=True)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py:3464, in DataFrame.__getitem__(self, key)
   3462 if is_iterator(key):
   3463     key = list(key)
-> 3464 indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
   3466 # take() does not accept boolean indexers
   3467 if getattr(indexer, "dtype", None) == bool:

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py:1314, in _LocIndexer._get_listlike_indexer(self, key, axis)
   1311 else:
   1312     keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
-> 1314 self._validate_read_indexer(keyarr, indexer, axis)
   1316 if needs_i8_conversion(ax.dtype) or isinstance(
   1317     ax, (IntervalIndex, CategoricalIndex)
   1318 ):
   1319     # For CategoricalIndex take instead of reindex to preserve dtype.
   1320     # For IntervalIndex this is to map integers to the Intervals they match to.
   1321     keyarr = ax.take(indexer)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py:1377, in _LocIndexer._validate_read_indexer(self, key, indexer, axis)
   1374     raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1376 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 1377 raise KeyError(f"{not_found} not in index")

KeyError: "['x_mitre_is_subtechnique'] not in index"
```
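One possible workaround, sketched under assumptions rather than offered as the project's fix: the traceback shows the normalized technique records no longer carry `x_mitre_is_subtechnique` (possibly due to changes in the upstream ATT&CK data or the library that fetches it, which is not confirmed here). If each record already has a `technique_id` at the point where `json_normalize` is called, the flag can be derived from the ID, since sub-technique IDs carry a dotted suffix (for example T1003.001). The helper name is hypothetical.

```python
def add_subtechnique_flag(records: list[dict]) -> list[dict]:
    """Fill in x_mitre_is_subtechnique where it is missing.

    Sub-technique IDs look like 'T1003.001'; parent techniques like 'T1003'.
    Records that already carry the field are left untouched.
    """
    for record in records:
        if "x_mitre_is_subtechnique" not in record:
            technique_id = str(record.get("technique_id") or "")
            record["x_mitre_is_subtechnique"] = "." in technique_id
    return records


records = [  # hypothetical technique records
    {"technique_id": "T1003"},
    {"technique_id": "T1003.001"},
    {"technique_id": "T1059", "x_mitre_is_subtechnique": False},
]
print([r["x_mitre_is_subtechnique"] for r in add_subtechnique_flag(records)])
```

Calling it on the list just before `json_normalize(attck)` in `get_attack_dataframe` would let the column selection succeed; whether to derive the column or drop it from the selection depends on how the notebook uses it afterwards.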
I find these new data sources very promising, as someone coming from the ATT&CK matrix world who is looking to reduce the gap between events and CTI.
This is more a design question than an issue:
The mappings file (https://github.com/mitre-attack/attack-datasources/blob/main/sub_techniques_research_reference/DataSources_Techniques_Mapping.yaml) needs to be updated following the release (https://twitter.com/MITREattack/status/1379864257697869828).