attack-datasources's People

Contributors

adampennin, alexiacrumpton, cyb3rpandah, fenr1r-g, glennhd, ikiril01, isabella-ma, jcwilliamsatmitre, jondricek, marcusbakker

attack-datasources's Issues

Questions about prior art and specific mappings

Thank you for this! We love ATT&CK, but the data sources sections have always felt a bit "loose" and left mostly as an exercise for the reader. The blog series and this repo prompted a couple questions I hoped you could discuss:

  1. Why not use/extend an existing schema for the abstractions?

    For example, STIX Cyber-observable Objects (SCO) cover some of the same ground, and link nicely with STIX-formatted intel ... like ATT&CK itself. The spec for the objects and their relationships reads a bit like your yaml data sources, and they can be reified with real data. Seems like STIX SCO is a natural fit, plus it has a well-thought-out relationship model, serialization format, extensions, etc.

    The Elastic Common Schema (ECS) is great too - it's permissively licensed, available for collaboration on github, has abstractions for many of the examples you provide (users, processes, etc.), and is already powering a lot of searches, visualizations, and analytics. We see it more in ops contexts, and it's perhaps a bit more flexible than SCOs. For example, you see it frequently merged with existing event data so you get the benefit of the abstractions without sacrificing the specificity of the original event.

    One of the beautiful things about ATT&CK is that it reduced bike-shedding over terminology and helped the infosec community focus. STIX and ECS have put in a lot of similar work, so it seems good to stand on the shoulders of giants. Naming things is hard, and it takes time to overcome intuitions (even at the top level: e.g., to my ear the phrase "data source" connotes the place you get the data, rather than an abstraction of the observable, but I'm just one guy 🙂).

    In any case, if ATT&CK leveraged one of these for the abstract entities, seems you could save energy for more ATT&CK-specific work like mapping those to (sub-)techniques or the actual concrete logs/artifacts.

  2. Are there plans to be more specific about mappings to artifacts?

    Presumably the idea is that (sub-)techniques would eventually use these new abstract data sources to replace or augment the text in the current "Data Sources" section. Unfortunately, unless I'm missing something, the proposed model doesn't seem to have a way to capture links to the concrete logs/artifacts.

    For example, the mapping example in figure 13 in part 2 of the blog series illustrates this last step:

    [Image: data source mapping example]

    That is, it shows links from the data components to specific event logs on the right, and that last step is really useful ... but it doesn't actually live anywhere in this repo's proposed approach. For many teams that last leg is the hard part! If we took your schema, for example, we could add something like:

    - name: Service
      definition: Information about software programs that run in the background ...
      example_artifacts:
        - {os: windows, artifact: Security Audit Event 4688}
        - {os: windows, artifact: Sysmon Event 1}
        - {os: windows, artifact: Prefetch file}
        - {os: linux, artifact: auditd SYSCALL event}
        - {os: linux, artifact: auditd EXECVE event}
        # etc

    Perhaps this is considered out of scope, but hopefully not; it'd be great to see something as authoritative as ATT&CK pointing folks to specific useful artifacts rather than just the abstraction. I'd love to hear your thoughts.
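
To make the suggestion concrete, here is a minimal Python sketch of how a consumer might query such a field. It assumes the YAML above has already been parsed into plain dictionaries. `example_artifacts` is the hypothetical field proposed above, not part of the repo's current schema, and `artifacts_by_os` is an illustrative helper name.

```python
from collections import defaultdict

# Parsed form of the proposed entry; "example_artifacts" is the hypothetical
# field suggested above, not part of the current schema.
data_component = {
    "name": "Service",
    "example_artifacts": [
        {"os": "windows", "artifact": "Security Audit Event 4688"},
        {"os": "windows", "artifact": "Sysmon Event 1"},
        {"os": "windows", "artifact": "Prefetch file"},
        {"os": "linux", "artifact": "auditd SYSCALL event"},
        {"os": "linux", "artifact": "auditd EXECVE event"},
    ],
}


def artifacts_by_os(component):
    """Group a component's example artifacts by operating system."""
    grouped = defaultdict(list)
    # Components without the optional field simply yield an empty mapping.
    for entry in component.get("example_artifacts", []):
        grouped[entry["os"]].append(entry["artifact"])
    return dict(grouped)


grouped = artifacts_by_os(data_component)
print(grouped["linux"])
```

A team could then look up only the artifacts relevant to the platforms it actually collects from.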

Thanks again for your hard work on this and all the related projects, I look forward to learning more!

Support NIDS and WAF via new 'network traffic content' relationship

Hello.

With the new data source structure, NIDS and WAF are no longer available. A new relationship could be created to improve the mapping to alert-related events:

  • Data source: Network Traffic
  • Data component: network traffic content
  • Relationship:
      - source_data_element: network traffic
        relationship: triggered
        target_data_element: alert

Thanks in advance.

Fix Definition for Module

definition: Information about module files such as executable, dynamic link library (dll), executable and linkiable format (elf), and Mach-o consisting of one or more classes and interfaces.

There's a minor typo (linkiable) in the definition text, and I think the overall definition can be modified a little for accuracy since PE/ELF/Mach-O encompass both executables and libraries. I would suggest:

Information about module files consisting of one or more classes and interfaces, such as portable executable (PE) format executables/dynamic link libraries (DLL), executable and linkable format (ELF) executables/shared libraries, and Mach-O format executables/shared libraries.

Permanent UUID or ID in attack-datasources

Thanks for the project. It's a very good idea.

  • Will you add a fixed/permanent UUID or ID in the sources?

It could be useful for many projects to reuse the same data source description or create relationships on a permanent basis (just like we do in CyCAT.org).
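
One possible way to get stable identifiers, sketched here in Python, is to derive a name-based UUID (version 5) from the data source and data component names. This is only an illustration of the idea: the namespace URL and the `datasource_id` helper are hypothetical, and nothing here reflects what the project actually plans to do.

```python
import uuid

# Hypothetical namespace for illustration only; a real project would publish
# one fixed namespace UUID and never change it.
DATASOURCE_NAMESPACE = uuid.uuid5(
    uuid.NAMESPACE_URL, "https://example.org/attack-datasources"
)


def datasource_id(data_source, data_component=""):
    """Derive a stable UUIDv5 from the normalized source and component names."""
    key = f"{data_source.strip().lower()}/{data_component.strip().lower()}"
    return uuid.uuid5(DATASOURCE_NAMESPACE, key)


print(datasource_id("Network Traffic", "Network Traffic Content"))
```

The trade-off is that a name-derived ID changes if the name is ever renamed, so an ID assigned once and stored in the source files would be more robust for long-lived relationships.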

Nonexistent data component references and duplicate sources

Hello,

While working with the new data sources you made, I found some duplicates and some non-referenced types in the technique descriptions.

In the following example:
[image]

  • File: File Content does not exist here
  • File: File Creation is referenced multiple times.

If this is not on purpose, I can try and find all occurrences and report them to you through a PR for missing data components and a list for duplicates.

Thanks,
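
A check like the one described could be sketched in Python as follows. The reference strings and the set of known components are illustrative values only; real input would come from the technique descriptions and the data source YAML files, and `audit_references` is a hypothetical helper name.

```python
from collections import Counter


def audit_references(references, known_components):
    """Return (duplicates, unknown) for one technique's data source references."""
    counts = Counter(references)
    # A reference listed more than once on the same technique is a duplicate.
    duplicates = sorted(ref for ref, n in counts.items() if n > 1)
    # A reference with no matching definition is reported as unknown.
    unknown = sorted(set(references) - set(known_components))
    return duplicates, unknown


# Illustrative values only, not taken from the actual repository contents.
known = {"File: File Creation", "File: File Modification"}
refs = ["File: File Creation", "File: File Content", "File: File Creation"]

duplicates, unknown = audit_references(refs, known)
print("duplicates:", duplicates)
print("unknown:", unknown)
```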

Small loading error

When I tried to load logon_session.yml, I got the error

mapping values are not allowed here
at line 22, column 72.

The offending line is
description: Data and information that describe a logon session (ex: logon type) and activity within it.

It can be solved by removing the space between the colon and logon
description: Data and information that describe a logon session (ex:logon type) and activity within it.

Best regards,
Sascha90
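
Removing the space works, but it changes the text. A more general fix is to quote the whole value, because YAML treats a colon followed by a space inside an unquoted scalar as a mapping indicator. The Python sketch below shows the idea with a hypothetical `quote_scalar` helper; it is a simplified check for illustration, not a full YAML emitter.

```python
def quote_scalar(value):
    """Double-quote a YAML scalar if it contains ': ' or ' #'.

    A simplified check for illustration; it does not cover every case in
    which YAML requires quoting.
    """
    if ": " in value or " #" in value:
        # Escape backslashes and double quotes so the quoted form stays valid.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return value


text = (
    "Data and information that describe a logon session "
    "(ex: logon type) and activity within it."
)
print(f"description: {quote_scalar(text)}")
```

With the value quoted, the original wording "(ex: logon type)" can stay as it is.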

KeyError: "['x_mitre_is_subtechnique'] not in index"

  • This error occurs in the get_attack_dataframe function of the notebook_functions.py file.

The commands below, run in the .ipynb file, reproduce this error:
attack = get_attack_dataframe()
attack.head()

Output:

KeyError Traceback (most recent call last)
Input In [32], in
----> 1 attack = get_attack_dataframe()
2 attack.head()

File D:\Dec-\attack-datasources-main\docs\scripts\notebook_functions.py:57, in get_attack_dataframe(matrix)
53 attck = json_normalize(attck)
54 # view available columns - my line
55 #print(attck.columns)
56 # selecting columns
---> 57 attck = attck[['technique_id','x_mitre_is_subtechnique','technique','tactic','platform','data_sources']]
59 # Splitting data_sources field
60 attck = attck.explode('data_sources').reset_index(drop=True)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py:3464, in DataFrame.__getitem__(self, key)
3462 if is_iterator(key):
3463 key = list(key)
-> 3464 indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
3466 # take() does not accept boolean indexers
3467 if getattr(indexer, "dtype", None) == bool:

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py:1314, in _LocIndexer._get_listlike_indexer(self, key, axis)
1311 else:
1312 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
-> 1314 self._validate_read_indexer(keyarr, indexer, axis)
1316 if needs_i8_conversion(ax.dtype) or isinstance(
1317 ax, (IntervalIndex, CategoricalIndex)
1318 ):
1319 # For CategoricalIndex take instead of reindex to preserve dtype.
1320 # For IntervalIndex this is to map integers to the Intervals they match to.
1321 keyarr = ax.take(indexer)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py:1377, in _LocIndexer._validate_read_indexer(self, key, indexer, axis)
1374 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1376 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 1377 raise KeyError(f"{not_found} not in index")

KeyError: "['x_mitre_is_subtechnique'] not in index"
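
The traceback shows that the normalized DataFrame has no `x_mitre_is_subtechnique` column at the point where line 57 selects it; why the column is absent is not established here. As a defensive sketch, the wanted column names can be split into those present and those missing before selecting. The `split_columns` helper is hypothetical and works on any iterable of names, so the example below runs without pandas; the column list stands in for `attck.columns`.

```python
def split_columns(available, wanted):
    """Split `wanted` into (present, missing), preserving the order of `wanted`."""
    available = set(available)
    present = [name for name in wanted if name in available]
    missing = [name for name in wanted if name not in available]
    return present, missing


wanted = ["technique_id", "x_mitre_is_subtechnique", "technique",
          "tactic", "platform", "data_sources"]

# Illustrative column list standing in for attck.columns.
columns = ["technique_id", "technique", "tactic", "platform", "data_sources"]

present, missing = split_columns(columns, wanted)
if missing:
    print(f"columns not found, skipping: {missing}")
# With a real DataFrame the selection would then be: attck = attck[present]
```

This avoids the exception but only hides the symptom; any later code that relies on the missing column would still need adjusting.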

Questions about data format

As someone coming from the ATT&CK matrix world and looking to reduce the gap between events and CTI, I find these new data sources very promising.

This is more of a design question than an issue:

  1. Why did you choose YAML over JSON, which is widely used in the cti repo?
  2. Why did you not follow the STIX format, which would make the data sources easier to connect to the (sub-)techniques in the same cti repo?
