Today a user cannot point to a folder and ingest all datasets with the tool.

Initial Slack conversation: https://splunk-usergroups.slack.com/archives/CDNHXVBGS/p1649691877572

Add bulk replay capabilities to replay.py about attack_data HOT 2 OPEN

splunk commented on June 23, 2024

Add bulk replay capabilities to replay.py

from attack_data.

Comments (2)

fryguy04 commented on June 23, 2024

Initial Slack conversation over here

One idea to start the conversation is to split the current replay.yml into two parts ...

config.yml which contains the Splunk params (host/user/pass) + default index + update_timestamp
dataset.yml which would exist in each dataset's directory (adding info to the existing yml file) and contains name + source + sourcetype + index (if the user wants to override the default one in config.yml)

Propose we standardize the per-directory yml filename to dataset.yml so it can easily be found/recognized.
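To make the split concrete, here is a minimal sketch of how one dataset.yml entry could be combined with the defaults from config.yml. The function name `resolve_entry` and the dict layout are assumptions for illustration, not existing replay.py code, and both files are assumed to be parsed into plain dicts already.

```python
# Hypothetical helper: the name and dict layout are assumptions, not existing replay.py code.
def resolve_entry(config, entry):
    """Combine one dataset.yml entry with the defaults from config.yml."""
    return {
        "name": entry["name"],
        "source": entry["source"],
        "sourcetype": entry["sourcetype"],
        # The per-dataset index wins; otherwise fall back to the default in config.yml.
        "index": entry.get("index") or config["index"],
    }


# Example values only; host/user/pass are left out because they are not needed here.
config = {"index": "main", "update_timestamp": False}
entry = {
    "name": "windows-sysmon.log",
    "source": "XmlWinEventLog:Microsoft-Windows-Sysmon/Operational",
    "sourcetype": "xmlwineventlog",
}
print(resolve_entry(config, entry))
```

Keeping the override logic in one small function would make it easy to change the precedence later if the team prefers a different rule.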

Calling replay.py could look like this ...

python replay.py -h
       -c config.yml     Splunk configuration (host/user/pass/index/override timestamp) (required)
       -d <directory>    Directory to recursively search for dataset.yml to start ingesting (required)

       -i <index>        Override index in config.yml (optional)
       -t                Override config.yml and update timestamps (optional)
       -s <seconds>      Sleep seconds in between directory ingests (allow Splunk to catch up on indexing) (optional)
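The options above could be wired up with argparse roughly like this. It is only a sketch: the `dest` names (`config`, `directory`, `index`, `update_timestamp`, `sleep`) are assumptions, and `-h` is left to argparse's built-in help.

```python
import argparse


def build_parser():
    """Build a parser for the proposed bulk-replay options (sketch, not the real replay.py)."""
    parser = argparse.ArgumentParser(
        prog="replay.py",
        description="Bulk replay of attack_data datasets",
    )
    parser.add_argument("-c", dest="config", metavar="config.yml", required=True,
                        help="Splunk configuration (host/user/pass/index/override timestamp)")
    parser.add_argument("-d", dest="directory", metavar="<directory>", required=True,
                        help="Directory to recursively search for dataset.yml")
    parser.add_argument("-i", dest="index", metavar="<index>", default=None,
                        help="Override index in config.yml")
    parser.add_argument("-t", dest="update_timestamp", action="store_true",
                        help="Override config.yml and update timestamps")
    parser.add_argument("-s", dest="sleep", metavar="<seconds>", type=float, default=0.0,
                        help="Sleep seconds in between directory ingests")
    return parser


# Example invocation with made-up paths.
args = build_parser().parse_args(["-c", "config.yml", "-d", "datasets", "-s", "5"])
print(args.config, args.directory, args.sleep, args.update_timestamp)
```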

Each directory's *.yml currently seems to have the sourcetypes, but they are not linked to or ordered with the filenames. Here's an example:

author: Patrick Bareiss, Michael Haag
id: cc9b25d6-efc9-11eb-926b-550bf0943fbb
date: '2022-01-12'
description: 'Atomic Test Results: Successful Execution of test T1003.001-1 Windows
  Credential Editor Successful Execution of test T1003.001-2 Dump LSASS.exe Memory
  using ProcDump Return value unclear for test T1003.001-3 Dump LSASS.exe Memory using
  comsvcs.dll Successful Execution of test T1003.001-4 Dump LSASS.exe Memory using
  direct system calls and API unhooking Return value unclear for test T1003.001-6
  Offline Credential Theft With Mimikatz Return value unclear for test T1003.001-7
  LSASS read with pypykatz '
environment: attack_range
dataset:
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-powershell.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-security.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-sysmon.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-sysmon_creddump.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-system.log
sourcetypes:
- XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
- WinEventLog:Microsoft-Windows-PowerShell/Operational
- WinEventLog:System
- WinEventLog:Security
references:
- https://attack.mitre.org/techniques/T1003/001/
- https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1003.001/T1003.001.md
- https://github.com/splunk/security-content/blob/develop/tests/T1003_001.yml

As you can see, the 'dataset' files are in a different order than the 'sourcetypes'. Propose we add a formal linkage from the filename to the source/sourcetype (basically moving the replay_parameters logic from replay.yml to each directory's dataset.yml file so it can be documented per dataset capture and replayed).

author: Patrick Bareiss, Michael Haag
id: cc9b25d6-efc9-11eb-926b-550bf0943fbb
date: '2022-01-12'
description: 'Atomic Test Results: Successful Execution of test T1003.001-1 Windows
  Credential Editor Successful Execution of test T1003.001-2 Dump LSASS.exe Memory
  using ProcDump Return value unclear for test T1003.001-3 Dump LSASS.exe Memory using
  comsvcs.dll Successful Execution of test T1003.001-4 Dump LSASS.exe Memory using
  direct system calls and API unhooking Return value unclear for test T1003.001-6
  Offline Credential Theft With Mimikatz Return value unclear for test T1003.001-7
  LSASS read with pypykatz '
environment: attack_range
references:
- https://attack.mitre.org/techniques/T1003/001/
- https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1003.001/T1003.001.md
- https://github.com/splunk/security-content/blob/develop/tests/T1003_001.yml

replay_parameters:
  - name: atomic_red_team/windows-powershell.log
    source: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
    sourcetype: xmlwineventlog
    notes: <optional>
  - name: windows-sysmon.log
    source: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
    sourcetype: xmlwineventlog
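With a dataset.yml in every directory, the bulk part could be sketched as a recursive search plus an optional pause between directories. The names `find_dataset_files` and `replay_all` and the `ingest` callback are hypothetical; parsing the YAML and sending events to Splunk are deliberately left out of this sketch.

```python
import time
from pathlib import Path


def find_dataset_files(root):
    """Return every dataset.yml below root, sorted so runs are repeatable."""
    return sorted(Path(root).rglob("dataset.yml"))


def replay_all(root, ingest, sleep_seconds=0):
    """Call ingest(path) for each dataset.yml found and return how many were handled.

    ingest is a hypothetical callback that would parse the file and replay
    its replay_parameters entries; it is not part of the current replay.py.
    """
    files = find_dataset_files(root)
    for position, path in enumerate(files):
        ingest(path)
        # Pause between directories, but not after the last one.
        if sleep_seconds and position < len(files) - 1:
            time.sleep(sleep_seconds)
    return len(files)
```

Passing the ingest step in as a callback keeps the directory walk testable without a running Splunk instance.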


josehelps commented on June 23, 2024

I really dig this proposal, although it will require us to refactor a few aspects of our testing pipeline to read from the new YAML structures. With this approach we can/should also create a spec for dataset.yml and run CI/CD validation for it on every PR, similar to what the security_content repo does here. Let me bring this back to the team and think it through, but on the surface it looks absolutely doable 😄. Thank you so much for spending the time to write this up, super useful!
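A validation step for CI could start as small as the sketch below. It assumes dataset.yml has already been parsed into a dict and that name, source and sourcetype are required per entry; that is a guess based on the proposal above, not an agreed spec, and it is not how security_content does its validation.

```python
# Assumed required keys per replay entry; index and notes are treated as optional.
REQUIRED_KEYS = ("name", "source", "sourcetype")


def validate_dataset(doc):
    """Return a list of problems in one parsed dataset.yml (an empty list means it passed)."""
    params = doc.get("replay_parameters")
    if not isinstance(params, list) or not params:
        return ["replay_parameters must be a non-empty list"]
    errors = []
    for position, entry in enumerate(params):
        if not isinstance(entry, dict):
            errors.append(f"replay_parameters[{position}] must be a mapping")
            continue
        for key in REQUIRED_KEYS:
            if not entry.get(key):
                errors.append(f"replay_parameters[{position}] is missing '{key}'")
    return errors
```

A CI job could run this over every dataset.yml in a PR and fail when any file returns a non-empty list.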
