
Comments (2)

fryguy04 commented on June 23, 2024

Initial Slack conversation over here

One idea to start the conversation is to split the current replay.yml into two parts:

  1. config.yml, which contains the Splunk params (host/user/pass) + default index + update_timestamp
  2. dataset.yml, which would exist in each dataset's directory (adding info to the existing yml file) and contains name + source + sourcetype + index (if the user wants to override the default one in config.yml)

I propose we standardize the per-directory yml filename to dataset.yml so it can easily be found/recognized.
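
To illustrate why a fixed filename helps, here is a minimal sketch of the discovery step in Python. The function name `find_dataset_files` is hypothetical and not part of the current replay.py; the only assumption is that every dataset directory carries a file named exactly dataset.yml.

```python
from pathlib import Path


def find_dataset_files(root):
    """Return every dataset.yml below root, sorted for a stable ingest order.

    Hypothetical helper: it relies only on the proposed naming convention.
    """
    return sorted(p for p in Path(root).rglob("dataset.yml") if p.is_file())
```

With the current free-form *.yml names, the same search would need a per-file check to tell dataset descriptions apart from any other YAML file in the tree.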

Calling replay.py could look like this ...

python replay.py -h
       -c config.yml      Splunk configuration (host/user/pass/index/override timestamp) (required)
       -d <directory>     Directory to recursively search for dataset.yml to start ingesting (required)

       -i <index>         Override index in config.yml (optional)
       -t                 Override config.yml and update timestamps (optional)
       -s <seconds>       Sleep seconds in between directory ingests (allow Splunk to catch up on indexing) (optional)
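
The options above map fairly directly onto Python's argparse. This is only a sketch of how the parser could look, not the existing replay.py: the `dest` names and the default of 0 seconds for `-s` are assumptions, and `-h` is left to argparse's built-in help.

```python
import argparse


def build_parser():
    """Build a parser for the proposed replay.py options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="replay.py",
        description="Replay datasets into Splunk",
    )
    parser.add_argument("-c", dest="config", metavar="config.yml", required=True,
                        help="Splunk configuration (host/user/pass/index/override timestamp)")
    parser.add_argument("-d", dest="directory", metavar="<directory>", required=True,
                        help="directory to recursively search for dataset.yml")
    # None means "not given", so the index from config.yml stays in effect.
    parser.add_argument("-i", dest="index", metavar="<index>", default=None,
                        help="override index in config.yml")
    # The flag can only switch timestamp updates on; absence leaves config.yml in charge.
    parser.add_argument("-t", dest="update_timestamp", action="store_true",
                        help="override config.yml and update timestamps")
    # Assumed default: no pause between directory ingests.
    parser.add_argument("-s", dest="sleep", metavar="<seconds>", type=float, default=0.0,
                        help="sleep seconds in between directory ingests")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```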

Each directory's *.yml currently seems to have the sourcetypes, but they are not linked/ordered with the filenames. Here's an example:

author: Patrick Bareiss, Michael Haag
id: cc9b25d6-efc9-11eb-926b-550bf0943fbb
date: '2022-01-12'
description: 'Atomic Test Results: Successful Execution of test T1003.001-1 Windows
  Credential Editor Successful Execution of test T1003.001-2 Dump LSASS.exe Memory
  using ProcDump Return value unclear for test T1003.001-3 Dump LSASS.exe Memory using
  comsvcs.dll Successful Execution of test T1003.001-4 Dump LSASS.exe Memory using
  direct system calls and API unhooking Return value unclear for test T1003.001-6
  Offline Credential Theft With Mimikatz Return value unclear for test T1003.001-7
  LSASS read with pypykatz '
environment: attack_range
dataset:
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-powershell.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-security.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-sysmon.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-sysmon_creddump.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-system.log
sourcetypes:
- XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
- WinEventLog:Microsoft-Windows-PowerShell/Operational
- WinEventLog:System
- WinEventLog:Security
references:
- https://attack.mitre.org/techniques/T1003/001/
- https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1003.001/T1003.001.md
- https://github.com/splunk/security-content/blob/develop/tests/T1003_001.yml

As you can see, the 'dataset' files are in a different order than the 'sourcetypes' (and there are five files but only four sourcetypes). I propose we bring a formal linkage from the filename to the source/sourcetype (basically moving the replay_parameters logic from replay.yml to each directory's dataset.yml file) so it can be documented per dataset capture and replayed:

author: Patrick Bareiss, Michael Haag
id: cc9b25d6-efc9-11eb-926b-550bf0943fbb
date: '2022-01-12'
description: 'Atomic Test Results: Successful Execution of test T1003.001-1 Windows
  Credential Editor Successful Execution of test T1003.001-2 Dump LSASS.exe Memory
  using ProcDump Return value unclear for test T1003.001-3 Dump LSASS.exe Memory using
  comsvcs.dll Successful Execution of test T1003.001-4 Dump LSASS.exe Memory using
  direct system calls and API unhooking Return value unclear for test T1003.001-6
  Offline Credential Theft With Mimikatz Return value unclear for test T1003.001-7
  LSASS read with pypykatz '
environment: attack_range
references:
- https://attack.mitre.org/techniques/T1003/001/
- https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1003.001/T1003.001.md
- https://github.com/splunk/security-content/blob/develop/tests/T1003_001.yml

replay_parameters:
  - name: atomic_red_team/windows-powershell.log
    source: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
    sourcetype: xmlwineventlog
    notes: <optional>
  - name: windows-sysmon.log
    source: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
    sourcetype: xmlwineventlog
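
Once each file is tied to its own source/sourcetype, turning a parsed dataset.yml into ingest jobs becomes a simple loop. A rough sketch, assuming the YAML has already been loaded into a dict; `build_replay_jobs` and the per-entry `index` override are hypothetical names following the proposal above.

```python
from pathlib import PurePosixPath


def build_replay_jobs(dataset, dataset_dir, default_index):
    """Pair every replay_parameters entry with its file path and target index.

    dataset       -- dict parsed from a dataset.yml (assumed structure)
    dataset_dir   -- directory that contains the dataset.yml
    default_index -- index from config.yml, used when an entry sets none
    """
    jobs = []
    for entry in dataset.get("replay_parameters", []):
        jobs.append({
            "path": str(PurePosixPath(dataset_dir) / entry["name"]),
            "source": entry["source"],
            "sourcetype": entry["sourcetype"],
            "index": entry.get("index", default_index),
        })
    return jobs
```

Because source and sourcetype travel with the filename, the order of entries no longer matters, unlike the separate 'dataset' and 'sourcetypes' lists in the current files.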



josehelps commented on June 23, 2024

I really dig this proposal, although it will require us to refactor a few aspects of our testing pipeline to read from the new YAML structures. With this approach we can/should also create a spec for dataset.yml and run CI/CD validation for it on every PR, similar to the security_content repo here. Let me bring this back to the team and think through it, but on the surface it looks absolutely doable 😄. Thank you so much for spending the time to write this up, super useful!
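
As a rough idea of what such a validation step might check, here is a small Python sketch. It is not the security_content validator; the required keys are taken from the proposal above (name, source, sourcetype), and `validate_dataset` is a hypothetical name.

```python
REQUIRED_ENTRY_KEYS = ("name", "source", "sourcetype")


def validate_dataset(dataset):
    """Return a list of problems found in a parsed dataset.yml; empty means valid."""
    entries = dataset.get("replay_parameters")
    if not isinstance(entries, list) or not entries:
        return ["replay_parameters must be a non-empty list"]

    errors = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            errors.append(f"replay_parameters[{i}] must be a mapping")
            continue
        for key in REQUIRED_ENTRY_KEYS:
            value = entry.get(key)
            # Every required key must hold a non-blank string.
            if not isinstance(value, str) or not value.strip():
                errors.append(f"replay_parameters[{i}] is missing '{key}'")
    return errors
```

A CI job could run this over every dataset.yml in a PR and fail when any file returns a non-empty list.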
