dissect.eventlog's People

Contributors

martinvanhensbergen, miauwkeru, pyrco, schamper

Forkers

zawadidone

dissect.eventlog's Issues

Feature request: (optional) lazy loading for event log objects for better performance

Currently, when parsing event log files, each entry is fully parsed into a nice structure using the bxml package. While this is very useful if you want full details of all events, I generally only want to filter for specific event IDs and then parse out the data belonging to them.

Normally this is fine, as event logs aren't that big. But I'm currently running into serious performance issues when trying to parse an event log file of over 5 gigabytes, which takes multiple hours inside a virtual machine (2 processors with 2 cores each, and 16 GB of RAM). Take the following example:

Benchmarking

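import sys

from dissect.eventlog import Evtx

# Iterate over the records of the EVTX file given as the first argument
# and stop once 100,000 of them have been enumerated.
with open(sys.argv[1], "rb") as fd:
    for i, record in enumerate(Evtx(fd)):
        if i == 100000:
            print('Enumerated 100k events, aborting.')
            break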

Benchmarking this with time:

$ time python3 stress-test.py damn-boi-he-thicc.evtx 
Enumerated 100k events, aborting.

real    44.02s
user    43.80s
sys     0.12s
cpu     99%

Using cProfile and snakeviz to check what is causing the performance hit:

$ python3 -m cProfile -o stress-test.perf stress-test.py damn-boi-he-thicc.evtx 
Enumerated 100k events, aborting.

$ snakeviz stress-test.perf

[Two snakeviz screenshots of the profile; images not preserved]
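For anyone who wants to inspect the same profile without a browser, the standard library's pstats module can read the file written by cProfile. The following is a minimal sketch; the helper name top_functions and its limit parameter are made up for illustration, and stress-test.perf is the file produced by the command above.

```python
import io
import pstats


def top_functions(profile_path, limit=10):
    """Return a text report of the most expensive calls in a cProfile dump.

    profile_path is a file written by `python3 -m cProfile -o <path> ...`.
    The report is sorted by cumulative time, so functions that dominate
    the run (including time spent in their callees) come first.
    """
    buffer = io.StringIO()
    stats = pstats.Stats(profile_path, stream=buffer)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


# Example usage (assumes the profile from the command above exists):
# print(top_functions("stress-test.perf", limit=15))
```

Sorting by cumulative time tends to surface the same hot paths that snakeviz shows at the top of its icicle view.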

Causes

As far as I can quickly tell, the _get_map_recursive method is to blame, as it makes excessive calls to the following two pieces of code:

  • The dissect.eventlog.bxml.read_sid function calls dissect.cstruct.expression.ExpressionTokenizer.match, which takes up a lot of CPU time. In my case, I don't care much about SIDs until I have filtered down to the elements where I actually need them, so it might be a good idea to make this attribute lazy somehow.

  • Initializing the enum in the BxmlTemplateDescriptor constructor appears to be very slow. This might be improved by either using a drop-in replacement for enums, such as fastenum, or just using another, faster type like a dictionary.
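To illustrate what a lazy SID attribute could look like, here is a minimal sketch. It is not how dissect.eventlog works: decode_sid and LazySidValue are hypothetical names, and the decoder uses the standard struct module instead of dissect.cstruct. The idea is simply to keep the raw bytes around and decode them on first access only.

```python
import struct
from functools import cached_property


def decode_sid(data):
    """Decode a binary Windows SID into its S-R-A-S... string form.

    Layout: 1 byte revision, 1 byte sub-authority count, 6 bytes
    identifier authority (big-endian), then `count` little-endian
    32-bit sub-authorities.
    """
    revision, count = struct.unpack_from("<BB", data, 0)
    authority = int.from_bytes(data[2:8], "big")
    sub_authorities = struct.unpack_from(f"<{count}I", data, 8)
    parts = ["S", str(revision), str(authority), *map(str, sub_authorities)]
    return "-".join(parts)


class LazySidValue:
    """Hypothetical wrapper: stores raw bytes, decodes on first access."""

    def __init__(self, raw):
        self.raw = raw

    @cached_property
    def value(self):
        # Runs once; the result is then cached on the instance.
        return decode_sid(self.raw)


sid = LazySidValue(bytes.fromhex("010100000000000512000000"))
print(sid.value)  # S-1-5-18
```

Records that are filtered out before anyone reads value never pay the decoding cost at all.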

Conclusions

After looking through the performance data, most of the issues appear to come from many unnecessary calls to dissect.cstruct. Moreover, while fully parsing each event is useful in forensic cases where you need all the data, in other cases it is rarely needed.

In my eyes, the only 'important' properties to extract up front would be the EventID, TimeGenerated and TimeWritten, which I could filter on and then request that the matching records be parsed on demand.
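The filter-first, parse-later idea could be sketched roughly as follows. Everything here is hypothetical: LazyRecord, filter_by_event_id and the parser callable are illustrative names, not part of dissect.eventlog, and how the cheap fields would be obtained without a full parse is left open.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, Optional, Set


@dataclass
class LazyRecord:
    """Hypothetical record: cheap fields up front, full parse on demand."""

    event_id: int
    time_written: int
    raw: bytes
    parser: Callable[[bytes], dict]
    _parsed: Optional[dict] = field(default=None, init=False, repr=False)

    def parse(self) -> dict:
        # Run the expensive parser at most once per record.
        if self._parsed is None:
            self._parsed = self.parser(self.raw)
        return self._parsed


def filter_by_event_id(
    records: Iterable[LazyRecord], wanted: Set[int]
) -> Iterator[LazyRecord]:
    """Yield only records whose event ID is wanted, without parsing any."""
    return (record for record in records if record.event_id in wanted)
```

With this shape, a caller would iterate over filter_by_event_id(records, {4624}) and call parse() only on the hits, so the cost of full parsing scales with the number of matches instead of the size of the log.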

Okay that's all thank you for coming to my TED talk

Manifest generator

Generate the Manifest file using both the WEVT and MUI parsers

So we can use it to generate ETL manifests!
