fox-it / dissect.eventlog
A Dissect module implementing parsers for the Windows EVT, EVTX and WEVT log file formats.
License: GNU Affero General Public License v3.0
Currently, when parsing event log files, each entry is fully parsed into a nice structure using the bxml package. While this is very useful if you want full details of all events, I generally only want to filter for specific event IDs and then parse out the data belonging to them.
Normally this is fine, as event logs aren't that big. But I'm currently running into serious performance issues when trying to parse an event log file that is over 5 gigabytes in size. Parsing it takes multiple hours inside a virtual machine (2 processors with 2 cores each, and 16 GB of RAM). Take the following example:
import sys

from dissect.eventlog import Evtx

with open(sys.argv[1], "rb") as fd:
    # Stop after 100k records so the benchmark stays bounded.
    for i, record in enumerate(Evtx(fd)):
        if i == 100000:
            print('Enumerated 100k events, aborting.')
            break
Benchmarking this with time:
$ time python3 stress-test.py damn-boi-he-thicc.evtx
Enumerated 100k events, aborting.
real 44.02s
user 43.80s
sys 0.12s
cpu 99%
Using cProfile and snakeviz to check what is causing the performance hit:
$ python3 -m cProfile -o stress-test.perf stress-test.py damn-boi-he-thicc.evtx
Enumerated 100k events, aborting.
$ snakeviz stress-test.perf
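If snakeviz isn't available (for example on a headless VM), the same profile file can be read with the standard library's pstats module. A minimal sketch, where top_functions and busy_work are hypothetical helper names and the throwaway profile only exists to make the example runnable:

```python
import cProfile
import io
import os
import pstats
import tempfile


def top_functions(profile_path, limit=10):
    """Return a text report of the most expensive functions by cumulative time."""
    out = io.StringIO()
    stats = pstats.Stats(profile_path, stream=out)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


def busy_work():
    # Stand-in workload; in practice the profile comes from `python3 -m cProfile -o ...`.
    return sum(i * i for i in range(50_000))


# Demo: write a throwaway profile to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.perf")
    profiler = cProfile.Profile()
    profiler.runcall(busy_work)
    profiler.dump_stats(path)
    report = top_functions(path, limit=5)

print(report)
```

Calling top_functions("stress-test.perf") on the file generated above should list the same hotspots that snakeviz visualizes.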
From what I can tell so far, the _get_map_recursive method is to blame, as it makes an excessive number of calls into the following two code paths:
The dissect.eventlog.bxml.read_sid function calls dissect.cstruct.expression.ExpressionTokenizer.match, which takes up a lot of CPU time. In my case, I don't care much about SIDs until I have filtered down to the elements where I actually need them, so it might be a good idea to make this attribute lazy somehow.
Initializing the enum in the BxmlTemplateDescriptor constructor appears to be very slow. This could be sped up either by using a drop-in replacement for enums such as fastenum, or by switching to a faster type such as a dictionary.
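To illustrate the dictionary idea from the second point, here is a rough sketch that compares an enum lookup with a precomputed dict lookup. It uses the standard library's enum as a stand-in (the real code goes through dissect.cstruct, which may behave differently), and the Token class, its members, and the helper functions are illustrative assumptions rather than the library's actual definitions:

```python
import timeit
from enum import IntEnum


class Token(IntEnum):
    # Illustrative token values only, not taken from dissect.eventlog.
    END_OF_STREAM = 0x00
    OPEN_START_ELEMENT = 0x01
    CLOSE_START_ELEMENT = 0x02
    CLOSE_EMPTY_ELEMENT = 0x03
    CLOSE_ELEMENT = 0x04


# Built once at import time, so each lookup is a plain dict access.
TOKEN_NAMES = {member.value: member.name for member in Token}


def name_via_enum(value):
    return Token(value).name


def name_via_dict(value):
    return TOKEN_NAMES[value]


# Both approaches must agree before timing them.
assert name_via_enum(0x01) == name_via_dict(0x01)

for fn in (name_via_enum, name_via_dict):
    elapsed = timeit.timeit(lambda: fn(0x01), number=200_000)
    print(f"{fn.__name__}: {elapsed:.3f}s")
```

The absolute numbers will vary per machine, so treat the output as a relative comparison only.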
After looking through the profiling data, most of the issues appear to come from a large number of unnecessary calls into dissect.cstruct. Moreover, while fully parsing every event log entry is useful in forensic cases where you need all the data, in other cases it's rarely needed.
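As one possible way to avoid paying for SID formatting on every record, the raw bytes could be kept around and only converted on first access. The sketch below is not how dissect.eventlog does it: LazySid is a hypothetical class, and it decodes the binary SID layout with plain struct instead of dissect.cstruct.

```python
import struct
from functools import cached_property


class LazySid:
    """Hypothetical wrapper: stores raw SID bytes and formats them on demand."""

    def __init__(self, raw: bytes):
        self.raw = raw

    @cached_property
    def text(self) -> str:
        # Binary SID layout: revision (1 byte), sub-authority count (1 byte),
        # identifier authority (6 bytes, big-endian), then `count` 32-bit
        # little-endian sub-authorities.
        revision, count = struct.unpack_from("<BB", self.raw, 0)
        authority = int.from_bytes(self.raw[2:8], "big")
        subs = struct.unpack_from(f"<{count}I", self.raw, 8)
        return "-".join(["S", str(revision), str(authority), *map(str, subs)])

    def __str__(self) -> str:
        return self.text


# The well-known BUILTIN\Administrators SID.
raw = bytes.fromhex("01020000000000052000000020020000")
sid = LazySid(raw)
print(sid)  # S-1-5-32-544
```

Constructing LazySid costs almost nothing, and the string is computed at most once per instance thanks to cached_property.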
In my eyes, the only 'important' properties to read up front are EventID, TimeGenerated and TimeWritten. I could filter on those and then request that the matching records be parsed on demand.
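The filter-first, parse-later idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: LazyRecord, expensive_parse, and the three-field HEADER layout are hypothetical and do not match the real EVT or EVTX record formats.

```python
import struct
from functools import cached_property

# Hypothetical toy header: event_id, time_generated, time_written (3 x uint32).
HEADER = struct.Struct("<III")


class LazyRecord:
    """Reads the cheap header fields eagerly and defers the expensive body parse."""

    def __init__(self, raw, parse_body):
        self.event_id, self.time_generated, self.time_written = HEADER.unpack_from(raw, 0)
        self._raw = raw
        self._parse_body = parse_body

    @cached_property
    def body(self):
        # Only runs the first time .body is accessed on this record.
        return self._parse_body(self._raw[HEADER.size:])


def expensive_parse(payload):
    # Stand-in for the full BXML parse.
    return payload.decode("utf-8")


records = [
    LazyRecord(HEADER.pack(4624, 100, 101) + b"logon", expensive_parse),
    LazyRecord(HEADER.pack(4688, 200, 201) + b"process", expensive_parse),
]

# Filter on the header field first; only matching records get parsed.
wanted = [r.body for r in records if r.event_id == 4688]
print(wanted)  # ['process']
```

With this shape, records that don't match the filter never trigger the expensive parse at all.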
Okay that's all thank you for coming to my TED talk
Generate the Manifest file using both the WEVT and MUI parser
So we can use it to generate ETL manifests!