pourover's Introduction

Pourover: Log Parsing for Lizards

https://circleci.com/gh/zthart/pourover/tree/develop.svg?style=svg

Pourover is the only chemically-altered CEF Log Parsing library for Python, ideal for consumption by Lizard People.

the requests guy does it so maybe it'll work for me

Some stuff we can do:

from datetime import datetime
import pourover


# Create log objects from a file
log = pourover.parse_file('test.log')

# check the length pythonically - expose useful properties
if len(log) > 10:
    if log.has_syslog_prefix and log.start_time > datetime(year=2018, month=4, day=20):
        # perform some operations
        pass
    else:
        # perform some operations on a logfile that doesn't have syslog prefixes
        pass
else:
    # perform some operations on a really small log
    pass

# Find messages with a certain value in the header
search_results = log.search_headers('Specific Vendor')

for message in log:
    # iterate through each message in the log like you'd expect to be able to
    pass

# Logs can be indexed/sliced in the way you'd expect
first_message = log[0]
last_message = log[-1]

# Create message objects from a string
message = pourover.parse_line('Apr 15 22:11:20 testhost CEF:0|Test Vendor|Test Product|Test Version|100|Test Name|100|src=1.1.1.1 dst=1.1.1.2')

if message.has_syslog_prefix:
    if message.timestamp > datetime(year=2018, month=4, day=20):
        # perform an operation on logs from later than April 20th, 2018
        pass

if 'src' in message.extensions:
    # do something if it's got an extension called 'src'
    pass

if message.device_vendor == 'Some Vendor':
    # do something if the vendor is Some Vendor
    pass

# stick this message right onto that log (it'll even order the messages by timestamp - wow!)
log.append(message)

Installing 💻

To install Pourover, simply run

$ pip install pourover
✨🐊✨

Features 🐊

- 🐲 Create CEF-formatted log lines from parameters with support for extensions and a syslog prefix
- 🐲 Create useful line objects from a string, or an entire log object from a file
- 🐲 Iterable log objects to manipulate collections of logs at once
- 🐲 Parse lines with or without syslog prefixes or extensions with ease
- 🐲 Search logs for messages with specific headers or extensions
- 🐲 And more to come...

Contributing 🐉

🐛 Bugs:
Please create any issues you think I should check out! If there's a bug you spot or a function you think is acting up, please let me know. This project will have tests eventually, but until then I'm sure there will be issues sprouting up from time to time!
New Features/PRs:
The project is still in its infancy, so PRs might have a rough time getting merged in while the codebase is in a constant state of flux, but I'd be more than happy to have a discussion with you about a new feature you'd like to see!

Get in Touch 🐍

If you've found a bug or would like to make a feature request, please see the Contributing section above, thanks!

If you'd like to reach out, shoot me an email at [email protected].

pourover's People

Contributors

zthart

Watchers

James Cloos

pourover's Issues

Escape Special Characters Found in Parameters Passed in for Message Creation

The create_line() function currently adds the parameters it is passed to the message header blindly, without escaping them:

header = 'CEF:' + '|'.join(
        [str(version), dev_vendor, dev_product, dev_version, str(dev_event_class_id), name, str(severity)]
) + '|'

If, for example, one of these parameters contains a pipe (|) character, the message will be invalid as it can't be parsed. This also includes \r\n sequences or any other escape sequence that would break the CEF format.
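One possible shape for a fix, as a minimal sketch: escape each field before joining. The helper names escape_header_field() and build_header() are hypothetical and not part of pourover. Collapsing line breaks to a space is an assumption about what create_line() should do with them; raising an error would be just as reasonable.

```python
import re


def escape_header_field(value):
    """Escape one CEF header field (hypothetical helper, not pourover API)."""
    text = str(value)
    # Escape backslashes first so the backslash added for pipes is not doubled.
    text = text.replace("\\", "\\\\").replace("|", "\\|")
    # Assumption: line breaks do not belong in a header field, so collapse them.
    return re.sub(r"[\r\n]+", " ", text)


def build_header(version, dev_vendor, dev_product, dev_version,
                 dev_event_class_id, name, severity):
    """Join the seven header fields the same way as above, but escaped."""
    fields = [version, dev_vendor, dev_product, dev_version,
              dev_event_class_id, name, severity]
    return "CEF:" + "|".join(escape_header_field(f) for f in fields) + "|"


header = build_header(0, "Acme|Co", "Product", "1.0", 100, "Login", 5)
```

With this in place a vendor of 'Acme|Co' is written as 'Acme\|Co', so the header still has exactly seven pipe-delimited fields.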

Hosted Documentation

I'm generally pretty good at writing docstrings, but some hosted docs would make this almost a real project

Discrepancy between name and event in models

Not sure what I was on when I took my first sprint at making this - there are some issues with the CEFMessage object and how it describes the Name portion of the CEF Prefix.

According to the CEF event interoperability standard document I was most recently able to get my hands on, the Name field is for a human readable name of the event described in the message. In many places throughout the codebase, this field is referred to as the device name - this is incorrect.

In the replace() function of the CEFMessage object, there are two optional arguments (device_event and device_name) that are implemented as though they are separate fields in the prefix.

The discrepancy in naming should be resolved (the proper name for the field is just Name as far as the document mentioned above is concerned), but the intention for the data stored in that field should be made clear as the name of the event.

Type consistency in prefix

The Version and Severity fields in the CEF prefix should be integers - attempt to cast the parsed value, raise an exception on failure
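A rough sketch of what that cast could look like. Both CEFHeaderError and parse_int_field() are made-up names for illustration; the real exception type and where the check lives are still open.

```python
class CEFHeaderError(ValueError):
    """Hypothetical exception for a header field that fails validation."""


def parse_int_field(field_name, raw):
    """Cast a parsed header value to int, raising a descriptive error on failure."""
    try:
        return int(str(raw).strip())
    except ValueError as exc:
        raise CEFHeaderError(
            f"{field_name} must be an integer, got {raw!r}"
        ) from exc


version = parse_int_field("Version", "0")
severity = parse_int_field("Severity", "100")
```

Subclassing ValueError keeps existing `except ValueError` handlers working while still giving callers something more specific to catch.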

Fix FutureWarning in functions.py

tests/test_models.py::TestPourover::test_factory_availability
  /Users/zach/Documents/Projects/pourover/pourover/functions.py:62: FutureWarning: Possible nested set at position 29
    split_at_syslog_prefix = re.search(SYSLOG_SEP, header_values[0])

Showing up during tests. It isn't really causing issues, so it's fine for now, but it's something I should fix.
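For reference, this warning usually comes from a character set that starts with an unescaped '[' (for example '[[:]'), which the re module flags because nested sets may get a different meaning in a future version. The actual SYSLOG_SEP pattern isn't shown here, so the pattern below is a hypothetical stand-in; the sketch only shows that escaping the bracket keeps the match and silences the warning.

```python
import re
import warnings

# Hypothetical pattern, NOT pourover's SYSLOG_SEP: a set containing '[' and ':'.
# Writing it as '[[:]' is what makes re warn about a possible nested set;
# escaping the bracket states the intent explicitly.
ESCAPED_PATTERN = r"[\[:]"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pattern = re.compile(ESCAPED_PATTERN)

# Collect any FutureWarning raised while compiling the escaped pattern.
future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
match = pattern.search("host[123]: message")
```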

More robust prefix parsing

There are definitely some not-exactly-the-standard prefixes that won't parse correctly at the moment - maybe switch away from regex-only matching and do some "intelligent" parsing.
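One direction this could take, sketched under assumptions: find the 'CEF:' marker with plain string search, treat everything before it as the syslog prefix, then walk the rest character by character so escaped pipes are honoured. split_cef_line() is a hypothetical name, and accepting a header with no trailing pipe is just one example of a leniency the parser might allow.

```python
def split_cef_line(line):
    """Split a raw line into (syslog_prefix, header_fields, extension_text).

    Hypothetical sketch, not pourover's parser.
    """
    marker = line.find("CEF:")
    if marker == -1:
        raise ValueError("no CEF marker found")
    syslog_prefix = line[:marker].strip()
    body = line[marker + len("CEF:"):]

    fields, current, i = [], [], 0
    while i < len(body) and len(fields) < 7:
        char = body[i]
        if char == "\\" and i + 1 < len(body) and body[i + 1] in "|\\":
            # Keep the escaped character and drop the escaping backslash.
            current.append(body[i + 1])
            i += 2
            continue
        if char == "|":
            fields.append("".join(current))
            current = []
        else:
            current.append(char)
        i += 1

    # Assumption: tolerate a header whose final field has no trailing pipe.
    if len(fields) == 6 and current:
        fields.append("".join(current))
    if len(fields) != 7:
        raise ValueError("expected 7 pipe-delimited header fields")
    return syslog_prefix, fields, body[i:]


prefix, fields, extensions = split_cef_line(
    "Apr 15 22:11:20 testhost CEF:0|Test Vendor|Test Product|Test Version"
    "|100|Test Name|100|src=1.1.1.1 dst=1.1.1.2"
)
```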

Test Suite and CI

We need some tests so that I'm not just assuming things aren't breaking. CI would be cool too as like a stretch goal

Immutable Message Objects

Determine the usefulness of making CEFMessage objects immutable

Currently a message object created via its __init__() function will be empty, but both the headers and extensions can be modified with normal dict operations:

from datetime import datetime
import pourover

message = pourover.parse_line('Apr 15 22:11:20 testhost CEF:0|Test Vendor|Test Product|Test Version|100|Test Name|100|src=1.1.1.1 dst=1.1.1.2')

# this is perfectly fine, as the code is currently written
message.extensions['src'] = 'totally new value'
message.headers['Prefix'] = datetime(year=2010, month=4, day=20, hour=4, minute=20, second=59).strftime('%b %d %H:%M:%S') + ' host'

This almost seems like it defeats the purpose of logging - I don't really want someone to be able to change an object in-place, it destroys the integrity of the log. Someone could, in theory, add a new Message to the log in between previous values, but at least in that scenario the original messages are all there.

Perhaps a datetime-style replace() function that will return new CEFMessage objects with replaced values?
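A minimal sketch of how that might look, using a frozen dataclass and read-only mapping views. FrozenMessage is a hypothetical stand-in with only headers and extensions; the real CEFMessage has more to it, so treat this as an illustration of the idea rather than a drop-in design.

```python
import dataclasses
from types import MappingProxyType
from typing import Mapping


@dataclasses.dataclass(frozen=True)
class FrozenMessage:
    """Hypothetical immutable stand-in for CEFMessage."""

    headers: Mapping[str, str] = dataclasses.field(default_factory=dict)
    extensions: Mapping[str, str] = dataclasses.field(default_factory=dict)

    def __post_init__(self):
        # Copy into read-only views so callers cannot mutate the dicts in place.
        object.__setattr__(self, "headers", MappingProxyType(dict(self.headers)))
        object.__setattr__(self, "extensions", MappingProxyType(dict(self.extensions)))

    def replace(self, **changes):
        """Return a new message with the given fields swapped, like datetime.replace()."""
        return dataclasses.replace(self, **changes)


original = FrozenMessage(
    headers={"Name": "Test Name"},
    extensions={"src": "1.1.1.1", "dst": "1.1.1.2"},
)
# Build a modified copy; the original message is left untouched.
updated = original.replace(extensions={**original.extensions, "src": "10.0.0.1"})
```

Here `original.extensions['src'] = ...` raises a TypeError, and assigning to `original.headers` raises dataclasses.FrozenInstanceError, so the only way to get a changed message is to create a new one.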

Fix DeprecationWarnings for pipe escapes

pourover/functions.py:144
  /Users/zach/Documents/Projects/pourover/pourover/functions.py:144: DeprecationWarning: invalid escape sequence \|
    dev_vendor = dev_vendor.replace('|', '\|')

Investigate solution here
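The likely fix is small: '\|' is not a recognised escape sequence in a normal string literal, so the backslash needs doubling (or the literal needs an r prefix). A short sketch, with escape_pipes() as a hypothetical wrapper around the replace() call from the warning:

```python
def escape_pipes(value):
    """Prefix every pipe with a backslash (hypothetical helper)."""
    # '\\|' is a literal backslash followed by a pipe; r'\|' is equivalent.
    return value.replace("|", "\\|")


escaped = escape_pipes("Acme|Co")
```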

Intelligently Handle Year Assumption of CEFLine objects

Currently, the _parse_timestamp() function of the CEFLine class assumes that every line passed to it is from the current year. If the month and day parsed from the line are later than today's month and day, we should assume the log is from the past (the previous year) rather than the future.
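A sketch of that rule, assuming syslog-style stamps like 'Apr 15 22:11:20'. resolve_year() and its now parameter are hypothetical (the parameter is only there to make the behaviour testable), and %b relies on an English locale for the month names.

```python
from datetime import datetime

STAMP_FORMAT = "%Y %b %d %H:%M:%S"


def resolve_year(stamp, now=None):
    """Attach a year to a stamp such as 'Apr 15 22:11:20'.

    Hypothetical helper, not pourover's _parse_timestamp().
    """
    now = now or datetime.now()
    # Try the current year first, then step back. Going eight years back is
    # enough to reach a leap year for a 'Feb 29' stamp.
    for year in range(now.year, now.year - 9, -1):
        try:
            candidate = datetime.strptime(f"{year} {stamp}", STAMP_FORMAT)
        except ValueError:
            continue  # e.g. Feb 29 in a non-leap year
        if candidate <= now:
            return candidate
    raise ValueError(f"could not place {stamp!r} in a recent year")


today = datetime(2018, 4, 20, 12, 0, 0)
same_year = resolve_year("Apr 15 22:11:20", now=today)
previous_year = resolve_year("Dec 31 23:59:59", now=today)
```

Parsing the year together with the rest of the stamp (rather than parsing first and calling replace(year=...) afterwards) also avoids strptime rejecting 'Feb 29' against its default year of 1900.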
