logsanitizer

A log processing and sanitizing tool written in Python. For the best performance, consider running it under PyPy. It works on Python 2.7 and 3.3+.

This package does the following:

  • Reads a log file line by line.
  • Detects the dialect of every line individually.
  • Filters out lines based on predefined rules.
  • Classifies your logs (e.g. determines each event's name).
  • Reformats each line and writes it in a standardized format.
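
The steps above can be pictured with a minimal sketch. This is not logsanitizer's implementation. The `sanitize` function, the toy `parse_kv` dialect, and the `classify_kv` rule are all hypothetical names, used only to illustrate the flow of reading, dialect detection, filtering, classification, and output.

```python
def sanitize(lines, dialects):
    """Return one standardized row per accepted line (illustrative only)."""
    rows = []
    for raw in lines:                        # read line by line
        line = raw.rstrip("\n")
        for parse, classify in dialects:     # detect the dialect per line
            try:
                fields = parse(line)
            except ValueError:
                continue                     # not this dialect, try the next
            event = classify(fields)         # classify the line
            if event is not None:            # None means "filtered out"
                rows.append([fields["user"], event])  # standardized row
            break
    return rows


def parse_kv(line):
    # Hypothetical toy dialect: "<user> <path>".
    # Unpacking raises ValueError unless there are exactly two fields.
    user, path = line.split(" ")
    return {"user": user, "path": path}


def classify_kv(fields):
    # Hypothetical rules: drop health checks, name everything else.
    if fields["path"] == "/health":
        return None
    return "ListProjects" if fields["path"] == "/projects" else "OtherEvent"


rows = sanitize(
    ["alice /projects\n", "bob /health\n", "garbage\n"],
    [(parse_kv, classify_kv)],
)
```

Here only the first line produces a row: the second is filtered out by the rule, and the third does not fit any dialect.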

Installation

You can use pip or easy_install to install this package.

$ pip install logsanitizer

How to write a dialect

Since no built-in dialects are provided, you have to write your own to use this package.

Here is an example. Suppose your nginx writes lines in the following format, where json_data contains the user_id and the client_id.

:timestamp :service_name :method :url :status_code :json_data
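
As a quick illustration of this format, the sketch below splits one made-up line into its six fields. The sample values are assumptions invented for the example. Only `str.split` and `json.loads` from the standard library are used.

```python
import json

# Hypothetical sample line in the format described above.
sample = ('2016-03-01T12:00:00 web-production GET /projects 200 '
          '{"user_id": 7, "client_id": "abc def"}')

# maxsplit=5 yields at most six parts, so spaces inside the JSON survive.
parts = sample.split(' ', 5)
timestamp, service_name, method, url, status_code, json_data = parts

data = json.loads(json_data)
```

Note that `status_code` is still the string `'200'` after the split; convert it yourself if you need an integer.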

First, write a Python file that describes your dialect. Call this file nginx.py.

import json
import logsanitizer

class NginxLine(logsanitizer.Line):
    # Dialect's parse method: splits the line into fields.
    # maxsplit=5 keeps the trailing JSON intact even if it contains spaces.
    @classmethod
    def parse(cls, classificator, line):
        return cls(classificator, *line.split(' ', 5))

    # Constructor to load the line. If it fails, this
    # dialect is skipped and the next dialect in order
    # is tried instead.
    def __init__(self, classificator, timestamp, service_name, method, url, status_code, data):
        # Call the parent function, it will set the `classificator` variable.
        super(NginxLine, self).__init__(classificator)

        # Save the basic information, remember the variable names.
        self.timestamp = timestamp
        self.service_name = service_name
        self.method = method
        self.url = url
        self.status_code = status_code
        
        # Parse the JSON data.
        json_data = json.loads(data)

        # Save the parsed values as attributes.
        self.user_id = json_data.get('user_id')
        self.client_id = json_data.get('client_id')

        self.event = None # Will be classified later.

    # Checks whether the line belongs to this dialect.
    def is_type(self):
        return all([self.user_id, self.client_id])

    # Checks whether it is a production line.
    def is_production(self):
        return 'production' in self.service_name

    # Defines the standardized CSV format that will be 
    # generated with this dialect. You may use the same
    # output for every dialect you have.
    def get_row(self):
        return [self.user_id, self.client_id, self.event]

Now create a YAML configuration file for this dialect. Let's call this file nginx.yml. Keeping the parsing code and the rules in separate files means the format stays fixed while the rules remain easy to change.

# Header, please fill out these fields.
dialect: nginx
package: nginx.py
class: NginxLine

# Describe your classifications.
classifications:

# Use the `match_` prefix to add an equality condition between a
# variable and its value. In this case, it means the following:
# if line.url == '/projects':
#     line.event = 'ListProjects'
- match_url: /projects
  event: ListProjects

# Use the `pattern_` prefix to match a variable against a regular
# expression. In this case, it means the following:
# if re.match(r'^/projects/\d+$', line.url):
#     line.event = 'ViewProject'
- pattern_url: !regexp '^/projects/\d+$'
  event: ViewProject

# You can use `{\d}` (e.g. `{0}`) to refer to a captured group of
# the regular expression.
- pattern_url: !regexp '^/projects/\d+/(.*)$'
  event: ViewProject.Page.{0}

# You can combine multiple conditions. They are combined with AND,
# so every condition has to be fulfilled.
- pattern_url: !regexp '^/login/(.*)$'
  match_status_code: 302
  event: Login.Provider.{0}

# Ignores every following line where the `method` is `GET`.
- match_method: GET
  ignore: true

# No condition: it only sets the `event` value.
- event: OtherEvent

That's it. You can run it as follows:

$ cat input-file.log | logsanitizer nginx.yml > output-file.log 

You can also use multiple dialects.

$ cat input-file.log | logsanitizer nginx.yml otherservice.yml > output-file.log 
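
To make the meaning of the classification rules in the configuration more concrete, here is a rough sketch of how `match_` and `pattern_` conditions could be evaluated. It is not logsanitizer's own code. The `RULES` list, the `classify` function, and the "first matching rule wins" behaviour are assumptions made for illustration.

```python
import re

# Hypothetical in-memory form of the rules (not logsanitizer's loader).
RULES = [
    {"match_url": "/projects", "event": "ListProjects"},
    {"pattern_url": r"^/projects/\d+$", "event": "ViewProject"},
    {"pattern_url": r"^/projects/\d+/(.*)$", "event": "ViewProject.Page.{0}"},
    {"match_method": "GET", "ignore": True},
    {"event": "OtherEvent"},
]


def classify(fields, rules=RULES):
    """Return the event of the first rule whose conditions all hold.

    Returns None when the matching rule says `ignore` (assumed semantics).
    """
    for rule in rules:
        groups = ()
        matched = True
        for key, expected in rule.items():
            if key.startswith("match_"):
                # Equality condition on the named variable.
                matched = fields.get(key[len("match_"):]) == expected
            elif key.startswith("pattern_"):
                # Regular expression condition on the named variable.
                m = re.match(expected, fields.get(key[len("pattern_"):], ""))
                matched = m is not None
                if m:
                    groups = m.groups()
            if not matched:
                break  # AND relation: one failed condition rejects the rule
        if not matched:
            continue
        if rule.get("ignore"):
            return None
        # `{0}` in the event name is filled from the captured groups.
        return rule["event"].format(*groups)
    return None
```

Under these assumptions, `classify({"url": "/projects/42/settings", "method": "GET"})` falls through the first two rules and is named by the third, while a `GET` request to an unlisted URL is ignored.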

License

Copyright © 2016 Microsoft.

Distributed under the MIT License.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

logsanitizer's Issues

Odd Character In Output

Hey,

So today I've been using this repo quite heavily, and I've noticed that my return is producing a weird character which sits in between all of my characters in my log lines. All of my logs are in UTF-8, here it is: "�", any ideas?
