Giter Club home page Giter Club logo

loglizer's Introduction

loglizer

A Python toolkit for anomaly detection via log analysis

Loglizer is an open-source python tool for automatic log-based anomaly detection with machine learning techniques. In this project, six popular anomaly detection methods are implemented and evaluated on two public datasets. More details (e.g., experimental results, findings) can be found in our paper.

Log Data

We collected a number of log datasets in loghub. Please send a request to acquire the data through the link.


Background

Anomaly detection plays an important role in management of modern large-scale distributed systems. Logs, which record system runtime information, are widely used for anomaly detection. Traditionally, developers (or operators) often inspect the logs manually with keyword search and rule matching. The increasing scale and complexity of modern systems, however, make the volume of logs explode, which renders the infeasibility of manual inspection. To reduce manual effort, many anomaly detection methods based on automated log analysis are proposed. ย  In our paper, we provide a detailed review and evaluation of six state-of-the-art log-based anomaly detection methods, including three supervised methods and three unsupervised methods, and also release an open-source toolkit allowing ease of reuse. These methods have been evaluated on two publicly-available production log datasets.

Overview of framework

1. Log collection: Logs are generated and collected by system and sofwares during running, which includes distributed systems (e.g., Spark, Hadoop), standalone systems (e.g., Windows, Mac OS) and softwares (e.g., Zookeeper).
2. Log Parsing: Raw Logs contain too much runtime information (e.g., IP address, file name). These variable information are often removed after log parsing as they are useless for debugging. After parsing, raw logs become log events, which are abstraction of raw logs. Details are given in our previous work: Logparser
3. Feature Extraction: Logs are grouped into log sequences via Task ID or time, and these log sequences are vectorized and weighted.
4. Anomaly Detection: Some machine learning models are trained and applied to detect anomalies.

The framework is illustrated as follows:

Framework of Anomaly Detection

In our toolbox, we mainly focus on Feature Extraction and Anomaly Detection, while Log Collection and Log Parsing are out of the scope of this project. To be more specific, the input is the parsed log events, and the output is whether it is anomaly for each log sequence.

Anomaly detection methods

Paper

Our paper is published on the 27th International Symposium on Software Reliability Engineering (ISSRE 2016), Ottawa, Canada. The information can be found here:
Title: Experience Report: System Log Analysis for Anomaly Detection
Authors: Shilin He, Jieming Zhu, Pinjia He, and Michael R. Lyu
Paper link: paper

Bibtex:
@Inproceedings{He16ISSRE,
title={Experience Report: System Log Analysis for Anomaly Detection},
author={He, S. and Zhu, J. and He, P. and Lyu, M. R.},
booktitle={ISSRE'16: Proc. of the 27th International Symposium on Software Reliability Engineering}
}

Question:

For easy maintenance, please raise an issue at ISSUE

History:

  • May 14, 2016: initial commit
  • Sep 21, 2017: update code and ReadME
  • March 21, 2018: rewrite most of the code, re-organize the project structure, and add detailed comments

loglizer's People

Contributors

shilinhe avatar pinjiahe avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.