
out-of-memory's Introduction

Out-Of-Memory Investigation .py

Python 2.7.x - 3.x compatible

This Python script calculates the estimated RSS (RAM) value of each service at the time the kernel invoked the OOM killer.

When an OOM incident occurs, the system writes the estimated RSS value of each service to its system log. From this information the script calculates how much RAM the services were "theoretically" trying to use, the total RAM value of all services, and how much RAM your system actually has to offer these services. This allows further investigation into the memory usage of the top "offending" service(s).
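
As a rough illustration of that calculation, the sketch below sums RSS per service name from the process table the kernel prints during an OOM kill. It is not taken from oom_investigate.py: the function name `rss_by_service`, the assumed column order (`[ pid ] uid tgid total_vm rss ... name`) and the 4 KiB page size are all assumptions, and the real table layout varies between kernel versions.

```python
import re
from collections import defaultdict

PAGE_SIZE_KB = 4  # assumption: 4 KiB pages, the common default on x86

# Assumed row layout: "[ pid ] uid tgid total_vm rss ... name"
ROW = re.compile(r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+.*?(\S+)\s*$")


def rss_by_service(lines):
    """Return {service name: estimated RSS in MB} for one OOM process table."""
    pages = defaultdict(int)
    for line in lines:
        match = ROW.search(line)
        if match:
            # Group 5 is the rss column (in pages), group 6 the process name
            pages[match.group(6)] += int(match.group(5))
    return dict((name, count * PAGE_SIZE_KB / 1024.0) for name, count in pages.items())


sample = [
    "kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name",
    "kernel: [ 1234]    27  1234   400000   128000     300        0             0 mysqld",
    "kernel: [ 1240]    27  1240   400000   128000     300        0             0 mysqld",
    "kernel: [ 2001]    48  2001    90000    25600      80        0             0 httpd",
]
usage = rss_by_service(sample)
total_mb = sum(usage.values())
```

Comparing `total_mb` with the RAM installed on the system gives the "theoretical" overcommit described above.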

The script can check:

  • Default system log
  • Journald
  • Dmesg

This script is designed to run on the following systems (with Python 2.7+ installed):

  • Ubuntu
  • Debian
  • RHEL
  • CentOS
  • RockyLinux
  • AlmaLinux
  • OSX

More systems can be added. Please request here if you want something specific.


Installation/Running

There are 3 methods of running the script:

  • Via git:

    • git clone git@github.com:LukeShirnia/out-of-memory.git;
    • cd out-of-memory;
    • python oom_investigate.py
  • Installation via pip:

    • git clone git@github.com:LukeShirnia/out-of-memory.git;
    • cd out-of-memory;
    • pip install .
    • oom_investigate
  • curl https://raw.githubusercontent.com/LukeShirnia/out-of-memory/v2/oom_investigate.py | python

    • WARNING: Piping a remote script straight into python isn't safe. If you REALLY want to curl, grab the latest commit and curl that instead of the branch.

Usage:

usage.png output

Output Information:

The output from this script can be broken down into 4 main sections:

  • System info
  • Log file info
  • Warning overview
  • Incident overview

Section 1 - System Info

This section gives you a quick breakdown of the system the script has been executed on.

systeminfo.png output

Section 2 - Log File Info

Information about which log file the script is using
logfileinfolog.png output
logfileinfo.png output
logfileinfodmesg.png output

Section 3 - Warning Overview

If at least one OOM incident is detected, the script produces a summary overview once it has finished processing the log.
The summary attempts to report the following:

  • How many incidents have occurred in the log
  • Which services were killed and how many times
  • Which incident was the worst and how much RAM was consumed at the time of that incident

warningoverview.png output

Section 4 - Incident Overview

By default the script shows the first 5 OOM incidents. You can show more or fewer, and you can show the incidents in reverse order with the --reverse flag.

exampleincident.png output

If there are a large number of OOM incidents, the incident that consumed the most RAM probably won't appear in the script output unless --all is specified.
To make sure the worst incident is always shown, a separate section is added at the beginning of the output:

highestincident.png output

out-of-memory's Issues

FR: Add override for 300MB file limit

Currently the script does not run when the log file is larger than 300MB.
Add an option like --override to bypass this limit if the device has a significant amount of RAM.

Note: The limit was put in place for small devices. Maybe add a check for free RAM and base the file size limit on that.
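
One way the request could be approached is sketched below. Nothing here is from the script itself: `available_ram_bytes`, `file_allowed` and the `ram_fraction` value of 0.25 are hypothetical, and the sketch assumes a Linux /proc/meminfo that exposes a MemAvailable line.

```python
DEFAULT_LIMIT_BYTES = 300 * 1024 * 1024  # the 300MB limit described above


def available_ram_bytes(meminfo_text):
    """Return MemAvailable from /proc/meminfo-style text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024  # the value is reported in kB
    return None


def file_allowed(size_bytes, override=False, available_bytes=None, ram_fraction=0.25):
    """Decide whether a log file of size_bytes should be processed."""
    if override:
        # Hypothetical --override flag: skip the size check entirely
        return True
    limit = DEFAULT_LIMIT_BYTES
    if available_bytes is not None:
        # Raise (never lower) the limit on machines with plenty of free RAM
        limit = max(limit, int(available_bytes * ram_fraction))
    return size_bytes <= limit
```

On a real system the text passed to `available_ram_bytes` would come from reading /proc/meminfo, and the file size from `os.path.getsize`. Taking the maximum keeps the existing 300MB behaviour on small devices while larger machines get a proportionally higher limit.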

Add journalctl compatibility

Fedora 28 (and soon RHEL 8) will require journalctl compatibility, as they no longer log to /var/log/messages.
Add this functionality.

Ctrl+c, ctrl+d error (--quick)

[root@lga-db ~]# monkey.py -o -- -q
Downloading oom tool ...
----------------------------------------
      _____ _____ _____ 
     |     |     |     |
     |  |  |  |  | | | |
     |_____|_____|_|_|_|
     Out Of Memory Analyser

Disclaimer:
If system OOMs too viciously, there may be nothing logged!
Do NOT take this script as FACT, investigate further
----------------------------------------
Checking other logs, select an option:
Option: 1  /var/log/messages          - Occurrences: 1
           /var/log/messages-20170806.gz - Occurrences: 0
Option: 2  /var/log/messages-20170814.gz - Occurrences: 1
Option: 3  /var/log/messages-20170820.gz - Occurrences: 1
           /var/log/messages-20170827.gz - Occurrences: 0

Which file should we check next?
Select an option number between 1 and 3: ^C
Traceback (most recent call last):
  File "/home/rack/monkeys/monkey-2517a72ed6.py", line 1479, in <module>
    main()
  File "/home/rack/monkeys/monkey-2517a72ed6.py", line 1437, in main
    external_script_action(opt, args)
  File "/home/rack/monkeys/monkey-2517a72ed6.py", line 1402, in external_script_action
    p.communicate(script)
  File "/usr/lib64/python2.7/subprocess.py", line 797, in communicate
    self.wait()
  File "/usr/lib64/python2.7/subprocess.py", line 1376, in wait
    pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
  File "/usr/lib64/python2.7/subprocess.py", line 478, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt
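
A minimal sketch of guarding the option prompt against Ctrl+C and Ctrl+D follows. The function name `ask_option` and its `read` parameter are hypothetical, not the script's actual code. Note that the traceback above is raised in the calling wrapper while it waits on the subprocess, so that wrapper would need a similar guard around `p.communicate()`.

```python
try:
    read_line = raw_input  # Python 2
except NameError:
    read_line = input  # Python 3


def ask_option(max_option, read=read_line):
    """Prompt until a valid option is chosen; return None on Ctrl+C or Ctrl+D."""
    prompt = "Select an option number between 1 and {0}: ".format(max_option)
    while True:
        try:
            answer = read(prompt)
        except (KeyboardInterrupt, EOFError):
            # Ctrl+C raises KeyboardInterrupt; Ctrl+D (end of input) raises EOFError
            return None
        answer = answer.strip()
        if answer.isdigit() and 1 <= int(answer) <= max_option:
            return int(answer)
```

A caller can treat a `None` result as "user cancelled" and exit cleanly instead of printing a traceback.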

Different Return Codes

Investigate implementing different return codes depending on what the script finds.

Example:

Run script with the following output:

  1. NO files have OOM issues - return code 1
  2. OOM Found in 1 file - return code 2
  3. Loads of OOM found (> 5) - return code 3

This allows automation tools to implement different "time saved" values depending on the script's return code.
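
The mapping proposed above could be sketched as follows. The function name `exit_code` is hypothetical, and reading "> 5" as a count of incidents rather than files is an assumption, since the issue does not say which is meant.

```python
def exit_code(files_with_oom, total_incidents):
    """Map findings to the return codes proposed in this issue."""
    if total_incidents > 5:
        return 3  # loads of OOM incidents found (assumed to mean incidents, not files)
    if files_with_oom >= 1:
        return 2  # OOM found in at least one file
    return 1  # no files have OOM issues
```

The script would then finish with `sys.exit(exit_code(...))` so that automation tools can branch on the result.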

oom in dmesg but not system logs

Dmesg reports OOM issues but the script reports nothing.

This is because dmesg is reporting very old OOM incidents.
The system has rotated and purged all logs since then, so nothing is left in the log files.

Grab the oldest date (first line of the oldest compressed file) and print a message explaining that the dmesg entry is old and that there have been no incidents since $date.

Report # of processes

Add functionality to report the total number of processes recorded when the system OOMs.

'NoneType' object has no attribute 'endswith'

Here's the output I'm getting on Fedora 25:

Python 2.7.13

~/out-of-memory $ python oom-investigate.py 
----------------------------------------
      _____ _____ _____ 
     |     |     |     |
     |  |  |  |  | | | |
     |_____|_____|_|_|_|
     Out Of Memory Analyser

Disclaimer:
If system OOMs too viciously, there may be nothing logged!
Do NOT take this script as FACT, investigate further
----------------------------------------
Unsupported OS

Error:
'NoneType' object has no attribute 'endswith'

----------------------------------------

CentOS/RHEL 5 - No information provided

The script doesn't provide any information when there is an OOM incident on CentOS/RHEL 5.

Although these systems do not log in the same manner, more information can be provided.
Update the script to provide a little more information.

RHEL 5 - compatibility issue

File "<stdin>", line 80
    with open("/proc/meminfo", "r") as meminfo:
            ^
SyntaxError: invalid syntax

Note: RHEL 5 and Python 2.4.x are EOL. I will not go out of my way to accommodate EOL OSes and Python versions.

Date objects

Fix date handling so that incidents are sorted chronologically rather than alphabetically, as they are currently.
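
A minimal sketch of the difference, assuming traditional syslog timestamps such as "Aug 14 03:12:01" and an English locale for the month names. Syslog lines of this form carry no year, so the `year` argument is an assumption the caller has to supply, and the helper names are hypothetical.

```python
from datetime import datetime


def parse_stamp(stamp, year):
    """Turn a syslog-style timestamp into a datetime using the supplied year."""
    # Whitespace in the format matches any run of spaces, so "Jan  5" parses too
    return datetime.strptime("{0} {1}".format(year, stamp), "%Y %b %d %H:%M:%S")


def sort_stamps(stamps, year):
    """Sort timestamps chronologically instead of alphabetically."""
    return sorted(stamps, key=lambda stamp: parse_stamp(stamp, year))


stamps = ["Dec  1 09:00:00", "Jan  5 23:10:44", "Apr 20 04:30:12"]
alphabetical = sorted(stamps)
chronological = sort_stamps(stamps, 2017)
```

Here `alphabetical` starts with the April entry, while `chronological` starts with the January one. Logs that span a year boundary would need extra handling, which this sketch does not attempt.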
