soscleaner's Introduction

SOSCleaner

Purpose

SOSCleaner is a tool to consistently obfuscate sensitive information in large datasets like Red Hat sosreports. It works on any dataset, from one file to thousands.

Documentation

Documentation has been moved to ReadTheDocs.

https://soscleaner.readthedocs.io

soscleaner's People

Contributors

arif-ali, bmr-cymru, jduncan-rva, rmkraus

soscleaner's Issues

unittest coverage

It would be nice to have real tests that confirm operation, instead of just squinting one-eyed and staring at things.

soscleaner API

Previously brought up:

Some refactoring to make the substitution engine available as an API

This will take some doing and some thinking, but I want to keep tracking it.

Tracebacks encountered when running soscleaner against an unpacked sosreport

Is this user error or a bug? :-) I encountered the following when running soscleaner against the directory 'sosreport', which contained an expanded sosreport:

[root@laptop 01308802]# soscleaner sosreport
12-10 10:51:43 soscleaner CONSOLE: Log File Created at /tmp/soscleaner-7990507535172329.log
12-10 10:51:43 soscleaner CONSOLE: soscleaner version 0.2.2
12-10 10:51:43 soscleaner WARNING: soscleaner is a tool to help obfuscate sensitive information from an existing sosreport.
12-10 10:51:43 soscleaner WARNING: Please review the content before passing it along to any third party.
12-10 10:51:43 soscleaner CONSOLE: Beginning SOSReport Extraction
12-10 10:51:47 soscleaner CONSOLE: Processing hosts file for better obfuscation coverage
12-10 10:51:47 soscleaner CONSOLE: IP Obfuscation Start Address - 10.230.230.1
12-10 10:51:47 soscleaner CONSOLE: *** SOSCleaner Processing ***
12-10 10:52:03 soscleaner CONSOLE: *** SOSCleaner Statistics ***
12-10 10:52:03 soscleaner CONSOLE: IP Addresses Obfuscated - 81
12-10 10:52:03 soscleaner CONSOLE: Hostnames Obfuscated - 18
12-10 10:52:03 soscleaner CONSOLE: Domains Obfuscated - 0
12-10 10:52:03 soscleaner CONSOLE: Total Files Analyzed - 2318
12-10 10:52:03 soscleaner CONSOLE: *** SOSCleaner Artifacts ***
12-10 10:52:03 soscleaner CONSOLE: Creating IP Report - /tmp/soscleaner-7990507535172329-ip.csv
12-10 10:52:03 soscleaner CONSOLE: Creating Hostname Report - /tmp/soscleaner-7990507535172329-hostname.csv
12-10 10:52:03 soscleaner CONSOLE: Creating Domainname Report - /tmp/soscleaner-7990507535172329-dn.csv
12-10 10:52:03 soscleaner CONSOLE: Creating SOSCleaner Archive - /tmp/soscleaner-7990507535172329.tar.gz
12-10 10:52:09 soscleaner ERROR: [Errno 2] No such file or directory: '/tmp/soscleaner-origin-7990507535172329'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/soscleaner.py", line 350, in _clean_up
    shutil.rmtree(self.origin_path)
  File "/usr/lib64/python2.7/shutil.py", line 239, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/usr/lib64/python2.7/shutil.py", line 237, in rmtree
    names = os.listdir(path)
OSError: [Errno 2] No such file or directory: '/tmp/soscleaner-origin-7990507535172329'
12-10 10:52:09 soscleaner CONSOLE: SOSCleaner Complete
[root@laptop 01308802]#
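
The traceback shows `shutil.rmtree` failing inside `_clean_up` because the origin directory does not exist, most likely because nothing was extracted when a directory was passed in instead of an archive. A minimal sketch of a cleanup step that tolerates a missing path follows; `remove_working_dir` is a hypothetical helper for illustration, not soscleaner's actual code.

```python
import os
import shutil


def remove_working_dir(path):
    """Remove a working directory; return True if something was deleted."""
    if not os.path.isdir(path):
        # Nothing to remove, e.g. the input was already an unpacked directory.
        return False
    shutil.rmtree(path)
    return True
```

Checking for the directory first keeps genuine removal errors (such as permission problems) visible, which `shutil.rmtree(path, ignore_errors=True)` would hide.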

why does soscleaner need to be run as root?

usnavsmlb09.ndc.alcatel-lucent.com>> soscleaner s_2973_
07-28 10:08:20 soscleaner CONSOLE: Log File Created at /tmp/soscleaner-20140728150820.log
Traceback (most recent call last):
  File "/usr/bin/soscleaner", line 27, in <module>
    main()
  File "/usr/bin/soscleaner", line 24, in main
    cleaner.clean_report(options, sosreport)
  File "/usr/lib/python2.6/site-packages/soscleaner.py", line 523, in clean_report
    self._check_uid() #make sure it's soscleaner is running as root
  File "/usr/lib/python2.6/site-packages/soscleaner.py", line 67, in _check_uid
    raise Exception("You Must Execute soscleaner As Root")
Exception: You Must Execute soscleaner As Root

I have this installed on a corporate IT server where I do not have root permissions, and I want to run it as a regular user.
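
One possible alternative to a hard UID check, assuming the root requirement exists only to guarantee read access to the report's files (the issue does not confirm this), is to test readability directly. `unreadable_paths` is a hypothetical helper, not part of soscleaner.

```python
import os


def unreadable_paths(root):
    """Return every path under root that the current user cannot read."""
    blocked = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if not os.access(full, os.R_OK):
                blocked.append(full)
    return blocked
```

An empty result would mean a regular user can read the whole tree, so the tool could proceed and only warn or fail when the list is non-empty.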

soscleaner uses 'the other' python-magic bindings

# rpm -Uvh dist/soscleaner-0.1-1.noarch.rpm 
[sudo] password for breeves: 
Preparing...                          ################################# [100%]
Updating / installing...
   1:soscleaner-0.1-1                 ################################# [100%]
# soscleaner
Traceback (most recent call last):
  File "/bin/soscleaner", line 5, in <module>
    from SOSCleaner import SOSCleaner
  File "/usr/lib/python2.7/site-packages/SOSCleaner.py", line 27, in <module>
    from python_magic import magic
ImportError: No module named python_magic
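
Two different Python modules are commonly installed under the name `magic`: the PyPI `python-magic` package, which exposes `magic.from_file()`, and the bindings shipped with the `file` utility, which expose `magic.open()`, `load()`, and `file()`. The sketch below shows one way to support either; `describe_file` is a hypothetical helper, and the module is passed in as an argument only so the example can be exercised without either binding installed.

```python
def describe_file(magic_module, path):
    """Return the libmagic description of path using whichever API exists."""
    if hasattr(magic_module, "from_file"):
        # PyPI python-magic style API.
        return magic_module.from_file(path)
    # Bindings bundled with the `file` utility.
    cookie = magic_module.open(magic_module.MAGIC_NONE)
    cookie.load()
    return cookie.file(path)
```

In real code the caller would `import magic` and pass that module, instead of importing from a `python_magic` package that does not exist on the system shown above.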

soscleaner fails for reports without hostname

It's possible to run sosreport with the general module disabled. This means the resulting report will not contain a hostname link from the report root.

This causes soscleaner to fail with the following backtrace:

 # soscleaner sosreport-rhn-support-bmr-20140623175230-ec9d.tar.xz
06-26 13:24:31 SOSCleaner CONSOLE: Log File Created at /tmp/soscleaner-20140626122431.log
06-26 13:24:31 SOSCleaner CONSOLE: Beginning SOSReport Extraction
06-26 13:24:31 SOSCleaner CONSOLE: soscleaner - 0.1
06-26 13:24:31 SOSCleaner CONSOLE: soscleaner is a tool to help obfuscate sensitive information from an existing sosreport.
06-26 13:24:31 SOSCleaner CONSOLE: Please review the content before passing it along to any third party.
06-26 13:24:31 SOSCleaner ERROR: [Errno 2] No such file or directory: '/tmp/soscleaner-20140626122431/hostname'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/SOSCleaner.py", line 314, in _get_hostname
    fh = open(hostfile, 'r')
IOError: [Errno 2] No such file or directory: '/tmp/soscleaner-20140626122431/hostname'
Traceback (most recent call last):
  File "/usr/bin/soscleaner", line 22, in <module>
    main()
  File "/usr/bin/soscleaner", line 18, in main
    cleaner = SOSCleaner(options, sosreport)
  File "/usr/lib/python2.6/site-packages/SOSCleaner.py", line 61, in __init__
    self.hostname, self.domainname, self.is_fqdn = self._get_hostname()
  File "/usr/lib/python2.6/site-packages/SOSCleaner.py", line 331, in _get_hostname
    raise Exception('GetHostname Error: Cannot resolve hostname from %s') % hostfile
TypeError: unsupported operand type(s) for %: 'exceptions.Exception' and 'str'
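
The final TypeError is a separate bug from the missing file: the `%` operator is applied to the Exception object rather than to the message string, so the intended error never reaches the user. A minimal sketch of the corrected formatting, using a hypothetical `read_hostname` helper rather than soscleaner's real `_get_hostname`:

```python
import os


def read_hostname(report_dir):
    """Return the hostname stored in <report_dir>/hostname."""
    hostfile = os.path.join(report_dir, "hostname")
    try:
        with open(hostfile) as fh:
            return fh.readline().strip()
    except IOError:
        # Format the message inside the call. Applying % to the Exception
        # object itself is what raises the TypeError.
        raise Exception(
            "GetHostname Error: Cannot resolve hostname from %s" % hostfile
        )
```

This only repairs the error message. Handling reports that legitimately lack a hostname file would still need a fallback, which this sketch does not attempt.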

preserve IP address relationships when mapping

To increase the diagnostic value of the obfuscated data, soscleaner should preserve IP address relationships (i.e. netmask and network vs. host portions).

I'm not 100% sure what such a mapping scheme needs to look like for the general case - it's probably worth running this past some networking folks to get their input.
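
As a starting point for that discussion, here is one possible scheme sketched with the standard `ipaddress` module: each real network is assigned a fake network of the same prefix length, and the host offset inside the network is kept. It assumes the prefix length is known for every address, which is often not true for addresses found in log lines. The class name and the 10.230.0.0 starting point are illustrative assumptions, and the sketch does not guard against running out of address space.

```python
import ipaddress


class NetworkPreservingMapper:
    """Map each real IPv4 network to a fake one, keeping host offsets."""

    def __init__(self, start="10.230.0.0"):
        self._cursor = int(ipaddress.IPv4Address(start))
        self._networks = {}  # real network -> fake network

    def _fake_network(self, real_net):
        if real_net not in self._networks:
            size = real_net.num_addresses
            # Round the cursor up to a multiple of the block size so the
            # fake network address is valid for this prefix length.
            base = -(-self._cursor // size) * size
            self._networks[real_net] = ipaddress.IPv4Network(
                (base, real_net.prefixlen)
            )
            self._cursor = base + size
        return self._networks[real_net]

    def obfuscate(self, interface):
        """interface is 'address/prefix', e.g. '192.168.1.118/24'."""
        real = ipaddress.IPv4Interface(interface)
        fake_net = self._fake_network(real.network)
        offset = int(real.ip) - int(real.network.network_address)
        return str(fake_net.network_address + offset)
```

With this scheme, two hosts on the same real /24 land on the same fake /24 with the same last octet, so "same subnet" relationships survive obfuscation.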

allow for scanning of arbitrary files/dirs

Right now the only thing that truly ties soscleaner to an sosreport is the hostname file. It would be nice to be able to specify a path to any file, directory, or archive that is not an sosreport and have it scanned.

domain name not removed in some files

soscleaner-0.2.2-7

When run against an SOS report generated on a system on a domain of "sandbox.local" the following files still retained the domain "sandbox.local" in their contents:

./sos_commands/postfix/postconf: mydomain = sandbox.local
./etc/sysconfig/network-scripts/ifcfg-eth0: DOMAIN="sandbox.local"
./etc/resolv.conf: search sandbox.local

report extraction

When extracting the resultant cleansed sosreport, the directory structure looks like this:

tar xvf soscleaner-20131210165031.tar.gz

/tmp/tmp/soscleaner-20131210165031/

Note the extra tmp in the path. I believe we were looking for /tmp/soscleaner-20131210165031/
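
The doubled tmp suggests the archive members are stored with the working directory's parent path as a prefix. With the standard `tarfile` module, the `arcname` argument of `TarFile.add` controls the stored name. The sketch below assumes the archive is built with `tarfile`; `archive_directory` is a hypothetical helper.

```python
import os
import tarfile


def archive_directory(source_dir, archive_path):
    """Write source_dir to a gzip tarball rooted at its own basename."""
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname drops the parent path, so members are stored as
        # soscleaner-<id>/... rather than tmp/soscleaner-<id>/...
        archive.add(source_dir, arcname=top_level)
    return top_level
```

Extracting such an archive in /tmp would then produce /tmp/soscleaner-<id>/ directly.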

hostname regex doesn't handle hyphens

line = 'Jun 24 15:37:44 dhcp-192-168-1-118 sshd[2821]: warning: /etc/hosts.allow, line 12: host name/name mismatch: dhcp-192-168-1-119.jeduncan.com != dhcp-192-168-1-119.jeduncan.com.jeduncan.com'

>>> cleaner._clean_line(line)
'Jun 24 15:37:44 host0 sshd[2821]: warning: /etc/hosts.allow, line 12: host name/name mismatch: dhcp-192-168-1-host1.example0.com != dhcp-192-168-1-host1.example0.comhost2.example0.com'
>>> cleaner._clean_line(line)
'Jun 24 15:37:44 host0 sshd[2821]: warning: /etc/hosts.allow, line 12: host name/name mismatch: dhcp-192-168-1-host1.example0.com != dhcp-192-168-1-host1.example0.comhost2.example0.com'

The culprit is the regex that is compiled to search for hostnames in https://github.com/jduncan-rva/soscleaner/blob/master/src/SOSCleaner.py#L269-270

regex = re.compile(r'\w*\.%s' % d)
hostnames = [each for each in regex.findall(line)] 

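The regex needs to include all hostname-appropriate characters (a-z, A-Z, 0-9, hyphens, underscores). Note that \w already covers letters, digits, and underscores, so the hyphen is the missing piece.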
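
A minimal sketch of a pattern that accepts hyphens in the host part is shown here. It also escapes the domain so that its dots match literally, which the quoted code does not do. `find_hostnames` is a hypothetical wrapper used only for illustration.

```python
import re


def find_hostnames(line, domain):
    """Return every host.domain occurrence, allowing hyphens in the host."""
    # [\w-]+ adds the hyphen that \w alone does not match, and
    # re.escape keeps the dots in the domain literal.
    regex = re.compile(r"[\w-]+\." + re.escape(domain))
    return regex.findall(line)
```

Applied to the log line above with the domain `jeduncan.com`, this matches the whole `dhcp-192-168-1-119.jeduncan.com` name instead of only its last hyphen-separated piece.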

hostnames not being scrubbed

Hostnames, such as those in /etc/hosts, are not being changed correctly. For example:

192.168.0.10 server1.mydom.com server1
192.168.0.11 server2.mydom.com server2

becomes:

10.230.230.9 host1.example.com server1
10.230.230.11 host2.example.com server2

"server1" and "server2" should be replaced with "host1" and "host2".
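
One way to get that behavior, sketched under the assumption that a mapping from real to obfuscated FQDNs is already available, is to derive the short name from each FQDN and replace it as a whole word. `scrub_short_names` and `host_map` are hypothetical names.

```python
import re


def scrub_short_names(line, host_map):
    """host_map maps real FQDNs to obfuscated FQDNs."""
    # Pass 1: replace fully qualified names.
    for real, fake in host_map.items():
        line = line.replace(real, fake)
    # Pass 2: replace bare short names, skipping anything that is part
    # of a longer dotted or hyphenated name.
    for real, fake in host_map.items():
        short_real = real.split(".")[0]
        short_fake = fake.split(".")[0]
        pattern = r"(?<![\w.-])%s(?![\w.-])" % re.escape(short_real)
        line = re.sub(pattern, short_fake, line)
    return line
```

The lookarounds are deliberately conservative: a short name directly followed by a period, such as at the end of a sentence, is left alone in this sketch.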

soscleaner backtraces without arguments

# soscleaner 
Traceback (most recent call last):
  File "/usr/bin/soscleaner", line 22, in <module>
    main()
  File "/usr/bin/soscleaner", line 17, in main
    sosreport = args[0] #grab the sosreport path
IndexError: list index out of range

Would be nice if it just displayed the usage message:

# soscleaner -h
Usage: soscleaner [-l-r] /path/to/sosreport

Options:
  -h, --help            show this help message and exit
  -l LOGLEVEL, --log_level=LOGLEVEL
                        The Desired Log Level (default = INFO) Options are
                        DEBUG, INFO, WARNING, ERROR
  -r, --reporting       Create CSV output for IP and Hostname databases
                        (enabled by default)
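
The help output above looks like it comes from `optparse`, so a guard along the following lines would print the usage text instead of raising IndexError. This is a sketch with a simplified option set, not the real soscleaner entry point.

```python
import optparse


def main(argv):
    parser = optparse.OptionParser(
        usage="%prog [-l LOGLEVEL] /path/to/sosreport"
    )
    parser.add_option("-l", "--log_level", dest="loglevel", default="INFO")
    options, args = parser.parse_args(argv)
    if not args:
        # No sosreport path given: show the help text and signal an error.
        parser.print_help()
        return 2
    print("would clean %s at log level %s" % (args[0], options.loglevel))
    return 0
```

A caller would typically invoke this as `sys.exit(main(sys.argv[1:]))`.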

make uuid not time-based

Right now, when a new instance is created, it uses a timestamp to create a uuid.

At least during testing, and conceivably in the real world, this can cause issues if multiple instances are created in quick succession.

Make the uuid a random, unique string of characters.
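
The standard library's `uuid.uuid4()` generates a random identifier that does not depend on the clock. A minimal sketch follows; `new_session_id` is a hypothetical name.

```python
import uuid


def new_session_id():
    """Return a random 32-character hex identifier (not time-based)."""
    return uuid.uuid4().hex
```

Two instances created in the same second would then get different identifiers, and therefore different log file and working directory names.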

IP Engine Tracker Bug

Tracks issues that deal with refactoring the IP Substitution Engine for Release 0.3.

Docs Tracker Bug

Between the wiki, readme, man page, web pages, etc., the language should be consistent throughout so that nothing is confusing.

some files are being dropped that shouldn't be

$ ls _/etc/_daily
ecdba1a-2014091112491410432552/etc/cron.daily:
logrotate makewhatis.cron mlocate.cron prelink readahead.cron rhsmd tmpwatch

soscleaner-1274885550442042/etc/cron.daily:
logrotate makewhatis.cron mlocate.cron prelink readahead.cron rhsmd

The 'tmpwatch' file in the full sosreport is missing in soscleaner output.

$ file tmpwatch
tmpwatch: Bourne shell script text executable

soscleaner breaks multicast addresses

Whether an address is multicast or not is significant diagnostic data. Currently soscleaner discards this when masking IPs:

-               mcastaddr: 239.255.1.1
+               mcastaddr: 10.230.230.82

We should map found mcast addresses to a mangled (but valid) mcast address.
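
A minimal sketch of that idea uses `ipaddress` to detect multicast addresses and draws replacements from a separate multicast pool. The unicast start address 10.230.230.1 matches the one soscleaner logs; the multicast start 239.230.230.1 and the class name are assumptions made for this example.

```python
import ipaddress


class MulticastAwareMapper:
    """Obfuscate IPv4 addresses while keeping multicast ones multicast."""

    def __init__(self):
        self._map = {}
        self._next_unicast = int(ipaddress.IPv4Address("10.230.230.1"))
        self._next_multicast = int(ipaddress.IPv4Address("239.230.230.1"))

    def obfuscate(self, address):
        if address not in self._map:
            if ipaddress.IPv4Address(address).is_multicast:
                fake = ipaddress.IPv4Address(self._next_multicast)
                self._next_multicast += 1
            else:
                fake = ipaddress.IPv4Address(self._next_unicast)
                self._next_unicast += 1
            # Remember the choice so repeated addresses stay consistent.
            self._map[address] = str(fake)
        return self._map[address]
```

With this mapper, `239.255.1.1` from the diff above would come out as another 239.x.x.x address rather than a 10.230.230.x unicast address.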
