fhightower / ioc-finder

151.0 8.0 38.0 1.38 MB

Simple, effective, and modular package for parsing observables (indicators of compromise (IOCs), network data, and other security-related information) from text. It uses grammars rather than regexes, which makes it more readable, maintainable, and hackable. Explore our interactive documentation here: https://hightower.space/ioc-finder/

License: GNU Lesser General Public License v3.0

Makefile 1.37% Python 98.28% Dockerfile 0.08% Shell 0.27%
indicators-of-compromise threat-intelligence threat-hunting threat-sharing grammar-parser threatintel cidr-range ipv4 iocs ioc-finder

ioc-finder's Introduction

IOC Finder

Parse indicators of compromise (also known as "observables" or "network data", e.g. URLs, email addresses, etc.) from text.

📖 Documentation (it's interactive!)

๐Ÿ’ช I'm looking for sponsorship for this project. I have a number of improvements and helpful features I'd like to add and would appreciate some support as I invest this time and focus. If you use this project for in a commercial capacity and/or find it useful, please consider contributing even a small amount. Thanks!

ioc-finder's People

Contributors

dependabot[bot], fhightower, frikky, nor3th, ptmcg, wesinator

ioc-finder's Issues

File path improvements

The grammar for file paths needs some improvements. Here are some known issues:

  • the grammar parses URLs as file names (e.g. https://twitter.com/ => s://twitter.com)

Feature request: add a CLI executable

It would be great if you could add a CLI executable to the package. I would expect it to work like this:

$ echo "This is just an example.com https://example.org/test/bingo.php" | ioc-finder
{
    "asns": [],
    "bitcoin_addresses": [],
    "complete_email_address": [],
    "cves": [],
    "domains": ["example.org", "example.com"],
    "email_addresses": [],
    "google_adsense_publisher_ids": [],
    "google_analytics_tracker_ids": [],
    "ipv4_cidrs": [],
    "ipv4s": [],
    "ipv6s": [],
    "md5s": [],
    "registry_key_paths": [],
    "sha1s": [],
    "sha256s": [],
    "sha512s": [],
    "urls": ["https://example.org/test/bingo.php"]
}
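A minimal sketch of such a wrapper, assuming `ioc_finder.find_iocs` returns a JSON-serializable dict. The names `to_json` and `main` are hypothetical, and the `parser` argument exists only so the function can be exercised without the package installed. Exposing `main` as an `ioc-finder` command would be done with a `console_scripts` entry point (for example `ioc-finder = ioc_finder.cli:main`, where the module path is also hypothetical).

```python
import json
import sys


def to_json(text, parser=None):
    """Extract observables from `text` and return them as pretty-printed JSON."""
    if parser is None:
        # Imported lazily so the function can be exercised with a stand-in parser.
        from ioc_finder import find_iocs as parser
    # sort_keys keeps the output keys in alphabetical order.
    return json.dumps(parser(text), indent=4, sort_keys=True)


def main():
    # Read everything piped in on stdin, e.g. `echo "..." | ioc-finder`.
    print(to_json(sys.stdin.read()))
```

Sorting the keys gives stable, diff-friendly output with the same alphabetical key order as the expected output above.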

Parse user agents

I would like to add a basic grammar for common/commonly formatted user agents.

For example, here are the user agents from https://storage.googleapis.com/chronicle-research/DuQu%201.5%20A%20Ghost%20in%20the%20Wires.pdf:

Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.1)
Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20120403211507 Firefox/12.0
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; msn OptimizedIE8;ZHCN)
Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.75 Safari/535.7
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.8 (KHTML, like Gecko) Chrome/17.0.940.0 Safari/535.8
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/4.0; InfoPath.2; SV1; .NET CLR 2.0.50727; WOW64)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; Media Center PC 6.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C)
Mozilla/5.0 (compatible; MSIE 7.0; Windows NT 6.0; fr-FR)
Mozilla/5.0 (Windows NT 6.2; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1; .NET CLR 1.0.3705; Media Center PC 3.1; Alexa Toolbar; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko Firefox/11.0
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; InfoPath.2)
Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; chromeframe/13.0.782.215)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Zune 4.0; InfoPath.3; MS-RTC LM 8; .NET4.0C; .NET4.0E)
Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6
Mozilla/4.0 (Windows; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3
Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322)
Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/ 20120405 Firefox/14.0.1
Mozilla/5.0 (compatible; MSIE 10.6; Windows NT 6.1; Trident/5.0; InfoPath.2; SLCC1; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 2.0.50727) 3gpp-gba UNTRUSTED/1.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120427 Firefox/15.0a1
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.28) Gecko/20120306 Firefox/5.0.1
Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.36 Safari/535.7
Mozilla/4.0 (Windows; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)
Mozilla/4.0 (compatible; MSIE 6.01; Windows NT 6.0)
Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24
Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Zune 3.0)
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1)
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1; Media Center PC 3.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1)
Mozilla/5.0 (compatible; MSIE 7.0; Windows NT 6.0; en-US)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; Media Center PC 6.0; InfoPath.3; MS-RTC LM 8; Zune 4.7
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.65 Safari/535.11
Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; FDM; SV1)
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Zune 4.0; Tablet PC 2.0; InfoPath.3; .NET4.0C; .NET4.0E)
Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/5.0
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1
Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MS-RTC LM 8; .NET4.0C; .NET4.0E; InfoPath.3)
Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; FDM; SV1; .NET CLR 3.0.04506.30)
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1; .NET CLR 1.1.4322; Alexa Toolbar)
Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2
Mozilla/5.0 (compatible; Windows; U; Windows NT 6.2; WOW64; en-US; rv:12.0) Gecko/20120403211507 Firefox/12.0
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)
Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120403211507 Firefox/14.0.1
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; yie8)
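A rough sketch of how agents like these might be located in free text, written with the standard library rather than pyparsing so that it runs on its own. It follows the `product/version (comment)` shape of the User-Agent header described in RFC 7231, section 5.5.3. The heuristics are assumptions: a match must start with `Mozilla/<major>.<minor>`, a single bare token such as `Gecko` is kept only when a versioned product or a comment follows it, and two bare words in a row are treated as surrounding prose. `find_user_agents` is a hypothetical name, not part of ioc-finder.

```python
import re

# "token" or "token/version" (simplified from RFC 7231's product rule)
PRODUCT = re.compile(r"[A-Za-z0-9][\w.+-]*(?:/[\w.+-]+)?")


def _comment_end(text, i):
    """Return the index just past the balanced "(...)" starting at text[i], or -1."""
    depth = 0
    for j in range(i, len(text)):
        if text[j] == "(":
            depth += 1
        elif text[j] == ")":
            depth -= 1
            if depth == 0:
                return j + 1
        elif text[j] == "\n":  # assume a user agent never spans lines
            return -1
    return -1


def _parse_at(text, start):
    """Parse one user agent that begins at `start` and return its end index."""
    pos = PRODUCT.match(text, start).end()
    last_good = pos  # end of the last comment or versioned product
    bare_run = 0     # consecutive products without "/version"
    while pos < len(text) and text[pos] == " ":
        nxt = pos + 1
        if text.startswith("(", nxt):
            close = _comment_end(text, nxt)
            if close == -1:
                break
            pos = last_good = close
            bare_run = 0
            continue
        m = PRODUCT.match(text, nxt)
        if not m or m.group().startswith("Mozilla/"):
            break  # not a product, or the start of the next user agent
        pos = m.end()
        if "/" in m.group():
            last_good, bare_run = pos, 0
        else:
            bare_run += 1
            if bare_run > 1:
                break  # two bare words in a row: most likely prose
    return last_good


def find_user_agents(text):
    """Return every "Mozilla/x.y ..." user agent found in free text."""
    spans = []
    for m in re.finditer(r"Mozilla/\d+\.\d+", text):
        if spans and m.start() < spans[-1][1]:
            continue  # e.g. the "Mozilla/4.0" inside "(Mozilla/4.0; MSIE 7.0; ...)"
        spans.append((m.start(), _parse_at(text, m.start())))
    return [text[s:e] for s, e in spans]
```

Agents with an unbalanced comment (such as the `Zune 4.7` entry above) or a stray space after a slash (`Gecko/ 20120405`) are cut short by this sketch; handling them would need extra rules.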

Add option to differentiate between imphashes and md5s

I propose that we add an optional feature that makes a best-effort attempt to differentiate between an import hash (imphash) and an md5.

I propose we do this by checking if the word "imphash" is near the hash.

Here are some examples I would like to be able to handle:

In [1]: import pefile

In [2]: file=pefile.PE('tasksche.exe')

In [3]: file.get_imphash()
Out[3]: '68f013d7437aa653a8a98a05807afeb1'

This example may be difficult to handle.

SHA-256 093e394933c4545ba7019f511961b9a5ab91156cf791f45de074acad03d1a44a
Dropper imphash: 18ddf28a71089acdbab5038f58044c0a
C2 IP: 210.209.127.8:443
r imphash_>.

Searching the example imphash c948ebda9bd9367f9fc50e01020766c8 dropped by RTF
b2b8127bae5b61e258b17dc057338075 (24 / 51 on Virustotal April 11 2014) shows a
number of dropped samp
authentihash 3f1b149d07e7e8636636b8b7f7043c40ed64a10b28986181fb046c498432c2d4
imphash f34d5f2d4577ed6d9ceec516c1f5a744
The size of the file is 37.5 KB (38400 bytes)
imphash f34d5f2d4577ed6d9ceec516c1f5a744
imphash: f34d5f2d4577ed6d9ceec516c1f5a744
IMPHASH 9d4731894b0c35c21d30605c7440ae67
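A minimal sketch of the proposed "is the word imphash near the hash?" check. The assumptions are that "near" means within 25 characters before the hash on the same line and that the keyword match is case-insensitive; `split_md5s_and_imphashes` and its result keys are hypothetical. Because it only looks backwards on the same line, the `pefile` session above (keyword and hash on different lines) would still be reported as an md5.

```python
import re

# 32 hex characters that are not part of a longer hex run (md5/imphash sized)
MD5_LIKE = re.compile(r"\b[0-9a-fA-F]{32}\b")


def split_md5s_and_imphashes(text, keyword="imphash", window=25):
    """Best effort: treat a 32-hex hash as an imphash when `keyword` appears
    within `window` characters before it on the same line; otherwise as an md5."""
    found = {"md5s": [], "imphashes": []}
    for m in MD5_LIKE.finditer(text):
        line_start = text.rfind("\n", 0, m.start()) + 1
        context = text[max(line_start, m.start() - window):m.start()].lower()
        key = "imphashes" if keyword.lower() in context else "md5s"
        if m.group() not in found[key]:  # de-duplicate, keeping first-seen order
            found[key].append(m.group())
    return found
```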

Improving URL grammar

URLs like the following are not parsed correctly:

http://wmfolcs3.pn.4y.nv.kr2x1dt.net/gz+/(y%40%26//%3c7aew%5cqv%0a/%0bcz,r/r%5c%7b/7re//6%3e/f%23%7ce0p'6_%09/d%5c

The URL above is parsed as:

http://wmfolcs3.pn.4y.nv.kr2x1dt.net/gz+/(y
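For comparison, a regex stand-in (not the pyparsing grammar ioc-finder uses) that accepts every character RFC 3986 permits in a URI, including percent-encoded octets, parentheses, apostrophes, and commas, keeps the URL above intact. Trimming trailing sentence punctuation is a simplifying assumption, and `find_urls` is a hypothetical name.

```python
import re

# Unreserved, reserved (gen-delims and sub-delims) and "%" from RFC 3986.
URI_CHARS = r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]"
URL = re.compile(r"https?://" + URI_CHARS + "+")


def find_urls(text):
    """Return http(s) URLs, dropping punctuation that most likely ends a sentence."""
    return [m.group().rstrip(".,;:!?") for m in URL.finditer(text)]
```

The trade-off is that a URL wrapped in parentheses in prose would keep its closing ")", since ")" is a legal URI character.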

Registry key paths that contain whitespaces are truncated

Windows registry key paths are truncated at the first whitespace character, so the full path is not shown in the output. Below is how to reproduce the error on Python 3.6.7:

from ioc_finder import find_iocs

# A raw string keeps the backslashes literal (no invalid-escape warnings)
text = r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\ConsoleIME"
iocs = find_iocs(text)
print(iocs['registry_key_paths'])
# output: ['HKLM\\SOFTWARE\\Microsoft\\Windows']

The error seems to occur because pyparsing grammars ignore whitespace by default, specifically in line 167 of ioc_grammars.py, shown below:

registry_key_subpath = OneOrMore(Word('\\') + Word(alphanums))

One quick workaround I discovered when working with whitespace in pyparsing is to use pyparsing's White() class, as shown below:

registry_key_subpath = OneOrMore(Word('\\') + Combine(OneOrMore(Word(alphanums) | White(' ', max=1) + ~White())))
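A standard-library sketch of one way to bound such a match, offered as an alternative heuristic rather than a fix to the pyparsing grammar: a path segment may contain single spaces only when another backslash follows it, so the final segment ends at the first space. That keeps `Windows NT` intact without swallowing prose that follows the key. The hive list and `find_registry_key_paths` are assumptions.

```python
import re

HIVE = r"\b(?:HKEY_[A-Z_]+|HKLM|HKCU|HKCR|HKU|HKCC)"
# A middle segment may contain single spaces, but only if another "\" follows.
INNER_SEGMENT = r"\\[\w.-]+(?: [\w.-]+)*(?=\\)"
# The final segment stops at the first whitespace character.
LAST_SEGMENT = r"\\[\w.-]+"
REGISTRY_KEY_PATH = re.compile(HIVE + "(?:" + INNER_SEGMENT + ")*" + LAST_SEGMENT)


def find_registry_key_paths(text):
    """Return registry key paths, allowing spaces inside non-final segments."""
    return REGISTRY_KEY_PATH.findall(text)
```

A key whose final segment itself contains a space would still be truncated under this rule; that is the price of not guessing where the surrounding prose begins.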

Invalid IPv4 parsed from dotted date str

2018.12.15.14.05.43

>>> t = "2018.12.15.14.05.43"
>>> iocs = ioc_finder.find_iocs(t)
>>> iocs['ipv4s']
['018.12.15.14']

Expected: an IPv4 address should not be parsed from a substring of a longer dotted-decimal sequence, and an octet with a leading 0 should rule out an IPv4 match.
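A standard-library sketch of the expected behaviour under two assumed rules: an octet may not have a leading zero, and the address may not be preceded by a digit or "digit.", nor followed by a digit or ".digit". The second rule rejects substrings of longer dotted sequences while still allowing a sentence-ending period. `find_ipv4s` is a hypothetical name.

```python
import re

# 0-255 with no leading zeros ("0" itself is allowed, "018" is not).
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(
    r"(?<!\d)(?<!\d\.)"            # not a continuation of a longer dotted number
    rf"{OCTET}(?:\.{OCTET}){{3}}"  # four octets separated by dots
    r"(?!\d|\.\d)"                 # not followed by more digits or ".digit"
)


def find_ipv4s(text):
    """Return IPv4 addresses that stand on their own in `text`."""
    return IPV4.findall(text)
```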

REQUEST FOR INPUT (yes, your input!): Match xmpp addresses as much as possible

I would like to differentiate email addresses from xmpp addresses (https://en.wikipedia.org/wiki/XMPP) as much as possible. To this end, I propose that we use certain keywords to differentiate the two. One keyword that comes to mind is "jabber". If something is parsed as an email address, but contains the word "jabber" in the domain name (e.g. [email protected]), I'm comfortable parsing that as an xmpp address rather than an email address.

As far as implementation is concerned, I propose that the ioc-finder library differentiate email addresses from xmpp addresses by default, but provide an option to disable this.

I would love to hear your input on any/all of the following questions:

Questions

  • Are there any concerns with the overall goal of differentiating email addresses from xmpp addresses?
  • Are there any concerns with my approach (using keywords)?
  • Any other ways to go about differentiating email from xmpp addresses?
  • If we adopt the strategy described above, are there any other keywords we could use that would provide a high-confidence differentiation between email and xmpp addresses?
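A minimal sketch of the proposed keyword approach, assuming a simplified email pattern and "jabber" as the only keyword. It inspects only the domain part, and the `differentiate` flag mirrors the proposal that the behaviour be on by default but possible to turn off. `split_emails_and_xmpp` and its result keys are hypothetical.

```python
import re

# Simplified "local@domain.tld" pattern; group 1 captures the domain.
EMAIL_LIKE = re.compile(r"\b[\w.+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)\b")
XMPP_KEYWORDS = ("jabber",)


def split_emails_and_xmpp(text, differentiate=True):
    """Classify address-like strings as email or xmpp using domain keywords."""
    emails, xmpp = [], []
    for m in EMAIL_LIKE.finditer(text):
        domain = m.group(1).lower()
        if differentiate and any(k in domain for k in XMPP_KEYWORDS):
            xmpp.append(m.group(0))
        else:
            emails.append(m.group(0))
    return {"email_addresses": emails, "xmpp_addresses": xmpp}
```

Adding a keyword is then a one-line change to `XMPP_KEYWORDS`, which fits the open question about other high-confidence keywords.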

Implement parsing options

There should be options to:

  • not parse ip addresses from cidr ranges
  • not parse domain names/ip addresses from email addresses
  • not parse email/domain/ip addresses from urls

These should all be implemented as arguments for the find_iocs function.
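One way such an option could work, sketched for the first bullet only: when the option is off, matched CIDR ranges are blanked out of the text before the IPv4 pass runs, so their network addresses are not reported separately. The deliberately loose regexes, `find_ip_observables`, and the `parse_address_from_cidr` keyword are illustrative assumptions rather than a description of find_iocs; the same masking idea would extend to the email and URL bullets.

```python
import re

# Deliberately loose patterns, just enough to illustrate the option.
CIDR = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}/\d{1,2}\b")
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def find_ip_observables(text, parse_address_from_cidr=True):
    """Return CIDR ranges and IPv4s; optionally skip addresses inside CIDRs."""
    cidrs = CIDR.findall(text)
    if not parse_address_from_cidr:
        # Blank out CIDR ranges so the IPv4 pass cannot see their addresses.
        text = CIDR.sub(" ", text)
    return {"ipv4_cidrs": cidrs, "ipv4s": IPV4.findall(text)}
```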

Be able to parse registry keys as below

Taken from https://blog.talosintelligence.com/2019/03/threat-roundup-0322-0329.html:

<HKCU>\SOFTWARE\MICROSOFT\WINDOWS\CURRENTVERSION\EXPLORER\ADVANCED
Value Name: ShowSuperHidden
<HKCU>\SOFTWARE\MICROSOFT\WINDOWS\CURRENTVERSION\EXPLORER\ADVANCED
Value Name: HideFileExt
<HKCU>\SOFTWARE\MICROSOFT\WINDOWS\CURRENTVERSION\EXPLORER\ADVANCED
Value Name: SuperHidden
<HKLM>\SOFTWARE\WOW6432NODE\MICROSOFT\WINDOWS\CURRENTVERSION\EXPLORER\ADVANCED\FOLDER\HIDEFILEEXT
Value Name: DefaultValue
<HKLM>\SOFTWARE\MICROSOFT\WINDOWS\CURRENTVERSION\POLICIES\EXPLORER\RUN
Value Name: PC
<HKLM>\SOFTWARE\WOW6432NODE\MICROSOFT\WINDOWS\CURRENTVERSION\RUN
Value Name: avscan
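A minimal sketch, assuming the hive may optionally be wrapped in angle brackets as in the Talos write-up and that the brackets should be dropped, so `<HKCU>\SOFTWARE\...` is reported as `HKCU\SOFTWARE\...`. Segments here contain no spaces, duplicates are collapsed in first-seen order, and `find_bracketed_registry_keys` is a hypothetical name.

```python
import re

# Optional "<" and ">" around the hive, then one or more "\segment" parts.
REGISTRY_KEY = re.compile(r"<?\b(HKLM|HKCU|HKCR|HKU|HKCC)>?((?:\\[\w.-]+)+)")


def find_bracketed_registry_keys(text):
    """Return registry keys with any angle brackets around the hive removed."""
    keys = [hive + path for hive, path in REGISTRY_KEY.findall(text)]
    return list(dict.fromkeys(keys))  # de-duplicate, keeping first-seen order
```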

Add option to differentiate between authentihashes and sha256s

Similar principle to #34. I propose that we add an option to try to differentiate between authentihashes and sha256s by looking for the word "authentihash" around the hash.

ails -
MD5 10fe7cd6f81357e8349d545c973033dd
SHA1 89f6fdaad78285bd97234fec2f95b9ca06805d20
SHA256 2e76903c877088fb628337a9d520c57131daf2eb13a2114381c78b3f0fdd7d3b
ssdeep768: m5uN0g6 / voLsggvJ3tKFEXTPB7d0kAF10ttSwU9: JNF6 / QLB45wKXbBx0ki2tSwO
authentihash 3f1b149d07e7e8636636b8b7f7043c40ed64a10b28986181fb046c498432c2d4
imphash f34d5f2d4577ed6d9ceec516c1f5a744
The size of the file is 37.5 KB (38400 bytes)
Win32 EXE file type
DescriptionPE32 executable for MS Windows (GUI) Intel 80386 Mono / .Net assembly

MD5 09e5352837eeaf9318baca7f4b22dde5
SHA1 4eef6f43cc0b249d16673cdb5a3d204dbe97ac52
SHA256 d9507c83cde125a881c896b7988347db42e8864414706d0c5389c64a894e6feb
ssdeep1536: C + eXyv46Q ++ acFswTo5E8i9ZTN3xe6MB5 + VCfAk5SXdsmpE4mwU0: 3qyv47 / sw05F0N3xe52QAkEtsj43
authentihash 7158bf3f3f0d4944afda5ff6c8bb8c9bdb8d205e6de4688bd3c868439b5bc22d
imphash f34d5f2d4577ed6d9ceec516c1f5a744
The file size is 82.5 KB (84480 bytes)
Win32 EXE file type
DescriptionPE32 executable for MS Windows (GUI) Intel 80386 32-bit Mono / .Net assembly

MD5 6652ffce33c65eb83be94140aad3264f
SHA1 6a0df4ffde84321cca40908c485fff999b6c8625
SHA256 65491998cf8f574a0e176ba761e923674a19a226b2445b363c6781849359a7ff
ssdeep6144: Bnnx37f + gLHdEVyqEmMgyzj2jVvadglTaM5Wa8ik: BnnBEcqEmMgy + jVvcgl2eWt
authentihash 54566d0372b0dc82586dbcb0844ad5db54ddf90ad3150b491380d6d9fc3b537a
imphash f34d5f2d4577ed6d9ceec516c1f5a744
The size of the file is 194.5 KB (19916
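A minimal sketch of the keyword idea for this case, treating a 64-character hex string as an authentihash when "authentihash" appears earlier on the same line. Looking only backwards on the same line is a simplifying assumption that fits the VirusTotal-style listings above; `split_sha256s_and_authentihashes` and its result keys are hypothetical.

```python
import re

# 64 hex characters that are not part of a longer hex run (sha256 sized).
SHA256_LIKE = re.compile(r"\b[0-9a-fA-F]{64}\b")


def split_sha256s_and_authentihashes(text):
    """Best effort: a 64-hex hash is an authentihash when "authentihash" appears
    before it on the same line; otherwise it is reported as a sha256."""
    found = {"sha256s": [], "authentihashes": []}
    for line in text.splitlines():
        for m in SHA256_LIKE.finditer(line):
            label = line[:m.start()].lower()
            key = "authentihashes" if "authentihash" in label else "sha256s"
            if m.group() not in found[key]:  # de-duplicate, keeping first-seen order
                found[key].append(m.group())
    return found
```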
