cuckooml's People

Contributors

ameily, botherder, deeso, dmaciejak, doomedraven, espenfjo, gtback, heipei, hughpearse, ikiril01, init99, jahrome, jamu, jbremer, jekil, killerinstinct, lehmz, mschloesser-r7, nfllab, nickycm, pdelsante, r3comp1le, rep, robertsjw, rodionovd, so-cool, tankbusta, thorsten-sick, titotix, xayon

cuckooml's Issues

Import statement in virustotal.py

On line 21 of /lib/cuckoo/common/virustotal.py, shouldn't it be

from modules.processing.cuckooml import Instance

instead of

from modules.cuckooml.cuckooml import Instance ?

Remove "unknown" OS label

The "unknown" OS label needs to be removed from virustotal.py because it collides with the "none" label in cuckooml.py.

Resolving abbreviated malware names

Right now the first mapping found, which is the longest matching string, is used. To improve labelling, all possible matches need to be considered, and the most probable abbreviation combination, i.e. the one that uses all of the sub-strings, should be chosen.
For example, "adload" is currently split into "a" and "dload", with the latter mapped to downloader. A better split would be "ad" (adware) and "load" (downloader).
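
A minimal sketch of the idea above, not the project's actual code: it enumerates every way to split a name into known abbreviations so that no characters are left over. The `ABBREVIATIONS` table, the function names, and the tie-break (prefer the split with the fewest pieces) are all assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical abbreviation table; the real mapping lives in the project.
ABBREVIATIONS = {
    "ad": "adware",
    "load": "downloader",
    "dload": "downloader",
}


def segmentations(name, table):
    """Return every way to split `name` into consecutive keys of `table`."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the string means the split used every character.
        if start == len(name):
            return [[]]
        results = []
        for end in range(start + 1, len(name) + 1):
            piece = name[start:end]
            if piece in table:
                for rest in split_from(end):
                    results.append([piece] + rest)
        return results

    return split_from(0)


def resolve(name, table=ABBREVIATIONS):
    """Map `name` to labels using a split that covers the whole string."""
    candidates = segmentations(name, table)
    if not candidates:
        return []  # no complete split; a caller could fall back to longest match
    best = min(candidates, key=len)  # assumed tie-break: fewest pieces
    return [table[piece] for piece in best]


print(resolve("adload"))
```

With this table, "dload" alone never yields a complete split of "adload" because the leading "a" has no mapping, so only "ad" + "load" survives.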

Useful malware features

The base set of ML features for binaries analysed by Cuckoo is going to be inspired by "Reviewer Integration and Performance Measurement for Malware Detection" by B. Miller et al. (available here).
They name all kinds of binary features, both static and dynamic, which seem a good starting point for this project:

  • static attributes:
    • binary metadata,
    • digital signing,
    • heuristic tools,
    • packer detection,
    • portable executable format,
    • static imports;
  • dynamic attributes:
    • dynamic imports, mutexes, processes,
    • filesystem operations,
    • network operations,
    • registry operations,
    • Windows API calls.

Once implemented they should be reviewed and revised with regard to usability for this project.

More useful *normalised* field in VirusTotal JSONs

VirusTotal supplies malware names which are simply not readable. Currently the 'normalised' field generated by Cuckoo and available in the JSONs is not of much use.
The goal is to create better normalised malware names which can then be used as labels for testing cuckooml clustering and classification.

Make CuckooML plotting dependent on library imports

In the try: import... block, set a global variable recording whether all the libraries necessary for plotting were imported, and make CuckooML plotting conditional on it.
The result: no need to install plotting packages if you're only interested in malware analysis with textual output.
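
One possible shape for this, as a sketch only: the flag name `PLOTTING_AVAILABLE` and the function `plot_clusters` are hypothetical, and matplotlib is assumed to be the plotting library in question.

```python
# Try to import the plotting stack once and remember the outcome.
try:
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no display required
    import matplotlib.pyplot as plt
    PLOTTING_AVAILABLE = True
except ImportError:
    plt = None
    PLOTTING_AVAILABLE = False


def plot_clusters(points, labels):
    """Return a scatter-plot figure, or None when plotting is unavailable."""
    if not PLOTTING_AVAILABLE:
        print("Plotting libraries not installed; skipping plot.")
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    figure, axes = plt.subplots()
    axes.scatter(xs, ys, c=labels)  # colour each point by its cluster label
    return figure


figure = plot_clusters([(0, 0), (1, 1), (2, 0)], [0, 1, 0])
```

Textual analysis code never touches `plt`, so it keeps working when the import fails.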

getting error running cuckoo.py --ml

I have tried to run cuckooml but I always get the following error. I even tried to run your example in my IDE, but I still get the same error.

Traceback (most recent call last):
  File "/home/ubuntu/Downloads/pycharm-community-2016.3.2/helpers/pydev/pydevd.py", line 1596, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/ubuntu/Downloads/pycharm-community-2016.3.2/helpers/pydev/pydevd.py", line 974, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/home/ubuntu/PycharmProjects/cuckooml/mltestcase.py", line 10, in <module>
    loader.load_binaries("/home/ubuntu/Downloads/cuckooml/sample_data/dict")
  File "/home/ubuntu/PycharmProjects/cuckooml/modules/processing/cuckooml.py", line 1189, in load_binaries
    self.binaries[f].label_sample()
  File "/home/ubuntu/PycharmProjects/cuckooml/modules/processing/cuckooml.py", line 1305, in label_sample
    merged_labels += self.scans[vendor]["normalized"][label_type]
TypeError: list indices must be integers, not str

Sorting in clustering_results.csv

Hi @So-Cool

The sorting issue in clustering_results.csv is as follows:
1,10..19,2,20..[sample end 62], 7,8,9

I'm currently trying to create my own ground truth labels list, which means I will have to account for that sorting mistake when creating my own list. I'm wondering whether the ground truth labels generated by CuckooML are in sync with the clustering results, i.e. are they subject to the same bug or does it only affect the one list?
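
The order reported above (1, 10..19, 2, 20..) is what sorting numeric identifiers as strings produces. A small illustration, assuming the sample names are plain numeric strings; whether CuckooML's ground truth list follows the same order is not established here.

```python
# Sample identifiers as they might appear when read as file names (assumed).
sample_ids = [str(i) for i in range(1, 21)]

# Default string sorting compares character by character, so "10" precedes "2".
lexicographic = sorted(sample_ids)

# Sorting with an integer key restores the numeric order.
numeric = sorted(sample_ids, key=int)

print(lexicographic)
print(numeric)
```

Applying the same sort key to both the labels list and the clustering results would keep the two aligned, whichever order is chosen.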

cuckooml showcase

It seems worthwhile to create some kind of cuckooml showcase that performs clustering on real data and comments on how to interpret the results, possibly including cuckooml package usage guidelines, maybe in IPython Notebook format.

Only one set of features is used for clustering

Hi,

Thanks for sharing this project! I am in the process of adding features to the nominal feature set. While doing so I noticed that my changes were not reflected in the clustering results, even though I specified nominal in the configuration. I believe the reason is that the code handling the configuration settings uses an if... elif construct, which selects only one set of features. The relevant code snippet is:

    # Select features                               
    selected_features = []                          
    sf = [i.strip() for i in cfg.cuckooml.features.split(",")]
    if "simple" in sf:
        selected_features.append(simple_features)
    elif "nominal" in sf:
        selected_features.append(features_nominal)
    elif "numerical" in sf:
        selected_features.append(features_numerical)
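
A sketch of one way to let several feature sets be selected together, replacing the `elif` branches with independent `if` checks. The wrapper function `select_features` and the placeholder feature lists are assumptions for illustration; only the branch logic mirrors the snippet above.

```python
def select_features(option, simple_features, features_nominal,
                    features_numerical):
    """Return every feature set named in the comma-separated `option`."""
    sf = [i.strip() for i in option.split(",")]
    selected_features = []
    # Independent checks: each requested set is appended, not just the first.
    if "simple" in sf:
        selected_features.append(simple_features)
    if "nominal" in sf:
        selected_features.append(features_nominal)
    if "numerical" in sf:
        selected_features.append(features_numerical)
    return selected_features


# Placeholder feature sets standing in for the project's real data.
chosen = select_features("simple, nominal", ["s"], ["n"], ["x"])
print(chosen)
```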

Reading in the data for analysis

The simplest solution is reading in the JSONs placed in the /storage directory. At later stages it might be worth developing something more natural.
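
A minimal sketch of that simplest approach, assuming each report is a standalone .json file in a single directory. The function name `load_reports` and the choice to key reports by file name are hypothetical, not taken from the project.

```python
import json
from pathlib import Path


def load_reports(directory):
    """Load every *.json file in `directory`, keyed by its name without extension."""
    reports = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                reports[path.stem] = json.load(handle)
        except (OSError, ValueError) as error:
            # Unreadable or malformed files are skipped rather than aborting the run.
            print("Skipping {}: {}".format(path.name, error))
    return reports
```

A call such as `load_reports("storage")` would then return a dictionary of parsed reports, with the directory path adjusted to the local Cuckoo installation.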
