signatureCandidatesAutoDiscover.py Unlicensed work

We have moved to https://codeberg.org/KOLANICH-tools/signatureCandidatesAutoDiscover.py, grab new versions there.

Under the guise of "better security", Micro$oft-owned GitHub has discriminated against users of 1FA passwords while having a commercial interest in the success of the FIDO 1FA specifications and the Windows Hello implementation, which it promotes as a replacement for passwords. It will result in dire consequences and is completely unacceptable, read why.

If you don't want to participate in harming yourself, it is recommended to follow the lead and migrate away from GitHub and Micro$oft. Here is the list of alternatives and rationales to do it. If they delete the discussion, there are certain well-known places where you can get a copy of it. Read why you should also leave GitHub.


This is a tool that helps you to automatically discover signatures used in file formats and/or protocols using disassembly listings of the software and the dataset of the files used by it.

It relies on the following assumptions, which also define the limitations of the tool:

  1. To create a valid file in a format that uses signatures, the software has to write the signature somewhere.
  2. The software is not obfuscated or packed, and the decompiler/disassembler has done its work correctly.
  3. The signature is usually 4 bytes long, i.e. a uint32_t. 4 bytes give a low enough probability of misidentifying the file format.
  4. When in-memory structures are used, including memory-mapped files, the signature is usually aligned within its own struct (it may be unaligned relative to the root struct base). This makes appending it easier.
  5. When a signature is read from a file through a stream API (fread and so on), it is usually convenient for the programmer to read the block as a whole rather than byte by byte in a random order.
  6. When signatures read this way are compared or written, the compiler will optimize the comparisons and writes by using the corresponding integer types.
  7. The compiler will place the signatures into the instructions as immediate values.
  8. Signatures should have a low probability of occurring by chance.
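
To make assumptions 3 and 7 concrete, here is a minimal Python sketch of how a 4-byte signature maps to the uint32 immediate a compiler might emit. The function name `immediates_for` is hypothetical and is not part of the tool; the RIFF magic is used only as a well-known example of a 4-byte signature.

```python
import struct


def immediates_for(signature: bytes) -> dict:
    """Return the uint32 values a 4-byte signature may appear as in code.

    Hypothetical helper for illustration only, not the tool's own API.
    """
    if len(signature) != 4:
        raise ValueError("this sketch only handles 4-byte signatures")
    (little,) = struct.unpack("<I", signature)  # as seen on little-endian targets
    (big,) = struct.unpack(">I", signature)     # as seen on big-endian targets
    return {"little_endian": little, "big_endian": big}


riff = immediates_for(b"RIFF")
print(hex(riff["little_endian"]))  # 0x46464952
print(hex(riff["big_endian"]))     # 0x52494646
```

On a little-endian target, a comparison against the `RIFF` magic would therefore likely show up in a listing as a compare with `0x46464952`, which is the kind of operand the tool collects.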

So the principle of the tool is simple:

  1. Read the disassembly/decompilation of the software and identify the instructions doing 4-byte integer assignments and comparisons. Collect their operands.
  2. Because certain low-entropy integers like 0x000000FF will likely occur by chance, filter them out heuristically.
  3. Check for the presence of the remaining candidates within the files, count occurrences, print the listing.
  4. Remove the integers seen only once within the dataset.
  5. Print the rest as a nice table.
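
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual implementation: the regular expression only matches 8-digit hex literals, the entropy threshold is an arbitrary guess, and every function name here is hypothetical. The dataset is passed in as already-read byte strings.

```python
import math
import re
from collections import Counter

# Assumed, simplistic pattern: 8-hex-digit literals such as 0x474e5089.
IMMEDIATE_RE = re.compile(r"\b0x([0-9a-fA-F]{8})\b")


def collect_candidates(listing: str) -> set:
    """Step 1 (simplified): gather every 32-bit hex literal in the text."""
    return {int(m.group(1), 16) for m in IMMEDIATE_RE.finditer(listing)}


def byte_entropy(value: int) -> float:
    """Shannon entropy, in bits, of the four bytes of a uint32."""
    counts = Counter(value.to_bytes(4, "little"))
    return -sum(c / 4 * math.log2(c / 4) for c in counts.values())


def looks_random_enough(value: int, min_entropy: float = 1.5) -> bool:
    """Step 2: an assumed heuristic, not the tool's real filter."""
    return byte_entropy(value) >= min_entropy


def count_in_dataset(candidates, blobs) -> Counter:
    """Step 3: count non-overlapping occurrences in either byte order."""
    counts = Counter()
    for value in candidates:
        patterns = {value.to_bytes(4, "little"), value.to_bytes(4, "big")}
        for blob in blobs:
            counts[value] += sum(blob.count(p) for p in patterns)
    return counts


def discover(listing: str, blobs) -> dict:
    candidates = {v for v in collect_candidates(listing) if looks_random_enough(v)}
    counts = count_in_dataset(candidates, blobs)
    # Step 4: drop values seen at most once in the dataset.
    return {v: n for v, n in counts.items() if n > 1}


def print_table(counts: dict) -> None:
    """Step 5: print candidates, most frequent first."""
    print(f"{'hex':>10}  {'count':>5}")
    for value, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{value:#010x}  {n:>5}")


listing = "cmp eax, 0x474e5089\nmov ecx, 0x000000ff\n"
blobs = [b"\x89PNG\r\n\x1a\n....", b"\x89PNG\r\n\x1a\nxxxx"]
result = discover(listing, blobs)
print_table(result)
```

In this toy input, `0x000000ff` is rejected by the entropy filter, while `0x474e5089` (the bytes `\x89PNG` read as a little-endian integer) survives because it occurs in both sample blobs.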

How to use

  1. Get prior knowledge that the format in question uses signatures.
  2. Create a dataset of files containing the signatures.
  3. Collect enough different implementations of the software dealing with the format. Disassemble and/or decompile them with retdec or another decompiler.
  4. Execute the tool within the directory with decompilation results, providing it with the glob expression to the files containing the data.
  5. The tool will give you the list of signature candidates with their counts of occurrences within the dataset and different representations convenient for grepping within disassembly listings and decompilation results.
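
As an illustration of what "different representations" in the last step could look like, here is a small Python sketch. The set of representations and the function name `representations` are assumptions made for this example; the tool's real output columns may differ.

```python
import struct


def representations(value: int) -> dict:
    """Spellings of one uint32 candidate that may appear in listings.

    Hypothetical helper: decompilers may print the same constant as hex,
    as an unsigned or signed decimal, or as a character literal.
    """
    raw_le = value.to_bytes(4, "little")
    (signed,) = struct.unpack("<i", raw_le)  # reinterpret as int32
    return {
        "hex": f"0x{value:08x}",
        "unsigned": str(value),
        "signed": str(signed),
        "bytes_le": raw_le.hex(" "),
        "bytes_be": value.to_bytes(4, "big").hex(" "),
        # Printable ASCII view of the bytes in file order (little-endian).
        "ascii_le": "".join(chr(b) if 32 <= b < 127 else "." for b in raw_le),
    }


for name, text in representations(0x46464952).items():
    print(f"{name:>9}: {text}")
```

Any of these strings can then be searched for in the listings, for example with `grep -F`, to find the code that reads or writes the candidate.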
