Dis86

Dis86 is a decompiler for 16-bit real-mode x86 DOS binaries.

Purpose

Dis86 was built for reverse-engineering work such as analyzing and re-implementing DOS video games from the early 1990s. The project is a work in progress, and the development team makes no guarantee that it will work or be useful out of the box for any application other than their own. Features and improvements are added as needed.

Goals and Non-goals

Goals:

  • Support reverse-engineering 16-bit real-mode x86 DOS binaries
  • Generate code that is semantically correct (in so far as practical)
  • Generate code that integrates well with a hybrid-runtime system (Hydra) [currently unreleased]
  • Avoid making many assumptions or using heuristics that can lead to broken decompiled code
  • Be hackable and easy to extend as required
  • Automate away common manual transformations and let a human reverser focus on the subjective tasks a computer cannot do well (e.g. naming things)

Non-goals:

  • Output code beauty (semantic correctness is more important)
  • Re-compilable to equivalent binaries

Also, we generally prefer manual configuration/annotation tables to flawed heuristics that will generate incorrect code.

Discussion of Internals

Discussion of the internals will be published periodically on the author's blog: xorvoid

Building

Assuming you have Rust and Cargo installed (the command below is also run through the just command runner):

just build

Some Commands

Emit Disassembly:

./target/debug/dis86 --config <your_config.bsl> --binary <raw_text_segment> --name <function_name> --emit-dis <output-file>

Emit initial Intermediate Representation (IR):

./target/debug/dis86 --config <your_config.bsl> --binary <raw_text_segment> --name <function_name> --emit-ir-initial <output-file>

Emit final (optimized) Intermediate Representation (IR):

./target/debug/dis86 --config <your_config.bsl> --binary <raw_text_segment> --name <function_name> --emit-ir-final <output-file>

Visualize the control-flow graph with graphviz:

./target/debug/dis86 --config <your_config.bsl> --binary <raw_text_segment> --name <function_name> --emit-graph /tmp/ctrlflow.dot
dot -Tpng /tmp/ctrlflow.dot > /tmp/control_flow_graph.png
open /tmp/control_flow_graph.png

Emit inferred higher-level control-flow structure:

./target/debug/dis86 --config <your_config.bsl> --binary <raw_text_segment> --name <function_name> --emit-ctrlflow <output-file>

Emit an Abstract Syntax Tree (AST):

./target/debug/dis86 --config <your_config.bsl> --binary <raw_text_segment> --name <function_name> --emit-ast <output-file>

Emit C code:

./target/debug/dis86 --config <your_config.bsl> --binary <raw_text_segment> --name <function_name> --emit-code <output-file>
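
The commands above differ only in the emit flag, so they are easy to drive from a small script. The following is a minimal sketch, not part of dis86: the helper names (`build_command`, `emit_all`) and the output-file naming scheme are assumptions made for illustration, and it runs one emit flag per invocation exactly as the commands above do.

```python
import subprocess
from pathlib import Path

# Emit flags exactly as listed in the commands above.
EMIT_FLAGS = [
    "--emit-dis",
    "--emit-ir-initial",
    "--emit-ir-final",
    "--emit-graph",
    "--emit-ctrlflow",
    "--emit-ast",
    "--emit-code",
]


def build_command(config, binary, name, flag, output, exe="./target/debug/dis86"):
    """Return the argv list for a single dis86 invocation."""
    if flag not in EMIT_FLAGS:
        raise ValueError(f"unknown emit flag: {flag}")
    return [exe, "--config", str(config), "--binary", str(binary),
            "--name", name, flag, str(output)]


def emit_all(config, binary, name, out_dir):
    """Run every emit stage for one function, writing one file per stage."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for flag in EMIT_FLAGS:
        # Hypothetical naming scheme: <function>.<stage>, e.g. my_func.ir-final
        stage = flag[len("--emit-"):]
        output = out_dir / f"{name}.{stage}"
        subprocess.run(build_command(config, binary, name, flag, output), check=True)
```

Calling `emit_all("<your_config.bsl>", "<raw_text_segment>", "<function_name>", "out")` with real paths would leave every intermediate stage side by side in `out/`, which can help when tracking down which step of the pipeline introduced a problem.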

Caveats & Limitations

The primary development goal is to support an ongoing reverse-engineering and reimplementation project. The decompiler is also designed to emit code that integrates well with a hybrid runtime system (called Hydra) that is used to run a partially decompiled and reimplemented project. As such, uses that fall outside this scope have not been considered and may have numerous unknown issues.

Some specific known limitations:

  • The decompiler accepts only a flat binary region for the text segment. It doesn't handle common binary file-formats (e.g. MZ) at the moment.
  • Handling of many 8086 opcodes is unimplemented in the assembly->ir build step. Implementations are added as needed.
  • Handling of some IR ops is unimplemented in the ir->ast convert step. Implementations are added as needed.
  • Control-flow synthesis is limited to while-loops, if-stmts, and switch-stmts. If-else is unimplemented.
  • Block scheduling and placement is far from optimal for more complicated control-flow.
  • ... and many more ...
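
Because MZ files are not handled, the flat region has to be extracted by other means. As a starting point, the sketch below strips the MZ header and returns the load module, using the standard header fields (bytes in last page, page count, header size in paragraphs). It is an illustration rather than part of dis86, and it rests on assumptions: the function name is made up, relocations are not applied, and the load module contains every segment of the program, so the text segment may still need to be sliced out of the result by hand.

```python
import struct

PARAGRAPH = 16  # bytes per paragraph
PAGE = 512      # bytes per MZ "page"


def mz_load_module(data: bytes) -> bytes:
    """Return the load module (the bytes after the MZ header) of a DOS executable."""
    if len(data) < 28 or data[:2] not in (b"MZ", b"ZM"):
        raise ValueError("not an MZ executable")
    # Fields at offsets 2..9: bytes used in the last page, page count,
    # relocation count (unused here), header size in paragraphs.
    last_page_bytes, pages, _relocs, header_paras = struct.unpack_from("<4H", data, 2)
    image_end = pages * PAGE
    if last_page_bytes:
        # The last page is only partially used.
        image_end -= PAGE - last_page_bytes
    start = header_paras * PARAGRAPH
    if start > image_end or image_end > len(data):
        raise ValueError("inconsistent MZ header")
    # Anything past image_end (overlays, debug info) is ignored.
    return data[start:image_end]
```

The returned bytes can be written to a file and inspected; whether that file is directly usable as the `--binary` input depends on how the program lays out its segments.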

Future Plans / Wishlist

Feature wishlist

  • Array accesses
  • Compound types (structs and unions)
  • Synthesizing struct/union member access
  • If-else statements
  • Pointer analysis and arithmetic
  • More "u16 pair -> u32" fusing
  • Improved type-aware IR
  • Less verbose output C code patterns for common operations (e.g. passing pointer as a function call arg)
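
For readers unfamiliar with the "u16 pair -> u32" item: 16-bit code typically holds a 32-bit value as two 16-bit halves and operates on them separately, for example adding the low words and then adding the high words plus the carry. Fusing presumably means recognizing such a pair as one 32-bit value. The sketch below only illustrates the equivalence that such a transformation relies on; it is not how dis86 implements it, and all names are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def add32_as_u16_pairs(a_lo, a_hi, b_lo, b_hi):
    """32-bit add the way 16-bit code does it: add the low words, then add the high words with carry."""
    lo = a_lo + b_lo
    carry = lo >> 16
    hi = (a_hi + b_hi + carry) & MASK16
    return lo & MASK16, hi


def fuse(lo, hi):
    """Combine a (low, high) u16 pair into a single u32 value."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def add32_fused(a, b):
    """The same addition expressed directly on u32 values."""
    return (a + b) & MASK32


# Both forms agree, including when the low words carry into the high words.
pair_result = fuse(*add32_as_u16_pairs(0xFFFF, 0x0001, 0x0001, 0x0000))
assert pair_result == add32_fused(0x0001FFFF, 0x00000001) == 0x00020000
```

The fused form is what a human would write in C, which is why recognizing more of these pairs reduces the verbosity of the output.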

Prehistoric versions

Dis86 began life as a simple disassembler and 1-to-1 instruction => C-statement decompiler that integrated well with the Hydra Runtime. Over time it gained complexity and it became difficult to implement more sophisticated transformations. So, it was rebuilt and rearchitected with a proper SSA IR.

The older versions remain in the repo under old/. In particular, old/v2 was much less sophisticated albeit more complete in terms of the input machine-code it could handle.

These versions remain because they are sometimes still useful when the latest version is missing a feature.

dis86's Issues

Add a license

I know at least one of your other projects (Forsp) uses MIT. I'd say that or UPL would be nice to add here if you want to have the license permissive. Personally, I still like copy-left licenses, but I know the trend lately has been more towards permissive licenses. Really though, about anything OSI approved would be appreciated.

Hi

Hi, it seems we are doing similar things with reverse engineering:
https://github.com/xor2003/libdosbox
I discovered a method to achieve full source code translation within a few weeks by comparing each translated instruction in an emulator.
https://github.com/xor2003/masm2c
You’ve taken it a step further by creating a decompiler. Perhaps it makes sense to merge our efforts.
Before finding your project, I was considering reusing the angr decompiler and https://github.com/albertan017/LLM4Decompile
