
barf-project's Introduction

BARF : Binary Analysis and Reverse engineering Framework


The analysis of binary code is a crucial activity in many areas of the computer sciences and software engineering disciplines ranging from software security and program analysis to reverse engineering. Manual binary analysis is a difficult and time-consuming task and there are software tools that seek to automate or assist human analysts. However, most of these tools have several technical and commercial restrictions that limit access and use by a large portion of the academic and practitioner communities. BARF is an open source binary analysis framework that aims to support a wide range of binary code analysis tasks that are common in the information security discipline. It is a scriptable platform that supports instruction lifting from multiple architectures, binary translation to an intermediate representation, an extensible framework for code analysis plugins and interoperation with external tools such as debuggers, SMT solvers and instrumentation tools. The framework is designed primarily for human-assisted analysis but it can be fully automated.

The BARF project includes BARF and related tools and packages. So far the project is composed of the following items:

  • BARF : A multiplatform open source Binary Analysis and Reverse engineering Framework
  • PyAsmJIT : A JIT for the Intel x86_64 and ARM architectures.
  • Tools built upon BARF:
    • BARFgadgets : Lets you search, classify and verify ROP gadgets inside a binary program.
    • BARFcfg : Lets you recover the control-flow graph of the functions of a binary program.
    • BARFcg : Lets you recover the call graph of the functions of a binary program.

For more information, see:

  • BARF: A multiplatform open source Binary Analysis and Reverse engineering Framework (Whitepaper) [en]
  • BARFing Gadgets (ekoparty2014 presentation) [es]

Current status:

Latest Release v0.6.0
URL https://github.com/programa-stic/barf-project/releases/tag/v0.6.0
Change Log https://github.com/programa-stic/barf-project/blob/v0.6.0/CHANGELOG.md

All packages were tested on Ubuntu 16.04 (x86_64).

BARF

BARF is a Python package for binary analysis and reverse engineering. It:

  • loads binary programs in different formats (ELF, PE, etc.),
  • supports the Intel x86 architecture for 32 and 64 bits,
  • supports the ARM architecture for 32 bits,
  • operates on an intermediate language (REIL), so all analysis algorithms are architecture-agnostic,
  • integrates with the Z3 and CVC4 SMT solvers, which means that you can express fragments of code as formulae and check restrictions on them.

It is currently under development.

Installation

BARF depends on the following SMT solvers:

  • Z3 : A high-performance theorem prover being developed at Microsoft Research.
  • CVC4 : An efficient open-source automatic theorem prover for satisfiability modulo theories (SMT) problems.

The following command installs BARF on your system:

$ sudo python setup.py install

You can also install it locally, for the current user only (no sudo needed):

$ python setup.py install --user

Notes

  • BARF needs only one SMT solver to work. You may choose between Z3 and CVC4 or install both.
  • To run some tests you need to install PyAsmJIT first: sudo pip install pyasmjit
  • You may need to install Graphviz: sudo apt-get install graphviz

Quickstart

This is a very simple example which shows how to open a binary file and print each instruction with its translation to the intermediate language (REIL).

from barf import BARF

# Open binary file.
barf = BARF("examples/misc/samples/bin/branch4.x86")

# Print assembly instruction.
for addr, asm_instr, reil_instrs in barf.translate():
    print("{:#x} {}".format(addr, asm_instr))

    # Print REIL translation.
    for reil_instr in reil_instrs:
        print("\t{}".format(reil_instr))

We can also recover the CFG and save it to a .dot file.

# Recover CFG.
cfg = barf.recover_cfg()

# Save CFG to a .dot file.
cfg.save("branch4.x86_cfg")

We can check restrictions on code using an SMT solver. For instance, suppose you have the following code:

 80483ed:       55                      push   ebp
 80483ee:       89 e5                   mov    ebp,esp
 80483f0:       83 ec 10                sub    esp,0x10
 80483f3:       8b 45 f8                mov    eax,DWORD PTR [ebp-0x8]
 80483f6:       8b 55 f4                mov    edx,DWORD PTR [ebp-0xc]
 80483f9:       01 d0                   add    eax,edx
 80483fb:       83 c0 05                add    eax,0x5
 80483fe:       89 45 fc                mov    DWORD PTR [ebp-0x4],eax
 8048401:       8b 45 fc                mov    eax,DWORD PTR [ebp-0x4]
 8048404:       c9                      leave
 8048405:       c3                      ret

And you want to know what values you have to assign to memory locations ebp-0x4, ebp-0x8 and ebp-0xc in order to obtain a specific value in the eax register after executing the code.

First, we add the instructions to the analyzer component.

from barf import BARF

# Open ELF file
barf = BARF("examples/misc/samples/bin/constraint1.x86")

# Add instructions to analyze.
for addr, asm_instr, reil_instrs in barf.translate(0x80483ed, 0x8048401):
    for reil_instr in reil_instrs:
        barf.code_analyzer.add_instruction(reil_instr)

Then, we generate expressions for each variable of interest and add the desired restrictions on them.

ebp = barf.code_analyzer.get_register_expr("ebp", mode="post")

# Preconditions: set range for variable a and b
a = barf.code_analyzer.get_memory_expr(ebp-0x8, 4, mode="pre")
b = barf.code_analyzer.get_memory_expr(ebp-0xc, 4, mode="pre")

for constr in [a >= 2, a <= 100, b >= 2, b <= 100]:
    barf.code_analyzer.add_constraint(constr)

# Postconditions: set desired value for the result
c = barf.code_analyzer.get_memory_expr(ebp-0x4, 4, mode="post")

for constr in [c >= 26, c <= 28]:
    barf.code_analyzer.add_constraint(constr)

Finally, we check whether the restrictions we established can be satisfied.

if barf.code_analyzer.check() == 'sat':
    print("[+] Satisfiable! Possible assignments:")

    # Get concrete value for expressions
    a_val = barf.code_analyzer.get_expr_value(a)
    b_val = barf.code_analyzer.get_expr_value(b)
    c_val = barf.code_analyzer.get_expr_value(c)

    # Print values
    print("- a: {0:#010x} ({0})".format(a_val))
    print("- b: {0:#010x} ({0})".format(b_val))
    print("- c: {0:#010x} ({0})".format(c_val))

    assert a_val + b_val + 5 == c_val
else:
    print("[-] Unsatisfiable!")
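
To make the question above concrete without a solver, the same constraints can be checked by brute force. This is only an illustrative sketch: it does not use BARF or an SMT solver, the function name find_assignments is hypothetical, and it assumes 32-bit wrap-around arithmetic to mirror the x86 add instructions.

```python
# Brute-force illustration of the constraint problem (no SMT solver involved).
# Assumption: 32-bit wrap-around arithmetic, mirroring the x86 `add` instructions.
MASK32 = 0xFFFFFFFF


def find_assignments(lo=2, hi=100, c_min=26, c_max=28):
    """Return every (a, b, c) with lo <= a, b <= hi and c_min <= c <= c_max,
    where c = (a + b + 5) mod 2**32."""
    solutions = []
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            c = (a + b + 5) & MASK32
            if c_min <= c <= c_max:
                solutions.append((a, b, c))
    return solutions


solutions = find_assignments()

# Show a few of the satisfying assignments.
for a, b, c in solutions[:3]:
    print("- a: {0:#010x} ({0})  b: {1:#010x} ({1})  c: {2:#010x} ({2})".format(a, b, c))
```

A solver returns one such assignment without enumerating the search space, which is what makes the approach practical for larger ranges and longer code fragments.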

You can see these and more examples in the examples directory.

Overview

The framework is divided into three main components: core, arch and analysis.

Core

This component contains essential modules:

  • REIL: Provides definitions for the REIL language. It also implements an emulator and a parser.
  • SMT: Provides means to interface with the Z3 and CVC4 SMT solvers. It also provides functionality to translate REIL instructions to SMT expressions.
  • BI: The Binary Interface module is responsible for loading binary files for processing (it uses PEFile and PyELFTools).

Arch

Each supported architecture is provided as a subcomponent which contains the following modules.

  • Architecture: Describes the architecture, i.e., registers, memory address size.
  • Translator: Provides translators to REIL for each supported instruction.
  • Disassembler: Provides disassembling functionalities (it uses Capstone).
  • Parser: Transforms instructions from string to object form.

Analysis

So far this component consists of three modules: Control-Flow Graph, Call Graph and Code Analyzer. The first two provide functionality for CFG and CG recovery, respectively. The last is a high-level interface to the SMT-solver-related functionality.
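
As a rough illustration of what CFG recovery involves, the sketch below splits a linear instruction list into basic blocks and connects them with edges. It is not BARF's implementation: the Instr tuple, the mnemonics "jmp", "jcc" and "ret", and the recover_cfg function here are all hypothetical, and the input is assumed to be a contiguous, address-sorted listing.

```python
from collections import namedtuple

# Hypothetical, simplified instruction record (not a BARF type).
Instr = namedtuple("Instr", "addr size mnemonic target")


def recover_cfg(instrs):
    """Return (blocks, edges) for a contiguous, address-sorted instruction list.

    blocks maps a block start address to its instructions; edges is a set of
    (source_block_start, destination_block_start) pairs."""
    addrs = {i.addr for i in instrs}

    # Leaders: the first instruction, every branch target, and every
    # instruction that follows a branch or return.
    leaders = {instrs[0].addr}
    for i in instrs:
        if i.mnemonic in ("jmp", "jcc", "ret"):
            if i.target is not None and i.target in addrs:
                leaders.add(i.target)
            if i.addr + i.size in addrs:
                leaders.add(i.addr + i.size)

    # Group instructions into blocks, starting a new block at each leader.
    blocks, current = {}, None
    for i in instrs:
        if i.addr in leaders:
            current = i.addr
            blocks[current] = []
        blocks[current].append(i)

    # Connect blocks according to the last instruction of each one.
    edges = set()
    for start, body in blocks.items():
        last = body[-1]
        fallthrough = last.addr + last.size
        if last.mnemonic in ("jmp", "jcc") and last.target in blocks:
            edges.add((start, last.target))
        if last.mnemonic not in ("jmp", "ret") and fallthrough in blocks:
            edges.add((start, fallthrough))
    return blocks, edges


program = [
    Instr(0x0, 2, "cmp", None),
    Instr(0x2, 2, "jcc", 0x8),
    Instr(0x4, 2, "mov", None),
    Instr(0x6, 2, "jmp", 0xA),
    Instr(0x8, 2, "add", None),
    Instr(0xA, 1, "ret", None),
]
blocks, edges = recover_cfg(program)
```

Real binaries add complications this sketch ignores, such as indirect jumps, jump tables and calls to functions that never return.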

Tools

BARFgadgets

BARFgadgets is a Python script built upon BARF that lets you search, classify and verify ROP gadgets inside a binary program. The search stage finds all ret-, jmp- and call-ended gadgets inside the binary. The classification stage classifies previously found gadgets according to the following types:

  • No-Operation,
  • Move Register,
  • Load Constant,
  • Arithmetic/Logical Operation,
  • Load Memory,
  • Store Memory,
  • Arithmetic/Logical Load,
  • Arithmetic/Logical Store and
  • Undefined.

This is done through instruction emulation. Finally, the verification stage consists of using an SMT solver to verify the semantics assigned to each gadget in the second stage.
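
The idea of classifying by emulation can be sketched as follows: run a gadget on several random register states and keep only the candidate types that hold in every run. This is a toy model under stated assumptions, not the BARFgadgets algorithm: gadgets are plain Python functions that mutate a register dictionary, only two of the types listed above are recognised, and the names REGS and classify are hypothetical.

```python
import random

REGS = ("eax", "ebx", "ecx")  # hypothetical, tiny register file


def classify(gadget, trials=8, seed=0):
    """Emulate `gadget` on random states and return the candidate
    classifications that were observed in every trial."""
    rng = random.Random(seed)
    candidates = None
    for _ in range(trials):
        before = {r: rng.getrandbits(32) for r in REGS}
        after = dict(before)
        gadget(after)

        observed = set()
        if after == before:
            observed.add(("No-Operation",))
        for dst in REGS:
            # A move may only modify its destination register.
            if any(after[r] != before[r] for r in REGS if r != dst):
                continue
            for src in REGS:
                if src != dst and after[dst] == before[src]:
                    observed.add(("Move Register", src, dst))

        candidates = observed if candidates is None else candidates & observed
    return sorted(candidates) or [("Undefined",)]


def mov_ebx_eax(state):
    state["ebx"] = state["eax"]


def nop(state):
    pass


def inc_eax(state):
    state["eax"] = (state["eax"] + 1) & 0xFFFFFFFF


print(classify(mov_ebx_eax))
print(classify(nop))
print(classify(inc_eax))
```

Random testing can only rule classifications out; it cannot prove one correct, which is why a separate verification stage with a solver is useful.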

usage: BARFgadgets [-h] [--version] [--bdepth BDEPTH] [--idepth IDEPTH] [-u]
                   [-c] [-v] [-o OUTPUT] [-t] [--sort {addr,depth}] [--color]
                   [--show-binary] [--show-classification] [--show-invalid]
                   [--summary SUMMARY] [-r {8,16,32,64}]
                   filename

Tool for finding, classifying and verifying ROP gadgets.

positional arguments:
  filename              Binary file name.

optional arguments:
  -h, --help            show this help message and exit
  --version             Display version.
  --bdepth BDEPTH       Gadget depth in number of bytes.
  --idepth IDEPTH       Gadget depth in number of instructions.
  -u, --unique          Remove duplicate gadgets (in all steps).
  -c, --classify        Run gadgets classification.
  -v, --verify          Run gadgets verification (includes classification).
  -o OUTPUT, --output OUTPUT
                        Save output to file.
  -t, --time            Print time of each processing step.
  --sort {addr,depth}   Sort gadgets by address or depth (number of
                        instructions) in ascending order.
  --color               Format gadgets with ANSI color sequences, for output
                        in a 256-color terminal or console.
  --show-binary         Show binary code for each gadget.
  --show-classification
                        Show classification for each gadget.
  --show-invalid        Show invalid gadget, i.e., gadgets that were
                        classified but did not pass the verification process.
  --summary SUMMARY     Save summary to file.
  -r {8,16,32,64}       Filter verified gadgets by operands register size.

For more information, see README.

BARFcfg

BARFcfg is a Python script built upon BARF that lets you recover the control-flow graph of a binary program.

usage: BARFcfg [-h] [-s SYMBOL_FILE] [-f {txt,pdf,png,dot}] [-t]
               [-d OUTPUT_DIR] [-b] [--show-reil]
               [--immediate-format {hex,dec}] [-a | -r RECOVER]
               filename

Tool for recovering CFG of a binary.

positional arguments:
  filename              Binary file name.

optional arguments:
  -h, --help            show this help message and exit
  -s SYMBOL_FILE, --symbol-file SYMBOL_FILE
                        Load symbols from file.
  -f {txt,pdf,png,dot}, --format {txt,pdf,png,dot}
                        Output format.
  -t, --time            Print process time.
  -d OUTPUT_DIR, --output-dir OUTPUT_DIR
                        Output directory.
  -b, --brief           Brief output.
  --show-reil           Show REIL translation.
  --immediate-format {hex,dec}
                        Output format.
  -a, --recover-all     Recover all functions.
  -r RECOVER, --recover RECOVER
                        Recover specified functions by address (comma
                        separated).

BARFcg

BARFcg is a Python script built upon BARF that lets you recover the call graph of a binary program.

usage: BARFcg [-h] [-s SYMBOL_FILE] [-f {pdf,png,dot}] [-t] [-a | -r RECOVER]
              filename

Tool for recovering CG of a binary.

positional arguments:
  filename              Binary file name.

optional arguments:
  -h, --help            show this help message and exit
  -s SYMBOL_FILE, --symbol-file SYMBOL_FILE
                        Load symbols from file.
  -f {pdf,png,dot}, --format {pdf,png,dot}
                        Output format.
  -t, --time            Print process time.
  -a, --recover-all     Recover all functions.
  -r RECOVER, --recover RECOVER
                        Recover specified functions by address (comma
                        separated).

PyAsmJIT

PyAsmJIT is a Python package for x86_64/ARM assembly code generation and execution.

This package was developed in order to test BARF instruction translation from x86_64/ARM to REIL. The main idea is to be able to run fragments of code natively. Then, the same fragment is translated to REIL and executed in a REIL VM. Finally, both final contexts (the one obtained through native execution and the one from emulation) are compared for differences.
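
The final comparison step can be pictured with a small sketch. It assumes each context is a plain dictionary from register or flag name to value; the function name diff_contexts and the sample values are hypothetical and do not come from PyAsmJIT or BARF.

```python
def diff_contexts(native, emulated):
    """Return {name: (native_value, emulated_value)} for every register or
    flag whose value differs, or that is missing from one of the contexts."""
    diffs = {}
    for name in sorted(set(native) | set(emulated)):
        n, e = native.get(name), emulated.get(name)
        if n != e:
            diffs[name] = (n, e)
    return diffs


# Hypothetical final contexts after running the same code fragment.
native_ctx = {"eax": 0x5, "ebx": 0x10, "zf": 0}
emulated_ctx = {"eax": 0x5, "ebx": 0x11, "zf": 0}

mismatches = diff_contexts(native_ctx, emulated_ctx)
for name, (n, e) in mismatches.items():
    print("{}: native={:#x} emulated={:#x}".format(name, n, e))
```

An empty result means the translation agreed with native execution for that input; any entry points at a register whose REIL semantics deserve a closer look.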

For more information, see PyAsmJIT.

License

The BSD 2-Clause License. For more information, see LICENSE.

barf-project's People

Contributors

adrianherrera, cnheitman, jmorse, lieanu, seraphime, soundslocke


barf-project's Issues

Error in smttranslator.py

I found the following error in smttranslator.py.

 File "/usr/local/lib/python2.7/dist-packages/barf-0.2-py2.7.egg/barf/core/smt/smttranslator.py", line 689, in _translate_sext
    expr = (op1_var == smtlibv2.SEXTEND(op1_var, op3_var))
TypeError: SEXTEND() takes exactly 3 arguments (2 given)

Improve CFG recovery - Process Symbol Tables

The CFG recovery functionality uses symbol information to correctly generate the control flow graph of a specific function. However, this information is not automatically extracted from the binary being processed (it has to be done manually). The goal is to extract symbol information automatically once a binary is loaded so it can be used by the different analysis modules, such as basicblock (which implements the CFG recovery functionality).

barf.arch.arm.disassembler.CapstoneOperandNotSupported: Instruction: ldcvc p5, c15,

File "deflat.py", line 89, in main
    cfg = barf.recover_cfg(start=start)
  File "D:\Users\Root\AppData\Local\Programs\Python\Python37\lib\site-packages\barf-0.6.0-py3.7.egg\barf\barf.py", line 308, in recover_cfg
    cfg, _ = self._recover_cfg(start=start, end=end, symbols=symbols, callback=callback)
  File "D:\Users\Root\AppData\Local\Programs\Python\Python37\lib\site-packages\barf-0.6.0-py3.7.egg\barf\barf.py", line 375, in _recover_cfg
    bbs, calls = self.bb_builder.build(start_addr, end_addr, symbols)
  File "D:\Users\Root\AppData\Local\Programs\Python\Python37\lib\site-packages\barf-0.6.0-py3.7.egg\barf\analysis\graphs\controlflowgraph.py", line 450, in build
    return self.strategy.build(start, end, symbols)
  File "D:\Users\Root\AppData\Local\Programs\Python\Python37\lib\site-packages\barf-0.6.0-py3.7.egg\barf\analysis\graphs\controlflowgraph.py", line 221, in build
    bbs = self._recover_bbs(start, end, symbols)
  File "D:\Users\Root\AppData\Local\Programs\Python\Python37\lib\site-packages\barf-0.6.0-py3.7.egg\barf\analysis\graphs\controlflowgraph.py", line 364, in _recover_bbs
    bb = self._disassemble_bb(addr, end + 0x1, symbols)
  File "D:\Users\Root\AppData\Local\Programs\Python\Python37\lib\site-packages\barf-0.6.0-py3.7.egg\barf\analysis\graphs\controlflowgraph.py", line 294, in _disassemble_bb
    asm = self._disasm.disassemble(data_chunk, addr)
  File "D:\Users\Root\AppData\Local\Programs\Python\Python37\lib\site-packages\barf-0.6.0-py3.7.egg\barf\arch\arm\disassembler.py", line 211, in disassemble
    instr = self._cs_translate_insn(disasm)
  File "D:\Users\Root\AppData\Local\Programs\Python\Python37\lib\site-packages\barf-0.6.0-py3.7.egg\barf\arch\arm\disassembler.py", line 357, in _cs_translate_insn
    operands = [self.__cs_translate_operand(op, cs_insn) for op in cs_insn.operands]
  File "D:\Users\Root\AppData\Local\Programs\Python\Python37\lib\site-packages\barf-0.6.0-py3.7.egg\barf\arch\arm\disassembler.py", line 357, in <listcomp>
    operands = [self.__cs_translate_operand(op, cs_insn) for op in cs_insn.operands]
  File "D:\Users\Root\AppData\Local\Programs\Python\Python37\lib\site-packages\barf-0.6.0-py3.7.egg\barf\arch\arm\disassembler.py", line 352, in __cs_translate_operand
    raise CapstoneOperandNotSupported(error_msg)
barf.arch.arm.disassembler.CapstoneOperandNotSupported: Instruction: ldcvc p5, c15, [ip, #-0x2b4]. Unknown operand type: 65

BARF fails to disassemble /bin/ls

Using recover_cfg.py to disassemble /bin/ls fails:

[+] Recovering program CFG...
Traceback (most recent call last):
File "./recover_cfg.py", line 28, in <module>
  cfg = barf.recover_cfg()
File "/home/g/Codigo/barf-project/barf/barf/barf.py", line 294, in recover_cfg
  bb_list = self.bb_builder.build(start_addr, end_addr)
File "/home/g/Codigo/barf-project/barf/barf/analysis/basicblock/basicblock.py", line 368, in build 
  bbs = self._find_candidate_bbs(start_address, end_address)
File "/home/g/Codigo/barf-project/barf/barf/analysis/basicblock/basicblock.py", line 426, in _find_candidate_bbs
  bb = self._disassemble_bb(curr_addr, end_address + 0x1)
File "/home/g/Codigo/barf-project/barf/barf/analysis/basicblock/basicblock.py", line 553, in _disassemble_bb
  ir = self._ir_trans.translate(asm)
File "/home/g/Codigo/barf-project/barf/barf/arch/x86/x86translator.py", line 304, in translate 
  check_operands_size(instr, self._arch_info.architecture_size)
File "/home/g/Codigo/barf-project/barf/barf/arch/x86/x86translator.py", line 193, in check_operands_size
"Invalid operands size: %s" % instr
  AssertionError: Invalid operands size: stm   [QWORD rdi, EMPTY, DWORD t2999]

where /bin/ls is an ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=64d095bc6589dd4bfbf1c6d62ae985385965461b, stripped

x86 recover_cfg.py fails with TypeError: unsupported operand type(s)

I have just finished installing the latest version of barf and was trying the example scripts. When I ran the recover_cfg.py script (all other x86 scripts worked) barf would error out with:

$> cat barf.log 
2018-04-12 10:26:45,874: barf.barf:INFO: [+] BARF: Initializing...
2018-04-12 10:26:45,879: smtlibv2:DEBUG: >(set-option :global-decls false)
2018-04-12 10:26:45,880: smtlibv2:DEBUG: >(set-logic QF_AUFBV)
2018-04-12 10:26:45,880: smtlibv2:DEBUG: >(declare-fun MEM_0 () (Array (_ BitVec 64) (_ BitVec 8)))
2018-04-12 10:26:45,884: smtlibv2:DEBUG: >(set-option :global-decls false)
2018-04-12 10:26:45,884: smtlibv2:DEBUG: >(set-logic QF_AUFBV)
2018-04-12 10:26:45,884: smtlibv2:DEBUG: >(declare-fun MEM_0 () (Array (_ BitVec 64) (_ BitVec 8)))
2018-04-12 10:26:45,923: barf.analysis.basicblock.basicblock:ERROR: Failed to save basic block graph: ~/barf-project/examples/bin/x86/branch4_cfg (dot)
Traceback (most recent call last):
  File "~/barf/local/lib/python2.7/site-packages/barf-0.2.1-py2.7.egg/barf/analysis/basicblock/basicblock.py", line 415, in save
dot_graph.write("{}.{}".format(filename, format), format=format)
  File "build/bdist.linux-x86_64/egg/pydot.py", line 1756, in write
s = self.create(prog, format, encoding=encoding)
  File "build/bdist.linux-x86_64/egg/pydot.py", line 1836, in create
self.write(tmp_name, encoding=encoding)
  File "build/bdist.linux-x86_64/egg/pydot.py", line 1750, in write
s = self.to_string()
  File "build/bdist.linux-x86_64/egg/pydot.py", line 1492, in to_string
graph.append( node.to_string()+'\n' )
  File "build/bdist.linux-x86_64/egg/pydot.py", line 623, in to_string
node += ' [' + node_attr + ']'
TypeError: unsupported operand type(s) for +=: 'long' and 'str'

I am trying to run this on Ubuntu 16.04:
Linux devlinux 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Add minimal support for missing instructions

I made a short list of unsupported instructions using the serial testcases, with examples:

  • bt eax, edx
  • movsx eax, al
  • cmpxchg dword ptr [edi], ecx

These instructions should (probably) be supported to obtain correct results in our testcases.

Exception: Error loading ELF file

barf = BARF("/home/user/code")
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.8/dist-packages/barf-0.6.0-py3.8.egg/barf/barf.py", line 90, in __init__
self.open(filename)
File "/usr/local/lib/python3.8/dist-packages/barf-0.6.0-py3.8.egg/barf/barf.py", line 211, in open
self.binary = BinaryFile(filename)
File "/usr/local/lib/python3.8/dist-packages/barf-0.6.0-py3.8.egg/barf/core/binary.py", line 160, in __init__
self._open(filename)
File "/usr/local/lib/python3.8/dist-packages/barf-0.6.0-py3.8.egg/barf/core/binary.py", line 229, in _open
self._open_elf(filename)
File "/usr/local/lib/python3.8/dist-packages/barf-0.6.0-py3.8.egg/barf/core/binary.py", line 279, in _open_elf
raise Exception("Error loading ELF file.")
Exception: Error loading ELF file.

code:
https://pastebin.com/HeRaWfUc

Not installable via `pip`

Currently there's no way to make the BARF project available to another project as a requirement via PyPI, or to install it for use via pip. The process is generally painless.

Both pybfd and capstone are available via pip install, so those dependencies can be automatically installed. Z3 is not yet available in this manner, but I've opened an issue with the project to make it available. In any case, it can be left as a manual dependency.

Would you be interested in implementing this functionality to make BARF more easily usable from external projects? In particular, I'm looking at integrating it with binjitsu.

CFG recovery issue with noreturn functions

If a noreturn function occurs in a function for which the CFG is being created, the traversing process isn't stopped when it encounters a noreturn function. After a noreturn function, there might be data entries (i.e. dd) which can't be disassembled, so this may lead to unknown behaviors.

For example in the binary file attached, in function foo we have a __stack_chk_fail function call which is noreturn. CFG recovery in this case goes through the next function and yields a weird graph.

Another issue is that in that function there's an instruction which pops on r15 (i.e. pc). In ARM, stack pop on r15 means function return if its corresponding push was r14 (i.e. link register).

By the way, to build CFG graphs I can only supply the start address and can't determine the address at which a function ends, so the CFG recovery functionality should be smart enough to stop correctly at the end of a function.

test-bin.zip
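
The behaviour requested in this issue can be sketched as a traversal that stops at returns, at calls to known noreturn functions, and at pops into pc. This is an illustration only, not BARF code: the code dictionary layout, the NORETURN set and the reachable function are hypothetical, and treating every pop into pc as a return is a simplification of the push r14 / pop r15 pairing described above.

```python
# Hypothetical set of functions known never to return.
NORETURN = {"__stack_chk_fail", "abort", "exit"}


def reachable(code, start, noreturn=NORETURN):
    """Return the set of instruction addresses reachable from `start`.

    `code` maps address -> (size, kind, operand), a made-up encoding used
    only for this sketch."""
    seen, work = set(), [start]
    while work:
        addr = work.pop()
        if addr in seen or addr not in code:
            continue
        seen.add(addr)
        size, kind, operand = code[addr]

        if kind == "ret":
            continue                      # explicit return
        if kind == "call" and operand in noreturn:
            continue                      # control never comes back
        if kind == "pop" and "pc" in operand:
            continue                      # simplification: pop into pc returns
        if kind in ("jmp", "jcc"):
            work.append(operand)          # follow the branch target
            if kind == "jmp":
                continue                  # unconditional: no fall-through
        work.append(addr + size)          # fall through to the next instruction
    return seen


function = {
    0x00: (4, "push", ("r4", "lr")),
    0x04: (4, "jcc", 0x10),
    0x08: (4, "other", None),
    0x0C: (4, "pop", ("r4", "pc")),
    0x10: (4, "call", "__stack_chk_fail"),
    0x14: (4, "data", None),  # belongs to whatever follows the function
}
visited = reachable(function, 0x00)
```

With these stopping rules the traversal never reaches 0x14, so only a start address is needed and the function's end falls out of the analysis.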

Support PySMT

Current support for SMT solver interaction is provided by a Python module taken from PySymEmu with custom modifications. It only supports the Z3 and CVC4 solvers. The goal is to replace the aforementioned module with the PySMT package, which makes it easy to support multiple solvers.

Unable to open PE files

Hi,

I am unable to load PE files. The reason is that pe.sections[section_idx].get_data() always returns null. After going through the pefile library, I fixed the problem by passing the VirtualAddress as the parameter for get_data().
This is the fix:
self._section_text = pe.sections[section_idx].get_data(pe.sections[section_idx].VirtualAddress)

This hack works fine for me now. I don't know whether it is correct.

Thanks.

Regards,
Maggie

Support MIPS architecture

MIPS is a very well known architecture used in multiple devices. The goal is to provide support for the architecture.

Details are unavailable (CS_ERR_DETAIL)

I've installed BARF but I'm not sure if it's installed correctly. I run the following code for a binary shared object file (.so), but it encounters an error while it's translating.

from barf import BARF

# Open binary file.
barf = BARF('bin/tests/test1.so')

# Print assembly instruction.
for addr, asm_instr, reil_instrs in barf.translate():
    print("0x{addr:08x} {instr}".format(addr=addr, instr=asm_instr))

    # Print REIL translation.
    for reil_instr in reil_instrs:
        print("{indent:11s} {instr}".format(indent="", instr=reil_instr))

Output:

Couldn't import dot_parser, loading of dot files will not be possible.
0x00000e98 moval r4, r0
            str   [DWORD r0, EMPTY, DWORD r4]
            and   [DWORD r0, DWORD 0xffffffff, DWORD t1]
            bisz  [DWORD t1, EMPTY, BIT zf]
            bsh   [DWORD r0, DWORD 0xffffffe1, DWORD t2]
            and   [DWORD t2, DWORD 0x1, BIT t3]
            str   [BIT t3, EMPTY, BIT nf]
0x00000e9a bal #0x9dc
            jcc   [BIT 0x1, EMPTY, POINTER 0x9dc00]
0x00000e9c moval r0, r0
            str   [DWORD r0, EMPTY, DWORD r0]
            and   [DWORD r0, DWORD 0xffffffff, DWORD t4]
            bisz  [DWORD t4, EMPTY, BIT zf]
            bsh   [DWORD r0, DWORD 0xffffffe1, DWORD t5]
            and   [DWORD t5, DWORD 0x1, BIT t6]
            str   [BIT t6, EMPTY, BIT nf]
0x00000e9e bal #0xfc0
            jcc   [BIT 0x1, EMPTY, POINTER 0xfc000]
Traceback (most recent call last):
  File "/home/.../main.py", line 15, in <module>
    for addr, asm_instr, reil_instrs in barf.translate():
  File "/usr/local/lib/python2.7/dist-packages/barf-0.2-py2.7.egg/barf/barf.py", line 195, in translate
    for addr, asm, _ in self.disassemble(start_addr, end_addr):
  File "/usr/local/lib/python2.7/dist-packages/barf-0.2-py2.7.egg/barf/barf.py", line 217, in disassemble
    asm = self.disassembler.disassemble(self.text_section[start:end], curr_addr)
  File "/usr/local/lib/python2.7/dist-packages/barf-0.2-py2.7.egg/barf/arch/arm/armdisassembler.py", line 203, in disassemble
    instr = self._cs_translate_insn(disasm)
  File "/usr/local/lib/python2.7/dist-packages/barf-0.2-py2.7.egg/barf/arch/arm/armdisassembler.py", line 348, in _cs_translate_insn
    operands = [self.__cs_translate_operand(op, cs_insn) for op in cs_insn.operands]
  File "/usr/lib/python2.7/dist-packages/capstone/__init__.py", line 541, in __getattr__
    raise CsError(CS_ERR_DETAIL)
 capstone.CsError: Details are unavailable (CS_ERR_DETAIL)

Process finished with exit code 1

However, BARF runs smoothly with no problems on the toy examples supplied in the repo.

More Info:

  • The binary is in ARM32, mode:ARM
  • pip list | grep capstone -> capstone (3.0.4)
  • python-capstone 3.0.4 is installed
  • libcapstone3 and libcapstone-dev are installed
  • PyBFD is ok.

Error in barf/analysis/gadgets/verifier.py

I am building my project on top of BARF, and I encountered this problem:
AttributeError: 'CodeAnalyzer' object has no attribute 'get_memory_curr'.
I reviewed the code and found that the class 'CodeAnalyzer' has the method 'get_memory' instead of 'get_memory_curr'; it is defined in barf/analysis/codeanalyzer/codeanalyzer.py.
The method '_get_constrs_no_operation' in class 'GadgetVerifier', at line 122 of barf/analysis/codeanalyzer/verifier.py, calls this method. The commented-out code at line 121 seems to be correct.

When I use "barf = BARF(filename)", cause an error

When I use "barf = BARF(filename)", I get an error:

File "C:\python27-x64\Scripts\z3-script.py", line 11, in
load_entry_point('z3==0.2.0', 'console_scripts', 'z3')()
File "c:\python27-x64\lib\site-packages\pkg_resources\__init__.py", line 565, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "c:\python27-x64\lib\site-packages\pkg_resources\__init__.py", line 2631, in load_entry_point
return ep.load()
File "c:\python27-x64\lib\site-packages\pkg_resources\__init__.py", line 2291, in load
return self.resolve()
File "c:\python27-x64\lib\site-packages\pkg_resources\__init__.py", line 2297, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "c:\python27-x64\lib\site-packages\z3\snap.py", line 14, in
from z3.config import get_config
ImportError: No module named config

My Environment: Windows10, Z3 version: 0.2.0, barf version: 0.5.0
Thanks!

Add support for REIL extensions.

Add support for the instructions below (based on this implementation) in order to ease the translation process.

REIL Extensions.

LSHL: Logical left shift.
LSHR: Logical right shift.
ASHR: Arithmetic right shift.
SDIV: Signed division.
SMUL: Signed multiplication.
SEXT: Sign extension.
SYS: Transition between user and supervisor level code.
BISNZ: Comparison for non-zero value.
EQU: Comparison for equality.
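
For reference, the intended semantics of these extensions on fixed-width values can be sketched in plain Python. This is an assumption-laden illustration, not the BARF or referenced implementation: operands are modelled as non-negative integers of `size` bits, SDIV is assumed to truncate toward zero, and SMUL is assumed to produce a result twice as wide as its operands. SYS has no arithmetic meaning and is omitted.

```python
def _mask(size):
    return (1 << size) - 1


def to_signed(value, size):
    """Interpret the low `size` bits of `value` as a two's complement number."""
    value &= _mask(size)
    return value - (1 << size) if value >> (size - 1) else value


def lshl(a, n, size):
    return (a << n) & _mask(size)


def lshr(a, n, size):
    return (a & _mask(size)) >> n


def ashr(a, n, size):
    # Python's >> on negative ints is arithmetic, so the sign bit is replicated.
    return (to_signed(a, size) >> n) & _mask(size)


def sext(a, src_size, dst_size):
    return to_signed(a, src_size) & _mask(dst_size)


def smul(a, b, size):
    # Assumption: the result is 2 * size bits wide.
    return (to_signed(a, size) * to_signed(b, size)) & _mask(2 * size)


def sdiv(a, b, size):
    # Assumption: truncation toward zero (Python's // would floor instead).
    x, y = to_signed(a, size), to_signed(b, size)
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        q = -q
    return q & _mask(size)


def bisnz(a):
    return int(a != 0)


def equ(a, b):
    return int(a == b)


print(hex(ashr(0x80000000, 31, 32)))
print(hex(sext(0x80, 8, 32)))
```

Having these as first-class REIL instructions avoids expanding each of them into longer sequences of basic instructions during translation.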

Implement Optional instruction flags

Implement optional instruction flags instead of home-brewed REIL instructions.

For example, the implemented RET instruction could be substituted by a JCC instruction with the optional IOPT_RET flag implemented in openREIL.

I acknowledge the work done on this REIL implementation, but this could increase the interoperability of the two projects and may help to make REIL a more popular IR.

Gadget not correctly verified?

Is there a reason why the following assembly code would not be a valid store memory gadget? The classification stage picked it up but it does not verify as a valid store memory gadget. Is this a bug or am I missing something?

mov dword ptr [rax], esi ; ret 

BARF fails to disassemble /bin/true

Using recover_cfg.py to disassemble /bin/true fails:

[+] Recovering program CFG...
[-] Index out of range : 0x40133f
Traceback (most recent call last):
  File "./recover_cfg.py", line 28, in <module>
    cfg = barf.recover_cfg()
  File "/home/g/Codigo/barf-project/barf/barf/barf.py", line 294, in recover_cfg
    bb_list = self.bb_builder.build(start_addr, end_addr)
  File "/home/g/Codigo/barf-project/barf/barf/analysis/basicblock/basicblock.py", line 368, in build
    bbs = self._find_candidate_bbs(start_address, end_address)
  File "/home/g/Codigo/barf-project/barf/barf/analysis/basicblock/basicblock.py", line 426, in _find_candidate_bbs
    bb = self._disassemble_bb(curr_addr, end_address + 0x1)
  File "/home/g/Codigo/barf-project/barf/barf/analysis/basicblock/basicblock.py", line 548, in _disassemble_bb
    asm = self._disasm.disassemble(self._mem[start:end], addr)
  File "/home/g/Codigo/barf-project/barf/barf/core/bi.py", line 67, in __getitem__
    raise IndexError(reason)
IndexError: string index out of range

where /bin/true is an ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=9d915e13fb31a59c4d02b39bd596af20873aca0b, stripped

Improve CFG recovery - Process Jump Tables

Currently, there is no support for jump table processing when generating the CFG of a function. The goal is to provide a way to process jump tables for the currently supported architectures.
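
One small piece of jump table processing, reading the table entries once its location and length are known, might look like the sketch below. It rests on several assumptions: entries are absolute little-endian addresses, the table offset and entry count have already been determined by some other analysis, and read_jump_table is a hypothetical helper rather than a BARF function.

```python
import struct


def read_jump_table(data, offset, count, entry_size=4):
    """Decode `count` little-endian absolute addresses starting at `offset`."""
    code = {4: "I", 8: "Q"}[entry_size]
    return list(struct.unpack_from("<" + code * count, data, offset))


# Hypothetical section contents: 4 bytes of padding followed by a 3-entry table.
section = b"\x90" * 4 + struct.pack("<3I", 0x8048400, 0x8048410, 0x8048420)

targets = read_jump_table(section, offset=4, count=3)
print([hex(t) for t in targets])
```

Each decoded target would then be treated as the start of a basic block and as a successor of the indirect jump. Real tables are often relative to a base address and their bounds must be inferred, which this sketch does not attempt.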

Implement taints as sets

I think one of the most flexible and useful ways to taint data for static/dynamic binary analysis is to use sets that carry different types of data. In this case, the "False" value of a missing taint is replaced by the empty set, and the taint disjunction by the union of the taint sets. A POC of the modifications needed for this enhancement is available here.

For example, in dynamic analysis, they can be used to track how different offset bytes of a file taint instructions in a trace.
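A minimal sketch of the idea, independent of BARF's own taint code: the empty set stands for "untainted" and propagation is a set union. The register names and the trace are made up for illustration.

```python
def propagate_taint(*source_taints):
    """Combine the taint sets of an instruction's source operands.

    With boolean taints this would be a logical OR; with sets it is a union,
    and the empty set plays the role of False (untainted).
    """
    result = frozenset()
    for taint in source_taints:
        result |= frozenset(taint)
    return result


# Hypothetical trace: each register carries the file offsets that influenced it.
taints = {
    "eax": frozenset({0, 1}),  # derived from file bytes 0 and 1
    "ebx": frozenset({7}),     # derived from file byte 7
    "ecx": frozenset(),        # untainted
}

# add eax, ebx  -> the destination depends on both sources
taints["eax"] = propagate_taint(taints["eax"], taints["ebx"])
# mov edx, ecx  -> the destination inherits the (empty) taint of ecx
taints["edx"] = propagate_taint(taints["ecx"])
```

After these two steps `eax` carries offsets 0, 1 and 7, while `edx` stays untainted, so a truthiness check on the set still answers the old boolean question.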

Old style X86Parser class.

An inadvertent old-style class prevents the use of super on X86Parser subclasses:

< class X86Parser():

---
> class X86Parser(object):

Thanks!

Build a proper replacement for PyBFD

Currently, BARF relies on PyBFD to open and read the various binary formats that exist. However, there are some issues with the library. The main problem is the lack of Windows support. The idea is to build a replacement for PyBFD using existing libraries that handle specific binary formats, for instance, pyelftools and pefile.
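One piece such a replacement would need is a dispatch step that decides which format-specific library to hand the file to. The sketch below shows only that step, using the standard library and the well-known ELF and PE magic numbers; the function name and the dispatch table are assumptions for illustration, and the actual parsing by pyelftools or pefile is left out.

```python
def detect_binary_format(header):
    """Guess the container format from the first bytes of a file.

    Returns "elf", "pe", or None. Only the magic numbers are checked; a real
    loader would pass the file to a format-specific parser afterwards.
    """
    if header[:4] == b"\x7fELF":
        return "elf"
    if header[:2] == b"MZ":
        return "pe"
    return None


# Hypothetical dispatch table; the values name the library that would parse
# each format in a full implementation.
BACKENDS = {"elf": "pyelftools", "pe": "pefile"}

fmt = detect_binary_format(b"\x7fELF\x02\x01\x01\x00")
backend = BACKENDS.get(fmt)
```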

Error with "push large dword ptr fs:0"

An error occurs while processing "push large dword ptr fs:0" and "mov large fs:0, ecx" in the develop and impove branches:

Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 851, in emit
    msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 724, in format
    return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 464, in format
    record.message = record.getMessage()
  File "/usr/lib/python2.7/logging/__init__.py", line 328, in getMessage
    msg = msg % self.args
  File "/home/user/barf-project-develop/barf/barf/core/reil/reil.py", line 282, in __str__
    operands_str = ", ".join(map(print_oprnd, self._operands))
  File "/home/user/barf-project-develop/barf/barf/core/reil/reil.py", line 259, in print_oprnd
    size_str = str(oprnd.size) if oprnd.size else ""
AttributeError: 'NoneType' object has no attribute 'size'
Logged from file x86translator.py, line 287
Traceback (most recent call last):
  File "test.py", line 27, in <module>
    for addr, asm_instr, reil_instrs in barf.translate(ea_start, ea_end):
  File "/home/user/barf-project-develop/barf/barf/barf.py", line 179, in translate
    yield addr, asm, self.ir_translator.translate(asm)
  File "/home/user/barf-project-develop/barf/barf/arch/x86/x86translator.py", line 282, in translate
    check_operands_size(instr, self._arch_info.architecture_size)
  File "/home/user/barf-project-develop/barf/barf/arch/x86/x86translator.py", line 162, in check_operands_size
    assert instr.operands[0].size == arch_size,
AttributeError: 'NoneType' object has no attribute 'size'

Optimize to speed up the gadget finder

A comparison between rop-tool, ROPgadget and BARFgadget when used to find gadgets in libc.so.6:

rop-tool (written in C):
1229 gadgets found.
rop-tool gadget libc.so.6 17.29s user 0.01s system 100% cpu 17.289 total

ROPgadget:
Unique gadgets found: 21240
ROPgadget --binary libc.so.6 72.30s user 10.25s system 99% cpu 1:22.82 total

BARFgadget:
Find Stage : 358.472s
Classification Stage : 854.280s
Verification Stage : 377.223s
Total : 1589.976s

Suggestion:

  • BARFgadget should not translate instructions to REIL when finding gadgets; it costs too much time.
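To make the suggestion concrete, here is a sketch of a find stage that works on raw bytes only and leaves translation to later stages. It is not how BARFgadget is implemented and makes no claim about the resulting speed-up; the function name and the `max_bytes` window are assumptions, and the code targets Python 3.

```python
RET_OPCODE = 0xC3  # x86 near return


def find_gadget_candidates(code, base_addr, max_bytes=8):
    """Yield (address, raw_bytes) windows that end in a `ret` byte.

    Works on raw bytes only: no disassembly and no IR translation happens
    here, so invalid windows still have to be filtered by a later stage.
    """
    for ret_offset, byte in enumerate(code):
        if byte != RET_OPCODE:
            continue
        first = max(0, ret_offset - max_bytes + 1)
        for start in range(first, ret_offset + 1):
            yield base_addr + start, bytes(code[start:ret_offset + 1])


# pop eax ; ret  (58 c3) preceded by one nop (90)
candidates = list(find_gadget_candidates(b"\x90\x58\xc3", 0x1000))
```

Only the candidates that survive disassembly would then be translated to REIL for the classification and verification stages.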

Support movabs

'movabs' is a GAS-specific notation adopted by capstone. It uses the same opcode as regular mov instructions, but is used to handle 64-bit operands.

I've implemented rudimentary support like this:

    def _translate_movabs(self, tb, instruction):
        # alias for mov with 64bit operands
        self._translate_mov(tb, instruction)

It works well in the scenarios I tested (mostly Objective-C on x64).

ARM translating mode is set to Thumb by default

As far as I know, BARF currently doesn't support Thumb mode; however, the translating mode is set to Thumb by default.

For ARM mode binaries, the recover_cfg function won't work because it assumes Thumb. In armdisassembler.py, the disassembler function is called without its last argument, so it assumes a Thumb binary and everything after that is decoded incorrectly.

Adding architecture support dynamically

It would be very useful to employ BARF's REIL analysis tools for architectures that BARF does not support, without modifying its core, through a defined API that allows adding support for an architecture (information, disassembler, REIL translator) dynamically. In this way, BARF could be used as a library inside another project.

An example of this use case is the Hexag00n project (which works with the Hexagon architecture, currently not supported in BARF), where BARF's REIL analysis tools would be very useful. Hexag00n already has its own disassembler (for the Hexagon architecture) and REIL translator. Porting these to the BARF core is not a trivial task, nor is it desirable: BARF and Hexag00n should be able to work together with as little coupling as possible.

In this scenario Hexag00n will use BARF as a library, providing Hexagon architecture support to BARF dynamically, without modifying BARF's source code (for example, without forcing BARF to import the Hexag00n disassembler and thus creating a circular dependency). An API has to be defined in BARF to import that architecture support; for this purpose the load_architecture method has been added to the BARF core. This method is used in a Hexag00n example script, which will serve as a first step to define precisely what has to be provided to BARF to generate REIL code (from a Hexagon binary) through the BARF framework.

For now, architecture support is characterized by three base classes:

  • ArchitectureInformation: contains the basic definitions of an architecture like its size, registers, etc.
  • Disassembler: encapsulates the architecture disassembler.
  • Translator: encapsulates the architecture to REIL translator.

For each supported architecture, new classes have to be derived from these, containing all the information BARF needs to generate equivalent REIL code and analyze it. Up to now these derived classes have been part of the BARF core (as with the x86 and ARM architectures). With this new functionality, the objective is to provide architecture support on the fly, through a defined API, while using BARF as a library (imported into the working project).
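The following sketch shows how the three base classes and a load_architecture entry point could fit together. Only the class names and the name load_architecture come from the description above; the method signature, the `ArchitectureRegistry` class, the method names on the base classes, and the Hexagon placeholder classes are assumptions made for illustration and do not reflect BARF's or Hexag00n's real code.

```python
class ArchitectureInformation(object):
    """Basic definitions of an architecture (size, registers, ...)."""
    architecture_size = None
    registers = ()


class Disassembler(object):
    def disassemble(self, data, address):
        raise NotImplementedError


class Translator(object):
    def translate(self, instruction):
        raise NotImplementedError


class ArchitectureRegistry(object):
    """Hypothetical stand-in for the part of BARF that stores architectures."""

    def __init__(self):
        self._archs = {}

    def load_architecture(self, name, arch_info, disassembler, translator):
        # Assumed signature; reject objects that skip the base classes.
        if not isinstance(arch_info, ArchitectureInformation):
            raise TypeError("arch_info must derive from ArchitectureInformation")
        if not isinstance(disassembler, Disassembler):
            raise TypeError("disassembler must derive from Disassembler")
        if not isinstance(translator, Translator):
            raise TypeError("translator must derive from Translator")
        self._archs[name] = (arch_info, disassembler, translator)

    def get(self, name):
        return self._archs[name]


# An external project supplies the subclasses; these are placeholders.
class HexagonInfo(ArchitectureInformation):
    architecture_size = 32
    registers = tuple("r%d" % i for i in range(32))


class HexagonDisassembler(Disassembler):
    def disassemble(self, data, address):
        return "nop"  # placeholder, not a real decoder


class HexagonTranslator(Translator):
    def translate(self, instruction):
        return []  # placeholder, emits no REIL


registry = ArchitectureRegistry()
registry.load_architecture(
    "hexagon", HexagonInfo(), HexagonDisassembler(), HexagonTranslator()
)
```

Because the registry only depends on the base classes, the external project imports BARF and never the other way around, which avoids the circular dependency mentioned above.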

keyerror in basicblock during edge creation

Hello guys,
I am doing some malware analysis and building a CFG from this particular sample.

This is the output log in BARF:

2018-03-11 07:19:20,643: barf.barf:INFO: Initializing BARF
2018-03-11 07:19:20,702: barf.core.smt.smtsolver:DEBUG: > (set-option :global-decls false)
2018-03-11 07:19:20,703: barf.core.smt.smtsolver:DEBUG: > (set-logic QF_AUFBV)
2018-03-11 07:19:20,703: barf.core.smt.smtsolver:DEBUG: > (declare-fun MEM_0 () (Array (_ BitVec 32) (_ BitVec 8)))
2018-03-11 07:19:20,704: barf.arch.emulator:INFO: Loading PE image into memory
2018-03-11 07:19:20,728: barf.arch.emulator:INFO: Loading section #0 (0x401000-0x404600)
2018-03-11 07:19:20,741: barf.arch.emulator:INFO: Loading section #1 (0x405000-0x405000)
2018-03-11 07:19:20,741: barf.arch.emulator:INFO: Loading section #2 (0x415000-0x41f800)
2018-03-11 07:19:20,782: barf.arch.emulator:INFO: Loading section #3 (0x420000-0x425000)
2018-03-11 07:19:20,855: barf.core.smt.smtsolver:DEBUG: > (set-option :global-decls false)
2018-03-11 07:19:20,855: barf.core.smt.smtsolver:DEBUG: > (set-logic QF_AUFBV)
2018-03-11 07:19:20,856: barf.core.smt.smtsolver:DEBUG: > (declare-fun MEM_0 () (Array (_ BitVec 32) (_ BitVec 8)))
2018-03-11 07:19:20,856: barf.arch.emulator:INFO: Loading PE image into memory
2018-03-11 07:19:20,881: barf.arch.emulator:INFO: Loading section #0 (0x401000-0x404600)
2018-03-11 07:19:20,893: barf.arch.emulator:INFO: Loading section #1 (0x405000-0x405000)
2018-03-11 07:19:20,893: barf.arch.emulator:INFO: Loading section #2 (0x415000-0x41f800)
2018-03-11 07:19:20,934: barf.arch.emulator:INFO: Loading section #3 (0x420000-0x425000)
2018-03-11 07:19:21,244: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, edi, 0x3 [0f a4 f8 03])
2018-03-11 07:19:21,245: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, edi, 0x3 [0f a4 f8 03])
2018-03-11 07:19:21,401: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, esi, 0x3 [0f a4 f0 03])
2018-03-11 07:19:21,402: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, esi, 0x3 [0f a4 f0 03])
2018-03-11 07:19:21,605: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, esi, 0x3 [0f a4 f0 03])
2018-03-11 07:19:22,838: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, ecx, 0x3 [0f a4 c8 03])
2018-03-11 07:19:22,839: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, ecx, 0x3 [0f a4 c8 03])
2018-03-11 07:19:23,153: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, esi, 0x3 [0f a4 f0 03])
2018-03-11 07:19:23,154: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, esi, 0x3 [0f a4 f0 03])
2018-03-11 07:19:23,237: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, eax, 0x3 [0f a4 c0 03])
2018-03-11 07:19:23,346: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, esi, 0x3 [0f a4 f0 03])
2018-03-11 07:19:23,347: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, esi, 0x3 [0f a4 f0 03])
2018-03-11 07:19:23,371: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, esi, 0x3 [0f a4 f0 03])
2018-03-11 07:19:23,373: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, esi, 0x3 [0f a4 f0 03])
2018-03-11 07:19:23,473: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, esi, 0x3 [0f a4 f0 03])
2018-03-11 07:19:23,647: barf.arch.x86.x86translator:INFO: Instruction not supported: shld (shld eax, esi, 0x3 [0f a4 f0 03])
2018-03-11 07:19:24,940: barf.analysis.basicblock.basicblock:ERROR: Failed to save basic block graph: 000021ce9241b56a22923f51ec5895ab.x86_cfg (.dot)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/barf/analysis/basicblock/basicblock.py", line 911, in save
    edge = self._create_edge(nodes[bb_src.address], nodes[bb_dst_addr], branch_type)
KeyError: 4248935

It seems that it is creating an edge between nodes that don't exist?
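A defensive way to build the edges, sketched outside of BARF with plain tuples: destinations that were never recovered as basic blocks are collected instead of being looked up directly. The function name and the data layout are assumptions for illustration.

```python
def build_edges(basic_blocks):
    """Return (edges, dangling) for a list of (address, [targets]) pairs.

    An edge is kept only when its destination is a known block; other
    destinations are reported instead of raising KeyError.
    """
    nodes = {address for address, _ in basic_blocks}
    edges, dangling = [], []
    for address, targets in basic_blocks:
        for target in targets:
            if target in nodes:
                edges.append((address, target))
            else:
                dangling.append((address, target))
    return edges, dangling


# The second block branches to 0x40d567 (4248935), which was never recovered.
blocks = [(0x401000, [0x401010]), (0x401010, [0x40D567])]
edges, dangling = build_edges(blocks)
```

The dangling list makes it possible to log which branch targets fall outside the recovered blocks, for example targets inside code that contains unsupported instructions.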

ValueError: Unknown format code 'x' for object of type 'str'

When I try to recover the CFG of /bin/ls on Ubuntu 16.04, I get this error:

Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/barf-0.3-py2.7.egg/barf/barf.py", line 299, in recover_cfg
cfg, _ = self._recover_cfg(start=ea_start, end=ea_end, symbols=symbols, callback=callback)
File "/usr/local/lib/python2.7/dist-packages/barf-0.3-py2.7.egg/barf/barf.py", line 349, in recover_cfg
name = "sub_{:x}".format(start)
ValueError: Unknown format code 'x' for object of type 'str'

Here is my code:

from barf import BARF
b = BARF("/bin/ls")
cfg = b.recover_cfg()
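The error message says that the value being formatted with the 'x' code is a string rather than an integer. The sketch below reproduces that failure in isolation and shows one way to guard against it; `function_name` and the "sub_" prefix are hypothetical and not taken from BARF.

```python
def function_name(start):
    """Format a default function name, accepting an int or a symbol string."""
    if isinstance(start, int):
        return "sub_{:x}".format(start)
    return str(start)


try:
    "{:x}".format("main")  # same failure as in the traceback
    raised = False
except ValueError:
    raised = True

names = [function_name(0x402A00), function_name("main")]
```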

Issues with translate opcode 'rsb'

When I analyze a binary generated by the Android NDK, I run into a problem with CFG recovery.
The CFG in IDA looks like this:
(image: screenshot of the CFG in IDA)

I use BARF to generate the CFG:

filename = sys.argv[1]
start = int(sys.argv[2], 16)
barf = BARF(filename)
base_addr = barf.binary.entry_point >> 12 << 12

cfg = barf.recover_cfg(start)
blocks = cfg.basic_blocks

There is an instruction 'RSB r0, r0,0x0' at the beginning of the second block. Every time BARF encounters an instruction that starts with rsb, errors like the following appear:

Traceback (most recent call last):
..............
File "/Users/mark/Envs/angr/lib/python2.7/site-packages/barf-0.5.0-py2.7.egg/barf/arch/arm/translators/data.py", line 141, in _translate_rsb
self._translate_sub(tb, instruction)
AttributeError: 'ArmTranslator' object has no attribute '_translate_sub'

But there is a def '_translate_sub' in data.py. Can anyone help me? Or is this a bug?
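For context on why an rsb translation would delegate to sub: RSB is a subtraction with the operand order reversed. The sketch below models that on 32-bit values; it does not diagnose the AttributeError, and the helper names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def sub32(a, b):
    """32-bit wrap-around subtraction, a - b."""
    return (a - b) & MASK32


def rsb32(rn, operand2):
    """Reverse subtract: operand2 - rn, i.e. sub with its operands swapped."""
    return sub32(operand2, rn)


# rsb r0, r0, #0 negates r0 (two's complement).
negated = rsb32(5, 0)
```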

Use BARF as a disassembly backend for BinNavi

BinNavi is a graphical binary navigator useful for reverse engineering software. Currently, it relies on commercial software, IDA Pro, to do the disassembly work. The goal is to use BARF as a replacement for the tasks done by IDA Pro.

Move pyasmjit to a separate project/repo

Hi,

Is it possible to move pyasmjit to a separate project/repo outside of the barf-project repo? I am interested in experimenting with pyasmjit as a stand-alone library (i.e. without the entire barf dependency) and extending its functionality. Having pyasmjit exist as a separate entity would greatly simplify this.

Many thanks.

Regards,
Adrian

Error in translation of stack canary check to REIL instruction and SMT expression

Hi,

I am getting an error when the stack canary check is translated to an SMT expression. This is the error:
File "XXX/barf-project/barf/barf/analysis/codeanalyzer/codeanalyzer.py", line 313, in check_path_satisfiability
smt_mem_addr = smtlibv2.BitVec(32, "#x%08x" % instr.operands[0].name)
AttributeError: 'ReilImmediateOperand' object has no attribute 'name'

I think the error is caused by this instruction and the corresponding REIL translation:
mov eax, dword ptr gs:[0x14]
ldm [DWORD 0x14, EMPTY, DWORD t22]
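The ldm above reads from a constant address, so its first operand is an immediate and has no name to look up. A sketch of handling both operand kinds when building the SMT address term follows; the two operand classes are hypothetical stand-ins rather than BARF's REIL classes, and `smt_address` is a made-up helper.

```python
class ImmediateOperand(object):
    """Hypothetical stand-in for a REIL immediate operand."""
    def __init__(self, immediate, size):
        self.immediate = immediate
        self.size = size


class RegisterOperand(object):
    """Hypothetical stand-in for a REIL register operand."""
    def __init__(self, name, size):
        self.name = name
        self.size = size


def smt_address(operand):
    """Return an SMT-LIB v2 term for a memory address operand."""
    if hasattr(operand, "immediate"):
        # Constant address: emit a bit-vector literal such as #x00000014.
        digits = operand.size // 4
        return "#x{:0{}x}".format(operand.immediate, digits)
    # Otherwise refer to the symbolic variable by name.
    return operand.name


literal = smt_address(ImmediateOperand(0x14, 32))
symbol = smt_address(RegisterOperand("t22", 32))
```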

Thank you.

Regards,
Maggie

Bug??

I use recover_cfg to create a CFG, but the blocks are not the same as in IDA.
The code I use:
barf.recover_cfg(ea_start=start, ea_end=0x00008790 + 0x2, arch_mode=ARCH_ARM_MODE_THUMB)

The recovered end address is 0x875d, but the real end address is 0x00008790.

The binary:

binary rar
