dagger's Introduction

Dagger

Dagger is a binary translator to LLVM IR, with the goal of being as native as possible to the LLVM infrastructure.

Building

As an LLVM fork, Dagger is built the same way; assuming you have a reasonably recent toolchain and CMake, just do:

  $ cd dagger
  $ mkdir build
  $ cd build
  $ cmake ..
  $ make

More information is available on the llvm.org Getting Started and CMake pages.

Usage

While Dagger is intended to be usable as a library, it does come with tools:

Static Binary Translation to IR: llvm-dec

llvm-dec takes in an object file and produces IR.

  $ ./bin/llvm-dec ./a.out

Dynamic Binary Translation: DYN (OS X-only)

DYN is an OS X-only dylib that is intended to be preloaded so that it can hijack program execution:

  $ echo "int main() { return 42; }" | clang -x c -
  $ DYLD_INSERT_LIBRARIES=./lib/libDYN.dylib ./a.out
  $ echo $?
  42

This will "execute" a.out by translating all of its code to LLVM IR, JITting that, and finally executing it.

The DCDYN_OPTIONS environment variable can be used to pass command-line options. For instance, if you're really brave, you can try:

  $ DCDYN_OPTIONS="-print-after-all" DYLD_INSERT_LIBRARIES=./lib/libDYN.dylib ./a.out

which will print tons of LLVM debug output.

Features

X86 is currently the main supported target. There is ongoing work on adding AArch64 support.

The Mach-O object file format is the best supported. Basic ELF is also supported. However, except for DYN, there is always a generic fallback, so YMMV with other formats.
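Object files identify their format in their first few bytes. The classify() helper below is a hypothetical sketch, not dagger's code, that makes the Mach-O / ELF / fallback distinction concrete. It recognizes ELF (\x7fELF) and thin Mach-O images written by a little-endian host (MH_MAGIC 0xfeedface, MH_MAGIC_64 0xfeedfacf). Everything else, fat Mach-O binaries included, maps to the "generic fallback" case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of dagger): guess an object file's format
// from its leading magic bytes.
enum class ObjFormat { ELF, MachO, Unknown };

ObjFormat classify(const unsigned char *Buf, std::size_t Len) {
  assert(Buf != nullptr || Len == 0);
  if (Len < 4)
    return ObjFormat::Unknown;
  // "\x7f" "ELF" is split so that 'E' is not swallowed by the hex escape.
  if (std::memcmp(Buf, "\x7f" "ELF", 4) == 0)
    return ObjFormat::ELF;
  // Read the first four bytes as a little-endian 32-bit value.
  uint32_t Magic = uint32_t(Buf[0]) | (uint32_t(Buf[1]) << 8) |
                   (uint32_t(Buf[2]) << 16) | (uint32_t(Buf[3]) << 24);
  // MH_MAGIC (32-bit) and MH_MAGIC_64 as stored by a little-endian host.
  if (Magic == 0xfeedfaceu || Magic == 0xfeedfacfu)
    return ObjFormat::MachO;
  return ObjFormat::Unknown; // the "generic fallback" case
}
```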

dagger's People

Contributors

ahatanak, ahmedbougacha, arsenm, asl, atrick, bob-wilson, chandlerc, chapuni, cunningbaldrick, d0k, ddunbar, dexonsmith, dwblaikie, echristo, espindola, isanbard, lattner, lhames, majnemer, mbrukman, nlewycky, resistor, rksimon, rnk, rotateright, sanjoy, stoklund, tnorthover, topperc, tstellaramd


dagger's Issues

Question about i386 target triple

Hi. Sorry, maybe a noob question. I've successfully built dagger on my x86_64 machine. When I run

llvm-dec <32bit executable ELF>

it fails with:

error: no dc translator for target i386-unknown-unknown

The same works fine with 64-bit ELFs.

Is it possible to build dagger to support both architectures?
If not is it possible to create another build which supports i386 target on a x86_64 host?
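The triple in the error message is presumably derived from the binary's header. You can confirm what kind of ELF you are passing to llvm-dec by reading two header fields defined by the ELF specification:

- e_ident[EI_CLASS]: 1 = ELFCLASS32, 2 = ELFCLASS64
- e_machine: 3 = EM_386, 62 = EM_X86_64

readElfKind() is a hypothetical helper for illustration, not part of dagger, and only handles little-endian files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Values from the System V ELF specification.
constexpr unsigned char ELFCLASS32 = 1, ELFCLASS64 = 2;
constexpr unsigned char ELFDATA2LSB = 1;
constexpr uint16_t EM_386 = 3, EM_X86_64 = 62;

struct ElfKind {
  bool Is64;
  uint16_t Machine;
};

// Hypothetical helper (not part of dagger): read EI_CLASS and e_machine
// from the start of a little-endian ELF image. Returns false if the buffer
// is not a little-endian ELF header we understand.
bool readElfKind(const unsigned char *Buf, std::size_t Len, ElfKind &Out) {
  assert(Buf != nullptr || Len == 0);
  if (Len < 20 || std::memcmp(Buf, "\x7f" "ELF", 4) != 0)
    return false;
  unsigned char Class = Buf[4]; // e_ident[EI_CLASS]
  if (Class != ELFCLASS32 && Class != ELFCLASS64)
    return false;
  if (Buf[5] != ELFDATA2LSB) // e_ident[EI_DATA]: only little-endian here
    return false;
  Out.Is64 = (Class == ELFCLASS64);
  // e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64.
  Out.Machine = uint16_t(Buf[18] | (Buf[19] << 8));
  return true;
}
```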

Suggested OS / version / other versions of dependencies?

Hi! I'm trying to build this in an Ubuntu Jammy Docker container, using all the default libraries from apt. The build gets to about 90%, and then I get:

2865.1 [ 90%] Building CXX object tools/llc/CMakeFiles/llc.dir/llc.cpp.o
2868.0 [ 90%] Linking CXX executable ../../bin/llc
2895.7 [ 90%] Built target llc
2895.7 [ 90%] Building CXX object tools/lli/CMakeFiles/lli.dir/lli.cpp.o
2898.2  llvm::MachineBasicBlock*>]':
2898.2 /opt/dagger/include/llvm/CodeGen/SlotIndexes.h:674:27:   required from here
2898.2 /opt/dagger/include/llvm/ADT/SmallVector.h:309:11: warning: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'struct std::pair<llvm::SlotIndex, llvm::MachineBasicBlock*>' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
2898.2   309 |     memcpy(this->end(), &Elt, sizeof(T));
2898.2       |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2898.2 In file included from /usr/include/c++/11/bits/stl_algobase.h:64,
2898.2                  from /usr/include/c++/11/bits/char_traits.h:39,
2898.2                  from /usr/include/c++/11/string:40,
2898.2                  from /usr/include/c++/11/bits/locale_classes.h:40,
2898.2                  from /usr/include/c++/11/bits/ios_base.h:41,
2898.2                  from /usr/include/c++/11/streambuf:41,
2898.2                  from /usr/include/c++/11/bits/streambuf_iterator.h:35,
2898.2                  from /usr/include/c++/11/iterator:66,

I was wondering if you might give an explicit build environment example that might fix the error? Thanks!

Cannot compile intrinsic llvm.dc.translate.at

Is the LLVM IR output of dagger supposed to be compilable? It seems that for every binary I lift into IR, dagger inserts an llvm.dc.translate.at intrinsic. When I recompile the LLVM IR file, llc complains about this intrinsic (same issue with lli). I am using the llc/lli built from the dagger fork.

Cannot translate instruction: XCHG32rm

Hello. Does anybody know how to work around it?

[xaionaro@void helloworld]$ /home/xaionaro/src/dagger/build/bin/llvm-dec helloworld 
Cannot translate instruction: 
    XCHG32rm: <MCInst 14653 <MCOperand Reg:20> <MCOperand Reg:20> <MCOperand Reg:37> <MCOperand Imm:1> <MCOperand Reg:0> <MCOperand Imm:0> <MCOperand Reg:0>>
Couldn't translate instruction

UNREACHABLE executed at /home/xaionaro/src/dagger/lib/DC/DCTranslator.cpp:144!
#0 0x0000000000d5cd3e llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/xaionaro/src/dagger/lib/Support/Unix/Signals.inc:398:22
#1 0x0000000000d5cdd1 PrintStackTraceSignalHandler(void*) /home/xaionaro/src/dagger/lib/Support/Unix/Signals.inc:462:1
#2 0x0000000000d5b2e0 llvm::sys::RunSignalHandlers() /home/xaionaro/src/dagger/lib/Support/Signals.cpp:49:19
#3 0x0000000000d5c6b6 SignalHandler(int) /home/xaionaro/src/dagger/lib/Support/Unix/Signals.inc:252:1
#4 0x00007fa1f3a73b20 __restore_rt (/lib64/libpthread.so.0+0x14b20)
#5 0x00007fa1f351a625 raise (/lib64/libc.so.6+0x3c625)
#6 0x00007fa1f35038d9 abort (/lib64/libc.so.6+0x258d9)
#7 0x0000000000cea459 bindingsErrorHandler(void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) /home/xaionaro/src/dagger/lib/Support/ErrorHandling.cpp:127:55
#8 0x0000000000a410aa llvm::DCTranslator::translateFunction(llvm::MCFunction const&) /home/xaionaro/src/dagger/lib/DC/DCTranslator.cpp:139:73
#9 0x0000000000a477c4 llvm::translateRecursivelyAt(llvm::ArrayRef<unsigned long>, llvm::DCTranslator&, llvm::MCModule&, llvm::MCObjectDisassembler*, llvm::MCObjectSymbolizer*) /home/xaionaro/src/dagger/lib/DC/DCTranslatorUtils.cpp:79:46
#10 0x000000000040c4c3 main /home/xaionaro/src/dagger/tools/llvm-dec/llvm-dec.cpp:228:44
#11 0x00007fa1f35051a3 __libc_start_main (/lib64/libc.so.6+0x271a3)
#12 0x000000000040b10e _start (/home/xaionaro/src/dagger/build/bin/llvm-dec+0x40b10e)
Stack dump:
0.      Program arguments: /home/xaionaro/src/dagger/build/bin/llvm-dec helloworld 
1.      DC: Translating Function at address 401000
2.      DC: Translating Basic Block at address 401000
3.      DC: Translating instruction XCHG32rm at address 401009
Aborted (core dumped)

The source code used to build the binary:

package main
import "fmt"
func main() {
        fmt.Println("Hello, world!")
}

If I add option -dc-translate-unknown-to-undef it returns:

$ /home/xaionaro/src/dagger/build/bin/llvm-dec -dc-translate-unknown-to-undef helloworld 
Couldn't translate instruction: 
    XCHG32rm: <MCInst 14653 <MCOperand Reg:20> <MCOperand Reg:20> <MCOperand Reg:37> <MCOperand Imm:1> <MCOperand Reg:0> <MCOperand Imm:0> <MCOperand Reg:0>>
Couldn't translate instruction: 
    XCHG64rm: <MCInst 14656 <MCOperand Reg:36> <MCOperand Reg:36> <MCOperand Reg:37> <MCOperand Imm:1> <MCOperand Reg:0> <MCOperand Imm:0> <MCOperand Reg:0>>
Couldn't translate instruction: 
    XCHG64rm: <MCInst 14656 <MCOperand Reg:36> <MCOperand Reg:36> <MCOperand Reg:37> <MCOperand Imm:1> <MCOperand Reg:0> <MCOperand Imm:0> <MCOperand Reg:0>>
Couldn't translate instruction: 
    XCHG64rm: <MCInst 14656 <MCOperand Reg:36> <MCOperand Reg:36> <MCOperand Reg:39> <MCOperand Imm:1> <MCOperand Reg:0> <MCOperand Imm:0> <MCOperand Reg:0>>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
Couldn't translate instruction: 
    INT3: <MCInst 1022>
llvm-dec: /home/xaionaro/src/dagger/lib/Target/X86/DC/X86DCInstruction.cpp:211: virtual bool llvm::X86DCInstruction::translateTargetInst(): Assertion `NextOpc == ISD::LOAD && "Expected to load operand for X86 LOCK-prefixed instruction"' failed.
#0 0x0000000000d5cd3e llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/xaionaro/src/dagger/lib/Support/Unix/Signals.inc:398:22
#1 0x0000000000d5cdd1 PrintStackTraceSignalHandler(void*) /home/xaionaro/src/dagger/lib/Support/Unix/Signals.inc:462:1
#2 0x0000000000d5b2e0 llvm::sys::RunSignalHandlers() /home/xaionaro/src/dagger/lib/Support/Signals.cpp:49:19
#3 0x0000000000d5c6b6 SignalHandler(int) /home/xaionaro/src/dagger/lib/Support/Unix/Signals.inc:252:1
#4 0x00007fee24b28b20 __restore_rt (/lib64/libpthread.so.0+0x14b20)
#5 0x00007fee245cf625 raise (/lib64/libc.so.6+0x3c625)
#6 0x00007fee245b88d9 abort (/lib64/libc.so.6+0x258d9)
#7 0x00007fee245b87a9 _nl_load_domain.cold (/lib64/libc.so.6+0x257a9)
#8 0x00007fee245c7a66 (/lib64/libc.so.6+0x34a66)
#9 0x00000000008a00ca llvm::X86DCInstruction::translateTargetInst() /home/xaionaro/src/dagger/lib/Target/X86/DC/X86DCInstruction.cpp:213:20
#10 0x0000000000a31e56 llvm::DCInstruction::tryTranslateInst() /home/xaionaro/src/dagger/lib/DC/DCInstruction.cpp:225:3
#11 0x0000000000a311b6 llvm::DCInstruction::translate() /home/xaionaro/src/dagger/lib/DC/DCInstruction.cpp:100:34
#12 0x0000000000a41015 llvm::DCTranslator::translateFunction(llvm::MCFunction const&) /home/xaionaro/src/dagger/lib/DC/DCTranslator.cpp:141:13
#13 0x0000000000a477c4 llvm::translateRecursivelyAt(llvm::ArrayRef<unsigned long>, llvm::DCTranslator&, llvm::MCModule&, llvm::MCObjectDisassembler*, llvm::MCObjectSymbolizer*) /home/xaionaro/src/dagger/lib/DC/DCTranslatorUtils.cpp:79:46
#14 0x000000000040c4c3 main /home/xaionaro/src/dagger/tools/llvm-dec/llvm-dec.cpp:228:44
#15 0x00007fee245ba1a3 __libc_start_main (/lib64/libc.so.6+0x271a3)
#16 0x000000000040b10e _start (/home/xaionaro/src/dagger/build/bin/llvm-dec+0x40b10e)
Stack dump:
0.  Program arguments: /home/xaionaro/src/dagger/build/bin/llvm-dec -dc-translate-unknown-to-undef helloworld 
1.  DC: Translating Function at address 408190
2.  DC: Translating Basic Block at address 40842E
3.  DC: Translating instruction OR8mr at address 408438
Aborted (core dumped)

out of memory when linking

My machine has 8 GB of RAM and more than 100 GB of storage, but an error occurs when linking llvm-lto:

collect2: fatal error: ld terminated with signal 9 [Killed]
compilation terminated.
tools/llvm-lto/CMakeFiles/llvm-lto.dir/build.make:264: recipe for target 'bin/llvm-lto' failed
make[2]: *** [bin/llvm-lto] Error 1
make[2]: *** Deleting file 'bin/llvm-lto'
CMakeFiles/Makefile2:15267: recipe for target 'tools/llvm-lto/CMakeFiles/llvm-lto.dir/all' failed
make[1]: *** [tools/llvm-lto/CMakeFiles/llvm-lto.dir/all] Error 2
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2

According to other people's experience online, this error is caused by running out of memory. But isn't 8 GB sufficient for building most software?

Segmentation fault if translateRecursivelyAt() finds new functions to analyze

I think there is a segmentation fault in dagger when translateRecursivelyAt() finds a new function.

The problem occurs at line 221 of llvm-dec.cpp:

  for (auto &F : MCM->funcs())
    translateRecursivelyAt(F->getStartAddr(), *DT, *MCM, OD.get(), MOS.get());

We get an iterator range from MCM->funcs() (which is in fact a vector of unique_ptrs holding all the MCFunctions that MCM has found) and iterate over it. But translateRecursivelyAt() appends to this same vector whenever it finds a new function. When that happens, the vector may reallocate, our iterator range becomes invalid, and the reference F points to invalid memory.

The code in DCTranslatorUtils.cpp (translateRecursivelyAt()) that updates the vector of functions is the following (lines 63 to 72):

    // Now look for the function if it was already in the module.
    MCFunction *MCFN = MCM.findFunctionAt(Addr);
    // If it wasn't, we need to disassemble it.
    if (!MCFN) {
      if (!MCOD)
        report_fatal_error(("Unable to translate unknown function at " +
                            utohexstr(Addr) + " without a disassembler!")
                               .c_str());
      MCFN = MCOD->createFunction(&MCM, Addr);
    }

If we skip MCOD->createFunction() here, so that no new function is added to MCM, the crash goes away, but then the newly found function is never analyzed.

As a quick fix, I replaced those two lines in llvm-dec as follows:

  bool newFunctions = true; // we have functions in MCM to progress
  unsigned int fnCount = 0; // total number of functions in MCM
  std::vector<std::unique_ptr<MCFunction>> functionList; // copy of MCM-fn-list

  while(newFunctions){
    functionList.clear(); // make sure, that the list is empty
    for (auto &F : MCM->funcs())
      functionList.push_back(std::move(F)); // move (not copy) each function out of MCM, leaving a nullptr slot behind
    if (fnCount != functionList.size()){ // on new functions progress them
      fnCount = functionList.size();
      for (auto &F : functionList){
        if(F.get() != nullptr) // do not process the same functions again
          translateRecursivelyAt(
              F->getStartAddr(), *DT, *MCM, OD.get(), MOS.get());
      }
    }else{
      newFunctions = false;
    }
  }

Now it seems to work properly, but this is certainly very ugly code, and we need to find a better solution.

What do you think about creating a member variable in MCM that holds the number of functions it has found and is incremented whenever a new one is added? In llvm-dec we could check this number before each call to translateRecursivelyAt() and, if it has changed, restart the iteration. If such a solution is OK with you, I could make a pull request or a patch to fix this.
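Another option is to iterate by index and re-read size() on every pass, so that no iterator is held across a call that may grow the vector. This assumes translateRecursivelyAt() only ever appends to the module's function vector and never removes or reorders entries. It also assumes llvm-dec can reach that vector by index, which may need a small accessor on MCModule. Function, Module and translateAt() below are hypothetical stand-ins, not dagger's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins for MCFunction / MCModule, for illustration only.
struct Function {
  uint64_t StartAddr;
};
struct Module {
  std::vector<std::unique_ptr<Function>> Funcs;
};

// Stand-in for translateRecursivelyAt(): pretends the function at Addr calls
// Addr + 1 (below Limit) and appends that callee if it is not yet known.
// Appending may reallocate M.Funcs, invalidating iterators and references.
void translateAt(uint64_t Addr, Module &M, uint64_t Limit) {
  uint64_t Callee = Addr + 1;
  if (Callee >= Limit)
    return;
  for (const auto &F : M.Funcs)
    if (F->StartAddr == Callee)
      return;
  M.Funcs.push_back(std::make_unique<Function>(Function{Callee}));
}

// Index-based driver: size() is re-read on every iteration, so functions
// appended during translation are visited too, and nothing that points
// into the vector is held across the call that may grow it.
std::size_t translateAll(Module &M, uint64_t Limit) {
  std::size_t Visited = 0;
  for (std::size_t I = 0; I < M.Funcs.size(); ++I) {
    uint64_t Addr = M.Funcs[I]->StartAddr; // copy before the vector may grow
    translateAt(Addr, M, Limit);
    ++Visited;
  }
  return Visited;
}
```

Unlike the quick fix above, this never moves the unique_ptrs out of the module, so the module's function list stays intact.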

Issue with hello world decompilation

With a program like

#include <stdio.h>
int main(int argc, char* argv[]) {
  printf("Hello World!\n");
  return 0;
}

built and decompiled with

clang -m64 hello.c -o hello
./build/bin/llvm-dec ./hello

llvm-dec gives:

  <MCInst 939>
Cannot translate instruction: 
  <MCInst 939>
Cannot translate instruction: 
  <MCInst 939>
Cannot translate instruction: 
  <MCInst 939>
Cannot translate instruction: 
  <MCInst 939>
Cannot translate instruction:

But if it is compiled as an object file with
clang -m64 hello.c -c -o hello
it decompiles to IR correctly. However, the IR does not seem to be functional. Is the external function printf identified?

Overlapping .tbss-Section in ELF-Files

In GCC-compiled ELF files, the .tbss section sometimes overlaps another section (when threads and locks are used in the binary). This prevents dagger from working on those files, because dagger checks for overlapping sections and stops when it encounters them.

A Stack Overflow post describes this more clearly: http://stackoverflow.com/questions/25501044/gcc-ld-overlapping-sections-tbss-init-array-in-statically-linked-elf-bin

Because thread-local sections like .tbss have the SHF_TLS flag set, we could easily ignore them in the shouldSkipELFSection function of MCObjectSymbolizer to solve this issue for ELF files. For example, in MCObjectSymbolizer.cpp:

static bool shouldSkipELFSection(SectionRef S) {
  // skip sections with the SHF_TLS flag
  if (ELFSectionRef(S).getFlags() & ELF::SHF_TLS)
    return true;
  else  // skip all sections without the SHF_ALLOC flag
    return !(ELFSectionRef(S).getFlags() & ELF::SHF_ALLOC);
}

The sections with the SHF_TLS flag (.tbss, .tdata[x]) should contain only data and no instructions, so I think it would be safe for dagger to ignore them in ELF files. Since .tbss seems to be the only overlapping section, it would also be possible to ignore just that section. The following variant also works for all the samples I have found so far:

static bool shouldSkipELFSection(SectionRef S) {
  StringRef SectionName{};
  S.getName(SectionName);
  if(SectionName == ".tbss")
    return true;
  else
    return !(ELFSectionRef(S).getFlags() & ELF::SHF_ALLOC);
}
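This standalone sketch shows why skipping SHF_TLS sections removes the overlap. It models sections as plain address ranges instead of LLVM SectionRefs. The Section struct, hasOverlap() and the addresses in the example are hypothetical; only the flag values SHF_ALLOC = 0x2 and SHF_TLS = 0x400 come from the ELF specification. It does not reproduce dagger's actual overlap check. As the Stack Overflow post explains, .tbss does not occupy space in the loaded image, so the linker can give the following section (here .init_array) the same address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Section flag values from the ELF specification.
constexpr uint64_t SHF_ALLOC = 0x2;
constexpr uint64_t SHF_TLS = 0x400;

// Hypothetical, simplified stand-in for an ELF section (not LLVM's SectionRef).
struct Section {
  std::string Name;
  uint64_t Addr;
  uint64_t Size;
  uint64_t Flags;
};

// Same rule as the first proposal: skip TLS sections and non-allocated ones.
bool shouldSkip(const Section &S) {
  if (S.Flags & SHF_TLS)
    return true;
  return !(S.Flags & SHF_ALLOC);
}

// Returns true if two kept, non-empty sections share an address. Once the
// sections are sorted by start address, comparing each one with its
// predecessor is enough to find an overlap if there is one.
bool hasOverlap(std::vector<Section> Secs, bool SkipTLS) {
  auto Drop = [SkipTLS](const Section &S) {
    if (S.Size == 0)
      return true;
    return SkipTLS ? shouldSkip(S) : !(S.Flags & SHF_ALLOC);
  };
  Secs.erase(std::remove_if(Secs.begin(), Secs.end(), Drop), Secs.end());
  std::sort(Secs.begin(), Secs.end(),
            [](const Section &A, const Section &B) { return A.Addr < B.Addr; });
  for (std::size_t I = 1; I < Secs.size(); ++I)
    if (Secs[I].Addr < Secs[I - 1].Addr + Secs[I - 1].Size)
      return true;
  return false;
}
```

In this example, .tbss and .init_array start at the same address. The check fails when TLS sections are kept and passes once they are skipped.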
