Comments (1)
Hi @zaoqi,
I added a simple example to demonstrate the usage of decomp at examples/demo.
For a quite reference, first, install the needed tools.
# Install Clang and LLVM
pacman -Sy clang llvm
# Install GraphViz dot
pacman -Sy graphviz
go get github.com/decomp/decomp/cmd/...
go get github.com/decomp/exp/cmd/dot2png
# TODO: uncomment once https://github.com/jackspirou/grind/pull/6 is merged.
#go get github.com/jackspirou/grind
# Below is a workaround until jackspirou/grind#6 is merged.
git clone -b go-importer https://github.com/mewpull/grind /tmp/grind; cd /tmp/grind; go install; cd -
go get github.com/mewkiz/cmd/sar
go get golang.org/x/tools/cmd/goimports
Then, navigate to the examples/demo directory, and run make
.
The Makefile performs the steps listed in the comments of the Makefile, a summary of which is:
For a given input file foo.c
- Produce an LLVM IR assembly file
foo.ll
offoo.c
using Clang. - Produce control flow graphs of each function of
foo.ll
in GraphViz DOT format usingll2dot
. The control flow graphs offoo.ll
are stored infoo_graphs
, wherefoo_graphs/main.dot
corresponds to the control flow graph of themain
function. - Perform control flow analysis on the control flow graphs using
restructure
to recovery high-level control flow primitives (e.g. for-loops, if-statements). For a given control flow graph e.g.foo_graphs/main.dot
, the recovered control flow information is stored infoo_graphs/foo.json
. Note, the-img
flag is produces debug output images for each step of the control flow recovery (e.g. before transformationfoo_graphs/main_0001a.png
, after transformationfoo_graphs/main_0001b.png
) to help give an intuition for how the control flow analysis is conducted. - Decompile the LLVM IR assembly
foo.ll
into an unpolished Go source filefoo_pre.go
usingll2go
, and the recovered control flow information (stored infoo_graphs/*.json
). - Perform a set of post-processing Go source code rewrites to improve readability and make the code more idiomatic using
go-post
andgrind
. The output of these rewrites isfoo.go
. - Format the source code using
goimports
and fix whitespace withsar
(a regexp search and replace tool).
An example run is listed below:
$ make
# * Step 0
# - Convert original C source code to LLVM IR.
#
# - input: foo.c
# - output: foo.ll
clang -S -emit-llvm -o foo.ll foo.c
opt -S --mem2reg -o foo.ll foo.ll
# * Step 1
# - Produce control flow graphs for each function of the LLVM IR.
#
# - input: foo.ll
# - output: foo_graphs/main.dot
ll2dot -f foo.ll
ll2dot: parsing file "foo.ll".
ll2dot: parsing function "@main".
ll2dot: creating file "foo_graphs/main.dot".
# * Step 2
# - Perform control flow analysis on the control flow graphs to recover
# the high-level control flow primitives of each function.
#
# - input: foo_graphs/main.dot
# - output: foo_graphs/main.json
# - output (optional): foo_graphs/main_NNNN{a,b}.dot (when the debug flag -steps is used)
cd foo_graphs; restructure -indent -steps -o main.json main.dot
# (optional) Convert DOT control flow graphs to PNG images.
dot2png foo_graphs/*.dot
2019/10/31 12:45:12 Creating: "foo_graphs/main_0001a.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0001b.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0002a.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0002b.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0003a.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0003b.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0004a.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0004b.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main.png"
# * Step 3
# - Decompile the LLVM IR assembly to unpolished Go source code, using
# control flow recovery information produced by restructure.
#
# - input: foo.ll
# - input: foo_graphs/main.json
# - output: foo_pre.go
ll2go foo.ll > foo_pre.go
ll2go: parsing file "foo.ll".
ll2go: decompiling function "@main".
# * Step 4
# - Post-process the decompiled Go source code.
#
# - input: foo.go (actually a copy of foo_pre.go, since rewrite happens inplace)
# - output: foo.go
cp foo_pre.go foo.go
go-post foo.go
foo.go: fixed unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved mem2var cmain varnames
# Force run "localid" post-processing rule (disabled by default).
go-post -force localid foo.go
foo.go: fixed localid assignbinop
# Move variable declarations closer to their use.
grind foo.go
0
c_main: cannot inline gotos without type information
main: cannot inline gotos without type information
EDIT: foo.go
@@ -3,16 +3,11 @@ package main
import "os"
func c_main(_0 int32, _1 **int8) int32 {
- var v_3 int32
- var v_4 int32
- var v_5 **int8
- var v_6 int32
- var v_7 int32
- v_3 = 0
- v_4 = _0
- v_5 = _1
- v_7 = 0
- v_6 = 0
+ v_3 := int32(0)
+ v_4 := _0
+ v_5 := _1
+ v_7 := int32(0)
+ v_6 := int32(0)
for v_6 <
10 {
1
c_main: cannot inline gotos without type information
main: cannot inline gotos without type information
# The rewrites of Grind enabled further rewrite rules of go-post.
#
# Note: use -diff to see the change of an given go-post rewrite; e.g.
#
# go-post -diff -r forloop foo.go
go-post -r forloop foo.go
foo.go: fixed forloop
go-post -r unusedvar foo.go
foo.go: fixed unusedvar
# Use sar to remove useless newlines.
sar -i "\n\n" "\n" foo.go
sar -i '([*<])\n' '$1 ' foo.go
# Use goimports to format the output Go source code.
#
# - input: foo.go
# - output: foo.go
goimports -w foo.go
# Add whitespace between top-level function declarations.
sar -i "}\nfunc" "}\n\nfunc" foo.go
Hope that helps!
Cheers,
Robin
P.S. I would personally not use decomp/decomp
for anything serious at the moment, as proper data flow analysis has not yet been implemented. As such, the rewrites performed by go-post
are not sound. Also, the output of ll2go
is very naiive as can be seen if you look closer at the foo_pre.go
output, which would run into an infinite loop if you were to run it (this is due to a naiive translation out of SSA form as we do not yet have proper data flow analysis information available).
Input C source file foo.c
(original source code, unknown to the decompiler):
int main(int argc, char **argv) {
int i, x;
x = 0;
for (i = 0; i < 10; i++) {
if (x < 100) {
x += 3*i;
}
}
return x;
}
LLVM IR assembly file foo.ll
(produced by Clang):
; ModuleID = 'foo.ll'
source_filename = "foo.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
; Function Attrs: noinline nounwind optnone sspstrong uwtable
define dso_local i32 @main(i32, i8**) #0 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 8
%6 = alloca i32, align 4
%7 = alloca i32, align 4
store i32 0, i32* %3, align 4
store i32 %0, i32* %4, align 4
store i8** %1, i8*** %5, align 8
store i32 0, i32* %7, align 4
store i32 0, i32* %6, align 4
br label %8
8: ; preds = %20, %2
%9 = load i32, i32* %6, align 4
%10 = icmp slt i32 %9, 10
br i1 %10, label %11, label %23
11: ; preds = %8
%12 = load i32, i32* %7, align 4
%13 = icmp slt i32 %12, 100
br i1 %13, label %14, label %19
14: ; preds = %11
%15 = load i32, i32* %6, align 4
%16 = mul nsw i32 3, %15
%17 = load i32, i32* %7, align 4
%18 = add nsw i32 %17, %16
store i32 %18, i32* %7, align 4
br label %19
19: ; preds = %14, %11
br label %20
20: ; preds = %19
%21 = load i32, i32* %6, align 4
%22 = add nsw i32 %21, 1
store i32 %22, i32* %6, align 4
br label %8
23: ; preds = %8
%24 = load i32, i32* %7, align 4
ret i32 %24
}
attributes #0 = { noinline nounwind optnone sspstrong uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{!"clang version 9.0.0 (tags/RELEASE_900/final)"}
Control flow graph of the main
function in GraphViz DOT format main.dot
:
strict digraph "main" {
// Node definitions.
2 [label=entry];
8;
11;
23;
14;
19;
20;
// Edge definitions.
2 -> 8;
8 -> 11 [
color=darkgreen
label=true
];
8 -> 23 [
color=red
label=false
];
11 -> 14 [
color=darkgreen
label=true
];
11 -> 19 [
color=red
label=false
];
14 -> 19;
19 -> 20;
20 -> 8;
}
Steps taken during control flow recovery:
- Step 1 (before)
- Step 1 (after)
- Step 2 ( before)
- Step 2 (after)
- Step 3 (before)
- Step 3 (after)
- Step 4 (before)
- Step 4 (after)
Control flow recovery information produced by restructure
in JSON format foo_graphs/main.json
:
[
{
"prim": "if",
"nodes": {
"body": "14",
"cond": "11",
"exit": "19"
},
"entry": "11",
"exit": "19"
},
{
"prim": "seq",
"nodes": {
"entry": "11",
"exit": "20"
},
"entry": "11",
"exit": "20"
},
{
"prim": "pre_loop",
"nodes": {
"body": "11",
"cond": "8",
"exit": "23"
},
"entry": "8",
"exit": "23"
},
{
"prim": "seq",
"nodes": {
"entry": "2",
"exit": "8"
},
"entry": "2",
"exit": "8"
}
]
Unpolished Go source code produced by ll2go
, foo_pre.go
:
package main
func main(_0 int32, _1 **int8) int32 {
_3 = new(int32)
_4 = new(int32)
_5 = new(**int8)
_6 = new(int32)
_7 = new(int32)
*_3 = 0
*_4 = _0
*_5 = _1
*_7 = 0
*_6 = 0
_9 = *_6
_10 = _9 < 10
for _10 {
_12 = *_7
_13 = _12 < 100
if _13 {
_15 = *_6
_16 = 3 * _15
_17 = *_7
_18 = _17 + _16
*_7 = _18
}
_21 = *_6
_22 = _21 + 1
*_6 = _22
}
_24 = *_7
return _24
}
Go source code after post-processing (using go-post
and grind
), foo.go
:
package main
import "os"
func c_main(_0 int32, _1 **int8) int32 {
v_7 := int32(0)
for v_6 := int32(0); v_6 < 10; v_6++ {
if v_7 < 100 {
v_7 += 3 * v_6
}
}
return v_7
}
func main() {
ret := int(c_main(0, nil))
os.Exit(ret)
}
Copy of original C source code for ease of comparison:
int main(int argc, char **argv) {
int i, x;
x = 0;
for (i = 0; i < 10; i++) {
if (x < 100) {
x += 3*i;
}
}
return x;
}
At least for this simple example, the decompiled Go source code produces the same result as the original C source code:
# Result produced by original C source code:
$ clang -o foo foo.c
$ ./foo ; echo $?
108
# Result produced by decompiled Go source code:
$ go run foo.go
exit status 108
from decomp.
Related Issues (20)
- challenge: Decompile the mini-c compiler, and recompile it producing byte identical output HOT 2
- cmd/ll2go: Add support for remaining instructions HOT 1
- decomp: High level steps of the decompiler pipeline
- review: Structuring Decompiled Graphs [8h]
- review: A Structuring Algorithm for Decompilation [8h]
- review: Characterizations of Reducible Flow Graphs [8h]
- review: Taming control flow: A structured approach to eliminating goto statements [8h] HOT 1
- review: Principal type-schemes for functional programs
- review: Type-based decompilation (or program reconstruction via type reconstruction)
- review: TIE: Principled reverse engineering of types in binary programs HOT 1
- review: Scalable variable and data type detection in a binary rewriter
- review: BinType: A Scalable Type Inference Tool for Compiled C Programs HOT 1
- review: Type inference on executables
- review: dynStruct: An automatic reverse engineering tool for structure recovery and memory use analysis
- review: Polymorphic type inference for machine code HOT 1
- meta: Report, A Meta-study on Type Analysis of Low-level Code
- Add Microsoft's mctoll to front-ends.md HOT 1
- panic: support for value *constant.CharArray not yet implemented HOT 3
- [Error] make in demo folder HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from decomp.