Giter Club home page Giter Club logo

Comments (1)

mewmew avatar mewmew commented on June 16, 2024

Hi @zaoqi,

I added a simple example to demonstrate the usage of decomp at examples/demo.

For a quite reference, first, install the needed tools.

# Install Clang and LLVM
pacman -Sy clang llvm

# Install GraphViz dot
pacman -Sy graphviz

go get github.com/decomp/decomp/cmd/...
go get github.com/decomp/exp/cmd/dot2png
# TODO: uncomment once https://github.com/jackspirou/grind/pull/6 is merged.
#go get github.com/jackspirou/grind
# Below is a workaround until jackspirou/grind#6 is merged.
git clone -b go-importer https://github.com/mewpull/grind /tmp/grind; cd /tmp/grind; go install; cd -

go get github.com/mewkiz/cmd/sar
go get golang.org/x/tools/cmd/goimports

Then, navigate to the examples/demo directory, and run make.

The Makefile performs the steps listed in the comments of the Makefile, a summary of which is:

For a given input file foo.c

  1. Produce an LLVM IR assembly file foo.ll of foo.c using Clang.
  2. Produce control flow graphs of each function of foo.ll in GraphViz DOT format using ll2dot. The control flow graphs of foo.ll are stored in foo_graphs, where foo_graphs/main.dot corresponds to the control flow graph of the main function.
  3. Perform control flow analysis on the control flow graphs using restructure to recovery high-level control flow primitives (e.g. for-loops, if-statements). For a given control flow graph e.g. foo_graphs/main.dot, the recovered control flow information is stored in foo_graphs/foo.json. Note, the -img flag is produces debug output images for each step of the control flow recovery (e.g. before transformation foo_graphs/main_0001a.png, after transformation foo_graphs/main_0001b.png ) to help give an intuition for how the control flow analysis is conducted.
  4. Decompile the LLVM IR assembly foo.ll into an unpolished Go source file foo_pre.go using ll2go, and the recovered control flow information (stored in foo_graphs/*.json).
  5. Perform a set of post-processing Go source code rewrites to improve readability and make the code more idiomatic using go-post and grind. The output of these rewrites is foo.go.
  6. Format the source code using goimports and fix whitespace with sar (a regexp search and replace tool).

An example run is listed below:

$ make
# * Step 0
#    - Convert original C source code to LLVM IR.
#
#    - input:  foo.c
#    - output: foo.ll
clang -S -emit-llvm -o foo.ll foo.c
opt -S --mem2reg -o foo.ll foo.ll
# * Step 1
#    - Produce control flow graphs for each function of the LLVM IR.
#
#    - input:  foo.ll
#    - output: foo_graphs/main.dot
ll2dot -f foo.ll
ll2dot: parsing file "foo.ll".
ll2dot: parsing function "@main".
ll2dot: creating file "foo_graphs/main.dot".
# * Step 2
#    - Perform control flow analysis on the control flow graphs to recover
#      the high-level control flow primitives of each function.
#
#    - input:  foo_graphs/main.dot
#    - output: foo_graphs/main.json
#    - output (optional): foo_graphs/main_NNNN{a,b}.dot (when the debug flag -steps is used)
cd foo_graphs; restructure -indent -steps -o main.json main.dot
# (optional) Convert DOT control flow graphs to PNG images.
dot2png foo_graphs/*.dot
2019/10/31 12:45:12 Creating: "foo_graphs/main_0001a.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0001b.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0002a.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0002b.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0003a.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0003b.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0004a.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main_0004b.png"
2019/10/31 12:45:12 Creating: "foo_graphs/main.png"
# * Step 3
#    - Decompile the LLVM IR assembly to unpolished Go source code, using
#      control flow recovery information produced by restructure.
#
#    - input:  foo.ll
#    - input:  foo_graphs/main.json
#    - output: foo_pre.go
ll2go foo.ll > foo_pre.go
ll2go: parsing file "foo.ll".
ll2go: decompiling function "@main".
# * Step 4
#    - Post-process the decompiled Go source code.
#
#    - input:  foo.go (actually a copy of foo_pre.go, since rewrite happens inplace)
#    - output: foo.go
cp foo_pre.go foo.go
go-post foo.go
foo.go: fixed unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved unresolved mem2var cmain varnames
# Force run "localid" post-processing rule (disabled by default).
go-post -force localid foo.go
foo.go: fixed localid assignbinop
# Move variable declarations closer to their use.
grind foo.go
0
c_main: cannot inline gotos without type information
main: cannot inline gotos without type information
EDIT: foo.go
@@ -3,16 +3,11 @@ package main
 import "os"
 
 func c_main(_0 int32, _1 **int8) int32 {
-	var v_3 int32
-	var v_4 int32
-	var v_5 **int8
-	var v_6 int32
-	var v_7 int32
-	v_3 = 0
-	v_4 = _0
-	v_5 = _1
-	v_7 = 0
-	v_6 = 0
+	v_3 := int32(0)
+	v_4 := _0
+	v_5 := _1
+	v_7 := int32(0)
+	v_6 := int32(0)
 
 	for v_6 <
 		10 {

1
c_main: cannot inline gotos without type information
main: cannot inline gotos without type information
# The rewrites of Grind enabled further rewrite rules of go-post.
#
# Note: use -diff to see the change of an given go-post rewrite; e.g.
#
#    go-post -diff -r forloop foo.go
go-post -r forloop foo.go
foo.go: fixed forloop
go-post -r unusedvar foo.go
foo.go: fixed unusedvar
# Use sar to remove useless newlines.
sar -i "\n\n" "\n" foo.go
sar -i '([*<])\n' '$1 ' foo.go
# Use goimports to format the output Go source code.
#
#    - input:  foo.go
#    - output: foo.go
goimports -w foo.go
# Add whitespace between top-level function declarations.
sar -i "}\nfunc" "}\n\nfunc" foo.go

Hope that helps!

Cheers,
Robin

P.S. I would personally not use decomp/decomp for anything serious at the moment, as proper data flow analysis has not yet been implemented. As such, the rewrites performed by go-post are not sound. Also, the output of ll2go is very naiive as can be seen if you look closer at the foo_pre.go output, which would run into an infinite loop if you were to run it (this is due to a naiive translation out of SSA form as we do not yet have proper data flow analysis information available).

Input C source file foo.c (original source code, unknown to the decompiler):

int main(int argc, char **argv) {
	int i, x;

	x = 0;
	for (i = 0; i < 10; i++) {
		if (x < 100) {
			x += 3*i;
		}
	}
	return x;
}

LLVM IR assembly file foo.ll (produced by Clang):

; ModuleID = 'foo.ll'
source_filename = "foo.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

; Function Attrs: noinline nounwind optnone sspstrong uwtable
define dso_local i32 @main(i32, i8**) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca i8**, align 8
  %6 = alloca i32, align 4
  %7 = alloca i32, align 4
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  store i32 0, i32* %7, align 4
  store i32 0, i32* %6, align 4
  br label %8

8:                                                ; preds = %20, %2
  %9 = load i32, i32* %6, align 4
  %10 = icmp slt i32 %9, 10
  br i1 %10, label %11, label %23

11:                                               ; preds = %8
  %12 = load i32, i32* %7, align 4
  %13 = icmp slt i32 %12, 100
  br i1 %13, label %14, label %19

14:                                               ; preds = %11
  %15 = load i32, i32* %6, align 4
  %16 = mul nsw i32 3, %15
  %17 = load i32, i32* %7, align 4
  %18 = add nsw i32 %17, %16
  store i32 %18, i32* %7, align 4
  br label %19

19:                                               ; preds = %14, %11
  br label %20

20:                                               ; preds = %19
  %21 = load i32, i32* %6, align 4
  %22 = add nsw i32 %21, 1
  store i32 %22, i32* %6, align 4
  br label %8

23:                                               ; preds = %8
  %24 = load i32, i32* %7, align 4
  ret i32 %24
}

attributes #0 = { noinline nounwind optnone sspstrong uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{!"clang version 9.0.0 (tags/RELEASE_900/final)"}

Control flow graph of the main function in GraphViz DOT format main.dot:

strict digraph "main" {
	// Node definitions.
	2 [label=entry];
	8;
	11;
	23;
	14;
	19;
	20;

	// Edge definitions.
	2 -> 8;
	8 -> 11 [
		color=darkgreen
		label=true
	];
	8 -> 23 [
		color=red
		label=false
	];
	11 -> 14 [
		color=darkgreen
		label=true
	];
	11 -> 19 [
		color=red
		label=false
	];
	14 -> 19;
	19 -> 20;
	20 -> 8;
}

main

Steps taken during control flow recovery:

  • Step 1 (before)

main_0001a

  • Step 1 (after)

main_0001b

  • Step 2 ( before)

main_0002a

  • Step 2 (after)

main_0002b

  • Step 3 (before)

main_0003a

  • Step 3 (after)

main_0003b

  • Step 4 (before)

main_0004a

  • Step 4 (after)

main_0004b

Control flow recovery information produced by restructure in JSON format foo_graphs/main.json:

[
	{
		"prim": "if",
		"nodes": {
			"body": "14",
			"cond": "11",
			"exit": "19"
		},
		"entry": "11",
		"exit": "19"
	},
	{
		"prim": "seq",
		"nodes": {
			"entry": "11",
			"exit": "20"
		},
		"entry": "11",
		"exit": "20"
	},
	{
		"prim": "pre_loop",
		"nodes": {
			"body": "11",
			"cond": "8",
			"exit": "23"
		},
		"entry": "8",
		"exit": "23"
	},
	{
		"prim": "seq",
		"nodes": {
			"entry": "2",
			"exit": "8"
		},
		"entry": "2",
		"exit": "8"
	}
]

Unpolished Go source code produced by ll2go, foo_pre.go:

package main

func main(_0 int32, _1 **int8) int32 {
	_3 = new(int32)
	_4 = new(int32)
	_5 = new(**int8)
	_6 = new(int32)
	_7 = new(int32)
	*_3 = 0
	*_4 = _0
	*_5 = _1
	*_7 = 0
	*_6 = 0
	_9 = *_6
	_10 = _9 < 10
	for _10 {
		_12 = *_7
		_13 = _12 < 100
		if _13 {
			_15 = *_6
			_16 = 3 * _15
			_17 = *_7
			_18 = _17 + _16
			*_7 = _18
		}
		_21 = *_6
		_22 = _21 + 1
		*_6 = _22
	}
	_24 = *_7
	return _24
}

Go source code after post-processing (using go-post and grind), foo.go:

package main

import "os"

func c_main(_0 int32, _1 **int8) int32 {
	v_7 := int32(0)
	for v_6 := int32(0); v_6 < 10; v_6++ {
		if v_7 < 100 {
			v_7 += 3 * v_6
		}
	}
	return v_7
}

func main() {
	ret := int(c_main(0, nil))
	os.Exit(ret)
}

Copy of original C source code for ease of comparison:

int main(int argc, char **argv) {
	int i, x;

	x = 0;
	for (i = 0; i < 10; i++) {
		if (x < 100) {
			x += 3*i;
		}
	}
	return x;
}

At least for this simple example, the decompiled Go source code produces the same result as the original C source code:

# Result produced by original C source code:
$ clang -o foo foo.c
$ ./foo ; echo $?
108

# Result produced by decompiled Go source code:
$ go run foo.go
exit status 108

from decomp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.