consensys / evmil Goto Github PK

A simple intermediate language for generating EVM bytecode.

License: Apache License 2.0

Rust 90.70% Assembly 9.30%

evmil's Introduction

Overview

This is a library and tool for working with EVM bytecode. The tool allows you to disassemble contracts into assembly language, and assemble them back again. The tool also supports a primitive intermediate language which can be compiled into bytecode.

Assembler / Disassembler

To illustrate the tool, we will first disassemble the bytecode contract 0x60006000511161000f5760016000525b. We can do this as follows:

evmil disassemble --code 0x60006000511161000f5760016000525b

This should produce the following output:

.code
        push 0x00
        push 0x00
        mload
        gt
        push 0x000f
        jumpi
        push 0x01
        push 0x00
        mstore
_0x000f:
        jumpdest

If we store this into a file test.asm, we can then assemble it back as follows:

evmil assemble test.asm

And we should see our original bytecode being output:

0x60006000511161000f5760016000525b

Finally, when writing assembly language we can use labels for simplicity. For example, the above could be rewritten as follows:

.code
        push 0x00
        push 0x00
        mload
        gt
        push lab
        jumpi
        push 0x01
        push 0x00
        mstore
lab:
        jumpdest

This just makes writing the assembly language a bit easier.

evmil's People

Contributors

Stargazers

Watchers

Forkers

pablitomilenio gooddaisy 0x4b16b2

evmil's Issues

Add assignment statement

Support Binary Operators

At this stage, the parser does not support binary operators.

Rename `Stepable` => `Stepper`

Implement Dependency Tracking

(see also #72)

The aim here is to alter the way that a Disassembly is calculated, such that we can identify where in the bytecode sequence a particular constant comes from. Necessary changes:

Provide a set concept which can be used to aggregate concrete values.
Add failing tests to illustrate the problem
Provide mechanism for a Stepper to store the pc value with the value being pushed.

Generalise the FixedPoint computation

Currently, in Disassembly::build we have a ad-hoc fixed point computation. This could be stripped out into a generic notion of fixed-point computation.

Fix Translation of Integer Constants

Currently, integer constants are always translated into 4 bytes. That clearly does not make sense.

Add `if e goto` statement

Add `return` and `revert`

Perhaps we should have succeed, revert and fail ?

Debugging Code which cannot be Disassembled

At the moment, there is no support to help when a give bytecode program cannot be disassembled. What happens is that the program just crashes, leaving very little useful information. Instead, we need to be able to know that we couldn't disassemble the program, and some way of figuring out where we encountered problems.

The example program I'm working with at the moment is:

6080604052600436106100345760003560e01c806311610c2514610039578063310bd74b14610043578063d4b8399214610063575b600080fd5b61004161008b565b005b34801561004f57600080fd5b5061004161005e366004610101565b6100da565b34801561006f57600080fd5b5061007960005481565b60405190815260200160405180910390f35b60005434111561009a57600080fd5b6000805434900380825590036100d85760405133904780156108fc02916000818181858888f193505050501580156100d6573d6000803e3d6000fd5b505b565b670de0b6b3a76400008111156100ef57600080fd5b600054156100fc57600080fd5b600055565b60006020828403121561011357600080fd5b503591905056fea26469706673582212207b57e8eca1da96d5c0491373554b699c55b7dfcbe6779ae889f253ae5f8366cd64736f6c63430008110033

Add more Instructions!

Add Array Length Operator

For memory, calldata, code but not storage.

Update for Execution Tests

Currently, the tests don't actually generate output or otherwise do things that we can test. We need to fix that.

Implement `evm::Stack` for `AbstractStack`

Add ArrayAccess Operator

memory[e], storage[e], calldata[e] and code[e].

Convert AST to SyntacticHeap

Convert the internal representation to use syntactic_heap.

Fix `LowerHex` impl for `w256`

This is still not right. Examples:

{:#04x} applied to e.g. 0x1 should give 0x01
{:04x} applied to e.g. 0x1 should give 0001
{:4x} applied to e.g. 0x1 should give 0x1

All of these fail.

Merge `Disassembly.blocks` and `Disassembly.contexts`

These could be merged together I think, and this would improve the API.

Add Disassembly Tests

Continue adding single instruction tests for missing instructions
Add more interesting block tests

Eventually, we want to add a large range of more aggressive tests. For example, to show that decompilation=>compilation gives back the same bytecode.

Improved translation of logical connectives

Logical connectives currently produce a 0 or 1 on the top of the stack. However, as we know, this is not always efficient when the goal is to branch (e.g. for if or assert). In such case, its better to branch directly.

Abstract Virtual Machine

So, I think it makes sense to have trait for an arbitrary EVM. This would include types for the various components. Something like this:

trait EVM {
  type Stack;
  type Memory;
  type Storage;
  ...  
}

Fix Integer / Hex Constants over 128bits

At the moment, the tool does not support literals which cannot fit inside a u128.

Bug in Semantics for `Swap`

The following in cfa.rs contains a bug

SWAP(n) => {
   let m = (*n - 1) as usize;
   let x = self.peek(m);
   let y = self.peek(0);
   self.set(0,x).set(m,y)
}

It should be just *n not *n-1.

Allow `evm::pc` to be an interval?

This reason for this is that, otherwise, we don't technically have a proper join operation. At the moment, attempting to join two states with different pc values will fail. Furthermore, for evm::BOTTOM the pc could be defined in terms of the BOTTOM interval.

Remove `AbstractWord` ?

This is actually not needed anymore I think. Basically, we can just use Interval directly without the need for this intermediary.

Merge Data Blocks

At the moment, data blocks are not merged together when they could be. For example. with the following code:

60806040526004361060265760003560e01c806311610c2514602b578063d4b8399214603a575b600080fd5b6038600080543490039055565b005b348015604557600080fd5b50604e60005481565b60405190815260200160405180910390f3fea26469706673582212202203cf3a6091cb220fc22ce5ccabf6859442b55c6905ac986a013909b4302b9164736f6c63430008110033

We generate the following:

...
0x00005d: SUB
0x00005e: SWAP(1)
0x00005f: RETURN
0x000060: 0xfe
0x000061: 0xa26469706673582212202203cf3a6091cb220fc22ce5ccabf6859442b55c6905ac986a013909b4302b9164736f6c634300
0x000092: 0x081100
0x000095: 0x33

Observe that there are four unmerged data blocks at the end.

Support Decompilation to EvmIL

Given the disassembler is now operational, it should be relatively straightforward (in most cases) to decompile back to the EvmIL. Therefore, this should be added.

Add Yul Parser

Add a parser for Yul!

Example Yul for a betting contract:

object "Betting_68" {
   code {
      {
         mstore(64, memoryguard(0x80))
         if iszero(lt(calldatasize(), 4))
         {
            let _1 := 0
            switch shr(224, calldataload(_1))
            case 0x11610c25 {
               if slt(add(calldatasize(), not(3)), _1) { revert(_1, _1) }
               let _2 := sload(_1)
               if gt(callvalue(), _2) { revert(_1, _1) }
               sstore(_1, sub(_2, callvalue()))
               if eq(_2, callvalue())
               {
                  let expr := selfbalance()
                  let _3 := _1
                  if iszero(expr) { _3 := 2300 }
                  if iszero(call(_3, caller(), expr, _1, _1, _1, _1))
                  {
                     let pos := mload(64)
                     returndatacopy(pos, _1, returndatasize())
                     revert(pos, returndatasize())
                  }
               }
               return(_1, _1)
            }
            case 0x310bd74b {
               if callvalue() { revert(_1, _1) }
               if slt(add(calldatasize(), not(3)), 32) { revert(_1, _1) }
               let value := calldataload(4)
               if gt(value, 0x0de0b6b3a7640000) { revert(_1, _1) }
               if iszero(iszero(sload(_1))) { revert(_1, _1) }
               sstore(_1, value)
               return(_1, _1)
            }
            case 0xd4b83992 {
               if callvalue() { revert(_1, _1) }
               if slt(add(calldatasize(), not(3)), _1) { revert(_1, _1) }
               let _4 := sload(_1)
               let memPos := mload(64)
               mstore(memPos, _4)
               return(memPos, 32)
            }
         }
         revert(0, 0)
      }
   }
}

Rework `Disassembler` => `AbstractExecution`

Change signature of `evm::Stack<T>`

Push the type parameter into being an associated type.

`disassemble` command should read file

Currently, the disassemble command reads input directly from the command-line, rather than reading the bytes in from a file named on the command line.

Add Concept of Syntactic Sugar

Calls to "syntactic sugar" could be distinguished with ! for example, like macro calls in Rust. Current items of syntactic sugar we have are:

assert e
succeeds e1, e2
returns e1, e2

Remove `PUSHL` Instruction

I think this would simplify the Instruction interface quite well. However, to make it work we would need some way to encode labels.

Potentially one option is to merge PUSHL into PUSH by adding a bool operand indicating whether or not it is pushing a jump destination.

Minimal Evm Decompiler

The goal here is to add a minimal (for now) decompiler from EVM bytecode back to the intermediate language. The intended rough pipeline is as follows:

Disassemble raw bytes into an Instruction stream, initially without PUSHL instructions.
Symbolic value flow analysis to determine which constants flow into jump instructions (and, hence, allow us to replace PUSH instructions with PUSHL instructions). This will also store the stack height at each point (initially, in a context-insensitive fashion).
Decompile instruction stream into a sequence of Terms.

Refactor `analysis` => `abstract`

They we can have e.g. abstract::Stack, etc.

Improve Abstract Stack in CfaState

At the moment, the abstract stack used in CfaState does not handle merge's as well as it could. In particular, it does not properly support states which may have varying stack heights. The method CfaState.len() supposedly returns the stack height at a given point. But, in fact, it returns a lower bound on the stack height at a given point and no information on the upper bound is available.

Therefore, we extend the abstract stack to support an arbitrary range of unknown values. Thus, we go from e.g. [0x1a,????] to (0..2)::[0x1a,????] where (0..2) represent the possible number of unknown values at the bottom.

`Instruction::data` should not fall thru

Currently, this instruction returns true for State.fallthru when it should not.

Add Testing Framework

This should be based around a set of input programs, and ensure a few things:

Every input program compiles to a specific bytecode string
Every input program can be decompiled from its bytecode string

And, it should of course cover the full range of primitives supported. In particular, generating call/return instructions at the IL level.

Fix `w256::LowerHex`

Incorrect Parsing Subtract

Currently, subtract parses as though its right-associative. See test_sub_03.

This will break outputs for other operators as well.

Add `goto` statement

Improve Disassembly API

This is not great at the moment. Issues include:

Cannot access underlying byte sequence.
No iterator for blocks
No iterator for instructions within a block
No support for backwards dataflow analysis

Rename `CfaState` => `AbstractEvm`

Implement Arithmetic Operations on `w256`

Support Functions

Currently, the control-flow analysis does not correctly account for the situation where we have an effective "return" statement. That is where there are multiple destinations for a JUMP statement. This also applies (in principle) to JUMPI.

Support Missing Bytecode Instructions

The semantics for the control-flow analysis is missing a lot of stuff, such as various instructions (e.g. CALLER).

Disassembling Fall-thru Blocks

There is a bug in the code for connecting blocks together. Specifically, if a block does not end in either a STOP, INVALID, RETURN, REVERT, JUMP then it must always fall thru to the next consecutive block. At the moment, this does not happen.

Fix `AbstractStack` impl for `evm::Stack.len()`

Unknown instruction encountered

The following bytecode produces a crash:

60806040526004361060265760003560e01c806311610c2514602b578063d4b8399214603a575b600080fd5b6038600080543490039055565b005b348015604557600080fd5b50604e60005481565b60405190815260200160405180910390f3fea26469706673582212202203cf3a6091cb220fc22ce5ccabf6859442b55c6905ac986a013909b4302b9164736f6c63430008110033

The issue seems to be that there is data in the contract which cannot be decoded.

`AbstractStack` should implement `JoinInto`

Move `lex` module out of EvmIL

This is a utility which should be provided somewhere else.