Giter Club home page Giter Club logo

evmil's Introduction

Overview

This is a library and tool for working with EVM bytecode. The tool allows you to disassemble contracts into assembly language, and assemble them back again. The tool also supports a primitive intermediate language which can be compiled into bytecode.

Assembler / Disassembler

To illustrate the tool, we will first disassemble the bytecode contract 0x60006000511161000f5760016000525b. We can do this as follows:

evmil disassemble --code 0x60006000511161000f5760016000525b

This should produce the following output:

.code
        push 0x00
        push 0x00
        mload
        gt
        push 0x000f
        jumpi
        push 0x01
        push 0x00
        mstore
_0x000f:
        jumpdest

If we store this into a file test.asm, we can then assemble it back as follows:

evmil assemble test.asm

And we should see our original bytecode being output:

0x60006000511161000f5760016000525b

Finally, when writing assembly language we can use labels for simplicity. For example, the above could be rewritten as follows:

.code
        push 0x00
        push 0x00
        mload
        gt
        push lab
        jumpi
        push 0x01
        push 0x00
        mstore
lab:
        jumpdest

This just makes writing the assembly language a bit easier.

evmil's People

Contributors

davepearce avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

evmil's Issues

Implement Dependency Tracking

(see also #72)

The aim here is to alter the way that a Disassembly is calculated, such that we can identify where in the bytecode sequence a particular constant comes from. Necessary changes:

  • Provide a set concept which can be used to aggregate concrete values.
  • Add failing tests to illustrate the problem
  • Provide mechanism for a Stepper to store the pc value with the value being pushed.

Generalise the FixedPoint computation

Currently, in Disassembly::build we have a ad-hoc fixed point computation. This could be stripped out into a generic notion of fixed-point computation.

Debugging Code which cannot be Disassembled

At the moment, there is no support to help when a give bytecode program cannot be disassembled. What happens is that the program just crashes, leaving very little useful information. Instead, we need to be able to know that we couldn't disassemble the program, and some way of figuring out where we encountered problems.

The example program I'm working with at the moment is:

6080604052600436106100345760003560e01c806311610c2514610039578063310bd74b14610043578063d4b8399214610063575b600080fd5b61004161008b565b005b34801561004f57600080fd5b5061004161005e366004610101565b6100da565b34801561006f57600080fd5b5061007960005481565b60405190815260200160405180910390f35b60005434111561009a57600080fd5b6000805434900380825590036100d85760405133904780156108fc02916000818181858888f193505050501580156100d6573d6000803e3d6000fd5b505b565b670de0b6b3a76400008111156100ef57600080fd5b600054156100fc57600080fd5b600055565b60006020828403121561011357600080fd5b503591905056fea26469706673582212207b57e8eca1da96d5c0491373554b699c55b7dfcbe6779ae889f253ae5f8366cd64736f6c63430008110033

Update for Execution Tests

Currently, the tests don't actually generate output or otherwise do things that we can test. We need to fix that.

Fix `LowerHex` impl for `w256`

This is still not right. Examples:

  • {:#04x} applied to e.g. 0x1 should give 0x01
  • {:04x} applied to e.g. 0x1 should give 0001
  • {:4x} applied to e.g. 0x1 should give 0x1

All of these fail.

Add Disassembly Tests

  • Continue adding single instruction tests for missing instructions
  • Add more interesting block tests

Eventually, we want to add a large range of more aggressive tests. For example, to show that decompilation=>compilation gives back the same bytecode.

Improved translation of logical connectives

Logical connectives currently produce a 0 or 1 on the top of the stack. However, as we know, this is not always efficient when the goal is to branch (e.g. for if or assert). In such case, its better to branch directly.

Abstract Virtual Machine

So, I think it makes sense to have trait for an arbitrary EVM. This would include types for the various components. Something like this:

trait EVM {
  type Stack;
  type Memory;
  type Storage;
  ...  
}

Bug in Semantics for `Swap`

The following in cfa.rs contains a bug

SWAP(n) => {
   let m = (*n - 1) as usize;
   let x = self.peek(m);
   let y = self.peek(0);
   self.set(0,x).set(m,y)
}

It should be just *n not *n-1.

Allow `evm::pc` to be an interval?

This reason for this is that, otherwise, we don't technically have a proper join operation. At the moment, attempting to join two states with different pc values will fail. Furthermore, for evm::BOTTOM the pc could be defined in terms of the BOTTOM interval.

Remove `AbstractWord` ?

This is actually not needed anymore I think. Basically, we can just use Interval directly without the need for this intermediary.

Merge Data Blocks

At the moment, data blocks are not merged together when they could be. For example. with the following code:

60806040526004361060265760003560e01c806311610c2514602b578063d4b8399214603a575b600080fd5b6038600080543490039055565b005b348015604557600080fd5b50604e60005481565b60405190815260200160405180910390f3fea26469706673582212202203cf3a6091cb220fc22ce5ccabf6859442b55c6905ac986a013909b4302b9164736f6c63430008110033

We generate the following:

...
0x00005d: SUB
0x00005e: SWAP(1)
0x00005f: RETURN
0x000060: 0xfe
0x000061: 0xa26469706673582212202203cf3a6091cb220fc22ce5ccabf6859442b55c6905ac986a013909b4302b9164736f6c634300
0x000092: 0x081100
0x000095: 0x33

Observe that there are four unmerged data blocks at the end.

Support Decompilation to EvmIL

Given the disassembler is now operational, it should be relatively straightforward (in most cases) to decompile back to the EvmIL. Therefore, this should be added.

Add Yul Parser

Add a parser for Yul!

Example Yul for a betting contract:

object "Betting_68" {
   code {
      {
         mstore(64, memoryguard(0x80))
         if iszero(lt(calldatasize(), 4))
         {
            let _1 := 0
            switch shr(224, calldataload(_1))
            case 0x11610c25 {
               if slt(add(calldatasize(), not(3)), _1) { revert(_1, _1) }
               let _2 := sload(_1)
               if gt(callvalue(), _2) { revert(_1, _1) }
               sstore(_1, sub(_2, callvalue()))
               if eq(_2, callvalue())
               {
                  let expr := selfbalance()
                  let _3 := _1
                  if iszero(expr) { _3 := 2300 }
                  if iszero(call(_3, caller(), expr, _1, _1, _1, _1))
                  {
                     let pos := mload(64)
                     returndatacopy(pos, _1, returndatasize())
                     revert(pos, returndatasize())
                  }
               }
               return(_1, _1)
            }
            case 0x310bd74b {
               if callvalue() { revert(_1, _1) }
               if slt(add(calldatasize(), not(3)), 32) { revert(_1, _1) }
               let value := calldataload(4)
               if gt(value, 0x0de0b6b3a7640000) { revert(_1, _1) }
               if iszero(iszero(sload(_1))) { revert(_1, _1) }
               sstore(_1, value)
               return(_1, _1)
            }
            case 0xd4b83992 {
               if callvalue() { revert(_1, _1) }
               if slt(add(calldatasize(), not(3)), _1) { revert(_1, _1) }
               let _4 := sload(_1)
               let memPos := mload(64)
               mstore(memPos, _4)
               return(memPos, 32)
            }
         }
         revert(0, 0)
      }
   }
}

`disassemble` command should read file

Currently, the disassemble command reads input directly from the command-line, rather than reading the bytes in from a file named on the command line.

Add Concept of Syntactic Sugar

Calls to "syntactic sugar" could be distinguished with ! for example, like macro calls in Rust. Current items of syntactic sugar we have are:

  • assert e
  • succeeds e1, e2
  • returns e1, e2

Remove `PUSHL` Instruction

I think this would simplify the Instruction interface quite well. However, to make it work we would need some way to encode labels.

Potentially one option is to merge PUSHL into PUSH by adding a bool operand indicating whether or not it is pushing a jump destination.

Minimal Evm Decompiler

The goal here is to add a minimal (for now) decompiler from EVM bytecode back to the intermediate language. The intended rough pipeline is as follows:

  • Disassemble raw bytes into an Instruction stream, initially without PUSHL instructions.
  • Symbolic value flow analysis to determine which constants flow into jump instructions (and, hence, allow us to replace PUSH instructions with PUSHL instructions). This will also store the stack height at each point (initially, in a context-insensitive fashion).
  • Decompile instruction stream into a sequence of Terms.

Improve Abstract Stack in CfaState

At the moment, the abstract stack used in CfaState does not handle merge's as well as it could. In particular, it does not properly support states which may have varying stack heights. The method CfaState.len() supposedly returns the stack height at a given point. But, in fact, it returns a lower bound on the stack height at a given point and no information on the upper bound is available.

Therefore, we extend the abstract stack to support an arbitrary range of unknown values. Thus, we go from e.g. [0x1a,????] to (0..2)::[0x1a,????] where (0..2) represent the possible number of unknown values at the bottom.

Add Testing Framework

This should be based around a set of input programs, and ensure a few things:

  • Every input program compiles to a specific bytecode string
  • Every input program can be decompiled from its bytecode string

And, it should of course cover the full range of primitives supported. In particular, generating call/return instructions at the IL level.

Incorrect Parsing Subtract

Currently, subtract parses as though its right-associative. See test_sub_03.

This will break outputs for other operators as well.

Improve Disassembly API

This is not great at the moment. Issues include:

  • Cannot access underlying byte sequence.
  • No iterator for blocks
  • No iterator for instructions within a block
  • No support for backwards dataflow analysis

Support Functions

Currently, the control-flow analysis does not correctly account for the situation where we have an effective "return" statement. That is where there are multiple destinations for a JUMP statement. This also applies (in principle) to JUMPI.

Disassembling Fall-thru Blocks

There is a bug in the code for connecting blocks together. Specifically, if a block does not end in either a STOP, INVALID, RETURN, REVERT, JUMP then it must always fall thru to the next consecutive block. At the moment, this does not happen.

Unknown instruction encountered

The following bytecode produces a crash:

60806040526004361060265760003560e01c806311610c2514602b578063d4b8399214603a575b600080fd5b6038600080543490039055565b005b348015604557600080fd5b50604e60005481565b60405190815260200160405180910390f3fea26469706673582212202203cf3a6091cb220fc22ce5ccabf6859442b55c6905ac986a013909b4302b9164736f6c63430008110033

The issue seems to be that there is data in the contract which cannot be decoded.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.