
compiler's People

Contributors

bitwalker, greenhat, grjte, jjcnn


compiler's Issues

RFC: Miden SDK Outline

This issue is just an initial sketch of my proposal for how to approach development of the Miden SDK tooling:

  • Determine whether WebAssembly Interface Types (WIT) is a good solution for specifying function types in Miden. 1
  • Set up the Cargo extension to compile two variants of a crate: one compiled to wasm32-unknown-unknown, and the other compiled to the host target, both as dylibs. 2
  • As a short-term hack that will let us experiment with this approach as a solution, rather than add code to the compiler to work with the WIT spec, we will consume it directly in the Cargo extension to emit Rust code that will be compiled along with the rest of the user-written code in the crate for the wasm32-unknown-unknown target. 3
  • Finally, implement support in the compiler's Wasm frontend to expect a WIT specification to be provided along with the Wasm module, which will be used to enrich the function signatures in the IR type system when translating from Wasm to Miden IR. 4

I'm about to crash here, so this is basically just flushing this stuff out of my brain; hopefully it's useful to you @greenhat until I get a chance to clean it up a bit and expand on some of these points.

Footnotes

  1. A good solution in this case means that it can be used to represent high-level Rust (and other language) types using a common format (preferably in a way that lets us map those high-level types to our IR type system), which provides us the ability to query the types, sizes, and alignment of arguments and return values for a given function. Additionally, we want to be able to generate code to construct this specification in memory, as well as parse/load it from a textual format, and use the in-memory spec to query things such as the number of parameters, what types those parameters have, what their sizes and alignments are, etc.

  2. Using conditional compilation, the latter target will contain a function which, when loaded dynamically and called, will return the WIT specification for the account/note functions in that crate. That spec will be used by the compiler to "enrich" the wasm32-unknown-unknown artifact with type information from the original program, enabling the compiler to handle the calling convention details.
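  The conditional-compilation scheme in footnote 2 might be sketched as follows; every name here (`__wit_spec`, the WIT string) is invented for illustration, and the real tooling would generate the spec from the crate's WIT file and read it via dlopen:

  ```rust
  // Hypothetical sketch: the host-target dylib exports a symbol returning the
  // WIT specification; the wasm32-unknown-unknown build compiles it out.
  #[cfg(not(target_arch = "wasm32"))]
  #[no_mangle]
  pub extern "C" fn __wit_spec() -> &'static str {
      // In the real tooling this would be generated from the crate's WIT file.
      "world wallet { export receive-asset: func(); }"
  }

  fn main() {
      // The Cargo extension would load the dylib dynamically and call the
      // symbol; here we just call it directly to show the shape of the idea.
      #[cfg(not(target_arch = "wasm32"))]
      println!("{}", __wit_spec());
  }
  ```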

  3. The generated code can simply be placed in a build directory for the crate and included, or placed in the source tree; either approach works. The code we're generating here consists of wrappers which handle the gritty details of the calling convention: a wrapper for each public contract function defined in the crate, which delegates to the actual function after decoding the arguments using the calling convention rules; and a wrapper for each external contract function, which encodes the high-level Rust types using the calling convention rules and then invokes the external function.

  4. Simultaneously, we'll implement the calling convention handling for call in the compiler backend. Once complete, we can rip out the hack from the previous stage.
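The wrappers described in footnote 3 could look roughly like this; the `Asset` type, the packed-u64 encoding, and all function names are assumptions made up for the sketch, not the actual Miden calling convention:

```rust
// Hypothetical illustration of a generated wrapper: the exported shim decodes
// a flat ABI value into the high-level type, calls the user-written function,
// and re-encodes the result. All names and the encoding are invented.
#[derive(Debug, PartialEq)]
struct Asset {
    id: u32,
    amount: u32,
}

// The assumed "calling convention": an Asset packed into a single u64.
fn decode_asset(raw: u64) -> Asset {
    Asset { id: (raw >> 32) as u32, amount: raw as u32 }
}
fn encode_asset(a: &Asset) -> u64 {
    ((a.id as u64) << 32) | a.amount as u64
}

// User-written high-level function.
fn double_amount(mut a: Asset) -> Asset {
    a.amount *= 2;
    a
}

// Generated wrapper with a flat ABI suitable for wasm32-unknown-unknown.
#[no_mangle]
pub extern "C" fn __wrap_double_amount(raw: u64) -> u64 {
    encode_asset(&double_amount(decode_asset(raw)))
}

fn main() {
    let raw = encode_asset(&Asset { id: 7, amount: 21 });
    let out = decode_asset(__wrap_double_amount(raw));
    assert_eq!(out, Asset { id: 7, amount: 42 });
    println!("ok");
}
```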

Code style adherence check on CI

Following #61 (comment)

Motivation

We don't have to spend time checking the code style during the code review process.

Suggested solution

Delegate code style checks to CI.

Use rustfmt in a separate CI job to check the code formatting and fail if it does not match our style defined in rustfmt.toml in the repo root. Here is an example of a CI job that does that - https://github.com/ergoplatform/sigma-rust/blob/a360b255f8780d3ae9e6da44266c774e6e4055c5/.github/workflows/ci.yml#L177-L190
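A minimal sketch of such a job (job name, action versions, and steps are assumptions, not copied from the linked workflow):

```yaml
# .github/workflows/ci.yml (fragment)
rustfmt:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: rustup component add rustfmt
    - run: cargo fmt --all -- --check
```

The `--check` flag makes cargo fmt exit non-zero if any file differs from the style defined in rustfmt.toml, failing the job.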

rustfmt.toml:

# see https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=import#imports_granularity
imports_granularity = "Module"

Provide developers with local tools to author the code according to our style guide.

rust-analyzer uses rustfmt to format the code on save and can be additionally configured to introduce new imports with a given granularity. Each editor has its own way of configuring rust-analyzer. For VSCode, the option for import granularity is rust-analyzer.imports.granularity.group (in our case, set to module).
https://rust-analyzer.github.io/manual.html#auto-import
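For reference, the corresponding VSCode settings fragment might look like this (the format-on-save key is standard VSCode; the granularity key is the one named above):

```json
{
    "editor.formatOnSave": true,
    "rust-analyzer.imports.granularity.group": "module"
}
```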

Cargo extension for compiling Rust code to MASM

In the first version of the extension, we will support only library targets because #59 is a blocker for app targets.

Commands:

  • Create a new cargo project. Set up the required #![no_std] boilerplate in lib.rs (panic handler, etc.) and release profile options (size optimization, panic = "abort", etc.)
  • Compile the cargo project to MASM.

Translate parsed Wasm component to IR

Consider splitting into the following stages:

Translation of a Wasm component without imports and using only primitive types

Translate a Wasm component that doesn't have imports and uses only primitive types. Think of a simple add function as an example.

Translation. Simple imports support

Translate a simple Wasm component that has imports but still uses only primitive types. Think of the Fibonacci function that uses an imported add function.

Translation. Complex component support. Stage 1. (basic-wallet example)

Translate the basic-wallet example, which has a fairly complex internal architecture and uses the record and variant Wasm CM types.

Translation. Complex component support. Stage 2. (p2id-note example)

Translate the p2id-note example, which has multiple imports and uses the list type in the interface.

Translation. Complete complex types support

Implement the following Wasm CM types support:

  • enum
  • result
  • option
  • flags
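For context, a small hand-written WIT fragment exercising these types might look like the following (names and exact punctuation are illustrative, not taken from the Miden SDK):

```wit
interface example {
    enum color { red, green, blue }
    flags perms { read, write }
    get: func(id: u32) -> result<color, string>
    find: func(name: string) -> option<color>
}
```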

Failing instruction tests in the integration suite

The failing instruction tests grouped by the reason for failure:

64-bit ops are not implemented in the codegen

// test_bool_op!(ge, >=, u64);
// test_bool_op!(ge, >=, i64);
// test_bool_op!(gt, >, u64);
// test_bool_op!(gt, >, i64);
// test_bool_op!(le, <=, u64);
// test_bool_op!(le, <=, i64);
// test_bool_op!(lt, <, u64);
// test_bool_op!(lt, <, i64);
// test_int_op!(add, +, u64);
// test_int_op!(add, +, i64);
// test_int_op!(sub, -, u64);
// test_int_op!(sub, -, i64);
// test_int_op!(mul, *, u64);
// test_int_op!(mul, *, i64);
// test_int_op!(div, /, u64);
// test_int_op!(div, /, i64);
// test_int_op!(rem, %, u64);
// test_int_op!(rem, %, i64);
// test_unary_op!(neg, -, u64);
// test_unary_op!(neg, -, i64);
// test_unary_op!(not, !, u64);
// test_unary_op!(not, !, i64);
// test_int_op!(shl, <<, u64);
// test_int_op!(shl, <<, i64);
// test_int_op!(shr, >>, u64);
// test_int_op!(shr, >>, i64);
// test_unary_op!(neg, -, i64);

MASM compilation error (missing import for intrinsic)

Might be fixed by #67

//
// Comparison ops
//
// test_bool_op!(ge, >=, i32);
// test_bool_op!(ge, >=, i16);
// test_bool_op!(ge, >=, i8);
//
// test_bool_op!(gt, >, i32);
// test_bool_op!(gt, >, i16);
// test_bool_op!(gt, >, i8);
//
// test_bool_op!(le, <=, i32);
// test_bool_op!(le, <=, i16);
// test_bool_op!(le, <=, i8);
//
// test_bool_op!(lt, <, i32);
// test_bool_op!(lt, <, i16);
// test_bool_op!(lt, <, i8);
//
// Arithmetic ops
//
// test_int_op!(mul, *, u32);
// test_int_op!(mul, *, u16);
// test_int_op!(mul, *, u8);
// test_int_op!(mul, *, i32);
// test_int_op!(mul, *, i16);
// test_int_op!(mul, *, i8);
//
// Bitwise ops
//
// test_int_op!(shr, >>, i8);
// test_int_op!(shr, >>, i16);
// test_int_op!(shr, >>, i32);
//
// test_unary_op!(not, !, u64);
// test_unary_op!(not, !, i64);

MASM compilation error: missing import for stdlib

Might be fixed by #67

// test_int_op!(and, &, u64);
// test_int_op!(and, &, i64);
// test_int_op!(or, |, u64);
// test_int_op!(or, |, i64);
// test_int_op!(xor, ^, u64);
// test_int_op!(xor, ^, i64);

Due to #56

// test_func_two_arg!(min, core::cmp::min, i32, i32, i32);
// test_func_two_arg!(min, core::cmp::min, u32, u32, u32);
// test_func_two_arg!(min, core::cmp::min, u8, u8, u8);
// test_func_two_arg!(max, core::cmp::max, u8, u8, u8);
// ... other types

Due to linked core::panicking

Possible solutions:

  1. Wrap in a cargo project and build with a custom core crate to avoid linking core::panicking
  2. Make a macro that filters out division by zero

// test_int_op!(div, /, u32);
// ...
// more types in tests for div, rem

Wasm CM support in cargo extension

New project command

Should create a Wasm CM Rust project, so the template should be updated:

  • add WIT file
  • set Wasm CM metadata in Cargo.toml

Dependencies

We need to figure out a way to provide at least a Miden SDK WIT file as a dependency.

Build project command

Start by calling cargo-component to build the project, and switch to using it as a library when our needs grow. We would probably like to avoid forking it, building on top of it instead.

Implement clz/ctz instructions

These functions count the number of leading/trailing zeroes in the binary representation of an integral type, and are often used when treating certain integer types as bitsets. In particular, they are used in the allocators that Rust programs use when compiled to Wasm, so this is a blocker for compiling any Rust/Wasm module that uses the alloc crate (i.e. anything that uses Box, Vec, etc.). See #22

This is the tracking issue for the implementation of these instructions, which has already been started, but was on hold while other tasks were being completed.
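The expected semantics, including the zero-input edge case, can be pinned down using Rust's built-in equivalents as a reference oracle:

```rust
// Reference semantics for i32.clz / i32.ctz via Rust's built-ins.
fn main() {
    let x: u32 = 1 << 12; // only bit 12 set
    assert_eq!(x.leading_zeros(), 19);  // wasm i32.clz
    assert_eq!(x.trailing_zeros(), 12); // wasm i32.ctz
    // Edge case the implementation must handle: zero returns the bit width.
    assert_eq!(0u32.leading_zeros(), 32);
    assert_eq!(0u32.trailing_zeros(), 32);
    println!("ok");
}
```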

Wasm component model (CM) support in Wasm frontend

Implementation details

Translate the Wasm CM world into an IR Module. Since a non-trivial Wasm component has a complex nested architecture (see the basic-wallet example, which has 3 components, 4 component instances, 3 core modules and 5 core module instances), it makes sense for the frontend to encapsulate all this complexity and inline everything to produce a single IR Module.
The Wasm CM export and import top-level directives, plus the canonical definitions (canon), are to be translated and stored in the IR Module for every imported (external) and exported IR module function.

Wasm CM types are translated to IR Type variants. Add corresponding IR Type variants for the missing Wasm CM types (list, variant, enum, result, option).

Here is how the IR Type variants for list and variant might look:

pub enum Type {
    ...
    /// An enum variant type
    Variant(VariantType),
    /// A list of values of the same type
    List(Box<Type>),
}

/// Variants are close to Rust `enum` declarations where a value is one of many
/// cases and each case has a unique name and an optional payload associated
/// with it.
#[derive(Clone, Hash, Eq, PartialEq, Debug)]
pub struct VariantType {
    /// The representation to use for this type
    pub(crate) repr: TypeRepr,
    /// The computed size of this struct
    pub(crate) size: u32,
    /// The list of cases that this variant can take.
    pub cases: Box<[VariantCase]>,
}

/// One case of a `variant` type which contains the name of the variant as well
/// as the payload.
#[derive(Clone, Hash, Eq, PartialEq, Debug)]
pub struct VariantCase {
    /// Name of the variant, unique amongst all cases in a variant.
    pub name: String,
    /// Optional type associated with this payload.
    pub ty: Option<Type>,
}

Implementation plan

To ease the review process and get the feedback early, I plan to split the Wasm CM support implementation into the following stages/PR sets:

Tasks

  1. frontend wasm
    greenhat
  2. frontend wasm
    greenhat
  3. frontend wasm
    greenhat

Transform structured control flow to high-level MASM control operators

Miden IR is a typical SSA IR which supports arbitrary structured control flow, this is to make lowering from a variety of high-level languages easier and provide a useful substrate on which to perform optimizations and transformations. MASM however does not support arbitrary structured control flow via jumps/branches, and instead only supports a small set of higher-level control operators (if, while, and repeat). In order to lower from Miden IR to Miden Assembly, we need a transformation pass which restructures the IR in a way that preserves the semantics of the high-level source code, but which is amenable to translation to Miden Assembly.

This pass will build on the loop/dominator tree analyses to recognize complex control flow, and convert it to an equivalent mix of MASM control operators. This may require some additional work on the IR to allow Instruction to contain regions with blocks, similar to MLIR.
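As a tiny hand-written illustration (not compiler output), a two-way conditional branch in the IR, e.g. condbr v0, block_then, block_else, would be restructured into MASM's structured if operator:

```masm
# boolean condition is on top of the operand stack
if.true
    # ...lowered body of block_then...
else
    # ...lowered body of block_else...
end
```

Loops found by the loop analysis would similarly be restructured around while.true, with the loop condition re-evaluated on the stack at each iteration.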

Implement support for signed 32-bit operations

Miden provides native u32 support, but Wasm, and other consumers of the frontend will require at least minimal support for signed 32-bit ops (i.e. i32), and possibly at other bit widths as well. For basic unchecked/wrapping add/sub/mul, the native u32 ops should work for the signed version as well, since both use the two's complement binary encoding. However, for checked/overflowing variants, and for things like signed division, we must implement support for that separately.
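The two's-complement claim above can be demonstrated with Rust's own semantics: wrapping add/sub/mul are bit-identical across i32/u32, while division is not, which is exactly why division needs dedicated support:

```rust
// Why native u32 ops suffice for wrapping i32 arithmetic but not division.
fn main() {
    let a: i32 = -5;
    let b: i32 = 7;
    // Two's complement: reinterpreting the bits and using the unsigned op
    // yields the same bit pattern as the signed op for add/sub/mul.
    assert_eq!((a as u32).wrapping_add(b as u32) as i32, a.wrapping_add(b));
    assert_eq!((a as u32).wrapping_mul(b as u32) as i32, a.wrapping_mul(b));
    // Division differs: the same bit patterns give different quotients,
    // so signed division must be implemented separately.
    assert_ne!(((a as u32) / (b as u32)) as i32, a / b);
    println!("ok");
}
```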

I'll update this later with the specific instructions we want to support first.

Implement useful, but as of yet unsupported, Wasm instructions

The following is a list of instructions we would like to add support for in the codegen backend when we get a chance. Unless explicitly called out, these instructions are exposed in the IR, and translated from Wasm already, but compilation will fail due to lack of support in the backend.

Tasks

Assume that unless otherwise stated, all type variants of the above instructions are desired.

`InlineBlocks` pass crashes with `(signal: 11, SIGSEGV: invalid memory reference)`

Discovered in #84 in the test_rust_comp::rust_array() test.

Un-ignore the test to reproduce.

The IR passes are defined as follows:

let mut analyses = AnalysisManager::new();
let mut rewrites = RewriteSet::default();
rewrites.push(ModuleRewritePassAdapter::new(
    transforms::SplitCriticalEdges,
));
rewrites.push(ModuleRewritePassAdapter::new(transforms::Treeify));
rewrites.push(ModuleRewritePassAdapter::new(transforms::InlineBlocks));
rewrites
    .apply(&mut ir_module, &mut analyses, session)
    .expect("Failed to apply rewrites");

The crash goes away if the InlineBlocks pass is omitted (commented out).

I've also reproduced it with rustc 1.74.1 and the latest nightly. I suspect the unsafe code that operates on the intrusive_collections.

FWIW, here is the Wasm:

(module
  (type (;0;) (func (param i32 i32) (result i32)))
  (type (;1;) (func (result i32)))
  (func $sum_arr (;0;) (type 0) (param i32 i32) (result i32)
    (local i32)
    i32.const 0
    local.set 2
    block ;; label = @1
      local.get 1
      i32.eqz
      br_if 0 (;@1;)
      loop ;; label = @2
        local.get 0
        i32.load
        local.get 2
        i32.add
        local.set 2
        local.get 0
        i32.const 4
        i32.add
        local.set 0
        local.get 1
        i32.const -1
        i32.add
        local.tee 1
        br_if 0 (;@2;)
      end
    end
    local.get 2
  )
  (func $__main (;1;) (type 1) (result i32)
    i32.const 1048576
    i32.const 5
    call $sum_arr
    i32.const 1048596
    i32.const 5
    call $sum_arr
    i32.add
  )
  (memory (;0;) 17)
  (global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
  (global (;1;) i32 i32.const 1048616)
  (global (;2;) i32 i32.const 1048624)
  (export "memory" (memory 0))
  (export "sum_arr" (func $sum_arr))
  (export "__main" (func $__main))
  (export "__data_end" (global 1))
  (export "__heap_base" (global 2))
  (data $.rodata (;0;) (i32.const 1048576) "\01\00\00\00\02\00\00\00\03\00\00\00\04\00\00\00\05\00\00\00\06\00\00\00\07\00\00\00\08\00\00\00\09\00\00\00\0a\00\00\00")
)

Missing Miden stdlib import in the `Module` when emitting stdlib's function calls in `stackify` pass

To reproduce, uncomment the test_int_op!(and, &, u64); test in instructions.rs in #49

For 1u64 & 1u64, the Rust code is compiled into the following IR:

module noname

const $0 = 0x00100000;

global external @__stack_pointer : i32 = $0 { id = 0 };
global external @gv1 : i32 = $0 { id = 1 };
global external @gv2 : i32 = $0 { id = 2 };

pub fn entrypoint(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
    v3 = band v1, v0 : i64;
    br block1(v3);

block1(v2: i64):
    ret v2;
}

Which translates into the following MASM:

export.entrypoint
    exec.checked_and
end

begin
    exec.entrypoint
end

Which the Miden assembler cannot compile.
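Presumably the fix is to emit the missing stdlib import and a qualified call, along these lines (the exact module path is an assumption):

```masm
use.std::math::u64

export.entrypoint
    exec.u64::checked_and
end

begin
    exec.entrypoint
end
```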

Repo naming and structure

The ultimate compilation pipeline we are trying to achieve looks something like this:

Rust -> WASM -> MidenIR -> MASM

The code in this repo handles the last 3 steps (WASM -> MASM). This leads to a couple of questions:

  1. Where should Rust -> WASM step live and what components could it include?
  2. Is miden-ir really the right name for this repo?

It seems to me that the answer to the second question is no. This repo contains much more than just IR - so, we should rename it to something more relevant. miden-compiler is one option - but happy to hear other suggestions.

As for the first question, it seems to me that this should live in a separate repo. That repo could probably consist of several crates covering the following functionality:

  1. Cargo extension which would use crates from this repo internally.
  2. A set of macros we'd need to shape the Rust code.
  3. A set of wrapper libraries around Miden stdlib and midenlib libraries (ideally, with a tool which can generate these wrappers automatically).
  4. Probably something else I'm not thinking of.

As for the name for this repo (and associated crates) - I think it could be something unique (i.e., doesn't have to include "miden" in the name). I don't have great ideas on this yet - so, any suggestions are welcome. A couple of examples from other projects:

  • Arbitrum has Stylus
  • Fuel Labs has Forc

One other question: should this new repo try to support general compilation from Rust to MASM or be focused specifically on the rollup? My thinking is that we should focus it on the rollup. This would reduce the scope of what we need to do and will help us get to Miden smart contracts in Rust faster.

Implement compiler executable

This issue is the final piece of #4 to have a complete compiler toolchain that can be used independently.

This executable basically puts together the following pieces of the compiler that we have built:

Prereqs

  1. frontend wasm
  2. 4 of 4
    frontend
    jjcnn
  3. feature
    bitwalker
  4. codegen
  5. codegen feature
    bitwalker

In essence, the frontend of the compiler takes as input either a Wasm binary (.wasm), Wasm text format (.wat), or the HIR textual format (.hir) so that we can store/load our own IR. Regardless of input format, the frontend translates to the in-memory HIR representation, and then handles a request to compile or eval that input in some way. The following is a set of capabilities I expect the compiler to provide (both when used as a library and as an executable):

Capabilities

Implement parser for the IR text format

For use in testing, it's valuable to have a parser for the IR text format which we currently pretty-print for use in review/diagnosing issues.

  • Should build on miden-diagnostics and miden-parsing
  • Should use lalrpop for the parser
  • May use a dedicated AST structure for easier integration with the parser, but ideally it should parse directly to the IR, since a standalone AST has no use on its own, and it should be feasible to parse into the IR data structures anyway.
  • It's fine to update the pretty-printed format itself to ensure we have all the data needed to reconstruct the IR in memory. It should be fairly complete at this point, but it may be necessary to enrich the format further to preserve more metadata.

Tasks

Some implementation notes/recommendations:

You can refer to the [AirScript parser crate] to get a feel for how to define a LALRPOP grammar, and use it to construct the AST along with diagnostics, source spans, etc. I wouldn't focus too much on mimicking its specific implementation, since its use case is different, but it should provide some useful guidance/hints on how to approach certain problems. The grammar, as the name implies, drives an LR(1)/LALR parser generator, so it can be occasionally tricky to handle certain semi-ambiguous parts of a language. The IR syntax is pretty explicit though, and doesn't have a lot of fancy expression handling and the like, so that shouldn't be an issue. Let me know if you hit any shift/reduce errors though.

For translating the AST to the IR, I would implement it as a Pass, using a simple recursive traversal, basically the following:

use std::collections::VecDeque;
use miden_diagnostics::DiagnosticsHandler;
use miden_hir_pass::Pass;
use smallvec::SmallVec;
use rustc_hash::{FxHashMap, FxHashSet};

use crate::{self as hir, ast, ModuleBuilder, ModuleFunctionBuilder};

pub struct AstToIr<'a> {
    /// Used for emitting diagnostics
    diagnostics: &'a DiagnosticsHandler,
    /// Used to map block identifiers parsed into the AST, to their corresponding identifiers in the IR
    block_map: FxHashMap<ast::BlockId, hir::Block>,
    /// Used to map value identifiers parsed into the AST, to their corresponding identifiers in the IR
    value_map: FxHashMap<ast::ValueId, hir::Value>,
}
impl<'a> AstToIr<'a> {
    pub fn new(diagnostics: &'a DiagnosticsHandler) -> Self {
        Self {
            diagnostics,
            block_map: Default::default(),
            value_map: Default::default()
        }
    }
}
impl<'p> Pass for AstToIr<'p> {
    type Input<'a> = ast::Module;
    type Output<'a> = Box<hir::Module>;
    type Error = anyhow::Error;

    fn run<'a>(&mut self, mut module: Self::Input<'a>) -> Result<Self::Output<'a>, Self::Error> {
       let mut builder = ModuleBuilder::new(module.name);
       builder.with_span(module.span());

       for global in module.globals.into_iter() {
           // Here I'm assuming the AST uses our Type and Linkage types
           let gv = builder.declare_global_variable(global.name, global.ty, global.linkage, global.span())?;
           // Globals don't necessarily have an initializer
           if let Some(init) = global.initializer {
              builder.set_global_initializer(gv, init);
           }
       }

       for function in module.functions.into_iter() {
           // Here I'm assuming the AST uses our Signature type
           let mut fb = builder.build_function(function.name, function.signature, function.span())?;
           self.translate_function(&mut fb, function)?;
       }

       Ok(builder.build())
    }
}
impl<'a> AstToIr<'a> {
    fn translate_function(&mut self, builder: &mut ModuleFunctionBuilder, function: ast::Function) -> anyhow::Result<()> {
       // We need to visit all blocks to create the corresponding IR block before we do the translation
       for (i, block) in function.blocks.iter().enumerate() {
           // Here I'm assuming the first block in the AST is always the entry block, which should be true.
           let block_id = if i == 0 {
               let id = builder.entry_block();
               // The entry block is already populated with parameters based on the function signature,
               // we just need to map the identifiers in the AST to their IR equivalents
               let values = builder.data_flow_graph().block_params(id);
               assert_eq!(values.len(), block.params.len());
               for (param, value) in block.params.iter().zip(values.iter()) {
                    self.value_map.insert(param.id, *value);
               }
               id
           } else {
               let id = builder.create_block();
               // For all other blocks, we need to alter the block parameter list
               for param in block.params.iter() {
                   // Here I'm assuming we're using our Type enum in the AST
                   let value = builder.append_block_param(id, param.ty.clone(), param.span());
                   self.value_map.insert(param.id, value);
               }
               id
           };
           // Make sure we map the block id in the AST to the IR block
           self.block_map.insert(block.id, block_id);
       }

       // Translate the body of the function by walking the control flow graph,
       // i.e. we translate the entry block, then follow control flow instructions
       // to the next unvisited block, and so on, until all blocks are visited. We
       // discard unvisited blocks, as those are by definition dead code.
       let mut visited = FxHashSet::<ast::BlockId>::default();
       let mut worklist = VecDeque::<ast::BlockId>::default();
       worklist.push_back(function.blocks[0].id);

       while let Some(ast_block) = worklist.pop_front() {
           if !visited.insert(ast_block) {
               continue;
           }
           self.translate_block(builder, &function.blocks[ast_block], &mut worklist)?;
       }

       // We're done
       builder.build(self.diagnostics).map(|_| ())
    }

    fn translate_block(&mut self, builder: &mut ModuleFunctionBuilder, block: &ast::Block, worklist: &mut VecDeque<ast::BlockId>) -> anyhow::Result<()> {
       let block_id = self.block_map[&block.id];
       builder.switch_to_block(block_id);

       for op in block.ops.iter() {
           let span = op.span();
           match op {
               // This is an example of how to handle control flow instructions
               ast::Instruction::Br(ast::Br { dest, args, .. }) => {
                   // Make sure we visit the destination block next
                   worklist.push_back(*dest);
                   // Map the instruction data
                   let dest = self.block_map[dest];
                   let args = args.iter().map(|arg| self.value_map[arg]).collect::<SmallVec<[hir::Value; 2]>>();
                   builder.ins().br(dest, &args, span);
               }
               // This is an example of how to handle instructions with results
               ast::Instruction::Add(ast::Add { lhs, rhs, output, .. }) => {
                   // We're able to assume that values are already defined because our traversal enforces
                   // that we visit predecessors of this instruction before the instruction itself.
                   let lhs = self.value_map[lhs];
                   let rhs = self.value_map[rhs];
                   let result = builder.ins().add(lhs, rhs, span);
                   self.value_map.insert(*output, result);
               }
               _ => unimplemented!(),
           }
       }

       Ok(())
    }
}

That's the broad strokes, but obviously a lot of details are left out. If you have any questions, or find any issues with the above approach, let me know and we can discuss alternatives.

PoC account code compilation (WIT version)

Following the discussion at #28 (comment)

The goal is to try to compile the following code to the MASM below and see what parts we are missing:

use miden::{
    account,
    asset::Asset,
    note::Recipient,
    tx,
};

#[miden_account]
pub struct MyWallet;

#[miden_account]
impl MyWallet {
    pub fn receive_asset(&mut self, asset: Asset) {
        account::add_asset(asset);
    }

    pub fn send_asset(&mut self, asset: Asset, recipient: Recipient) {
        account::remove_asset(asset);
        tx::create_note(asset, recipient);
    }
}

The above code is expected to be compiled close to the following (manually written) code:

use.miden::sat::account
use.miden::sat::tx

#! Adds the provided asset to the current account.
export.receive_asset
    exec.account::add_asset
    padw swapw dropw
end

#! Creates a note which sends the specified asset out of the current account 
#! to the specified recipient.
export.send_asset.1
    exec.account::remove_asset
    # => [ASSET, tag, RECIPIENT, ...]

    # insert 8 ZEROs into the stack right after recipient; we temporarily store one of the
    # elements of ASSET in memory to make stack manipulation easier
    push.0 swap loc_store.0 padw push.0.0.0 swapdw loc_load.0
    # => [ASSET, tag, RECIPIENT, ZERO, ZERO, ...]

    exec.tx::create_note
    # => [note_ptr, ZERO, ZERO, ...]
end

The things to check are:

  • Mark the MyWallet struct as a Miden account (export marker trait in WIT vs. doc comment attribute). Keep in mind that helper methods exported in the account are to be exec'd in notes/tx scripts.
  • Try to define the Asset as a resource in WIT;

EDIT: updated to reflect the switch to WIT.

Implement liveness analysis

We need liveness analysis for codegen, as well as a "register allocator"-like pass over the IR built on it, to generate appropriate operand stack management code (with spills to locals). The latter is tightly bound to liveness analysis and is its main consumer in the code generator, so it is quite important for generating decent Miden Assembly.

  • Implement liveness analysis
  • "Register" allocator based on that analysis, but with a modified set of rules tailored towards the operand stack rather than a register file. The goal here is to keep things on the operand stack rather than in locals, with a minimal amount of shuffling things around on the operand stack - similar to how a register allocator would prefer an allocation scheme with a minimum amount of register-to-register moves.
  • Update codegen to use the above analyses
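The analysis itself is the classic backward dataflow fixpoint. A minimal sketch over a block-level CFG (the CFG shape and all names here are illustrative, not the compiler's actual API):

```rust
use std::collections::{BTreeSet, HashMap};

type Block = u32;
type Value = u32;

/// Backward liveness: iterate live-in(b) = uses(b) ∪ (live-out(b) − defs(b)),
/// where live-out(b) is the union of live-in over b's successors, to a fixpoint.
fn liveness(
    succs: &HashMap<Block, Vec<Block>>,
    uses: &HashMap<Block, BTreeSet<Value>>,
    defs: &HashMap<Block, BTreeSet<Value>>,
    blocks: &[Block],
) -> HashMap<Block, BTreeSet<Value>> {
    let empty = BTreeSet::new();
    let mut live_in: HashMap<Block, BTreeSet<Value>> = HashMap::new();
    let mut changed = true;
    while changed {
        changed = false;
        // Visit blocks in reverse to propagate information backward faster.
        for &b in blocks.iter().rev() {
            let mut out = BTreeSet::new();
            for s in succs.get(&b).into_iter().flatten() {
                out.extend(live_in.get(s).cloned().unwrap_or_default());
            }
            let mut inn = uses.get(&b).cloned().unwrap_or_default();
            for v in out.difference(defs.get(&b).unwrap_or(&empty)) {
                inn.insert(*v);
            }
            if live_in.get(&b) != Some(&inn) {
                live_in.insert(b, inn);
                changed = true;
            }
        }
    }
    live_in
}

fn main() {
    // CFG 0 -> 1 -> 2: v10 is defined in block 0 and used in block 2, so it
    // is live through block 1 and must stay on the operand stack (or be
    // spilled to a local) across it.
    let succs = HashMap::from([(0, vec![1]), (1, vec![2]), (2, vec![])]);
    let uses = HashMap::from([(2, BTreeSet::from([10]))]);
    let defs = HashMap::from([(0, BTreeSet::from([10]))]);
    let live = liveness(&succs, &uses, &defs, &[0, 1, 2]);
    assert!(live[&1].contains(&10));
    println!("ok");
}
```

The "register allocator" pass would then consult these live sets to decide which values to keep near the top of the operand stack and which to spill.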

Check for clippy warnings on CI

It's hard to use clippy when there are a lot of warnings (65) across the whole workspace. I suggest we add a CI job that fails on clippy warnings. Although I find almost all of the warnings useful, we can disable the ones we don't want.

Implement basic lowering to MASM

In order to sketch out the backend of the compiler pipeline, we need to implement lowering to Miden Assembly. To begin with, we will lower IR that is free of non-trivial control flow (i.e. anything other than simple conditional branches). This will provide a good base on which further work can be built.

Tracking issue: Implement minimum viable end-to-end compiler pipeline

Goal(s)

  • Be able to express a wide variety of useful programs in Miden IR
  • Be able to compile programs from Miden IR to Miden Assembly
  • Have a test harness set up to exercise the compilation pipeline for verification

Non-Goal(s)

  • Have a Sway or Move frontend implemented

Details

In order to build a useful compiler from high-level languages to Miden Assembly (MASM), we must have an intermediate representation (IR) which is expressive enough to make lowering from high-level languages directly simple and straightforward, and general enough to be a viable target from a variety of language frontends. This IR must be able to express complex control flow not directly supported in MASM, but which meets certain constraints under which it is possible to transform into a MASM-compatible form. Furthermore, we must be able to express all of the supported MASM operations and their associated types. Additionally, the IR must be converted into a stack machine representation that matches MASM, either by lowering to MASM directly, or by undergoing a series of transformations during lowering that accomplishes that goal.

The IR we have defined currently meets all of the above requirements, but we now need to implement both the control flow transformations and the translation to MASM. Firstly, this requires a test harness which we can use to exercise the compilation pipeline and assert things about its behavior. Secondly, we must implement the translation to stack machine form and lower to MASM. The last piece of the puzzle is to wire up at least one language frontend to the compiler, but we are leaving that as a non-goal/nice-to-have for this milestone, as there are prerequisite tasks which must be tackled before we are ready for that step - namely the control flow transformations described above.

Required

  1. bitwalker
  2. testing (bitwalker)
  3. codegen (bitwalker)
  4. codegen (jjcnn)
  5. codegen (bitwalker)
  6. codegen, feature (bitwalker)
  7. frontend, wasm
NOTE: Issues have not yet been created for the above tasks; I'll create those shortly

Unreachable inst causes `unhandled terminator in non-terminator context` error in `stackify` pass

Error:

thread 'rust_masm_tests::apps::fib' panicked at codegen/masm/src/stackify/pass.rs:1389:9:
unhandled terminator in non-terminator context: PrimOp(PrimOp { op: Unreachable, args: EntityList { index: 0, unused: PhantomData<miden_hir::value::Value> } })

When the Rust expression 99999/n compiles to the following Wasm:

(module
  (type (;0;) (func (param i32) (result i32)))
  (func $fib (;0;) (type 0) (param i32) (result i32)
    block ;; label = @1
      local.get 0
      i32.eqz
      br_if 0 (;@1;)
      i32.const 99999
      local.get 0
      i32.div_u
      return
    end
    unreachable
    unreachable
  )
  (memory (;0;) 16)
  (global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
  (global (;1;) i32 i32.const 1048576)
  (global (;2;) i32 i32.const 1048576)
  (export "memory" (memory 0))
  (export "fib" (func $fib))
  (export "__data_end" (global 1))
  (export "__heap_base" (global 2))
)

Which in turn translates to the following IR:

module noname

const $0 = 0x00100000;

global external @__stack_pointer : i32 = $0 { id = 0 };
global external @gv1 : i32 = $0 { id = 1 };
global external @gv2 : i32 = $0 { id = 2 };

pub fn fib(i32) -> i32 {
block0(v0: i32):
    v2 = eq v0, 0 : i1;
    v3 = cast v2 : i32;
    v4 = neq v3, 0 : i1;
    condbr v4, block2, block3;

block1(v1: i32):

block2:
    unreachable ;

block3:
    v5 = const.i32 99999 : i32;
    v6 = cast v5 : u32;
    v7 = cast v0 : u32;
    v8 = div.checked v6, v7 : u32;
    v9 = cast v8 : i32;
    ret v9;
}

The stackify pass fails with the error above.

IR `cast` from i32 to u32 produces MASM that does not support full u32 range

Discovered in #49

The Rust <= (le) op on u32 values compiles to the following IR:

module noname

const $0 = 0x00100000;

global external @__stack_pointer : i32 = $0 { id = 0 };
global external @gv1 : i32 = $0 { id = 1 };
global external @gv2 : i32 = $0 { id = 2 };

pub fn entrypoint(i32, i32) -> i32 {
block0(v0: i32, v1: i32):
    v3 = cast v0 : u32;
    v4 = cast v1 : u32;
    v5 = lt v3, v4 : i1;
    v6 = cast v5 : i32;
    br block1(v6);

block1(v2: i32):
    ret v2;
}

Which then generates the following MASM:

export.entrypoint
  dup.0
  push.2147483648
  u32.and
  eq.2147483648
  assertz
  swap.1
  dup.0
  push.2147483648
  u32.and
  eq.2147483648
  assertz
  u32.lt.checked

end
begin
    exec.entrypoint
end

Which fails an assert in the VM during proptest, with minimal failing example (0, 2147483648): the emitted assertz rejects any value with the high bit set, so only values below 2^31 are accepted.

EDIT: see the commented-out test reproducing this issue in instructions.rs in #49 (any comparison test for u32).

Rust to MASM compilation and semantic tests

In a separate crate (under midenc umbrella), establish a test suite for a set of Rust programs, that does the following:

  • Compile a Rust program to MASM, checking against the expected Wasm, IR, and MASM code (compilation test).
  • Run the Rust program (native binary) against the compiled MASM program and check that on the same inputs they produce the same outputs (semantic test).
  • Use property-based testing to get good coverage for the program inputs.
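A minimal sketch of the semantic-test idea, using a plain stdlib loop with a simple LCG in place of a real property-testing framework like proptest; `run_masm` here is a hypothetical stand-in for executing the compiled MASM program on the Miden VM:

```rust
/// Reference implementation: the Rust function under test.
fn rust_min(a: u32, b: u32) -> u32 {
    core::cmp::min(a, b)
}

/// Hypothetical stand-in for running the compiled MASM program on the
/// Miden VM; the real harness would invoke the VM executor here.
fn run_masm(a: u32, b: u32) -> u32 {
    // Placeholder: modeled as agreeing with the reference implementation.
    if a < b { a } else { b }
}

/// Simple LCG so the sketch stays dependency-free; a real harness would use
/// a property-testing crate such as proptest to generate (and shrink) inputs.
fn lcg(state: &mut u64) -> u32 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 32) as u32
}

fn main() {
    let mut state = 0xDEADBEEFu64;
    for _ in 0..1000 {
        let (a, b) = (lcg(&mut state), lcg(&mut state));
        assert_eq!(rust_min(a, b), run_masm(a, b), "outputs diverge for ({a}, {b})");
    }
    println!("semantic test passed");
}
```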

Implement link-time garbage collection

The implementation of the linker in #24 left a few things undone because they aren't critical to an MVP release. Namely, garbage collection of unused (dead) objects discovered at link-time.

Items To Garbage Collect

1 Only when the program being linked is an executable program. When linking a library, we must assume all objects with external linkage are used.
2 Objects with internal (or ODR) linkage are always candidates for garbage collection if there are no references from live objects.

This task is low priority, but relatively straightforward to tackle when we have time.
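The core of such a pass is a reachability walk: mark everything transitively referenced from the roots (the entrypoint when linking an executable, or all externally-linked objects when linking a library), then treat everything unmarked as a candidate for removal. A dependency-free sketch of that idea (the graph representation here is illustrative, not the linker's actual data structures):

```rust
use std::collections::HashSet;

/// Mark objects reachable from `roots` through `refs` (object index ->
/// indices of referenced objects); everything unmarked is a candidate
/// for link-time garbage collection.
fn live_set(roots: &[usize], refs: &[Vec<usize>]) -> HashSet<usize> {
    let mut live = HashSet::new();
    let mut worklist: Vec<usize> = roots.to_vec();
    while let Some(obj) = worklist.pop() {
        if live.insert(obj) {
            // First visit: enqueue everything this object references.
            worklist.extend(refs[obj].iter().copied());
        }
    }
    live
}

fn main() {
    // 0 = entrypoint, 1 = used helper, 2 = dead internal function
    let refs = vec![vec![1], vec![], vec![1]];
    let live = live_set(&[0], &refs);
    assert!(live.contains(&0) && live.contains(&1));
    assert!(!live.contains(&2)); // candidate for link-time GC
    println!("live objects: {}", live.len());
}
```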

Implement support for indirect calls via dynexec/procref

Now that 0xPolygonMiden/miden-vm#1078 is merged (which introduces dynexec), the groundwork for supporting indirect calls in Miden Assembly is in place. Before we can make use of it, though, we need the proposed procref instruction, which would push the hash of a specified function on the stack. This is done prior to dynexec, which requires that hash in order to execute the indirect call. We need procref because we don't know the hash of a function until after compilation; procref solves this by relying on the Miden assembler to expand it into push.HASH, since the assembler knows the hashes of all the functions.

Once that is implemented, we can make use of it to implement indirect calls in the IR:
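As a hedged sketch of what the proposed instruction pair might look like at the assembly level (syntax is illustrative of the proposal described above, not final):

```
# `procref` would be expanded by the assembler into `push.HASH`, since only
# the assembler knows the hashes of all procedures after compilation.
procref.some_module::some_function
# `dynexec` then consumes that hash in order to execute the indirect call.
dynexec
```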

Tasks

RFC: Metadata format for dynamically-linked libraries

Background

In #24, @bobbinth and I briefly discussed whether or not it was possible to define the signature for external functions in the IR, so that libraries which are not present at link-time, but are provided at runtime can be known to the compiler, just like the Miden standard library.

As I mentioned there, we already have support for the Miden stdlib in the IR itself, but a better, more open-ended approach would be to define a metadata format which can be used to communicate the modules that can be dynamically linked, what functions they contain, and the type signature and ABI of those functions.

The compiler could also use this format when it emits Miden Assembly, so that it can consume its own codegen output.

Metadata

The following is the bare minimum metadata we'd need from the format to be useful:

  • The set of modules that are part of the library
  • For each module, what functions it exports, with their full signatures (parameter and result types, ABI/calling convention, and optionally, any applicable special-purpose/extension semantics)

However, in addition to that information, I would also like the following to be supported (optionally, i.e. it isn't necessary to provide this information if it isn't applicable):

  • Any data segments which the library wishes to reserve for its own use (given as an offset and size in bytes, with an optional initializer). For example, default programs emitted by the compiler reserve the first page (64KiB) for the shadow stack and the second page for read-only constant data, followed immediately by the memory reserved for global variables, after which the "unreserved" heap starts. A library could declare that it reserves the 3rd page for its own use (starting at address 131072), which the compiler will accommodate by placing the global variable segment in the 4th page and setting up the unreserved heap so that it starts after that. Supporting this ensures that modules emitted by the compiler play nicely with dynamically linked modules, and vice versa.
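To make that layout concrete, here is a small sketch computing the page addresses described above (64KiB pages; the numbers mirror the example in this issue, not a fixed specification):

```rust
const PAGE_SIZE: u32 = 64 * 1024; // 64KiB, the Wasm page size

fn main() {
    let shadow_stack = 0 * PAGE_SIZE;     // page 0: shadow stack
    let rodata = 1 * PAGE_SIZE;           // page 1: read-only constant data
    let library_reserved = 2 * PAGE_SIZE; // page 2: reserved by the library in this example
    let globals = 3 * PAGE_SIZE;          // page 3: global variables (pushed back one page)
    let heap_base = 4 * PAGE_SIZE;        // the unreserved heap starts after the globals
    assert_eq!((shadow_stack, rodata), (0, 65536));
    assert_eq!(library_reserved, 131072); // the address cited above
    assert_eq!((globals, heap_base), (196608, 262144));
    println!("heap starts at {heap_base}");
}
```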

Format(s)

I think there are three ways we'll want to allow providing the metadata:

Miden Assembly

Embedding the information directly in the Miden Assembly textual format, either using doc comments (perhaps with syntax for attributes), or dedicated (but optional) syntax, e.g.

#! This function takes a pointer to an array of u32, and the length of that array,
#! and iterates over the array, summing its contents.
#!
#! @type (*mut u32, u32) -> u32
#! @abi C
export.sum
   ...
end

OR

#! This variation uses a slightly different set of attributes that could also be beneficial for documentation more generally
#!
#! @param ptr :: *mut u32
#! @param len :: u32
#! @result u32
#! @abi C
export.sum
   ...
end

We'd assume certain defaults if a specific attribute is not present, e.g. if the @abi attribute is not specified, we'd assume C, the most interoperable calling convention. The only non-optional attribute is the @type expression (assuming that's the way we choose to specify the type signature).

Textual Format

This could basically be JSON or some other human-readable/writable textual format with a specification, but would be versioned and express all of the metadata a library author is willing or able to provide, e.g.:

{
    "version": "1.0",
    "segments": [
        { "offset": 131072, "size": 65536, "init": "0x...." }
    ],
    "modules": {
        "foo": {
            "exports": {
                "bar": {
                    "abi": "C",
                    "params": [
                        { "type": "*mut u8" },
                        { "type": "u32", "extension": "zext" }
                    ],
                    "results": [
                        { "type": "u32" }
                    ]
                }
            }
        }   
    }
}
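Whichever serialization we choose, the compiler would build an in-memory representation it can query at link time. A dependency-free sketch of what that might look like (the type and method names here are illustrative, not the actual IR types):

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone)]
struct FunctionSignature {
    abi: String,         // e.g. "C"
    params: Vec<String>, // parameter types, e.g. "*mut u8"
    results: Vec<String>,
}

#[derive(Debug, Default)]
struct LibraryMetadata {
    version: String,
    // module name -> exported function name -> signature
    modules: BTreeMap<String, BTreeMap<String, FunctionSignature>>,
}

impl LibraryMetadata {
    /// Look up the signature of `module::function`, if it is exported.
    fn signature(&self, module: &str, function: &str) -> Option<&FunctionSignature> {
        self.modules.get(module)?.get(function)
    }
}

fn main() {
    let mut meta = LibraryMetadata { version: "1.0".into(), ..Default::default() };
    meta.modules.entry("foo".into()).or_default().insert(
        "bar".into(),
        FunctionSignature {
            abi: "C".into(),
            params: vec!["*mut u8".into(), "u32".into()],
            results: vec!["u32".into()],
        },
    );
    assert_eq!(meta.version, "1.0");
    let sig = meta.signature("foo", "bar").expect("exported");
    assert_eq!(sig.abi, "C");
    assert_eq!(sig.results, vec!["u32"]);
    println!("foo::bar has {} param(s)", sig.params.len());
}
```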

Binary Format

This format would be used to either embed in another format (perhaps in .masl files?), or to allow for a more compact representation. This would basically provide all the same information as the textual format, but using a well-known binary encoding like CBOR or Protobuf. Like the textual format, it would also be versioned to accommodate changes over time.

This format may not be needed; it just depends on how we expect to access the metadata. Ideally we'd tell the compiler where a set of .masl files are, and it loads the metadata from them directly; but we could also expect to find separate metadata files with the same name as their corresponding .masl. Either way works.


This is just an initial sketch of the general idea. We have a number of options we can pursue, and I'm not too opinionated about them; the main thing is that we define a format, the means of its delivery, and enough useful information about a library to provide a smooth and mutually beneficial developer experience. The more information we get, the better the compiler can reason about and verify its own outputs, and in turn provide useful diagnostics to users when something is wrong.

Implement test harness for compiler

In order to make for a better development experience, and to ensure we have an easy way of doing end-to-end testing, we need a test harness with the following properties:

Tasks

Some of the tooling for this is already in place, but the actual harness is not implemented yet.

Implement loop and dominator tree analyses

In order to convert Miden IR to Miden Assembly, we require two analysis passes over the IR which give us the necessary information to transform low-level control flow (primitive branches/jumps) into the higher-level control flow (if/while/repeat) used by Miden Assembly.

  • The dominator tree analysis is the foundation on which we build our control flow analyses. This analysis is critical for determining what blocks are involved in a given control flow path, and how control enters/exits that path. It is a low level analysis used by others we will build.
  • The loop analysis builds on the dominator tree to recognize loops and their various forms. This analysis will be used as the basis for our transformation of loop-like control flow to Miden Assembly.

There are other analyses/passes related to these that we will implement or may want to implement, but these two are critical for other work in the pipeline.
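For reference, the simplest formulation of the dominator tree analysis is the classic iterative dataflow computation over dominator sets: dom(entry) = {entry}, and dom(b) = {b} ∪ ⋂ dom(p) over all predecessors p, iterated to a fixpoint. A real implementation would likely use a more efficient algorithm (e.g. Cooper-Harvey-Kennedy), but the sketch below shows the semantics on a small CFG:

```rust
use std::collections::BTreeSet;

/// Iteratively compute dominator sets for a CFG given as predecessor lists.
/// Node 0 is the entry block.
fn dominators(preds: &[Vec<usize>]) -> Vec<BTreeSet<usize>> {
    let n = preds.len();
    let all: BTreeSet<usize> = (0..n).collect();
    let mut dom: Vec<BTreeSet<usize>> = vec![all; n];
    dom[0] = BTreeSet::from([0]); // the entry dominates only itself initially
    let mut changed = true;
    while changed {
        changed = false;
        for b in 1..n {
            // dom(b) = {b} ∪ intersection of dom(p) over predecessors p
            let mut new: Option<BTreeSet<usize>> = None;
            for &p in &preds[b] {
                new = Some(match new {
                    None => dom[p].clone(),
                    Some(acc) => acc.intersection(&dom[p]).cloned().collect(),
                });
            }
            let mut new = new.unwrap_or_default();
            new.insert(b);
            if new != dom[b] {
                dom[b] = new;
                changed = true;
            }
        }
    }
    dom
}

fn main() {
    // A CFG shaped like a simple while loop: 0 -> 1, 1 -> {2, 3}, 2 -> 1
    // (1 is the loop header, 2 the loop body, 3 the exit block)
    let preds = vec![vec![], vec![0, 2], vec![1], vec![1]];
    let dom = dominators(&preds);
    assert!(dom[3].contains(&1)); // the loop header dominates the exit
    println!("dom(3) = {:?}", dom[3]);
}
```

The loop analysis then builds on these sets: a back edge is an edge whose target dominates its source, and each back edge identifies a natural loop.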

Support declaring the same predecessor more than once in SSABuilder

Extracted from #22 (comment)

A solution to this particular problem though is to split the edge in the predecessor, introducing new blocks along any duplicate edges which branch to the same successor, and which act as a passthrough for the "true" successor block. This gets you the desired behavior without the complexity of supporting the same block on more than one path in multi-way branching instructions.
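A sketch of that edge-splitting transformation on a bare adjacency-list CFG (illustrative only; the real implementation would operate on the IR's block and instruction structure):

```rust
/// Split duplicate edges: if a block branches to the same successor more
/// than once (e.g. via a multi-way branch), route each extra edge through a
/// fresh passthrough block that just jumps on to the true successor.
fn split_duplicate_edges(succs: &mut Vec<Vec<usize>>) {
    let n = succs.len();
    for b in 0..n {
        let mut seen = std::collections::HashSet::new();
        for i in 0..succs[b].len() {
            let target = succs[b][i];
            if !seen.insert(target) {
                // Duplicate edge: introduce a passthrough block.
                let fresh = succs.len();
                succs.push(vec![target]); // unconditionally jumps to the true successor
                succs[b][i] = fresh;
            }
        }
    }
}

fn main() {
    // Block 0 branches to block 1 on two different paths.
    let mut cfg = vec![vec![1, 1], vec![]];
    split_duplicate_edges(&mut cfg);
    assert_eq!(cfg[0], vec![1, 2]); // second edge now goes through block 2
    assert_eq!(cfg[2], vec![1]);    // which falls through to block 1
    println!("{cfg:?}");
}
```

After this transformation, every successor appears at most once per branch, so the SSA builder never sees the same predecessor twice.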

Missing definition for SSA value (v6)

#[no_mangle]
pub fn fib(n: u32) -> u32 {
    let mut a = 2;
    while a < n {
        a = a * 3;
    }
    a
}

Compiles to the following Wasm:

(module
  (type (;0;) (func (param i32) (result i32)))
  (func $fib (;0;) (type 0) (param i32) (result i32)
    (local i32)
    i32.const 2
    local.set 1
    loop (result i32) ;; label = @1
      block ;; label = @2
        local.get 1
        local.get 0
        i32.lt_u
        br_if 0 (;@2;)
        local.get 1
        return
      end
      local.get 1
      i32.const 3
      i32.mul
      local.set 1
      br 0 (;@1;)
    end
  )
  (memory (;0;) 16)
  (global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
  (global (;1;) i32 i32.const 1048576)
  (global (;2;) i32 i32.const 1048576)
  (export "memory" (memory 0))
  (export "fib" (func $fib))
  (export "__data_end" (global 1))
  (export "__heap_base" (global 2))
)

Which translates to the following IR:

module noname

const $0 = 0x00100000;

global external @__stack_pointer : i32 = $0 { id = 0 };
global external @gv1 : i32 = $0 { id = 1 };
global external @gv2 : i32 = $0 { id = 2 };

pub fn fib(i32) -> i32 {
block0(v0: i32):
    v2 = const.i32 0 : i32;
    v3 = const.i32 2 : i32;
    br block2(v3);

block1(v1: i32):

block2(v5: i32):
    v7 = cast v5 : u32;
    v8 = cast v6 : u32;
    v9 = lt v7, v8 : i1;
    v10 = cast v9 : i32;
    v11 = neq v10, 0 : i1;
    condbr v11, block4, block5;

block3(v4: i32):

block4:
    v12 = const.i32 3 : i32;
    v13 = mul.wrapping v5, v12 : i32;
    br block2(v13);

block5:
    ret v5;
}

v6 is used in block2 but never defined anywhere in the function; based on the Wasm (local.get 0), it presumably should have been v0.

"value v1 not found on operand stack" error during `stackify` pass

To reproduce, uncomment the test_func_two_arg!(min, core::cmp::min, u32, u32, u32); test in instructions.rs, added in #49

Stack backtrace:

---- rust_masm_tests::instructions::min_u32_u32 stdout ----
thread 'rust_masm_tests::instructions::min_u32_u32' panicked at codegen/masm/src/stackify/emitter.rs:724:14:
value v1 not found on operand stack
stack backtrace:
   0: rust_begin_unwind
             at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/std/src/panicking.rs:617:5
   1: core::panicking::panic_fmt
             at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/panicking.rs:67:14
   2: core::panicking::panic_display
             at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/panicking.rs:150:5
   3: core::panicking::panic_str
             at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/panicking.rs:134:5
   4: core::option::expect_failed
             at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/option.rs:1988:5
   5: core::option::Option<T>::expect
             at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/option.rs:898:21
   6: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_stack_dependency
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:722:19
   7: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_node
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:637:21
   8: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_inst
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:962:13
   9: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_inst_dependency
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:794:13
  10: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_node
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:605:36
  11: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_inst
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:962:13
  12: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_node
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:615:25
  13: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_schedule
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:513:13
  14: miden_codegen_masm::stackify::emitter::MasmEmitter::emit
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:470:13
  15: <miden_codegen_masm::stackify::ConvertHirToMasm<&miden_hir::function::Function> as miden_hir::pass::conversion::ConversionPass>::convert
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/mod.rs:172:13
  16: <miden_codegen_masm::stackify::ConvertHirToMasm<miden_hir::module::Module> as miden_hir::pass::conversion::ConversionPass>::convert
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/mod.rs:123:33
  17: <miden_codegen_masm::stackify::ConvertHirToMasm<miden_hir::program::Program> as miden_hir::pass::conversion::ConversionPass>::convert
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/mod.rs:82:35
  18: miden_codegen_masm::MasmCompiler::compile
             at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/lib.rs:62:12
  19: miden_integration_tests::compiler_test::CompilerTest::ir_masm_program
             at ./src/compiler_test.rs:263:27
  20: miden_integration_tests::compiler_test::CompilerTest::expect_masm
             at ./src/compiler_test.rs:231:23
  21: miden_integration_tests::rust_masm_tests::instructions::min_u32_u32
             at ./src/rust_masm_tests/instructions.rs:106:17
  22: miden_integration_tests::rust_masm_tests::instructions::min_u32_u32::{{closure}}
             at ./src/rust_masm_tests/instructions.rs:95:28
  23: core::ops::function::FnOnce::call_once
             at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/ops/function.rs:250:5
  24: core::ops::function::FnOnce::call_once
             at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/ops/function.rs:250:5

When the min(a, b) Rust code is compiled to the following Wasm:

(module
  (type (;0;) (func (param i32 i32) (result i32)))
  (func $entrypoint (;0;) (type 0) (param i32 i32) (result i32)
    local.get 0
    local.get 1
    local.get 0
    local.get 1
    i32.lt_u
    select
  )
  (memory (;0;) 16)
  (global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
  (global (;1;) i32 i32.const 1048576)
  (global (;2;) i32 i32.const 1048576)
  (export "memory" (memory 0))
  (export "entrypoint" (func $entrypoint))
  (export "__data_end" (global 1))
  (export "__heap_base" (global 2))
)

And translated into the following IR:

module noname

const $0 = 0x00100000;

global external @__stack_pointer : i32 = $0 { id = 0 };
global external @gv1 : i32 = $0 { id = 1 };
global external @gv2 : i32 = $0 { id = 2 };

pub fn entrypoint(i32, i32) -> i32 {
block0(v0: i32, v1: i32):
    v3 = cast v0 : u32;
    v4 = cast v1 : u32;
    v5 = lt v3, v4 : i1;
    v6 = cast v5 : i32;
    v7 = neq v6, 0 : i1;
    v8 = select v7, v0, v1 : i32;
    br block1(v8);

block1(v2: i32):
    ret v2;
}

Implement peephole optimizer

This is the first optimization pass I think we'll want to implement as we start to observe less-than-ideal instruction sequences in the generated MASM. Such sequences are likely to occur as an artifact of the stackification algorithm. Rather than add more complexity to that pass, it is more beneficial to post-process the output of the pass and collapse inefficient instruction sequences into more efficient forms using peephole optimization. We can then use this to inform changes to the stackification pass, by focusing on things which are not easily resolved by the peephole optimizer.

In particular, I'm expecting the following to be good candidates for this:

Possible Optimizations

Some of these we could implement at a higher-level on the IR prior to stackification, but we'll want at least some degree of peephole optimization to clean up the output of stackification.
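As a sketch of the mechanism: a single pass sliding over the instruction stream, collapsing known-redundant sequences. The patterns below are hypothetical examples for illustration, not a vetted list of MASM rewrites:

```rust
/// Collapse known redundant instruction pairs in a single pass.
/// The patterns are illustrative: `swap.1 swap.1` and `push.0 add`
/// both leave the operand stack unchanged, so each pair is dropped.
fn peephole(ops: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for &op in ops {
        match (out.last().map(String::as_str), op) {
            (Some("swap.1"), "swap.1") => { out.pop(); } // swapping twice is a no-op
            (Some("push.0"), "add") => { out.pop(); }    // adding zero is a no-op
            _ => out.push(op.to_string()),
        }
    }
    out
}

fn main() {
    let input = ["dup.0", "swap.1", "swap.1", "push.0", "add", "drop"];
    let output = peephole(&input);
    assert_eq!(output, ["dup.0", "drop"]);
    println!("{output:?}");
}
```

A real pass would run to a fixpoint (collapsing one pair can expose another) and would need each rewrite proven stack-effect-equivalent, but the shape is the same.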

Create WASM compilation example(s)

Would be good to create an example (or maybe a couple) showing how we can compile WASM files to MASM files. The basic example could include something simple, like computing a fixed Fibonacci number. We could add to these examples over time to include more complex use cases.

The examples could be in a separate top-level directory (e.g., examples) with a short README explaining how to compile them.

The main goal here is to make it easy for newcomers to see how compilation works and check out the output of the compiler for some simple programs.
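A candidate for the first example could be the Fibonacci computation already exercised by the test suite; something along these lines (a sketch of what the example source might look like, not the final example code):

```rust
// A Fibonacci function in the shape the compiler currently handles well:
// a plain `u32 -> u32` export with no imports and no allocation.
#[no_mangle]
pub extern "C" fn fib(n: u32) -> u32 {
    let (mut a, mut b) = (0u32, 1u32);
    for _ in 0..n {
        let next = a.wrapping_add(b);
        a = b;
        b = next;
    }
    a
}

fn main() {
    assert_eq!(fib(10), 55);
    println!("fib(10) = {}", fib(10));
}
```

Compiled to wasm32-unknown-unknown, this yields a small module whose Wasm, IR, and MASM output can all be shown side by side in the example's README.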

Implement emitting zero at the start of the block in `frontend_wasm::ssa::emit_zero`

Extracted from #22 (comment)

The DataFlowGraph provides insert_inst as a primitive, which allows you to specify an InsertionPoint, which can be either the start/end of a Block, or before/after an Inst, but I haven't exposed that in a very convenient way in the DefaultInstBuilder. All of the InstBuilder implementations actually use this under the covers, so it's really just a matter of surfacing it better in one of the builders.
