0xpolygonmiden / compiler

Compiler from MidenIR to Miden Assembly
License: MIT License
This issue is just an initial sketch of my proposal for how to approach development of the Miden SDK tooling:
I'm about to crash here, so this is basically just flushing this stuff out of my brain; hopefully it's useful to you @greenhat until I get a chance to clean it up a bit and expand on some of these points.
A good solution in this case means that it can be used to represent high-level Rust (and other language) types using a common format (preferably in a way that lets us map those high-level types to our IR type system), which provides us the ability to query the types, sizes, and alignment of arguments and return values for a given function. Additionally, we want to be able to generate code to construct this specification in-memory, as well as parse/load it from a textual format, and use the in-memory spec to query things such as the number of parameters, what type are those parameters, what are the size and alignment, etc.
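As a rough sketch of the kind of queries such an in-memory spec should answer, here is a minimal Rust mock-up. All type and struct names here are hypothetical illustrations, not an existing API:

```rust
// Hypothetical sketch of an in-memory function type spec; none of these
// names exist in the codebase, they only illustrate the queries we want.
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    U32,
    I64,
    /// A pointer into linear memory (32 bits on wasm32)
    Ptr(Box<Ty>),
}

impl Ty {
    /// Size of the type in bytes
    pub fn size(&self) -> u32 {
        match self {
            Ty::U32 | Ty::Ptr(_) => 4,
            Ty::I64 => 8,
        }
    }

    /// Alignment of the type in bytes (same as size for these primitives)
    pub fn align(&self) -> u32 {
        self.size()
    }
}

/// The spec for a single function: enough to answer "how many parameters,
/// of what types, with what size and alignment?"
pub struct FunctionSpec {
    pub name: String,
    pub params: Vec<Ty>,
    pub results: Vec<Ty>,
}

impl FunctionSpec {
    pub fn num_params(&self) -> usize {
        self.params.len()
    }
}
```

The real format would additionally need to round-trip through a textual representation, but the query surface would look roughly like this.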
Using conditional compilation, the latter target will contain a function which, when loaded dynamically and called, will return the WIT specification for the account/note functions in that crate. That spec will be used by the compiler to "enrich" the wasm32-unknown-unknown artifact with type information from the original program, enabling the compiler to handle the calling convention details.
The generated code can simply be placed in a build directory for the crate and included, or placed in the source tree; either approach works. The Rust code we're generating here consists of wrappers which handle the gritty details of the calling convention: a wrapper for each public contract function defined in the crate, which delegates to the actual function after decoding the arguments using the calling convention rules; and a wrapper for each external contract function, which encodes the high-level Rust types using the calling convention rules and then invokes the external function.
Simultaneously, we'll implement the calling convention handling for call in the compiler backend. Once complete, we can rip out the hack from the previous stage.
Following #61 (comment)
We don't have to spend time checking the code style during the code review process.
Use rustfmt in a separate CI job to check the code formatting and fail if it does not match our style defined in rustfmt.toml in the repo root. Here is an example of a CI job that does that: https://github.com/ergoplatform/sigma-rust/blob/a360b255f8780d3ae9e6da44266c774e6e4055c5/.github/workflows/ci.yml#L177-L190

rustfmt.toml:
# see https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=import#imports_granularity
imports_granularity = "Module"
rust-analyzer uses rustfmt to format the code on save and can additionally be configured to introduce new imports with a given granularity. Each editor has its own way of configuring rust-analyzer. For VSCode, the option for import granularity is rust-analyzer.imports.granularity.group (in our case, set to module).
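For example, in VSCode's settings.json this would look like the following (assuming the rust-analyzer extension is installed):

```json
{
    "rust-analyzer.imports.granularity.group": "module"
}
```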
https://rust-analyzer.github.io/manual.html#auto-import
In the first version of the extension, we will support only library targets because #59 is a blocker for app targets.
Commands:
- #![no_std] required stuff in lib.rs (panic handler, etc.) and release profile options (size optimization, panic=abort, etc.)
- Translate a Wasm component that doesn't have imports and uses only primitive types. Think of a simple add function as an example.
- Translate a simple Wasm component that has imports but still uses only primitive types. Think of the Fibonacci function that uses an imported add function.
- Translate the basic-wallet example, which has a pretty complex internal architecture and uses record and variant Wasm CM types.
- Translate the p2id-note example, which has multiple imports and uses the list type in the interface.
Implement support for the following Wasm CM types:
- enum
- result
- option
- flags
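The release profile options mentioned above (size optimization, panic=abort) could look roughly like this in the template's Cargo.toml; a sketch, not the final set of options:

```toml
[profile.release]
# optimize for size rather than speed
opt-level = "z"
# abort on panic instead of unwinding, shrinking the generated Wasm
panic = "abort"
# enable link-time optimization for further size reduction
lto = true
```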
// test_bool_op!(ge, >=, u64);
// test_bool_op!(ge, >=, i64);
// test_bool_op!(gt, >, u64);
// test_bool_op!(gt, >, i64);
// test_bool_op!(le, <=, u64);
// test_bool_op!(le, <=, i64);
// test_bool_op!(lt, <, u64);
// test_bool_op!(lt, <, i64);
// test_int_op!(add, +, u64);
// test_int_op!(add, +, i64);
// test_int_op!(sub, -, u64);
// test_int_op!(sub, -, i64);
// test_int_op!(mul, *, u64);
// test_int_op!(mul, *, i64);
// test_int_op!(div, /, u64);
// test_int_op!(div, /, i64);
// test_int_op!(rem, %, u64);
// test_int_op!(rem, %, i64);
// test_unary_op!(neg, -, u64);
// test_unary_op!(neg, -, i64);
// test_unary_op!(not, !, u64);
// test_unary_op!(not, !, i64);
// test_int_op!(shl, <<, u64);
// test_int_op!(shl, <<, i64);
// test_int_op!(shr, >>, u64);
// test_int_op!(shr, >>, i64);
// test_unary_op!(neg, -, i64);
Might be fixed by #67
//
// Comparison ops
//
// test_bool_op!(ge, >=, i32);
// test_bool_op!(ge, >=, i16);
// test_bool_op!(ge, >=, i8);
//
// test_bool_op!(gt, >, i32);
// test_bool_op!(gt, >, i16);
// test_bool_op!(gt, >, i8);
//
// test_bool_op!(le, <=, i32);
// test_bool_op!(le, <=, i16);
// test_bool_op!(le, <=, i8);
//
// test_bool_op!(lt, <, i32);
// test_bool_op!(lt, <, i16);
// test_bool_op!(lt, <, i8);
//
// Arithmetic ops
//
// test_int_op!(mul, *, u32);
// test_int_op!(mul, *, u16);
// test_int_op!(mul, *, u8);
// test_int_op!(mul, *, i32);
// test_int_op!(mul, *, i16);
// test_int_op!(mul, *, i8);
//
// Bitwise ops
//
// test_int_op!(shr, >>, i8);
// test_int_op!(shr, >>, i16);
// test_int_op!(shr, >>, i32);
//
// test_unary_op!(not, !, u64);
// test_unary_op!(not, !, i64);
Might be fixed by #67
// test_int_op!(and, &, u64);
// test_int_op!(and, &, i64);
// test_int_op!(or, |, u64);
// test_int_op!(or, |, i64);
// test_int_op!(xor, ^, u64);
// test_int_op!(xor, ^, i64);
// test_func_two_arg!(min, core::cmp::min, i32, i32, i32);
// test_func_two_arg!(min, core::cmp::min, u32, u32, u32);
// test_func_two_arg!(min, core::cmp::min, u8, u8, u8);
// test_func_two_arg!(max, core::cmp::max, u8, u8, u8);
// ... other types
Build a custom core crate to avoid linking core::panicking.
// test_int_op!(div, /, u32);
// ...
// more types in tests for div, rem,
Should create a Wasm CM Rust project, so the template should be updated:
We need to figure out a way to provide at least a Miden SDK WIT file as a dependency.
Start by calling cargo_component to build the project, and switch to using it as a library when our needs grow. We would probably like to avoid forking it, and rather build on top of it.
The various assertion instructions, e.g. assert in Miden Assembly, now support constant integral error codes. We should add support for that in the IR.
A main function in Rust cannot have arguments, and we need them to pass inputs to a MASM program in app targets of the cargo projects that we want to compile to MASM.
The workaround is to use extern "C" fn __main(...) -> ... {} to pass inputs to and get the output from the VM. However, Miden Assembly deems __main an invalid name for a procedure.
These functions count the number of leading/trailing zeroes in the binary representation of an integral type, and are often used when using certain integer types as bitsets. In particular, these are used in the allocators that Rust programs use when compiled to Wasm, so this will be a blocker for compiling any Rust/Wasm module that uses the alloc
crate (i.e. anything that uses Box, Vec, etc.). See #22
This is the tracking issue for the implementation of these instructions, which has already been started, but was on hold while other tasks were being completed.
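As an illustration of the bitset use case these instructions serve, here is a sketch (the function name is illustrative) of how an allocator might use a count-trailing-zeros operation to find a free slot in a 64-bit occupancy mask:

```rust
/// Finds the index of the first free (zero) bit in a 64-bit occupancy mask.
/// This is the kind of query Wasm allocators answer with the ctz instruction;
/// clz (leading_zeros) is used analogously from the high end.
fn first_free_slot(mask: u64) -> Option<u32> {
    let free = !mask;
    if free == 0 {
        None // all 64 slots are occupied
    } else {
        Some(free.trailing_zeros())
    }
}
```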
Translate the Wasm CM world into the IR Module. Since a non-trivial Wasm component has a complex nested architecture (see the basic-wallet example, which has 3 components, 4 component instances, 3 core modules, and 5 core module instances), it makes sense for the frontend to encapsulate all this complexity and inline everything to produce a single IR Module.
The Wasm CM export and import top-level directives, plus canonical definitions (canon), are to be translated and stored in the IR Module for every imported (external) and exported IR module function.
Wasm CM types are translated to IR Type variants. Add corresponding IR Type variants for the missing Wasm CM types (list, variant, enum, result, option).
Here is how the IR Type variants for list and variant might look:
pub enum Type {
...
/// An enum variant type
Variant(VariantType),
/// A list of values of the same type
List(Box<Type>),
}
/// Variants are close to Rust `enum` declarations where a value is one of many
/// cases and each case has a unique name and an optional payload associated
/// with it.
#[derive(Clone, Hash, Eq, PartialEq, Debug)]
pub struct VariantType {
/// The representation to use for this type
pub(crate) repr: TypeRepr,
/// The computed size of this type
pub(crate) size: u32,
/// The list of cases that this variant can take.
pub cases: Box<[VariantCase]>,
}
/// One case of a `variant` type which contains the name of the variant as well
/// as the payload.
#[derive(Clone, Hash, Eq, PartialEq, Debug)]
pub struct VariantCase {
/// Name of the variant, unique amongst all cases in a variant.
pub name: String,
/// Optional type associated with this payload.
pub ty: Option<Type>,
}
To ease the review process and get the feedback early, I plan to split the Wasm CM support implementation into the following stages/PR sets:
fib app test

Miden IR is a typical SSA IR which supports arbitrary structured control flow; this is to make lowering from a variety of high-level languages easier and to provide a useful substrate on which to perform optimizations and transformations. MASM, however, does not support arbitrary control flow via jumps/branches, and instead only supports a small set of higher-level control operators (if, while, and repeat). In order to lower from Miden IR to Miden Assembly, we need a transformation pass which restructures the IR in a way that preserves the semantics of the high-level source code, but which is amenable to translation to Miden Assembly.
This pass will build on the loop/dominator tree analyses to recognize complex control flow, and convert it to an equivalent mix of MASM control operators. This may require some additional work on the IR to allow Instruction to contain regions with blocks, similar to MLIR.
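As a small illustration of the restructuring, a two-way IR branch (condbr v0, block1, block2) would need to become MASM's structured if operator, roughly like this (a sketch of the target shape, not actual compiler output):

```
# condbr v0, block1, block2 restructured as:
if.true
    # ...body of block1...
else
    # ...body of block2...
end
```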
Miden provides native u32 support, but Wasm and other consumers of the frontend will require at least minimal support for signed 32-bit ops (i.e. i32), and possibly at other bit widths as well. For basic unchecked/wrapping add/sub/mul, the native u32 ops should work for the signed versions as well, since both use the two's complement binary encoding. However, for checked/overflowing variants, and for things like signed division, we must implement support for that separately.
I'll update this later with the specific instructions we want to support first.
The following is a list of instructions we would like to add support for in the codegen backend when we get a chance. Unless explicitly called out, these instructions are exposed in the IR, and translated from Wasm already, but compilation will fail due to lack of support in the backend.
Assume that unless otherwise stated, all type variants of the above instructions are desired.
Discovered in #84 in the test_rust_comp::rust_array() test.
Un-ignore the test to reproduce.
The IR passes are defined in compiler/tests/integration/src/compiler_test.rs, lines 495 to 504 in 430ef78. The InlineBlocks pass is omitted (commented out).
I've also reproduced it with rustc 1.74.1 and the latest nightly. I suspect the unsafe code that works with intrusive_collections.
FWIW, here is the Wasm:
(module
(type (;0;) (func (param i32 i32) (result i32)))
(type (;1;) (func (result i32)))
(func $sum_arr (;0;) (type 0) (param i32 i32) (result i32)
(local i32)
i32.const 0
local.set 2
block ;; label = @1
local.get 1
i32.eqz
br_if 0 (;@1;)
loop ;; label = @2
local.get 0
i32.load
local.get 2
i32.add
local.set 2
local.get 0
i32.const 4
i32.add
local.set 0
local.get 1
i32.const -1
i32.add
local.tee 1
br_if 0 (;@2;)
end
end
local.get 2
)
(func $__main (;1;) (type 1) (result i32)
i32.const 1048576
i32.const 5
call $sum_arr
i32.const 1048596
i32.const 5
call $sum_arr
i32.add
)
(memory (;0;) 17)
(global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
(global (;1;) i32 i32.const 1048616)
(global (;2;) i32 i32.const 1048624)
(export "memory" (memory 0))
(export "sum_arr" (func $sum_arr))
(export "__main" (func $__main))
(export "__data_end" (global 1))
(export "__heap_base" (global 2))
(data $.rodata (;0;) (i32.const 1048576) "\01\00\00\00\02\00\00\00\03\00\00\00\04\00\00\00\05\00\00\00\06\00\00\00\07\00\00\00\08\00\00\00\09\00\00\00\0a\00\00\00")
)
To reproduce, uncomment the test_int_op!(and, &, u64); test in instructions.rs in #49.
For 1u64 & 1u64, the Rust code is compiled into the following IR:
module noname
const $0 = 0x00100000;
global external @__stack_pointer : i32 = $0 { id = 0 };
global external @gv1 : i32 = $0 { id = 1 };
global external @gv2 : i32 = $0 { id = 2 };
pub fn entrypoint(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
v3 = band v1, v0 : i64;
br block1(v3);
block1(v2: i64):
ret v2;
}
Which translates into the following MASM:
export.entrypoint
exec.checked_and
end
begin
exec.entrypoint
end
Which the Miden Assembler cannot compile.
The ultimate compilation pipeline we are trying to achieve looks something like this:
Rust -> WASM -> MidenIR -> MASM
The code in this repo handles the last 3 steps (WASM -> MASM). This leads to a couple of questions:
- Where should the Rust -> WASM step live, and what components could it include?
- Is miden-ir really the right name for this repo?

It seems to me that the answer to the second question is no. This repo contains much more than just IR, so we should rename it to something more relevant. miden-compiler is one option, but I'm happy to hear other suggestions.
As for the first question, it seems to me that this should live in a separate repo. That repo could probably consist of several crates covering the following functionality:
- stdlib and midenlib libraries (ideally, with a tool which can generate these wrappers automatically).

As for the name for this repo (and associated crates), I think it could be something unique (i.e., it doesn't have to include "miden" in the name). I don't have great ideas on this yet, so any suggestions are welcome. A couple of examples from other projects:
One other question: should this new repo try to support general compilation from Rust to MASM or be focused specifically on the rollup? My thinking is that we should focus it on the rollup. This would reduce the scope of what we need to do and will help us get to Miden smart contracts in Rust faster.
This issue is the final piece of #4 to have a complete compiler toolchain that can be used independently.
This executable basically puts together the following pieces of the compiler that we have built:
In essence, the frontend of the compiler takes as input either a Wasm binary (.wasm), the Wasm text format (.wat), or the HIR textual format (.hir), so that we can store/load our own IR. Regardless of input format, the frontend translates to the in-memory HIR representation, and then handles a request to compile or eval that input in some way. The following is a set of capabilities I expect the compiler to provide (both when used as a library and as an executable):
Extracted from #22 (comment)
The problem is that the expect! test for dlmalloc and signed_arith that checks the produced wat file fails on CI, because hashes in mangled function names differ from the "golden"/expected ones that I built on my machine.
I'm thinking about cutting out the hash from the mangled name. It should be hidden behind an option.
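A minimal sketch of what cutting the hash out could look like, assuming rustc's demangled names end in ::h followed by 16 hex digits (the function name and example symbol below are illustrative):

```rust
/// Strips a trailing `::h<16 hex digits>` hash from a demangled symbol name,
/// making the expected output stable across machines. Returns the input
/// unchanged if no such suffix is present.
fn strip_mangled_hash(name: &str) -> &str {
    if let Some(idx) = name.rfind("::h") {
        let hash = &name[idx + 3..];
        if hash.len() == 16 && hash.bytes().all(|b| b.is_ascii_hexdigit()) {
            return &name[..idx];
        }
    }
    name
}
```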
For use in testing, it's valuable to have a parser for the IR text format which we currently pretty-print for use in review/diagnosing issues.
- miden-diagnostics and miden-parsing
- lalrpop for the parser

Some implementation notes/recommendations:
You can refer to the [AirScript parser crate] to get a feel for how to define a LALRPOP grammar, and use it to construct the AST along with diagnostics, source spans, etc. I wouldn't focus too much on mimicking its specific implementation, since its use case is different, but it should provide some useful guidance/hints on how to approach certain problems. The grammar, as the name implies, drives an LR(1)/LALR parser generator, so it can be occasionally tricky to handle certain semi-ambiguous parts of a language. The IR syntax is pretty explicit though, and doesn't have a lot of fancy expression handling and the like, so that shouldn't be an issue. Let me know if you hit any shift/reduce errors though.
For translating the AST to the IR, I would implement it as a Pass, using a simple recursive traversal, basically the following:
use std::collections::VecDeque;
use miden_diagnostics::DiagnosticsHandler;
use miden_hir_pass::Pass;
use smallvec::SmallVec;
use rustc_hash::{FxHashMap, FxHashSet};
use crate::{self as hir, ast, ModuleBuilder, ModuleFunctionBuilder};
pub struct AstToIr<'a> {
/// Used for emitting diagnostics
diagnostics: &'a DiagnosticsHandler,
/// Used to map block identifiers parsed into the AST, to their corresponding identifiers in the IR
block_map: FxHashMap<ast::BlockId, hir::Block>,
/// Used to map value identifiers parsed into the AST, to their corresponding identifiers in the IR
value_map: FxHashMap<ast::ValueId, hir::Value>,
}
impl<'a> AstToIr<'a> {
pub fn new(diagnostics: &'a DiagnosticsHandler) -> Self {
Self {
diagnostics,
block_map: Default::default(),
value_map: Default::default()
}
}
}
impl<'p> Pass for AstToIr<'p> {
type Input<'a> = ast::Module;
type Output<'a> = Box<hir::Module>;
type Error = anyhow::Error;
fn run<'a>(&mut self, mut module: Self::Input<'a>) -> Result<Self::Output<'a>, Self::Error> {
let mut builder = ModuleBuilder::new(module.name);
builder.with_span(module.span());
for global in module.globals.into_iter() {
// Here I'm assuming the AST uses our Type and Linkage types
let gv = builder.declare_global_variable(global.name, global.ty, global.linkage, global.span())?;
// Globals don't necessarily have an initializer
if let Some(init) = global.initializer {
builder.set_global_initializer(gv, init);
}
}
for function in module.functions.into_iter() {
// Here I'm assuming the AST uses our Signature type
let mut fb = builder.build_function(function.name, function.signature, function.span())?;
self.translate_function(&mut fb, function)?;
}
Ok(builder.build())
}
}
impl<'a> AstToIr<'a> {
fn translate_function(&mut self, builder: &mut ModuleFunctionBuilder, function: ast::Function) -> anyhow::Result<()> {
// We need to visit all blocks to create the corresponding IR block before we do the translation
for (i, block) in function.blocks.iter().enumerate() {
// Here I'm assuming the first block in the AST is always the entry block, which should be true.
let block_id = if i == 0 {
let id = builder.entry_block();
// The entry block is already populated with parameters based on the function signature,
// we just need to map the identifiers in the AST to their IR equivalents
let values = builder.data_flow_graph().block_params(id);
assert_eq!(values.len(), block.params.len());
for (param, value) in block.params.iter().zip(values.iter()) {
self.value_map.insert(param.id, *value);
}
id
} else {
let id = builder.create_block();
// For all other blocks, we need to alter the block parameter list
for param in block.params.iter() {
// Here I'm assuming we're using our Type enum in the AST
let value = builder.append_block_param(id, param.ty.clone(), param.span());
self.value_map.insert(param.id, value);
}
id
};
// Make sure we map the block id in the AST to the IR block
self.block_map.insert(block.id, block_id);
}
// Translate the body of the function by walking the control flow graph,
// i.e. we translate the entry block, then follow control flow instructions
// to the next unvisited block, and so on, until all blocks are visited. We
// discard unvisited blocks, as those are by definition dead code.
let mut visited = FxHashSet::<ast::BlockId>::default();
let mut worklist = VecDeque::<ast::BlockId>::default();
worklist.push_back(function.blocks[0].id);
while let Some(ast_block) = worklist.pop_front() {
if !visited.insert(ast_block) {
continue;
}
self.translate_block(builder, &function.blocks[ast_block], &mut worklist)?;
}
// We're done
builder.build(self.diagnostics).map(|_| ())
}
fn translate_block(&mut self, builder: &mut ModuleFunctionBuilder, block: &ast::Block, worklist: &mut VecDeque<ast::BlockId>) -> anyhow::Result<()> {
let block_id = self.block_map[block.id];
builder.switch_to_block(block_id);
for op in block.ops.iter() {
let span = op.span();
match op {
// This is an example of how to handle control flow instructions
ast::Instruction::Br(ast::Br { dest, args, .. }) => {
// Make sure we visit the destination block next
worklist.push_back(*dest);
// Map the instruction data
let dest = self.block_map[dest];
let args = args.iter().map(|arg| self.value_map[arg]).collect::<SmallVec<[hir::Value; 2]>>();
builder.ins().br(dest, &args, span);
}
// This is an example of how to handle instructions with results
ast::Instruction::Add(ast::Add { lhs, rhs, output, .. }) => {
// We're able to assume that values are already defined because our traversal enforces
// that we visit predecessors of this instruction before the instruction itself.
let lhs = self.value_map[lhs];
let rhs = self.value_map[rhs];
let result = builder.ins().add(lhs, rhs, span);
self.value_map.insert(*output, result);
}
_ => unimplemented!(),
}
}
Ok(())
}
}
That's the broad strokes, but obviously a lot of details are left out. If you have any questions, or find any issues with the above approach, let me know and we can discuss alternatives.
Following the discussion at #28 (comment)
The goal is to try to compile the following code to the MASM below and see what parts we are missing:
use miden::{
account,
asset::Asset,
note::Recipient,
tx,
};
#[miden_account]
pub struct MyWallet;
#[miden_account]
impl MyWallet {
pub fn receive_asset(&mut self, asset: Asset) {
account::add_asset(asset);
}
pub fn send_asset(&mut self, asset: Asset, recipient: Recipient) {
account::remove_asset(asset);
tx::create_note(asset, recipient);
}
}
The above code is expected to be compiled close to the following (manually written) code:
use.miden::sat::account
use.miden::sat::tx
#! Adds the provided asset to the current account.
export.receive_asset
exec.account::add_asset
padw swapw dropw
end
#! Creates a note which sends the specified asset out of the current account
#! to the specified recipient.
export.send_asset.1
exec.account::remove_asset
# => [ASSET, tag, RECIPIENT, ...]
# insert 8 ZEROs into the stack right after recipient; we temporarily store one of the
# elements of ASSET in memory to make stack manipulation easier
push.0 swap loc_store.0 padw push.0.0.0 swapdw loc_load.0
# => [ASSET, tag, RECIPIENT, ZERO, ZERO, ...]
exec.tx::create_note
# => [note_ptr, ZERO, ZERO, ...]
end
The things to check are:
- MyWallet struct as a Miden account (export marker trait in WIT vs. doc comment attribute). Keep in mind that helper methods exported in the account are to be exec'd in notes/tx scripts.
- Asset as a resource in WIT.

EDIT: updated to reflect the switch to WIT.
As described at #58 (comment)
Extracted from #49 (comment)
Wait for a driver API changes for accessing IR.
We need liveness analysis for codegen, as well as a "register allocator"-like pass over the IR with it to generate appropriate operand stack management code (with spills to locals). The latter is tightly bound to liveness analysis, and is the main purpose of it in the code generator, so it is quite important for generating decent Miden assembly.
It's hard to use clippy when there are a lot of warnings (65) in the whole workspace. I suggest we add a CI job that fails on clippy warnings. Although I find almost all of the warnings useful, we can disable the ones we don't want.
In order to sketch out the backend of the compiler pipeline, we need to implement lowering to Miden Assembly. To begin with, we will lower IR that is free of non-trivial control flow (i.e. anything other than simple conditional branches). This will provide a good base on which further work can be built.
In order to build a useful compiler from high-level languages to Miden Assembly (MASM), we must have an intermediate representation (IR) which is expressive enough to make lowering from high-level languages directly simple and straightforward, and general enough to be a viable target from a variety of language frontends. This IR must be able to express complex control flow not directly supported in MASM, but which meets certain constraints under which it is possible to transform into a MASM-compatible form. Furthermore, we must be able to express all of the supported MASM operations and their associated types. Additionally, the IR must be converted into a stack machine representation that matches MASM, either by lowering to MASM directly, or by undergoing a series of transformations during lowering that accomplishes that goal.
The IR we have defined currently meets all of the above requirements, but we now need to implement both the control flow transformations and the translation to MASM. Firstly, this requires a test harness which we can use to exercise the compilation pipeline and assert things about its behavior. Secondly, we must implement the translation to stack machine form and lower to MASM. The last piece of the puzzle is to wire up at least one language frontend to the compiler, but we are leaving that as a non-goal/nice-to-have for this milestone, as there are prerequisite tasks which must be tackled before we are ready for that step - namely the control flow transformations described above.
NOTE: Issues have not yet been created for the above tasks, I'll create those shortly
Error:
thread 'rust_masm_tests::apps::fib' panicked at codegen/masm/src/stackify/pass.rs:1389:9:
unhandled terminator in non-terminator context: PrimOp(PrimOp { op: Unreachable, args: EntityList { index: 0, unused: PhantomData<miden_hir::value::Value> } })
When the Rust code 99999/n compiles to the following Wasm:
(module
(type (;0;) (func (param i32) (result i32)))
(func $fib (;0;) (type 0) (param i32) (result i32)
block ;; label = @1
local.get 0
i32.eqz
br_if 0 (;@1;)
i32.const 99999
local.get 0
i32.div_u
return
end
unreachable
unreachable
)
(memory (;0;) 16)
(global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
(global (;1;) i32 i32.const 1048576)
(global (;2;) i32 i32.const 1048576)
(export "memory" (memory 0))
(export "fib" (func $fib))
(export "__data_end" (global 1))
(export "__heap_base" (global 2))
)
Which in turn translates to the following IR:
module noname
const $0 = 0x00100000;
global external @__stack_pointer : i32 = $0 { id = 0 };
global external @gv1 : i32 = $0 { id = 1 };
global external @gv2 : i32 = $0 { id = 2 };
pub fn fib(i32) -> i32 {
block0(v0: i32):
v2 = eq v0, 0 : i1;
v3 = cast v2 : i32;
v4 = neq v3, 0 : i1;
condbr v4, block2, block3;
block1(v1: i32):
block2:
unreachable ;
block3:
v5 = const.i32 99999 : i32;
v6 = cast v5 : u32;
v7 = cast v0 : u32;
v8 = div.checked v6, v7 : u32;
v9 = cast v8 : i32;
ret v9;
}
The stackify pass fails with the error above.
Discovered in #49. The Rust <= (le) op on u32 values compiles to the following IR:
module noname
const $0 = 0x00100000;
global external @__stack_pointer : i32 = $0 { id = 0 };
global external @gv1 : i32 = $0 { id = 1 };
global external @gv2 : i32 = $0 { id = 2 };
pub fn entrypoint(i32, i32) -> i32 {
block0(v0: i32, v1: i32):
v3 = cast v0 : u32;
v4 = cast v1 : u32;
v5 = lt v3, v4 : i1;
v6 = cast v5 : i32;
br block1(v6);
block1(v2: i32):
ret v2;
}
Which then generates the following MASM:
export.entrypoint
dup.0
push.2147483648
u32.and
eq.2147483648
assertz
swap.1
dup.0
push.2147483648
u32.and
eq.2147483648
assertz
u32.lt.checked
end
begin
exec.entrypoint
end
Which fails on an assert in the VM, in a proptest with the minimal example (0, 2147483648).
EDIT: see the commented test reproducing this issue in #49 in instructions.rs (any comparison test for u32).
In a separate crate (under the midenc umbrella), establish a test suite for a set of Rust programs that does the following:
The implementation of the linker in #24 left a few things undone because they aren't critical to an MVP release. Namely, garbage collection of unused (dead) objects discovered at link-time.
1 Only when the program being linked is an executable program. When linking a library, we must assume all objects with external linkage are used.
2 Objects with internal (or ODR) linkage are always candidates for garbage collection if there are no references from live objects.
This task is low priority, but is relatively straightforward to tackle when we have time.
Now that 0xPolygonMiden/miden-vm#1078 is merged (which introduces dynexec), the groundwork for supporting indirect calls in Miden Assembly is there. Before we can make use of it though, we need the proposed procref instruction, which would push the hash of a specified function on the stack. This is done prior to dynexec, which requires that hash in order to execute the indirect call. We need procref because we don't know the hash of the function until after compilation; procref solves this by relying on the Miden assembler, which knows the hashes of all the functions, to expand it into push.HASH.
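Put together, the expected shape of an indirect call in MASM would be roughly the following (a sketch assuming the proposed procref syntax; the module and procedure names are illustrative):

```
# push the hash of the target procedure on the stack;
# the assembler expands this into push.HASH at assembly time
procref.util::sum
# pop the hash off the stack and execute the target indirectly
dynexec
```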
Once that is implemented, we can make use of it to implement indirect calls in the IR:
In #24, @bobbinth and I briefly discussed whether or not it was possible to define the signature for external functions in the IR, so that libraries which are not present at link-time, but are provided at runtime can be known to the compiler, just like the Miden standard library.
As I mentioned there, we already have support for the Miden stdlib in the IR itself, but a better, more open-ended approach would be to define a metadata format which can be used to communicate the modules that can be dynamically linked, what functions they contain, and the type signature and ABI of those functions.
The compiler could also use this format when it emits Miden Assembly so that it can consume its own codegen output.
The following is the bare minimum metadata we'd need from the format to be useful:
However, in addition to that information, I would also like the following to be supported (optionally, i.e. it isn't necessary to provide this information if it isn't applicable):
I think there are three ways we'll want to allow providing the metadata:
Embedding the information directly in the Miden Assembly textual format, either using doc comments (perhaps with syntax for attributes), or dedicated (but optional) syntax, e.g.
#! This function takes a pointer to an array of u32, and the length of that array,
#! and iterates over the array, summing its contents.
#!
#! @type (*mut u32, u32) -> u32
#! @abi C
export.sum
...
end
OR
#! This variation uses a slightly different set of attributes that could also be beneficial for documentation more generally
@param ptr :: *mut u32
@param len :: u32
@result u32
@abi C
export.sum
...
end
We'd assume certain defaults if a specific attribute is not present, e.g. if the @abi
attribute is not specified, we'd assume C
, the most interoperable calling convention. The only non-optional attribute is the @type
expression (assuming that's the way we choose to specify the type signature).
This could basically be JSON or some other human-readable/writable textual format with a specification, but would be versioned and express all of the metadata a library author is willing or able to provide, e.g.:
{
"version": "1.0",
"segments": [
{ "offset": 131072, "size": 65536, "init": "0x...." }
],
"modules" {
"foo": {
"exports": {
"bar": {
"abi": "C",
"params": [
{ "type": "*mut u8" },
{ "type": "u32", "extension": "zext" }
],
"results": [
{ "type": "u32" }
]
}
}
}
}
}
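To make the querying requirement concrete, here is a minimal sketch of what an in-memory model of this metadata might look like. The hand-rolled structs are purely for illustration; a real implementation would likely derive serde types from the versioned schema:

```rust
/// Illustrative in-memory model of the metadata format sketched above.
/// These types are a sketch, not the compiler's actual representation.
struct Param {
    ty: String,
    /// Optional extension attribute, e.g. "zext" for zero-extension.
    extension: Option<String>,
}

struct Export {
    abi: String,
    params: Vec<Param>,
    results: Vec<String>,
}

impl Export {
    /// Number of parameters declared for this function.
    fn arity(&self) -> usize {
        self.params.len()
    }
}

/// Construct the "foo::bar" entry from the JSON example above.
fn example_export() -> Export {
    Export {
        abi: "C".to_string(),
        params: vec![
            Param { ty: "*mut u8".to_string(), extension: None },
            Param { ty: "u32".to_string(), extension: Some("zext".to_string()) },
        ],
        results: vec!["u32".to_string()],
    }
}
```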
This format would be used either to embed the metadata in another format (perhaps in .masl files?), or to allow for a more compact representation. It would provide all the same information as the textual format, but using a well-known binary encoding like CBOR or Protobuf. Like the textual format, it would also be versioned to accommodate changes over time.
This format may not be needed; it just depends on how we expect to access the metadata. Ideally we'd tell the compiler where a set of .masl files are, and it loads the metadata from them directly; but we could also expect to find separate files with the same name as their corresponding .masl file. Either way works.
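The sidecar-file variant could be as simple as deriving the metadata path from the .masl path; the .json extension here is purely an assumption for illustration:

```rust
use std::path::{Path, PathBuf};

/// Given the path to a `.masl` file, compute the path of a hypothetical
/// metadata sidecar file with the same base name (the `.json` extension
/// is an assumption, not a decided convention).
fn metadata_sidecar(masl: &Path) -> PathBuf {
    masl.with_extension("json")
}
```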
This is just an initial sketch of the general idea here, we have a number of options we can pursue, and I'm not too opinionated about them. The main thing is that we define a format, the means of its delivery, and enough useful information about a library to provide a smooth and mutually beneficial developer experience around this. The more information we get, the better the compiler can reason about and verify its own outputs, and in turn provide useful diagnostics to users when something is wrong.
In order to make for a better development experience, and to ensure we have an easy way of doing end-to-end testing, we need a test harness with the following properties:
Some of the tooling for this is already in place, but the actual harness is not implemented yet.
In order to convert Miden IR to Miden Assembly, we require two analysis passes over the IR which give us the necessary information to transform low-level control flow (primitive branches/jumps) into the higher-level control flow (if/while/repeat) used by Miden Assembly.
There are other analyses/passes related to these that we will implement or may want to implement, but these two are critical for other work in the pipeline.
After #65 is merged
Parse a Wasm component into an in-memory representation with everything needed for the translation.
Extracted from #22 (comment)
A solution to this particular problem though is to split the edge in the predecessor, introducing new blocks along any duplicate edges which branch to the same successor, and which act as a passthrough for the "true" successor block. This gets you the desired behavior without the complexity of supporting the same block on more than one path in multi-way branching instructions.
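A minimal sketch of that edge-splitting transformation, with block ids reduced to plain integers for illustration (not the actual IR data structures):

```rust
use std::collections::HashSet;

/// Split duplicate edges in a multi-way branch: if a branch targets the
/// same successor block along more than one edge, each duplicate edge is
/// rerouted through a freshly created passthrough block that would simply
/// jump unconditionally to the original successor.
/// Returns the (passthrough, original_successor) pairs that were created.
fn split_duplicate_edges(
    successors: &mut Vec<u32>,
    next_block_id: &mut u32,
) -> Vec<(u32, u32)> {
    let mut seen = HashSet::new();
    let mut passthroughs = Vec::new();
    for succ in successors.iter_mut() {
        if !seen.insert(*succ) {
            // Duplicate edge: route it through a new passthrough block.
            let new_block = *next_block_id;
            *next_block_id += 1;
            passthroughs.push((new_block, *succ));
            *succ = new_block;
        }
    }
    passthroughs
}
```

After this pass, every successor of a multi-way branch is distinct, so no block appears on more than one path out of the instruction.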
To test a program on multiple input sets via proptest, we need the emulator not to consume the program but to take it by reference. The way proptest works is that it takes a closure in which we invoke our program with inputs generated by proptest, and runs the closure multiple times with different input sets.
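The borrowing requirement can be illustrated with stand-in types; Program and Emulator here are placeholders, not the actual emulator API:

```rust
/// Stand-in types for illustration only.
struct Program {
    code: Vec<u32>,
}

struct Emulator;

impl Emulator {
    /// Taking `&Program` (rather than `Program`) lets a proptest-style
    /// closure run the same program many times with different inputs,
    /// since each call only borrows the program instead of moving it.
    fn run(&mut self, program: &Program, input: u32) -> u32 {
        // Dummy "execution" for illustration.
        program.code.iter().sum::<u32>() + input
    }
}
```

With this signature, the closure that proptest invokes on each iteration can call `emulator.run(&program, input)` repeatedly without ever moving `program`.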
#[no_mangle]
pub fn fib(n: u32) -> u32 {
let mut a = 2;
while a < n {
a = a * 3;
}
a
}
Compiles to the following Wasm:
(module
(type (;0;) (func (param i32) (result i32)))
(func $fib (;0;) (type 0) (param i32) (result i32)
(local i32)
i32.const 2
local.set 1
loop (result i32) ;; label = @1
block ;; label = @2
local.get 1
local.get 0
i32.lt_u
br_if 0 (;@2;)
local.get 1
return
end
local.get 1
i32.const 3
i32.mul
local.set 1
br 0 (;@1;)
end
)
(memory (;0;) 16)
(global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
(global (;1;) i32 i32.const 1048576)
(global (;2;) i32 i32.const 1048576)
(export "memory" (memory 0))
(export "fib" (func $fib))
(export "__data_end" (global 1))
(export "__heap_base" (global 2))
)
Which translates to the following IR:
module noname
const $0 = 0x00100000;
global external @__stack_pointer : i32 = $0 { id = 0 };
global external @gv1 : i32 = $0 { id = 1 };
global external @gv2 : i32 = $0 { id = 2 };
pub fn fib(i32) -> i32 {
block0(v0: i32):
v2 = const.i32 0 : i32;
v3 = const.i32 2 : i32;
br block2(v3);
block1(v1: i32):
block2(v5: i32):
v7 = cast v5 : u32;
v8 = cast v6 : u32;
v9 = lt v7, v8 : i1;
v10 = cast v9 : i32;
v11 = neq v10, 0 : i1;
condbr v11, block4, block5;
block3(v4: i32):
block4:
v12 = const.i32 3 : i32;
v13 = mul.wrapping v5, v12 : i32;
br block2(v13);
block5:
ret v5;
}
Note the bug: v6 is undefined. block2 computes v8 = cast v6, but v6 is never defined anywhere in the function (presumably it should be the loop bound n, i.e. v0, passed in as a block parameter).
Set the Rust toolchain version via rust-toolchain.toml and use it in `cargo make`.
See details at https://rust-lang.github.io/rustup/overrides.html#the-toolchain-file
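A minimal rust-toolchain.toml might look like the following; the channel pin and targets shown are illustrative, not the project's actual versions:

```toml
[toolchain]
channel = "nightly-2023-07-01"          # illustrative pin, not the real one
components = ["rustfmt", "clippy"]
targets = ["wasm32-unknown-unknown"]
```

rustup picks this file up automatically for any cargo invocation in the directory tree, so cargo-make tasks inherit the pinned toolchain without extra configuration.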
To reproduce, uncomment the test_func_two_arg!(min, core::cmp::min, u32, u32, u32); test in instructions.rs, added in #49.
Stack backtrace:
---- rust_masm_tests::instructions::min_u32_u32 stdout ----
thread 'rust_masm_tests::instructions::min_u32_u32' panicked at codegen/masm/src/stackify/emitter.rs:724:14:
value v1 not found on operand stack
stack backtrace:
0: rust_begin_unwind
at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/std/src/panicking.rs:617:5
1: core::panicking::panic_fmt
at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/panicking.rs:67:14
2: core::panicking::panic_display
at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/panicking.rs:150:5
3: core::panicking::panic_str
at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/panicking.rs:134:5
4: core::option::expect_failed
at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/option.rs:1988:5
5: core::option::Option<T>::expect
at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/option.rs:898:21
6: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_stack_dependency
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:722:19
7: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_node
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:637:21
8: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_inst
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:962:13
9: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_inst_dependency
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:794:13
10: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_node
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:605:36
11: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_inst
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:962:13
12: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_node
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:615:25
13: miden_codegen_masm::stackify::emitter::MasmEmitter::emit_schedule
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:513:13
14: miden_codegen_masm::stackify::emitter::MasmEmitter::emit
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/emitter.rs:470:13
15: <miden_codegen_masm::stackify::ConvertHirToMasm<&miden_hir::function::Function> as miden_hir::pass::conversion::ConversionPass>::convert
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/mod.rs:172:13
16: <miden_codegen_masm::stackify::ConvertHirToMasm<miden_hir::module::Module> as miden_hir::pass::conversion::ConversionPass>::convert
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/mod.rs:123:33
17: <miden_codegen_masm::stackify::ConvertHirToMasm<miden_hir::program::Program> as miden_hir::pass::conversion::ConversionPass>::convert
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/stackify/mod.rs:82:35
18: miden_codegen_masm::MasmCompiler::compile
at /Users/dzadorozhnyi/src/miden-ir/codegen/masm/src/lib.rs:62:12
19: miden_integration_tests::compiler_test::CompilerTest::ir_masm_program
at ./src/compiler_test.rs:263:27
20: miden_integration_tests::compiler_test::CompilerTest::expect_masm
at ./src/compiler_test.rs:231:23
21: miden_integration_tests::rust_masm_tests::instructions::min_u32_u32
at ./src/rust_masm_tests/instructions.rs:106:17
22: miden_integration_tests::rust_masm_tests::instructions::min_u32_u32::{{closure}}
at ./src/rust_masm_tests/instructions.rs:95:28
23: core::ops::function::FnOnce::call_once
at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/ops/function.rs:250:5
24: core::ops::function::FnOnce::call_once
at /rustc/c469197b19d53a6c45378568f73c00986b20a5a5/library/core/src/ops/function.rs:250:5
The min(a, b) Rust code compiles to the following Wasm:
(module
(type (;0;) (func (param i32 i32) (result i32)))
(func $entrypoint (;0;) (type 0) (param i32 i32) (result i32)
local.get 0
local.get 1
local.get 0
local.get 1
i32.lt_u
select
)
(memory (;0;) 16)
(global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
(global (;1;) i32 i32.const 1048576)
(global (;2;) i32 i32.const 1048576)
(export "memory" (memory 0))
(export "entrypoint" (func $entrypoint))
(export "__data_end" (global 1))
(export "__heap_base" (global 2))
)
And translated into the following IR:
module noname
const $0 = 0x00100000;
global external @__stack_pointer : i32 = $0 { id = 0 };
global external @gv1 : i32 = $0 { id = 1 };
global external @gv2 : i32 = $0 { id = 2 };
pub fn entrypoint(i32, i32) -> i32 {
block0(v0: i32, v1: i32):
v3 = cast v0 : u32;
v4 = cast v1 : u32;
v5 = lt v3, v4 : i1;
v6 = cast v5 : i32;
v7 = neq v6, 0 : i1;
v8 = select v7, v0, v1 : i32;
br block1(v8);
block1(v2: i32):
ret v2;
}
This is the first optimization pass I think we'll want to implement as we start to observe less-than-ideal instruction sequences in the generated MASM. Such sequences are likely to occur as an artifact of the stackification algorithm. Rather than add more complexity to that pass, it is more beneficial to post-process the output of the pass and collapse inefficient instruction sequences into more efficient forms using peephole optimization. We can then use this to inform changes to the stackification pass, by focusing on things which are not easily resolved by the peephole optimizer.
In particular, I'm expecting the following to be good candidates for this:
Some of these we could implement at a higher-level on the IR prior to stackification, but we'll want at least some degree of peephole optimization to clean up the output of stackification.
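To illustrate the idea, here is a toy peephole pass over a MASM-like instruction stream; the opcode set and rewrite rules are examples, not the compiler's actual instruction set:

```rust
/// Toy MASM-like opcode set for illustration.
#[derive(Clone, Debug, PartialEq)]
enum Op {
    Push(u32),
    Drop,
    Dup,
    Swap,
    Add,
}

/// Repeatedly scan a two-instruction window and collapse known-inefficient
/// sequences, iterating until a fixpoint is reached (collapsing one pair
/// can expose another).
fn peephole(mut ops: Vec<Op>) -> Vec<Op> {
    loop {
        let mut out: Vec<Op> = Vec::with_capacity(ops.len());
        let mut changed = false;
        let mut i = 0;
        while i < ops.len() {
            match (&ops[i], ops.get(i + 1)) {
                // push N followed by drop is a no-op;
                // dup followed by drop is a no-op;
                // swap followed by swap cancels out.
                (Op::Push(_), Some(Op::Drop))
                | (Op::Dup, Some(Op::Drop))
                | (Op::Swap, Some(Op::Swap)) => {
                    i += 2;
                    changed = true;
                }
                _ => {
                    out.push(ops[i].clone());
                    i += 1;
                }
            }
        }
        ops = out;
        if !changed {
            return ops;
        }
    }
}
```

A pass of this shape runs after stackification, so the stackifier can stay simple and any residual inefficiency it emits is cleaned up here.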
Would be good to create an example (or maybe a couple) showing how we can compile WASM files to MASM files. The basic example could include something simple - like computing a fixed Fibonacci number. We could add to these examples over time to include more complex use cases.
The examples could be in a separate top-level directory (e.g., examples) with a short README explaining how to compile them.
The main goal here is to make it easy for newcomers to see how compilation works and check out the output of the compiler for some simple programs.
Extracted from #22 (comment)
The DataFlowGraph provides insert_inst as a primitive, which allows you to specify an InsertionPoint, which can be either the start/end of a Block, or before/after an Inst, but I haven't exposed that in a very convenient way in the DefaultInstBuilder. All of the InstBuilder implementations actually use this under the covers, so it's really just a matter of surfacing it better in one of the builders.
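As a standalone sketch of what surfacing this in a builder could look like (the types below are illustrative and not the actual DataFlowGraph/DefaultInstBuilder API):

```rust
/// Illustrative insertion point: start/end of a block, or before/after an
/// instruction. Block and instruction ids are plain integers here.
#[derive(Clone, Copy, Debug, PartialEq)]
enum InsertionPoint {
    BlockStart(u32),
    BlockEnd(u32),
    Before(u32),
    After(u32),
}

/// Toy builder that records where each instruction was inserted.
struct Builder {
    ip: InsertionPoint,
    inserted: Vec<(InsertionPoint, &'static str)>,
}

impl Builder {
    fn new(block: u32) -> Self {
        Builder { ip: InsertionPoint::BlockEnd(block), inserted: Vec::new() }
    }

    /// The convenience being asked for: reposition the builder once, and
    /// every subsequent instruction goes to that insertion point.
    fn at(&mut self, ip: InsertionPoint) -> &mut Self {
        self.ip = ip;
        self
    }

    fn inst(&mut self, name: &'static str) -> &mut Self {
        self.inserted.push((self.ip, name));
        self
    }
}
```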