guildofweavers / distaff Goto Github PK
View Code? Open in Web Editor NEWZero-knowledge virtual machine written in Rust
License: MIT License
Zero-knowledge virtual machine written in Rust
License: MIT License
Some of the instructions below would require support for 2-parameter instructions.
digest.n
to support hashing long sequences of values (as in #38)bext.n.m
to extract mth bit from an n-bit value. This could replace the current isodd
. For example, bext.128.0
would extract the least significant bit of the field element.btrunc.n.m
to truncate an n-bit value to m bits. For example, btrunc.128.64
would truncate a field element to a 64-bit value.bsplit.n.m
to split an n-bit value into m-bit chunks. For example, bsplit.64.32
would split a 64-bit value into two 32-bit values.uroll.4
to be the opposite of roll.4
roll.3
It would be great if the VM supported EC operations (point addition, doubling, multiplication). This would open the door for verifying EC signatures (e.g. Schnorr) within the VM as well as many other useful things.
The biggest challenge is that the VM operates in a 128-bit base field, but for acceptable security we need elliptic curves in ~256-bit field. One way to address this is by defining an elliptic curve in a quadratic extension of the base field. This would be similar to Microsoft's FourQ curve.
Currently, the base field is defined by prime modulus m = 2128 - 45 * 240 + 1. This can be changed if needed, but the modulus should comply with the following constraints:
Currently, a program is just a linear sequence of instructions. This makes writing programs with complex conditional logic challenging.
One way to address this is to change the definition of a program to be a set of alternative execution paths. In such a case, a commitment to a program would be a root of a Merkle tree where each leaf is a linear execution path.
A proof of execution for such a program would need to include a Merkle branch to the execution path which was taken during program execution.
Currently, mpath
assembly instruction accepts node index as secret input. There are scenarios when supplying node index as a public input is required. So, there should be an additional instructions (maybe pmpath
) which would take the node index as public input.
VERIFY instruction should be added to the instruction set.
Currently, all programs execute against a single stack. Adding an alt stack would enable:
Adding of ALT stack (without mirroring) would require 2 new instructions:
TOALT
- to move a value from the main stack to the alt stackFROMALT
- to move a value from the alt stack to the main stackImpact on performance should be relatively small - maybe up to 5% increase in proof generation time. Proof size should remain un-affected for programs which don't make use of the alt stack.
INV instruction should be added to the instruction set.
This instruction will also enable division. But rather than having a dedicated division instruction, division can be computed as a sequence of 2 instructions: INV MUL
Currently, the only boolean operation is not
. It would be useful have or
and and
operations as well. It would probably make sense to add these in the core instruction set (not just assembly).
Currently, we are using FRI for low-degree testing. Replacing it with DEEP-FRI might be a good idea.
DEEP-FRI paper: https://eprint.iacr.org/2019/336
Distaff VM should support hashing. Specifically, the following forms of hashing need to be supported:
Ideally, this would be achieved with a single instruction - e.g. HASH. The instruction could behave as follows:
Assuming stack registers are: s0, s1, s2, s3 . . ., where each register holds a 128-bit field element and s0 is the top of the stack, we can view a = (s3, s2) and b = (s1, s0), then:
Hashing of b = hash(a) can be emulated by setting the top two values of the stack to 0. This, in effect will mean that hash(a) = hash(a, 0).
However, we can't describe the entire hash operation as a single instruction of degree less than 6. So, the solution is to have the operation be a single round of hashing. For example, if HASH described a round of a hash function which requires 10 rounds, hashing of 2 values - e.g. hash(1, 2) may look like so:
PUSH 0, PUSH 1, PUSH 0, PUSH 2,
HASH, HASH, HASH, HASH, HASH, HASH, HASH, HASH, HASH, HASH, HASH,
DROP, DROP
Basically, we push our values onto the stack, then we apply hash round operation 10 times, then we drop 2 elements from the top of the stack (hash result is in the next 2 elements). Excluding PUSH operations, execution of the above sequence of instructions has the same effect as the idealized HASH instruction.
Based on this, the criteria for selecting a hash function is as follows:
It seems like Rescue hash function is the best candidate. It meets 2 criteria out of the 3:
However, it does require special handling of the first and the last rounds. It may be possible to get rid of this special handling - though, security implications of this modification need to be better understood.
Other functions considered were Poseidon and GMiMC - but they both have a fairly large number of rounds, and Poseidon also has 2 different types of rounds (full round and partial round).
Currently, all instructions are encoded using 5 bit representation. This has several implications:
However, most operations have degree 2 or less with HASHR
and CMP
being two exceptions.
We can expand the encoding of instructions to 7 bits, such that the upper 2 bits encode high-degree operations, while lower 5 bits encode low-degree operations. This would enable the following:
BEGIN
and NOOP
would span all 7 bits (would be shared between two groups of instructions)HASHR
operation into the high-degree group and increase S-Box degree to 5.CMP
operation to the high-degree group and simplify the constraints so that they don't need to rely on aux stack register.Proof security level estimation currently takes into account only trace security level. This is incomplete as it does not incorporate security level of low-degree testing. This should be fixed.
Security level is currently estimated here: https://github.com/GuildOfWeavers/distaff/blob/master/src/stark/options.rs
It might make sense to do this after #1 is addressed.
Preventing exponential explosion of execution paths due to conditional branching will require refactoring of program representation as described here.
New instructions should be added to the VM to enable comparing two values (less than, greater than, etc.). This cannot be done as a single operation, but doing this in a cycle of 128 operations seems doable. This will probably require adding 2 - 3 new instructions to the instruction set.
Current hash
instructions allow hashing between 1 and 4 field elements. Hashing of more than 4 elements is possible but requires the user to manually calculate the number of hash
instructions needed. Ideally, we should have a single digest.n
instruction which can be used to hash arbitrarily many elements.
To support secret inputs a READ operation should be added to the instruction set.
The READ operation will read an secret input from the secret input tape and push it onto the stack.
The length of the secret input tape should not be limited - so, this will enable providing an unlimited number of secret inputs to a program.
Currently, stack is the only place to store values. This is rather limiting, and can be improved by introducing random access memory. Adding RAM would require:
LOAD
and STORE
instructions to access RAM state registerOverall performance impact of the above changes would probably result in over 50% execution slowdown with instruction mirroring being the biggest driver.
To improve code organization the repo should be split up into multiple crates. Potential crates are:
The purpose of a CHOOSE operation is to select between two values based on a third value. Specifically:
Assuming stack registers are: s0, s1, s2, . . ., where each register holds a 128-bit field element and s0 is the top of the stack, CHOOSE operation will do the following:
Semantically, this is the same as popping 3 elements from the top of the stack and then pushing (s0 * s2) + (s1 * (1 - s2)) back onto the stack.
It also might make sense to add CHOOSE2 variation of this instruction. It will do the following:
The effect of CHOOSE2 operation would be to select 2 out of the top 4 stack values based on the value of the 5th stack value. This well simplify Merkle branch verification use case.
Currently, there are several operations which are meaningful only if executed as a part of a mini-program
. These operations include CMP
, BINACC
, and HASHR
. Knowing how to arrange these operations into mini-programs to achieve common tasks places an undue burden on users of the VM.
A better approach is to have a simple assembly-like set of instructions which may reduce either to a single operation or to a pre-defined sequence of operations.
For example, new operations could be HASH
, GT
, 'LT` etc. which would be reduce to a mini-programs upon execution.
Currently, several operations rely on existence of an AUX stack register. These operations are:
EQ
CMP
BINACC
Removing AUX register would simply the design of the VM. The only downside is that EQ
operation would require 2 cycles to execute.
Bounded loops could be a purely compile-time construct for Distaff assembly. The loops would get unfolded into repeated sequences of instruction. The syntax could look like so:
repeat.10
a_0 ... a_n
end
where:
10
is a parameter that specifies how many times the body of the loop should be executed.This is great work! Do consider taking inspiration from ZkVM, since it implements a nice Bitcoin Script-like language (w/ bulletproofs as the proof system). You might get some nice ideas about the lang design from there https://github.com/stellar/slingshot/tree/main/zkvm
Currently, proof generation is done in a single thread. Prooving time can be sped-up significantly by taking advantage of multi-threading. The following areas could be parallelized (ranked in order from most to least impact):
Candidates for addition:
CSWAP
to maintain consistency with CSWAP2
instruction.Candidates for removal:
CHOOSE2
operation can be emulated by using CSWAP2
operation.CHOOSE
operation can be emulated by using CSWAP
operation.The semantic of isodd
instruction could be as follows:
Pop the top value from the stack, if it is odd, push 1
onto the stack. Otherwise, push 0
onto the stack.
The format for the instruction would be isodd.n
where n
is the max number of bits needed to represent the top value of the stack. If top value of the stack is > 2n, the operation will fail.
Currently, query positions are derived pseudo-randomly from roots of Merkle trees. Instead, we should derive them for root || nonce
where nonce
meets a configurable PoW parameter. This will improve proof security without increasing proof size.
Might make sense to do this after #1 is addressed.
Current default security level of the proofs is around 110 bits (not 120 as mentioned in the docs). Getting more security than this would require running FRI in a quadratic extension of the base field. This should be a long-term goal, but for now, capping security at 100 bits should be sufficient.
This will allow us to reduce proof size by at least 30% because:
Adding unbounded loops will require refactoring of program representation as described here. Syntax for unbounded loops could look like so:
while.true
a_0 . . . a_n
end
where:
Loop body will be executed until the top of the stack is not 0
.
I find a little mistake when reading the code:
distaff/src/stark/constraints/evaluator.rs
Line 188 in 784e5d8
EQ instruction should be added to the instruction set.
0
to the stack if the values are equal, or 1
otherwise.A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.