0xpolygonmiden / air-script

A domain-specific language for writing AIR constraints for STARKs.

License: MIT License
Currently, comments written in the AirScript file are lost after parsing. We should preserve the comments and output them at the relevant places in the generated Rust code.
Originally posted by @grjte in #4 (comment)
Based on @bobbinth's comments on PR #40 here, it might be better to change the periodic columns from `Vec<Vec<u64>>` to `Vec<Vec<Felt>>` at the IR level, as this may be helpful in the future when we want to evaluate constraints at a random point. We could also refactor the `to_string()` method for periodic columns proposed by @bobbinth here.
A thought for the future: instead of converting from `u64` to `Felt` here, we could do the following:
- Define static arrays for all periodic columns such that these arrays already contain `Felt`s.
- Inside `get_periodic_column_values()`, convert these arrays to vectors (which should be just memory copy operations) and return them.

The performance benefit of the above is probably very minor, so this is definitely not a high-priority item.
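A minimal sketch of the idea above, with `Felt` stubbed as a `u64` newtype and hypothetical column values (in the real codegen, `Felt` would be the backend's field element type and the arrays would be emitted per Air):

```rust
// Sketch only: `Felt` is stubbed as a u64 newtype for illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Felt(pub u64);

// Static arrays that already contain Felt values, one per periodic column.
// The values here are made up for the example.
const PERIODIC_COLUMN_0: [Felt; 4] = [Felt(1), Felt(0), Felt(0), Felt(0)];
const PERIODIC_COLUMN_1: [Felt; 2] = [Felt(1), Felt(0)];

// Converting the static arrays to vectors is just a memory copy.
pub fn get_periodic_column_values() -> Vec<Vec<Felt>> {
    vec![PERIODIC_COLUMN_0.to_vec(), PERIODIC_COLUMN_1.to_vec()]
}

fn main() {
    let columns = get_periodic_column_values();
    println!("{:?}", columns);
}
```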
We should make it a requirement to have boundary constraints. When this is updated, the mdbook docs section about `boundary_constraints` should also be updated.
We should set up a structure for testing full files and add a few tests of full air files for v0.1: some simple, and some that use the full v0.1 specification (e.g. `system.air` and `binary.air` that @tohrnii previously defined). We may want to consider something like `expect_test` for this.
Multiplication needs to be added to the entire pipeline, with unit tests at each stage.
We should allow users to name a group of trace columns as an array. For example:

```
trace_columns:
    main: [x, y, a[4], z]
```

Here the main trace contains 7 columns, 4 of which are grouped under `a`. These columns can be accessed using their local indices inside `a` (e.g., `a[1]` refers to the second column in group `a`).
Currently, inside the `trace_columns` section, the parser expects the main trace columns to be defined before the auxiliary columns. We should consider allowing users to define the auxiliary columns before the main columns. However, if it is preferable to keep the main-columns-first convention, we should show users a better error message about it.
Currently, in the `Operation` enum the way we reference trace columns is via `MainTraceCurrentRow` and `MainTraceNextRow` (and similarly for the auxiliary trace). This hard-codes the evaluation frame to work only with two consecutive rows. Instead, we could make the structure more general by putting the row offset into the enum itself. For example, the enum could look like this:

```rust
pub enum Operation {
    Const(u64),
    MainTraceRow(usize, usize),
    AuxTraceRow(usize, usize),
    ...
}
```

In `MainTraceRow` and `AuxTraceRow`, the first `usize` would refer to the column offset and the second `usize` would refer to the row offset. So, for example, `MainTraceRow(1, 0)` would mean column 1 at the current row, while `MainTraceRow(1, 1)` would mean column 1 at the next row.
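To illustrate how the generalized variant could be evaluated, here is a sketch against a toy frame of `u64` rows (the real evaluation frame and operation set are richer; this only shows the `(column, row_offset)` indexing):

```rust
// Sketch: a generalized Operation where trace references carry an explicit
// (column offset, row offset) pair instead of hard-coded current/next variants.
pub enum Operation {
    Const(u64),
    MainTraceRow(usize, usize), // (column offset, row offset)
}

// Evaluate an operation against a window of trace rows, where
// frame[r][c] is the value of column c at (current row + r).
pub fn evaluate(op: &Operation, frame: &[Vec<u64>]) -> u64 {
    match op {
        Operation::Const(v) => *v,
        Operation::MainTraceRow(col, row_offset) => frame[*row_offset][*col],
    }
}

fn main() {
    // Column 1 holds 10 in the current row and 20 in the next row.
    let frame = vec![vec![5, 10], vec![6, 20]];
    // MainTraceRow(1, 0): column 1 at the current row.
    println!("{}", evaluate(&Operation::MainTraceRow(1, 0), &frame));
    // MainTraceRow(1, 1): column 1 at the next row.
    println!("{}", evaluate(&Operation::MainTraceRow(1, 1), &frame));
}
```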
An alternative structure proposed by @grjte for constants is to move them outside the source sections, to the top of the AirScript file. The syntax for declaring constants in this case could look like:

```
def ExampleAir

const A = 1
const B = 0

trace_columns:
    ...
```
Formatting issue in:

The output of this codegen option should be a JSON file with the following structure:

```json
{
    num_polys: integer,
    num_variables: integer,
    constants: [array of field elements in the Goldilocks field],
    expressions: [array of expression nodes],
    outputs: [array of node references],
}
```

In the above, a node reference has the following structure:

```json
{
    type: "POL | POL_NEXT | VAR | CONST | REF",
    index: integer,
}
```

Where:
- `POL` refers to the value in the column at the specified `index` in the current row.
- `POL_NEXT` refers to the value in the column at the specified `index` in the next row.
- `VAR` refers to a public input or a random value at the specified `index`.
- `CONST` refers to a constant at the specified `index`.
- `REF` refers to a previously defined expression at the specified `index`.

An expression has the following structure:

```json
{
    op: "ADD | SUB | MUL",
    lhs: node_reference,
    rhs: node_reference,
}
```

Where `ADD`, `SUB`, and `MUL` are the corresponding operations in the Goldilocks base field.
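As a hypothetical end-to-end illustration of the structure above (the indices and the choice of constraint are made up, not from the spec), a single transition constraint `a' - a = 0` over one column might serialize as:

```json
{
    num_polys: 1,
    num_variables: 0,
    constants: [],
    expressions: [
        {
            op: "SUB",
            lhs: { type: "POL_NEXT", index: 0 },
            rhs: { type: "POL", index: 0 },
        },
    ],
    outputs: [{ type: "REF", index: 0 }],
}
```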
To generate the code in the above format, we would need to do the following:
- `variables` array for the expressions.

In addition to the above, we also need to add constraint merging logic to the constraint evaluation description. There should be only 3 entry points in the `outputs` section:

The logic for merging transition constraints is described here, and the logic for computing and merging boundary constraints is described here.

A few other things to consider:
- `variables` section. We may also need to add degree adjustment factors to the `variables` array.
- It would probably make sense to split the implementation of this into several steps. The first step should probably not involve constraint merging and should have one output for each boundary and transition constraint. Only after this is working should we implement constraint merging.
The name of the project has been changed to AirScript. We should replace mentions of AIR DSL with AirScript throughout the project.
The use of `$rand` random values in boundary constraints should be limited to auxiliary constraints, and therefore to boundary constraints which are against columns from the auxiliary trace. We need to throw an error if `$rand` is used in boundary constraints defined against main trace columns. This should be handled in the IR.

For example, this should be an error:

```
...
trace_columns:
    main: [a, b]
...
boundary_constraints:
    enf b.first = a + $rand[0]
...
```
In order to define the constraints on the stack overflow column in Miden VM, we need a way to declare public input vectors of unknown size.
We need to make the process of defining boundary constraints more flexible for Polygon Zero. We can do this by making it possible for them to define Lagrange polynomials directly. This requires:
- `$x`
We need unit tests for the IR and the codegen modules. We should come up with a good setup (possibly using an external tool) that keeps our tests simple and readable and can be used for all codegen cases.
@Al-Kindi-0, @grjte, @Overcastan, @tohrnii
The working group coordinator ensures scope & progress tracking are transparent and accurate. They will:
After support for grouping of trace columns is added in #83, we should preserve the group in IR as well so that it can be referenced by codegen with the column group's name.
At least one public input should be required, so this source section should be required and non-empty.
We should add tests and errors for the following cases:
- `ir`
- `parser`
Exponentiation needs to be added to the entire pipeline, with unit tests at each stage.
As noted inline in the code, the degree computation for periodic columns is currently incorrect (the handling of repeated references to the same columns is wrong).
To start the project very simply, we should define the constraints and then the grammar to parse the following constraints against Miden VM's clock cycle column using LALRPOP
Currently it's possible to declare a `trace_columns` source section without declaring the shape of the main execution trace. This shouldn't be possible. A new test should be added when it is fixed.

This is wrong but currently transpiles without error:

```
trace_columns:
    aux: [a, b]
```
We need access to random values when evaluating transition constraints over the auxiliary trace. The main complication with these is that they may be values in the extension field (and will be, in Miden's case), whereas everything that's been implemented so far (for the main trace) only requires values from the base field.
We also need to determine how random values are going to be referenced. Based on the previous discussion, it seems like we should refer to them by `$rand[0]`, `$rand[1]`, etc.

This is an example of a problematic constraint:

```
enf b = a + $rand[0]
```

Context from Bobbin:

This probably should be invalid, as this constraint should never be satisfied: the right-hand side is an extension field element, while the left-hand side is a base field element. If the value in `$rand[0]` is truly random, these will never be equal. We'll need to think about a general rule which can describe this.
Originally posted by @bobbinth in #36 (comment)
We need to add parentheses `(` `)` to support grouping and more complex expressions.
For some backends it may be useful to have additional sections defined which could make constraint descriptions cleaner.

One such section could be `random_values`. This section would enable naming/grouping of random values similar to how the trace columns section does for trace columns. The syntax for this section could look as follows:

```
random_values:
    rand: [15]
```

The above specifies that there should be 15 random values available for use in auxiliary trace constraints, and that these random values will be available under the variable `$rand`.

Similar to trace columns, we could name individual values as follows:

```
random_values:
    rand: [a, b]
```

The above specifies that the names `a` and `b` could be used to refer to specific random values, and also that the `$rand` variable could be used to refer to the entire vector.

Another such section could be `validity_constraints`. These constraints would be similar to transition constraints, with the only exception that they would work only with the current row of the trace. For example:

```
trace_columns:
    main: [a, b, c]

validity_constraints:
    enf a^2 - a = 0
```

Another option is to determine automatically whether a constraint is a validity constraint or a transition constraint, in which case we don't need an additional section. Though, we would then need to rename the `transition_constraints` section to something more general.
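The automatic option could work by inspecting row offsets: a constraint that references only the current row is a validity constraint, while anything touching other rows is a transition constraint. A sketch on a toy expression type (the names and shape are hypothetical, not the actual AST):

```rust
// Toy constraint expression; TraceAccess carries a row offset
// (0 = current row, 1 = next row).
pub enum Expr {
    Const(u64),
    TraceAccess { column: usize, row_offset: usize },
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// A constraint is a validity constraint iff it only touches the current row.
pub fn is_validity_constraint(expr: &Expr) -> bool {
    match expr {
        Expr::Const(_) => true,
        Expr::TraceAccess { row_offset, .. } => *row_offset == 0,
        Expr::Add(l, r) | Expr::Mul(l, r) => {
            is_validity_constraint(l) && is_validity_constraint(r)
        }
    }
}

fn main() {
    let a = |off| Box::new(Expr::TraceAccess { column: 0, row_offset: off });
    // a * a touches only row offset 0, so it is a validity constraint.
    let validity = Expr::Mul(a(0), a(0));
    // a' * a references the next row, so it is a transition constraint.
    let transition = Expr::Mul(a(1), a(0));
    println!(
        "{} {}",
        is_validity_constraint(&validity),
        is_validity_constraint(&transition)
    );
}
```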
We need to add a way to specify boundary constraints that rely on public inputs.
In the future, it would be cool to replace all inline constants with named constants. Could be one of optimization passes.
Originally posted by @bobbinth in #95 (comment)
Currently, we have the built-in `$rand` which allows accessing a random value by index, e.g. `$rand[0]`. It would be convenient to do the same thing with the main and auxiliary execution traces and allow columns to be accessed by index rather than exclusively by their column identifier.

This requires:
- `$main[n]`, `$aux[n]`, `$main[n]'`, and `$aux[n]'`, where `n` is a number. This should only be allowed in transition constraint expressions. The AST needs to be updated to allow the new column representation as well.

To improve the ergonomics of the AirScript language, we should add support for:
- `let` keyword with these 3 types: scalars, vectors, matrices
- `trace_columns` section
section@Al-Kindi-0, @grjte, @Overcastan, @tohrnii
The working group coordinator ensures scope & progress tracking are transparent and accurate. They will:
For some backends it may be useful to have a `random_values` section. This section would enable naming/grouping of random values similar to how the trace columns section does for trace columns. The syntax for this section could look as follows:

```
random_values:
    rand: [15]
```

The above specifies that there should be 15 random values available for use in auxiliary trace constraints, and that these random values will be available under the variable `$rand`.

Similar to trace columns, we could name individual values as follows:

```
random_values:
    rand: [a, b]
```

The above specifies that the names `a` and `b` could be used to refer to specific random values, and also that the `$rand` variable could be used to refer to the entire vector.
The initial codegen is missing some things and does not yet produce valid code, since it was primarily targeting generation of constraints. The following should be added:
- `IdentifierType::PeriodicColumn` enum variant

We should add support for constants. The syntax was defined by @bobbinth in the original Constraints Description Language discussion here. Taken directly from @bobbinth's original post:
Constants could be of the following three types:
The constants should be declared in a separate section. Declaring constants could be done like so:

```
constants:
    a: 123 // a scalar constant
    b: [1, 2, 3] // a vector constant
    c: [
        [1, 2, 3],
        [4, 5, 6],
    ] // a matrix constant
```

Referring to vector constants inside expressions could be done with index notation:

```
let x = b[1]
let y = c[0][2]
```

Something like this could also be possible:

```
let x = c[1][1..3] // this sets x to [5, 6]
```
Other open suggestions by @grjte and @bobbinth to easily distinguish between trace columns and constants:
This task involves making changes to the parser, IR and codegen.
We should add support for defining validity constraints. These constraints would be similar to transition constraints, with the only exception that they would work only with the current row of the trace. For example:

```
trace_columns:
    main: [a, b, c]

integrity_constraints:
    enf a^2 - a = 0 # this is a validity constraint
```

These constraints should be in the same section as transition constraints; however, we should change the name of the combined section to `integrity_constraints` (or something similar). Also, in the IR we should probably still have a single algebraic DAG, but we could mark entry points as transition vs. validity constraints.
The IR unit tests currently only check that errors were not thrown when building the `AirIR` from the parsed source. We should also ensure that the resulting IR is as expected, and more rigorous testing should be added for edge cases.
Comment for the future: exponentiations could be really expensive, especially for constant-time implementations (like the one we are currently using). So, whenever possible, we should replace them with multiplications or more specialized operations.
For example, here, instead of doing `(current[0]).exp(E::PositiveInteger::from(2_u64))` we should be doing `(current[0] * current[0])` or `current[0].square()`.
Also, if we know that a constant value is smaller than `u32`, we should try to use conversions from `u32`. For example: `E::from(2_u32)` instead of `E::from(2_u64)`, as we may come up with a more efficient reduction for smaller values later on.
Originally posted by @bobbinth in #58 (comment)
We should make it a requirement to have transition constraints. When this is updated, the mdbook docs section about `transition_constraints` should also be updated.
During codegen for an operation, we need to add parentheses around compound expressions. However, parentheses should not be added when the expression is not an arithmetic operation. We should add a check in the `to_string()` method so that parentheses are added only if the expression is an arithmetic operation.
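A sketch of that check on a toy expression type (the real IR's node type differs; `is_arithmetic_op` and `to_grouped_string` are hypothetical helper names):

```rust
// Sketch: parenthesize a sub-expression only when it is itself an arithmetic
// operation; atoms like constants and column accesses need no parentheses.
pub enum Expr {
    Const(u64),
    Column(usize),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn is_arithmetic_op(&self) -> bool {
        matches!(self, Expr::Add(..) | Expr::Mul(..))
    }

    // Render a sub-expression, adding parentheses only around compound ops.
    fn to_grouped_string(&self) -> String {
        if self.is_arithmetic_op() {
            format!("({})", self.to_string())
        } else {
            self.to_string()
        }
    }

    pub fn to_string(&self) -> String {
        match self {
            Expr::Const(v) => v.to_string(),
            Expr::Column(i) => format!("current[{}]", i),
            Expr::Add(l, r) => format!("{} + {}", l.to_grouped_string(), r.to_grouped_string()),
            Expr::Mul(l, r) => format!("{} * {}", l.to_grouped_string(), r.to_grouped_string()),
        }
    }
}

fn main() {
    // (current[0] + 1) * current[1]: only the compound lhs is parenthesized.
    let e = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Column(0)), Box::new(Expr::Const(1)))),
        Box::new(Expr::Column(1)),
    );
    println!("{}", e.to_string());
}
```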
Periodic columns are used for transition constraints. The syntax has already been defined, but we need to add support for them
Once we implement a core subset of everything discussed here, we will be able to define the majority of Miden VM's constraints using this DSL.
To identify & settle questions for this first core version, it's useful to see what our constraints actually look like in this language for a reasonably self-contained set of constraints.
@tohrnii has defined the constraints for Miden's bitwise chiplet here according to the previously referenced discussion
Let's discuss/adjust & agree on any questions related to this file, then possibly expand a bit to match the scope of the first milestone for this project. (We will want to include `boundary_constraints` and possibly the `aux` `trace_columns` and `use` imports as well.)
This will give us a guiding example as we move forward and ensure that the language is clear, consistent, and usable.
Once #14 is handled, we'll need to add the IR and codegen for the auxiliary trace
not for this PR, but I think it would be better to use `Felt::ZERO` and `Felt::ONE` when converting from `0` or `1`. Can you open an issue? (Same thing with `E::ZERO` and `E::ONE`)
Originally posted by @grjte in #42 (comment)
We should add support for variables. The syntax was defined by @bobbinth in the original Constraints Description Language discussion here. Taken from @bobbinth's original post:

A variable is defined using a `let` keyword, e.g. `let a = b * c`. Variables could be of the following three types:

Declaring variables could be done like so:

```
trace_columns:
    main: [a, b, c]

transition_constraints:
    let x = a + 123 // a scalar variable
    let y = [a + 1, b + 2, c + 3] // a vector variable
    let z = [
        [a + 1, b + 2, c + 3],
        [a + 4, b + 5, c + 6],
    ] // a matrix variable
```

Referring to vector variables inside expressions could be done with index notation:

```
let a = [m, n]
let x = a[0] + 1
```

We could build variables from column references as well:

```
trace_columns:
    main: [a, b]

let x = [a, b] // x is a vector with current values from a and b
let y = [a', b'] // y is a vector with next values from a and b
```

Variables should only be defined in the `boundary_constraints` and `transition_constraints` sections.
This task involves making changes to the parser, IR and codegen.
Currently, the IR has the operations `Add` and `Neg`, but not `Sub`. For future changes, it will be easier if the IR has `Sub` as well.

This requires changes to:
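A sketch of the change on a toy IR, with operands as indices into a flat node list (the names and layout are illustrative, not the actual IR types):

```rust
// Sketch: an IR operation set that carries Sub directly instead of forcing
// every subtraction to be lowered as Add(a, Neg(b)).
#[derive(Clone, Copy)]
pub enum Operation {
    Const(u64),
    Add(usize, usize), // operands are indices into the node list
    Neg(usize),
    Sub(usize, usize),
}

// Evaluate node `idx` over a node list (toy arithmetic over i64).
pub fn evaluate(nodes: &[Operation], idx: usize) -> i64 {
    match nodes[idx] {
        Operation::Const(v) => v as i64,
        Operation::Add(a, b) => evaluate(nodes, a) + evaluate(nodes, b),
        Operation::Neg(a) => -evaluate(nodes, a),
        Operation::Sub(a, b) => evaluate(nodes, a) - evaluate(nodes, b),
    }
}

fn main() {
    // 7 - 3 expressed directly with Sub...
    let direct = vec![Operation::Const(7), Operation::Const(3), Operation::Sub(0, 1)];
    // ...and as Add(a, Neg(b)), which is the only encoding available today.
    let lowered = vec![
        Operation::Const(7),
        Operation::Const(3),
        Operation::Neg(1),
        Operation::Add(0, 2),
    ];
    println!("{} {}", evaluate(&direct, 2), evaluate(&lowered, 3));
}
```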
Comment for the future: what should we do if two periodic columns are identical (same cycle length and values)? Should it be an error? A warning? Or something else?
Originally posted by @bobbinth in #24 (comment)
This should probably be handled as a warning.
We could also consider doing an optimization to identify duplicate periodic columns and always reference the one that was declared first instead, then remove the duplicate from the generated Air. This may be more work than it's worth though, and is lower priority than issuing a warning.
In order to define some constraints, we may want to be able to define Lagrange polynomials directly. In such cases, the domain value that is used for interpolation is needed, so we need a shortcut for referring to this x-value at each row. We can do this by adding a new built-in value `$x` that represents x at any given row.

Here's an example from @dlubarov at Polygon Zero of how this could give more flexibility in defining constraints:

```
// Boundary constraint.
a * l_0(x) = 0

// Every other row.
a * odd_row(x) = 0

def odd_row(x) = (x^(n/2) - 1)
def l_0(x) = (x^n - 1) / (x - 1)
```

Required changes:

This adds flexibility to AirScript, but Miden VM does not need this at the moment, so we could keep updates to the Miden codegen very minimal (i.e. just throw an error if this is used).
It would be nice to have Python-style list comprehension supported in AirScript. For example:

```
trace_columns:
    main: [a, b, c[4]]

# raise value in the current row to power 7
let x = [col^7 for col in c]

# raise value in the next row to power 7
let y = [col'^7 for col in c]
```

In both cases, the result would be a vector of 4 elements.

We could also add support for iterating over two lists simultaneously:

```
trace_columns:
    main: [a, b, c[4], d[4]]

let diff = [x - y for (x, y) in (c, d)]
```

We could also add support for iterating and enumerating:

```
trace_columns:
    main: [a, b, c[4]]

let x = [2^i * c for (i, c) in (0..3, c)]
```

We could also support list folding. There are two possible syntaxes:

```
trace_columns:
    main: [a, b, c[4], d[4]]

# compute sum of products of values in c and d
let x += c * d for (c, d) in (c, d)

# compute product of sums of values in c and d
let y *= c + d for (c, d) in (c, d)
```

OR

```
trace_columns:
    main: [a, b, c[4], d[4]]

# compute sum of products of values in c and d
let x = sum([c * d for (c, d) in (c, d)])

# compute product of sums of values in c and d
let y = prod([c + d for (c, d) in (c, d)])
```
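Assuming the folding forms desugar to an ordinary zip-and-fold, their semantics correspond to something like this Rust sketch (toy `u64` values standing in for field elements):

```rust
// Sketch of the semantics of the two folding forms over toy u64 values.

// let x = sum([c * d for (c, d) in (c, d)])
pub fn sum_of_products(c: &[u64], d: &[u64]) -> u64 {
    c.iter().zip(d).fold(0, |acc, (c, d)| acc + c * d)
}

// let y = prod([c + d for (c, d) in (c, d)])
pub fn product_of_sums(c: &[u64], d: &[u64]) -> u64 {
    c.iter().zip(d).fold(1, |acc, (c, d)| acc * (c + d))
}

fn main() {
    let c = [1, 2, 3, 4];
    let d = [5, 6, 7, 8];
    // 1*5 + 2*6 + 3*7 + 4*8 = 70
    println!("{}", sum_of_products(&c, &d));
    // (1+5) * (2+6) * (3+7) * (4+8) = 5760
    println!("{}", product_of_sums(&c, &d));
}
```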
It does feel a bit weird that we use structs from the parser in the IR, but I guess we do it in a few other places already. Maybe there should be a air-script-core or air-script-common crate which defines such commonly used structs. (but this would be for another PR)
Originally posted by @bobbinth in #94 (comment)
These features are described in the original AirScript discussion.
We want to be able to define functionality in one file and use it in another, e.g.:

```
use: bar

trace_columns:
    main: [a, b, c, d]
    aux: [e, f]

enf bar(main[0..2], aux[0..1])
```
Add function support as described here.
Add support for evaluator functions as described here.
Add selector support as described here
We need to come up with a new name for this DSL that will ideally tick these boxes:
We should add recoverable errors to the parser to recover all errors while parsing instead of terminating on encountering the first error.