harrytallbelt / verifiable-tests Goto Github PK

An attempt to apply formal methods (D. Gries style) to specification and verification of code-writting tasks for CS students.

License: MIT License

JavaScript 91.46% ANTLR 1.53% HTML 7.00%

formal-methods formal-verification student-project weakest-preconditions simplify antlr4 javascript nodejs

verifiable-tests's Introduction

Verifiable Code-Writing Tests for CS Students

This application attempts to apply formal methods (D. Gries style) to specification and verification of code-writting tasks for CS students.

It is still under development, but you can already use it for all the task examples we have and the other ones you come up with (see task files in /tasks for example of a task structure).

Steps You Need to Get it Runnging

Install Node.js.
Clone repository with

git clone https://github.com/harrytallbelt/verifiable-tests.git

or just download and unzip it.
Enter the repo directory:

cd verification-tests
Install dependencies:

npm install
Run it with

node server.js

and check out localhost:3000 in your browser.
If you want to run on a different interface or choose a different port, set VTESTS_HOST and VTESTS_PORT environment variables.

verifiable-tests's People

Contributors

Stargazers

Watchers

Forkers

alkhizha

verifiable-tests's Issues

doTheoremPt6 leaves commands[] empty

doTheoremPt6 leaves commands[] of a loop branch empty
if the branch contains nothing but loops. Try to repeat (there will be
a null dereference at some point):

do i <> n ->
  do j <> m -> skip od
od

Predicate representation format: leftIntExpr --> left

Is there any reason leftIntExpr and rightIntExpr of 'comp' predicates
cannot be just left and right?

Remove necessity to edit axioms.js each time new axiom added

axioms.js can really infer names of the axioms from their filenames.
If not, we can turn axioms into json objects and add a name field.

Invariant-variant pairs as objects

Shouldn't we store invariant-variant pairs as objects and keep
an array of these objects as a task description object field?
(Currently invariants and variants are two separate array-fields of a task.)

Implement qsort task

Should we add qsort task? (We had the specs somewhere.)

Document axiom triggers

Comment and add examples (of what is being prsed and what's not) to all axiom triggers, as they contain a bit tricky code.

Consider targeting other theorem provers

Currently we only target Simplify, but it is feasible to use other theorem provers (for example look at this frama-c page). The only dependent parts seem to be converting generated predicates to the prover's syntax and parsing its output.

Keeping track of the number of loops

Currently we do not support errors of too many or too few loops
in a program code. (Task description assumes a specific number
of loops.)
In the first case, it will probably result in a nasty server error
at some point.

Write verification script

See docs/verification-script.md.

Custom axiom parsing code support

Some axioms require changing the actual code of the system
(see to-simplify-syntax.js, wp.js, and PredicatesRepresentationBuilder.js),
which is not a good thing. This can be solved by allowing the user (teacher)
to write their own parsing code. This way a new axiom definition will contain:

the actual Simplify definition file,
JS file with a function for recognition of the parts of the predicate,
relevant to the defined axiom use.

Reusable SUM, PROD, and COUNT quantifiers

We cannot define SUM, PROD, and COUNT quantifiers for all possible
inner expression/predicate structure, because Simplify does not support
function- and predicate-variables. For example (SUM k : Q(k) : a[k])
cannot be defined with the same axiom as (SUM k : Q(k) : a[k]*b[k]).

To solve this we might change any quantifier (SUM k : Q(k) : f(k, <outer-var-list>))
into (SUM k : Q(k) : @a[k]) while adding a condition that (A k : Q(k) : @a[k] = f(k, ...)).
This way we can only define SUM, PROD and COUNT for an array.
(In fact, COUNT can be defined as (N k : Q(k) : P(k, ...)) == (SUM k : Q(k) : a[k]),
where (A k : Q(k) : P(k, ...) ^ @a[k] = 1 | ~P(k, ...) ^ @a[k] = 0).)

One complication might be name generation (for @a), because the id format
for Simplify (Modula-3) is the same as for our predicates, so we cannot use any
symbols, we know wouldn't be in the predicate. If we generate a random name,
we'll probably still want to check if it isn't already used in predicate. We also can
concatenate all the names in predicate (UPD: or just find the longest one and
make sure the generated name is longer) and add a number in the end for each
new array. It might not be a big part of the task, as we will (probably) need
to change the implication premises.

The last issue is does this all mean that our system becomes extremely Simplify-oriented?
UPD: This is probably not an issue, because all the feature code will be in
the exact part of the system responsible for working with Simplify.

Correct program representation format for parser

Due to issue #8 we document the program representation format.
It is now needed to correct the pseudocode parser, so it uses the correct program representation format.

DTO docs

The program makes use of DTO objects for parsed programs, predicates and WP context objects, but generally lacks documentation on those.

Interface support for multiple loops

The current take on UI assumes only one loop in the program, but the backend implements the generalised approach for any number of loops. The UI needs to become aware of that and new examples should be added.

wp.js/withoutLoops bug

In wp.js, function withoutLoops does not remove loops from inside if statements, but the code, apparently, expects it to.

Run appropriate Simplify binaries

Currently Windows Simplify binary is run regardless of the user's system (it works on Windows and on Linux via wine). But there are also Simplify binaries for Linux and MacOS, so we can choose which one to use, depending on the OS.

The binaries can be found in bin/versions or downloaded from via the link on this page.

Asynchronous verification script

There is the possibility that input code verification might take too much time for a node.js server. This should be researched and tested, and, if needed, we might want to run the script asynchronously or, better yet, in a separate thread.

The reason this might be irrelevant is because the script itself handles file readings and prover callings asynchronously, so it might not take such an amount of interpreting time in one go. (The prover calling is not implemented on the moment of issue firing, but there seems to be no way it would be synchronous in node.js and we certainly do not want this.).

UPD: Apparently the way to do it is to call setTimeout for 0 ms.

Incorrect handling of IF statements in pt.6 of DO proofs

When proving a DO statement, we toss off all the inner loops of a loop.
This leads to IF statements proving problems, because we lose the conditions,
provided by the inner loop guards negations:
For example consider a loop body

x := -1;
do x < 0 -> ... od;
if x < 0 -> abort | ... fi

By throwing the inner loop out, we lose the knowledge that x will never be < 0 at IF.

We already have problems of this sort, for example proving matrix_search
with an incorrect inner loop initialisation command j := 1 yields not only the
correct error, but also errors in loop finalisation IF statement.

Custom definitions support

Can we allow the user (teacher) to add their own definitions to simplify description of the task? For example

{
  "precondition": "n = N && x = X && equalArrays(a, A) && contains(A, X, 0, N-1)",
  ...
  "definitions": [
  {
    "name": "contains",
    "arguments": [ "a", "x", "from", "to" ],
    "definition": "(E k : from <= k <= to : a[k] = x)"
  },
  ...
  ]
  ...
}

Would this be consistent with perm - a predicate that we do not define, only describe some of its properties?

Precompiled Handlebars templates

Handlebars docs have a note about precompiled templates here, but it seems to talk about client side. Anyway it is probably possible to reuse compiled templates on server side (in the end, it can be a singleton object).

Add bubble sort

Apparently bubble sort is not implemented, because we thought inversion count would be hard to implement, but it wouldn't (see axioms/array-perm).

Perm predicate handling

After solving #28 and #30 we should be able to throw out perm predicate from the predicate grammar and the system code.

Record types

It seems the record types can be implemented pretty easily. The name of the record field being referenced would just be one more type of selector. Then, analogous to arrays, a record can be thought of like a function: field-name -> field-value.

For arrays these two lines are the same.
a[n] := a[m]
a := (a; n:a[m])

For records these two lines are the same.
john.age := samuel.age
john := (john; age:samuel.age)

Simplify supports this kind of operations via its theory of maps.

Script for testing on all the tasks

Store all the solutions somewhere (a secret gist?) in json format {"name":"solution"}.
Write a script that tests a list of task description files and outputs the failing predicates for each of them.

Syntax

node test.js SOLUTION.json [TASK_0.json ... TASK_N.json]

Calling example

node testing-utils/test.js testing-utils/solutions.json `ls tasks/*.json` | tee out.txt

Output format

; TASKNAME, TASKFILE.json
; ------------------------
[AXIOMS]
; FAILING_PREDICATE_IN_PREDICATE_SYNTAX
SIMPLIFY_FAILING_PREDICATE

Output example

; array_sum, sum.json
; ------------------------

(BG_PUSH
  (FORALL (i j b) (PATS (sum i j b))
    (AND
      (IMPLIES
        (> i j)
        (EQ (sum i j b) 0) )
      (IMPLIES
        (<= i j)
        (AND
          (EQ
            (sum i j b)
            (+ (select b j) (sum i (- j 1) b)) )
          (EQ
            (sum i j b)
            (+ (select b i) (sum (+ i 1) j b)) ))))))

; n=N && (A k : 0 <= k < N : b[k]=B[k]) =>
;   n=N && (A k : 0 <= k < N : b[k]=B[k]) &&
;   0<=1<=N && s=(SUM k : 0<=k<1 : B[k])
(IMPLIES
  (AND
    (EQ n N)
    (FORALL (k)
      (IMPLIES
        (AND (<= 0 k) (< k N))
        (EQ (select b k) (select B k)))))
  (AND
    (EQ n N)
    (FORALL (k)
      (IMPLIES
        (AND (<= 0 k) (< k N))
        (EQ (select b k) (select B k))))
    (<= 0 1)
    (<= 1 N)
    (FORALL (k)
      (IMPLIES
        (AND (<= 0 k) (< k 1))
        (EQ s (sum B 0 (- 1 1)))))))

exec for Windows

In prove.js quotation marks in exec's argument command are rendered in cmd.

We should test the code on Windows.

Call Simplify and interpret the results

Write the part of the code that is responsible for calling Simplify.

Keeping track of invariant and bondary functions

The current WP implementation does not behave well with nested loops and if's.
When branches of a loop or an if are considered one-by-one, each of them is given
the same innerSpec object. Those should actually be different, because WP might
remove some number of invariants and boundary functions on each of branches
and we will need to give the updated spec to the next branch proof.

There is a possibility that it might just work in a hacky way the way it is now.
That is because we do not copy specs and effectively all branches have the same
spec object. When map is run over these objects, WP will remove the invariants and
boundary functions just in the right order. Anyway, this is way to subtle, and frankly
just a coincidence.

UPT from 27.04.2017
For multiple loops we also need to differentiate t's of each loop (used to prove pt. 5 of the loop theorem).

Incorrect picking of loop invariant and variant pair

Choosing loop variant and invariant is currently done by taking the first element
of the arrays (using shift), which won't work for several sequential loops. On the
other hand, taking the last pair (with pop) will work for sequential loops, but won't
for the nested ones.

Taking the first loop in the sequence (instead of the last one) might solve this.
Otherwise, the way invariant-variant pairs are saved needs to be reconsidered.

PATS generation affects reverse2 proof

PATS generation messes up reverse2 proof somehow. We probably shouldn't generate PATS for every task.

Use linked lists

The current take on program specification uses arrays to store invariants, boundary functions, result predicates, and context objects. (Double?) linked list seem to better suit this task, because we do not need random access, but do want persistent slices when adding/removing first/last elements.

The program representation also uses arrays, while it probably can also benefit from lists.

The suggested package is this one: https://www.npmjs.com/package/yallist

Note that:

this is not performance critical right now,
but may result in a better looking code (without so much array copying).

Scripts to parse, render, and format predicates/pseudocode

Scripts to parse (this one we have), render, and format (parse+render) predicates and pseudocode. Put them into testing-utils somewhere.

Antlr parser errors

Currently antlr parser errors aren't handled. We need to add support for syntactic errors and notify the user in case they make one.

UPD: We might want to reuse the error format antlr uses for errors in ProgramDtoBuilder.js (search for TODO).

UPD: Look into antlr semantic predicates to check if an assignment statement has identica lvalues (probably won't work out, but still).

Localisation

Considering the project origins, it seems sensible to add support for Russian and Ukrainian languages.

Exceptions in parse-task.js

Let parse-task.js throw exceptions instead of returning null's.

Error and shorthand highlighting

Improve UI just a little bit:

on click highlight selected errors
on click highlight selected shorthands

Task syntax checking script

Implement a script that

checks predicate syntax without shorthands
checks shorthands
applies shorthands and checks resulting predicate syntax
checks axioms somehow ???

Identical lvalues in assignment

We claim that using the same lvalues is a syntax error, but it is definitely a semantic one, because we cannot check all these options:

a[i + 1], a[1 + i] := 123, 321

x := 1;
a[i + 1], a[i + x] := 123, 321

x := 0;
do k <> n && a[k] <> val ->
  x := x + 1
od;
a[i + 1], a[i + x] := 123, 321  // sometimes x will be 1 and sometimes not

We should probably throw out the code, that checks that lvalues are identical (it is obviously wrong) and just state a specific order in which the assignment is performed (left to right, for example).

Note: this makes errors in ProgramRepresenatationBuilder class not needed anymore.

Initialize task and axiom lists on module level

It might be a better idea to read all the tasks and axioms once the system is loaded instead of awaiting for the code to trigger getAxiom or getTask functions.

Generating PATS forbids usage of unique bound variables

Fixing #47 we used an axiom trigger to generate PATS clojure, but the callback in the trigger cannot use getUniqueName() function and so uses the unchanged bound variable, which is prone to errors.

Maybe, in trigger, we should provide a way of getting the standard conversion result.

Add "Back to the task list" link on the task view

Predicate rendering problems

We should render parentheses, because they are needed sometimes: a && (x => y).
We should find a way to indent predicates, as they can be quite long.

Properties vs definitions in axiom sets

Try to use axioms giving the desired properties (perm)
instead of full definitions (sum).
This might render the *_rev axioms unnecessary.

Implement predicate parser

We would like to parse predicates from mathematical notation. It should also be documented (connected to #8).

UPD: quirks of predicate notation we'll have to do something about (either implemented or ban from usage)

~~B[0:n-1], array slices;~~ (ruled out)
perm((x,y,z), (X,Y,Z)), permutations of vectors;
x, y = X, Y, vector equalities;
~~division (used in "reverse" task),~~ (ruled out)
a <= x <= b, range defining comparisons;
~~b = B, array equivalence;~~ (ruled out)
~~sum(A,a,b), Contains(A,a,b,x), functions and predicates defined via axioms.~~ (ruled out)

Storing number constants as numbers vs strings

Currently, when parsing pseudocode, the number constants are converted to int's which might in some cases cause overflow. To prevent that, we may store int constants as strings, but it isn't clear if we should.

Implement WP transformer

Using the algorithm, described in docs/wp.md, implement WP predicate transformer.