STOKE

STOKE is a stochastic optimizer for the x86_64 instruction set. STOKE uses random search to explore the extremely high-dimensional space of all possible program transformations. Although any one random transformation is unlikely to produce a code sequence that is both correct and an improvement over the original, the repeated application of millions of transformations is sufficient to produce novel and non-obvious code sequences that have been shown to outperform the code produced by general-purpose and domain-specific compilers, and in some cases expert hand-written code.

STOKE has appeared in a number of publications. For a thorough introduction to the design of STOKE, see:

  • Stochastic Superoptimization -- ASPLOS 2013
  • Data-Driven Equivalence Checking -- OOPSLA 2013
  • Stochastic Optimization of Floating-Point Programs with Tunable Precision -- PLDI 2014

Table of Contents

  1. Prerequisites
  2. Downloading and Building STOKE
  3. Using STOKE
  4. Using the Formal Validator
  5. Additional Features
  6. Extending STOKE
  7. Code Organization
  8. Initial Search State
  9. Search Transformations
  10. Performance Term
  11. Correctness Term
  12. Live-out Error
  13. Verification Strategy
  14. Command Line Args
  15. Frequently Asked Questions
  16. Feedback

Prerequisites

STOKE will run on modern 64-bit x86 processors. We officially support Haswell processors with AVX2 extensions. STOKE should also run on Sandy Bridge systems (with AVX but not AVX2) and on Nehalem systems (with neither extension); however, these targets are not officially supported.

To check what level of hardware support you have, run:

$ less /proc/cpuinfo

and check if the following cpu flags are present:

flags: ... avx avx2 bmi bmi2 popcnt ...
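
Equivalently (assuming GNU grep), you can filter the output down to just the relevant flags:

$ grep -wo 'avx\|avx2\|bmi\|bmi2\|popcnt' /proc/cpuinfo | sort -u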

If you have neither 'avx' nor 'avx2', you will need to compile for 'nehalem'. If you have 'avx' but not 'avx2', you should compile for 'sandybridge'. If you have both, you can use the default make targets. Build instructions are in the next section.

STOKE is supported on the latest Ubuntu LTS release; in practice, it will also run on Ubuntu 13.10+ and on Debian testing. If you're trying to get STOKE to work on another Linux distribution, having the right version of g++ is key. STOKE officially supports g++ 4.8.2 only (the version shipped with Ubuntu 14.04). It should also work on 4.9.1 (the version currently in Debian testing), but in the past this has required minor tweaks. g++ 4.7.x and older definitely will not work.

Most of STOKE's software dependencies are available through apt. These can be satisfied by typing:

$ sudo apt-get install git subversion flex bison ccache doxygen g++ g++-multilib ghc libghc-regex-tdfa-dev libghc-regex-compat-dev libghc-split-dev cmake libjsoncpp-dev

The rest of the dependencies will be fetched automatically as part of the build process.

Downloading and Building STOKE

The entire STOKE code base is available on GitHub under the Apache Software License, version 2.0. To clone a copy of the source code, type:

$ git clone https://github.com/StanfordPL/stoke-release

The remainder of STOKE's software dependencies are hosted on GitHub and will be downloaded automatically the first time that STOKE is built. To build STOKE, type the command appropriate for your system (the default target is Haswell):

$ make
$ make sandybridge
$ make nehalem

To add STOKE and its related components to your path, type:

$ export PATH=$PATH:/<path_to_stoke>/bin

Setting the path is important for the testing tools to run. To run the tests, choose the appropriate command:

$ make test
$ make sandybridge_test
$ make nehalem_test

The files generated during the build process can be deleted by typing:

$ make clean

To delete STOKE's github-hosted software dependencies as well (this is useful if an error occurs during the first build), type:

$ make dist_clean

Using STOKE

The following toy example shows a typical workflow for using STOKE. All of the following code can be found in the examples/tutorial/ directory. Consider a C++ program that repeatedly counts the number of bits (population count) in the 64-bit representation of an integer. (Keeping track of a running sum prevents g++ from eliminating the calls to popcnt() altogether.)

// main.cc

#include <cstdlib>
#include <stddef.h>
#include <stdint.h>

using namespace std;

size_t popcnt(uint64_t x) {
  int res = 0;
  for ( ; x > 0; x >>= 1 ) {
    res += x & 0x1ull;
  }
  return res;
}

int main(int argc, char** argv) {
  const auto itr = atoi(argv[1]);

  auto ret = 0;
  for ( auto i = 0; i < itr; ++i ) {
    ret += popcnt(i);
  }

  return ret;
}

STOKE is a compiler- and programming-language-agnostic optimization tool. It can be applied to any x86_64 ELF binary. Although this example uses the GNU toolchain, nothing prevents the use of other tools. To build this code with full optimizations, type:

$ g++ -std=c++11 -O3 -fno-inline main.cc

To measure runtime, type:

$ time ./a.out 100000000

real  0m1.046s
user  0m1.047s
sys	  0m0.000s

A profiler will reveal that the runtime of ./a.out is dominated by calls to the popcnt() function. STOKE can be used to improve the implementation of this function as follows. The first step is to disassemble the program by typing:

$ stoke extract -i ./a.out -o bins

This will produce a directory named bins that contains the text of every function contained in the binary ./a.out.

Help for stoke or any of its subcommands can be obtained by typing:

$ stoke -h
$ stoke <subcommand> -h

STOKE can accept arguments either through the command line or through a configuration file. The invocation of stoke extract shown above is equivalent to the following:

$ stoke extract --config extract.conf

where extract.conf contains:

##### stoke extract config file

-i ./a.out # Path to the elf binary to disassemble
-o bins # Path to the directory to store disassembled text in

Every STOKE subcommand can be used to generate example configuration files by typing:

$ stoke <subcommand> --example_config <path/to/file.conf>
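
For example, to write a starting-point configuration for stoke extract that can then be edited by hand, type:

$ stoke extract --example_config extract.conf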

Because main.cc was compiled using g++, the text of the popcnt() function will appear under the mangled name _Z6popcntm in bins/_Z6popcntm.s.

  .text
  .globl _Z6popcntm
  .type _Z6popcntm, @function
_Z6popcntm:
  xorl   %eax,%eax
  testq  %rdi,%rdi
  je     .L_4005b0
  nop
.L_4005a0:
  movq   %rdi,%rdx
  andl   $0x1,%edx
  addq   %rdx,%rax
  shrq   $0x1,%rdi
  jne    .L_4005a0
  retq
.L_4005b0:
  retq
  nop
  nop
  .size _Z6popcntm, .-_Z6popcntm

The next step is to generate a set of testcases for guiding STOKE's search procedure. These can be obtained by typing:

$ stoke testcase --config testcase.conf

where testcase.conf contains:

##### stoke testcase config file

--bin ./a.out # The name of the binary to use to generate testcases 
--args 10000000 # Command line arguments that should be passed to ./a.out

-o popcnt.tc # Path to file to write testcases to

--fxn _Z6popcntm # The name of the function to generate testcases for
--max_testcases 1024 # The maximum number of testcases to generate. 

The resulting file will contain 1024 entries, each of the form:

Testcase 0:

%rax     00 00 00 00 00 98 96 80
%rcx     00 00 00 00 00 00 00 00
%rdx     00 00 00 00 00 00 00 0a
%rbx     00 00 00 00 00 00 00 01
%rsp     00 00 7f ff 97 44 36 28
%rbp     00 00 00 00 00 00 00 00
%rsi     19 99 99 99 99 99 99 99
%rdi     00 00 00 00 00 00 00 00
%r8      00 00 2a c9 68 1a 50 40
%r9      00 00 7f ff 97 44 46 01
%r10     00 00 00 00 00 98 96 80
%r11     00 00 00 00 00 00 00 0a
%r12     00 00 00 00 00 98 96 80
%r13     00 00 7f ff 97 44 37 20
%r14     00 00 00 00 00 00 00 00
%r15     00 00 00 00 00 00 00 00

%ymm0    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff 00 00
%ymm1    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2f 2f 2f 2f 2f 2f 2f 2f 2f 2f 2f 2f 2f 2f 2f 2f
%ymm2    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm3    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff 00 00 00 00 00 00 00 ff
%ymm4    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm5    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm6    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm7    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm8    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm9    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm10   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm11   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm12   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm13   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm14   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
%ymm15   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

%cf      0 
%1       1 
%pf      1 
%0       0 
%af      0 
%0       0 
%zf      0 
%sf      0 
%tf      0 
%if      1 
%df      0 
%of      0 
%iopl[0] 0 
%iopl[1] 0 
%nt      0 
%0       0 
%rf      0 
%vm      0 
%ac      0 
%vif     0 
%vip     0 
%id      0 

[ 00007fff 97443630 - 00007fff 97443620 ]
[ 1 valid rows shown ]

00007fff 97443628   d d d d d d d d   00 00 00 00 00 40 04 6c

[ 00000000 00000000 - 00000000 00000000 ]
[ 0 valid rows shown ]

Each entry corresponds to the hardware state that was observed just prior to an execution of the popcnt() function. The rows above the memory section represent the contents of the general purpose, sse, and eflags registers; the remaining rows represent the contents of memory, both on the stack and the heap. Memory is shown eight bytes at a time, and a block of eight bytes appears only if the target dereferenced at least one of those bytes. Each row contains values and state flags. Bytes are flagged as either (v)alid (the target dereferenced this byte), (d)efined (the target read this byte prior to writing its value), or (.) invalid (the target did not dereference this byte).

Each of the random transformations performed by STOKE is evaluated with respect to the contents of this file. Rewrites are compiled into a sandbox and executed beginning from the machine state represented by each entry. Rewrites are only permitted to dereference defined locations: registers that are flagged as def_in (see search.conf, below), memory locations that are flagged as 'd', and locations that were written previously. Rewrites are permitted to write values to all registers and to any memory location that is flagged as valid.

The STOKE sandbox will safely halt the execution of rewrites that perform undefined behavior. This includes leaving registers in a state that violates the x86_64 callee-save ABI, dereferencing invalid memory, performing a computation that results in a floating-point exception, or becoming trapped in a loop that executes more than max_jumps jumps (see search.conf, below).

The final step is to use these testcases and the target code contained in bins/_Z6popcntm.s to run STOKE search by typing:

$ stoke search --config search.conf

where search.conf contains:

##### stoke search config file

--out result.s # Path to write results to

--target bins/_Z6popcntm.s # Path to the function to optimize
--init empty # Begin search from all nops 

--def_in "{ %rax %rdi }" # The registers that are defined on entry to the target
--live_out "{ %rax }" # The registers that are live on exit from the target

--testcases popcnt.tc # Path to testcase file
--training_set "{ 0 ... 7 }" # Testcases to use for measuring correctness during search
--test_set "{ 8 ... 1023 }"  # Testcases to use as holdout set for checking correctness

--distance hamming # Metric for measuring error between live-outs
--relax_reg # Allow partial credit for results that appear in wrong locations
--misalign_penalty 1 # Penalty for results that appear in the wrong location
--reduction sum # Method for summing errors across testcases
--sig_penalty 9999 # Score to assign to rewrites that produce non-zero signals

--perf latency # Measure performance by summing instruction latencies

--cpu_flags "{ popcnt }" # cpuid flags to use when proposing instructions
--mem_read # Propose instructions that read memory
--mem_write # Propose instructions that write memory

--global_swap_mass 0 # Proposal mass
--instruction_mass 1 # Proposal mass
--local_swap_mass 1 # Proposal mass
--opcode_mass 1 # Proposal mass
--operand_mass 1 # Proposal mass
--resize_mass 0 # Proposal mass

--nop_percent 80 # Percent of instruction moves that produce nop
--beta 1 # Search annealing constant
--max_instrs 8 # The maximum number of instructions allowed in a rewrite

--statistics_interval 100000 # Print statistics every 100k proposals
--timeout 1000000 # Propose 1m modifications before giving up
--timeout_action testcase # Try adding a new testcase from the testset when search times out
--timeout_cycles 16 # Timeout up to 16 times before giving up

--strategy hold_out # Verify results using a larger hold out testcase set

STOKE search will produce two types of status messages. Progress updates are printed whenever STOKE discovers a new lowest cost verified or unverified rewrite. The code shown on the left is the lowest cost rewrite discovered so far, which is not necessarily equivalent to the target; the code shown on the right is the lowest cost rewrite that is correct with respect to the current set of testcases.

Progress Update: 

Lowest Cost Discovered (9)         Lowest Known Correct Cost (15)     
                                                         
btrq $0xffffffffffffffc0, %rdi     testq %rdi, %rdi      
retq                               je .L_X64ASM_0        
                                   xorl %eax, %eax       
                                   .L_X64ASM_1:          
                                   movl %edi, %edx       
                                   andl $0x1, %edx       
                                   addl %edx, %eax       
                                   shrq $0x1, %rdi       
                                   jne .L_X64ASM_1       
                                   cltq                  
                                   retq                  
                                   .L_X64ASM_0:          
                                   xorl %eax, %eax       
                                   retq

Statistics updates will be printed every statistics_interval proposals. Statistics are shown for the number of proposals that have taken place, elapsed time, proposal throughput, and for each of the transformations specified to have non-zero mass in search.conf.

Statistics Update: 

Iterations:   100000
Elapsed Time: 0.0836948s
Iterations/s: 1.19482e+06

Move Type       Proposed     Succeeded     Accepted     
                                                        
Instruction     16.791%      5.83%         2.009%       
Opcode          16.646%      8.857%        4.013%       
Operand         16.593%      10.444%       6.864%       
Resize          16.611%      0.791%        0.789%       
Local Swap      16.597%      1.556%        1.128%       
Global Swap     16.762%      7.066%        6.08%     
Extension       0%           0%            0%

Total           100%         34.544%       20.883%

When search has run to completion, STOKE will write the lowest cost verified rewrite that it discovered to result.s. Because this is a particularly simple example, STOKE is almost guaranteed to produce the optimal rewrite:

  .text
  .globl _Z6popcntm
  .type _Z6popcntm, @function
_Z6popcntm:
  popcnt %rdi, %rax
  retq
  .size _Z6popcntm, .-_Z6popcntm

This result can then be patched back into the original binary by typing:

$ stoke replace --config replace.conf

where replace.conf contains:

##### stoke replace config file

-i ./a.out # Path to the elf binary to patch
--rewrite result.s # Path to the replacement function

And runtime can once again be measured by typing:

$ time ./a.out 100000000

real  0m0.133s
user  0m0.109s
sys	  0m0.000s    

As expected, the results are close to an order of magnitude faster than the original.

Using the Formal Validator

This release of STOKE includes a formal validator. Its design and interface are detailed in the src/validator/README.md file. To use the formal validator instead of hold-out testing, specify --strategy formal for any STOKE binary that you use (a configuration sketch appears after the list of limitations below).

An example of using the validator can be found in the examples/pairity folder; this example has a Makefile much like the tutorial's and should be easy to follow. The key difference is that the pairity example does not use testcases to guide the search. Instead, after producing a candidate rewrite, the validator checks it for equivalence against the target. If the two are not equivalent, a counterexample is found and added as a new testcase to help guide the search.

There are some important limitations to keep in mind while using the validator:

  • Only some instructions are supported. The --validator_must_support flag can be used to only propose instructions that can be validated.
  • Only the general purpose registers, SSE registers (ymm0-ymm15) and five of the status flags (CF, SF, PF, ZF, OF) are supported.
  • There is basic, experimental support for memory, but memory is not supported as input or output: the validator will check the equivalence of codes that use memory internally, but it will not check that they leave the machine's memory in the same state. For example, if a program writes to memory and later reads that value back, the validator handles this correctly; but if two programs modify memory, the validator will not check whether their modifications are equivalent.
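
For example, the tutorial's search.conf can be adapted for formal verification with changes along these lines (a sketch; all other options stay as shown earlier):

--strategy formal # Verify candidate rewrites with the formal validator
--validator_must_support # Only propose instructions that the validator supports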

Additional Features

In addition to the subcommands described above, STOKE has facilities for debugging and benchmarking the performance of each of its core components:

  • stoke debug cfg: Generate a pdf of a control flow graph.
  • stoke debug cost: Compute the cost of a rewrite.
  • stoke debug sandbox: Step through the execution of a rewrite.
  • stoke debug search: View the changes produced by performing and undoing a program transformation.
  • stoke debug state: Check the behavior of operators that manipulate hardware machine states.
  • stoke debug verify: Check the equivalence of two programs.
  • stoke benchmark cfg: Measure the time required to recompute a control flow graph.
  • stoke benchmark cost: Measure the time required to compute a cost function.
  • stoke benchmark sandbox: Measure the time required to execute a program in a STOKE sandbox.
  • stoke benchmark search: Measure the time required to perform and undo a transformation to a program.
  • stoke benchmark state: Measure the time required to reset the memory of a hardware machine state.
  • stoke benchmark verify: Measure the time required to check the equivalence of two programs.

STOKE also comes with support for bash and zsh completion. To enable either, type:

$ make bash_completion
$ make zsh_completion

Extending STOKE

This repository contains a minimal implementation of STOKE as described in the ASPLOS 2013, OOPSLA 2013, and PLDI 2014 papers. Most, but not all, of the features described in those papers appear here. Some of the more experimental features are not yet ready for public release and have not been provided. Developers who are interested in refining these features or adding extensions of their own are encouraged to modify this implementation as described below.

Code Organization

The STOKE source is organized into modules, each of which corresponds to a subdirectory of the src/ directory:

  • src/analysis: An aliasing analysis used by the validator.
  • src/cfg: Classes for representing and manipulating control flow graphs.
  • src/cost: Classes for computing cost functions.
  • src/ext: External dependencies.
  • src/sandbox: Classes for safely executing random code sequences.
  • src/search: Classes for performing MCMC sampling.
  • src/state: Classes for representing and manipulating hardware machine states.
  • src/symstate: Classes for modeling the symbolic state of the hardware, used by the formal validator.
  • src/target: Code to find which instruction sets the CPU supports.
  • src/tunit: Classes for representing translation units (named instruction sequences).
  • src/verifier: Wrappers around verification techniques such as hold-out testing and formal validation.
  • src/validator: The formal validator for proving two codes equivalent.

Initial Search State

Initial state types are defined in src/search/init.h along with an additional type for user-defined extensions.

enum class Init {
  EMPTY,
  ZERO,
  TARGET,
  PREVIOUS,

  // Add user-defined extensions here ...
  EXTENSION
};

Initial state is specified using the --init command line argument, which controls the initial values given to the current, lowest cost, and lowest cost correct search states. This value affects the behavior of the Search::configure() const method, which dispatches to the family of Search::configure_xxxxx() const methods. User-defined extensions should be placed in the Search::configure_extension() const method, which can be triggered by specifying --init extension.

void Search::configure_extension(const Cfg& target, SearchState& state) const {
  // Add user-defined logic here ...

  // Invariant 1: Search state should agree with target on boundary conditions.
  assert(state.current.def_ins() == target.def_ins());
  assert(state.current.live_outs() == target.live_outs());

  assert(state.best_yet.def_ins() == target.def_ins());
  assert(state.best_yet.live_outs() == target.live_outs());

  assert(state.best_correct.def_ins() == target.def_ins());
  assert(state.best_correct.live_outs() == target.live_outs());

  // Invariant 2: Search state must be in a valid state. This function isn't on
  // a critical path, so this can safely be accomplished by calling
  state.current.recompute();
  state.best_yet.recompute();
  state.best_correct.recompute();

  // Invariant 3: Search state must agree on first instruction. This instruction
  // must be the label definition that appears in the target.
  assert(state.current.get_code()[0] == target.get_code()[0]);
  assert(state.best_yet.get_code()[0] == target.get_code()[0]);
  assert(state.best_correct.get_code()[0] == target.get_code()[0]);

  // See Search::configure for additional invariants
}

Search Transformations

Transformation types are defined in src/search/move.h along with an additional type for user-defined extensions.

enum class Move {
  INSTRUCTION = 0,
  OPCODE,
  OPERAND,
  RESIZE,
  LOCAL_SWAP,
  GLOBAL_SWAP,

  // Add user-defined extensions here ...
  EXTENSION,

  NUM_MOVES
};

Transformations are specified using the family of --xxxxx_mass command line arguments. These values control the distribution of proposals that are made by the Transforms::modify() method and undone by the Transforms::undo() method, which dispatch to the family of Transforms::xxxxx_move() and Transforms::undo_xxxxx_move() methods respectively. User-defined extensions should be placed in the Transforms::extension_move() and Transforms::undo_extension_move() methods, which can be triggered by specifying a non-zero --extension_mass.

bool Transforms::extension_move(Cfg& cfg) {
  // Add user-defined implementation here ...

  // Invariant 1:
  // If this method returns true, it should leave this class in a state such
  // that calling undo_extension_move() will revert cfg to its original state.

  // Invariant 2:
  // If this method returns false, it must leave cfg in its original state.

  // Invariant 3:
  // If validator_ is non-null, validator_->is_sound(instr) must hold true for
  // all instructions instr upon return.  (You can assume this holds at the
  // beginning).

  // Invariant 4:
  // Transformations must preserve the first instruction in a code sequence
  // which should be a label that represents the name of a function.

  return false;
}
void Transforms::undo_extension_move(Cfg& cfg) {
  // Add user-defined implementation here ...

  // Invariant: If the previous invocation of extension_move() returned true, this
  // method must return cfg to its original state. 
  
  return;
}

As above, both performing and undoing a transformation should leave the control flow graph in a valid state. In general, this can be accomplished by invoking the Cfg::recompute() method. However, because these methods are on STOKE's critical path, the faster Cfg::recompute_defs() method should be used for transformations that do not modify the control flow structure and only potentially invalidate data-flow values.

Performance Term

Performance term types are defined in src/cost/performance_term.h along with an additional type for user-defined extensions.

enum class PerformanceTerm {
  NONE,
  SIZE,
  LATENCY,

  // Add user-defined extensions here ...
  EXTENSION
};

Performance term type is specified using the --perf command line argument. This value controls the behavior of the CostFunction::evaluate_performance() const method, which dispatches to the family of CostFunction::xxxxx_performance() const methods. User-defined extensions should be placed in the CostFunction::extension_performance() const method, which can be triggered by specifying --perf extension.

Cost CostFunction::extension_performance(const Cfg& cfg) const { 
  Cost res = 0;                                                  

  // Add user-defined implementation here ... 

  // Invariant: Return value should not exceed max_performance_cost 
  assert(res <= max_performance_cost); 

  return res;  
}
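
As a concrete illustration, a size-style term (similar in spirit to the built-in --perf size option) could charge one unit of cost per instruction in the rewrite. This is only a sketch; it assumes that Cfg::get_code() returns a container with a size() method, as its use in the examples above suggests.

Cost CostFunction::extension_performance(const Cfg& cfg) const {
  // Sketch: charge one unit of cost per instruction in the rewrite.
  Cost res = cfg.get_code().size();

  // Invariant: Return value should not exceed max_performance_cost
  if (res > max_performance_cost) {
    res = max_performance_cost;
  }

  return res;
}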

Correctness Term

Correctness term types are defined in src/cost/reduction.h along with an additional type for user-defined extensions.

enum class Reduction {
  SUM,
  MAX,

  // Add user-defined extensions here ...
  EXTENSION
};

Correctness term type is specified using the --reduction command line argument. This value controls the behavior of the CostFunction::evaluate_correctness() method, which dispatches to the family of CostFunction::xxxxx_correctness() methods, each of which represents a method for aggregating errors observed across testcases. User-defined extensions should be placed in the CostFunction::extension_correctness() method, which can be triggered by specifying --reduction extension.

Cost CostFunction::extension_correctness(const Cfg& cfg, Cost max) {                                             
  Cost res = 0;                                                                                                  

  // Add user-defined implementation here ... 

  // This method is not required to examine all testcases. Implementations                                       
  // that compute res iteratively may stop executing and return max once res                                     
  // equals or exceeds that value.                                                                               

  // Invariant 1: Return value should not exceed max_correctness_cost                                            
  assert(res <= max_correctness_cost);                                                                           
    
  return res;                                                                                                    
}

Live-out Error

Live-out error measurement types are defined in src/cost/distance.h along with an additional type for user-defined extensions.

enum class Distance {
  HAMMING,
  ULP,

  // Add user-defined extensions here ...
  EXTENSION
};

Measurement type is specified using the --distance command line argument. This value controls the behavior of the CostFunction::evaluate_distance() const method, which dispatches to the family of CostFunction::xxxxx_distance() const methods, each of which represents a method for computing the distance between 64-bit values. User-defined extensions should be placed in the CostFunction::extension_distance() const method, which can be triggered by specifying --distance extension.

Cost CostFunction::extension_distance(uint64_t x, uint64_t y) const {                                            
  Cost res = 0;

  // Add user-defined implementation here ...

  // Invariant 1: Return value should not exceed max_error_cost
  assert(res <= max_error_cost);

  return res;                                                                                                   
}
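
For reference, the built-in hamming metric counts the bit positions at which two 64-bit values differ. A user-defined distance in the same style might look like the following sketch (which assumes max_error_cost is at least 64):

Cost CostFunction::extension_distance(uint64_t x, uint64_t y) const {
  // Count the number of bit positions at which x and y disagree.
  Cost res = __builtin_popcountll(x ^ y);

  // Invariant 1: Return value should not exceed max_error_cost
  assert(res <= max_error_cost);

  return res;
}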

Verification Strategy

Verification strategy types are defined in src/verifier/strategy.h along with an additional type for user-defined extensions.

enum class Strategy {
  NONE,
  HOLD_OUT,
  FORMAL,

  // Add user-defined extensions here ...
  EXTENSION
};

Strategy type is specified using the --strategy command line argument. This value controls the behavior of the Verifier::verify() method, which dispatches to the family of Verifier::xxxxx_verify() methods, each of which represents a method for verifying correctness. User-defined extensions should be placed in the Verifier::extension_verify() method, which can be triggered by specifying --strategy extension.

bool Verifier::extension_verify(const Cfg& target, const Cfg& rewrite) {                                         
  // Add user-defined implementation here ...

  // Invariant 1. If this method returns false and is able to produce a
  // counter example explaining why, counter_example_available_ should be
  // set to true.

  // Invariant 2. If this method returns false, and it is able (see above),
  // counter_example_ should be set to a CpuState that will cause target and
  // rewrite to produce different values.

  // Invariant 3.  If this method encounters an error, it should set the
  // error_ member variable to a non-empty string; otherwise the error_
  // member should be empty.

  return true;
} 

Command Line Args

Command line arguments can be added to any of the STOKE subcommands using the following syntax. Argument separators, which are printed as part of help messages, are specified by defining a heading variable:

auto& heading = Heading::create("Heading Description:");

Command line flags are specified by declaring a FlagArg.

auto& flag = FlagArg::create("flag_name")
  .alternate("alternate_flag_name")
  .description("What this flag does");

Arguments of any built-in C++ primitive type are specified by declaring a ValueArg.

auto& val = ValueArg<int>::create("value_name")
  .alternate("alternate_value_name")
  .usage("<int>")
  .description("What this value represents")
  .default_val(0);

User-defined types are specified by additionally providing function objects that define I/O methods.

struct T {
  int x, y, z;
};

struct Reader {
  void operator()(istream& is, T& t) const {
    is >> t.x >> t.y >> t.z;
  }
};

struct Writer {
  void operator()(ostream& os, const T& t) const {
    os << t.x << " " << t.y << " " << t.z;
  }
};

auto& val = ValueArg<T, Reader, Writer>::create("value_name")
  .alternate("alternate_value_name")
  .usage("<int> <int> <int>")
  .description("What this value represents")
  .default_val({0,0,0});

For complex values that are better suited to being read from files, a FileArg may be more appropriate than a ValueArg. The syntax is identical.

auto& val = FileArg<Complex, ComplexReader, ComplexWriter>::create("value_name")
  .alternate("alternate_value_name")
  .usage("<complex representation>")
  .description("What this value represents")
  .default_val(Complex());

Frequently Asked Questions

To appear.

Feedback

Questions and comments are encouraged. The best way to contact the developers is through the built-in GitHub issue tracker.
