grigory-rechistov / interpreters-comparison Goto Github PK

Sample programs for comparison of different VM interpretation techniques

License: BSD 3-Clause "New" or "Revised" License

Makefile 1.52% C 96.42% Shell 1.20% Batchfile 0.86%

interpreters-comparison's Introduction

Compare Performance and Complexity of Software Interpreters of a Virtual Machine

Sample programs for comparison of different VM interpretation techniques. See the Makefile for the full list of targets.

switched - switched interpreter
threaded - threaded interpreter
predecoded - switched interpreter with preliminary decoding phase
subroutined - subroutined interpreter
threaded-cached - threaded interpreter with pre-decoding.
tailrecursive - subroutined interpreter with tail-call optimization
translated - binary translator to Intel 64 machine code
native - a static implementation of the test program in C

Build

Just type make. For Visual Studio builds, open corresponding project or solution files.

Use make sanity to perform a quick check of all variants.

Measure performance

Use ./measure.sh to measure run time of individual binaries or to perform a comparison of all techniques (alternatively, run make all measure).

The graph plotting part of the script uses Gnuplot and AWK.

Supported Environments

Tested to compile and run with GCC 4.8.1, GCC 5.1.0 and ICC 15.0.3 on Ubuntu Linux 12.04.5. Limited testing was also done on Windows 8.1 Cygwin64 environment, GCC 4.8.
Limited support for Visual Studio builds is provided. Certain types of interpreters will not build because of the CL compiler limitations or source code dependencies on GCC language extensions.

References

An article discussing structure, performance variations, and comparison of interpreters: http://habrahabr.ru/company/intel/blog/261665/ (in Russian).

interpreters-comparison's People

Contributors

Stargazers

Watchers

Forkers

nextgenintelligence sash-tsvet vsevolod-livinskij ncos thar0l kurapov-peter anastasiashevchenko vipon russianriver

interpreters-comparison's Issues

Add new benchmarks to the Program suite

Add one to three programs to the suite.

Calculate an approximation of square root of given number by Newton-Raphson method
Sort an array by any known method (bubble, quicksort, heapsort etc)
Looking for a substring match inside a given string

Additionally, any other computationally-intensive algorithm will be good as well

Make `translated` variant work in Cygwin

Curently mprotect() used in translated fails when attempted to be executed in Cygwin build. The task is to understand why it fails and fix it, possibly by rewriting the procedure for allocation of write-executable memory.

Create fully-inlined translated version

Currently translated variant uses MOV; CALL generated sequences to drive simulation to individual service routines. A fully inlined variant should insert copies of service routines in the generated code section. This will save a pair of call/return.

The steps required to complete the task:

Create relocatable service routines (possibly in assembler)
Make sure the state is passed and modified correctly
Modify the translated to copy capsules instead of CALL machine code.

Implement threaded-subroutied mode

The code already demonstrates threaded interpreter, and subroutined one. The task is to combine both techniques and make a threaded-subroutined one.

Make project code build and work as Microsoft Visual Studio project

Currently the project is only buildable/runnable on Linux GCC/ICC (possibly Clang) toolchain and buildable (with limited operation supported) on Cygwin on Windows. Native Windows support is needed.

The task is:

Use latest MS VS to create a set of projects/solutions for the code.
Use MS compiler to build as many variants of interpreters as possible.
Make tests run for them

Add guest instruction count statistics collection and display

Add a mechanism to count how many times each of simulated opcode got executed.

Requirements for a solution

It should only be active when INSTR_STAT macro is defined. I.e., it should be conditionally built in.
It should maintain an array of counters to keep statistics for individual opcodes. After every instruction is simulated, the corresponding counter should be incremented by one.
At the end of simulation, the results table should be produced. It should look like this:

     Opcode    Count
     -------------------
     Nop       1000
     Halt         1
     Push      1234
     ...

It can be implemented in a generic manner to support all variants of interpreters

After the implementation is ready, it will be nice to compare performance of two builds - interpreter with stats enabled and disabled, to see the performance overhead such collection creates.

Add ability to load programs from external file

At the moment the only program the simulator can execute is statically defined as Program array during compilation. It would be great to be able to load external programs from files. That would add a lot of flexibility to the simulator.

Steps required:

Add a new option to argc, argv parsing for optional file name, like ./switched 1000 file.raw
1. If the option is present, the file contents is loaded to the program memory as-is starting from address 0x0
2. If no option is present, the default program is used
Proper error checking should be implemented:
1. If the file cannot be read (missing/no permissions), an error is reported
2. If the file is shorter than memory length, the rest of it should be padded with zeroes
3. If file is too big (contains more words than program memory should hold), an error or at least a warning should be produced. At no circumstances buffer overflow may happen.
4. (optional) If file contains instructions that are not valid for the processor, a warning (not error) should be produced
A test should be prepared that showcases the new functionality with at least one external file with some sort of a valid program.

The file loading functionality is generic, i.e., it does not depend on used interpreter mode, so it can/should be done for all variants of the interpreter/translator

Create double-threaded (twice unrolled) interpreter model

From the article:

Наверное, эту идею можно развить и дальше — помочь предсказателю переходов правильно запоминать историю исполнения троек, четвёрок и т.д. за счёт соответствующего «разбухания» кода. Например, иметь по две копии всех сервисных процедур, и внутри DISPATCH выбирать только одну из них, в зависимости от кода предыдущей инструкции и её адреса, или какого-то другого критерия. Однако оставлю это в качестве упражнения заинтересовавшимся исследователям.

The task is to implement this mode: double-threaded interpreter

Create an assembler for the VM

Write a separate program that takes an assembler listing and generates a raw machine code program for the VM from it.

It should be able to:

Process all instructions defined in the original VM spec.
Support labels for jumps
Provide sane error reporting for invalid inputs (line number, problem description)

Add new arithmetic instructions to all interpreter variants

Currently the VM has 18 instructions. The task is to add and test implementation of new instructions:

And (a b -- a & b) - logical AND
Or (a b -- a | b) - logical OR
Xor (a b -- a ^ b) - logical Exsclusive OR
SHL (a shift -- a << shift) - Shift left
SHR (a shift -- a >> shift) - Shift right
SQRT (x -- √x̅) - find integer square root
Rot ( a b c -- b c a ) - rotate three top items on the stack
Pick ( a0 .. an n -- a0 .. an a0 ) - pick n-th item from the stack

Add guest PC values statistics collection and display

Add a mechanism to count how often the simulated CPU visits different program addresses by counting PC register values.

Requirements for a solution

It should only be active when INSTR_STAT macro is defined. I.e., it should be conditionally built in.
It should maintain an array of counters to keep statistics for individual PC values. After every instruction is simulated, the corresponding counter for its PC should be incremented by one.
At the end of simulation, the results table should be produced. It should look like this:

Address Count

0x0 1000
0x2 1000
0x6 1234
...

*It can be implemented in a generic manner to support all variants of interpreters

After the implementation is ready, it will be nice to compare performance of two builds - interpreter with stats enabled and disabled, to see the performance overhead such collection creates.

Add exceptions mechanism to VM

Currently any "unexpected" situation inside the VM (division by zero, PC address out of range etc) puts it into the Break state. This is convenient for a software VM but unrealistic for a hardware one. For some events, an exception handling mechanism can be used to give target software a chance to correct the state.

The task is to design and implement the exception handling mechanism:

Add a separate return stack to store the PC of instruction that caused a problem.
Define what method of communication for enumerating interruption events should be used (on stack? in register? by indexing the ISR address?).
Define where interrupt routines are placed in the program memory
Add a new instruction ret to pop PC from return stack
Add new instructions to move data between return and data stacks (to be able to manipulate PC)
Add tests

Ultimately, the ISR should be capable to detect what particular interrupt happened and to correct it by either changing data state and/or emulating the instruction and skipping it in the original flow.

Add procedure call return mechanism to several variants of interpreters

Currently the VM lacks any procedure mechanism. The task is to add implementation and tests:

Add a separate return stack to store addresses
Add new instructions:
- call imm - pushes current PC on return stack and jumps to PC+imm
- ret - pops new PC value from stack