skx / simple.vm

456.0 24.0 50.0 176 KB

Simple virtual machine which interprets bytecode.

License: GNU General Public License v2.0

C 62.66% Perl 34.96% Emacs Lisp 0.97% Makefile 1.42%

virtual-machine perl c bytecode opcodes register binary-opcodes

simple.vm's Issues

Compiler warning: simple-vm-opcodes.c ignoring return from system()

simple-vm-opcodes.c, lines 583 through 597:

void op_string_system(struct svm *svm)
{
    /* get the reg */
    unsigned int reg = next_byte(svm);
    BOUNDS_TEST_REGISTER(reg);

    if (getenv("DEBUG") != NULL)
        printf("STRING_SYSTEM(Register %d)\n", reg);

    char *str = get_string_reg(svm, reg);
    system(str);

    /* handle the next instruction */
    svm->ip += 1;
}

The clang compiler issues this warning during compilation:

simple-vm-opcodes.c:593:5: warning: ignoring return value of function declared with warn_unused_result attribute [-Wunused-result]
    system(str);
    ^~~~~~ ~~~

Using the return value of system() in the opcode may not be worth doing, or may not be valuable enough, which is fine. Make whatever design decision makes sense for this project.
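For what it's worth, here is one way the return value could be handled if that turns out to be worthwhile. This is a minimal sketch rather than a patch. run_command_status is a hypothetical helper name, and it assumes a POSIX system where the WIFEXITED/WEXITSTATUS macros from <sys/wait.h> are available. op_string_system could call it in place of the bare system(str), which would also silence the warning.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run `cmd` via the shell and return its exit status.
 * Returns -1 if the shell could not be started, or if the command did not
 * exit normally (e.g. it was killed by a signal). */
static int run_command_status(const char *cmd)
{
    int rc = system(cmd);

    if (rc == -1) {
        perror("system");
        return -1;
    }
    if (WIFEXITED(rc))
        return WEXITSTATUS(rc);
    return -1;
}
```

What to do with the status is still the open design decision: print it when DEBUG is set, store it in a register, or ignore it deliberately.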

Segmentation Faults 2017-05-12

Hello,
I was using American Fuzzy Lop (afl-fuzz) to fuzz the input to the simple-vm program on Linux. Is fixing the crashes from these input files something you're interested in? The input files can be found here: https://github.com/rwhitworth/simple.vm-fuzz/tree/master/2017-05-12.

Running ./simple-vm id_filename on any of these files causes a segmentation fault.

Let me know if I can provide any more information to help narrow down this issue.

Add string equality opcode

Once the mess and mass-churn of #2 is resolved I'd like to get back to adding missing primitives.

One that immediately springs to mind is something for comparing registers with fixed strings. We have two CMP instructions at the moment:

cmp reg1, reg2
cmp reg1, 0xFF

The obvious missing case is:

cmp reg1, "Steve"
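Here is a sketch of what the comparison itself might look like. It assumes a simplified register model in which each register carries a type tag plus either an integer or a string. reg_t, TYPE_STRING and cmp_reg_string are hypothetical names, not simple.vm's. The result would become the new Z flag:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified register model; not simple.vm's real structs. */
typedef enum { TYPE_INTEGER, TYPE_STRING } reg_type_t;

typedef struct {
    reg_type_t type;
    unsigned int integer;   /* valid when type == TYPE_INTEGER */
    const char *string;     /* valid when type == TYPE_STRING */
} reg_t;

/* Body of a hypothetical `cmp reg, "literal"` opcode: returns the new Z flag.
 * A register that does not hold a string never compares equal. */
static bool cmp_reg_string(const reg_t *reg, const char *literal)
{
    if (reg->type != TYPE_STRING || reg->string == NULL)
        return false;
    return strcmp(reg->string, literal) == 0;
}
```

Returning false for an integer register keeps cmp non-fatal. Raising a type error instead, in the style of "The register doesn't contain an integer", would be an equally reasonable choice.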

Missing opcodes: is_string/is_integer

Set the Z-flag to true if the given register holds a value of the named type (a string for is_string, an integer for is_integer).
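A minimal sketch, assuming a cut-down CPU struct with a per-register type tag and a Z flag. All names are hypothetical. An out-of-range register simply clears Z here, whereas the real handlers would more likely use BOUNDS_TEST_REGISTER.

```c
#include <stdbool.h>

#define NUM_REGISTERS 10 /* hypothetical register count */

/* Hypothetical, simplified register and CPU model (not simple.vm's structs). */
typedef enum { TYPE_INTEGER, TYPE_STRING } reg_type_t;

typedef struct {
    reg_type_t type;
} reg_t;

typedef struct {
    reg_t regs[NUM_REGISTERS];
    bool z_flag;
} cpu_t;

/* is_string #reg: Z is set only if the register currently holds a string. */
static void op_is_string(cpu_t *cpu, unsigned int reg)
{
    cpu->z_flag = reg < NUM_REGISTERS && cpu->regs[reg].type == TYPE_STRING;
}

/* is_integer #reg: Z is set only if the register currently holds an integer. */
static void op_is_integer(cpu_t *cpu, unsigned int reg)
{
    cpu->z_flag = reg < NUM_REGISTERS && cpu->regs[reg].type == TYPE_INTEGER;
}
```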

Compiling multiple files is broken.

If you compile two scripts in a single invocation, things break when they use labels:

  ./compiler ./examples/jump.in ./examples/quine.in
  ./simple-vm examples/quine.raw 
  ERROR running script - The register doesn't contain an integer

This is a trivial problem, caused by @UPDATES and %LABELS being declared outside the compilation function, so their contents carry over from one file to the next.

Our IP manipulation is... subpar.

Currently there are a bunch of opcode implementations, some of which modify the IP and some of which don't. The main loop looks up the handler for each opcode and invokes it.

The handlers are defined as:

bool do_stuff( svm_t *cpu );

If a handler returns true, it is assumed to have updated the IP itself. Otherwise the main loop increments the IP to point to the next instruction.

Unfortunately this isn't great. Consider the situation where the following program is executed:

  store #1, 0x1234
  exit

This is a four-byte program with two instructions:

  store #1, 0x1234   ->  0x01 0x12 0x34
  exit               ->  0x00

At the start of execution IP=0, and the byte there is 0x01 for int_store. The IP is incremented to read the first byte of the number, then again for the second, leaving IP=0x02.

At this point the handler returns, and the main loop increments the IP to 0x03, where the exit opcode is waiting to be executed. Internally, in the int_store method, we've bumped the IP by two bytes without the main loop knowing about it!

The fix is to let the opcode handlers take full ownership of the IP. The main loop should never bump it. Instead, each handler ends with an explicit cpu->ip++, or sets the IP directly in the case of a jump.
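Below is a minimal sketch of that convention. It assumes a cut-down CPU with a 64k RAM array and a 256-entry handler table. cpu_t, op_int_store, cpu_run and the operand layout (register byte, then high byte, then low byte) are all hypothetical. Every handler leaves the IP pointing at the next opcode, and the main loop never touches it. This matches the "handle the next instruction" step at the end of op_string_system.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGISTERS 10

/* Hypothetical CPU state; uint16_t makes IP arithmetic wrap at 64k. */
typedef struct cpu {
    uint8_t ram[0x10000];
    uint16_t ip;
    unsigned int reg[NUM_REGISTERS];
    bool running;
} cpu_t;

typedef void (*handler_t)(cpu_t *cpu);

/* 0x00 exit: stop the main loop (still stepping past the opcode byte). */
static void op_exit(cpu_t *cpu)
{
    cpu->running = false;
    cpu->ip++;
}

/* 0x01 store #reg, 0xHHLL: consume three operand bytes, then step onto the
 * next opcode. An invalid register is silently ignored in this sketch. */
static void op_int_store(cpu_t *cpu)
{
    uint8_t r  = cpu->ram[++cpu->ip];
    uint8_t hi = cpu->ram[++cpu->ip];
    uint8_t lo = cpu->ram[++cpu->ip];

    if (r < NUM_REGISTERS)
        cpu->reg[r] = ((unsigned int)hi << 8) | lo;
    cpu->ip++;
}

/* Main loop: look up and call the handler; the handler alone moves the IP. */
static void cpu_run(cpu_t *cpu, handler_t const table[256])
{
    cpu->running = true;
    while (cpu->running) {
        handler_t h = table[cpu->ram[cpu->ip]];
        if (h == NULL)
            break;           /* unknown opcode: stop rather than guess */
        h(cpu);
    }
}
```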

Add random opcode.

Once the mess and mass-churn of #2 is resolved I'd like to get back to adding missing primitives.

One that immediately springs to mind is something for generating a random integer.

e.g. INT_RAND().
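Here is a sketch of the value an INT_RAND handler might store. It assumes integers are 16 bits wide, matching the two-byte immediate in `store #1, 0x1234`. int_rand_value is a hypothetical name. rand() is used only because it is standard, and it is not suitable where unpredictability matters. C only guarantees RAND_MAX >= 32767, so two 8-bit chunks are combined to cover the full 0x0000 to 0xFFFF range:

```c
#include <stdlib.h>

/* Hypothetical INT_RAND helper: a pseudo-random value in 0x0000..0xFFFF.
 * Built from two 8-bit pieces because rand() may only supply 15 bits. */
static unsigned int int_rand_value(void)
{
    unsigned int hi = (unsigned int)(rand() & 0xFF);
    unsigned int lo = (unsigned int)(rand() & 0xFF);
    return (hi << 8) | lo;
}
```

A handler would read the destination register byte, store int_rand_value() in it, mark the register as an integer, and advance the IP.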

Strings should not be limited to 255 bytes.

Strings longer than 255 bytes cannot be loaded via "load #x, 'value'". That's not ideal, but it could be tolerable.

More seriously, a string cannot be concatenated past that length; if it is, bad things happen.

I suggest resolving both issues at the same time.
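One way to lift the limit for concatenation is to size the result on the heap from the actual string lengths, rather than using a fixed buffer. concat_strings below is a hypothetical helper, not the project's code. Fixing `load` as well would additionally need a wider length field in the bytecode (for example two bytes instead of one), which is likewise only a suggestion.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenate two strings into a freshly allocated
 * buffer sized from their actual lengths, so there is no 255-byte cap.
 * Returns NULL on allocation failure; the caller frees the result. */
static char *concat_strings(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *out = malloc(la + lb + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1); /* copies b's terminating NUL too */
    return out;
}
```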

Store label offsets in registers.

At the moment we have:

  store #1, 1234
  store #1, "this is a string"

We cannot store the offset of a label, which would be useful. For example:

     :start 
      ...
     :end

This would allow us to work out the length of a program, for example. This is immediately useful in the examples/quine.in sample where the length is hardcoded.
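Supporting this mostly means resolving a label to its address and emitting that address as the immediate value. The compiler is written in Perl, but the usual backpatching technique is sketched here in C for consistency. The steps are:

- Emit a two-byte placeholder wherever a label is referenced.
- Record the placeholder's position.
- Patch it once every label's offset is known.

asm_t, emit_label_ref and the other names are hypothetical. Writing the high byte first is an assumption about the encoding.

```c
#include <stdint.h>
#include <string.h>

#define MAX_LABELS 16
#define MAX_FIXUPS 16

/* Hypothetical assembler state, for illustration only. */
typedef struct { const char *name; uint16_t offset; } label_t;
typedef struct { const char *name; uint16_t at; } fixup_t;

typedef struct {
    uint8_t out[256];             /* emitted bytecode */
    uint16_t len;                 /* current output offset */
    label_t labels[MAX_LABELS];
    int nlabels;
    fixup_t fixups[MAX_FIXUPS];   /* placeholders still waiting for an address */
    int nfixups;
} asm_t;

/* ":name" - remember the current output offset under this label. */
static int define_label(asm_t *a, const char *name)
{
    if (a->nlabels == MAX_LABELS)
        return -1;
    a->labels[a->nlabels++] = (label_t){ name, a->len };
    return 0;
}

/* Emit a two-byte placeholder for a label's address and record its position. */
static int emit_label_ref(asm_t *a, const char *name)
{
    if (a->nfixups == MAX_FIXUPS || (size_t)a->len + 2 > sizeof a->out)
        return -1;
    a->fixups[a->nfixups++] = (fixup_t){ name, a->len };
    a->out[a->len++] = 0;
    a->out[a->len++] = 0;
    return 0;
}

/* Once all labels are known, patch each placeholder (high byte first). */
static int resolve_fixups(asm_t *a)
{
    for (int i = 0; i < a->nfixups; i++) {
        int found = 0;
        for (int j = 0; j < a->nlabels && !found; j++) {
            if (strcmp(a->fixups[i].name, a->labels[j].name) == 0) {
                uint16_t off = a->labels[j].offset;
                a->out[a->fixups[i].at]     = (uint8_t)(off >> 8);
                a->out[a->fixups[i].at + 1] = (uint8_t)(off & 0xFF);
                found = 1;
            }
        }
        if (!found)
            return -1;   /* reference to an undefined label */
    }
    return 0;
}
```

With :start at the first byte and :end after the last, the program length is the difference between the two patched offsets.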

Move to using function-pointers.

Rather than having a giant switch statement, with a case for each opcode, we should break out the implementations of the opcodes into their own functions.

If we define an array of 256 opcode handlers (one per possible byte value) in the svm_t structure, then we can use function pointers to invoke the correct function.

Pros
- The code becomes more readable.
- We could have a default handler which dumps "unknown instruction" for debugging purposes.
- This would allow host-applications to define host-specific opcodes.
Cons
- Significant churn.
- Wastes 256 * (function-pointer size) bytes per svm_t instance.
- There is some repetition in setting up the opcode handlers: each one must be defined and then registered in the table.
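Here is a sketch of the table-driven dispatch proposed here, including the default "unknown instruction" handler and a host-registered opcode. The names are hypothetical. The handler signature takes the opcode byte as an extra argument purely so that one default handler can report which byte it hit:

```c
#include <stdio.h>

typedef struct svm svm_t;

/* Hypothetical handler type: the opcode byte is passed in so that a shared
 * default handler can report which byte it did not recognise. */
typedef void (*opcode_handler)(svm_t *svm, unsigned char op);

/* Cut-down VM state: one handler slot per possible opcode byte (256). */
struct svm {
    opcode_handler opcodes[256];
    int unknown_count;   /* illustration only */
    int host_calls;      /* illustration only */
};

/* Default handler: report the bad byte instead of crashing. */
static void op_unknown(svm_t *svm, unsigned char op)
{
    fprintf(stderr, "unknown instruction: 0x%02X\n", (unsigned int)op);
    svm->unknown_count++;
}

/* Example of an opcode a host application might add. */
static void op_host_example(svm_t *svm, unsigned char op)
{
    (void)op;
    svm->host_calls++;
}

static void svm_init_opcodes(svm_t *svm)
{
    for (int i = 0; i < 256; i++)
        svm->opcodes[i] = op_unknown;
}

static void svm_register_opcode(svm_t *svm, unsigned char op, opcode_handler fn)
{
    svm->opcodes[op] = fn;
}

/* The giant switch collapses to one indirect call. */
static void svm_dispatch(svm_t *svm, unsigned char op)
{
    svm->opcodes[op](svm, op);
}
```

On a typical 64-bit host with 8-byte pointers, the table costs 256 * 8 = 2048 bytes per instance.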

Compiler warning: main.c ignoring return from fread()

main.c line 68:
fread(code, 1, size, fp);

The clang compiler issues this warning during compilation:

main.c:68:5: warning: ignoring return value of function declared with warn_unused_result attribute [-Wunused-result]
    fread(code, 1, size, fp);
    ^~~~~ ~~~~~~~~~~~~~~~~~
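A minimal way to act on fread()'s return value is to treat a short read as an error. read_exact is a hypothetical helper. In main.c the call could become something like `if (read_exact(fp, code, size) != 0)`, followed by whatever cleanup and exit the program already performs on errors. That surrounding code isn't shown here, so this is an assumption.

```c
#include <stdio.h>

/* Hypothetical helper: read exactly `size` bytes into `buf`.
 * Returns 0 on success, -1 on a short read (error or unexpected EOF). */
static int read_exact(FILE *fp, unsigned char *buf, size_t size)
{
    size_t got = fread(buf, 1, size, fp);

    if (got != size) {
        if (ferror(fp))
            perror("fread");
        else
            fprintf(stderr, "short read: expected %zu bytes, got %zu\n", size, got);
        return -1;
    }
    return 0;
}
```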

The compiler should allow inline data.

For example:

  DATA 0x12
  DATA 0x13

This would allow embedding data for lookup tables, self-modifying code, or similar.
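Here is a sketch of how an assembler could turn such a directive into a single raw byte. It assumes one byte per DATA line and accepts decimal, octal, or 0x-prefixed hex via strtoul's base 0. The compiler itself is Perl, so parse_data_directive only illustrates the parsing rules and is not a proposed patch:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical assembler helper: turn "DATA <byte>" into one raw byte.
 * Returns 0 on success, -1 if the line is not a valid DATA directive. */
static int parse_data_directive(const char *line, unsigned char *out)
{
    if (strncmp(line, "DATA", 4) != 0 || (line[4] != ' ' && line[4] != '\t'))
        return -1;

    char *end;
    unsigned long v = strtoul(line + 5, &end, 0); /* base 0: 0x12, 18, 022 */
    if (end == line + 5 || v > 0xFF)
        return -1;

    /* Only trailing whitespace may follow the value. */
    while (*end == ' ' || *end == '\t')
        end++;
    if (*end != '\0' && *end != '\n')
        return -1;

    *out = (unsigned char)v;
    return 0;
}
```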

Segmentation Faults 2017-05-14

I found a few more bad inputs that cause segmentation faults, though not all of them segfault under valgrind, which I'm still trying to figure out. They can be found at https://github.com/rwhitworth/simple.vm-fuzz/tree/master/2017-05-14 if you're interested. Thanks for the quick fixes this weekend.

IP wrap-around isn't consistent.

Within the main interpreter we handle the IP exceeding the 64k boundary, but the opcode handlers don't.

We should handle something like:

   0xfffe: store #1, 0102

That would be encoded as:

   ram[0xFFFE] = 01
   ram[0xFFFF] = 01
   ram[0x0000] = 02

i.e. without wrapping, the opcode handler would read the second octet of the immediate value 0x0102 from the wrong address (one past 0xFFFF, rather than 0x0000).
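One way to make wrap-around consistent is to route every operand read through a single helper that masks the IP. cpu_t and fetch_byte are hypothetical. The IP is deliberately declared wider than 16 bits, so the masking is what does the work:

```c
#include <stdint.h>

/* Hypothetical CPU state; `ip` is wider than 16 bits on purpose, so the
 * explicit masking below is what keeps it inside the 64k address space. */
typedef struct {
    uint8_t ram[0x10000];
    uint32_t ip;
} cpu_t;

/* Return the byte at IP and advance IP, wrapping from 0xFFFF to 0x0000.
 * If every handler reads its operands through this, wrapping is consistent. */
static uint8_t fetch_byte(cpu_t *cpu)
{
    uint8_t b = cpu->ram[cpu->ip & 0xFFFF];
    cpu->ip = (cpu->ip + 1) & 0xFFFF;
    return b;
}
```

Reading the example above through this helper yields 0x01 and 0x01, then 0x02 from address 0x0000.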

LDIR missing.

We can't copy the semantics of the Z80's LDIR block-copy instruction exactly, but we should have a similar RAM-copying instruction.
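For reference, the Z80's LDIR copies BC bytes from (HL) to (DE), one byte at a time, incrementing both pointers. A rough equivalent for this VM is sketched below. Which registers would supply the source, destination and length is left open, so ram_copy simply takes them as parameters. All names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical RAM-copy opcode body: copy `len` bytes from `src` to `dst`
 * inside the 64k RAM, front to back, one byte at a time like LDIR.
 * uint16_t addresses wrap from 0xFFFF to 0x0000 automatically. */
static void ram_copy(uint8_t ram[0x10000], uint16_t dst, uint16_t src, uint16_t len)
{
    while (len--) {
        ram[dst++] = ram[src++];
    }
}
```

Copying front to back, rather than with memmove(), keeps LDIR's behaviour for overlapping ranges. For example, copying a block onto itself shifted by one byte fills it with the first byte.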

Simple and easy to understand

Your bytecode design is pretty elegant; I love this:

        store #1, 1982
        store #2, "Good byte code design!"

And I love the implementation behind the scenes.

Thanks for sharing this kind of VM. Forked.

Print integers with a "0x" prefix

Hello @skx, thanks a lot for this great project!

I modified the print_int formatting string to include a "0x" prefix, and changed the padding to be four positions:

diff --git a/simple-vm-opcodes.c b/simple-vm-opcodes.c
index b284dc9..9edf3fb 100644
--- a/simple-vm-opcodes.c
+++ b/simple-vm-opcodes.c
@@ -343,7 +343,7 @@ void op_int_print(struct svm *svm)
     if (getenv("DEBUG") != NULL)
         printf("[STDOUT] Register R%02d => %d [Hex:%04x]\n", reg, val, val);
     else
-        printf("%02X", val);
+        printf("0x%04X", val);
 
 
     /* handle the next instruction */

The prefix makes the base explicit, and the padding hints at the maximum value accepted for integers. I chose "0x%04X" over "%#04X" because I like the digits in uppercase and the prefix in lowercase.

Do you think it could be a good change for simple.vm?
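To make the difference between the two format strings concrete, here is a small sketch. format_int is a hypothetical helper wrapping snprintf with the proposed format. Two details of the C standard matter here:

- With the # flag, the field width includes the "0X" prefix, so %#06X, not %#04X, is the uppercase-prefix equivalent of 0x%04X.
- The # flag adds no prefix at all when the value is zero. "0x%04X" therefore has the advantage of always printing the prefix.

```c
#include <stdio.h>

/* Hypothetical helper using the proposed format: a lowercase "0x" prefix
 * followed by four zero-padded uppercase hex digits, e.g. 42 -> "0x002A".
 * Returns the number of characters snprintf would write. */
static int format_int(char *buf, size_t len, unsigned int val)
{
    return snprintf(buf, len, "0x%04X", val);
}
```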

Use a macro for reading from the CPU-RAM

As per this comment on Reddit, we should have a READ_BYTE macro for reading a single byte from the current instruction pointer and incrementing it.

We could combine that with the existing BYTES_TO_ADDR macro to create a READ_ADDR macro too - although perhaps that might not be so clear.
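Here is a sketch of both ideas, assuming a 16-bit IP so that increments wrap at 64k on their own. SKETCH_BYTES_TO_ADDR stands in for the real BYTES_TO_ADDR. Its byte order (first byte high) is an assumption made for this sketch, not taken from the source.

The sketch also hints at why a READ_ADDR macro might be less clear. Nesting two READ_BYTE calls in one expression would be undefined behaviour, so the combined read is written as an inline function:

```c
#include <stdint.h>

/* Hypothetical CPU state; a uint16_t IP wraps at 64k on its own. */
typedef struct {
    uint8_t ram[0x10000];
    uint16_t ip;
} cpu_t;

/* Stand-in for the project's BYTES_TO_ADDR; byte order is assumed. */
#define SKETCH_BYTES_TO_ADDR(hi, lo) \
    ((uint16_t)(((unsigned)(hi) << 8) | (unsigned)(lo)))

/* Read the byte at the current IP, then increment the IP. */
#define READ_BYTE(cpu) ((cpu)->ram[(cpu)->ip++])

/* READ_ADDR as a function: two READ_BYTE uses in one expression would
 * modify the IP unsequenced, so each read gets its own statement. */
static inline uint16_t read_addr(cpu_t *cpu)
{
    uint8_t hi = READ_BYTE(cpu);
    uint8_t lo = READ_BYTE(cpu);
    return SKETCH_BYTES_TO_ADDR(hi, lo);
}
```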

Our code size and our jump range are potentially different.

The destination of jumps is an integer in the range 0x0000-0xFFFF, because we've got a 64k address-space. However when we execute a program we don't allocate 64k for code, instead we load and allocate precisely enough space for the program.

This means we could load a binary program that is 50 bytes long but contains "jump 16384", a target far beyond the end of the loaded code.

Either:

- We allocate 64k for programs, filled with NOPs, before loading the program.
- We limit jump targets to the size of the code.

The former is wasteful, but not hugely. The latter is sane.

I'm actually leaning towards the former solution because the user might want to write self-modifying code, via a simple XOR loop or similar, and then execute it.

ObRelated: Relocatable code pretty much requires user-supplied fixups or relative jumps.
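Here is a sketch of the first option. It uses a placeholder NOP opcode value: 0x50 below is made up, since simple.vm's actual NOP byte isn't given here. load_program is a hypothetical name:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RAM_SIZE 0x10000u
#define OP_NOP   0x50   /* placeholder value, not simple.vm's real NOP opcode */

/* Allocate the full 64k address space, fill it with NOPs, and copy the
 * program to address 0, so every jump target 0x0000-0xFFFF is in bounds.
 * Returns NULL if the program is too large or allocation fails. */
static uint8_t *load_program(const uint8_t *prog, size_t len)
{
    if (len > RAM_SIZE)
        return NULL;

    uint8_t *ram = malloc(RAM_SIZE);
    if (ram == NULL)
        return NULL;

    memset(ram, OP_NOP, RAM_SIZE);
    memcpy(ram, prog, len);
    return ram;
}
```

With the full 64k allocated, any jump target is a valid index, and self-modifying code can write anywhere in the space.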
