
simple.vm's Issues

Compiler warning: simple-vm-opcodes.c ignoring return from system()

simple-vm-opcodes.c lines 583 thru 597:

void op_string_system(struct svm *svm)
{
    /* get the reg */
    unsigned int reg = next_byte(svm);
    BOUNDS_TEST_REGISTER(reg);

    if (getenv("DEBUG") != NULL)
        printf("STRING_SYSTEM(Register %d)\n", reg);

    char *str = get_string_reg(svm, reg);
    system(str);

    /* handle the next instruction */
    svm->ip += 1;
}

clang compiler is issuing this warning during compilation:

simple-vm-opcodes.c:593:5: warning: ignoring return value of function declared with warn_unused_result attribute [-Wunused-result]
    system(str);
    ^~~~~~ ~~~

Making the return value of system() available to scripts may not be worth the effort, which is fine. Make whatever design decision makes sense for this project.
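For reference, one way the warning could be addressed is to capture and decode the result. This is only a sketch: `run_command` is a hypothetical helper that does not exist in simple.vm, and it assumes a POSIX host so that `WIFEXITED` and `WEXITSTATUS` are available.

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper, not part of simple.vm: run a shell command and
 * return its exit status (0-255), or -1 if it could not be run or did
 * not exit normally.  Assumes a POSIX host for WIFEXITED/WEXITSTATUS.
 */
int run_command(const char *cmd)
{
    if (cmd == NULL)
        return -1;

    int status = system(cmd);
    if (status == -1)
        return -1;                  /* the child could not be created */

    if (WIFEXITED(status))
        return WEXITSTATUS(status); /* normal exit: report its status */

    return -1;                      /* terminated by a signal, etc. */
}
```

The opcode handler could then either store that value in a register or simply log failures; which of those is right is a design decision for the project.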

Add string equality opcode

Once the mess and mass-churn of #2 is resolved I'd like to get back to adding missing primitives.

One that immediately springs to mind is something for comparing registers with fixed strings. We have two CMP instructions at the moment:

  • cmp reg1, reg2
  • cmp reg1, 0xFF

The obvious missing case is:

  • cmp reg1, "Steve".

Compiling multiple files is broken.

If you compile two scripts in one invocation, things break when they use labels:

  ./compiler ./examples/jump.in ./examples/quine.in
  ./simple-vm examples/quine.raw 
  ERROR running script - The register doesn't contain an integer

Trivial problem caused by the @UPDATES and %LABELS being outside the compilation function.

Our IP-manipulation is .. subpar.

Currently there are a bunch of opcode implementations defined, some of which modify the IP and some that don't. The way this works is that the main loop looks up the handler and invokes them.

The handlers are defined as:

bool do_stuff( svm_t *cpu );

If a function returns true then it is assumed to have updated the IP; otherwise it hasn't, so the IP is incremented to point to the next instruction.

Unfortunately this isn't great. Consider the situation where the following program is executed:

  store #1, 0x1234
  exit

This is a four-byte program with two instructions:

  store #1, 0x1234   ->  0x01 0x12 0x34
  exit               ->  0x00

At the start of execution IP=0, and the byte there is 0x01 for int_store. The IP is incremented to read the first byte of the number, then again to read the second, meaning IP=0x02.

At this point the handler returns to the main loop, and the IP is incremented to 0x03, where the exit opcode is waiting to be executed. Internally, in the int_store handler, we've bumped the IP by two bytes but not kept track of it!

The thing to do is let the opcode handlers do what they want. Don't bump the IP in the main loop; instead add an extra cpu->ip++ at the end of each handler, or reset it in the case of a jump.

Add random opcode.

Once the mess and mass-churn of #2 is resolved I'd like to get back to adding missing primitives.

One that immediately springs to mind is something for generating a random integer.

e.g. INT_RAND().

Strings should not be limited to 255 bytes.

Strings longer than 255 bytes cannot be loaded via "load #x, 'value'". That's not ideal, but it could be tolerable.

However more seriously a string cannot be concatenated past that length, or bad things happen.

Suggest both issues are resolved at the same time.

Store label offsets in registers.

At the moment we have:

  store #1, 1234
  store #1, "this is a string"

We cannot store the offset of a label, which would be useful. For example, given:

     :start 
      ...
     :end 

This would allow us to work out the length of a program, for example. This is immediately useful in the examples/quine.in sample where the length is hardcoded.

Move to using function-pointers.

Rather than having a giant switch statement, with a case for each opcode, we should break out the implementations of the opcodes into their own functions.

If we define an array of 255 opcode handlers in the svm_t structure, then we can use function pointers to invoke the correct function.

  • Pros
    • The code becomes more readable.
    • We could have a default handler which dumps "unknown instruction" for debugging purposes.
    • This would allow host-applications to define host-specific opcodes.
  • Cons
    • Significant churn.
    • Wastes 255 * (function-pointer size) bytes per svm_t instance.
    • We have some repetition setting up the opcode handlers - both defining the function and then allocating it.
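A rough sketch of what the table could look like. The names (`struct vm`, `opcode_fn`, `vm_init`, `vm_dispatch`) are made up for illustration and are not the project's API; the sketch also assumes 256 slots rather than 255 so that every possible byte value, including 0xFF, has a handler.

```c
#include <stddef.h>
#include <stdint.h>

#define OPCODE_COUNT 256   /* assumption: one slot per possible byte value */

struct vm;                                  /* hypothetical stand-in for svm_t */
typedef void (*opcode_fn)(struct vm *vm);

struct vm {
    opcode_fn opcodes[OPCODE_COUNT];
    int       unknown_seen;                 /* how many unknown opcodes ran */
    int       nops_seen;                    /* how many NOPs ran */
};

/* Default handler: a real one might print "unknown instruction". */
static void op_unknown(struct vm *vm) { vm->unknown_seen++; }

/* Example handler registered in the table. */
static void op_nop(struct vm *vm)     { vm->nops_seen++; }

/* Fill every slot with the default, then install the known handlers. */
void vm_init(struct vm *vm)
{
    vm->unknown_seen = 0;
    vm->nops_seen = 0;
    for (size_t i = 0; i < OPCODE_COUNT; i++)
        vm->opcodes[i] = op_unknown;
    vm->opcodes[0x50] = op_nop;             /* opcode number is arbitrary here */
}

/* Replaces the giant switch: index the table with the opcode byte. */
void vm_dispatch(struct vm *vm, uint8_t opcode)
{
    vm->opcodes[opcode](vm);
}
```

A host application could overwrite individual slots after `vm_init` to install its own opcodes, which is the third "pro" listed above.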

Compiler warning: main.c ignoring return from fread()

main.c line 68:
fread(code, 1, size, fp);

clang compiler is issuing this warning during compilation:

main.c:68:5: warning: ignoring return value of function declared with warn_unused_result attribute [-Wunused-result]
    fread(code, 1, size, fp);
    ^~~~~ ~~~~~~~~~~~~~~~~~
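One possible fix is to compare the number of bytes read against the number requested. The helper below is a sketch only; `read_exact` is a hypothetical name and main.c may be structured differently.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read exactly `size` bytes from `fp`.
 * Returns a malloc'd buffer the caller must free, or NULL on a
 * short read, a read error, or allocation failure.
 */
unsigned char *read_exact(FILE *fp, size_t size)
{
    /* malloc(0) may return NULL, so always ask for at least one byte */
    unsigned char *code = malloc(size ? size : 1);
    if (code == NULL)
        return NULL;

    /* with an element size of 1, fread returns the byte count */
    if (fread(code, 1, size, fp) != size) {
        free(code);
        return NULL;
    }
    return code;
}
```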

IP wrap-around isn't consistent.

Within the main interpreter loop we handle the IP exceeding the 64k boundary, but the opcode handlers don't.

We should handle something like:

   0xfffe: store #1, 0102

That would be encoded as:

   ram[0xFFFE] = 01
   ram[0xFFFF] = 01
   ram[0x0000] = 02

i.e. The opcode handler would read the wrong address for the second octet of the immediate value 0x0102.
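A sketch of a fetch helper that wraps consistently, so handlers and the main loop agree. `fetch_byte` and `fetch_word` are hypothetical names, and the high-byte-first order simply follows the encoding in the example above; the project's real byte order may differ.

```c
#include <stdint.h>

#define RAM_SIZE 0x10000u   /* 64k address space, as described above */

/* Hypothetical helper: fetch the byte at *ip, then advance with wrap-around. */
static uint8_t fetch_byte(const uint8_t *ram, uint32_t *ip)
{
    uint8_t b = ram[*ip % RAM_SIZE];
    *ip = (*ip + 1) % RAM_SIZE;
    return b;
}

/* Read a 16-bit immediate, high byte first as in the example encoding. */
uint16_t fetch_word(const uint8_t *ram, uint32_t *ip)
{
    uint8_t hi = fetch_byte(ram, ip);
    uint8_t lo = fetch_byte(ram, ip);
    return (uint16_t)((hi << 8) | lo);
}
```

If every handler reads its operands through one helper like this, the wrap-around rule lives in a single place.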

LDIR missing.

We can't copy the semantics of LDIR exactly, but we should have a similar RAM-copying instruction.

Simple and easy to understand

Your byte code design is pretty elegant, I love this

        store #1, 1982
        store #2, "Good byte code design!"

And I love the implementation behind the scenes.

Thanks for showing this kind of VM, forked.

Print integers with a "0x" prefix

Hello @skx, thanks a lot for this great project!

I modified the print_int formatting string to include a "0x" prefix, and changed the padding to be four positions:

diff --git a/simple-vm-opcodes.c b/simple-vm-opcodes.c
index b284dc9..9edf3fb 100644
--- a/simple-vm-opcodes.c
+++ b/simple-vm-opcodes.c
@@ -343,7 +343,7 @@ void op_int_print(struct svm *svm)
     if (getenv("DEBUG") != NULL)
         printf("[STDOUT] Register R%02d => %d [Hex:%04x]\n", reg, val, val);
     else
-        printf("%02X", val);
+        printf("0x%04X", val);
 
 
     /* handle the next instruction */

The reason for the prefix is because it makes the base explicit, and the padding hints at the max value accepted for integers. I decided to use "0x%04X" instead of "%#04X" because I like the numbers in uppercase and the prefix in lowercase.
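To make the difference between the two format strings concrete, here is a small standalone sketch; `format_both` is a hypothetical helper written only for this comparison. With the `#` flag the prefix takes the case of the conversion (`0X` for `%X`) and counts towards the field width.

```c
#include <stdio.h>

/* Format `val` both ways into caller-supplied buffers (16 bytes each). */
void format_both(unsigned int val, char *explicit_prefix, char *alt_form)
{
    /* literal lowercase prefix, then four zero-padded uppercase digits */
    snprintf(explicit_prefix, 16, "0x%04X", val);

    /* '#' adds the prefix itself; the width of 4 includes that prefix */
    snprintf(alt_form, 16, "%#04X", val);
}
```

For the value 0x1A the first buffer holds "0x001A" and the second holds "0X1A".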

Do you think it could be a good change for simple.vm?

Use a macro for reading from the CPU-RAM

As per this comment on reddit we should have a READ_BYTE macro for reading a single byte from the current instruction-pointer, and incrementing it.

We could combine that with the existing BYTES_TO_ADDR macro to create a READ_ADDR macro too - although perhaps that might not be so clear.
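A possible shape for this, as a sketch. The `struct cpu` layout and the `read_addr` name are assumptions, as is the high-byte-first order. One thing to watch: using READ_BYTE twice inside a single expression would modify the IP twice with no sequencing between the two, so the two-byte read is written as a function with separate statements rather than as a macro.

```c
#include <stdint.h>

/* Hypothetical CPU layout for illustration; field names are assumptions. */
struct cpu {
    uint8_t  code[0x10000];
    uint16_t ip;            /* 16-bit, so ip++ wraps at 64k by itself */
};

/* Read the byte at the instruction pointer, then advance it. */
#define READ_BYTE(c) ((c)->code[(c)->ip++])

/*
 * Read a two-byte address.  Each READ_BYTE sits in its own statement so
 * the order of the two reads is well defined.  High byte first here;
 * that byte order is an assumption.
 */
static inline uint16_t read_addr(struct cpu *c)
{
    uint16_t hi = READ_BYTE(c);
    uint16_t lo = READ_BYTE(c);
    return (uint16_t)((hi << 8) | lo);
}
```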

Our code size and our jump size are potentially different..

The destination of jumps is an integer in the range 0x0000-0xFFFF, because we've got a 64k address-space. However when we execute a program we don't allocate 64k for code, instead we load and allocate precisely enough space for the program.

This means we could load a piece of code, a binary program, which is 50 bytes long, and that might contain "jump 16384".

Either:

  • We allocate 64k for programs, filled with NOPs, before loading the program.
  • We limit jump targets to the size of the code.

The former is wasteful, but not hugely. The latter is sane.

I'm actually leaning towards the former solution because the user might want to write self-modifying code, via a simple XOR loop or similar, and then execute it.
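The former option could look something like the sketch below. `load_program` is a hypothetical function, and the value used for `OP_NOP` is a placeholder rather than the project's real NOP opcode.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RAM_SIZE 0x10000u  /* full 64k address space */
#define OP_NOP   0x50      /* assumption: placeholder NOP opcode value */

/*
 * Hypothetical loader: allocate the whole address space, fill it with
 * NOPs, then copy the program to address 0.  Returns NULL if the program
 * is too large or allocation fails; the caller frees the result.
 */
uint8_t *load_program(const uint8_t *prog, size_t len)
{
    if (prog == NULL || len > RAM_SIZE)
        return NULL;

    uint8_t *ram = malloc(RAM_SIZE);
    if (ram == NULL)
        return NULL;

    memset(ram, OP_NOP, RAM_SIZE);   /* any jump target is now valid code */
    memcpy(ram, prog, len);
    return ram;
}
```

With this in place a "jump 16384" in a 50-byte program lands on a NOP instead of reading outside the allocation.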

ObRelated: Relocatable code pretty much requires user-supplied fixups or relative jumps.
