skx / simple.vm
Simple virtual machine which interprets bytecode.
License: GNU General Public License v2.0
simple-vm-opcodes.c lines 583 thru 597:
void op_string_system(struct svm *svm)
{
    /* get the reg */
    unsigned int reg = next_byte(svm);
    BOUNDS_TEST_REGISTER(reg);

    if (getenv("DEBUG") != NULL)
        printf("STRING_SYSTEM(Register %d)\n", reg);

    char *str = get_string_reg(svm, reg);
    system(str);

    /* handle the next instruction */
    svm->ip += 1;
}
The clang compiler issues this warning during compilation:
simple-vm-opcodes.c:593:5: warning: ignoring return value of function declared with warn_unused_result attribute [-Wunused-result]
system(str);
^~~~~~ ~~~
Making the return value of system() available from the opcode may not be worth doing, or may not be valuable enough, which is fine. Make whatever design decision makes sense for this project.
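If the return value is worth consuming, one possible approach is sketched below. It assumes a POSIX host, and the helper name `run_command` is hypothetical rather than part of simple.vm. It maps the status returned by `system()` onto a plain exit code, which the opcode could store in a register or deliberately discard.

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper, not part of simple.vm: run a shell command and
 * return its exit code, or -1 if it could not be run or did not exit
 * normally (for example, it was killed by a signal).
 */
int run_command(const char *cmd)
{
    int status = system(cmd);

    if (status == -1)
        return -1;              /* no child process could be created */

    if (WIFEXITED(status))
        return WEXITSTATUS(status);

    return -1;                  /* terminated abnormally */
}
```

Using the result at all, even just to compare it against -1, is enough to silence the -Wunused-result warning.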
Hello,
I was using American Fuzzy Lop (afl-fuzz) to fuzz the input of the simple-vm program on Linux. Are you interested in fixing the crashes these input files cause? The input files can be found here: https://github.com/rwhitworth/simple.vm-fuzz/tree/master/2017-05-12.
Running a file as ./simple-vm id_filename causes a segmentation fault.
Let me know if I can provide any more information to help narrow down this issue.
Once the mess and mass-churn of #2 is resolved I'd like to get back to adding missing primitives.
One that immediately springs to mind is something for comparing registers with fixed strings. We have two CMP instructions at the moment:
The obvious missing case is:
Set the Z-flag to true if the given register has the correct type.
If you compile two scripts in one invocation, things break when labels are used:
./compiler ./examples/jump.in ./examples/quine.in
./simple-vm examples/quine.raw
ERROR running script - The register doesn't contain an integer
Trivial problem caused by @UPDATES and %LABELS being outside the compilation function, so their contents carry over from one script to the next.
Currently there are a bunch of opcode implementations defined, some of which modify the IP and some of which don't. The way this works is that the main loop looks up the handler and invokes it.
The handlers are defined as:
bool do_stuff( svm_t *cpu );
If a function returns true then it is assumed to have updated the IP; otherwise it hasn't, so the IP is incremented to point to the next instruction.
Unfortunately this isn't great. Consider the situation where the following program is executed:
store #1, 0x1234
exit
This is a four-byte program with two instructions:
store #1, 0x1234 -> 0x01 0x12 0x34
exit -> 0x00
At the start of execution IP=0, and the byte there is 0x01 for int_store. The IP is incremented to read the first number, then again to read the second, meaning IP=0x02.
At this point the handler returns and the main loop increments the IP to 0x03, where the exit opcode is waiting to be executed. Internally, in the int_store method, we've bumped the IP by two bytes but not kept track of it!
The thing to do is let the opcode handlers do what they want. Don't bump the IP in the main loop; instead add an extra cpu->ip++ at the end of each handler, or reset it in the case of a jump.
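A minimal sketch of that arrangement, using a hypothetical miniature VM rather than the real svm_t. The `mini_vm` type, the handler names, and the byte order of the operand (taken from the listing above) are all assumptions made for illustration. Each handler consumes its own operands and then steps past itself, so the main loop never adjusts the IP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature VM, only loosely modelled on the issue text. */
typedef struct {
    const uint8_t *code;   /* program bytes */
    uint16_t ip;           /* instruction pointer */
    uint16_t reg;          /* a single integer register */
    bool running;
} mini_vm;

/* exit: opcode 0x00, no operands */
static void op_exit(mini_vm *vm)
{
    vm->running = false;
}

/* store: opcode 0x01 followed by two operand bytes */
static void op_store(mini_vm *vm)
{
    uint8_t hi = vm->code[++vm->ip];   /* ip now on the first operand */
    uint8_t lo = vm->code[++vm->ip];   /* ip now on the second operand */
    vm->reg = (uint16_t)((hi << 8) | lo);
    vm->ip++;                          /* step past this instruction */
}

/* The main loop never touches ip; each handler leaves it on the next opcode. */
void mini_run(mini_vm *vm)
{
    vm->running = true;
    while (vm->running) {
        switch (vm->code[vm->ip]) {
        case 0x00: op_exit(vm);  break;
        case 0x01: op_store(vm); break;
        default:   vm->running = false; break;   /* unknown opcode: stop */
        }
    }
}
```

A jump handler would assign the target to `ip` directly instead of incrementing it.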
Once the mess and mass-churn of #2 is resolved I'd like to get back to adding missing primitives.
One that immediately springs to mind is something for generating a random integer.
e.g. INT_RAND().
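A rough sketch of what such a primitive might compute. The helper name `vm_random_int` is hypothetical, and the 16-bit result range is an assumption about the VM's integer size. The C standard only guarantees 15 random bits from `rand()`, so two calls are combined.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper for an INT_RAND-style opcode: return a value in
 * the range 0x0000-0xFFFF. rand() only guarantees 15 random bits, so
 * one byte is taken from each of two calls to fill all 16.
 */
uint16_t vm_random_int(void)
{
    unsigned int hi = (unsigned int)rand() & 0xFF;
    unsigned int lo = (unsigned int)rand() & 0xFF;
    return (uint16_t)((hi << 8) | lo);
}
```

The interpreter would need to call `srand()` once at startup, otherwise every run produces the same sequence.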
Strings longer than 255 bytes cannot be loaded via "load #x, 'value'". That's not ideal, but it could be tolerable.
More seriously, a string cannot be concatenated past that length without bad things happening.
I suggest both issues are resolved at the same time.
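One possible direction, sketched under the assumption that the 255-byte limit comes from a single-byte length prefix in the bytecode (the issue does not say so explicitly). The function `encode_string` and the two-byte, low-byte-first layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical encoding: a string operand is written as a two-byte
 * length (low byte first) followed by the raw bytes. Returns the number
 * of bytes written, or 0 if the string is longer than 0xFFFF bytes or
 * the output buffer is too small.
 */
size_t encode_string(uint8_t *out, size_t out_size, const char *str)
{
    size_t len = strlen(str);

    if (len > 0xFFFF || out_size < len + 2)
        return 0;

    out[0] = (uint8_t)(len & 0xFF);
    out[1] = (uint8_t)(len >> 8);
    memcpy(out + 2, str, len);
    return len + 2;
}
```

The same explicit length check could guard concatenation, so an oversized result is rejected rather than silently overflowing.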
At the moment we have:
store #1, 1234
store #1, "this is a string"
We cannot store the offset of a label, which would be useful. For example:
:start
...
:end
This would allow us to work out the length of a program, for example. This is immediately useful in the examples/quine.in sample where the length is hardcoded.
Rather than having a giant switch statement, with a case for each opcode, we should break out the implementations of the opcodes into their own functions.
If we define an array of 255-opcodes in the svm_t structure then we can use function pointers to invoke the correct function.
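A small sketch of the idea, using a hypothetical `toy_vm` structure in place of svm_t. The handler names and the opcode value 0x50 are made up for the example. The table has 256 slots so that every possible byte value has an entry.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_vm;
typedef void (*opcode_fn)(struct toy_vm *vm);

/* Hypothetical stand-in for svm_t, reduced to what the sketch needs. */
struct toy_vm {
    opcode_fn opcodes[256];   /* one slot per possible opcode byte */
    int last;                 /* records which handler ran, for demonstration */
};

static void op_unknown(struct toy_vm *vm) { vm->last = -1; }
static void op_nop(struct toy_vm *vm)     { vm->last = 0x50; }

void toy_vm_init(struct toy_vm *vm)
{
    /* Every slot gets a safe default so a stray byte cannot call NULL. */
    for (size_t i = 0; i < 256; i++)
        vm->opcodes[i] = op_unknown;

    vm->opcodes[0x50] = op_nop;   /* 0x50 is an arbitrary example value */
    vm->last = 0;
}

/* Dispatch is a single indexed call instead of a switch statement. */
void toy_vm_dispatch(struct toy_vm *vm, uint8_t opcode)
{
    vm->opcodes[opcode](vm);
}
```

Keeping the table inside the structure also lets an embedding program replace individual handlers per instance.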
svm_t instance.
main.c line 68:
fread(code, 1, size, fp);
The clang compiler issues this warning during compilation:
main.c:68:5: warning: ignoring return value of function declared with warn_unused_result attribute [-Wunused-result]
fread(code, 1, size, fp);
^~~~~ ~~~~~~~~~~~~~~~~~
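A short sketch of how the result could be checked. The helper name `read_exact` is hypothetical; only `fread` itself comes from the report. A short read, whether from end of file or an I/O error, is reported to the caller instead of being ignored.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: read exactly `size` bytes from `fp` into `buf`.
 * Returns false on a short read, whether caused by EOF or an I/O error.
 */
bool read_exact(FILE *fp, unsigned char *buf, size_t size)
{
    size_t got = fread(buf, 1, size, fp);
    return got == size;
}
```

The caller in main.c could then print an error and exit when the program file cannot be read in full.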
For example:
DATA 0x12
DATA 0x13
This would allow embedding data for lookup-tables, self-modifying-code, or similar.
Found a few more bad inputs that cause segmentation faults, though not all of them seg fault under valgrind, which I'm still trying to figure out. They can be found at https://github.com/rwhitworth/simple.vm-fuzz/tree/master/2017-05-14 if you're interested. Thanks for the quick fixes this weekend.
Within the main interpreter we handle the IP exceeding the 64k boundary, but the opcode handlers don't.
We should handle something like:
0xfffe: store #1, 0102
That would be encoded as:
ram[0xFFFE] = 01
ram[0xFFFF] = 01
ram[0x0000] = 02
i.e. the opcode handler would read the wrong address for the second octet of the immediate value 0x0102.
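One way the handlers could share the wrap-around logic is to route every operand read through a single helper. The sketch below uses a hypothetical `wrap_vm` structure and `fetch_byte` function, neither of which is the project's own code.

```c
#include <stdint.h>

#define RAM_SIZE 0x10000u   /* 64k address space */

/* Hypothetical stand-in for the interpreter state. */
struct wrap_vm {
    uint8_t ram[RAM_SIZE];
    unsigned int ip;
};

/*
 * Return the byte at ip, then advance ip, wrapping from 0xFFFF to 0x0000.
 */
uint8_t fetch_byte(struct wrap_vm *vm)
{
    uint8_t value = vm->ram[vm->ip];
    vm->ip = (vm->ip + 1) % RAM_SIZE;
    return value;
}
```

With the layout from the example, three successive calls starting at 0xFFFE read 0xFFFE, 0xFFFF and then 0x0000, rather than running off the end of RAM.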
We can't copy the semantics of LDIR exactly, but we should have a similar RAM-copying instruction.
Your byte code design is pretty elegant, I love this:
store #1, 1982
store #2, "Good byte code design!"
And I love the implementation behind the scenes.
Thanks for showing this kind of VM, forked.
Hello @skx, thanks a lot for this great project!
I modified the print_int formatting string to include a "0x" prefix, and changed the padding to be four positions:
diff --git a/simple-vm-opcodes.c b/simple-vm-opcodes.c
index b284dc9..9edf3fb 100644
--- a/simple-vm-opcodes.c
+++ b/simple-vm-opcodes.c
@@ -343,7 +343,7 @@ void op_int_print(struct svm *svm)
if (getenv("DEBUG") != NULL)
printf("[STDOUT] Register R%02d => %d [Hex:%04x]\n", reg, val, val);
else
- printf("%02X", val);
+ printf("0x%04X", val);
/* handle the next instruction */
The reason for the prefix is that it makes the base explicit, and the padding hints at the max value accepted for integers. I decided to use "0x%04X" instead of "%#04X" because I like the numbers in uppercase and the prefix in lowercase.
Do you think it could be a good change for simple.vm?
As per this comment on reddit we should have a READ_BYTE macro for reading a single byte from the current instruction-pointer, and incrementing it.
We could combine that with the existing BYTES_TO_ADDR macro to create a READ_ADDR macro too, although perhaps that might not be so clear.
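A sketch of how the two could fit together. The `rd_vm` structure is hypothetical, and the `BYTES_TO_ADDR` shown here is an assumed low-byte-first definition that may not match the real macro. The address read is written as a function because nesting two `READ_BYTE` calls in one expression would modify the IP twice without sequencing, which is undefined behaviour in C.

```c
#include <stdint.h>

/* Hypothetical stand-in for the interpreter state. */
struct rd_vm {
    const uint8_t *code;
    unsigned int ip;
};

/* Read the byte at the instruction pointer, then advance it. */
#define READ_BYTE(vm) ((vm)->code[(vm)->ip++])

/* Assumed layout: low byte first. The real BYTES_TO_ADDR may differ. */
#define BYTES_TO_ADDR(lo, hi) ((unsigned int)(lo) + 256u * (unsigned int)(hi))

/*
 * Written as a function rather than
 * BYTES_TO_ADDR(READ_BYTE(vm), READ_BYTE(vm)): the two increments in
 * that expression would be unsequenced, so its behaviour is undefined.
 */
static inline unsigned int read_addr(struct rd_vm *vm)
{
    unsigned int lo = READ_BYTE(vm);
    unsigned int hi = READ_BYTE(vm);
    return BYTES_TO_ADDR(lo, hi);
}
```

Reading into two named locals fixes the order of the reads and keeps the intent visible at the call site.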
The destination of jumps is an integer in the range 0x0000-0xFFFF, because we've got a 64k address-space. However when we execute a program we don't allocate 64k for code, instead we load and allocate precisely enough space for the program.
This means we could load a piece of code, a binary program, which is 50 bytes long, and that might contain "jump 16384".
Either:
The former is wasteful, but not hugely. The latter is sane.
I'm actually leaning towards the former solution because the user might want to write self-modifying code, via a simple XOR loop or similar, and then execute it.
ObRelated: Relocatable code pretty much requires user-supplied fixups or relative jumps.
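Since the two options are not spelled out here, the following is only one reading of the "wasteful" one: always allocate the full 64k and copy the program into the start of it, so that any jump target in 0x0000-0xFFFF lands in valid memory. The function name `load_into_64k` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ADDRESS_SPACE 0x10000u   /* 64k */

/*
 * Hypothetical loader: copy `len` program bytes into a zero-filled 64k
 * buffer. Returns NULL if the program is too large or allocation fails.
 * The caller frees the result.
 */
uint8_t *load_into_64k(const uint8_t *program, size_t len)
{
    if (len > ADDRESS_SPACE)
        return NULL;

    uint8_t *ram = calloc(ADDRESS_SPACE, 1);
    if (ram == NULL)
        return NULL;

    memcpy(ram, program, len);
    return ram;
}
```

Under this reading, a 50-byte program containing "jump 16384" lands on a zero byte instead of reading outside the allocation, and the spare space is available to self-modifying code.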