rust-hosted-langs / book

Writing Interpreters in Rust: a Guide
Home Page: https://rust-hosted-langs.github.io/book/
License: Creative Commons Attribution 4.0 International
The CloseUpvalues opcode can take three operands, i.e. up to three upvalues can be closed in one operation. The present implementation lazily closes only one.

See TODO: interpreter/src/compiler.rs:247

Refactor the loop to add up to three upvalues per CloseUpvalues opcode.
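The refactored emission loop might batch upvalues in groups of up to three. A minimal sketch of that shape, assuming a hypothetical Opcode type and operand layout (not the compiler's actual API):

```rust
// Hypothetical sketch: group upvalues into batches of at most three and
// emit one CloseUpvalues opcode per batch. Opcode and the Vec<u8> operand
// layout are stand-ins, not the interpreter's real types.
#[derive(Debug, PartialEq)]
enum Opcode {
    CloseUpvalues(Vec<u8>), // up to three register operands per instruction
}

fn emit_close_upvalues(upvalue_regs: &[u8]) -> Vec<Opcode> {
    upvalue_regs
        .chunks(3) // at most three upvalues closed per opcode
        .map(|group| Opcode::CloseUpvalues(group.to_vec()))
        .collect()
}
```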
I originally wrote the parser to generate a nested Pair tree data structure, just as Lisp uses cons cells. I did this to simplify bootstrapping into a compiler when I didn't have more complex data structures (such as the Array types) available yet.

Since this Lisp-style cons cell data structure is archaic, is it worth refactoring the parser and compiler around the List type? The List type is an Array<TaggedCellPtr>, so, like most dynamic language arrays, it is a heterogeneous array. Code would then be built from the List type rather than Pairs.

This seems wrong. Since these types are just storing word-sized pointers, and the behavior we want is to be able to copy them optimally, there should be no other reason why they can't be Copy.

There must be a crate out there that provides Cell mechanics while also being Copy; otherwise, maybe we need to define our own.
Hi, thanks for the book. It is very helpful.

I don't get the rationale for having a custom allocator, though; shouldn't it be explained at the start? Wouldn't the default Rust allocator be good enough for this?
I was reading over the section about dereferencing safely and noticed that there was not any way to root objects. I don't know if this is already planned to be covered in the section on garbage collection, but I think it is a really interesting problem that doesn't have an easy solution. It seems like the two approaches used in Rust thus far are either mutation sessions, where you can only access heap objects in a closure, or having some macro with drop guards. Each has its own trade-offs, and I am curious what rust-hosted-langs recommends.
Currently, the allocator only allows objects up to 32k (the Immix block size); large objects are not supported.

The code path in stickyimmix/heap.rs:114 that raises an error should implement some other mechanism for tracking large objects.

I haven't thought this through yet, but my initial half-baked thought has been to keep a list (linked list? Vec?) of large objects, where the memory for each large object might be backed by a Vec<u8> and the object header stored in the initial word(s).
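As a half-baked illustration of that idea (all names here are hypothetical, not the allocator's real types), each large object could own its backing Vec<u8> with the header word at the front:

```rust
use std::mem::size_of;

// Hypothetical sketch: a large object backed by its own Vec<u8>, with a
// one-word header stored in the initial bytes and the payload after it.
struct LargeObject {
    mem: Vec<u8>,
}

impl LargeObject {
    fn new(header: usize, payload_size: usize) -> LargeObject {
        let mut mem = vec![0u8; size_of::<usize>() + payload_size];
        // store the header word at the start of the allocation
        mem[..size_of::<usize>()].copy_from_slice(&header.to_ne_bytes());
        LargeObject { mem }
    }

    fn header(&self) -> usize {
        let mut word = [0u8; size_of::<usize>()];
        word.copy_from_slice(&self.mem[..size_of::<usize>()]);
        usize::from_ne_bytes(word)
    }

    fn payload(&mut self) -> &mut [u8] {
        &mut self.mem[size_of::<usize>()..]
    }
}

// The heap would then keep a simple list of these, tracked and swept
// separately from the 32k Immix blocks.
struct LargeObjectList {
    objects: Vec<LargeObject>,
}
```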
See the discussion on rust-hosted-langs/runtimes-WG#11 regarding the nanopass concept.

In my opinion this is a very relevant idea for this book, given that half the purpose is to provide a baseline for others to build on (in addition, of course, to making it easier to implement LSP or analytics.) Could the interpreter code be refactored to move toward the nanopass concept? Separate crates for parsing and compiling?

We need a chapter on allocating memory. I'm specifically thinking of bump allocation.
We need a chapter that outlines the object model the VM will use, what the trade-offs are, etc, etc.
Instead of the enum TypeList in interpreter/src/headers.rs, we should explore a better means of type identification.

For example, can we generate a compile-time const for each object type that can be used to map to a trait object, where the trait object provides core runtime functions for that object type? This would be less verbose than the enums and the mapping between tagged pointer types, possibly at some cost. Perhaps some combination of the two approaches would make sense?
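A sketch of the const-per-type idea, with hypothetical names standing in for the interpreter's actual types: each type's const indexes a static table of trait objects supplying per-type runtime behaviour.

```rust
// Hypothetical sketch: compile-time const identifiers index a static table
// of trait objects. None of these names are the interpreter's real types.
trait RuntimeType: Sync {
    fn name(&self) -> &'static str;
}

struct PairType;
struct SymbolType;

impl RuntimeType for PairType {
    fn name(&self) -> &'static str { "Pair" }
}

impl RuntimeType for SymbolType {
    fn name(&self) -> &'static str { "Symbol" }
}

// Consts that an object header could carry in place of an enum variant.
const TYPE_PAIR: usize = 0;
const TYPE_SYMBOL: usize = 1;

// A single table lookup replaces the enum match.
static TYPES: [&'static dyn RuntimeType; 2] = [&PairType, &SymbolType];
```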
The TaggedPtr type allows for size_of::<isize>() * 8 - 2 bits of integer to be stored in a pointer.

Support for numbers needs to be implemented: the (+, -, integer div, *) opcodes. This initial implementation should return RuntimeError for values and results that fall outside of the supported TaggedPtr range.
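The range check can follow from the 2 tag bits directly: a value fits if it survives a round trip through them. A minimal sketch, with RuntimeError as a stand-in for the interpreter's actual error type:

```rust
// Sketch: a value fits in a TaggedPtr if shifting the top 2 bits out and
// sign-extending back is lossless. Names here are illustrative stand-ins.
const TAG_BITS: u32 = 2;

#[derive(Debug, PartialEq)]
struct RuntimeError; // stand-in for the interpreter's error type

fn fits_tagged(n: isize) -> bool {
    // round trip through the tag bits must preserve the value
    (n << TAG_BITS) >> TAG_BITS == n
}

fn tagged_add(a: isize, b: isize) -> Result<isize, RuntimeError> {
    match a.checked_add(b) {
        Some(sum) if fits_tagged(sum) => Ok(sum),
        _ => Err(RuntimeError),
    }
}
```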
A short chapter of the book might be written to describe this.
Consider splitting this issue into multiple issues to break it up, perhaps an issue for each area of the code and one for documentation.
It should be fairly straightforward to adjust the allocator to bump-allocate downward from the end of a block rather than upward from the beginning. See https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html for an explanation.
Bump allocation happens inside an available set of Immix lines. The "find-next-line" operation can still look upward for the next set of free lines, but within lines, allocation should bump downward from the end of the lines.
Things to take into account:
stickyimmix/src/bumpblock.rs
booksrc/chapter-simple-bump.md
I'm not thrilled with the level and quality of unit testing yet, so consider what new tests can improve coverage and corner cases as part of this.
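A minimal sketch of downward bumping, following the linked post; the real BumpBlock fields in stickyimmix/src/bumpblock.rs differ, and the names here are hypothetical:

```rust
// Sketch: the cursor starts at the high end of the free lines and moves
// down toward `limit`. Rounding down to the alignment is a single mask,
// which is the efficiency win described in the linked post.
const ALIGN: usize = 8; // word alignment, must be a power of two

struct BumpDown {
    cursor: usize, // offset of the previous allocation (starts at block end)
    limit: usize,  // lowest offset available in the current free lines
}

impl BumpDown {
    fn alloc(&mut self, size: usize) -> Option<usize> {
        // subtract, then round down to the alignment in one mask
        let next = self.cursor.checked_sub(size)? & !(ALIGN - 1);
        if next < self.limit {
            None // out of space in this set of free lines
        } else {
            self.cursor = next;
            Some(next) // offset of the new object within the block
        }
    }
}
```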
Is FatPtr (interpreter/src/taggedptr.rs) necessary? Can it be eliminated?

Value (interpreter/src/taggedptr.rs) and TaggedScopedPtr (interpreter/src/safeptr.rs) appear to be near-identical; can we merge them?

Edit book chapter booksrc/interp-tagged-symbols.md to reflect changes.
The book should include a chapter (or series of multiple chapters) discussing the basics of garbage collection and how to tackle this using Rust. The end goal should be a set of resources to implement a naïve, non-moving mark-and-sweep garbage collector. While such a GC won't be very useful for most applications it's easy enough to understand/explain that it should serve as a good introduction.
I chose MPLv2 as a middle way between copyleft and the Rust community MIT/Apache default. Should a different licensing scheme be used? Why?
Bob Nystrom continues to inspire
See interpreter/src/bytecode.rs:293 - multiple dereferences are required to fetch a single opcode.

Ideal state: an instruction pointer that is a pointer to the next opcode.

Realistically, that might not be sensible, given that we probably want to retain bounds checking. What about extending Array with a RefCell-style ArrayOpcode slice borrow? That's as close to a bare pointer in safe Rust as we can get.
Files:
interpreter/src/bytecode.rs:293
booksrc/chapter-interp-bytecode.md
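A related pattern worth considering: treating a borrowed opcode slice itself as the instruction pointer. Advancing is a split_first, which is about as close to a bare pointer as safe Rust gets while retaining bounds checks. Opcode here is a hypothetical stand-in for the real bytecode type:

```rust
// Sketch: the slice IS the instruction pointer; split_first advances it.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Opcode {
    Nop,
    Halt,
}

fn run(mut ip: &[Opcode]) -> usize {
    let mut executed = 0;
    while let Some((op, rest)) = ip.split_first() {
        ip = rest; // advance the "instruction pointer"
        executed += 1;
        if *op == Opcode::Halt {
            break;
        }
    }
    executed
}
```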
There are options in Rust, depending on circumstances and requirements.

Nightly only:
alloc features on nightly Rust

Stable:
libc::posix_memalign() (and equivalents on other platforms that I'm not familiar with)
Using a Vec as a *mut u8
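The "Vec as *mut u8" option works by over-allocating by the alignment and rounding the start address up. A sketch for illustration (the Vec must be kept alive as long as the pointer is in use; this is not the allocator's real code):

```rust
// Sketch: allocate size + align - 1 bytes and round the base address up
// to the next multiple of `align`. The returned Vec owns the memory.
fn alloc_aligned(size: usize, align: usize) -> (Vec<u8>, *mut u8) {
    assert!(align.is_power_of_two());
    let mut backing = vec![0u8; size + align - 1];
    let addr = backing.as_mut_ptr() as usize;
    let aligned = (addr + align - 1) & !(align - 1); // round up to alignment
    (backing, aligned as *mut u8)
}
```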
Implement simple stop-the-world tracing for all objects in all blocks, including large objects.

This code should live on the interpreter side, as the entry point to tracing an object is the object header, which is implemented on the interpreter side.

Implement a Trace trait with a trace() method. Objects such as symbols and integers that do not get traced into may have a default no-op implementation of this method. The trace() method should call ObjectHeader::mark() on the object. The trace() method may append passed-in objects to a Vec argument to maintain a gray list. There are probably other ways to do this.

I chose CC-BY-4.0 as a default to begin with. What other licenses should be considered and why?
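The tracing scheme described in the issue above (mark bit, trace() appending children to a Vec gray list) can be sketched minimally, with an index-based heap standing in for real object headers:

```rust
// Simplified mark phase: roots seed a gray list held in a Vec; tracing an
// object marks it (the ObjectHeader::mark() step) and appends its children.
// `Obj` is an illustrative stand-in, not the interpreter's object model.
struct Obj {
    marked: bool,
    children: Vec<usize>, // indices of referenced objects
}

fn mark_from_roots(heap: &mut [Obj], roots: &[usize]) {
    let mut gray: Vec<usize> = roots.to_vec();
    while let Some(id) = gray.pop() {
        if heap[id].marked {
            continue; // already traced into
        }
        heap[id].marked = true; // ObjectHeader::mark()
        let children = heap[id].children.clone();
        gray.extend(children); // trace() appends to the gray list
    }
}
```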
Implement dead object sweeping across all blocks.
Depends on:
There needs to be a chapter discussing a simple bytecode format and a parser to use in the VM. I would recommend not using heavy-weight crates such as serde and instead focus on writing this by hand. This may require a bit more boilerplate but I believe it will make it easier to understand the code/concepts involved.
The book needs a chapter dedicated to the interpreter loop, how it consumes bytecode, etc. rust-hosted-langs/runtimes-WG#3 lists some techniques, but for the book I think it's best if we focus on the usual loop + match approach, since it can be done using regular Rust code.
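The loop + match shape in plain Rust looks like this; the opcode set is a toy stand-in for whatever bytecode the book settles on:

```rust
// A toy stack machine dispatched with the usual loop + match.
#[derive(Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Halt,
}

fn run(code: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    loop {
        match code[pc] {
            Op::Push(n) => stack.push(n),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a + b);
            }
            Op::Halt => return stack.pop(),
        }
        pc += 1; // advance to the next opcode
    }
}
```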
Originally, Array didn't have runtime-borrow capability, and I made the (hasty) decision to base the Array struct on Cell<RawArray<T>>.

It is clear that it makes far more sense to use RefCell now, both on a semantic and a performance basis. The explicit borrow flag can also be removed.
Source: interpreter/src/array.rs:42
Book: booksrc/chapter-interp-arrays.md
My understanding from reading the other issues is: the idea of the book is that different concepts are explored and demonstrated with a simple toy language.

I see where this is coming from, but in my opinion it leaves out the most important step in the process: setting real requirements. If choices between bytecode formats or garbage collection strategies are arbitrary for the provided example, it's not really an example of a meaningful choice. It also removes the accountability that these choices were actually the right ones.

I get that we don't want to be buried in too much detail, but I think that the semantic complexity of an embeddable dynamic language built for real-world usage is not that much greater than that of a toy language that actually shows anything interesting.

The one thing that would be vastly more complex is the "chapter" about language design itself and all the parsing machinery, at least in comparison to the proposed S-expression-based language. I propose to just leave that out and replace it with a textual intermediate representation that is effectively an assembler for the bytecode. A chapter about parsing a super-simplified language that we don't actually want to write, and that is not the focus of the book, probably doesn't have much value anyway.
We need a chapter that outlines the syntax and language semantics that we will use for the rest of the book. Using an S-expression-based language would be best since it makes parsing super easy. Semantics-wise, I wouldn't lock down on either OO or purely functional; instead it would probably be a hybrid between the two.
First of all, great work on the book, I've been really enjoying reading through it!

I noticed that the implementation of TaggedPtr represents nil as a pointer with a tag of 0. I've also noticed that integers are represented with a tag of 0.

Would this not make a literal 0 be equal to nil, and mean that a zero integer would always be converted to a nil FatPtr (due to this code here)? Not sure if this is an intentional part of the design or not :)
The book text switches inconsistently between a future and present tense. Pick one style and apply it consistently.
Is there any special consideration behind why this is, or is it just to make BumpBlock more cache-friendly?
Numbers that overflow (size_of::<isize>() * 8) - 2 bits should be stored in NumberObject objects.

I have no experience implementing number objects, and I understand that there may be commonly used or standard schemes and/or libraries. Needs an expert.

Depends on #22, NumberObject (we can rename this).

As an extension of #2, we need a chapter discussing more advanced garbage collection techniques.
For this chapter we probably want to focus on a specific algorithm such as Immix. I certainly wouldn't call Immix trivial, but it's not that hard either (the paper is just really really dense).