saladdais / tailslide
Embeddable parser, AST, tree walker and compiler for the Linden Scripting Language based on LSLint
License: MIT License
This is probably an issue in lslint as well
There's a common pattern of doing pushChild(), which internally calls addNextSibling() on the head of the list. That walks the list until it finds the tail and tacks on the provided node. Appending n children one at a time this way costs O(n²) in total, which leads to terrible perf for scripts that have long lists of children, like stack_heap_collide.lsl. It's especially pronounced when walking a list while mutating it.
Might make sense to keep track of the tail of the child list or swap to a vector.
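A minimal sketch of the first option, assuming a simplified node type (Node, mFirstChild, mLastChild and mNextSibling are hypothetical names, not tailslide's actual classes): caching the tail turns each pushChild() into an O(1) operation instead of a walk over every existing sibling.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified AST node; tailslide's real node class differs.
struct Node {
  int value = 0;
  Node *mFirstChild = nullptr;
  Node *mLastChild = nullptr;  // cached tail of the child list
  Node *mNextSibling = nullptr;
  std::size_t mChildCount = 0;

  explicit Node(int v = 0) : value(v) {}

  // O(1) append: link after the cached tail instead of walking from the head.
  void pushChild(Node *child) {
    child->mNextSibling = nullptr;
    if (mLastChild)
      mLastChild->mNextSibling = child;
    else
      mFirstChild = child;
    mLastChild = child;
    ++mChildCount;
  }
};
```

Anything that removes or splices children would also have to keep mLastChild in sync, which is the main maintenance cost of this approach compared to swapping to a vector.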
Need some sensible way of giving these symbols constant values. The semantics around lvalue references to uninitialized heap types within globals are kind of weird under LSO (they tend to cause bounds check failures when used.) Uninitialized vars also aren't allowed in SALists.
Erroring outright on weird globals like that would be wrong since the script will work fine so long as the value isn't read before being written within a function. For example, in LSO:
list baz;
list quux = baz;
default {
state_entry() {
llOwnerSay((string)quux); // bad. `quux` is uninitialized, runtime bounds error
}
}
will break but
list baz;
list quux = baz;
default {
state_entry() {
quux = [1];
llOwnerSay((string)quux); // fine. `quux` was initialized before it was read.
}
}
is totally fine.
Might make sense to have a "default" constant value for each type that's tagged as such so it "poisons" all references to it, in case the distinction is important to a consumer.
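A rough sketch of the "poisoning" idea, using a hypothetical ListConstant type rather than tailslide's real constant value classes: the placeholder given to an uninitialized global carries a flag, and any value derived from it inherits that flag, so a consumer can tell a real constant from a stand-in.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical constant representation for list values.
struct ListConstant {
  std::vector<std::string> items;
  bool poisoned = false;  // true if derived from an uninitialized ("default") global
};

// The stand-in value given to a global declared without an initializer.
ListConstant defaultListConstant() {
  ListConstant c;
  c.poisoned = true;
  return c;
}

// Concatenation propagates the poison flag, so `list quux = baz;` is known
// to hold a placeholder rather than a real empty list.
ListConstant concat(const ListConstant &a, const ListConstant &b) {
  ListConstant out;
  out.items = a.items;
  out.items.insert(out.items.end(), b.items.begin(), b.items.end());
  out.poisoned = a.poisoned || b.poisoned;
  return out;
}
```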
Having multiple points where assertions might need to be checked has made the Logger API more painful than it needs to be. It could do with being rethought entirely.
Would make it less annoying to use the lscript fuzzer. Can crib from the manifests in https://bitbucket.org/lindenlab/3p-curl/src/master/, and possibly use GH actions to upload artifacts on tag creation.
Right now most dynamic allocations rely on the gAllocationManager thread-local global having been set, and all allocs are only cleaned up when the ScriptAllocationManager's destructor is called. This was an okay stop-gap to deal with the fact that lslint (as a CLI util) never really needed to manually release resources, but I think we can do better. Similar story with the thread-local Logger singleton that visitors use to report errors.
Luau's AST library uses a similar allocation strategy, but it manually passes down the allocator rather than relying on globals, and I believe a pointer to it is stored on (or at least reachable somehow from) the AST nodes. Might make sense to have a ParserContext struct or something like that that gets passed to all AST nodes so they can always reference their own allocator without a global.
With the current abuse of globals, the only way of having two distinct ASTs live at once is to parse and process them in separate threads.
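A minimal sketch of the ParserContext idea, with hypothetical ParserContext and ExprNode classes: each context owns everything allocated for one AST, and nodes keep a back-pointer to their context so they can allocate replacements without a thread-local global. A real implementation would more likely use an arena like Luau's allocator; this one just keeps type-erased owners in a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical per-parse context that owns every node allocated for one AST.
class ParserContext {
 public:
  template <typename T, typename... Args>
  T *make(Args &&...args) {
    auto holder = std::make_unique<Holder<T>>(std::forward<Args>(args)...);
    T *ptr = &holder->value;
    mOwned.push_back(std::move(holder));
    return ptr;
  }
  std::size_t liveCount() const { return mOwned.size(); }

 private:
  struct HolderBase {
    virtual ~HolderBase() = default;
  };
  template <typename T>
  struct Holder : HolderBase {
    template <typename... Args>
    explicit Holder(Args &&...args) : value(std::forward<Args>(args)...) {}
    T value;
  };
  // Everything is freed when this context is destroyed, independently of
  // any other context that may be alive on the same thread.
  std::vector<std::unique_ptr<HolderBase>> mOwned;
};

// Hypothetical node type holding a back-pointer to its owning context.
struct ExprNode {
  ParserContext *mContext;
  int mValue;
  ExprNode(ParserContext *ctx, int value) : mContext(ctx), mValue(value) {}
  // e.g. a folding pass creating a replacement node through the same context.
  ExprNode *withValue(int value) const { return mContext->make<ExprNode>(mContext, value); }
};
```

Two ParserContext instances can then coexist on one thread, each with its own AST.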
Arbitrary constants defined by the lexer like MUL_ASSIGN are effectively part of the public API, even though they shouldn't be and may change in value across versions.
IF.
Since there's no such thing as a key literal, you have to be careful about value propagation / constant folding for keys:
list foo = ["<some_uuid>"];
is not the same thing as
key some_uuid = "<some_uuid>";
list foo = [some_uuid];
The former places a string value in a list, whereas the latter coerces the string value into a key first, then places the key value in the list. Incautious folding can break uses of llListFindList() and llGetListEntryType(), since both care about the type of the list element.
Folding in a key as a string constant is fine in places where the string expression will be automatically coerced to a key (the rvalue of a key variable assignment, or a function argument that requires a key), but not in a list expression. In other cases you can get away with a runtime cast to key, but that's obviously not possible in global initializers; the latter form above is the only way of getting a real key into a global list initializer.
This only really matters if you're transpiling back to LSL, so if this is added it might make sense to have a separate transpilation-safe constant folding mode.
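A sketch of the decision a transpilation-safe folding mode would have to make. UseContext, canFoldKeyAsStringLiteral() and canFoldKey() are hypothetical names; a real implementation would derive the context from the node's parent rather than take it as a parameter.

```cpp
#include <cassert>

// Hypothetical description of where a folded key constant would be substituted.
enum class UseContext {
  KeyAssignmentRValue,  // key k = <expr>;   implicit string -> key coercion
  KeyFunctionArg,       // llFoo(<expr>) where the parameter is a key
  ListElement,          // [<expr>]  no coercion, element type is preserved
  Other,
};

// Whether a key-typed constant may be emitted as a plain string literal.
bool canFoldKeyAsStringLiteral(UseContext ctx) {
  switch (ctx) {
    case UseContext::KeyAssignmentRValue:
    case UseContext::KeyFunctionArg:
      return true;
    case UseContext::ListElement:
    case UseContext::Other:
      return false;
  }
  return false;
}

// Whether folding is possible at all when transpiling back to LSL.
bool canFoldKey(UseContext ctx, bool inGlobalInitializer) {
  if (canFoldKeyAsStringLiteral(ctx)) return true;
  // Otherwise we'd need to emit an explicit (key)"..." cast, which
  // isn't allowed in global initializers.
  return !inGlobalInitializer;
}
```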
Outside of Clang, using if constexpr without warnings requires flipping from C++14 to C++17. To support C++14 consumers we need to use std::enable_if patterns instead.
Found by the lscript<->tailslide conformance fuzzer
The following compiles in LL's compiler, but not tailslide:
default{state_entry(){if(1)//\xFF;
else{;}llOwnerSay("why.");}}
There is a line comment // followed by a literal \xFF byte followed by a ;.
Appears to be due to a probable misunderstanding in LL's comment parser:
void line_comment()
{
char c;
while ((c = yyinput()) != '\n' && c != 0 && c != EOF)
;
}
where EOF == -1 == \xFF. That \xFF byte (which compares equal to EOF once stored in a char on platforms where char is signed) is actually a valid byte that flex may return, and you only need to check for \0. We should match that behavior anyway.
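A standalone emulation of that behavior, not tied to tailslide's flex rules (skipLineCommentLikeLL() is a hypothetical helper operating on a std::string): the comment ends at '\n', '\0' or the byte 0xFF, and the terminating byte is consumed, just as yyinput() consumes it in LL's loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Skips a "//" line comment starting at `pos` (just past the "//"),
// mimicking LL's line_comment(): with a signed `char`, the byte 0xFF
// compares equal to EOF (-1), so it terminates the comment too.
// Returns the index of the first byte after the comment.
std::size_t skipLineCommentLikeLL(const std::string &src, std::size_t pos) {
  while (pos < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[pos++]);
    if (c == '\n' || c == '\0' || c == 0xFF)
      break;  // the terminating byte has already been consumed
  }
  return pos;
}
```

With "//\xFF;" this leaves the ; outside the comment, so LL's compiler sees if(1); followed by the else on the next line, which is why the fuzzer case compiles there.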
foo() {return llOwnerSay("bar");}
is currently allowed, but should not be. There may be other contexts where we need to be careful about void / null.
Found by the lscript<->tailslide conformance fuzzer.
This compiles with LL's compiler but doesn't in tailslide:
default {state_entry() {
string asdadsadafwqafqwfafwadwawadasasdsadsadasdsadsadsadasdasdasdsadsadadwqdqd2qdq2fqfqfqdqdqasssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa = "foobar";
string asdadsadafwqafqwfafwadwawadasasdsadsadasdsadsadsadasdasdasdsadsadadwqdqd2qdq2fqfqfqdqdqasssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa = "foo";
string asdadsadafwqafqwfafwadwawadasasdsadsadasdsadsadsadasdasdasdsadsadadwqdqd2qdq2fqfqfqdqdqasssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa = "foo";
llOwnerSay((string)llGetFreeMemory());
}}
Seems to be due to LL relying on string literals being interned by LLStringTable::addStringEntry, but it treats every string over 256 chars as a distinct string due to
if (!strncmp(ret_val, str, MAX_STRINGS_LENGTH))
{
entry->incCount();
return entry;
}
// adds string to table
You can have colliding globals in Mono as well, but the script fails to start up due to the name collisions.
Since the string interning fails, all later references to those identifiers are a compile error under LL's compiler. Need to either error on identifiers over 255 bytes, or allow them to be declared and shadowed but disallow later references.
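A sketch of the second option using a hypothetical, flat (single-scope) SymbolTable rather than tailslide's real symbol handling: over-long names can always be declared, since they never collide with anything, but any lookup of one is reported as unreferenceable. The 255-byte limit is taken from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical limit, matching "identifiers over 255 bytes" above.
constexpr std::size_t kMaxReferenceableIdentLen = 255;  // bytes

enum class LookupResult { Found, NotFound, Unreferenceable };

class SymbolTable {
 public:
  // Returns false only on a genuine redeclaration of a short name.
  bool declare(const std::string &name) {
    if (name.size() > kMaxReferenceableIdentLen)
      return true;  // never interned, so it can't collide (matches LL)
    return mSymbols.insert(name).second;
  }

  LookupResult lookup(const std::string &name) const {
    if (name.size() > kMaxReferenceableIdentLen)
      return LookupResult::Unreferenceable;  // error at the use site
    return mSymbols.count(name) ? LookupResult::Found : LookupResult::NotFound;
  }

 private:
  std::unordered_set<std::string> mSymbols;
};
```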
Flex and Bison are the only potentially annoying dependencies currently, and they're not terribly popular for modern parsers. https://github.com/yhirose/cpp-peglib seems reasonable and comes with its own AST implementation that'd be useful at least for testing.
Write a fuzzer for CIL like the one used for the LSO conformance testing.
Since we do constant expression evaluation at compile-time, we need to be able to guard against exponential blowup in evaluation of scripts like the following, where each level multiplies the list size by 7:
list foo = ["foo", "foo", "foo", "foo", "foo", "foo"];
list foo2 = foo + foo + foo + foo + foo + foo + foo;
list foo3 = foo2 + foo2 + foo2 + foo2 + foo2 + foo2 + foo2;
// ...
integer only_use = foo9 != [];
Since most dynamic allocs pass through the ScriptAllocator, we should have it add the size of each alloc to a counter, then throw (std::bad_alloc?) when it passes some reasonable upper limit. It might also make sense to have a slightly lower "soft" limit after which we refuse to do value propagation / folding, so we can keep certain cases as runtime crashes vs hard compile-time crashes.
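A minimal sketch of the budget idea as a standalone class. BudgetedAllocator and its limits are hypothetical; the real ScriptAllocator would need to hook the same bookkeeping into its own allocation and release paths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical budget-tracking allocator with a soft and a hard limit.
class BudgetedAllocator {
 public:
  BudgetedAllocator(std::size_t softLimit, std::size_t hardLimit)
      : mSoftLimit(softLimit), mHardLimit(hardLimit) {}

  void *allocate(std::size_t size) {
    // Written as a subtraction to avoid overflow in mUsed + size.
    if (size > mHardLimit - mUsed)
      throw std::bad_alloc();
    void *p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    mUsed += size;
    return p;
  }

  void deallocate(void *p, std::size_t size) {
    std::free(p);
    mUsed -= size;
  }

  // Folding / value propagation can check this and bail out early,
  // leaving the expression to be evaluated (and possibly crash) at runtime.
  bool overSoftLimit() const { return mUsed > mSoftLimit; }
  std::size_t used() const { return mUsed; }

 private:
  std::size_t mSoftLimit;
  std::size_t mHardLimit;
  std::size_t mUsed = 0;
};
```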
Some API consumers aren't going to have a file on disk; being able to parse a buffer you already have sitting around would be nice.
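One possible stopgap, assuming the parser entry point keeps taking a FILE*: on POSIX systems fmemopen(3) wraps an in-memory buffer as a FILE*, so FILE*-based code can consume it unchanged. It isn't available on Windows, and flex has its own in-memory scanning functions (yy_scan_bytes() and friends), which would probably be the cleaner long-term route. readAll() and parseFromBuffer() are hypothetical; readAll() only stands in for the parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Stand-in for a parser that consumes a FILE*.
std::string readAll(FILE *fp) {
  std::string out;
  char buf[256];
  std::size_t n;
  while ((n = std::fread(buf, 1, sizeof buf, fp)) > 0)
    out.append(buf, n);
  return out;
}

// POSIX-only: expose an in-memory script through a FILE*.
std::string parseFromBuffer(const std::string &script) {
  std::string copy = script;  // fmemopen takes a non-const buffer pointer
  FILE *fp = fmemopen(&copy[0], copy.size(), "r");
  if (!fp) return {};
  std::string result = readAll(fp);
  std::fclose(fp);
  return result;
}
```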
Right now C++20 has to be used under MSVC because of some struct initialization patterns that aren't all that important. Ideally you should be able to build the artifacts as C++17 and include them in a C++14 build, or at least a C++17 build. Would require breaking up bitstream.hh at least.
for
initializers, that sort of thing.)
conv.r8.
Right now a node returning a null ->get_constant_value() can mean that either a constant value for the node can't be determined, or that something in the expression might have side-effects.
Basically, we should be able to tell that an if ((non_const = 3) + 1) branch will always be taken because the expression will always result in 4, but that the expression can't be folded due to the side-effects of the assignment.
We should also be able to prune a declaration like string foo = non_const + "bar"; because even though we can't determine a constant value for the expression, we know that it doesn't have any side-effects.
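A sketch of tracking the two properties separately, using a hypothetical ExprInfo struct and integer-only helpers rather than tailslide's real constant value API. "Has a known value" decides things like which branch is taken, while "has side-effects" decides whether the expression may be folded away or pruned.

```cpp
#include <cassert>

// Hypothetical per-expression analysis result.
struct ExprInfo {
  bool hasConstantValue;
  int constantValue;  // only meaningful if hasConstantValue is true
  bool hasSideEffects;

  // Safe to replace the expression with its value in the output.
  bool canFold() const { return hasConstantValue && !hasSideEffects; }
  // Safe to drop the expression (e.g. an unused declaration's initializer).
  bool canPrune() const { return !hasSideEffects; }
};

ExprInfo intLiteral(int v) { return {true, v, false}; }

// `non_const = 3`: the value 3 is known, but the assignment is a side-effect.
ExprInfo assignConst(int value) { return {true, value, true}; }

// A read of a variable whose value isn't known: no value, no side-effects.
ExprInfo readUnknownVar() { return {false, 0, false}; }

// Binary `+`: constant if both sides are, side-effecting if either side is.
ExprInfo add(const ExprInfo &l, const ExprInfo &r) {
  ExprInfo out{};
  out.hasConstantValue = l.hasConstantValue && r.hasConstantValue;
  out.constantValue = out.hasConstantValue ? l.constantValue + r.constantValue : 0;
  out.hasSideEffects = l.hasSideEffects || r.hasSideEffects;
  return out;
}
```

Under this scheme the if ((non_const = 3) + 1) condition reports a known value of 4 (so the branch is always taken) while canFold() stays false.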
Can use Luau's fuzzing harnesses as a guideline here. Stuff in tests/scripts/ should be fine as a seed corpus, but we'll have to be smart about excluding some of the overly-complex scripts.
Having to refer to nodes by ordinal index and then casting is not so nice compared to just having a properly typed and named field we can reference. The linked list structure doesn't really gain us much so long as we're still able to visit any given node's children.
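A rough sketch of the typed-field approach with hypothetical Expr, IntLiteral and BinaryExpr classes (raw, non-owning pointers for brevity). Children are reachable by name, and visitChildren() still enumerates them generically, passing a reference to each field so a visitor can replace a child in place.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical expression base class.
struct Expr {
  virtual ~Expr() = default;
  // Default: no children.
  virtual void visitChildren(const std::function<void(Expr *&)> &fn) { (void)fn; }
};

struct IntLiteral : Expr {
  int value;
  explicit IntLiteral(int v) : value(v) {}
};

struct BinaryExpr : Expr {
  std::string op;
  Expr *lhs;  // accessed by name, no ordinal index + cast needed
  Expr *rhs;
  BinaryExpr(std::string o, Expr *l, Expr *r) : op(std::move(o)), lhs(l), rhs(r) {}

  // Generic enumeration; the reference lets the callback swap a child out.
  void visitChildren(const std::function<void(Expr *&)> &fn) override {
    fn(lhs);
    fn(rhs);
  }
};
```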
_mChildren
or class members, not both
visitChildren()
LSLASTNode::replaceNode()
to walk the children of the parent and get the address of the field containing the node to replace, only using the current node replacement strategy if the parent node has "list-like" children.
The print() expression is a very weird and very broken case in LL's compiler. It's sort of meant to take on the type of its argument, but it doesn't fully. print("foo") + "foo" and string foo() {return print("bar");} are valid according to LL's type checker, but llOwnerSay(print("foo")) is not.
Maybe it's just weird in function expressions?
Marked as enhancement because nobody cares about print() (it doesn't seem like it has ever compiled correctly under Mono).
Right now the separation of passes is very close to how lslint and LL's compiler worked, but walking the AST isn't free. The tree walking logic is currently the most cumulatively expensive part of the code, so merging things into fewer passes == fewer tree walks == faster.
At least symbol resolution and type checking could be merged into one visitor (maybe with a flag field to determine whether to actually check types / resolve symbols.)
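A toy sketch of such a merged visitor. ResolveAndTypeCheckVisitor and Identifier are hypothetical, types are simplified to strings, and the "tree" is reduced to individual identifier uses; the point is only that both checks happen during one visit, gated by flags.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Identifier {
  std::string name;
  const std::string *resolvedType = nullptr;  // filled in by resolution
};

class ResolveAndTypeCheckVisitor {
 public:
  ResolveAndTypeCheckVisitor(bool resolveSymbols, bool checkTypes)
      : mResolve(resolveSymbols), mCheckTypes(checkTypes) {}

  void declare(const std::string &name, const std::string &type) { mScope[name] = type; }

  // Called once per identifier use; `expectedType` comes from the context.
  void visit(Identifier &ident, const std::string &expectedType) {
    if (mResolve) {
      auto it = mScope.find(ident.name);
      ident.resolvedType = (it == mScope.end()) ? nullptr : &it->second;
      if (!ident.resolvedType) errors.push_back("undeclared: " + ident.name);
    }
    // Type checking piggybacks on the same visit instead of a second walk.
    if (mCheckTypes && ident.resolvedType && *ident.resolvedType != expectedType)
      errors.push_back("type mismatch: " + ident.name);
  }

  std::vector<std::string> errors;

 private:
  bool mResolve;
  bool mCheckTypes;
  std::unordered_map<std::string, std::string> mScope;
};
```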
List-in-list is a runtime error rather than a compile-time error in LSL. As such, a list-in-list in a local should be a compiler warning rather than an error.
The following is a perfectly valid LSL script:
default {
state_entry() {
return;
list l = [[]];
}
}
A list-in-list in a global, however, is a compile-time error.
We can test LSO scripts with the lscript harness, but we don't have a way to run CIL scripts at the moment. Should be able to crib a handful of classes from libremetaverse and elsewhere, enough so we can run the LSL language conformance test (it only requires a handful of library functions).
Need to look into autobuild's cross-compilation story.
I would love to update my extension for VSCode to support your plugin! LSLint For VSCode
I did some testing last night and it seems to output the same as lslint command line, which is awesome! I would just need to add an option to use tailslide vs lslint.
All of that is just a side note to why I am posting this... I use a lot of the features from Firestorm's Preprocessor, such as define and include [EXAMPLES], but they aren't currently supported by your linter, which spits out an error:
❯ tailslide --lint example.lsl
ERROR:: ( 1, 9): [E10020] syntax error, unexpected IDENTIFIER, expecting '('
TOTAL:: Errors: 1 Warnings: 0
which is just from trying to use a #define SOMETHING 123 in the script.
It is "tolerated" so to speak by lslint since it has a -i option to ignore/skip those directives, which would be helpful! [PR from lslint where it was implemented]
It would be even better if you were able to handle it, or at least, in this category, handle #include so it could trace through the other included files for that script.
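For the -i style behavior, a naive pre-pass could blank out directive lines rather than delete them, so that line numbers in diagnostics still match the original file. blankPreprocessorLines() is a hypothetical helper, not lslint's implementation; it doesn't handle #include, macro expansion, or backslash-continued #define lines.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Replaces every line whose first non-blank character is '#' with an empty
// line, keeping all other lines verbatim and preserving the line count.
std::string blankPreprocessorLines(const std::string &src) {
  std::istringstream in(src);
  std::string out, line;
  bool first = true;
  while (std::getline(in, line)) {
    if (!first) out += '\n';
    first = false;
    auto pos = line.find_first_not_of(" \t");
    if (pos == std::string::npos || line[pos] != '#')
      out += line;  // ordinary line: keep as-is
  }
  // getline drops the final newline; put it back if the input had one.
  if (!src.empty() && src.back() == '\n') out += '\n';
  return out;
}
```

References to macros such as SOMETHING would still be reported as undeclared, so this only gets past the syntax error.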
I know it is a big ask, but your project is close enough to lslint in its output (which means I can use it for the VSCode GUI) and is the only one that seems to be in active development. If this is something you would consider supporting, it would help a LOT of people, since handling includes is the biggest shortcoming of lslint that people ask me about for the extension.
Thanks!
Also feel free to reach out to me in SL or email (info on my account page).
Value propagation behavior should be pluggable since the results of various operations can vary depending on what runtime is being used. For example, any operation that results in inf, -inf, or nan will crash an LSO script, so such operations may not be folded. Extracting the logic into an OperationSemantics class that could be passed into the ConstantDeterminingVisitor would be a lot cleaner, and would allow things like doing the folding within an actual VM where appropriate.
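A minimal sketch of that interface, covering only float multiplication. OperationSemantics is the name proposed above, but its methods here are hypothetical, as are LSOSemantics, PermissiveSemantics (an illustrative runtime that treats non-finite results as ordinary values) and the Folder stand-in for ConstantDeterminingVisitor.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical pluggable semantics for constant folding.
class OperationSemantics {
 public:
  virtual ~OperationSemantics() = default;
  // Returns false if this runtime's result can't (or mustn't) be folded.
  virtual bool mulFloat(float a, float b, float &out) const = 0;
};

// LSO: inf/-inf/nan results crash the script at runtime, so refuse to fold
// them and leave the crash where it would actually happen.
class LSOSemantics : public OperationSemantics {
 public:
  bool mulFloat(float a, float b, float &out) const override {
    float r = a * b;
    if (!std::isfinite(r)) return false;
    out = r;
    return true;
  }
};

// Illustrative runtime where non-finite results are ordinary values.
class PermissiveSemantics : public OperationSemantics {
 public:
  bool mulFloat(float a, float b, float &out) const override {
    out = a * b;
    return true;
  }
};

// Stand-in for the relevant part of ConstantDeterminingVisitor.
struct Folder {
  const OperationSemantics &semantics;
  bool tryFoldMul(float a, float b, float &out) const { return semantics.mulFloat(a, b, out); }
};
```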