I recently moved dukluv to use fixed buffer to expose libuv structs to s. I lik

Way to set prototype of buffer and/or pointer types (svaarala/duktape)

Comments (26)

svaarala commented on May 29, 2024

Wow, the terminal looks great :)

Plain buffers and pointers actually already inherit from Duktape.Buffer.prototype and Duktape.Pointer.prototype, but the buffer-to-string coercion is hardcoded at the moment to keep it fast.

Changing the coercion to go through the prototype toString() would probably be quite easy, but I'm afraid it would conflict with needs in other environments for buffers to be fast to work with. OTOH C code can do such "coercions" directly anyway, so I'm not sure if this would be an actual issue - this would need some polling :)

For background on "metatables" in general, I've been considering Lua-like metatables from time to time, but there's currently no such concept in Duktape (Proxy comes closest). It would be possible to add but would need some work to figure out what its role should be, e.g. to avoid duplicating the Proxy mechanism. Earlier I thought about implementing Proxy through a lower level "metatable" concept but it turned out to be difficult to find the right metatable primitives to make it powerful but still match what Proxy needs.

Anyway, it would be possible to dedicate a few bits from the 16 spare bits to control buffer behavior e.g. for coercion.

There are actually other needs for this too - for instance, dynamic buffers can have a "spare" that reduces resizing at the cost of keeping some memory preallocated. Managing the spare is not currently exposed to user code. It would be nice if there were a few modes for managing the spare automatically, e.g. to keep the buffer without spare (compact), default spare algorithm, large spare for buffers that go through a lot of resizing, perhaps a "no shrink" flag, etc.
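To make the spare modes a bit more concrete, here is a rough standalone sketch. It is not Duktape code: the names `spare_mode`, `dynbuf` and `dynbuf_resize`, and the growth numbers, are all assumptions picked for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical spare policies; names and numbers are illustrative only. */
enum spare_mode {
    SPARE_COMPACT,   /* no spare: allocation always equals the used size */
    SPARE_DEFAULT,   /* moderate spare: used size plus 25% plus 16 bytes */
    SPARE_LARGE      /* large spare: twice the used size plus 64 bytes */
};

struct dynbuf {
    unsigned char *data;   /* must start out as NULL */
    size_t used;           /* bytes visible to the user */
    size_t alloc;          /* bytes accounted as allocated (used + spare) */
    enum spare_mode mode;
    int no_shrink;         /* if nonzero, never give memory back */
};

/* Allocation size the policy wants for a given used size. */
static size_t dynbuf_target_alloc(const struct dynbuf *b, size_t used) {
    switch (b->mode) {
    case SPARE_COMPACT: return used;
    case SPARE_LARGE:   return used * 2 + 64;
    default:            return used + used / 4 + 16;
    }
}

/* Resize to 'new_used' bytes; returns 0 on success, -1 if realloc fails. */
static int dynbuf_resize(struct dynbuf *b, size_t new_used) {
    size_t target;
    int grow, shrink;

    assert(b != NULL);
    target = dynbuf_target_alloc(b, new_used);
    grow = new_used > b->alloc;
    shrink = !b->no_shrink && target < b->alloc;

    if (grow || shrink) {
        /* Ask for at least one byte so the result is never a zero-size block. */
        unsigned char *p = realloc(b->data, target ? target : 1);
        if (p == NULL) {
            return -1;
        }
        b->data = p;
        b->alloc = target;
    }
    b->used = new_used;
    return 0;
}
```

With the default policy, a resize that still fits inside the current allocation does not touch the allocator at all, which is the point of keeping a spare. The "no shrink" flag only suppresses the realloc on the way down.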

from duktape.

creationix commented on May 29, 2024

I'm fine with there being a single built-in string conversion, but can it be something other than utf8? We already have strings for storing utf8 data. I'd be fine with something like pointer's conversion. Ideally I'd be able to set the type name when registering the magic code. For example, I'd like a uv_tcp_t that's wrapped in a fixed buffer to to-string as something like [uv_tcp_t 0x7f92d8517cc0]

svaarala commented on May 29, 2024

The buffer-to-string conversion actually doesn't do anything with the bytes. It appears to use UTF-8 simply because Duktape handles all strings in UTF-8 (CESU-8 actually), so any bytes you coerce into a string are interpreted as UTF-8 automatically.

creationix commented on May 29, 2024

Ahh, so it's print that's doing the utf8 conversion. Maybe we should change print to not convert buffers and print them directly as-is. I'll probably be doing the same for all my libuv functions that accept data.

svaarala commented on May 29, 2024

Hmm.

The current behavior tries to mimic how other plain types work. For instance, for booleans:

duk> x = true
= true
duk> String(x)
= true
duk> Object.prototype.toString.call(x)
= [object Boolean]

For plain buffers:

duk> x = Duktape.dec('hex', '666f6f')
= foo
duk> String(x)
= foo
duk> Object.prototype.toString.call(x)
= [object Buffer]

svaarala commented on May 29, 2024

The default print() actually prints buffer bytes directly into stdout with no changes if it's called with exactly one buffer argument. Otherwise it string coerces and appends a newline (!).

Not sure if this is best behavior, but it's useful in that you can control the exact bytes written if you use a single buffer argument. I can see the counter-argument why this might be surprising ;)

svaarala commented on May 29, 2024

Hmm, print() also does no conversion at all - it just writes the internal string bytes directly to stdout, appending a newline. The internal representation for strings is always UTF-8 and no conversion as such happens anywhere.

creationix commented on May 29, 2024

All I know is when I print my buffers, it sometimes throws an exception complaining about decoding invalid UTF-8 bytes.

svaarala commented on May 29, 2024

Hmm, could you paste an example?

There are cases inside Duktape where strings are expected to be valid UTF-8. Without buffers, user code is not able to create invalid UTF-8 strings because all the string constants that are appended, converted, etc will be UTF-8.

svaarala commented on May 29, 2024

For example, here's print() with invalid UTF-8:

duk> print(Duktape.dec('hex', 'ff'))
�= undefined

duk> print(String(Duktape.dec('hex', 'ff')))
�
= undefined

Some other string calls may barf on invalid UTF-8 though.

creationix commented on May 29, 2024

Ok, I think I was concatenating a string with a buffer. Thanks for explaining it.

creationix commented on May 29, 2024

I guess what I really want is something like the Pointer type, but managed by the VM and with the ability to tag a type and set a prototype.

svaarala commented on May 29, 2024

I remember having the same problem with some socket prototyping but I can't remember what the problem was. Concatenation is probably not the issue as such:

duk> 'foo' + Duktape.dec('hex', 'ff')
= foo�

Or what was the particular concatenation case that failed for you?

creationix commented on May 29, 2024

Hmm, I can't reproduce it right now. I have my day job to work on for a while. I'll try to reproduce it later.

svaarala commented on May 29, 2024

Yeah, the prototype for plain buffers and pointers is fixed now - this behavior mimics that of strings, booleans, etc, which also inherit from a fixed prototype (plain strings inherit from String.prototype etc).

I agree it'd be useful to be able to do some minimal typing for buffers.

svaarala commented on May 29, 2024

Ok, let me know if you catch it - I remember having the same issue but I can't remember what the nit was that caused it. It wasn't a bug as such IIRC, but perhaps something worth looking at anyway.

svaarala commented on May 29, 2024

Now I remember what my case was: it was the example socket server that uppercased input data. Some string built-ins don't know how to deal with non-UTF-8 data so they barf in the process:

duk> String(Duktape.dec('hex', 'ff')).toUpperCase()
Error: utf-8 decode failed
    duk_unicode_support.c:230
    toUpperCase  native strict preventsyield
    global input:1 preventsyield

There are several such situations now - the intention is that normally buffers would be operated on as buffers and only coerced to strings when that makes sense. Raw non-UTF-8 strings should also work, and basic operations like concatenation should work. But, things like regexps, trimming, case conversion, etc may currently barf. These could be fixed to just ignore non-UTF-8 characters and scan forwards (or backwards) looking for valid data, though.
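The "ignore non-UTF-8 characters and scan forwards" idea could look roughly like the sketch below. It is a standalone model rather than Duktape's decoder: `utf8_seq_len` and `utf8_count_lenient` are hypothetical names, and the check is purely structural (lead byte plus continuation bytes, with no overlong or surrogate checks).

```c
#include <assert.h>
#include <stddef.h>

/* Length of the sequence a lead byte announces, or 0 if the byte cannot
 * start a sequence (a stray continuation byte, or 0xf8..0xff). */
static size_t utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;
    if (lead >= 0xc0 && lead < 0xe0) return 2;
    if (lead >= 0xe0 && lead < 0xf0) return 3;
    if (lead >= 0xf0 && lead < 0xf8) return 4;
    return 0;
}

/* Count decodable characters in buf[0..len), skipping bytes that do not
 * form a structurally valid sequence instead of failing. The number of
 * skipped bytes is stored in *skipped when it is non-NULL. */
static size_t utf8_count_lenient(const unsigned char *buf, size_t len,
                                 size_t *skipped) {
    size_t i = 0, chars = 0, bad = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        size_t n = utf8_seq_len(buf[i]);
        size_t k;
        int ok = (n != 0 && n <= len - i);

        /* Every byte after the lead byte must look like 10xxxxxx. */
        for (k = 1; ok && k < n; k++) {
            ok = ((buf[i + k] & 0xc0) == 0x80);
        }
        if (ok) {
            chars++;
            i += n;
        } else {
            bad++;   /* skip one byte and resynchronize on the next */
            i++;
        }
    }
    if (skipped != NULL) {
        *skipped = bad;
    }
    return chars;
}
```

A built-in such as case conversion could use the same loop shape: handle the sequences that decode, and copy or drop the bytes that do not, instead of throwing.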

svaarala commented on May 29, 2024

There would be one major upside in changing the current buffer-to-string coercion to go through Buffer.prototype.toString() and changing that to return something like "buffer" or "[buffer fixed 1234]" (and user code could of course override the method): it would be safer for sandboxing.

The current buffer-to-string coercion is problematic for sandboxing because it allows "internal keys" to be constructed by simple coercion. There are work items for sandboxing to avoid the security impact of this - but the whole issue would disappear if there was no such coercion available for Ecmascript code by default. In other words, user code couldn't create non-UTF-8 strings just by executing Ecmascript code.

One could of course provide a C function that did this coercion if it is needed by the application - but it would be an explicit decision rather than default behavior.

mitchblank commented on May 29, 2024

I'm sorry if I'm hijacking this bug (although it seems to already have drifted from its original topic a bit), but I think it's related to the one piece of duktape embedding that I'm still not 100% clear on. That is, what are the best practices for exposing a native-code object into the ECMA world?

I just require the basics semantics-wise:
(A) ECMA code shouldn't be able to see (and certainly not modify!) the memory of the native object
(B) When I set a finalizer to clean up the native state (i.e. call the C++ destructor for the object) I need to make sure that the ECMA code can't cause my finalizer to not run by calling Duktape.fin() on it.
Corollary: it doesn't matter to me whether duktape manages the native object's memory or I do. I'll already have a finalizer registered so I can free it if that's easier.
(C) It should be efficient for native code to retrieve the pointer to the wrapped object
(D) It should also be efficient for the wrapped code to determine whether an ECMA object is of the expected native type.

I've taken a look at the dukluv code and it seems to currently be missing (D), or at least that's my understanding. In lua it was possible to get this by comparing the metatable with lua_rawequal(). It wasn't a great API but it mostly worked.

It seems the duktape "pointer" type is the fastest at (C) but this issue seems to indicate that it won't work very well since I can't easily add a prototype with all of my methods/getters/setters. It feels like pointer is trying to be what I want but doesn't quite get there.

I took a look at the duktape internals and it seems to use APIs like duk_get_hobject() and DUK_HOBJECT_GET_CLASS_NUMBER() but those are understandably not available to mortals.

So I'm left a little confused about what I'm supposed to be doing here. Wrap an object in an ES6 proxy? Attach a buffer (or pointer) to the native object as a hidden property on an object?

Do I need to completely remove Duktape.fin() from the ECMA-callable code to keep users from preventing my own finalizers from running?

svaarala commented on May 29, 2024

The pointer type is intended to be the lightest possible way to store a pointer with no semantics (i.e. a void *) and it basically has the semantics of Lua lightuserdata. It's implemented as a tagged type, so there's no space for things like finalizer references etc. There is space for 16 bits (unused now) to use for something, and this has been discussed above.

The best practice at the moment is to:

Create an object for every native resource you want to track this way. Use a prototype object to inherit methods etc if there are a lot of instances of the object type.
Store the native buffer and pointer reference(s) (and file handles, etc) in the object as internal properties.
Attach a finalizer to the wrapper object. Implement the finalizer in C so that it can easily access the internal properties and free buffers etc.

This works quite well unless you're running actually malicious code (instead of just accidentally broken code). For malicious code, you'll need actual sandboxing measures: replacing the global object so that user code is denied access to e.g. Duktape.fin, denying access to buffer values (which can currently be used to access internal properties), etc. However, sandboxing is still a work in progress, and there's e.g. no way to abort a script stuck in an infinite loop. See the internal sandboxing document for the current limitations.

When the native resource being tracked is just a buffer, it'd be nice to be able to attach a finalizer without an object wrapper. There are some ideas for this, but it needs careful thought because buffers also serve as the most primitive and efficient way of dealing with mutable raw data. Some current ideas:

Add a small magic value for buffer objects, and use it to encode an index to a finalizer array (or something equivalent). This is not terribly flexible, but should have no memory impact.
Add an optional finalizer reference to buffers. Allow it to be added only for dynamic buffers because it's easier to use a variable number of fields for them internally.
Add an optional property table to buffers, again restricting it to dynamic buffers.

To be worth the added complexity, the end result from this must be significantly cheaper than a wrapper object or a Buffer object of course.

mitchblank commented on May 29, 2024

I've already looked a bit at the sandboxing stuff and I'm still trying to decide what I'm doing there. My initial worry is less malicious code than "clever" code, i.e. a JS programmer decides it would be fun to use a custom finalizer and suddenly my daemon process starts leaking sockets. In the end I'll probably be making a custom global object for them anyway, so I guess it's not an issue.

This does seem like a common use case for a runtime built for embedding -- have you considered adding a first class data type for it (i.e. a sibling to pointers and buffers)? If I could be so bold as to sketch an API... I'm just calling it 'extended' here, not sure if there's a better word. "wrapped"? "reflected"?

/* exposed in the API: */

struct duk_extended_type_info {
    const char *name;
    duk_c_function finalizer;  /* or NULL */
    int (*to_string)(duk_context *ctx, const struct duk_extended_type_info *typeinfo); /* or NULL for some default that just prints 'name' I guess? */
    /* others? */
};

/* allocates the object, returns a pointer to it which the caller must fill in: */
extern void *duk_push_extended(duk_context *ctx, const struct duk_extended_type_info *typeinfo, duk_size_t bytes);

extern duk_bool_t duk_is_extended(duk_context *ctx, duk_idx_t index, const struct duk_extended_type_info *typeinfo);

/* returns NULL if not an extended type matching 'typeinfo': */
extern void *duk_get_extended(duk_context *ctx, duk_idx_t index, const struct duk_extended_type_info *typeinfo);

extern void *duk_require_extended(duk_context *ctx, duk_idx_t index, const struct duk_extended_type_info *typeinfo);

/* versions that don't typecheck: */
extern duk_bool_t duk_is_any_extended(duk_context *ctx, duk_idx_t index);
extern void *duk_get_any_extended(duk_context *ctx, duk_idx_t index);
extern void *duk_require_any_extended(duk_context *ctx, duk_idx_t index);

/* given a pointer to the 'extended' value, find its type */
extern const struct duk_extended_type_info *duk_get_extended_typeinfo(const void *extended_ptr);

/* ---- internal structure inside duktape.c: ---- */

#define DUK_HOBJECT_CLASS_EXTENDED 19
struct duk_extended_type {
    /* [...normal duktape stuff like prototype pointer...] */
    const struct duk_extended_type_info *typeinfo;
    duk_uint8_t storage[1];   /* variable-sized, must be last.  Might actually want to make this a union of a bunch of types to make sure it's always well-aligned for any C type */
};

The advantages would be:

Getting the native pointer (which is just &storage[0] of the gc-managed duktape struct) should be as fast as a call to duk_get_pointer()
Once we have that pointer, getting back the "typeinfo*" value is just pointer math, so that's basically free as well
C code can verify that it's working on the expected type just by comparing the typeinfo pointer
Since the C code declares the typeinfo instance itself it can also put any extra info it needs per-type next to it in the struct.
The typeinfo instance is also const, so it can be shared among multiple duktape heaps in the same process
It would also be useful internally by duktape itself for a lot of the things that it currently uses DUK_HOBJECT_CLASS's for, so it should have a nearly zero impact in code size.
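The "just pointer math" part can be shown with a small standalone sketch. This is only a model on top of malloc, not anything in Duktape: `ext_type_info`, `ext_header`, `ext_alloc`, `ext_typeinfo_of`, `ext_check` and `ext_free` are hypothetical names, and a real version would be a gc-managed heap object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct ext_type_info {
    const char *name;
};

/* Header placed directly in front of the payload. The union keeps the
 * payload suitably aligned for common C types. */
struct ext_header {
    const struct ext_type_info *typeinfo;
    union {
        long double ld;
        long long ll;
        void *p;
        unsigned char bytes[1];
    } storage;   /* variable-sized in practice, must be last */
};

/* Allocate header plus payload; returns a pointer to the payload. */
static void *ext_alloc(const struct ext_type_info *ti, size_t bytes) {
    struct ext_header *h;
    size_t payload = bytes > sizeof(h->storage) ? bytes : sizeof(h->storage);

    h = malloc(offsetof(struct ext_header, storage) + payload);
    if (h == NULL) {
        return NULL;
    }
    h->typeinfo = ti;
    return &h->storage;
}

/* Step back from the payload to the header: plain pointer arithmetic. */
static const struct ext_type_info *ext_typeinfo_of(const void *payload) {
    const struct ext_header *h;

    assert(payload != NULL);
    h = (const struct ext_header *)
        ((const char *) payload - offsetof(struct ext_header, storage));
    return h->typeinfo;
}

/* Type check by pointer identity; returns NULL on mismatch. */
static void *ext_check(void *payload, const struct ext_type_info *expected) {
    if (payload == NULL || ext_typeinfo_of(payload) != expected) {
        return NULL;
    }
    return payload;
}

static void ext_free(void *payload) {
    if (payload != NULL) {
        free((char *) payload - offsetof(struct ext_header, storage));
    }
}
```

The type check is a single pointer comparison, so requirement (D) costs about the same as fetching the pointer in the first place.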

Anyway, for now I'll implement something like this on top of internal properties as you suggest. This means an extra property lookup on every native call but it shouldn't be too bad.

creationix commented on May 29, 2024

Oh wow, this is still open too. I'm using wrapped Objects with the fixed buffer in an internal property in my new project.

All I'm missing now is a way to control the internal class of my wrapper object. I need a way for my pretty printer (which is implemented in JS) to print these with something more useful than { }. While I can set toString in my custom prototype, the pretty-printer doesn't use this for objects whose internal class is "Object" because it would prevent seeing the structure of all normal objects.

I'm creating it using a C constructor function using the provided this value. I can verify this is inheriting from my custom .prototype that's set on the constructor.

svaarala commented on May 29, 2024

If you're using standard objects then ArrayBuffer and typed arrays would be (IMO) the default way to go. They automatically wrap an internal plain buffer but provide property slots because the ArrayBuffer / typed array is an ordinary object. They also have an internal class different from Object so that they print out e.g. as [object ArrayBuffer].

ArrayBuffer and typed arrays are of course not as memory efficient as plain buffers but do have an internal C structure representation (rather than having a lot of actual properties) so they're still pretty OK. For example, on x64 a fixed buffer has a 40-byte header while actual buffer objects (ArrayBuffer, typed array, Node.js Buffer) have an 88-byte header.

svaarala commented on May 29, 2024

This issue is still open: the need is clear but there wasn't an obvious, compatible solution for Duktape 1.x. With Duktape 2.x it'd be easier to make a change.

Adding more fields to plain buffers must be done very carefully because they're the lowest overhead representation for byte arrays and each byte matters for very low memory targets. For pointers the situation is even more difficult because they're not even heap allocated so that there's at most 16 bits of metadata when using packed duk_tval.
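To illustrate why only 16 bits are available, here is a sketch of an 8-byte packed value. The exact layout is an assumption made for illustration (16-bit tag, 16 spare bits, 32-bit pointer), as are the tag value and the `tval_*` names. This is not the actual duk_tval definition.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of one packed 64-bit value (illustrative only):
 *   bits 63..48  type tag
 *   bits 47..32  spare bits, used here as a "magic" value
 *   bits 31..0   32-bit pointer
 */
#define TVAL_TAG_POINTER 0xfff0u   /* hypothetical tag value */

static uint64_t tval_pack_pointer(uint32_t ptr, uint16_t magic) {
    return ((uint64_t) TVAL_TAG_POINTER << 48) |
           ((uint64_t) magic << 32) |
           (uint64_t) ptr;
}

static int tval_is_pointer(uint64_t tv) {
    return (uint16_t) (tv >> 48) == TVAL_TAG_POINTER;
}

static uint16_t tval_get_magic(uint64_t tv) {
    assert(tval_is_pointer(tv));
    return (uint16_t) (tv >> 32);
}

static uint32_t tval_get_pointer(uint64_t tv) {
    assert(tval_is_pointer(tv));
    return (uint32_t) tv;
}
```

Once the tag and the pointer have taken their bits, the 16 bits in the middle are all that is left. That leaves no room for a prototype or finalizer reference in the value itself, only for a small index.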

Design-wise the current situation is pretty clean: a plain buffer works like a plain string (whose prototype you also can't set) and a plain pointer like e.g. an integer. There are boxed variants of strings and numbers when you do want to have property slots; same goes for buffers and pointers. But I can see why making an exception to this basic model would make practical sense. The two requested features would be adding a prototype and adding a finalizer to a plain buffer/pointer.

There's also a C API work item to make it easier to work with boxed values so that they'd behave like their primitive counterparts. For example duk_get_buffer() returns a data pointer only for a plain buffer while duk_get_buffer_data() accepts both plain and boxed arguments.

svaarala commented on May 29, 2024

Anyway, the best approach so far (there are separate issues for that) is to add a 16-bit magic value to both plain buffer and pointer. There are existing API calls that can be used to set/get the magic. The magic, by itself, allows user code to distinguish between pointer/buffer purposes to a small extent. The downside of a 16-bit space is that it's quite small (but more isn't available for 8-byte packed values) and not ideally modular because of that.

After that, some mechanism is needed for looking up a prototype/finalizer object based on the magic value. This could be as simple as some array providing the mappings, or a user-provided callback which provides the prototype/finalizer when given the object (or its magic value).
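The array-based mapping could be as simple as the following sketch. It is a standalone model with hypothetical names (`magic_entry`, `magic_table`, `magic_lookup`, `magic_finalize`). In Duktape the prototype and finalizer would be an object and a function, not a string and a plain C callback as here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*finalizer_fn)(void *resource);

struct magic_entry {
    const char *type_name;    /* stand-in for a prototype object */
    finalizer_fn finalizer;   /* may be NULL */
};

static int closed_count = 0;

/* Example finalizer: just counts how often it has run. */
static void close_tcp(void *resource) {
    (void) resource;
    closed_count++;
}

/* Index = magic value. Magic 0 is reserved to mean "untyped". */
static const struct magic_entry magic_table[] = {
    { NULL, NULL },
    { "uv_tcp_t", close_tcp },
    { "uv_timer_t", NULL }
};

/* Returns the entry for 'magic', or NULL for 0 and unknown values. */
static const struct magic_entry *magic_lookup(uint16_t magic) {
    size_t n = sizeof(magic_table) / sizeof(magic_table[0]);

    if (magic == 0 || magic >= n) {
        return NULL;
    }
    return &magic_table[magic];
}

/* Run the finalizer registered for 'magic', if any; returns 1 if one ran. */
static int magic_finalize(uint16_t magic, void *resource) {
    const struct magic_entry *e = magic_lookup(magic);

    if (e == NULL || e->finalizer == NULL) {
        return 0;
    }
    e->finalizer(resource);
    return 1;
}
```

A callback-based variant would replace the table lookup with a user-provided function taking the magic value. The table version keeps the per-value cost at zero bytes, but every module sharing the heap has to agree on who owns which magic values.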

svaarala commented on May 29, 2024

For a bit more concreteness, overview of typing alternatives ranging from plain tagged types to extended objects: https://github.com/svaarala/duktape/blob/86b73bf3ca3b0f19c5ae9660c62403f76522efbf/doc/typing.rst.

Going over it more comprehensively, for plain buffers the extension options would be:

Keep the current buffer structure but pack 16 bits of magic into the tagged value. Use the magic value to reference a finalizer/prototype.
Add more C struct variants (detected using flag bits) which would allow the basic, low memory optimized struct to gain fields like a prototype or a finalizer reference. There are already several buffer struct variants so this is certainly doable; the downside is that the specific struct must be chosen at creation time and can't be changed later because its heap pointer might change. Adding and removing new fields is a bit painful to maintain.
Drop the plain buffer variants entirely and use only buffer objects duk_hbufobj; it's a duk_hobject extension which has both specific fields (compact and fast) but also has the ability to hold properties for easy extension. Maybe come up with a "lightweight" duk_hbufobj with as small a memory footprint as possible (e.g. no slice information).

Keeping plain buffers is very useful in some environments because they're so lightweight. In other environments where mainly ArrayBuffers and typed arrays are used, plain buffers are not that useful and sometimes seem confusing. Also, because an ArrayBuffer references a plain buffer, it uses more memory than strictly necessary: the underlying buffer needs its own heap object instead of the allocation being directly held by the ArrayBuffer. So this is definitely an area where some simplifications might be possible.

For pointers there aren't that many good options:

Keep the current tagged representation, adding 16-bit magic, then use the magic for e.g. prototype and finalizer lookup.
Drop the plain pointer type entirely and use a specific heap allocated structure to represent pointers. This is rather heavyweight and would feel like a downgrade to me.
Drop the plain pointer type and use a duk_hobject extension type to hold pointers.
