
Comments (4)

gilch commented on September 27, 2024

The next solution I came up with would be for the gensym prefixes to include their module name and a template counter. There's no guarantee that modules will be compiled in the same session (and in some perfectly reasonable use cases, they likely won't be) but we can expect that all forms within a file would be. Module names are also not likely to collide when used together, or you'd have a hard time importing them.
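As a rough sketch of the module name plus template counter idea (not Hissp's actual implementation): the helper names and the prefix format below are hypothetical, and the module name is assumed to already be a valid identifier fragment with no dots.

```python
import itertools


def make_template_namer(module_name):
    """Return a function that starts a new template for one module.

    Hypothetical helper: assumes module_name contains no dots or other
    characters that would need munging to form a valid identifier.
    """
    template_counter = itertools.count(1)

    def new_template():
        # Every template gets the next count, so two templates in the
        # same module never share a prefix.
        count = next(template_counter)

        def gensym(name):
            # The same name within one template maps to the same symbol.
            return f"_{module_name}_{count}__{name}"

        return gensym

    return new_template


new_template = make_template_namer("mymacros")
first = new_template()
second = new_template()
print(first("x"))   # _mymacros_1__x
print(second("x"))  # _mymacros_2__x
```

The counter only has to be unique within one module, since all forms in a file can be expected to compile in the same session.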

However, it's still possible that a module providing macros could go through several revisions, which wouldn't change the name. Code compiled with an old revision could interact with code compiled with a new one. This seems easy enough to avoid in your own code (Just do a clean build. You'd do that anyway, right?) but libraries distributed without source could have transitively depended on different revisions of the same macro provider that they didn't include. Ugh.

Seeding the counter with a timestamp or something could mitigate the issue, although it's not clear how much precision would be required, and high enough precision would significantly add to the length. We'd also be giving up reproducible builds, which aren't a stated design goal of Hissp, but maybe ought to be. A UUID would also have this problem, come to think of it.

And that still leaves __main__, a module name that everyone uses. Hissp is just data structures. The compiler doesn't require you to include a file, even though the reader would like it. You don't usually precompile main as main though. Maybe this isn't an issue.

from hissp.

gilch commented on September 27, 2024

How does Clojure deal with this? I think it doesn't? AOT compilation is the exception there, not the norm. I don't think it does any better than my module name + counter idea.

Older Lisps use symbol interning to ensure that gensyms are unique. I think Clojure can't do this because it's hosted.

Hissp is hosted too. Symbols are kind of a fiction in Lissp, and don't exist in Hissp, where they're just strings. Python does intern strings, and CPython (at least) pretty aggressively interns identifiers, but this is just an optimization and there are no guarantees. I wonder how easy that is to check. Perhaps Lissp could avoid making a gensym if it's already an interned string. It's also possible to generate symbol tables when you have source. I'm not sure how much any of this helps.


gilch commented on September 27, 2024

PEP 552 – Deterministic pycs extended the .pyc format so that it can optionally be validated with a hash of the source instead of a timestamp, to enable reproducible builds. That seems like a viable solution.

Template count, module name, and module hash should be sufficient. And actually, the module name is redundant given the hash, though the hash isn't particularly human readable. The Lissp Lexer gets the entire code string as an argument, so the Lissp reader should be able to call hash() on it, or something from hashlib, although that's probably not necessary; CPython's hash() uses SipHash for strings (siphash24 historically, siphash13 by default since CPython 3.11), which has pretty good characteristics.

Gensyms probably don't need to use all 64 bits either. Truncating to 8 hex digits is usually good enough for Git, although a few extremely large projects need 12 now. I think I could also XOR that with the counter to avoid increasing the gensym length any further than that. It would take fewer than 8 identifier characters to encode 8 hex digits.
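A minimal sketch of the truncate, XOR, and re-encode steps, under some assumptions: `short_tag` is a hypothetical name, SHA-256 from hashlib stands in for whichever hash ends up being used, and Base32 is just one convenient alphabet whose characters are all legal inside an identifier.

```python
import base64
import hashlib


def short_tag(code, counter):
    """Fold a 32-bit code hash and a template counter into 7 characters."""
    digest = hashlib.sha256(code.encode("utf-8")).digest()
    # Keep only the first 4 bytes: 32 bits, the same as 8 hex digits.
    truncated = int.from_bytes(digest[:4], "big")
    # XOR in the counter so the tag gets no longer as the count grows.
    mixed = truncated ^ (counter & 0xFFFFFFFF)
    # Base32 carries 5 bits per character, so 32 bits fit in 7 characters
    # once the trailing "=" padding is stripped.
    encoded = base64.b32encode(mixed.to_bytes(4, "big")).decode("ascii")
    return encoded.rstrip("=")


tag = short_tag("(print 1)", 1)
print(len(tag))  # 7
```

Base32 output can begin with a digit, so a tag like this would need a leading underscore or similar prefix before it could start an identifier.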


gilch commented on September 27, 2024

OK, experimenting a little, it looks like CPython randomizes the hashes. You can set PYTHONHASHSEED if you need the reproducible builds. Do I want to make the users figure that out? Maybe not. Extra randomization might make accidental collisions even more unlikely though. I could try something from hashlib instead. I thought siphash24 was a pretty good choice. I wonder if there's some other way to access it. Maybe one of the others has suitable characteristics.
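One hashlib option that does not depend on PYTHONHASHSEED is BLAKE2, which accepts a `digest_size` argument and so can produce a 64-bit value directly. This is only a sketch of that alternative, not a decision; `stable_hash64` is a hypothetical name.

```python
import hashlib


def stable_hash64(text):
    """Return a 64-bit hash of text that is the same in every process."""
    # digest_size=8 asks BLAKE2b for 8 bytes, matching the width of hash().
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


value = stable_hash64("(define answer 42)")
print(value == stable_hash64("(define answer 42)"))  # True
```

Unlike the built-in hash(), the result here depends only on the input bytes, so users would not need to set anything to get reproducible output.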

Also, the Lexer doesn't have the whole module. It has the current top-level form. Is this a problem? It means different revisions of the module might have top-level forms that haven't changed, even though code in the same module using them might have. That means the module name is technically not redundant.

I could put the counter and module name in a tuple with the form code and hash that instead. That way the module name at least goes into the hash. I could keep the count and name human-readable and separate. That would add considerably to the length, but it might not hurt readability much. Or I could compute the code hash in the Lissp.compile() function. That still doesn't work in the REPL though. Maybe that's fine.
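The tuple idea could look something like the following sketch. The function name `form_tag` is hypothetical, the tuple is serialized with `repr()` purely for illustration, and BLAKE2b is an assumed choice of hash; a 4-byte digest gives the 8 hex digits discussed earlier.

```python
import hashlib


def form_tag(counter, module_name, form_code):
    """Hash the counter, module name, and form code together."""
    # repr() of a tuple of an int and two strs is unambiguous, so different
    # (counter, name, code) triples serialize to different byte strings.
    payload = repr((counter, module_name, form_code)).encode("utf-8")
    return hashlib.blake2b(payload, digest_size=4).hexdigest()


same_form = "(defmacro foo () `(bar $#x))"
tag_a = form_tag(1, "pkg.macros", same_form)
tag_b = form_tag(1, "other.macros", same_form)
print(len(tag_a))      # 8
print(tag_a == tag_b)  # False
```

Because the module name goes into the hash, an unchanged top-level form in two different modules still gets different tags.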

