Comments (4)
The next solution I came up with would be for the gensym prefixes to include their module name and a template counter. There's no guarantee that modules will be compiled in the same session (and in some perfectly reasonable use cases, they likely won't be) but we can expect that all forms within a file would be. Module names are also not likely to collide when used together, or you'd have a hard time importing them.
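A minimal sketch of the module name plus template counter idea, not Hissp's actual implementation. The `GensymNamer` class and its method names are hypothetical, and the dot replacement is a naive stand-in for real munging.

```python
import itertools


class GensymNamer:
    """Hypothetical helper: one instance per module being read."""

    def __init__(self, module_name):
        self.module_name = module_name
        self._templates = itertools.count(1)

    def new_template(self):
        """Call once per template; returns a function that prefixes names."""
        count = next(self._templates)
        # Dots are not valid inside an identifier, so replace them.
        # (Naive: a real reader would need proper munging here.)
        safe_name = self.module_name.replace(".", "_")
        prefix = f"_{safe_name}_{count}_"
        return lambda name: prefix + name


namer = GensymNamer("pkg.mod")
first = namer.new_template()
second = namer.new_template()
print(first("x"), second("x"))
```

Within one template the same name always maps to the same gensym, while a different template or a different module name gives a different prefix. It does nothing about two revisions of the same module, which is the problem described next.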
However, it's still possible that a module providing macros could go through several revisions, which wouldn't change the name. Code compiled with an old revision could interact with code compiled with a new one. This seems easy enough to avoid in your own code (Just do a clean build. You'd do that anyway, right?) but libraries distributed without source could have transitively depended on different revisions of the same macro provider that they didn't include. Ugh.
Seeding the counter with a timestamp or something could mitigate the issue, although it's not clear how much precision would be required, and high enough precision would significantly add to the length. We'd also be giving up reproducible builds, which isn't a stated design goal of Hissp, but maybe ought to be. A UUID would also have this problem, come to think of it.
And that still leaves `__main__`, a module name that everyone uses. Hissp is just data structures. The compiler doesn't require you to include a file, even though the reader would like it. You don't usually precompile main as main though. Maybe this isn't an issue.
from hissp.
How does Clojure deal with this? I think it doesn't? AOT compilation is the exception there, not the norm. I don't think it does any better than my module name + counter idea.
Older Lisps use symbol interning to ensure that gensyms are unique. I think Clojure can't do this because it's hosted.
Hissp is hosted too. Symbols are kind of a fiction in Lissp, and don't exist in Hissp, where they're just strings. Python does intern strings, and CPython (at least) pretty aggressively interns identifiers, but this is just an optimization and there are no guarantees. I wonder how easy that is to check. Perhaps Lissp could avoid making a gensym if it's already an interned string. It's also possible to generate symbol tables when you have source. I'm not sure how much any of this helps.
from hissp.
PEP 552 (Deterministic pycs) added a .pyc variant that embeds a hash of the source instead of a timestamp, to enable reproducible builds. That seems like a viable solution.
Template count, module name, and module hash should be sufficient. And actually, the module name is redundant given the hash, though the hash isn't particularly human-readable. The Lissp Lexer gets the entire code string as an argument, so the Lissp reader should be able to call `hash()` on it, or something from `hashlib`, although that's probably not necessary; CPython's `hash()` would use siphash24 for strings, which has pretty good characteristics.
Gensyms probably don't need to use all 64 bits either. Truncating to 8 hex digits is usually good enough for Git, although a few extremely large projects need 12 now. I think I could also XOR that with the counter to avoid increasing the gensym length any further than that. It would take fewer than 8 identifier characters to encode 8 hex digits.
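Here is a rough sketch of the truncate, XOR, and re-encode steps. It assumes SHA-256 from `hashlib` as a deterministic stand-in for the hash, and a lowercase base-32 alphabet; `short_tag` is a made-up name. At 5 bits per character, 32 bits fit in 7 characters.

```python
import hashlib

# 32 characters that are all legal inside a Python identifier.
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"


def short_tag(code: str, counter: int) -> str:
    """Hypothetical: 7-character tag from a code hash XORed with a counter."""
    digest = hashlib.sha256(code.encode("utf-8")).digest()
    h32 = int.from_bytes(digest[:4], "big")  # same as the first 8 hex digits
    mixed = (h32 ^ counter) & 0xFFFFFFFF  # XOR keeps the length at 32 bits
    chars = []
    for _ in range(7):  # 7 * 5 = 35 bits, enough for 32
        chars.append(ALPHABET[mixed & 31])
        mixed >>= 5
    return "".join(reversed(chars))


print(short_tag("(print 'hello)", 1))
```

For a fixed code string, distinct counters below 2**32 always give distinct tags, because XOR with a constant is a bijection. The tag can start with a digit, so it would need to follow some prefix to be a valid identifier.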
from hissp.
OK, experimenting a little, it looks like CPython randomizes the hashes. You can set `PYTHONHASHSEED` if you need the reproducible builds. Do I want to make the users figure that out? Maybe not. Extra randomization might make accidental collisions even more unlikely though. I could try something from `hashlib` instead. I thought `siphash24` was a pretty good choice. I wonder if there's some other way to access it. Maybe one of the others has suitable characteristics.
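As one possible `hashlib` option, BLAKE2b can produce a 64-bit digest directly and does not depend on `PYTHONHASHSEED`. Whether its characteristics suit gensyms is an assumption here, not something settled in this thread; `stable_hash64` is a hypothetical name.

```python
import hashlib


def stable_hash64(text: str) -> int:
    """Hypothetical: a 64-bit hash that is the same in every session."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


# Unlike the built-in hash() on str, this value does not change between runs.
print(hex(stable_hash64("(define answer 42)")))
```

Users would get reproducible builds without setting any environment variable, at the cost of losing the per-session randomization.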
Also, the Lexer doesn't have the whole module. It has the current top-level form. Is this a problem? It means different revisions of the module might have top-level forms that haven't changed, even though code in the same module using them might. That means the module name is technically not redundant.
I could put the counter and module name in a tuple with the form code and hash that instead. That way the module name at least goes into the hash. I could keep the count and name human-readable and separate. That would add considerably to the length, though it might not hurt readability much. Or I could compute the code hash in the `Lissp.compile()` function. That still doesn't work in the REPL though. Maybe that's fine.
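The tuple idea might look something like the following sketch. It assumes a `hashlib` digest over the `repr` of the tuple, truncated to 8 hex digits; `gensym_hash` and its parameters are hypothetical names, not part of Hissp.

```python
import hashlib


def gensym_hash(count: int, module_name: str, form_code: str) -> str:
    """Hypothetical: hash the counter, module name, and form code together."""
    # repr() of a tuple of an int and two strs is unambiguous,
    # so different tuples never serialize to the same bytes.
    payload = repr((count, module_name, form_code)).encode("utf-8")
    return hashlib.blake2b(payload, digest_size=4).hexdigest()


print(gensym_hash(1, "pkg.mod", "(defmacro foo () `($#x))"))
```

Because the module name is part of the hashed payload, an unchanged top-level form in two different modules still gets different hashes.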
from hissp.