Comments (6)

tfmorris commented on June 17, 2024

@wetneb What do you think of this proposal? Any preference for how the numeric range is divided? In addition to the two options above (10^3 and 2^5), we also have a PR which proposes 2^20.

from openrefine.

thadguidry commented on June 17, 2024

@tfmorris Hmm, a bit like Snowflake IDs, I like it. Incidentally, we use TSIDs in DB2Rest. There's a thread-safe Java library: https://github.com/vladmihalcea/hypersistence-tsid
It also adds an adjustable node part. The node part could hold an OpenRefine version, or simply a 3-bit node ID and a 2-bit version, for 5 bits in total. Having the OpenRefine version encoded would help migration and sharing, no? We could quickly detect that a project came from version 3.8 and uplift it when it is shared with someone who opens it in version 4.0.

wetneb commented on June 17, 2024

I'm a bit torn on this… intuitively I'd prefer to go in the direction of completely opaque IDs that carry no particular information. Users might not be aware that the project IDs contain this information, and it might constitute an unwelcome information leak in certain circumstances. But not a hill I would die on…

tfmorris commented on June 17, 2024

@wetneb The project metadata already includes both the creation time and the last updated time. This is intended to provide a hint to the user in case the metadata is gone/corrupted, i.e. "Your missing project is the one that you created on the afternoon of May 30." Currently we have no way of telling the user which project(s) is/are missing. (Of course, the best thing is not to lose the projects in the first place.) From a practical point of view, the current IDs are completely opaque.

Another option would be to use the newly defined UUID v7 from RFC 9562, but that would require increasing the field size from 64 to 128 bits, breaking compatibility, so it is a non-starter until we have protocol & metadata versioning. One useful hint from that spec is that dividing fields on nibble boundaries makes them more easily human-parseable in hex format. We could also place the timestamp in the high-order 48 bits to match the UUID layout, for whatever that's worth.
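To make the layout idea concrete, here is a minimal sketch of a 64-bit ID with a 48-bit millisecond timestamp in the high-order bits and 16 random bits below it, so both fields fall on nibble boundaries. The class name `TimestampedId`, its methods, and the 48/16 split are assumptions for illustration only; this is not OpenRefine's current ID scheme or any of the proposed divisions.

```java
import java.security.SecureRandom;
import java.time.Instant;

/** Hypothetical sketch: 64-bit ID = 48-bit epoch millis (high) + 16 random bits (low). */
public final class TimestampedId {
    private static final int RANDOM_BITS = 16;                 // low-order 4 hex digits
    private static final long TIMESTAMP_MASK = (1L << 48) - 1; // high-order 12 hex digits
    private static final SecureRandom RANDOM = new SecureRandom();

    private TimestampedId() {}

    /** Packs the creation time into the high 48 bits and random bits into the low 16. */
    public static long create(Instant createdAt) {
        long millis = createdAt.toEpochMilli() & TIMESTAMP_MASK;
        long random = RANDOM.nextInt(1 << RANDOM_BITS);
        return (millis << RANDOM_BITS) | random;
    }

    /** Recovers the creation time, e.g. to describe a project whose metadata is lost. */
    public static Instant createdAt(long id) {
        return Instant.ofEpochMilli(id >>> RANDOM_BITS);
    }

    /** Renders 16 hex digits: the first 12 are the timestamp, the last 4 are random. */
    public static String toHex(long id) {
        return String.format("%016x", id);
    }
}
```

With this split, `TimestampedId.createdAt(id)` is enough to tell a user roughly when a missing project was created. The trade-off is that only 16 bits separate IDs created in the same millisecond, so a real implementation would still need a collision check.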

@thadguidry That repo looks like a rip-off of https://github.com/f4b6a3/tsid-creator/, but we don't need sortable IDs - just a rough idea of time that we can convey to the user. We should have the OpenRefine version encoded in the metadata, but I don't think the project ID is the correct place for it.

thadguidry commented on June 17, 2024

@tfmorris Gotcha, agree. Btw, the first paragraph of the README says that it's not a "ripoff" but a "fork", maintained because the creator of the original repo no longer wants to maintain it.

wetneb commented on June 17, 2024

> The project metadata already includes both the creation time and the last updated time.

Yes, I am aware that we store those times in the project metadata, but what I am saying is that it feels somewhat quirky to also encode that in the project id itself.

> This is intended to provide a hint to the user in case the metadata is gone/corrupted, i.e. "Your missing project is the one that you created on the afternoon of May 30."

To provide something like this, I'd rather store the entire project metadata in a more corruption-resilient way, independent of project serialization, for instance in a SQLite database. That would have the advantage of giving the user not just the creation date, but also the project name and other metadata fields.
