Giter Club home page Giter Club logo

spec's Introduction



ulid


Universally Unique Lexicographically Sortable Identifier

UUID can be suboptimal for many use-cases because:

  • It isn't the most character efficient way of encoding 128 bits of randomness
  • UUID v1/v2 is impractical in many environments, as it requires access to a unique, stable MAC address
  • UUID v3/v5 requires a unique seed and produces randomly distributed IDs, which can cause fragmentation in many data structures
  • UUID v4 provides no other information than randomness which can cause fragmentation in many data structures

Instead, herein is proposed ULID:

ulid() // 01ARZ3NDEKTSV4RRFFQ69G5FAV
  • 128-bit compatibility with UUID
  • 1.21e+24 unique ULIDs per millisecond
  • Lexicographically sortable!
  • Canonically encoded as a 26 character string, as opposed to the 36 character UUID
  • Uses Crockford's base32 for better efficiency and readability (5 bits per character)
  • Case insensitive
  • No special characters (URL safe)
  • Monotonic sort order (correctly detects and handles the same millisecond)

Implementations in other languages

From ourselves and the community!

Language Author Binary Implementation
C++ suyash
C# mcb2001
Clojure theikkila
Objective-C ricardopereira
Crystal SuperPaintman
Dart isoos
Dart GuepardoApps
Delphi matinusso
D enckse
D (dub) extrawurst
Erlang savonarola
Elixir Homepolish
Elixir merongivian
Elixir omisego
Elixir (Ecto) dcuddeback
F# lucasschejtman
Factor Alexander Ilin
Go oklog
Haskell steven777400
Java huxi
Java azam
Java Lewiscowles1986
JavaScript ulid
JavaScript aarondcohen
Julia ararslan
Kotlin GuepardoApps
Lua Tieske
.NET RobThree
.NET fvilers
Nim adelq
OCaml stripedpajamas
Perl 5 bk
PHP Lewiscowles1986
PHP robinvdvleuten
PowerShell PetterBomban
Python mdipierro
Python ahawker
Python mdomke
R hrbrmstr
Ruby rafaelsales
Rust mmacedoeu
Rust dylanhart
Scala petitviolet
SQL-Microsoft rmalayter
Swift simonwhitehouse
Swift yaslab
Tcl dbohdan

Specification

Below is the current specification of ULID as implemented in ulid/javascript.

Note: the binary format has not been implemented in JavaScript as of yet.

 01AN4Z07BY      79KA1307SR9X4MV3

|----------|    |----------------|
 Timestamp          Randomness
   48bits             80bits

Components

Timestamp

  • 48 bit integer
  • UNIX-time in milliseconds
  • Won't run out of space 'til the year 10889 AD.

Randomness

  • 80 bits
  • Cryptographically secure source of randomness, if possible

Sorting

The left-most character must be sorted first, and the right-most character sorted last (lexical order). The default ASCII character set must be used. Within the same millisecond, sort order is not guaranteed

Canonical String Representation

ttttttttttrrrrrrrrrrrrrrrr

where
t is Timestamp (10 characters)
r is Randomness (16 characters)

Encoding

Crockford's Base32 is used as shown. This alphabet excludes the letters I, L, O, and U to avoid confusion and abuse.

0123456789ABCDEFGHJKMNPQRSTVWXYZ

Monotonicity

When generating a ULID within the same millisecond, we can provide some guarantees regarding sort order. Namely, if the same millisecond is detected, the random component is incremented by 1 bit in the least significant bit position (with carrying). For example:

import { monotonicFactory } from 'ulid'

const ulid = monotonicFactory()

// Assume that these calls occur within the same millisecond
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVRZ
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVS0

If, in the extremely unlikely event that, you manage to generate more than 280 ULIDs within the same millisecond, or cause the random component to overflow with less, the generation will fail.

import { monotonicFactory } from 'ulid'

const ulid = monotonicFactory()

// Assume that these calls occur within the same millisecond
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVRY
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVRZ
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVS0
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVS1
...
ulid() // 01BX5ZZKBKZZZZZZZZZZZZZZZX
ulid() // 01BX5ZZKBKZZZZZZZZZZZZZZZY
ulid() // 01BX5ZZKBKZZZZZZZZZZZZZZZZ
ulid() // throw new Error()!

Overflow Errors when Parsing Base32 Strings

Technically, a 26-character Base32 encoded string can contain 130 bits of information, whereas a ULID must only contain 128 bits. Therefore, the largest valid ULID encoded in Base32 is 7ZZZZZZZZZZZZZZZZZZZZZZZZZ, which corresponds to an epoch time of 281474976710655 or 2 ^ 48 - 1.

Any attempt to decode or encode a ULID larger than this should be rejected by all implementations, to prevent overflow bugs.

Binary Layout and Byte Order

The components are encoded as 16 octets. Each component is encoded with the Most Significant Byte first (network byte order).

0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      32_bit_uint_time_high                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     16_bit_uint_time_low      |       16_bit_uint_random      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       32_bit_uint_random                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       32_bit_uint_random                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Prior Art

Partly inspired by:

spec's People

Contributors

aarondcohen avatar alex-ilin avatar alizain avatar bish0polis avatar darccio avatar dcuddeback avatar erikreedstrom avatar floydpink avatar hrbrmstr avatar huxi avatar jpwilliams avatar mcb2001 avatar petitviolet avatar stripedpajamas avatar unnawut avatar yaslab avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

spec's Issues

[Question]: Are ULIDs globally unique and appropriate for a high volume chat app?

I apologize if this isn't the place to ask this --

Specifically my scenario would be to use the Javascript ULID library inside AWS lambdas, which are processing the incoming messages and assigning them IDs which are then partitioned in Dynamo (for which ULID is an ideal structure).

This is a scenario where several lambdas may be receiving messages in the same moment. Will these generate unique globally unique ULIDs?

Storing ULIDs as primary keys

Since ulids are sortable, is it ok to store them as char(26) in mysql/pgsql... when used as primary keys or should they rather be stored in some other form(binary/bytes...)?

Wikipedia page

I have started a Wikipedia page. I'm trying to make it a bit like the UUID page. I'll work on this in the future; help is appreciated though! See the talk page to see what I'm planning to change / do.

Another idea might be trying to get ULID's standardized by some standards body/bodies?

Oh, and while I'm at it; it think we need to clarify the spec on monotonicity being a required part of the spec or not (or maybe even get rid of monotonicity?)

[Q] shortenning to 20 characters

our payment processor requires 20 character long reference ID.
We are already using ULID for our order IDs, so I would like to just chop off the first 6 characters
to get it from 26 to 20.

We do not need ids to be crypto-secure.
Just something that's reasonably unique within 10 mln entries space

Would that work?

Multiple ULIDs per millisecond not guaranteed

If, in the extremely unlikely event that, you manage to generate more than 2^80 ULIDs within the same millisecond, or cause the random component to overflow with less, the generation will fail.

Given the width of the random space is 80 bits, the latter is guaranteed to happen first (or at the same time).

There is also no guarantee that the number randomly generated is not very close to 2^80 -- should implementations consider a maximum number they will accept from the random number generator, and that they will change random numbers greater than to? (Or choose a number lower than)?

Proposal: Suggest uppercase

I just saw that there are implementations of ULID that provide uppercase only ULIDs, others (like the PHP implementation by @robinvdvleuten) provide lowercase ULIDs. The specification does not yet impose uppercase or lowercase but states that ULID is "case-insensitive". This is a great feature.

Nevertheless, I'd like to propose suggesting Uppercase ULID as "the right way". Mainly for two reasons:

  • most of the libraries provide uppercase ULIDs. The existing libraries will probably become the "standard solution" for their language (given that they are well designed, tested and documented).
  • Having different solutions in different languages could lead to problems in multi language environments. Consider a database that is accessed by more than one project with different languages across the projects. Consider Regex validation that will often be made up for the existing library and may lead to problems when other librarys or languages provide stuff.

I understand that this is debatable, as being flexible in your setup is a strength. But I also think, that having either uppercase or lowercase as the proposed (or imposed) way to implement ULIDs will help the Spec to spread because there is less potential for conflicts.

What do you think of this?

Would it make sense to extend the ULID to include an optional checksum?

This would be useful in the case that the ULID is communicated over the phone or by handwriting...
Standardising the form would make it easier to recognise when it's being used and throw it away upon insertion into a database system... as well as how to regenerate it again and verify it's integrity when transmitted manually.

Looks like Crockford's Base32 which you are using does have some provision for appending an optional check symbol... however it does not provide a way to disambiguate between a checksummed data vs a non checksummed data... but for our purpose since the ULID is a fixed length... you could assume any extra data appended is the check symbol... anyway best to clarify that in the spec as an implementation recommendation

Differences and Similarities with uuidv6?

Didn't see this in the prior art section and i wonder if this spec tacles the same problems as the v6 not-official proposal did.

https://bradleypeabody.github.io/uuidv6/

Sorting by its raw bytes results in a sequence equivalent to sorting by its embedded timestamp, thus making it naturally more useful as a primary key and providing improved reference locality and thus insert performance. (I.e. primary keys that are out of sequence are not just useless for sorting, but can also cause unnecessary disk access due to occupying significantly different locations in the index. [example])
The embedded time should be extractable for use as the creation time of the record.
Global uniqueness, which is of course a basic requirement of all types of UUIDs.

Remove monotonicity guarantee from spec

That's basically the summary of #11
Reasons mentioned are e.g.:

  • Makes it stateful
  • Worst case scenario leads to just one ULID per millisecond
  • Can only work in a concurrent setting and not in an async one
  • ...

Maybe it can be moved to an addendum as a recommendation of additional features that could be provided by an implementation.

Need help with discrepancy found in decoding ULID byte strings

I've run into an odd issue when decoding ULIDs to bytes and encoding bytes back to ULIDs.

TL;DR: Is the binary layout and byte order defined in the spec correct? If so, where is my error in logic?

Where are all 128 bits?

The specification states:

Technically, a 26-character Base32 encoded string can contain 130 bits of information, whereas a ULID must only contain 128 bits. Therefore, the largest valid ULID encoded in Base32 is 7ZZZZZZZZZZZZZZZZZZZZZZZZZ

This is worded in such a way that I would expect 7ZZZZZZZZZZZZZZZZZZZZZZZZZ, when decoded, to have all 128 bits set to 1, but that doesn't seem to be the case. Instead, this is what the bytes look like when converted to hexadecimal:

3f ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

This does not have all 128 bits set to 1. To set all bits to 1, that first 3 needs to be an f, but when I do this (i.e. ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff), Crockford's base32 encodes the bytes as:

ZZZZZZZZZZZZZZZZZZZZZZZZZW

Notice the W at the end.

Data loss?

Furthermore, while 7ZZZZZZZZZZZZZZZZZZZZZZZZZ converts to 3f ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff, the reverse is not true. Encoding those same bytes back into Crockford's base32 renders 7ZZZZZZZZZZZZZZZZZZZZZZZZW.

So, it appears that Crockford's base32 encoding results in some data loss?

Timestamp Magic?

Continuing on, the spec says:

the largest valid ULID encoded in Base32 is 7ZZZZZZZZZZZZZZZZZZZZZZZZZ, which corresponds to an epoch time of 281474976710655 or 2 ^ 48 - 1

If we assume the binary layout and byte order to be true (as specified here), then when we convert 7ZZZZZZZZZZZZZZZZZZZZZZZZZ to bytes, we should be able to take the first 48 bits (6 bytes) and convert them to an integer, which should equal 281474976710655.

Going back to my earlier point, 7ZZZZZZZZZZZZZZZZZZZZZZZZZ decodes to bytes as:

3f ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

The first 48 bits of this are:

3f ff ff ff ff ff

When converted to a decimal integer, we get the value 70368744177663, which is not the same as 281474976710655. When converted to a date, we get 4199-11-24 01:22:57.663 +00:00.

However, all the ULID implementations I tried appear to convert the date correctly to arrive at the value expected by the specification, so I must be doing something wrong.

ULID in Oracle

If anyone wants to implement the ULID spec in Oracle, here's a start:
Extracting timestamp from ULID in PL/SQL

    p_Ulid VARCHAR2(200) := '01EG664DVCY5NTH7WFN57PA7TM';
    FUNCTION Get_Ulid_Ts(p_In VARCHAR2) RETURN TIMESTAMP
        WITH TIME ZONE IS
        Dec_Value   NUMBER := 0;
        t_Time_Part VARCHAR2(10) := Substr(p_In, 0, 10); --First 10 characters are the timestamp
        Ret         TIMESTAMP WITH TIME ZONE := To_Timestamp_Tz('19700101 +00:00', 'yyyymmdd TZH:TZM');
        c_Base      NUMBER := 32;
        c_Base32    VARCHAR2(32) := '0123456789ABCDEFGHJKMNPQRSTVWXYZ'; --Crockford's base32
        TYPE B32_Map_Typ IS TABLE OF NUMBER INDEX BY VARCHAR2(1);
        B32map B32_Map_Typ;
    BEGIN
        --initialize base32 map
        FOR i IN 0 .. Length(c_Base32) - 1
        LOOP
            B32map(Substr(c_Base32, i + 1, 1)) := i;
        END LOOP;
        --convert base 32 to base 10
        FOR i IN 1 .. Length(t_Time_Part)
        LOOP
            Dec_Value := Dec_Value +
                         Power(c_Base, i - 1) *
                         B32map(Substr(t_Time_Part, -i, 1));
        END LOOP;
        --add to unix timestamp sentinal
        Ret := Ret +numtodsinterval(((Dec_Value/ 1000) ),'SECOND') ;
        RETURN Ret;
    END;
BEGIN
    --ISO8601 timestamp formats
    EXECUTE IMMEDIATE q'!alter session set nls_timestamp_format = 'YYYY-MM-DD"T"HH24:MI:SS.ff3"Z"' !';
    EXECUTE IMMEDIATE q'!alter session set nls_timestamp_tz_format = 'YYYY-MM-DD"T"HH24:MI:SS.ff3 TZR' !';
    --test function
    Dbms_Output.Put_Line(Get_Ulid_Ts(p_Ulid));
    Dbms_Output.Put_Line(Get_Ulid_Ts(p_Ulid) At TIME ZONE
                         'America/New_York');
END;
/

Differences from UUIDv1?

How is this proposal different from UUIDv1?

RFC 4122, Section 4.1.4 specifies the first 60 bits of a UUID to hold a time stamp, and Section 4.5 specifies the remaining bits in the absence of a network address.

Proposing an alternative (and non-standardized) ID system will only fragment usage.

Clarification of Crockford Base32

The spec says:

"Uses Crockford's base32 for better efficiency and readability (5 bits per character)"

"Crockford's Base32 is used as shown. This alphabet excludes the letters I, L, O, and U to avoid confusion and abuse."

However it is not exactly clear does this mean only using the character set 0123456789ABCDEFGHJKMNPQRSTVWXYZ or does it mean using also the Crockford base32 features such as treating i and l as 1, treating o as 0 and ignoring hyphens in the string?

GUID comparison

Greetings,

I am compiling a GUID summary sheet which tries to aggregate different dimensions with the intent to help engineers pick the right generator.

Would you be kind enough to help me validate the ULID column and point to any issues?

Thank you

image

Guarantee a minimum number of IDs before overflow of the random component

The actual number of ULIDs I can get in a millisecond might only be 1 (with absurd probability 2^-80). We could guarantee at least 2^79 ULIDs by requiring the first random generated in a millisecond starts with a zero bit (implementors would merely mask it, I'm sure).

This has the negative impact of only 79 bits of randomness instead of 80 so if there are 1e12ULIDs being generated across different devices in the same millisecond then we'll get a collision whp (vs the previous expectation of 7.8e11 ULIDs in one millisecond)

Postgres extension written in rust to generate ulid

Hey, I created a postgres extension that was written in rust, and can be used in postgres to generate ulid values for various stuff.

check it here - spa5k/uids-postgres

mainly created it for experiment and lack of a good postgres extension written in ksuid, ulid and nanoid.

One extension for all 3 of em.

Why does base32 encoding using Crockford alphabet not produce the same results as the spec?

I am trying to understand what method of base32 encoding is used to produce the common ULID string specified in the spec, leading with usually 01G as it doesn't seem to be the method defined in rfc4648 with the Crockford alphabet.

Using any rfc4648 compliant base32 encoder with the Crockford32 alphabet ends up producing a different resulting string, usually preceding with 063

For example, using Golang and the oklog/ulid package:

package main

import (
	"encoding/base32"
	"fmt"

	"github.com/oklog/ulid/v2"
)

const crockford32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

var crockfordEncoder = base32.NewEncoding(crockford32).WithPadding(base32.NoPadding)

func main() {

	id := ulid.Make()
	fmt.Printf("encoding/base32: \t%s\n", crockfordEncoder.EncodeToString(id[:]))
	fmt.Printf("ULID.String(): \t\t%s\n", id.String())
}

// output 
//   encoding/base32:        063A7TAANJDY8Q2ASDD5EE0F2R
//   ULID.String():          01GTHYJJNCKFJ5RJPBB9BKG3RP

Here is the code also in the Go playground - be aware the time is frozen is this sandbox.
https://go.dev/play/p/T9FD_2uJhCZ

Deprecate in favor of UUIDv7

I just read about UUIDv7. It's a proposed IETF standard, in "last call". Other than the base32 encoding, it's got everything we want from ULID. UUIDv7 will be more widely accepted and available to developers.

I propose that we declare victory, and allow the UUID spec to take over the effort from here on.

Listed packages may not comply with spec

The specification README lists many packages that are not compliant with stated specifications.

Regardless of your feelings on monotonicity (#11, #40, etc), the current spec claims ULIDs provide it, and that they throw an error on overflow. Many listed packages do not implement monotonicity, do not throw, or implement it incorrectly (e.g. by incrementing the timestamp).

Recommended resolutions

  • modify the spec (e.g. clarifying whether sections are optional), and/or
  • clarify which listed implementations are fully compliant

E.g. a column in the README table describing whether a package implements monotonicity per spec would be useful to potential users

`Qt_5.11' not found

I ran 'git submodule update --init --recursive' then 'sudo make install' then 'brain-player b82cae11-b1e5-49fa-a921-923994b07b9a' and got this error:

Traceback (most recent call last): File "/usr/local/bin/brain-player", line 2, in <module> from assist_replayer import main File "/usr/local/lib/python2.7/dist-packages/assist_replayer/__init__.py", line 17, in <module> from app import main File "/usr/local/lib/python2.7/dist-packages/assist_replayer/app.py", line 17, in <module> from PySide2.QtCore import QPointF, QUrl, Signal, Slot ImportError: /usr/lib/x86_64-linux-gnu/libQt5Core.so.5: version Qt_5.11' not found (required by /usr/local/lib/python2.7/dist-packages/PySide2/QtCore.so)
`

nonce mode

Given recent events I'd like to avoid leaking high resolution clock data.

Perhaps the ULID spec could offer support for some sort of nonce offset.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.