ulid / spec Goto Github PK

View Code? Open in Web Editor NEW

9.3K 92.0 171.0 53 KB

The canonical spec for ulid

License: GNU General Public License v3.0

spec's Introduction

Universally Unique Lexicographically Sortable Identifier

UUID can be suboptimal for many use-cases because:

It isn't the most character efficient way of encoding 128 bits of randomness
UUID v1/v2 is impractical in many environments, as it requires access to a unique, stable MAC address
UUID v3/v5 requires a unique seed and produces randomly distributed IDs, which can cause fragmentation in many data structures
UUID v4 provides no other information than randomness which can cause fragmentation in many data structures

Instead, herein is proposed ULID:

ulid() // 01ARZ3NDEKTSV4RRFFQ69G5FAV

128-bit compatibility with UUID
1.21e+24 unique ULIDs per millisecond
Lexicographically sortable!
Canonically encoded as a 26 character string, as opposed to the 36 character UUID
Uses Crockford's base32 for better efficiency and readability (5 bits per character)
Case insensitive
No special characters (URL safe)
Monotonic sort order (correctly detects and handles the same millisecond)

Implementations in other languages

From ourselves and the community!

Language	Author	Binary Implementation
C++	suyash	✓
C#	mcb2001
Clojure	theikkila
Objective-C	ricardopereira
Crystal	SuperPaintman
Dart	isoos	✓
Dart	GuepardoApps
Delphi	matinusso
D	enckse
D (dub)	extrawurst
Erlang	savonarola
Elixir	Homepolish	✓
Elixir	merongivian
Elixir	omisego	✓
Elixir (Ecto)	dcuddeback	✓
F#	lucasschejtman
Factor	Alexander Ilin	✓
Go	oklog	✓
Haskell	steven777400	✓
Java	huxi	✓
Java	azam
Java	Lewiscowles1986
JavaScript	ulid
JavaScript	aarondcohen	✓
Julia	ararslan
Kotlin	GuepardoApps
Lua	Tieske
.NET	RobThree	✓
.NET	fvilers
Nim	adelq
OCaml	stripedpajamas
Perl 5	bk	✓
PHP	Lewiscowles1986
PHP	robinvdvleuten
PowerShell	PetterBomban
Python	mdipierro
Python	ahawker	✓
Python	mdomke	✓
R	hrbrmstr
Ruby	rafaelsales
Rust	mmacedoeu	✓
Rust	dylanhart	✓
Scala	petitviolet
SQL-Microsoft	rmalayter	✓
Swift	simonwhitehouse
Swift	yaslab	✓
Tcl	dbohdan

Specification

Below is the current specification of ULID as implemented in ulid/javascript.

Note: the binary format has not been implemented in JavaScript as of yet.

 01AN4Z07BY      79KA1307SR9X4MV3

|----------|    |----------------|
 Timestamp          Randomness
   48bits             80bits

Components

Timestamp

48 bit integer
UNIX-time in milliseconds
Won't run out of space 'til the year 10889 AD.

Randomness

80 bits
Cryptographically secure source of randomness, if possible

Sorting

The left-most character must be sorted first, and the right-most character sorted last (lexical order). The default ASCII character set must be used. Within the same millisecond, sort order is not guaranteed

Canonical String Representation

ttttttttttrrrrrrrrrrrrrrrr

where
t is Timestamp (10 characters)
r is Randomness (16 characters)

Encoding

Crockford's Base32 is used as shown. This alphabet excludes the letters I, L, O, and U to avoid confusion and abuse.

0123456789ABCDEFGHJKMNPQRSTVWXYZ

Monotonicity

When generating a ULID within the same millisecond, we can provide some guarantees regarding sort order. Namely, if the same millisecond is detected, the random component is incremented by 1 bit in the least significant bit position (with carrying). For example:

import { monotonicFactory } from 'ulid'

const ulid = monotonicFactory()

// Assume that these calls occur within the same millisecond
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVRZ
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVS0

If, in the extremely unlikely event that, you manage to generate more than 2⁸⁰ ULIDs within the same millisecond, or cause the random component to overflow with less, the generation will fail.

import { monotonicFactory } from 'ulid'

const ulid = monotonicFactory()

// Assume that these calls occur within the same millisecond
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVRY
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVRZ
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVS0
ulid() // 01BX5ZZKBKACTAV9WEVGEMMVS1
...
ulid() // 01BX5ZZKBKZZZZZZZZZZZZZZZX
ulid() // 01BX5ZZKBKZZZZZZZZZZZZZZZY
ulid() // 01BX5ZZKBKZZZZZZZZZZZZZZZZ
ulid() // throw new Error()!

Overflow Errors when Parsing Base32 Strings

Technically, a 26-character Base32 encoded string can contain 130 bits of information, whereas a ULID must only contain 128 bits. Therefore, the largest valid ULID encoded in Base32 is 7ZZZZZZZZZZZZZZZZZZZZZZZZZ, which corresponds to an epoch time of 281474976710655 or 2 ^ 48 - 1.

Any attempt to decode or encode a ULID larger than this should be rejected by all implementations, to prevent overflow bugs.

Binary Layout and Byte Order

The components are encoded as 16 octets. Each component is encoded with the Most Significant Byte first (network byte order).

0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      32_bit_uint_time_high                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     16_bit_uint_time_low      |       16_bit_uint_random      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       32_bit_uint_random                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       32_bit_uint_random                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Prior Art

Partly inspired by:

spec's People

Contributors

Stargazers

Watchers

Forkers

aarondcohen jpwilliams huxi 40a stripedpajamas catufunwa floydpink shaunstanislauslau tchen0123 lprimeroo dkdocs sahwar bish0polis hrbrmstr silky alex-ilin qumeta petitviolet intfrr gitviscodeorg nacika-ins valrcs badboy zostay nguyenhoangson marksalpeter otranquangtrung itsrainingmani andyglow thomascayne corelibcore alekzandero jkugiya lemarsu djordjedjukic edoceo lazercorn akx rafaelsales kiyoaki fabiolimace quynhvi98 nicolas-grekas getother mdomke twillouer jgowdy private-forks mjurincic-toolbox identifiers hyamw automata-studio huazailmh avaz rabuchaim suryatmodulus wiltonlazary forkkit yinchuangsum ihor-nahuliak vaibhav-kaushal andypeng2015 neuroradiology shafimsp temafey volight zeta1999 t3149 tuxinoo tabulon-ext ultonian h2atecnologia mrhanqr belzecue allswiftui kyle-jeynes teddydd peterbourgon valery-barysok shirok snowriverl mistshi sakishum zhangaz1 longyue-tech oooooooz wangjacker sanyuesiyuewuyue coolboyly 1008610010 xiaojun2sheng yuanyu90221 hardboil leandrodaher pragmaticivan marcoserafini2 clayne krvajal blowingsmoke skilesare

spec's Issues

[Question]: Are ULIDs globally unique and appropriate for a high volume chat app?

I apologize if this isn't the place to ask this --

Specifically my scenario would be to use the Javascript ULID library inside AWS lambdas, which are processing the incoming messages and assigning them IDs which are then partitioned in Dynamo (for which ULID is an ideal structure).

This is a scenario where several lambdas may be receiving messages in the same moment. Will these generate unique globally unique ULIDs?

Storing ULIDs as primary keys

Since ulids are sortable, is it ok to store them as char(26) in mysql/pgsql... when used as primary keys or should they rather be stored in some other form(binary/bytes...)?

Wikipedia page

I have started a Wikipedia page. I'm trying to make it a bit like the UUID page. I'll work on this in the future; help is appreciated though! See the talk page to see what I'm planning to change / do.

Another idea might be trying to get ULID's standardized by some standards body/bodies?

Oh, and while I'm at it; it think we need to clarify the spec on monotonicity being a required part of the spec or not (or maybe even get rid of monotonicity?)

FPC (Free Pascal) implementation: https://github.com/afrusrsc/ulid

Is this repository still active?

Just wonder due to there are a lot of opened PRs.

[Q] shortenning to 20 characters

our payment processor requires 20 character long reference ID.
We are already using ULID for our order IDs, so I would like to just chop off the first 6 characters
to get it from 26 to 20.

We do not need ids to be crypto-secure.
Just something that's reasonably unique within 10 mln entries space

Would that work?

Comparison to Firebase PushIDs

Google's Firebase Realtime DB uses a sortable unique identifier format, a JS implementation of which is available at https://gist.github.com/mikelehen/3596a30bd69384624c11

72 bit of randomness, but it doesn't collide if multiple IDs are generated within the same millisecond on the same machine, the last property is kind of cool and achieved rather simply.

base62 feature

Any plan add base62 feature?

ULID for Zig

Wouldn't recommend adding this to the README just quite since there's a few more methods and tests I want to add but I implemented a binary implementation for Zig and am actively using it in a project. Leaving this issue here in case anyone comes looking for it at the upstream repo.

https://github.com/nektro/zig-ulid

Zig: https://ziglang.org / https://github.com/ziglang/zig

Question: regexp

Please provide correct regex for ULID. Thanx!

Multiple ULIDs per millisecond not guaranteed

If, in the extremely unlikely event that, you manage to generate more than 2^80 ULIDs within the same millisecond, or cause the random component to overflow with less, the generation will fail.

Given the width of the random space is 80 bits, the latter is guaranteed to happen first (or at the same time).

There is also no guarantee that the number randomly generated is not very close to 2^80 -- should implementations consider a maximum number they will accept from the random number generator, and that they will change random numbers greater than to? (Or choose a number lower than)?

Question: Has anyone managed to integrate ULID in PostgreSQL?

Proposal: Suggest uppercase

I just saw that there are implementations of ULID that provide uppercase only ULIDs, others (like the PHP implementation by @robinvdvleuten) provide lowercase ULIDs. The specification does not yet impose uppercase or lowercase but states that ULID is "case-insensitive". This is a great feature.

Nevertheless, I'd like to propose suggesting Uppercase ULID as "the right way". Mainly for two reasons:

most of the libraries provide uppercase ULIDs. The existing libraries will probably become the "standard solution" for their language (given that they are well designed, tested and documented).
Having different solutions in different languages could lead to problems in multi language environments. Consider a database that is accessed by more than one project with different languages across the projects. Consider Regex validation that will often be made up for the existing library and may lead to problems when other librarys or languages provide stuff.

I understand that this is debatable, as being flexible in your setup is a strength. But I also think, that having either uppercase or lowercase as the proposed (or imposed) way to implement ULIDs will help the Spec to spread because there is less potential for conflicts.

What do you think of this?

Would it make sense to extend the ULID to include an optional checksum?

This would be useful in the case that the ULID is communicated over the phone or by handwriting...
Standardising the form would make it easier to recognise when it's being used and throw it away upon insertion into a database system... as well as how to regenerate it again and verify it's integrity when transmitted manually.

Looks like Crockford's Base32 which you are using does have some provision for appending an optional check symbol... however it does not provide a way to disambiguate between a checksummed data vs a non checksummed data... but for our purpose since the ULID is a fixed length... you could assume any extra data appended is the check symbol... anyway best to clarify that in the spec as an implementation recommendation

would ulid be better then snowflake id?

would ulid be better/more unique than a twitter snowflake and why?

Differences and Similarities with uuidv6?

Didn't see this in the prior art section and i wonder if this spec tacles the same problems as the v6 not-official proposal did.

https://bradleypeabody.github.io/uuidv6/

Sorting by its raw bytes results in a sequence equivalent to sorting by its embedded timestamp, thus making it naturally more useful as a primary key and providing improved reference locality and thus insert performance. (I.e. primary keys that are out of sequence are not just useless for sorting, but can also cause unnecessary disk access due to occupying significantly different locations in the index. [example])
The embedded time should be extractable for use as the creation time of the record.
Global uniqueness, which is of course a basic requirement of all types of UUIDs.

Remove monotonicity guarantee from spec

That's basically the summary of #11
Reasons mentioned are e.g.:

Makes it stateful
Worst case scenario leads to just one ULID per millisecond
Can only work in a concurrent setting and not in an async one
...

Maybe it can be moved to an addendum as a recommendation of additional features that could be provided by an implementation.

urn:ulid:.. namespace registration planned?

Have you considered to register an URN namespace for ULID in order to use ULIDs with linked data technologies?

Similar RFC for UUID is rfc4122.

Need help with discrepancy found in decoding ULID byte strings

I've run into an odd issue when decoding ULIDs to bytes and encoding bytes back to ULIDs.

TL;DR: Is the binary layout and byte order defined in the spec correct? If so, where is my error in logic?

Where are all 128 bits?

The specification states:

Technically, a 26-character Base32 encoded string can contain 130 bits of information, whereas a ULID must only contain 128 bits. Therefore, the largest valid ULID encoded in Base32 is 7ZZZZZZZZZZZZZZZZZZZZZZZZZ

This is worded in such a way that I would expect 7ZZZZZZZZZZZZZZZZZZZZZZZZZ, when decoded, to have all 128 bits set to 1, but that doesn't seem to be the case. Instead, this is what the bytes look like when converted to hexadecimal:

3f ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

This does not have all 128 bits set to 1. To set all bits to 1, that first 3 needs to be an f, but when I do this (i.e. ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff), Crockford's base32 encodes the bytes as:

ZZZZZZZZZZZZZZZZZZZZZZZZZW

Notice the W at the end.

Data loss?

Furthermore, while 7ZZZZZZZZZZZZZZZZZZZZZZZZZ converts to 3f ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff, the reverse is not true. Encoding those same bytes back into Crockford's base32 renders 7ZZZZZZZZZZZZZZZZZZZZZZZZW.

So, it appears that Crockford's base32 encoding results in some data loss?

Timestamp Magic?

Continuing on, the spec says:

the largest valid ULID encoded in Base32 is 7ZZZZZZZZZZZZZZZZZZZZZZZZZ, which corresponds to an epoch time of 281474976710655 or 2 ^ 48 - 1

If we assume the binary layout and byte order to be true (as specified here), then when we convert 7ZZZZZZZZZZZZZZZZZZZZZZZZZ to bytes, we should be able to take the first 48 bits (6 bytes) and convert them to an integer, which should equal 281474976710655.

Going back to my earlier point, 7ZZZZZZZZZZZZZZZZZZZZZZZZZ decodes to bytes as:

3f ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

The first 48 bits of this are:

3f ff ff ff ff ff

When converted to a decimal integer, we get the value 70368744177663, which is not the same as 281474976710655. When converted to a date, we get 4199-11-24 01:22:57.663 +00:00.

However, all the ULID implementations I tried appear to convert the date correctly to arrive at the value expected by the specification, so I must be doing something wrong.

Add a link to Crockford's Base 32 definition (or copy it)

Because Crockford's base 32 definition is central to ULID encoding, the spec should point to the correct source of truth:

https://www.crockford.com/base32.html

Notably, not all implementations alias i, I, o, and O to 1 and 0, respectively.

Also, it would be good to clarify the acceptability of a check symbol.

Can we deterministically generate a ULID from another ULID?

I guess we need to leave the highest 48 bits unchanged and generate the remaining 80 bits based on the input ULID. What would be a good way to do this?

ULID in Oracle

If anyone wants to implement the ULID spec in Oracle, here's a start:
Extracting timestamp from ULID in PL/SQL

    p_Ulid VARCHAR2(200) := '01EG664DVCY5NTH7WFN57PA7TM';
    FUNCTION Get_Ulid_Ts(p_In VARCHAR2) RETURN TIMESTAMP
        WITH TIME ZONE IS
        Dec_Value   NUMBER := 0;
        t_Time_Part VARCHAR2(10) := Substr(p_In, 0, 10); --First 10 characters are the timestamp
        Ret         TIMESTAMP WITH TIME ZONE := To_Timestamp_Tz('19700101 +00:00', 'yyyymmdd TZH:TZM');
        c_Base      NUMBER := 32;
        c_Base32    VARCHAR2(32) := '0123456789ABCDEFGHJKMNPQRSTVWXYZ'; --Crockford's base32
        TYPE B32_Map_Typ IS TABLE OF NUMBER INDEX BY VARCHAR2(1);
        B32map B32_Map_Typ;
    BEGIN
        --initialize base32 map
        FOR i IN 0 .. Length(c_Base32) - 1
        LOOP
            B32map(Substr(c_Base32, i + 1, 1)) := i;
        END LOOP;
        --convert base 32 to base 10
        FOR i IN 1 .. Length(t_Time_Part)
        LOOP
            Dec_Value := Dec_Value +
                         Power(c_Base, i - 1) *
                         B32map(Substr(t_Time_Part, -i, 1));
        END LOOP;
        --add to unix timestamp sentinal
        Ret := Ret +numtodsinterval(((Dec_Value/ 1000) ),'SECOND') ;
        RETURN Ret;
    END;
BEGIN
    --ISO8601 timestamp formats
    EXECUTE IMMEDIATE q'!alter session set nls_timestamp_format = 'YYYY-MM-DD"T"HH24:MI:SS.ff3"Z"' !';
    EXECUTE IMMEDIATE q'!alter session set nls_timestamp_tz_format = 'YYYY-MM-DD"T"HH24:MI:SS.ff3 TZR' !';
    --test function
    Dbms_Output.Put_Line(Get_Ulid_Ts(p_Ulid));
    Dbms_Output.Put_Line(Get_Ulid_Ts(p_Ulid) At TIME ZONE
                         'America/New_York');
END;
/

yet another scala implementation

please review #30 and merge if accepted
thanks!

GPL-3.0 for a specification?

Usually specifications are public domain or CC-0

Differences from UUIDv1?

How is this proposal different from UUIDv1?

RFC 4122, Section 4.1.4 specifies the first 60 bits of a UUID to hold a time stamp, and Section 4.5 specifies the remaining bits in the absence of a network address.

Proposing an alternative (and non-standardized) ID system will only fragment usage.

Clarification of Crockford Base32

The spec says:

"Uses Crockford's base32 for better efficiency and readability (5 bits per character)"

"Crockford's Base32 is used as shown. This alphabet excludes the letters I, L, O, and U to avoid confusion and abuse."

However it is not exactly clear does this mean only using the character set 0123456789ABCDEFGHJKMNPQRSTVWXYZ or does it mean using also the Crockford base32 features such as treating i and l as 1, treating o as 0 and ignoring hyphens in the string?

GUID comparison

Greetings,

I am compiling a GUID summary sheet which tries to aggregate different dimensions with the intent to help engineers pick the right generator.

Would you be kind enough to help me validate the ULID column and point to any issues?

Thank you

Guarantee a minimum number of IDs before overflow of the random component

The actual number of ULIDs I can get in a millisecond might only be 1 (with absurd probability 2^-80). We could guarantee at least 2^79 ULIDs by requiring the first random generated in a millisecond starts with a zero bit (implementors would merely mask it, I'm sure).

This has the negative impact of only 79 bits of randomness instead of 80 so if there are 1e12ULIDs being generated across different devices in the same millisecond then we'll get a collision whp (vs the previous expectation of 7.8e11 ULIDs in one millisecond)

Postgres extension written in rust to generate ulid

Hey, I created a postgres extension that was written in rust, and can be used in postgres to generate ulid values for various stuff.

check it here - spa5k/uids-postgres

mainly created it for experiment and lack of a good postgres extension written in ksuid, ulid and nanoid.

One extension for all 3 of em.

Perl6 version is also available

https://github.com/zostay/p6-ULID

(not my module, I've just noticed that it's listed in https://modules.perl6.org/search/?q=ulid but not linked in README here)

[Question] Is there any function to convert between uuid and ulid

As I see in the spec, 128-bit compatibility with UUID.

I have an usecase to return id in uuid format, but default output of ulid is Base32. Is there any function to convert back-and-forth between uuid and ulid?

Request clarification in readme on the nature of 128-bit compatibility with UUID

E.g. what is the canonical way to convert between UUID and ULID form? Is it possible to losslessly encode a ULID by having an extra field storing the timestamp?

Why does base32 encoding using Crockford alphabet not produce the same results as the spec?

I am trying to understand what method of base32 encoding is used to produce the common ULID string specified in the spec, leading with usually 01G as it doesn't seem to be the method defined in rfc4648 with the Crockford alphabet.

Using any rfc4648 compliant base32 encoder with the Crockford32 alphabet ends up producing a different resulting string, usually preceding with 063

For example, using Golang and the oklog/ulid package:

package main

import (
	"encoding/base32"
	"fmt"

	"github.com/oklog/ulid/v2"
)

const crockford32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

var crockfordEncoder = base32.NewEncoding(crockford32).WithPadding(base32.NoPadding)

func main() {

	id := ulid.Make()
	fmt.Printf("encoding/base32: \t%s\n", crockfordEncoder.EncodeToString(id[:]))
	fmt.Printf("ULID.String(): \t\t%s\n", id.String())
}

// output 
//   encoding/base32:        063A7TAANJDY8Q2ASDD5EE0F2R
//   ULID.String():          01GTHYJJNCKFJ5RJPBB9BKG3RP

Here is the code also in the Go playground - be aware the time is frozen is this sandbox.
https://go.dev/play/p/T9FD_2uJhCZ

Deprecate in favor of UUIDv7

I just read about UUIDv7. It's a proposed IETF standard, in "last call". Other than the base32 encoding, it's got everything we want from ULID. UUIDv7 will be more widely accepted and available to developers.

I propose that we declare victory, and allow the UUID spec to take over the effort from here on.

Listed packages may not comply with spec

The specification README lists many packages that are not compliant with stated specifications.

Regardless of your feelings on monotonicity (#11, #40, etc), the current spec claims ULIDs provide it, and that they throw an error on overflow. Many listed packages do not implement monotonicity, do not throw, or implement it incorrectly (e.g. by incrementing the timestamp).

Recommended resolutions

modify the spec (e.g. clarifying whether sections are optional), and/or
clarify which listed implementations are fully compliant

E.g. a column in the README table describing whether a package implements monotonicity per spec would be useful to potential users

Consider removing unmaintained https://github.com/mmacedoeu/rulid.rs/

The project is unmaintained since 2017.

An alternative, well-maintained project, is already linked.

`Qt_5.11' not found

I ran 'git submodule update --init --recursive' then 'sudo make install' then 'brain-player b82cae11-b1e5-49fa-a921-923994b07b9a' and got this error:

Traceback (most recent call last): File "/usr/local/bin/brain-player", line 2, in <module> from assist_replayer import main File "/usr/local/lib/python2.7/dist-packages/assist_replayer/__init__.py", line 17, in <module> from app import main File "/usr/local/lib/python2.7/dist-packages/assist_replayer/app.py", line 17, in <module> from PySide2.QtCore import QPointF, QUrl, Signal, Slot ImportError: /usr/lib/x86_64-linux-gnu/libQt5Core.so.5: version Qt_5.11' not found (required by /usr/local/lib/python2.7/dist-packages/PySide2/QtCore.so)
`

nonce mode

Given recent events I'd like to avoid leaking high resolution clock data.

Perhaps the ULID spec could offer support for some sort of nonce offset.

ULID implementation for Cloudflare Workers

I would like to include my JavaScript ULID fork, specific to the peculiarities of Cloudflare Workers, in your list of compliant implementations. Would this be acceptable?

See: My rationale for the fork, along with discussion and consent by @perry-mitchell maintainer of perry-mitchell/ulidx.