tsid-python

A Python library for generating Time-Sorted Unique Identifiers (TSID) as defined in https://github.com/f4b6a3/tsid-creator.

This library is a port of the original Java code by Fabio Lima.

Installation

pip install tsidpy

What is a TSID?

The term TSID stands (roughly) for Time-Sorted ID. A TSID is a value formed from its creation time plus a random value.

It brings together ideas from Twitter's Snowflake and ULID Spec.

In summary:

  • Sorted by generation time.
  • Can be stored as a 64-bit integer.
  • Can be stored as a 13-character string.
  • The string format is encoded in Crockford's base 32.
  • The string format is URL safe, case insensitive and has no hyphens.
  • Shorter than other unique identifiers, like UUID, ULID and KSUID.

TSID Structure

A TSID has 2 components:

  1. A time component (42 bits), consisting of the milliseconds elapsed since 2020-01-01 00:00:00 UTC (this epoch can be configured)

  2. A random component (22 bits), containing 2 sub-parts:

    • A node identifier (can use 0 to 20 bits)
    • A counter (can use 2 to 22 bits)

    Note: The counter length depends on the node identifier length.

    For example, if we use 10 bits for the node representation:

    • The counter is limited to 12 bits.
    • The maximum node value is 2^10-1 = 1023
    • The maximum counter value is 2^12-1 = 4095, so the maximum TSIDs that can be generated per millisecond is 4096.
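
The arithmetic above can be checked with a few lines of plain Python. This is only a sketch: `RANDOM_BITS` and `counter_limits` are illustrative names and are not part of tsidpy.

```python
RANDOM_BITS = 22  # size of the random component described above


def counter_limits(node_bits: int) -> dict:
    """Return the bit budget implied by a given node identifier length."""
    if not 0 <= node_bits <= 20:
        raise ValueError("node_bits must be between 0 and 20")
    counter_bits = RANDOM_BITS - node_bits
    return {
        "counter_bits": counter_bits,
        "max_node": (1 << node_bits) - 1,
        "max_counter": (1 << counter_bits) - 1,
        "ids_per_ms": 1 << counter_bits,
    }


# 10 node bits leave 12 bits for the counter
print(counter_limits(10))
```

Choosing more node bits allows more generators but fewer TSIDs per millisecond on each one.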

This is the default TSID structure:

                                            adjustable
                                           <---------->
|------------------------------------------|----------|------------|
       time (msecs since 2020-01-01)           node      counter
                42 bits                       10 bits    12 bits

- time:    2^42 = ~69 to ~139 years with adjustable epoch (see notes below)
- node:    up to 2^20 values with adjustable bits.
- counter: 2^2..2^22 with adjustable bits and randomized values every millisecond.
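
To make the layout concrete, the following sketch packs and unpacks the three fields with bit shifts, assuming the default 42/10/12 split. The `pack` and `unpack` helpers are hypothetical and only illustrate the structure; they are not how tsidpy is necessarily implemented.

```python
TIME_BITS, NODE_BITS, COUNTER_BITS = 42, 10, 12  # default layout shown above
RANDOM_BITS = NODE_BITS + COUNTER_BITS


def pack(time_ms: int, node: int, counter: int) -> int:
    """Combine the three fields into a single 64-bit integer."""
    for name, value, bits in (("time_ms", time_ms, TIME_BITS),
                              ("node", node, NODE_BITS),
                              ("counter", counter, COUNTER_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (time_ms << RANDOM_BITS) | (node << COUNTER_BITS) | counter


def unpack(value: int) -> tuple:
    """Split a 64-bit integer back into (time_ms, node, counter)."""
    counter = value & ((1 << COUNTER_BITS) - 1)
    node = (value >> COUNTER_BITS) & ((1 << NODE_BITS) - 1)
    return value >> RANDOM_BITS, node, counter


value = pack(time_ms=1000, node=5, counter=7)
print(value, unpack(value))
```

Because the time field occupies the most significant bits, sorting the integers also sorts them by creation time.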

Notes:

  • The time component can be used for ~69 years if stored in a SIGNED 64-bit integer field (41 usable bits) or ~139 years if stored in an UNSIGNED 64-bit integer field (42 usable bits).
  • By default, new TSID generators use 10 bits for the node identifier and 12 bits for the counter. It's possible to adjust the node identifier length to a value between 0 and 20.
  • The time component can be 1 ms or more ahead of the system time when necessary to maintain monotonicity and generation speed.
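
The ~69 and ~139 year figures follow directly from the number of usable time bits. A quick check, assuming an average year of 365.25 days (`lifespan_years` is an illustrative helper, not a tsidpy function):

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25  # average year, in milliseconds


def lifespan_years(usable_bits: int) -> float:
    """Years of millisecond timestamps that fit in the given number of bits."""
    return (1 << usable_bits) / MS_PER_YEAR


print(round(lifespan_years(41), 1))  # signed 64-bit storage
print(round(lifespan_years(42), 1))  # unsigned 64-bit storage
```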

Node identifier

The simplest way to avoid collisions is to make sure that each generator has an exclusive node ID.

The node ID can be passed to the TSIDGenerator constructor. If no node ID is passed, the generator will use a random value.
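
One possible way to give each process its own node ID is to read it from the environment and fall back to a random value, mirroring the behavior described above. This is a sketch under assumptions: the `TSID_NODE` variable name and the `resolve_node` helper are hypothetical and not part of tsidpy.

```python
import os
import secrets


def resolve_node(node_bits: int = 10, env_var: str = "TSID_NODE") -> int:
    """Read the node ID from the environment, or pick a random one."""
    max_node = (1 << node_bits) - 1
    raw = os.environ.get(env_var)
    if raw is None:
        # No explicit node ID: use a random value in the allowed range
        return secrets.randbelow(max_node + 1)
    node = int(raw)
    if not 0 <= node <= max_node:
        raise ValueError(f"{env_var}={node} does not fit in {node_bits} bits")
    return node


node = resolve_node(node_bits=10)
print(node)
```

The resulting value could then be passed as the `node` argument of `TSIDGenerator`, together with a matching `node_bits`.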

Basic usage

Create a TSID:

from tsidpy import TSID

tsid: TSID = TSID.create()

Create a TSID as an int:

>>> TSID.create().number
432511671823499267

Create a TSID as a str:

>>> str(TSID.create())
'0C04Q2BR40003'

Create a TSID as a hexadecimal str:

>>> TSID.create().to_string('x')
'06009712f0400003'

Note: TSID generators are thread-safe.

TSID as int

The TSID::number property simply unwraps the internal int value of a TSID.

>>> from tsidpy import TSID
>>> TSID.create(432511671823499267).number
432511671823499267

Sequence of TSIDs:

38352658567418867
38352658567418868
38352658567418869
38352658567418870
38352658567418871
38352658567418872
38352658567418873
38352658567418874
38352658573940759 < millisecond changed
38352658573940760
38352658573940761
38352658573940762
38352658573940763
38352658573940764
38352658573940765
38352658573940766
         ^      ^ look
                                   
|--------|------|
   time   random
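
The split marked in the diagram can be reproduced by shifting off the 22 random bits. The sketch below uses two values from the sequence above; `split` is an illustrative helper, not a tsidpy function.

```python
RANDOM_BITS = 22  # node + counter bits


def split(value: int) -> tuple:
    """Return (time, random) for a TSID given as an int."""
    return value >> RANDOM_BITS, value & ((1 << RANDOM_BITS) - 1)


before = split(38352658567418874)  # last value before the millisecond changed
after = split(38352658573940759)   # first value after the change

print(before)  # (9143986360, 1725434)
print(after)   # (9143986361, 4053015)
```

The time part advances by one millisecond, while the random part starts again from a new value.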

TSID as str

The TSID::to_string() method encodes a TSID as a Crockford base 32 string. The returned string is 13 characters long.

>>> from tsidpy import TSID
>>> TSID.create().to_string()
'0C04Q2BR40004'

Or, alternatively:

>>> str(TSID.create())
'0C04Q2BR40004'

Sequence of TSID strings:

01226N0640J7K
01226N0640J7M
01226N0640J7N
01226N0640J7P
01226N0640J7Q
01226N0640J7R
01226N0640J7S
01226N0640J7T
01226N0693HDA < millisecond changed
01226N0693HDB
01226N0693HDC
01226N0693HDD
01226N0693HDE
01226N0693HDF
01226N0693HDG
01226N0693HDH
        ^   ^ look
                                   
|-------|---|
   time random
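
For readers curious about the encoding itself, here is a minimal sketch of how a 64-bit integer maps to 13 Crockford base 32 characters (5 bits per character, with only 4 bits used by the first one). The `encode` and `decode` helpers are illustrative and independent of tsidpy; the decoder skips Crockford's optional aliases such as `I`, `L` and `O`.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford's base 32


def encode(number: int) -> str:
    """Encode a 64-bit integer as 13 Crockford base 32 characters."""
    if not 0 <= number < (1 << 64):
        raise ValueError("number must fit in 64 bits")
    # Walk from the most significant 5-bit group down to the least
    return "".join(ALPHABET[(number >> shift) & 0x1F]
                   for shift in range(60, -1, -5))


def decode(text: str) -> int:
    """Decode a 13-character string back into an integer."""
    if len(text) != 13:
        raise ValueError("expected 13 characters")
    number = 0
    for char in text.upper():  # the format is case insensitive
        number = (number << 5) | ALPHABET.index(char)
    if number >= (1 << 64):
        raise ValueError("value does not fit in 64 bits")
    return number


print(encode(432511671823499267))  # 0C04Q2BR40003
print(decode("0c04q2br40003"))     # 432511671823499267
```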

The string format can be useful for languages that store numbers in IEEE 754 double-precision binary floating-point format, such as JavaScript, which cannot represent every integer above 2^53 exactly.
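
The precision problem can be demonstrated from Python, whose `float` is also an IEEE 754 double. This sketch uses the sample TSID value from earlier in this document:

```python
MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a double can always hold exactly

tsid_number = 432511671823499267
as_double = float(tsid_number)  # what a double-based language would store

print(tsid_number > MAX_SAFE_INTEGER)  # True
print(int(as_double) == tsid_number)   # False: the low bits were rounded away
```

Sending the 13-character string instead of the number avoids this loss.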

More Examples

Create a TSID using the default generator:

from tsidpy import TSID

tsid: TSID = TSID.create()

Create a TSID from a canonical string (13 chars):

from tsidpy import TSID

tsid: TSID = TSID.from_string('0123456789ABC')

Convert a TSID into a canonical string in lower case:

>>> tsid.to_string('s')
'0123456789abc'

Get the creation timestamp of a TSID:

>>> tsid.timestamp
1680948418241.0  # datetime.datetime(2023, 4, 8, 12, 6, 58, 241000)
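
The value above looks like milliseconds since the Unix epoch. Under that assumption, it can be converted to a timezone-aware `datetime` as sketched below (`to_utc_datetime` is an illustrative helper, not part of tsidpy). Note that the result is in UTC, while the datetime in the comment above appears to be a local time two hours ahead.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_datetime(timestamp_ms: float) -> datetime:
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=timestamp_ms)


print(to_utc_datetime(1680948418241.0).isoformat())
# 2023-04-08T10:06:58.241000+00:00
```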

Encode a TSID to base-62:

>>> tsid.to_string('z')
'0T5jFDIkmmy'

A TSIDGenerator that creates TSIDs similar to Twitter Snowflakes:

  • Twitter snowflakes use 10 bits for node id: 5 bits for datacenter ID (max 31) and 5 bits for worker ID (max 31)
  • Epoch starts on 2010-11-04T01:42:54.657Z
  • Counter uses 12 bits and starts at 0 (max: 4095 values per millisecond)

from datetime import datetime

from tsidpy import TSID, TSIDGenerator

datacenter: int = 1
worker: int = 1
node: int = datacenter << 5 | worker
# '+00:00' is used instead of 'Z', which fromisoformat() only accepts on Python 3.11+
epoch: datetime = datetime.fromisoformat('2010-11-04T01:42:54.657+00:00')

twitter_generator: TSIDGenerator = TSIDGenerator(node=node, node_bits=10,
                                                 epoch=epoch.timestamp() * 1000,
                                                 random_fn=lambda n: 0)

# use the generator
tsid: TSID = twitter_generator.create()
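
The two derived values in this example, the combined node ID and the epoch in milliseconds, can be checked without the library. The helpers below (`pack_node`, `unpack_node`, `epoch_ms`) are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta, timezone


def pack_node(high: int, low: int, low_bits: int = 5) -> int:
    """Combine two sub-identifiers (e.g. datacenter and worker) into one node ID."""
    return (high << low_bits) | low


def unpack_node(node: int, low_bits: int = 5) -> tuple:
    """Recover the two sub-identifiers from a combined node ID."""
    return node >> low_bits, node & ((1 << low_bits) - 1)


def epoch_ms(moment: datetime) -> int:
    """Whole milliseconds between the Unix epoch and an aware datetime."""
    unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (moment - unix_epoch) // timedelta(milliseconds=1)


twitter_epoch = datetime(2010, 11, 4, 1, 42, 54, 657000, tzinfo=timezone.utc)

print(pack_node(1, 1))          # 33
print(unpack_node(33))          # (1, 1)
print(epoch_ms(twitter_epoch))  # 1288834974657
```

Using integer division on `timedelta` avoids the small rounding errors that can appear when multiplying a float timestamp by 1000.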

A TSIDGenerator that creates TSIDs similar to Discord Snowflakes:

  • Discord snowflakes use 10 bits for node id: 5 bits for worker ID (max 31) and 5 bits for process ID (max 31)
  • Epoch starts on 2015-01-01T00:00:00.000Z
  • Counter uses 12 bits and starts at a random value.

from datetime import datetime

from tsidpy import TSID, TSIDGenerator

worker: int = 1
process: int = 1
node: int = worker << 5 | process
# '+00:00' is used instead of 'Z', which fromisoformat() only accepts on Python 3.11+
epoch: datetime = datetime.fromisoformat("2015-01-01T00:00:00.000+00:00")

discord_generator: TSIDGenerator = TSIDGenerator(node=node, node_bits=10,
                                                 epoch=epoch.timestamp() * 1000)

# use the generator
tsid: TSID = discord_generator.create()

Make TSID.create() use the previous Discord generator:

TSID.set_default_generator(discord_generator)

# at this point, you can use the default TSID.create()
tsid: TSID = TSID.create()

# or the generator
tsid: TSID = discord_generator.create()

A note about node id and node bits

When creating a TSIDGenerator, remember you can't use a node id greater than 2^node_bits - 1. For example, if you need to use a node id greater than 7, you need to use more than 3 bits for the node id:

from tsidpy import TSIDGenerator

gen0 = TSIDGenerator(node=0, node_bits=3)  # ok
gen1 = TSIDGenerator(node=1, node_bits=3)  # ok
...
gen7 = TSIDGenerator(node=7, node_bits=3)  # ok

# error: can't represent 8 with 3 bits
gen8 = TSIDGenerator(node=8, node_bits=3)
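
The rule can be checked before building a generator. In this sketch, `min_node_bits` and `fits` are illustrative helpers rather than tsidpy functions; they rely on Python's built-in `int.bit_length()`.

```python
def min_node_bits(node: int) -> int:
    """Smallest number of bits able to represent the given node ID."""
    if node < 0:
        raise ValueError("node must be non-negative")
    return node.bit_length()


def fits(node: int, node_bits: int) -> bool:
    """True if node is not greater than 2^node_bits - 1."""
    return 0 <= node <= (1 << node_bits) - 1


print(min_node_bits(7), fits(7, 3))  # 3 True
print(min_node_bits(8), fits(8, 3))  # 4 False
```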

Other ports, forks and OSS

Ports, forks, implementations and other OSS

Ports, forks and implementations:

Language    Name
Go          vishal-bihani/go-tsid
Java        vladmihalcea/hypersistence-tsid
Java        vincentdaogithub/tsid
.NET        kgkoutis/TSID.Creator.NET
PHP         odan/tsid
Python      luismedel/tsid-python
Rust        jakudlaty/tsid
TypeScript  yubintw/tsid-ts

Other OSS:

Language    Name
Java        fillumina/id-encryptor
.NET        ullmark/hashids.net

License

This library is Open Source software released under the MIT license.

tsid-python's Issues

Python version specifier is incorrect

Thanks for such a useful package! One issue I've noted while trying to use it; in your pyproject.toml, you specify that the package requires-python 3.7 or newer:

[project]
name = "tsidpy"
version = "1.1.2"
authors = [
{ name="Luis Medel", email="[email protected]" },
]
description = "A Python library for generating Time-Sorted Unique Identifiers (TSID)"
readme = "README.md"
requires-python = ">=3.7"

However, in tsid.py, you then use the match syntax that is not supported by Python versions older than 3.10:

match fmt:
    case 'S':  # canonical string in upper case
        result = self._to_canonical_string()
    case 's':  # canonical string in lower case
        result = self._to_canonical_string().lower()
    case 'X':  # hexadecimal in upper case
        result = encode(self.number, 16, min_length=TSID_BYTES*2)
    case 'x':  # hexadecimal in lower case
        result = encode(self.number, 16, min_length=TSID_BYTES*2)
        result = result.lower()
    case 'd':  # base-10
        result = encode(self.number, 10)
    case 'z':  # base-62
        result = encode(self.number, 62)
    case _:
        raise ValueError(f"Invalid format: '{fmt}'")

Type hints

Would it be possible to add type hints to the project? I'm happy to contribute these
