ahawker / ulid

Universally Unique Lexicographically Sortable Identifier (ULID) in Python 3

License: Apache License 2.0

Makefile 2.02% Python 93.71% PowerShell 4.27%

python python3 ulid uuid

ulid's Issues

Update README with pros/cons vs. UUID

Originated from Reddit Comment.

The README should contain more information (parity at the very least) with the README in ULID.

Add Python 3.7 Support

- All tests passing
- Tox/TravisCI integration
- Windows Integration in #280

Add mypy support

This package was written with type hints (PEP484) so it should perform some static analysis checks on build.

Add make target for invoking checks
~~Add tox target?~~
~~Add pytest support?~~
Add TravisCI support

Add bounds checking for max timestamp overflow case

We need to add validation for handling the max timestamp value, 2 ^ 48 - 1, 281474976710655. Spec notes are at https://github.com/ulid/spec#overflow-errors-when-parsing-base32-strings

Parsing of the t value in the following example should raise an exception.

>>> import ulid
>>> s = '7ZZZZZZZZZZZZZZZZZZZZZZZZZ'
>>> t = '8ZZZZZZZZZZZZZZZZZZZZZZZZZ'
>>> ulid.parse(s)
<ULID('7ZZZZZZZZZZZZZZZZZZZZZZZZZ')>
>>> ulid.parse(t)
<ULID('0ZZZZZZZZZZZZZZZZZZZZZZZZZ')>
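
A minimal sketch of the bounds check described above, written against the standard library only. The function name `decode_timestamp` and the error messages are illustrative assumptions, not the package's API; the sketch decodes the 10-character timestamp prefix and rejects anything above 2 ^ 48 - 1.

```python
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
MAX_TIMESTAMP = 2 ** 48 - 1  # 281474976710655, the limit quoted from the spec


def decode_timestamp(value: str) -> int:
    """Decode the 10-character timestamp prefix of a 26-character ULID string.

    Hypothetical helper: raises ValueError on overflow instead of wrapping.
    """
    if len(value) != 26:
        raise ValueError("Expects 26 characters; got {}".format(len(value)))
    timestamp = 0
    for char in value[:10].upper():
        # str.index raises ValueError for characters outside the alphabet.
        timestamp = timestamp * 32 + CROCKFORD_ALPHABET.index(char)
    if timestamp > MAX_TIMESTAMP:
        raise ValueError(
            "Timestamp overflow: {} exceeds {}".format(timestamp, MAX_TIMESTAMP)
        )
    return timestamp
```

Ten base32 characters carry 50 bits, so only a first character of 7 or lower fits in 48 bits; checking the first character alone would be a cheaper equivalent test.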

Fix non-ascii character tests

There are a number of tests based off the invalid_str_encoding fixture that are passing but the assert is being fulfilled by an incorrect code path.

Investigate best way to assert against exception message (pattern matching hopefully)
Fix tests and assert correct exception from expected code path

Would it be possible to update the changelog with the more recent version enhancements? Specifically, I'm upgrading from 0.0.6 to 0.0.7 and was hoping to get some high-level context. I've looked through the commits, I just wanted to make sure I wasn't missing the forest for the trees on anything.

Enforce ULID Timestamp Range

A number of the test fixtures that generate data are powered by os.urandom. This works fine until it generates a random sequence of bytes that starts with a zero byte. That causes tests to fail because int.bit_length ignores leading zeros in its computation.

Example test failure: https://travis-ci.org/ahawker/ulid/jobs/294263189

All of the above is a side-effect of the fact that there is no validation logic for the timestamp portion of a ULID. It should never contain a zero leading byte since the minimum value is the Unix epoch.

Items to address this issue:

Validation rules to enforce minimum and maximum timestamp values upon creation
Update test fixtures to generate values specifically within valid or invalid ranges

Example:

>>> import ulid
>>> data = b"\x00\xcdh\x95}\xd9\xb2Yp':y0\xe4\xce\xdc"
>>> ulid.from_bytes(data)
<ULID('00SNM9AZESP9CQ09STF4RE9KPW')>
>>> ulid.from_int(int.from_bytes(data, byteorder='big'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hawker/src/github.com/ahawker/ulid/ulid/api.py", line 76, in from_int
    raise ValueError('Expects integer to be 128 bits; got {} bytes'.format(length))
ValueError: Expects integer to be 128 bits; got 15 bytes
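
The failure above can be reproduced with the standard library alone. The sketch below shows why a length inferred from int.bit_length comes out one byte short, and one possible alternative: validating the integer's range and always serializing to 16 bytes. The helper name `int_to_ulid_bytes` is an assumption for illustration, not the fix that was applied in the package.

```python
# The 16-byte value from the example above; note the leading zero byte.
data = b"\x00\xcdh\x95}\xd9\xb2Yp':y0\xe4\xce\xdc"
value = int.from_bytes(data, byteorder="big")

# bit_length ignores leading zero bits, so the inferred byte count is short.
inferred_length = (value.bit_length() + 7) // 8


def int_to_ulid_bytes(number: int) -> bytes:
    """Hypothetical helper: check the range, then always emit 16 bytes."""
    if number < 0 or number >= 1 << 128:
        raise ValueError("Expects an unsigned integer that fits in 128 bits")
    return number.to_bytes(16, byteorder="big")
```

Here `inferred_length` is 15 even though the input was 16 bytes long, while `int_to_ulid_bytes(value)` restores the original bytes including the leading zero.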

Remove "development" requirements from base.txt

Currently the requirements/base.txt requirements file (for runtime) contains dependencies that are only useful for a development/deployment environment. These should be broken out into a separate file.

API: from_randomness should support ULID/Randomness objects

The from_randomness function in ulid/api.py supports creating ULID instances with a randomness value from a given value. In addition to the currently supported types, it should also support Randomness and ULID types.

When the given value is a Randomness, a straight copy of all bytes should suffice.
When the given value is a ULID, a straight copy of the last 10 bytes should suffice.
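
As a rough illustration of the byte copies described above, the sketch below works on raw bytes rather than on the package's ULID and Randomness classes. It assumes the 16-byte layout of 6 timestamp bytes followed by 10 randomness bytes; the function name `split_ulid_bytes` is hypothetical.

```python
from typing import Tuple


def split_ulid_bytes(data: bytes) -> Tuple[bytes, bytes]:
    """Split a 16-byte ULID into (timestamp bytes, randomness bytes).

    Hypothetical helper: the first 6 bytes are the timestamp and the
    last 10 bytes are the randomness.
    """
    if len(data) != 16:
        raise ValueError("Expects 16 bytes; got {}".format(len(data)))
    return bytes(data[:6]), bytes(data[6:])


timestamp_part, randomness_part = split_ulid_bytes(bytes(range(16)))
```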

Add Performance Benchmarking

I did some very basic work with pytest-benchmark during development. However, a more complete and robust set of performance tests for common API calls/flows should be written.

Completeness Criteria:

New benchmark module, say test_performance.py.
Use pytest groups on the module or filter it out so make test doesn't always run it.
Add make benchmark or some similar target to execute them.
Add as a new tox and Travis CI target.
Pick a stable machine to run benchmarks and add a baseline to the README.

Fix CI (Replace Travis)

Travis CI's free tier for open source projects is dead. Swap to Circle CI, GitHub Actions, or move everything to AppVeyor.

Freeze Dependency Versions

All of the requirements txt files should be updated to freeze against a specific version.

ULID.hex skips leading zero

ulid-py 1.1.0

The .hex attribute does not correctly pad to 32 characters. It skips the leading zero, giving a len-31 string (33 with the 0x).

import ulid
import binascii 

u = ulid.from_randomness(0)
print(len(u.hex))
print(u.hex)
print(f"0x{binascii.hexlify(u.bytes).decode()}")

Out:

33
0x17b0c9d5b3b00000000000000000000
0x017b0c9d5b3b00000000000000000000
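
The difference between the two outputs comes down to how the integer is formatted. The sketch below uses only built-ins and the byte value from the report; it is an illustration of the padding behaviour, not the package's implementation of .hex.

```python
# The 16 bytes shown in the report: 12 hex digits followed by 20 zeros.
raw = bytes.fromhex("017b0c9d5b3b" + "0" * 20)
value = int.from_bytes(raw, byteorder="big")

# hex() prints the shortest representation, so the leading zero is dropped.
unpadded = hex(value)

# An explicit width of 32 hex digits keeps the leading zero.
padded = "0x{:032x}".format(value)

# Formatting from the bytes avoids the integer round trip entirely.
from_bytes = "0x" + raw.hex()
```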

Code Coverage: ulid/api.py

Aim for 100% code coverage for the ulid/api.py module.

Report: https://codeclimate.com/github/ahawker/ulid/coverage/59501e131e7b440001015f3e

Full Windows Support

This issue should track related work for making Windows a first class citizen for this package.

- All tests passing
- CI/CD pipeline via AppVeyor
- Attempt to merge travis* commands in Makefile into generalized ci* commands

Datetime objects are naive

ulid.timestamp().datetime returns a naive datetime object (lacking time zone information), yet the time is in UTC.

A naive datetime is ambiguous. Can the datetime be made aware by explicitly attaching the UTC time zone? The datetime module documentation gives reasons why aware datetimes are preferred for representing times in UTC.
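
For reference, a sketch of the two standard-library ways to obtain an aware UTC datetime. The function name `timestamp_to_datetime` is an assumption for illustration and does not reflect how the package builds its datetime values.

```python
from datetime import datetime, timedelta, timezone


def timestamp_to_datetime(milliseconds: int) -> datetime:
    """Build an aware UTC datetime from a millisecond Unix timestamp."""
    return datetime.fromtimestamp(milliseconds / 1000.0, tz=timezone.utc)


# An existing naive value that is known to be in UTC can be tagged in place.
naive = datetime(2021, 2, 19, 12, 0, 0)
aware = naive.replace(tzinfo=timezone.utc)
```

Note that `replace(tzinfo=...)` only attaches the zone and does not shift the clock time, so it is correct only when the naive value already represents UTC.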

Problems with using ulid with `mypy --strict`

Hi,
my project is using mypy --strict.
While importing ulid I'm getting a problem:

import ulid

MY_ULID = ulid.new()

error: Module has no attribute "new"

I found a workaround:

import ulid

MY_ULID = ulid.api.new()

But I'm sure the first way is a bit more preferable.

Investigation

I investigated the problem.
The following modified content of __init__.py should fix it:

from .api import from_bytes, from_int, from_randomness, from_str, from_timestamp, from_uuid, new, parse
from .ulid import Randomness, Timestamp, ULID


__all__ = [
    # from .api
    'new', 'parse', 'from_bytes', 'from_int', 'from_str', 'from_uuid', 'from_timestamp', 'from_randomness',
    # from .ulid
    'Timestamp', 'Randomness', 'ULID',
]

__version__ = '0.0.14'

So I explicitly imported the items and explicitly listed them in __all__. This introduces some code duplication, but it does not look fatal to me.

Questions

Q1. Should I create a PR with these changes to __init__.py?

Q2. Should I create a PR to fix all mypy --strict errors for the whole ulid project? The fixes are going to be trivial, in my experience. Here is the full list of mypy errors:

ulid\ulid.py:23: error: Function is missing a type annotation
ulid\ulid.py:26: error: Function is missing a type annotation
ulid\ulid.py:39: error: Function is missing a type annotation
ulid\ulid.py:52: error: Function is missing a type annotation
ulid\ulid.py:67: error: Function is missing a type annotation
ulid\ulid.py:82: error: Function is missing a type annotation
ulid\ulid.py:97: error: Function is missing a type annotation
ulid\ulid.py:112: error: Function is missing a return type annotation
ulid\ulid.py:115: error: Function is missing a return type annotation
ulid\ulid.py:118: error: Function is missing a return type annotation
ulid\ulid.py:121: error: Function is missing a return type annotation
ulid\ulid.py:124: error: Function is missing a return type annotation
ulid\ulid.py:127: error: Function is missing a return type annotation
ulid\ulid.py:275: error: Returning Any from function declared to return "Timestamp"
ulid\ulid.py:275: error: Call to untyped function "Timestamp" in typed context
ulid\ulid.py:284: error: Returning Any from function declared to return "Randomness"
ulid\ulid.py:284: error: Call to untyped function "Randomness" in typed context
ulid\api.py:47: error: Returning Any from function declared to return "ULID"
ulid\api.py:47: error: Call to untyped function "ULID" in typed context
ulid\api.py:104: error: Returning Any from function declared to return "ULID"
ulid\api.py:104: error: Call to untyped function "ULID" in typed context
ulid\api.py:124: error: Returning Any from function declared to return "ULID"
ulid\api.py:124: error: Call to untyped function "ULID" in typed context
ulid\api.py:137: error: Returning Any from function declared to return "ULID"
ulid\api.py:137: error: Call to untyped function "ULID" in typed context
ulid\api.py:149: error: Returning Any from function declared to return "ULID"
ulid\api.py:149: error: Call to untyped function "ULID" in typed context
ulid\api.py:198: error: Returning Any from function declared to return "ULID"
ulid\api.py:198: error: Call to untyped function "ULID" in typed context
ulid\api.py:244: error: Returning Any from function declared to return "ULID"
ulid\api.py:244: error: Call to untyped function "ULID" in typed context

API: Add from_* style method

Currently, the API exposes multiple methods for creating ulid.ULID instances from other data types. However, it does not support a "catch all" call that attempts to make the determination based on type and requires the caller to do that.

Let's imagine that a user of the library has read an input value from somewhere that they have a relatively high confidence is a ULID. However, they don't know the format in which it was stored. In order to support this mechanism, the user of the library needs to write the following code:

if isinstance(value, bytes):
    return ulid.from_bytes(value)
if isinstance(value, int):
    return ulid.from_int(value)
if isinstance(value, str):
    return ulid.from_str(value)
if isinstance(value, uuid.UUID):
    return ulid.from_uuid(value)

raise ValueError('Cannot create ULID from type {}'.format(value.__class__.__name__))

This is pretty verbose, especially since we could hide this logic inside the library in a separate API call. It will be slightly slower than calling the correct method directly, since we have to run the if/else tree every time and don't know the "hot path", but it should be helpful for this scenario.

Potential thoughts:

from_(value)
from_value(value)
from_obj(value)
from_unknown(value)
parse(value)
decode(value)
load(value)
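
To make the idea concrete without depending on the package, here is a sketch that normalises any of the four input types to 16 raw bytes. The name `coerce_to_ulid_bytes` is hypothetical, and the string branch is a plain Crockford base32 decode written for this example rather than the library's decoder.

```python
import uuid

CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def coerce_to_ulid_bytes(value) -> bytes:
    """Hypothetical catch-all: return the 16-byte form of a ULID-like value."""
    if isinstance(value, (bytes, bytearray, memoryview)):
        data = bytes(value)
        if len(data) != 16:
            raise ValueError("Expects 16 bytes; got {}".format(len(data)))
        return data
    if isinstance(value, uuid.UUID):
        return value.bytes
    if isinstance(value, int):
        if value < 0 or value >= 1 << 128:
            raise ValueError("Expects an unsigned 128-bit integer")
        return value.to_bytes(16, byteorder="big")
    if isinstance(value, str):
        if len(value) != 26:
            raise ValueError("Expects 26 characters; got {}".format(len(value)))
        number = 0
        for char in value.upper():
            # str.index raises ValueError for characters outside the alphabet.
            number = number * 32 + CROCKFORD_ALPHABET.index(char)
        if number >= 1 << 128:
            raise ValueError("Decoded value does not fit in 128 bits")
        return number.to_bytes(16, byteorder="big")
    raise ValueError(
        "Cannot create ULID from type {}".format(value.__class__.__name__)
    )
```

The branches are checked in a fixed order, so the cost of the if/else tree is paid on every call, which is the trade-off noted above.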

Add Monotonicity

Address reported pylint issues

Either fix the reported issue or explicitly add an ignore to silence the warning if it's "written as intended".

Issues:

hawker@mbp:~/src/github.com/ahawker/ulid|master⚡
⇒  make lint
************* Module ulid
W: 11, 0: Wildcard import api (wildcard-import)
W: 13, 0: Wildcard import ulid (wildcard-import)
************* Module ulid.api
C: 21, 0: Invalid constant name "TimestampPrimitive" (invalid-name)
C: 27, 0: Invalid constant name "RandomnessPrimitive" (invalid-name)
************* Module ulid.hints
C: 12, 0: Invalid constant name "Buffer" (invalid-name)
************* Module ulid.ulid
C:191, 8: Invalid variable name "ms" (invalid-name)
make: *** [lint] Error 20

Read the Docs Support

The codebase is relatively well covered with comments and docstrings. We need to get the repository hooked up to an online documentation source, likely Read the Docs, and get the API documentation updating as part of the build/release process.

Monitor dependency versions

Using a service such as pyup, this repository should be monitored for changes in dependency versions.

How to derive a ULID from a string?

Suppose you have

ulid_string = '01EYV88PB2Y212QSR0AJ2JX5T4'

How would you derive a ulid object from the string?

consider implementing .toJSON so ulid is JSON serialisable

especially relevant with django-ulid
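
Python's json module has no toJSON protocol; custom types are usually handled through the `default` hook of json.dumps. The sketch below uses a stand-in class, `FakeULID`, which is purely hypothetical and only assumes that calling str() on the identifier yields its canonical text form.

```python
import json


class FakeULID:
    """Stand-in for an identifier type whose str() is its canonical text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __str__(self) -> str:
        return self.text


payload = {"id": FakeULID("01EYV88PB2Y212QSR0AJ2JX5T4")}

# default= is called for every object json cannot serialize natively.
encoded = json.dumps(payload, default=str)
```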

ImportError: cannot import name 'ULID' from 'ulid'

I've run into this error, even though I can import ulid successfully. Can somebody help me?

Code Coverage: ulid/base32.py

Aim for 100% code coverage for the ulid/base32.py module.

Report: https://codeclimate.com/github/ahawker/ulid/coverage/59501e131e7b440001015f3f

Is it possible to reduce the size of the ID at the expense of fewer ULIDs per millisecond?

Wondering if it's possible to reduce the size of the ID at the expense of fewer ULIDs per millisecond?

No module named 'ulid.api'

Starting with ulid 0.2.0, I get this error when I try to simply install the library:

(venv) ➜  pip install ulid-py==1.0.0               
Collecting ulid-py==1.0.0
  Using cached https://files.pythonhosted.org/packages/3f/9e/deba154963e4eb00cd31b60f35329359dcbf8ad34a01371c10f32faf3867/ulid_py-1.0.0-py2.py3-none-any.whl
Installing collected packages: ulid-py
  Found existing installation: ulid-py 0.1.0
    Uninstalling ulid-py-0.1.0:
      Successfully uninstalled ulid-py-0.1.0
Successfully installed ulid-py-1.0.0
(venv) ➜  python3 <<< "import ulid" 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lib/python3.8/site-packages/ulid/__init__.py", line 10, in <module>
    from .api import default, microsecond, monotonic
ModuleNotFoundError: No module named 'ulid.api'

Also, when I download the package from PyPI and unpack it, there is no api folder inside.

Values for range queries

To do range selection on time with ULIDs one needs to generate values with the lowest/highest possible randomness.

While this is doable with some effort, I feel it should be offered by the API. For example:

ulid.from_timestamp(timestamp, randomness=ulid.MIN_RANDOM)
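
Until something like this exists in the API, the bounds can be built by hand from raw bytes. The sketch below assumes the 16-byte layout of a 6-byte big-endian millisecond timestamp followed by 10 randomness bytes; the function name `ulid_range_bounds` is hypothetical.

```python
from typing import Tuple


def ulid_range_bounds(timestamp_ms: int) -> Tuple[bytes, bytes]:
    """Return the lowest and highest 16-byte ULID values for one millisecond."""
    if timestamp_ms < 0 or timestamp_ms >= 1 << 48:
        raise ValueError("Expects a timestamp that fits in 48 bits")
    prefix = timestamp_ms.to_bytes(6, byteorder="big")
    lowest = prefix + b"\x00" * 10   # minimum possible randomness
    highest = prefix + b"\xff" * 10  # maximum possible randomness
    return lowest, highest


low, high = ulid_range_bounds(1)
```

Because the byte form sorts the same way as the string form, the two values can be used as inclusive bounds in a range query on either representation.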

deepcopy doesn't work on ULID object

Running into a cryptic error when trying to deepcopy a ULID object:

>>> import ulid
>>> a = ulid.new()
>>> a
<ULID('01EAZF1038723PE2SS9BXRQC80')>
>>> import copy
>>> copy.deepcopy(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 173, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 147, in deepcopy
    y = copier(x, memo)
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 211, in _deepcopy_tuple
    y = [deepcopy(a, memo) for a in x]
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 211, in <listcomp>
    y = [deepcopy(a, memo) for a in x]
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 147, in deepcopy
    y = copier(x, memo)
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 162, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle 'memoryview' object

Any ideas?
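
The traceback can be reproduced with any object that keeps a memoryview in its instance dictionary, because memoryview objects cannot be pickled or deep-copied. The sketch below uses two hypothetical stand-in classes to show the failure and one possible workaround, a custom `__deepcopy__`; it is not the package's own fix.

```python
import copy


class PlainBuffer:
    """Hypothetical stand-in for a class that stores data as a memoryview."""

    def __init__(self, data: bytes) -> None:
        self.memory = memoryview(data)


class CopyableBuffer(PlainBuffer):
    """Same layout, but deepcopy rebuilds the view from a fresh bytes copy."""

    def __deepcopy__(self, memo: dict) -> "CopyableBuffer":
        duplicate = type(self)(bytes(self.memory))
        memo[id(self)] = duplicate  # register the copy for shared references
        return duplicate


original = CopyableBuffer(b"\x01" * 16)
clone = copy.deepcopy(original)
```

Calling `copy.deepcopy(PlainBuffer(b"x"))` raises the same TypeError as in the report, while the subclass copies cleanly.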

Code Coverage: ulid/ulid.py

Aim for 100% code coverage for the ulid/ulid.py module.

Report: https://codeclimate.com/github/ahawker/ulid/coverage/59501e131e7b440001015f41

Assert on ValueError exception messages

There are many cases where a ValueError can be raised by any number of functions across most of the modules in this package.

I am relatively confident that all of the @pytest.raises(ValueError) calls are correct based on code coverage metrics. However, I was proven wrong today and had to address some of them with #61.

The scope of this task is to go through all tests that use @pytest.raises, capture the exception and perform an additional assertion of the exception message to confirm that we're hitting the exact code path expected.
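
In pytest this kind of check is written as `pytest.raises(ValueError, match=...)`, where `match` is a regular expression searched against the message. The sketch below shows the same idea with the standard library's `assertRaisesRegex` so that it runs without extra dependencies; the function `parse_checked` and its messages are hypothetical.

```python
import unittest


def parse_checked(value: str) -> str:
    """Hypothetical function with two distinct ValueError code paths."""
    if len(value) != 26:
        raise ValueError("Expects 26 characters; got {}".format(len(value)))
    if not value.isascii():
        raise ValueError("Expects ASCII characters only")
    return value


class ParseCheckedTests(unittest.TestCase):
    def test_wrong_length(self) -> None:
        # The pattern pins the test to the length check specifically.
        with self.assertRaisesRegex(ValueError, r"^Expects 26 characters"):
            parse_checked("too short")

    def test_non_ascii(self) -> None:
        # Correct length, so only the encoding check can raise here.
        with self.assertRaisesRegex(ValueError, r"ASCII characters only"):
            parse_checked("\u00e9" * 26)
```

Without the message pattern, both tests would pass even if the input accidentally tripped the other check first.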

API: from_timestamp should support ULID/Timestamp objects.

The from_timestamp function in ulid/api.py supports creating ULID instances with a timestamp from a given value. In addition to the currently supported types, it should also support Timestamp and ULID types.

When the given value is a Timestamp, a straight copy of all bytes should suffice.
When the given value is a ULID, a straight copy of the first 6 bytes should suffice.

Backport to Python 2.7?

Here are some initial thoughts; this is definitely an incomplete list of the changes necessary.

Switch hard coded bytes to be configurable to str.
Loss of int.to_bytes() and int.from_bytes().
Loss of datetime.timestamp()
Differences between memoryview and buffer?

Document how to get the next ULID

If you receive a ULID from some external source (e.g. a database), you might want to compute the next following ULID. This is useful for range-style queries where you are trying to retrieve every item after the aforementioned ULID. The library already does so internally to provide monotonic values, but it's not entirely clear how to get the monotonically "next" ULID, given another one.

Example:

prev = ulid.parse(some_str)  # From external source
next = ...  # ???

I was playing with ulid.create but I couldn't quite figure it out. It seems that what we want is to bump the randomness by one and, if that overflows, bump the timestamp by one.

A ULID.next method would be really nice.
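
One way to get that behaviour is to treat the 16 bytes as a single 128-bit big-endian integer and add one, which carries from the randomness into the timestamp automatically. The sketch below works on raw bytes; the function name `next_ulid_bytes` is hypothetical and this is not the package's monotonic implementation.

```python
def next_ulid_bytes(current: bytes) -> bytes:
    """Return the 16-byte value that sorts immediately after `current`."""
    if len(current) != 16:
        raise ValueError("Expects 16 bytes; got {}".format(len(current)))
    value = int.from_bytes(current, byteorder="big") + 1
    if value >= 1 << 128:
        raise ValueError("No next value: the input is the maximum ULID")
    return value.to_bytes(16, byteorder="big")


# Randomness of all 0xff overflows into the timestamp portion.
before = b"\x00" * 5 + b"\x01" + b"\xff" * 10
after = next_ulid_bytes(before)
```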

Properly handle invalid base32 characters

As of today, it is possible to input non-base32 characters, uU for example, into any of the api calls.

Doing this will cause the library to fail silently and perform an incorrect base32 decode on the string.

The API should provide a feedback mechanism that informs the caller of the bad input. The implementation of that feedback is still TBD (separate API call vs. exception vs. ??).

Considerations:

Performance of this computation for every decode call?
Double penalty for callers that have already made this guarantee?
Separate API call to validate? Are there use cases for this outside of the normal hot path?
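
As one possible shape for that feedback, the sketch below keeps the check in a separate function that reports the offending characters, so callers can decide whether to raise or skip the check on a hot path. The names are hypothetical, and the accepted set is the Crockford alphabet in upper and lower case only.

```python
from typing import List

CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
_VALID_CHARS = frozenset(CROCKFORD_ALPHABET + CROCKFORD_ALPHABET.lower())


def find_invalid_chars(value: str) -> List[str]:
    """Return the sorted, distinct characters that are not valid base32."""
    return sorted({char for char in value if char not in _VALID_CHARS})


def require_base32(value: str) -> str:
    """Raise ValueError naming the bad characters, else return the input."""
    invalid = find_invalid_chars(value)
    if invalid:
        raise ValueError(
            "Invalid base32 characters: {}".format(", ".join(map(repr, invalid)))
        )
    return value
```

The membership test is a single pass over the string, so the extra cost is linear in the input length.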

Use random.randbytes() instead of os.urandom()

[mmarkk@asus home]$ python -m timeit -s 'import random' 'random.randbytes(8)'
5000000 loops, best of 5: 93.9 nsec per loop
[mmarkk@asus home]$ python -m timeit -s 'import os' 'os.urandom(8)'
1000000 loops, best of 5: 248 nsec per loop

How to use ulid in a mysql database

I want to use a ULID as the primary key. How can I do that? I would like to know how to use ulid with the SQLAlchemy ORM; please give an example.

Non-Crockford's Base32 letters converted differently in Java or Python implementations

Hi Andrew,

first of all, thanks for the amazing library, we've been using it a lot!

I have a question about how we fix the conversion of ULIDs that do not follow Crockford's Base32 standard.

We are using Lua to generate some GUIDs (https://github.com/Tieske/ulid.lua) and, for some reason, from time to time we get letters outside Crockford's Base32 alphabet.
While trying to fix this on our side (we're not sure how this is happening, to be honest), we realised that the Java and Python implementations silently correct this issue in different ways:

Java

ULID.Value ulidValueFromString = ULID.parseULID("01BX73KC0TNH409RTFD1JXKmO0")
--> "01BX73KC0TNH409RTFD1JXKM00"

mO is silently converted into M0

Python

In [1]: import ulid

In [2]: u = ulid.from_str('01BX73KC0TNH409RTFD1JXKmO0')

In [3]: u
Out[3]: <ULID('01BX73KC0TNH409RTFD1JXKQZ0')>

In [4]: u.str
Out[4]: '01BX73KC0TNH409RTFD1JXKQZ0'

mO is silently converted into QZ

Shouldn't the Python library behave like the Java one, as per Crockford's Base32 spec: converting L and I to 1 and O to 0, and only upper-casing lowercase letters instead of changing them?

Thanks a lot in advance!

Eddie
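
For comparison, the normalisation that Crockford's Base32 describes for decoding (case-insensitive input, with I and L read as 1 and O read as 0) can be sketched in a few lines. The function name `normalize_crockford` is hypothetical and this is not what the package currently does.

```python
# Map the look-alike letters to the digits they are meant to be read as.
_LOOKALIKES = str.maketrans("ILO", "110")


def normalize_crockford(value: str) -> str:
    """Upper-case the input, then replace I and L with 1 and O with 0."""
    return value.upper().translate(_LOOKALIKES)


normalized = normalize_crockford("01BX73KC0TNH409RTFD1JXKmO0")
```

With the input from the report this yields the same string as the Java output shown above. The letter U is not remapped by this scheme and would still need to be rejected separately.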