
dict_hash's Introduction

Dict Hash


A simple Python tool to hash dictionaries using either the default hash function or SHA-256. The library comes with full support for hashing Pandas DataFrame objects, Numba objects and NumPy arrays, but you need to specify the corresponding requirements when installing the package, which avoids bloating the default installation.

Furthermore, the library supports objects that can be recursively hashed.

Since we have mostly seen this library used in the wild to build caching libraries and wrappers, we'd like to point you to our library, Cache decorator.

How do I install this package?

As usual, just download it using pip:

pip install dict_hash

Usage examples

The package offers two functions: sha256, which generates consistent SHA-256 hashes, and dict_hash, which generates hashes using Python's built-in hash function.

Session hash with dict_hash

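Obtain a session hash from the given dictionary. Because Python's built-in hash of strings is randomized per interpreter process (unless PYTHONHASHSEED is set), this hash should only be relied on within the same session.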

from dict_hash import dict_hash
from random_dict import random_dict
from random import randint

d = random_dict(randint(1, 10), randint(1, 10))
my_hash = dict_hash(d)
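The "session" behavior comes from Python itself: the built-in hash of str and bytes objects is salted per interpreter process unless PYTHONHASHSEED is fixed. The standalone sketch below does not use dict_hash at all; it only shows how the seed changes hash() across fresh interpreters. The helper name builtin_str_hash is hypothetical, and the assumption that dict_hash inherits this behavior is inferred from its "session hash" description rather than confirmed here.

```python
import os
import subprocess
import sys


def builtin_str_hash(seed: str) -> int:
    """Return hash('dict_hash') computed by a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('dict_hash'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


same_a = builtin_str_hash("1")
same_b = builtin_str_hash("1")
different = builtin_str_hash("2")
print(same_a == same_b)     # same seed: the hash is reproduced
print(same_a == different)  # different seed: the hash (almost certainly) changes
```

This is why the consistent sha256 function, rather than dict_hash, is the one to use for anything persisted across runs, such as an on-disk cache.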

Consistent hash with sha256

Obtain a consistent hash from the given dictionary, one that does not change between interpreter sessions.

from dict_hash import sha256
from random_dict import random_dict
from random import randint

d = random_dict(randint(1, 10), randint(1, 10))
my_hash = sha256(d)

Approximated hash

Both sha256 and dict_hash accept the use_approximation parameter, which switches to a more lightweight hashing procedure for the object types that support it. This procedure randomly subsamples the provided objects instead of hashing them in full.

Currently, we support this parameter for NumPy and Pandas objects.

import numpy as np
import pandas as pd
from dict_hash import sha256

# Even though the DataFrame is very big (a random stand-in for your own data)...
df = pd.DataFrame(np.random.randint(0, 100, size=(1_000_000, 10)))
# ...an approximated hash is still very fast!
my_hash = sha256(
    df,
    use_approximation=True
)
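To illustrate the general idea of approximate hashing, here is a minimal sketch that hashes a reproducible random subsample of a list plus its length. It is a hypothetical illustration using only the standard library, not dict_hash's actual implementation: the function name approximate_sha256 and the sample_size and seed parameters are assumptions made for this example.

```python
import hashlib
import json
import random


def approximate_sha256(values: list, sample_size: int = 1_000, seed: int = 42) -> str:
    """Hash a deterministic random subsample of `values` instead of every element.

    Hypothetical illustration only; not the dict_hash implementation.
    """
    if len(values) <= sample_size:
        sample = values
    else:
        # A fixed seed selects the same "random" positions on every call,
        # so the resulting hash is still reproducible.
        rng = random.Random(seed)
        indices = sorted(rng.sample(range(len(values)), sample_size))
        sample = [values[i] for i in indices]
    # Include the full length so inputs of different sizes are less likely to collide.
    payload = json.dumps({"length": len(values), "sample": sample})
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


big = list(range(1_000_000))
print(approximate_sha256(big) == approximate_sha256(big))
```

The trade-off is that two large inputs differing only at positions outside the sample produce the same hash, which is acceptable for caching only when such near-duplicates can safely share a result.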

Behavior on error

If the hashing function encounters an object that it cannot hash, by default it raises a NotHashableException. You can choose a different behavior by setting the behavior_on_error parameter to one of:

  • raise: Raise a NotHashableException (the default).
  • warn: Emit a NotHashableWarning and continue hashing, replacing the unhashable object with the string "Unhashable object".
  • ignore: Continue hashing silently, replacing the unhashable object with the string "Unhashable object".
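The three policies above can be sketched as a small recursive converter. This is a hypothetical re-creation for illustration: convert_value is not a dict_hash function, and NotHashableException and NotHashableWarning are redefined locally as stand-ins so the example runs without the library. Treating every non-primitive, non-container value as unhashable is also a simplification of this sketch.

```python
import warnings


class NotHashableException(Exception):
    """Local stand-in for dict_hash's exception of the same name."""


class NotHashableWarning(UserWarning):
    """Local stand-in for dict_hash's warning of the same name."""


PLACEHOLDER = "Unhashable object"


def convert_value(value, behavior_on_error: str = "raise"):
    """Convert `value` to a JSON-friendly form, applying the chosen error policy."""
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    if isinstance(value, dict):
        return {str(k): convert_value(v, behavior_on_error) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [convert_value(v, behavior_on_error) for v in value]
    # Anything else is treated as unhashable in this sketch.
    if behavior_on_error == "raise":
        raise NotHashableException(f"Cannot hash object of type {type(value).__name__}")
    if behavior_on_error == "warn":
        warnings.warn(f"Replacing unhashable {type(value).__name__}", NotHashableWarning)
        return PLACEHOLDER
    if behavior_on_error == "ignore":
        return PLACEHOLDER
    raise ValueError(f"Unknown behavior_on_error: {behavior_on_error!r}")


data = {"ok": 1, "bad": object()}
print(convert_value(data, behavior_on_error="ignore"))  # {'ok': 1, 'bad': 'Unhashable object'}
```

Note that with warn or ignore, every unhashable object collapses to the same placeholder string, so two dictionaries that differ only in their unhashable values end up with the same hash.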

Recursive objects

In Python it is possible to have recursive objects, such as a dictionary that contains itself. When you attempt to hash such an object, the hashing function raises a RecursionError once it exceeds the maximum depth set by the maximal_recursion parameter (100 by default). The RecursionError is generally handled as a NotHashableException, so you can set the behavior_on_error parameter to handle it as you see fit.
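A depth limit like maximal_recursion can be sketched by threading a depth counter through the recursive conversion. The function to_hashable below is hypothetical and only mirrors the described behavior; dict_hash's internal converter may track depth differently.

```python
def to_hashable(value, depth: int = 0, maximal_recursion: int = 100):
    """Recursively convert nested containers, giving up past `maximal_recursion` levels.

    Hypothetical sketch; dict_hash's internal converter may differ.
    """
    if depth > maximal_recursion:
        raise RecursionError(f"Exceeded maximal_recursion={maximal_recursion}")
    if isinstance(value, dict):
        return {str(k): to_hashable(v, depth + 1, maximal_recursion) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_hashable(v, depth + 1, maximal_recursion) for v in value]
    return value


recursive = {}
recursive["self"] = recursive  # a dictionary that contains itself

try:
    to_hashable(recursive)
except RecursionError as error:
    print("Caught:", error)  # Caught: Exceeded maximal_recursion=100
```

Checking an explicit counter fails fast and predictably, well before Python's own interpreter recursion limit would be reached.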

Hashable

When your dictionaries contain complex objects, you may need to make those objects' classes extend Hashable and implement its consistent_hash method.

Here is an example:

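from time import time

from dict_hash import Hashable, sha256

class MyHashable(Hashable):

    def __init__(self, a: int):
        self._a = a
        # The creation time differs on every instantiation, so it is
        # deliberately left out of the hash below.
        self._time = time()

    def consistent_hash(self) -> str:
        # Only the stable attribute `a` contributes to the hash.
        return sha256({
            "a": self._a
        })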

dict_hash's People

Contributors

lucacappelletti94, matthias1590, zommiommy


dict_hash's Issues

Python datetime & date fields support

I was trying to use the library, but ended up writing my own because it lacks out-of-the-box support for datetime fields.

Something like this might do:

from datetime import date, datetime

if isinstance(value, (date, datetime)):
    value_to_use = value.isoformat()

Failing this test case

Version:

dict-hash==1.1.14
python==3.7.9

Test Case

from dict_hash import sha256

d1 = {
    'tune_best_model': True,
    'target': 'def'
}

d2 = {
    'target': 'def',
    'tune_best_model': True
}

h1, h2 = sha256(d1), sha256(d2)

assert h1 == h2, f'{h1} != {h2}'

Output:

AssertionError: 0eb22ae4f46a2d9529d841fc905e1dc0cd07782ccda4a834d556c2eb5e2b8c58 != 80e801aadf2453bb0c49d0f2667f1e612092464cb9d7dca071dd7c3ffdf86475

Expected Output

The assertion passes, since d1 and d2 are equal dictionaries.

Possible Solution:

Change

return json.dumps(deflate(_convert(dictionary), leave_tuples=True))

to

return json.dumps(deflate(_convert(dictionary), leave_tuples=True), sort_keys=True)

However, this may not be a good idea, because sorting the keys would raise the time complexity to O(N log N).
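The effect of sort_keys on the issue's test case can be reproduced with only json and hashlib. The two helper names are hypothetical, and this sketch skips dict_hash's own deflate and _convert steps; it only shows why insertion order leaks into the hash and how canonical key ordering removes it.

```python
import hashlib
import json


def naive_sha256(d: dict) -> str:
    # Serializes keys in insertion order, so equal dicts can hash differently.
    return hashlib.sha256(json.dumps(d).encode("utf-8")).hexdigest()


def sorted_sha256(d: dict) -> str:
    # sort_keys=True imposes a canonical key order, at every nesting level, before hashing.
    return hashlib.sha256(json.dumps(d, sort_keys=True).encode("utf-8")).hexdigest()


d1 = {"tune_best_model": True, "target": "def"}
d2 = {"target": "def", "tune_best_model": True}

print(d1 == d2)                                # True
print(naive_sha256(d1) == naive_sha256(d2))    # False
print(sorted_sha256(d1) == sorted_sha256(d2))  # True
```

The sorting cost is O(k log k) per dictionary in its number of keys k. One caveat: with sort_keys=True, json.dumps raises a TypeError when a single dictionary mixes key types that cannot be compared, such as int and str.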
