Giter Club home page Giter Club logo

discohash's Introduction

πŸš€ DISCoHAsH - A Blend of Simplicity and Performance in Hashing

version npm downloads

Achieves 8-10GB/s, Successfully Passed SMHasher Tests


Introduction

DISCoHAsH, also known as BEBB4185 or just discohash, is a straightforward and high-performance hash function that combines simplicity with impressive performance. While there are several great high-speed hash functions available, DISCoHAsH stands out for its elementary implementation, not relying on architecture-specific optimizations, and still delivering commendable speed and quality. It’s not the fastest nor the best in quality, but its simplicity and portability make it an excellent choice for various applications.

DISCoHAsH has passed all the tests in SMHasher and clocks in at 8-10GB/s, depending on hardware. It's one of the fastest hash functions as per SUPERCOP ECRYPT benchmark.

This repository features a portable, serial implementation in C++. Moreover, an unofficial Go port is available here, and there's also a JavaScript version.


Quick Highlights

  • πŸš€ High-speed 64-bit hash function.
  • πŸŽ› Flexible output sizes ranging from 64 to 256 bits.
  • πŸ›  Remarkably simple and portable implementation.
  • πŸ§ͺ Successfully passes all SMHasher tests.
  • 🌐 Implemented in C++ with ports to JavaScript and Go.
  • πŸ”§ Includes a simple Command Line Interface (CLI) app for hashing files or standard input from the command line.
  • πŸ”„ Efficient mixing function utilizing multiplications by 64-bit primes, bit rotation, and XOR.

Elementary Cryptanalysis

Doing elementary cryptanalysis on DISCoHAsH with ChatGPT assisting.

In-depth Cryptanalysis

We extend an open invitation to the cryptographic community for cryptanalysis of DISCoHAsH, particularly focusing on the 128-bit variant. This is an excellent opportunity for cryptographers and security enthusiasts to evaluate the strength of Discohash.

If you are able to make a breakthrough or conduct significant analysis, please submit your results as Pull Requests, updating the README with a link to your analysis. Alternatively, you can contact Cris (me, the author) directly.

Important Note: To focus on the 128-bit variants, you must modify the source files in C/C++.


Implementation Details

Core Mixing Function

The 128-bit to 128-bit core mixing function is succinctly presented below:

  mix(const int A) {
      const int B = A + 1;

      ds[A] *= P;
      ds[A] = rot(ds[A], 23);
      ds[A] *= Q;

      ds[B] ^= ds[A];

      ds[B] *= P;
      ds[B] = rot(ds[B], 23);
      ds[B] *= Q;
  }

This function makes use of two 64-bit primes, P and Q, bit rotation, and XOR operations. The primes P and Q are defined as:

  • P = 0xFFFFFFFFFFFFFFFF - 58 (largest 64-bit prime)
  • Q = 13166748625691186689 (random prime)

Interestingly, P and Q have the interesting property that P - Q = P ^ Q (their difference equals their XOR).

Standard Version

Discohash's standard version employs an internal

state of 256 bits, with the mixing function operating across it. The standard digest size is 64 bits, but you can modify it to produce 128 bits or higher for cryptographic security.


How to Use

C++ Implementation:

The output has the same format as shasum -a 256 myfile.txt

cd src
./build.sh
./bin/discosum myfile.txt
> ... 256-bit hash value as hex ... myfile.txt

JavaScript Implementation:

Install using npm:

npm i --save bebb4185

Use in Node.JS:

import {discohash} from 'bebb4185'

Usage:

const hash = discohash(string_or_typed_array_key, optional_seed);

Note: The JavaScript implementation is ~500x slower than the C++ implementation due to the use of BigInts.


Golang Port

For Golang enthusiasts, an unofficial port to Go is available here.


Chain Zero Prefix Order

This hash when applied iteratively to its own hex output (a chain; with the initial link being a 32-byte input of all zeroes in binary not hex), generates a 2-byte zero prefix after 8,715 iterations, and a 3-byte zero prefix after 8,670,480 iterations:

8670467: 630f74d376ad3dc21aef6b2b6522ba385153fc9bad730e045153fc9bad730e04
8670468: 8d3b99e94dd7b62e62b3958909d7802e0e9ddfdee75087a50e9ddfdee75087a5
8670469: 6026eb0fce14e4835f14c0efa9e88363b9ab4053a1ee8b6bb9ab4053a1ee8b6b
8670470: f83dd7560d0b6271e7bd174582ca5d718801aefeeebf46f48801aefeeebf46f4
8670471: aef29abc0eda55952ee2183be8159551a7f4aca19bad85e2a7f4aca19bad85e2
8670472: e6160e8f87accfb9d515ec6f7790ac397ba822756d23b8d07ba822756d23b8d0
8670473: 99803a19074ebd91557fb596f820398d709bbb32784d3896709bbb32784d3896
8670474: 199d695ee4f4ecce0459161c1ac2cc2e9482b241e7fb8ec29482b241e7fb8ec2
8670475: 82b400dd268da3ee6233fcdd160d1bee5defcdb070c223cc5defcdb070c223cc
8670476: 67c5760121db846b07c461fe99d87363d6ec497d53aee544d6ec497d53aee544
8670477: 482ea819bb8b7996b7a91615bb40715092aa897234e15d2e92aa897234e15d2e
8670478: 10b44e30f2eea61fccb431cef28e981f7dca7cc0b91748e57dca7cc0b91748e5
8670479: 0421da4bddc42996e3d821231a33c16073ba1d3140a2b05973ba1d3140a2b059
8670480: 0000009d947ee793ffffff5114791753e4f60afe9a96cf43e4f60afe9a96cf43
Found hash with first three bytes 0x00 after 8670480 iterations.

These values are inline with when on average you would expect to land on some particular criterion (~ 1/2 * 2**b) where b is the bit length of the criterion, from a random sequence. So for two bytes (b = 16) you'd expect it around 32768 (with 8,715 being roughly 0.27x) and for 3 bytes (b = 24) you'd expect it around 8388608 (with 8670480 being roughly 1.03x).


For detailed information and updates, please visit the Discohash GitHub repository. Your contributions and feedback are always welcome!

discohash's People

Contributors

labarilem avatar o0101 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

discohash's Issues

Incosistent buffer reading

Hello,
When I do discohash /dev/random discohash doesn't read /dev/random data as a non-stoppable data stream, rather it treats the file as empty. This behavior can be verified by hashing an empty file or /dev/null. Generally, this behavior is certainly a bug, at least when I compare this with sha utils.

Multi-core Hash calculation

Hi,

First of all let me thank you for this amazing library, which is currently powering my own project https://github.com/julianxhokaxhiu/FFNx - This is a 32-bit modding platform loaded as DLL ( w/process injection ).

Through your library we were able to implement animated textures in a game which was never supposed to have them upscaled in the first place. Although as the game itself is developed around "old game logics" ( single core / single thread ), the issue we're currently facing is that when we use this library to hash some data, it can't really scale, and because of this we start to lose FPS.

But in the team, we were having some new ideas which might benefit your code as well, not only our case: Multi-core hashing.

How would this be implemented in practical ways?

  1. Spawn a new thread that will be ready to listen for new hashes to calculate
  2. On a new hash request the library will check which available cores we have, and pick one that is currently free
  3. Throw the hash to that core and return back the result in sync

I know this is not really "async" per-se, but it would help in making those calculations faster as we can throw them to another core which the CPU might squeeze better, instead of parking the current core data to make room for the new calculations it has to do.

What do you think about this implementation? Does it make sense? How complex would it be to have it implemented on the core library code?

Thank you in advance,
Julian

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.