hasher

hasher is a program that hashes files with up to 48 hashing algorithms simultaneously, across multiple threads, while reading each file only once. In almost all cases the limiting factor in performance will therefore be I/O, and you won't waste time re-reading the same data when you need multiple hashing algorithms.

Hashes can be written to three destinations: stdout (with -v or higher), SQLite (with --sql-out), and JSON (with --json-out).

Building

hasher requires a fairly modern version of Rust, preferably the latest stable release; install it using the official rustup instructions. Prebuilt releases are currently not provided because of the reliance on the config file; I plan to address this in a future release.
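
If Rust is not already installed, the standard rustup installer is the usual route (this is generic Rust setup, nothing specific to hasher):

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh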

To build, run the following at the root of the repository:

cargo build -r

Go ahead and get yourself a drink while this is running; it will take a while. Once it completes, your binary will be located at target/release/hasher and can be moved wherever you desire, or you can leave it in place and use cargo run -r.

musl libc Builds

On Linux systems you may run into glibc version issues if, for example, you build on an Arch Linux system and then run on a Debian Stable system. The easiest way to avoid this is to build on the same system you will run on, but that isn't always possible or desirable, so there is another option: static compilation with libc built into the binary. This can be done with the following steps:

sudo apt install musl-tools  # Or equivalent package for musl-gcc on your system

rustup target add x86_64-unknown-linux-musl
cargo build -r --target=x86_64-unknown-linux-musl

This will create a release build under target/x86_64-unknown-linux-musl/release/ (note the target-specific subdirectory) that does not depend on glibc.
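
You can verify the result is statically linked with the standard file utility:

file target/x86_64-unknown-linux-musl/release/hasher  # should report "statically linked"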

Config

hasher relies on a config file to direct its operation. config.toml is an example of a valid config file, and ./config.toml is where hasher will look unless another config path is specified with -c.

The database section is currently required, even if you only use JSON output; its entries are only used when --sql-out is specified.

The hashes section lists every hash that can be calculated, with crc32, md5, sha1, and sha256 enabled by default. You can remove lines to shorten the list; if a hash isn't in the list, it is disabled. If you are using --json-out then sha256 is required, but otherwise you can pick and choose the hashes you want as long as you enable at least one. A rough sketch of this shape is shown below.
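
As a sketch of the layout described above (the key names here are assumptions for illustration; the shipped config.toml is authoritative):

[database]
db_string = "./myhashes.db"  # assumed key name; the example run below writes to myhashes.db
table_name = "hashes"        # assumed key name; default table per the SQLite section

[hashes]
crc32 = true
md5 = true
sha1 = true
sha256 = true  # required when using --json-out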

Usage

General

$ ./hasher --help
Multithreaded parallel hashing utility

Usage: hasher [OPTIONS]

Options:
  -i, --input-path <INPUT_PATH>
          The path to hash the files inside [default: .]
  -v, --verbose...
          Increase logging verbosity
  -q, --quiet...
          Decrease logging verbosity
      --json-out
          Write hashes to JSON
      --sql-out
          Write hashes to the SQLite database in the config
      --use-wal
          Enable WAL mode in the SQLite database while running
  -j, --json-output-path <JSON_OUTPUT_PATH>
          The path to output {path}/{sha256 of file}.json [default: ./hashes]
  -c, --config-file <CONFIG_FILE>
          The location of the config file [default: ./config.toml]
      --stdin
          Reads file contents from stdin instead of any paths. --input-path becomes the path given in the output
      --max-depth <MAX_DEPTH>
          Maximum number of subdirectories to descend when recursing directories [default: 20]
      --skip-files <SKIP_FILES>
          Number of files (inclusive) to skip before beginning to hash a directory. Meant for resuming interrupted hashing runs, don't use this normally [default: 0]
      --no-follow-symlinks
          DON'T follow symlinks. Infinite loops are possible if this is off and there are bad symlinks
      --breadth-first
          Hash directories breadth first instead of depth first
      --dry-run
          Does not write hashes anywhere but stdout. Useful for benchmarking and if your hands are cold
  -h, --help
          Print help
  -V, --version
          Print version

Example Usage

Say you want to hash all of the files in dev/ and view the output on stdout while also writing to a SQLite database, with performance accelerated by WAL (write-ahead logging). Run this in the root of the repository after building:

./hasher -v --sql-out --use-wal -i dev/

Assuming the default config.toml is in the current working directory, this will hash everything to the database myhashes.db in the current working directory.
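
You can also hash data piped over stdin; in that mode -i supplies the path recorded in the output. A sketch, where dev/somefile is a placeholder:

cat dev/somefile | ./hasher -v --stdin -i dev/somefile --sql-out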

Config File

In the root of the repository there is a file named config.toml, and the values within should be modified to suit your needs. Do not remove any lines or change value types (e.g. a boolean to a string), otherwise the program will not run.

The config file will by default be looked for at ./config.toml, relative to the current working directory. If you wish to specify a different location, use the --config-file <path> option.
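
For example, to run with a config stored elsewhere (the path here is a placeholder):

./hasher --config-file /etc/hasher/config.toml -v --sql-out -i dev/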

SQLite Database (--sql-out)

The database and its table (named hashes by default) will be created automatically, whether or not the database file already exists.

For more information on the schema of this database, see sqlite.md.
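
After a run you can inspect the results with the sqlite3 command-line tool, assuming the default database and table names from the example above:

sqlite3 myhashes.db 'SELECT COUNT(*) FROM hashes;'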

JSON Out (--json-out)

This option writes one JSON file per hashed file into the given directory. This is not recommended for large runs: with many small output files, filesystem sector-size overhead wastes roughly 3x the size of the JSON itself, so for bulk runs it is highly suggested to use the database instead.

Each file is named with the sha256 hash of the file it describes, so don't disable that hash. The contents of the JSON files are very simple and use the same field names as the SQLite database's schema.
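
For example, to write JSON records for everything under dev/ into the default output directory:

./hasher -v --json-out -j ./hashes -i dev/
ls ./hashes  # one <sha256>.json file per hashed file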

Hashes

Implemented

  • CRC32
  • MD2
  • MD4
  • MD5
  • SHA-1
  • SHA-2
    • SHA-224 through SHA-512
  • SHA-3
    • SHA3-224 through SHA3-512
  • BLAKE2
    • Blake2s256, Blake2b512
  • BelT
  • Whirlpool
  • Tiger
  • Streebog (GOST R 34.11-2012)
  • RIPEMD
  • FSB
  • SM3
  • GOST R 34.11-94
  • Grøstl (Groestl)
  • SHABAL

Skipped

The following hashes were not implemented.

XOF hashes (no static output size so they require special handling to use):

  • SHA-3
    • SHAKE128/SHAKE256
  • BLAKE3
  • KangarooTwelve

Hashes that don't implement the digest traits:

  • The rest of the CRC variants
    • Adler CRC32 (aka Adler32)
      • Despite the name, Adler-32 is a distinct checksum algorithm, so its values will generally differ from CRC32's.
    • CRC16
    • CRC64
    • CRC128

Notes

Sponsored by 📼 🚙
