ddgst, dd's digest utility

ddgst is a simple cross-platform hashing utility (Windows, macOS, Linux, BSDs) with more features than the hashing utilities built into those operating systems.

Feature Comparison

Feature | ddgst | GNU coreutils | uutils/coreutils | OpenSSL [1]
Check support | ✔️ | ✔️ [2] | ✔️ | ✔️
GNU style hashes | ✔️ | ✔️ | ✔️ [3] | ✔️
BSD style hashes | ✔️ | ✔️ | ✔️ | ✔️
SRI style hashes | ✔️ | [4] | [4] | [4]

Algorithm Comparison

Checksum/Hash ddgst GNU coreutils uutils/coreutils OpenSSL [1]
CRC-32 ✔️
CRC-64-ISO ✔️
CRC-64-ECMA ✔️
MurmurHash3 ✔️
MD5 ✔️ ✔️ ✔️ ✔️
RIPEMD-160 ✔️ ✔️ ✔️
SHA-1 ✔️ ✔️ ✔️ ✔️
SHA-2 ✔️ ✔️ ✔️ ✔️
SHA-3/SHAKE ✔️ ✔️ ✔️
BLAKE2b ✔️ ✔️ ✔️ [5] ✔️
BLAKE2s ✔️ ✔️
BLAKE3 [6] ✔️ [5]

Algorithm Security

Checksum/Hash Type Secure
CRC-32 Checksum
CRC-64-ISO Checksum
CRC-64-ECMA Checksum
Murmurhash-32 Hash
Murmurhash-128-32 Hash
Murmurhash-128-64 Hash
MD5 Hash
RIPEMD-160 Hash ✔️
SHA-1 Hash
SHA-2 Hash ✔️
SHA-3/SHAKE Hash ✔️
BLAKE2b Hash ✔️
BLAKE2s Hash ✔️

Usage

Usage:

  • ddgst [options...] [file|-]
  • ddgst [options...] {--check|--autocheck} list
  • ddgst [options...] --against=HASH files...
  • ddgst [options...] --compare files...
  • ddgst [options...] --args text...
  • ddgst [options...] --benchmark

With no arguments, the help page is shown.

For a list of options available, use the --help argument.

For a list of supported checksums and hashes, use the --hashes switch.

Hashing files

The default mode is hashing files and directories using the GNU style.

Styles available:

Style | Argument | Example
GNU | (default) | 3853e2a78a247145b4aa16667736f6de LICENSE
BSD | --tag | MD5(LICENSE)= 3853e2a78a247145b4aa16667736f6de
SRI | --sri | md5-HSZ86zqNj3XxvjAR7ky/Uw==
Plain | --plain | 3853e2a78a247145b4aa16667736f6de
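To make the four styles concrete, here is a minimal Python sketch that renders a raw digest in each of them. The function name `format_digest` and its parameters are hypothetical and are not part of ddgst. The layouts follow the examples in the table, except that the GNU line uses the two-character separator that GNU coreutils emits.

```python
import base64
import hashlib


def format_digest(tag: str, digest: bytes, filename: str, style: str = "gnu") -> str:
    """Render a raw digest in one of the four output styles listed above."""
    hexed = digest.hex()
    if style == "gnu":
        # GNU coreutils separates hash and name with two characters.
        return f"{hexed}  {filename}"
    if style == "bsd":
        # Layout copied from the --tag example in the table.
        return f"{tag}({filename})= {hexed}"
    if style == "sri":
        # Lowercase algorithm name, a dash, then the base64 digest.
        return f"{tag.lower()}-{base64.b64encode(digest).decode('ascii')}"
    if style == "plain":
        return hexed
    raise ValueError(f"unsupported style: {style}")
```

For example, `format_digest("MD5", hashlib.md5(data).digest(), "LICENSE", "sri")` yields a string of the form `md5-<base64>`.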

Check list of hashes

Check against file list (supports --tag):

$ ddgst --sha256 -c list
file1: OK
file2: FAILED
2 total: 1 mismatch, 0 not read

Using autodetection:

$ ddgst --autocheck list.sha256
file: OK
file2: FAILED
2 total: 1 mismatch, 0 not read
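As an illustration of what a list check does, the following Python sketch verifies GNU-style `HASH  FILE` entries and returns the same counters shown in the summary line. It is not ddgst's implementation: the function name `check_list` is hypothetical, file names are resolved relative to the list's directory (an assumption), and files are read whole for brevity.

```python
import hashlib
from pathlib import Path


def check_list(list_path: str, algorithm: str = "sha256") -> dict:
    """Verify every 'HASH  FILE' entry in a GNU-style list and return counters."""
    base = Path(list_path).parent  # assumption: names are relative to the list
    stats = {"total": 0, "mismatch": 0, "not_read": 0}
    for line in Path(list_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, _, name = line.partition(" ")
        name = name.lstrip(" ")
        stats["total"] += 1
        try:
            data = (base / name).read_bytes()
        except OSError:
            stats["not_read"] += 1
            print(f"{name}: FAILED to read")
            continue
        ok = hashlib.new(algorithm, data).hexdigest() == expected.lower()
        if not ok:
            stats["mismatch"] += 1
        print(f"{name}: {'OK' if ok else 'FAILED'}")
    return stats
```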

Check files against a hash digest

Supports hex and base64 digests.

$ ddgst --sha1 LICENSE -A f6067df486cbdbb0aac026b799b26261c92734a3
LICENSE: OK
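One way to accept both hex and base64 digests is to use the expected digest size to decide how to decode the text. The Python sketch below shows that idea; `decode_digest` and `matches` are hypothetical names, and ddgst may detect the encoding differently.

```python
import base64
import binascii
import hashlib


def decode_digest(text: str, size: int) -> bytes:
    """Decode a digest given as hex or base64; 'size' is the digest length in bytes."""
    text = text.strip()
    if len(text) == size * 2:
        try:
            return bytes.fromhex(text)  # accepts upper and lower case
        except ValueError:
            pass  # not hex after all, fall through to base64
    try:
        raw = base64.b64decode(text, validate=True)
    except binascii.Error as exc:
        raise ValueError("digest is neither hex nor base64") from exc
    if len(raw) != size:
        raise ValueError("digest has the wrong length for this algorithm")
    return raw


def matches(data: bytes, expected: str, algorithm: str = "sha1") -> bool:
    """Return True when the hash of 'data' equals the expected digest."""
    h = hashlib.new(algorithm, data)
    return h.digest() == decode_digest(expected, h.digest_size)
```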

Compare files against each other

$ ddgst --sha512 --compare LICENSE README.md dub.sdl 
Files 'LICENSE' and 'README.md' are different
Files 'README.md' and 'dub.sdl' are different
Files 'LICENSE' and 'dub.sdl' are different
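A pairwise comparison like the one above can be sketched in Python by hashing each file once and then comparing every pair of digests. The names `file_digest` and `different_pairs` are hypothetical, and the order of the reported pairs may differ from ddgst's output.

```python
import hashlib
from itertools import combinations


def file_digest(path: str, algorithm: str) -> bytes:
    """Hash a file in 64 KiB chunks so large files are not loaded at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.digest()


def different_pairs(paths: list, algorithm: str = "sha512") -> list:
    """Return every pair of paths whose digests differ."""
    if len(paths) < 2:
        raise ValueError("two or more files are required to compare")
    digests = {p: file_digest(p, algorithm) for p in paths}
    return [(a, b) for a, b in combinations(paths, 2) if digests[a] != digests[b]]
```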

Hash text entries

$ ddgst --crc32 --args "Argument with spaces" Arguments without spaces
f17cf59f  "Argument with spacesArgumentswithoutspaces"
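The output above suggests that the text arguments are concatenated without separators before hashing. A minimal Python sketch of that behaviour, using `zlib.crc32` and a hypothetical helper name, could look like this. UTF-8 encoding is an assumption, and the byte order in which ddgst prints CRC-32 values may differ from this sketch.

```python
import zlib


def crc32_of_args(args: list) -> str:
    """Join the arguments with no separator and return the CRC-32 as 8 hex digits."""
    text = "".join(args)
    return f"{zlib.crc32(text.encode('utf-8')):08x}"
```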

Digest parameters

Some hashes may take optional parameters.

  • Murmurhash3
    • The --seed option takes a literal argument used to seed the hash.
    • The seed must be a 32-bit integer in decimal format.
  • BLAKE2
    • The --key option takes a binary file for keying the hash.
    • BLAKE2s: Key can be up to 64 Bytes in size.
    • BLAKE2b: Key can be up to 128 Bytes in size.
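For readers unfamiliar with keyed hashing, here is a short Python sketch using the standard `hashlib` BLAKE2 functions. The helper name `keyed_blake2` is hypothetical. Note that `hashlib` enforces its own limits (`MAX_KEY_SIZE` is 64 bytes for `blake2b` and 32 bytes for `blake2s`), which may not match the sizes ddgst accepts.

```python
import hashlib


def keyed_blake2(data: bytes, key: bytes, variant: str = "b") -> str:
    """Hash 'data' with BLAKE2b or BLAKE2s using 'key' (e.g. read from a binary file)."""
    ctor = hashlib.blake2b if variant == "b" else hashlib.blake2s
    if len(key) > ctor.MAX_KEY_SIZE:
        raise ValueError(f"key is longer than {ctor.MAX_KEY_SIZE} bytes")
    return ctor(data, key=key).hexdigest()
```

The same input hashed with and without a key produces different digests, which is what makes the keyed form usable as a message authentication code.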

File Pattern Globbing (* vs. '*')

This utility supports file globbing out of the box using std.file.dirEntries.

This is useful on Windows, but most UNIX-like shells expand glob patterns themselves, and their behavior may differ from that of the dirEntries function.

To force the use of the embedded globbing mechanism, quote or escape the pattern, for example '*' or \*. To disable it, use the -- parameter.

The globbing pattern is further explained on dlang.org.

The default parameters used in dirEntries are:

  • SpanMode: shallow (same-level directory);
  • And followSymlink: true (follows soft symbolic links).

NOTE: The embedded globbing system includes hidden files.

EXAMPLE: A pattern such as src/*.{d,dd}:

  • Matches src/example.d, src/.dd, and src/file.dd;
  • But doesn't match example.d, src/.ddd, and src/.e;
  • Basically all files ending with .d and .dd in the src directory, following symlinks.
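The matching rules in this example can be approximated in Python. The sketch below is not the dirEntries implementation: it expands brace alternatives by hand (Python's `fnmatch` has no brace support) and matches only the file name within the pattern's directory to mimic the shallow span mode. Both function names are hypothetical.

```python
import fnmatch
import posixpath
import re


def expand_braces(pattern: str) -> list:
    """Expand {a,b} alternatives into a list of plain wildcard patterns."""
    m = re.search(r"\{([^{}]*)\}", pattern)
    if not m:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    expanded = []
    for alternative in m.group(1).split(","):
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def shallow_match(path: str, pattern: str) -> bool:
    """Match 'path' against 'pattern' without descending into subdirectories."""
    pattern_dir, pattern_name = posixpath.split(pattern)
    file_dir, file_name = posixpath.split(path)
    if file_dir != pattern_dir:
        return False  # shallow: the file must sit directly in the pattern's directory
    # fnmatch does not treat a leading dot specially, so hidden files are included.
    return any(fnmatch.fnmatchcase(file_name, p) for p in expand_braces(pattern_name))
```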

Errors

Code Description
1 CLI error
2 No hashes selected or autocheck not used
3 Internal error
4 Failed to set the hash key
5 Failed to set the hash seed
6 Missing entries
9 Could not hash text argument
10 List is empty
11 Unsupported style format
15 Two or more files are required to compare

Compiling

Compiling requires a recent D compiler and DUB.

To compile a debug build with the default compiler:

dub build

Recommended release build with the LDC compiler:

dub build -b release-nobounds --compiler=ldc2

To compile with GDC, you'll also need gdmd installed.

Footnotes

  1. See the openssl dgst command.

  2. All but cksum and sum.

  3. * prepended to filename.

  4. Possible to do with a chain of commands, but good luck remembering them.

  5. As of 0.0.13.

  6. While the official BLAKE3 team has a b3sum, GNU does not.

ddgst's People

Contributors

dd86k

ddgst's Issues

--duplicates: Find duplicate files

I'm aware there are similar tools that do this, but I'd still like this for this utility.

Usage: --duplicates FOLDER (May assume "." by default?)

When a duplicate is found, both full paths are printed:

* Duplicate found
1: C:\abc\file1.bin
2: C:\Users\dd\file2.bin
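A possible shape for such a feature, sketched in Python rather than D, is to walk the folder, hash every file, and group paths by digest. The function name `find_duplicates` and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(folder: str = ".") -> list:
    """Return groups of file paths under 'folder' that share the same SHA-256 digest."""
    by_digest = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            path = os.path.join(root, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            by_digest[h.digest()].append(path)
    # Only digests seen more than once describe duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

A cheaper variant would group files by size first and hash only the files whose sizes collide.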

Making BSD/GNU tag names mainlined

Currently the --tag switch outputs OpenSSL compatible tag names like SHA2-512, but GNU coreutils and BSD utilities will use SHA512.

At least in v2.0.1-3-g93c8ed3, both styles are supported in check-file mode, but otherwise only the OpenSSL style is used on output.

Though now the question is, do I only add --tag2 for BSD style or make --tag for BSD style and move OpenSSL to --tag2...

Trimmed filenames on check

error: 'mobian-installer-pinephone-phosh-20210516.img.g': Cannot open file `mobian-installer-pinephone-phosh-20210516.img.g' in mode `rb' (No such file or directory)

User-Supplied Key

This one should be easy. Depending on the instantiated object type (converting the instance to BLAKE2s256Digest, BLAKE2b512Digest, or the future BLAKE3_256Digest depending on the current Hash type), a user-supplied key can be passed using --key (similar to b2sum) to the DDH instance. All definitions have a key function, and a switch-case-ish function will do fine.

Consider --parallel

With std.parallelism, this might be a very interesting feature, but it should remain optional, since I may also use std.parallelism (or something else) for BLAKE2sp/BLAKE2bp, which have yet to be implemented (so combining b2 and --parallel might slow everything down).
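The per-file parallelism considered here can be illustrated with a Python thread pool, as a rough analogue of what std.parallelism would provide in D. The names `hash_file` and `hash_parallel` and the default of four workers are assumptions. Threads help in this sketch because CPython's hashlib releases the global interpreter lock while hashing large buffers.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_file(path: str, algorithm: str = "sha256") -> str:
    """Hash one file in 64 KiB chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_parallel(paths, algorithm: str = "sha256", workers: int = 4) -> dict:
    """Hash several files concurrently, one file per worker task."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(lambda p: hash_file(p, algorithm), paths)
        return dict(zip(paths, digests))
```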

Comparisons of checksums do not work

dd@craptop:/media/dd/DATA/USER/DESKTOP/GAMES/ROMs/PS2$ ddgst --crc32 -A 21cf5560 Champions\ of\ Norrath\ \(USA\).iso 
warning: Entry 'Champions of Norrath (USA).iso' is different
dd@craptop:/media/dd/DATA/USER/DESKTOP/GAMES/ROMs/PS2$ ddgst --crc32 Champions\ of\ Norrath\ \(USA\).iso 
21cf5560  Champions of Norrath (USA).iso

Additional aliases for the same hash type

Currently, to refer to ripemd-160, the option for it is --ripemd160. While it is a little long, OpenSSL does have this set to --rmd160.

Another example is b2sum, which can be made into an --b2b512 alias instead of --blake2b512.

Proposition:

  • Add --rmd160 for --ripemd160.
  • Add --b2s256 for --blake2s256.
  • Add --b2b512 for --blake2b512.
  • Add --mm3a for --murmur3a, maybe.
  • Add --mm3c for --murmur3c, maybe.
  • Add --mm3f for --murmur3f, maybe.

Murmurhash3 empty sums are not printed

Issue

The Murmurhash3 finish() function returns a variable-length ubyte[] digest, which simply prints as an empty string if the digest length is 0. Only std.digest.murmurhash does this.

Result: "" on empty files with default seed.

Expected: "00000000" on empty files with default seed.

Solution

Either I copy whatever is in the result to a static buffer and then call toHexString, or I do custom formatting. This will affect Ddh.toHex and Ddh.toBase64.
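The static-buffer idea can be shown in a few lines of Python (the actual fix would be written in D). The helper name `to_fixed_hex` is hypothetical: it copies a possibly empty digest into a zero-filled buffer of the expected size before formatting.

```python
def to_fixed_hex(result: bytes, size: int) -> str:
    """Copy a variable-length digest into a zero-filled buffer of 'size' bytes, then hex it."""
    if len(result) > size:
        raise ValueError("digest is longer than the buffer")
    buffer = bytearray(size)          # zero-filled, fixed-size buffer
    buffer[:len(result)] = result     # copy whatever the digest returned
    return bytes(buffer).hex()
```

With a 4-byte buffer, an empty digest is rendered as `00000000` instead of an empty string.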

Improve CLI actions

I've grown tired of ddh md5 file. There's nothing wrong with it: as with git, md5 would indicate an action. But md5 isn't an action; ddh is the utility performing the hashing action. This even confuses me at times.

Beyond confusion, the other factor is flexibility. Currently, this allows only one selected hash per invocation (though optional) and forces the selected hash to be at the start of the command line, which makes invocations like ddh file md5 even more confusing.

That's why the obvious syntax, ddh --md5 file, seems like a more interesting option (for 2.0?) to me, though losing the existing syntax may trouble some (or me, who knows).

Pros:

  • Flexibility - May allow for multiple hashes to be used per invocation.
  • Clarity - Clear notation that hash is a type of option, not action.
  • Parsing - CLI will be free of forcing the hash type as the first argument.

Cons:

  • Losing syntax - Losing the current syntax may be a negative for some.

Optimal block read size

Under a few operating systems, there are various ways to obtain the optimal block size for file read and write operations.

For Posix systems using fstat, it is possible to obtain the optimal block size for file operations. The stat field in question is st_blksize.

For Windows, a StackOverflow answer suggests calling GetDiskFreeSpace and multiplying the lpBytesPerSector and lpSectorsPerCluster values it returns. This should be the filesystem's cluster size.

Typical block sizes should be around 64 KiB for Linux systems and 4 KiB for Windows.

Of course, benchmarks should be made available.
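On POSIX systems, the st_blksize lookup can be tried out from Python, which exposes the field through `os.stat`. The sketch below is an illustration only: the function names and the 64 KiB fallback (used where the field is missing, such as on Windows) are assumptions.

```python
import hashlib
import os


def optimal_block_size(path: str, fallback: int = 64 * 1024) -> int:
    """Return st_blksize for 'path', or 'fallback' when the field is unavailable."""
    size = getattr(os.stat(path), "st_blksize", 0)
    return size if size > 0 else fallback


def hash_with_block_size(path: str, algorithm: str = "sha256") -> str:
    """Hash a file using the block size reported by the filesystem."""
    block = optimal_block_size(path)
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while chunk := f.read(block):
            h.update(chunk)
    return h.hexdigest()
```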

List: Invalid column order

$ ddh list
Alias         Name          Tag
CRC-32        crc32         CRC32
CRC-64-ISO    crc64iso      CRC64ISO
CRC-64-ECMA   crc64ecma     CRC64ECMA
MD5-128       md5           MD5
RIPEMD-160    ripemd160     RIPEMD160
SHA-1-160     sha1          SHA1
SHA-2-224     sha224        SHA224
SHA-2-256     sha256        SHA256
SHA-2-384     sha384        SHA384
SHA-2-512     sha512        SHA512
SHA-3-224     sha3-224      SHA3_224
SHA-3-256     sha3-256      SHA3_256
SHA-3-384     sha3-384      SHA3_384
SHA-3-512     sha3-512      SHA3_512
SHAKE-128     shake128      SHAKE128
SHAKE-256     shake256      SHAKE256

Should be Name, Alias, then Tag.

Benchmark mode

I believe a benchmark mode would be highly beneficial for general knowledge.

Testing the speeds of various hash/checksum implementations on different processors.

Proposal:

  • CLI opt of --benchmark
  • Use OOP API to avoid inflating binary size and avoid template functions
    • OOP and Template have the same benchmark results, so I'm not worried
    • Don't forget scoped allocation
  • Go through entire hash list
    • At least this tests the most common configurations

Threaded read-hash structure

  • Waiting on BLAKE2sp and BLAKE2bp variants
  • Waiting on BLAKE3
  • Others: Simple message queue

(read+hashing at the same time, messaging system?)
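The "read and hash at the same time" idea with a simple message queue can be sketched in Python as a reader thread feeding chunks to a hashing loop through a bounded queue. The function name, block size, and queue depth are assumptions, and a D implementation would use different primitives.

```python
import hashlib
import queue
import threading


def hash_threaded(path: str, algorithm: str = "sha256",
                  block: int = 1 << 16, depth: int = 4) -> str:
    """Read a file on one thread while hashing its chunks on another."""
    chunks = queue.Queue(maxsize=depth)  # bounded so the reader cannot run far ahead

    def reader():
        try:
            with open(path, "rb") as f:
                while chunk := f.read(block):
                    chunks.put(chunk)
        finally:
            chunks.put(None)  # sentinel: always tell the hasher to stop

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    h = hashlib.new(algorithm)
    while (chunk := chunks.get()) is not None:
        h.update(chunk)
    thread.join()
    return h.hexdigest()
```

In this sketch a read error ends the stream early without being reported. A real implementation would pass the error through the queue as well.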

Check list items prefixed with asterisk (`*`) cannot be read

Some check files have an asterisk character (*) prefixed to the file entries. This seems to be an oddity of some utilities, which add the character to designate that the file is binary; it can safely be removed (it does not serve as a glob).

Another issue is the comment lines. They are meant solely to be read by humans, and I cannot do anything about them.

simplewall-3.6.7.sha256:

1e2078cd7b9934534787f04b3e4611832ddeec0853f1d50b6b454cd5dd770587 *simplewall-3.6.7-bin.zip
864418c6a03719bf98715fd6a7a91013e55de79951dada12e918481913d27b22 *simplewall-3.6.7-setup.exe
#32-bit
ab150e6555b6fdea99d10b2ef9e6d75351c170b9efee61cfbb48d65128b6618a *simplewall.exe
#64-bit
7b320a968557541d0bd6ad06aebedddb004a6f74cb0080f62ac072ff8dd73afd *simplewall.exe
#arm64
73183255fd65caac0c8579a577430ae8012146f70f643ad67cfe50bcc814ab53 *simplewall.exe
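A tolerant line parser for files like this one could look like the following Python sketch. It assumes that lines starting with `#` are comments to skip and that the character right after the first space is an optional mode marker (a space or `*`). The function name `parse_check_line` is hypothetical.

```python
def parse_check_line(line: str):
    """Return (digest, filename) for a check-list line, or None for blanks and comments."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.lstrip().startswith("#"):
        return None  # blank line or human-readable comment
    digest, sep, rest = line.partition(" ")
    if not sep or not rest:
        raise ValueError(f"malformed entry: {line!r}")
    # After the first space, a second space (text) or '*' (binary) marks the mode.
    if rest[0] in (" ", "*"):
        rest = rest[1:]
    return digest.lower(), rest
```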

-A/--against is broken

$ ddgst --sha256 -A 78a2438346cfe69a1779b0ac3fc05499f8dc7202959d597dd724a07475bc6930 linuxmint-22-mate-64bit.iso
warning: Entry 'linuxmint-22-mate-64bit.iso' is different

$ ddgst --sha256 linuxmint-22-mate-64bit.iso
78a2438346cfe69a1779b0ac3fc05499f8dc7202959d597dd724a07475bc6930  linuxmint-22-mate-64bit.iso

Consider supporting SipHash

A few people pitched the idea of supporting SipHash.

As interesting as SipHash is, it is not designed as a general-purpose hashing function, since its goal is to be used as a pseudorandom/MAC function:

As a secure pseudorandom function (a.k.a. keyed hash function), SipHash can also be used as a secure message authentication code (MAC). But SipHash is not a hash in the sense of general-purpose key-less hash function such as BLAKE3 or SHA-3. SipHash should therefore always be used with a secret key in order to be secure.

This means that implementing it with a keyless default would render both the implementation and its security pointless, because ddh is supposed to be closest to sha512sum(1), openssl dgst, b3sum, and the like.
