
leonidessaguisagjr / filehash


Python module that wraps around hashlib and zlib to facilitate generating checksums / hashes of files and directories.

Home Page: https://pypi.org/project/filehash/

License: MIT License

Python 100.00%
python python-library md5 sha1 sha1sum sha1-hash sha256 sha256sum sha256-hash sha512 sha512sum sha512-hash checksum crc32 crc-32 checksum-validator checksum-digests sfv simple-file-verification adler-32

filehash's Introduction

filehash


Python module to facilitate calculating the checksum or hash of a file. Tested against Python 2.7.x, Python 3.6.x, Python 3.7.x, Python 3.8.x, Python 3.9.x, Python 3.10.x, PyPy 2.7.x and PyPy3 3.7.x. Currently supports Adler-32, BLAKE2b, BLAKE2s, CRC32, MD5, SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512.

(Note: BLAKE2b and BLAKE2s are only supported on Python 3.6.x and later.)

FileHash class

The FileHash class wraps around the hashlib (provides hashing for MD5, SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512) and zlib (provides checksums for Adler-32 and CRC32) modules and contains the following methods:

  • hash_file(filename) - Calculate the file hash for a single file. Returns a string with the hex digest.
  • hash_files(filenames) - Calculate the file hash for multiple files. Returns a list of tuples where each tuple contains the filename and the calculated hash.
  • hash_dir(path, pattern='*') - Calculate the file hashes for an entire directory. Returns a list of tuples where each tuple contains the filename and the calculated hash.
  • cathash_files(filenames) - Calculate a single hash for multiple files. Files are sorted by their individual hash values and then traversed in that order to generate a combined hash value. Returns a string with the hex digest.
  • cathash_dir(path, pattern='*') - Calculate a single hash for an entire directory of files. Files are sorted by their individual hash values and then traversed in that order to generate a combined hash value. Returns a string with the hex digest.
  • verify_sfv(sfv_filename) - Reads the specified SFV (Simple File Verification) file and calculates the CRC32 checksum for the files listed, comparing the calculated CRC32 checksums against the specified expected checksums. Returns a list of tuples where each tuple contains the filename and a boolean value indicating if the calculated CRC32 checksum matches the expected CRC32 checksum. To find out more about SFV files, see the Simple file verification entry in Wikipedia.
  • verify_checksums(checksum_filename) - Reads the specified file and calculates the hashes for the files listed, comparing the calculated hashes against the specified expected hashes. Returns a list of tuples where each tuple contains the filename and a boolean value indicating if the calculated hash matches the expected hash.
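The "sort by individual hash, then traverse" approach described for cathash_files and cathash_dir can be sketched with hashlib alone. This is a minimal illustration under stated assumptions, not the library's actual implementation: the names hash_file and combined_hash below are hypothetical stand-ins, and the sketch assumes the combined value is produced by feeding each file's contents into one hasher in sorted order.

```python
import hashlib


def hash_file(filename, algorithm="sha256", chunk_size=4096):
    """Return the hex digest of one file, read in fixed-size chunks."""
    hasher = hashlib.new(algorithm)
    with open(filename, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def combined_hash(filenames, algorithm="sha256", chunk_size=4096):
    """Hash several files as one stream, ordered by their individual hashes.

    Hypothetical helper: assumes the combined digest is built from the file
    contents, visited in order of each file's own digest.
    """
    # Sorting by digest makes the result independent of the order in which
    # the filenames were supplied.
    ordered = sorted(
        filenames, key=lambda name: hash_file(name, algorithm, chunk_size)
    )
    hasher = hashlib.new(algorithm)
    for name in ordered:
        with open(name, "rb") as fp:
            for chunk in iter(lambda: fp.read(chunk_size), b""):
                hasher.update(chunk)
    return hasher.hexdigest()
```

Because the ordering depends only on file contents, renaming files or listing them in a different order should not change the combined value in this sketch.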

For the checksum file, the file is expected to be a plain text file where each line has an entry formatted as follows:

{hash}[SPACE][ASTERISK]{filename}

This is the format used by the sha1sum family of tools when generating checksum files. Here is an example generated by sha1sum:

f7ef3b7afaf1518032da1b832436ef3bbfd4e6f0 *lorem_ipsum.txt
03da86258449317e8834a54cf8c4d5b41e7c7128 *lorem_ipsum.zip

The FileHash constructor has two optional arguments:

  • hash_algorithm='sha256' - Specifies the hashing algorithm to use. See filehash.SUPPORTED_ALGORITHMS for the list of supported hash / checksum algorithms. Defaults to SHA256.
  • chunk_size=4096 - Integer specifying the chunk size to use (in bytes) when reading the file. This comes in useful when processing very large files to avoid having to read the entire file into memory all at once. Default chunk size is 4096 bytes.
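To illustrate what the two constructor arguments control, here is a rough sketch of chunked reading that dispatches to zlib for Adler-32 and CRC32 and to hashlib for everything else. The function name checksum_file is hypothetical, and formatting the zlib checksums as eight uppercase hex digits is an assumption of this sketch; the library's own internals may differ.

```python
import hashlib
import zlib


def checksum_file(filename, algorithm="sha256", chunk_size=4096):
    """Return a hex digest or checksum, reading chunk_size bytes at a time."""
    if algorithm in ("adler32", "crc32"):
        # zlib checksums accept the running value as a second argument,
        # so the file never has to be held in memory all at once.
        update = zlib.adler32 if algorithm == "adler32" else zlib.crc32
        value = 1 if algorithm == "adler32" else 0  # zlib's default start values
        with open(filename, "rb") as fp:
            for chunk in iter(lambda: fp.read(chunk_size), b""):
                value = update(chunk, value)
        # Assumption: render as 8 uppercase hex digits.
        return "{:08X}".format(value & 0xFFFFFFFF)

    hasher = hashlib.new(algorithm)
    with open(filename, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

The chunk size affects only memory use and I/O granularity; the resulting digest is the same for any positive chunk size.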

Example usage

The library can be used as follows:

>>> import os
>>> from filehash import FileHash
>>> md5hasher = FileHash('md5')
>>> md5hasher.hash_file("./testdata/lorem_ipsum.txt")
'72f5d9e3a5fa2f2e591487ae02489388'
>>> sha1hasher = FileHash('sha1')
>>> sha1hasher.hash_dir("./testdata", "*.zip")
[FileHashResult(filename='lorem_ipsum.zip', hash='03da86258449317e8834a54cf8c4d5b41e7c7128')]
>>> sha512hasher = FileHash('sha512')
>>> os.chdir("./testdata")
>>> sha512hasher.verify_checksums("./hashes.sha512")
[VerifyHashResult(filename='lorem_ipsum.txt', hashes_match=True), VerifyHashResult(filename='lorem_ipsum.zip', hashes_match=True)]
>>> crc32hasher = FileHash('crc32')
>>> crc32hasher.verify_sfv("./lorem_ipsum.sfv")
[VerifyHashResult(filename='lorem_ipsum.txt', hashes_match=True), VerifyHashResult(filename='lorem_ipsum.zip', hashes_match=True)]
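For readers curious about what an SFV check like the verify_sfv call above involves, the following is a minimal standard-library sketch. It assumes the common SFV layout (one "filename CRC32" entry per line, with comment lines starting with ";"). The names crc32_of_file and check_sfv are hypothetical, and resolving filenames relative to the SFV file's directory is a choice made for this sketch, not a statement about how filehash behaves.

```python
import os
import zlib


def crc32_of_file(filename, chunk_size=4096):
    """Return the CRC32 of a file as an unsigned 32-bit integer."""
    value = 0
    with open(filename, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            value = zlib.crc32(chunk, value)
    return value & 0xFFFFFFFF


def check_sfv(sfv_filename):
    """Return a list of (filename, matches) tuples for an SFV file."""
    results = []
    # Sketch-only choice: look for listed files next to the SFV file.
    base_dir = os.path.dirname(os.path.abspath(sfv_filename))
    with open(sfv_filename, "r") as sfv:
        for line in sfv:
            line = line.strip()
            if not line or line.startswith(";"):
                continue  # skip blank lines and comments
            # The checksum is the last whitespace-separated field, so
            # filenames containing spaces are still handled.
            name, expected = line.rsplit(None, 1)
            actual = crc32_of_file(os.path.join(base_dir, name))
            results.append((name, actual == int(expected, 16)))
    return results
```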

chkfilehash command line tool

A command-line tool called chkfilehash is also included with the filehash package. Here is an example of how the tool can be used:

$ chkfilehash -a sha512 -c hashes.sha512
lorem_ipsum.txt: OK
lorem_ipsum.zip: OK
$ chkfilehash -a crc32 lorem_ipsum.zip
7425D3BE *lorem_ipsum.zip
$

Run the tool without any parameters or with the -h / --help switch to get a usage screen.

License

This is released under an MIT license. See the LICENSE file in this repository for more information.

filehash's People

Contributors

leonidessaguisagjr, mmore500


filehash's Issues

Bug with SHA256 checksums

On macOS, "sha256sum" generates a file with two spaces between the hash and the filename. This causes filehash to fail verification, while sha256sum itself works fine. The character right after the first space is a mode marker: a space (" ") for text files and an asterisk ("*") for binary files:
https://linux.die.net/man/1/sha256sum

To replicate, run the following commands:

cd testdata/
sha256sum *.zip >mac_test.txt
sha256sum --check mac_test.txt
chkfilehash -c mac_test.txt

Output from sha256sum:

lorem_ipsum.zip: OK

Output from filehash:

Traceback (most recent call last):
  File "/usr/local/bin/chkfilehash", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/filehash/filehash_cli.py", line 100, in main
    process_checksum_file(args.checksums, hasher)
  File "/usr/local/lib/python3.9/site-packages/filehash/filehash_cli.py", line 82, in process_checksum_file
    results = hasher.verify_checksums(checksum_filename)
  File "/usr/local/lib/python3.9/site-packages/filehash/filehash.py", line 268, in verify_checksums
    actual_hash = self.hash_file(filename)
  File "/usr/local/lib/python3.9/site-packages/filehash/filehash.py", line 176, in hash_file
    with open(filename, mode="rb", buffering=0) as fp:
FileNotFoundError: [Errno 2] No such file or directory: ' lorem_ipsum.zip'

Paths within checksum should be evaluated relative to the checksum file directory, not the current working directory

Changing directories before I verify is not a big deal when using a shell and verifying manually. But if I'm reaching for Python, chances are I'm either verifying a large number of packages or writing a script to automate the process.

Either way, it seems like something a library should be handling for me.

Currently, this is required (I'm sure others could do it with less mess):

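import os

checksum_path = './relpath/package.md5'
orig = os.getcwd()  # remember the current working directory
os.chdir(os.path.dirname(checksum_path))
# filehash here is a FileHash instance, e.g. FileHash('md5')
filehash.verify_checksums(os.path.basename(checksum_path))
os.chdir(orig)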

If checksums were always evaluated from the directory they're in, I could do this:

checksum_path = './relpath/package.md5'
filehash.verify_checksums(checksum_path)
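Until something like that exists in the library, one possible workaround is a small wrapper that changes into the checksum file's directory and always restores the previous one, even if verification raises. This is only a sketch: working_directory and verify_relative are hypothetical names, and hasher stands for any object with a verify_checksums method, such as a FileHash instance.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # restored even if the body raises


def verify_relative(hasher, checksum_path):
    """Run hasher.verify_checksums from the checksum file's own directory."""
    directory, name = os.path.split(os.path.abspath(checksum_path))
    with working_directory(directory):
        return hasher.verify_checksums(name)
```

With this in place, the call site would shrink to something like verify_relative(FileHash('md5'), './relpath/package.md5'). Note that changing the working directory is process-wide, so this approach is not safe to use from multiple threads at once.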

Md5sum files unsupported

The Md5sum files (*.md5) I am working with are failing verification when I use FileHash.verify_checksums(), because the hash and the filename are separated by two spaces:

% cat something-or-other-arch-v0.0.1.tar.gz.md5
595f44fec1e92a71d3e9e77456ba80d1  something-or-other-arch-v0.0.1.tar.gz

This is the error I get:

FileNotFoundError: [Errno 2] No such file or directory: ' something-or-other-arch-v0.0.1.tar.gz'

The output of the md5sum utility is described here:

https://en.wikipedia.org/wiki/Md5sum
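Both this report and the SHA256 one above come down to the second separator character: the md5sum and sha*sum tools write the hash, one space, a mode character (a space for text, an asterisk for binary), and then the filename. A tolerant line parser could look roughly like the sketch below. The name parse_checksum_line is hypothetical, this is not the library's code, and the escaped-filename variant (lines beginning with a backslash) is deliberately left out.

```python
import re

# hash, one space, a mode character (" " for text, "*" for binary), filename
_CHECKSUM_LINE = re.compile(r"^(?P<hash>[0-9A-Fa-f]+) [ *](?P<filename>.+)$")


def parse_checksum_line(line):
    """Split one checksum-file line into (hash, filename)."""
    match = _CHECKSUM_LINE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError("unrecognized checksum line: {!r}".format(line))
    # Lowercase the hash so comparisons against hexdigest() output work.
    return match.group("hash").lower(), match.group("filename")
```

Applied to the two example lines from this page, the parser yields the bare filename in both cases, with no leading space.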
