hash-wasm's People

Contributors

daninet, malobre, thenickdude


hash-wasm's Issues

Feature Request: abortable

The general hashing APIs return Promises, which represent a single asynchronous value. In many cases, say hashing a large video file, the ability to cancel such a computation would be nice.

Accepting an AbortController/AbortSignal would allow these functions to be canceled or aborted. This is one of the flaws of Promises as an API surface: unlike a Future or Aff, there is no ergonomic built-in way to abort them.
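A minimal sketch of what an abortable helper could look like on top of the existing incremental API. The hashFileAbortable helper and the signal wiring are hypothetical, not part of hash-wasm today; createSHA256 is the documented incremental factory:

const fs = require('fs')
const { createSHA256 } = require('hash-wasm')

// Hypothetical helper: streams a file through an incremental hasher and
// rejects as soon as the provided AbortSignal fires.
async function hashFileAbortable (path, signal) {
  const hasher = await createSHA256()
  hasher.init()
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream(path)
    signal.addEventListener('abort', () => {
      stream.destroy()
      reject(new Error('hashing aborted'))
    })
    stream
      .on('data', chunk => hasher.update(chunk))
      .on('end', () => resolve(hasher.digest('hex')))
      .on('error', reject)
  })
}

// const controller = new AbortController()
// hashFileAbortable('./large-video.mp4', controller.signal).catch(console.error)
// controller.abort() // cancels the in-flight computation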

Performance issue

Hello, thank you for creating this module. I'm using it in an Electron app.

I'm trying to figure out why hash-wasm is not performing as expected; it's quite slow at hashing files.

I hashed a 2.5 GB .zip file and got unexpected results: hash-wasm with the xxhash64 algorithm performed almost exactly the same as the Node.js crypto module with the md5 algorithm:

  • 25s using hash-wasm with the xxhash64 algorithm.
  • 26s using the Node.js crypto module with the md5 algorithm.

And then I also hashed a 300 MB zip archive located on an SSD (500 MB/s reads) just to make sure the drive is not the bottleneck:

  • 1500ms using hash-wasm with the xxhash64 algorithm.
  • 1500ms using hash-wasm with the md5 algorithm.
  • 1200ms using the Node.js crypto module with the md5 algorithm.

I also hashed a 5MB image and got similar results (located on an SSD as well):

  • 90ms using hash-wasm with the xxhash64 algorithm.
  • 85ms using the Node.js crypto module with the md5 algorithm.
// requires: const fs = require('fs'), const crypto = require('crypto'),
//           const { createXXHash64 } = require('hash-wasm')
async getFileHash (path) {
    // Method 1: Node.js crypto module md5
    // return new Promise(resolve => {
    //  const hash = crypto.createHash('md5')
    //  fs.createReadStream(path)
    //    .on('data', data => hash.update(data))
    //    .on('end', () => resolve(hash.digest('hex')))
    // })

    // Method 2: hash-wasm xxhash64
    const xxhash64 = await createXXHash64()
    return new Promise((resolve, reject) => {
      xxhash64.init()
      fs.createReadStream(path)
        .on('data', data => xxhash64.update(data))
        .on('end', () => resolve(xxhash64.digest('hex')))
    })
}

console.time('TIME | hash')
await getFileHash(path)
console.timeEnd('TIME | hash')

Do you know what might be causing the problem? Drive speed? Node's streams? I don't get why the results are so similar. Isn't hash-wasm's xxhash64 supposed to be something like 5 times faster, especially with big files?

I tried changing the highWaterMark option to read data in 8 MB chunks and maximize drive usage, thinking that the file stream might be the bottleneck here, but it didn't help in this situation; if anything, the time went up from 25s to 27s:

fs.createReadStream(path, { highWaterMark: 8 * 1024 * 1024 })

(I tried changing this option since it helped in another unrelated case, when I used readStream().pipe(writeStream))

Electron is not the problem here since I'm seeing the same results when I run the code from a terminal (node v13.5.0).

Is there any way to optimize it?

Hello, Daninet. I'm currently using this library in a Vue project and I want to calculate the SM3 hash of files in Chrome. However, I've noticed that the calculation speed is not very fast for large video files, especially those over 7GB. Is there any way to optimize it?
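For reference, a minimal sketch of incremental SM3 hashing in the browser, reading the File in slices so the whole video never sits in memory at once. It assumes hash-wasm's createSM3 incremental API; the 64 MiB slice size is only an illustration:

import { createSM3 } from 'hash-wasm'

async function hashFileSM3 (file) {
  const hasher = await createSM3()
  hasher.init()
  const sliceSize = 64 * 1024 * 1024 // 64 MiB per read, tune as needed
  for (let offset = 0; offset < file.size; offset += sliceSize) {
    const buffer = await file.slice(offset, offset + sliceSize).arrayBuffer()
    hasher.update(new Uint8Array(buffer))
  }
  return hasher.digest('hex')
}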

Browser-only implementation

I'm trying to use bcrypt in vanilla JS client-side for password handling, so I was trying to use the bcrypt dist build to keep things speedy and tidy. However, no matter what I try, the JavaScript won't find bcrypt in the minified file; it just keeps telling me that bcrypt cannot be found in the file.
Here is my code:

import { bcrypt } from "./vendor/bcrypt.umd.min.js"
var process;

import "./vendor/jquery.js";

$('#signup').on("submit", function(event) {
    return process();
});

process = async function() {
    var key, salt;
    console.log("begin hash");
    salt = new Uint8Array(16);
    window.crypto.getRandomValues(salt);
    key = (await bcrypt({
        password: "pass",
        salt,
        costFactor: 10,
        outputType: "encoded"
    }));
    return console.log(key);
}; 

I have tried both the import and the <script> way. Is my smooth brain missing something? I un-minified the bcrypt.umd.min.js to take a closer look at it, and it looks like maybe the B function might be what I need to call instead? Although really, I have no idea.
If you could help me out I would be most grateful. And, I would like to keep it to just the bcrypt distro, since that's the only algorithm I'm using. Thanks!
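One thing that may be tripping this up: UMD bundles generally don't expose named ES-module exports; when loaded with a <script> tag they attach a global object instead. A hedged sketch follows — the global name below is an assumption, so inspect the un-minified file or log window to see what the bundle actually registers:

// after <script src="./vendor/bcrypt.umd.min.js"></script> has loaded:
// assumption: the UMD build registers a global (e.g. window.hashwasm) exposing bcrypt
const { bcrypt } = window.hashwasm

async function hashPassword () {
  const salt = new Uint8Array(16)
  window.crypto.getRandomValues(salt)
  return bcrypt({
    password: 'pass',
    salt,
    costFactor: 10,
    outputType: 'encoded'
  })
}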

Why is node/optimize commented out in build.sh?

I'm a newb when it comes to WASM, mind enlightening me about what the optimize script does (and maybe why it's a bad idea / commented out, or why it's only applied to bcrypt)?

Also, using a Docker container to get around all the native tooling requirements is really nice.

If you don't mind another question, what do these options do on the docker run command?

  -v $(pwd):/app \ <-- does this allow Docker to write to the current directory? Is that how the output of the build process ends up on the host machine?
  -u $(id -u):$(id -g) \ <-- this one I'm totally confused about

Web Worker support for bcrypt in Deno

I'm implementing a service with oak and use bcrypt for hashing passwords. It appears to me that the bcrypt call is blocking the main thread in Deno, as no other request is served while a costly hash is running. Am I misunderstanding web workers or is this not supported for bcrypt?
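For reference, a minimal sketch of moving the bcrypt call into a dedicated Web Worker so the main thread keeps serving requests. The module specifier and message shape are illustrative assumptions, not the library's documented setup:

// bcrypt_worker.ts (illustrative)
import { bcrypt } from 'npm:hash-wasm'

self.onmessage = async (event) => {
  const { password, salt, costFactor } = event.data
  const hash = await bcrypt({ password, salt, costFactor, outputType: 'encoded' })
  self.postMessage(hash)
}

// main module (illustrative)
// const worker = new Worker(new URL('./bcrypt_worker.ts', import.meta.url), { type: 'module' })
// worker.postMessage({ password, salt, costFactor: 10 })
// worker.onmessage = (event) => console.log('hash:', event.data)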

Thanks for your work!

Slower on first run?

Thanks for publishing this! Looks like just what I need to speed up a core/critical part of my application (https://gingkowriter.com).

I'm new to wasm, and I'm aware that this is likely not an issue with this package, but I hope you can point me in the right direction...

I've noticed that the first time I calculate a hash, it takes ~10x longer than subsequent times. I've not been able to find any specific resources to help me figure out why.

Is it an initial decode (from base64)? An initial compilation? Is it avoidable in some way? I could simply do a fake first run on app startup, so it's ready when it's next needed, but I'd like to understand what's going on more deeply.

I'm using the shorthand form for sha1, in a web app bundled with Webpack. (Can make source available).
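If the slow first call turns out to be the one-time base64 decode and WebAssembly compile, a warm-up along these lines should hide it. Just a sketch; createSHA1 is the incremental counterpart of the sha1 shorthand:

import { createSHA1 } from 'hash-wasm'

// Start decoding/compiling the SHA-1 wasm module at app startup...
const sha1HasherPromise = createSHA1()

// ...and reuse the already-compiled instance whenever a hash is needed.
export async function sha1Hex (data) {
  const hasher = await sha1HasherPromise
  hasher.init()
  hasher.update(data)
  return hasher.digest('hex')
}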

Thanks!

Thank you!

I couldn't find another way of contacting you on your Github profile, but I just wanted to say: Great work on this library! I was looking for a way to compute xxhash64 hashes in JS and this library looks perfect!

Blake3 support on Safari browsers?

Attempting to run the blake3 benchmark at https://daninet.github.io/hash-wasm-benchmark/ in macOS Safari (Version 14.1 (15611.1.21.161.7, 15611)) or iOS Safari (iOS 14.7.1), the page just hangs, even though other hashes (like blake2s or sha256) work OK.

In the MacOS Safari JS console, an exception is shown:

[Error] Unhandled Promise Rejection: ReferenceError: Can't find variable: BigUint64Array
dispatchException (blake3.worker.4ef17eb3fe4e063beb9a.worker.js:2:37383)
(anonymous function) (blake3.worker.4ef17eb3fe4e063beb9a.worker.js:2:33595)
e (blake3.worker.4ef17eb3fe4e063beb9a.worker.js:2:2050)
I (blake3.worker.4ef17eb3fe4e063beb9a.worker.js:2:2298)
promiseReactionJob

Is this a 'fixable' or 'wait for Apple to fix Safari' kind of issue?
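Until the affected Safari versions age out (BigUint64Array arrived around Safari 15), a feature check can at least avoid the hang. A hedged sketch, falling back to one of the hashes the benchmark shows working:

// Rough sketch: only use blake3 where 64-bit typed arrays exist, otherwise fall back.
const hasBigUint64 = typeof BigUint64Array !== 'undefined'

async function fastHash (data) {
  return hasBigUint64
    ? hashwasm.blake3(data)
    : hashwasm.blake2s(data) // fallback; pick whichever supported hash fits your use case
}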

Synchronous API

Hi! I noticed at https://github.com/Daninet/hash-wasm/blob/master/CHANGELOG.md#401-august-29-2020 that this library briefly offered a synchronous API, but it was removed since Chrome disallows synchronous compilation of WebAssembly binaries larger than 4 KB on the main thread.

In my use case, the hashes need to be computed synchronously. However, I see a couple possible solutions:

  1. Initialize the WASM object asynchronously, and then use it to compute synchronous hashes.
  2. Use one of the hash functions whose binary is under 4 KB, like CRC32.

Would you consider offering a hybrid async / sync API to support this use case?
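For what it's worth, option 1 already appears to work with the current API, since the hasher object returned by the create* factories is synchronous once the initial await has resolved — a sketch:

import { createCRC32 } from 'hash-wasm'

let crc32Hasher = null

// One-time async setup, e.g. during application bootstrap.
export async function initHashing () {
  crc32Hasher = await createCRC32()
}

// Afterwards, hashes can be computed fully synchronously on the hot path.
export function crc32Sync (data) {
  crc32Hasher.init()
  crc32Hasher.update(data)
  return crc32Hasher.digest('hex')
}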

Password hashing using argon2id

I am using hash-wasm with Angular 12 in order to hash passwords on the frontend. My current implementation is:

let hashedPassword  = await argon2id({
      password: dummy,
      salt: dummy,
      parallelism: 4,
      iterations: 1,
      memorySize: 2097152, // use 2097152KB memory
      hashLength: 128, // output size = 128 bytes
    });

But with this implementation I get an error (screenshot not reproduced here).

But, if I change memorySize to 32768 it works fine.

So, does argon2id have a maximum limit for memorySize?

Sidenote:
If there is any other alternative that works with Angular without any build-time issues, that would also be helpful.
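For scale: memorySize is given in KiB, so 2097152 KiB is 2 GiB — a very large single WebAssembly memory allocation whose success depends on the platform — while 32768 KiB is only 32 MiB. A hedged sketch of retrying with smaller sizes (note that changing memorySize changes the resulting hash, so the chosen value has to be fixed or stored alongside the hash for verification):

import { argon2id } from 'hash-wasm'

// Sketch: fall back to smaller memory sizes if allocation fails.
// The candidate values (in KiB) are illustrative, not recommendations.
async function argon2idWithFallback (password, salt) {
  for (const memorySize of [2097152, 1048576, 262144, 65536]) {
    try {
      return await argon2id({
        password,
        salt,
        parallelism: 4,
        iterations: 1,
        memorySize,
        hashLength: 128,
        outputType: 'encoded'
      })
    } catch (err) {
      console.warn(`argon2id failed with memorySize=${memorySize} KiB`, err)
    }
  }
  throw new Error('argon2id failed at every memory size tried')
}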

Uncaught (in promise) DOMException: The requested file could not be read, typically due to permission problems that have occurred after a reference to a file was acquired.

Thank you for producing a great library of hash functions.

I have been using hash-wasm to compute various hashes for files during upload. The files are uploaded in chunks and each chunk is passed through a hash-wasm hash function, e.g. SHA-512. Occasionally, and typically after it has processed many file chunks, I see the following error message in the Chrome browser console:
Uncaught (in promise) DOMException: The requested file could not be read, typically due to permission problems that have occurred after a reference to a file was acquired.

Tracing back the error, it is associated with calls to the hasher factories, e.g.:
await hashwasm.createSHA512()

The await calls were being made in a loop, and I have recently changed the code to instantiate the hasher outside of the loop and call .init() when required. I'm not sure yet whether this has resolved the issue, or whether you can provide other pointers that might help understand it. It is a very cryptic DOMException message. The files are being read from a USB device, and initially I thought the error was associated with the read. The DOMException is reported against the last line of our upload code, and when the logged error is expanded, the next frame is always associated with where I am using hashwasm.
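That refactor would look roughly like this — a sketch, assuming the chunks for each file arrive as Uint8Arrays from the existing reader (readChunks is a placeholder, not a real API, and the whole thing runs inside an async function):

// Create the WASM hasher once, outside the upload loop...
const sha512 = await hashwasm.createSHA512()

for (const file of files) {
  // ...and reset it per file instead of re-creating it for every chunk.
  sha512.init()
  for await (const chunk of readChunks(file)) { // placeholder for the existing chunked reader
    sha512.update(chunk)
  }
  console.log(file.name, sha512.digest('hex'))
}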

The only pattern I observe is that it appears more likely to occur after processing many files and many chunks. After it has occurred, hard-reloading the browser does not resolve the issue, and we have found that a complete restart of the computer is required.

Any suggestions would be appreciated please.


Chrome Version 90.0.4430.212 (Official Build) (64-bit) [Ubuntu Linux]

Hashing multiple files in parallel causes "WASM out of memory"

Good day @Daninet.
My project hashes multiple files in parallel using hash-wasm. There are about 20 components, and each one creates its own hash-wasm instance and hashes a specified file. The problem is that when it hashes too many files (usually ~20) at the same time, I get the following error:

RangeError: WebAssembly.instantiate(): Out of memory: wasm memory

The hashing is triggered by a JS scroll event (a component hashes its file when it enters the viewport). It takes about 20-30 parallel calls (depending on the file sizes) to fill up the WASM memory and cause the error. Sometimes when I stop the hashing (stop scrolling the page) and then resume it after a few seconds, it throws the error immediately, as if the memory wasn't cleaned up at all and a single call is all it takes. What I don't understand is why all the instances are interconnected and affect each other.

And the weird thing is, even when it throws the error, it still finishes hashing the file properly.

Code

fileHasher.js

const { createXXHash64 } = require('hash-wasm')
const fs = require('fs')

class Filehasher {
  constructor (path) {
    this.state = { isCanceled: false }
    this.interval = null
    this.readStream = null
    this.path = path
  }
  cancel () {
    this.state.isCanceled = true
  }
  async gen () {
    try {
      const xxhash64 = await createXXHash64()
      return new Promise((resolve, reject) => {
        // xxhash64.init() // the results are the same with and without running init()
        this.readStream = fs.createReadStream(this.path)
          .on('data', data => xxhash64.update(data))
          .on('end', () => {
            resolve(xxhash64.digest('hex'))
          })
        this.interval = setInterval(() => {
          if (this.state.isCanceled) {
            clearInterval(this.interval)
            this.readStream.destroy()
            reject('canceled')
          }
        }, 10)
      })
    } 
    catch (error) {
      console.log('Filehasher', error)
    }
  }
}

module.exports = { Filehasher }

testFileHasher.js

const Filehasher = require('./fileHasher.js').Filehasher

async function test() {
  let hasher = new Filehasher('C:/test/test.jpg')
  // Generating a random file name for console.time()
  let rand = Math.random().toString(36).substring(2, 15) + Math.random().toString(36).substring(2, 15)
  console.time(`TIME: file name: ${rand}`)
  let hash = await hasher.gen()
  console.timeEnd(`TIME: file name: ${rand}`)
  console.log('HASH:', hash)
}

setInterval(() => {
  test()
}, 10)

The project is based on Electron (Node.js + Chromium), so initially I thought it was a Chromium bug, but then I ran it with Node.js in a terminal and got the same problem.

In this example I'm emulating multiple parallel jobs by hashing a single 20 MB file every 10 ms (in the real app every component creates a new hash-wasm instance and hashes a different file of a different size). It usually takes ~80 ms to hash this particular image once, but since it's called every 10 ms it doesn't have time to finish the job.
As you can see in the console output below, it exceeds the memory limits at about 20 parallel calls (the same thing happens in the real app). Notice how the hashing time grows and how all the instances affect each other.

Console output

TIME: file name: t6dwi6q4iqa9vhz3e05eoa: 186.477ms
HASH: 16d7104d28427ead
TIME: file name: 1n6974yqi8lxcvrw17kkz: 205.672ms
HASH: 16d7104d28427ead
TIME: file name: 21qqruayukc99var45zhxv: 286.718ms
HASH: 16d7104d28427ead
TIME: file name: 5rn3zv151q529awqjraikd: 375.536ms
HASH: 16d7104d28427ead
TIME: file name: n0l8akz5svcw4ok828tg8: 430.861ms
HASH: 16d7104d28427ead
TIME: file name: 31psu981yhtclunqmbax68: 471.69ms
HASH: 16d7104d28427ead
TIME: file name: stjqbb9rc49z92oztf1xr: 530.379ms
HASH: 16d7104d28427ead
TIME: file name: s7im9fzj9cpgkc7okou7lv: 589.838ms
HASH: 16d7104d28427ead
TIME: file name: 0liv7g7dacid106fry5rh4ue: 638.518ms
HASH: 16d7104d28427ead
TIME: file name: bcvciuahm1y2ed6wijfwr: 667.826ms
HASH: 16d7104d28427ead
TIME: file name: 4swzggf1watp2yyz4ilf7: 692.667ms
HASH: 16d7104d28427ead
TIME: file name: dmdwzzzrg745cgny7uqck: 717.181ms
HASH: 16d7104d28427ead
TIME: file name: v9pilvspzgnxsd18bq7dz: 729.266ms
HASH: 16d7104d28427ead
TIME: file name: hcbafyv7s8re77y94jwcxt: 758.71ms
HASH: 16d7104d28427ead
TIME: file name: lgzwf15hpxsuntwj3j2hr: 792.363ms
HASH: 16d7104d28427ead
TIME: file name: jb5phq1cqur8zaeug7fpgb: 818.249ms
HASH: 16d7104d28427ead
TIME: file name: 5zckaxil5dnot39duvutl7: 847.001ms
HASH: 16d7104d28427ead
TIME: file name: 5wor89u2pql22w8zrjni0o: 881.599ms
HASH: 16d7104d28427ead
TIME: file name: uhnyvwk7h4muicrphha45: 915.705ms
HASH: 16d7104d28427ead
Filehasher RangeError: WebAssembly.instantiate(): Out of memory: wasm memory
    at E:\test\node_modules\hash-wasm\dist\index.umd.js:215:50
    at Generator.next (<anonymous>)
    at fulfilled (E:\test\node_modules\hash-wasm\dist\index.umd.js:25:62)
    at runMicrotasks (<anonymous>)
    at runNextTicks (internal/process/task_queues.js:58:5)
    at listOnTimeout (internal/timers.js:520:9)
    at processTimers (internal/timers.js:494:7)
TIME: file name: n34dwiz4nis0r7yb992dzkd: 25.566ms
HASH: undefined
Filehasher RangeError: WebAssembly.instantiate(): Out of memory: wasm memory
    at E:\test\node_modules\hash-wasm\dist\index.umd.js:215:50
    at Generator.next (<anonymous>)
    at fulfilled (E:\test\node_modules\hash-wasm\dist\index.umd.js:25:62)
TIME: file name: xtnbxnd4indsl39hzz5anm: 17.682ms
HASH: undefined
Filehasher RangeError: WebAssembly.instantiate(): Out of memory: wasm memory
    at E:\test\node_modules\hash-wasm\dist\index.umd.js:215:50
    at Generator.next (<anonymous>)
    at fulfilled (E:\test\node_modules\hash-wasm\dist\index.umd.js:25:62)
TIME: file name: e9bx838karafd90gcwvcvr: 18.854ms
HASH: undefined
TIME: file name: 37zuklk2l5aj5l1u976po: 1.006s
HASH: 16d7104d28427ead
Filehasher RangeError: WebAssembly.instantiate(): Out of memory: wasm memory
    at E:\test\node_modules\hash-wasm\dist\index.umd.js:215:50
    at Generator.next (<anonymous>)
    at fulfilled (E:\test\node_modules\hash-wasm\dist\index.umd.js:25:62)
TIME: file name: s14k148uspj7l9br734tjf: 19.249ms
HASH: undefined
Filehasher RangeError: WebAssembly.instantiate(): Out of memory: wasm memory
    at E:\test\node_modules\hash-wasm\dist\index.umd.js:215:50
    at Generator.next (<anonymous>)
    at fulfilled (E:\test\node_modules\hash-wasm\dist\index.umd.js:25:62)
TIME: file name: 4fhpx3t5untuup04e9lq18: 19.541ms
HASH: undefined
Filehasher RangeError: WebAssembly.instantiate(): Out of memory: wasm memory
    at E:\test\node_modules\hash-wasm\dist\index.umd.js:215:50
    at Generator.next (<anonymous>)
    at fulfilled (E:\test\node_modules\hash-wasm\dist\index.umd.js:25:62)
TIME: file name: qsiir4x7ukhh204ml3vq: 18.706ms
HASH: undefined
Filehasher RangeError: WebAssembly.instantiate(): Out of memory: wasm memory
    at E:\test\node_modules\hash-wasm\dist\index.umd.js:215:50
    at Generator.next (<anonymous>)
    at fulfilled (E:\test\node_modules\hash-wasm\dist\index.umd.js:25:62)
TIME: file name: netg1satgw4ffnek38p2n: 22.462ms
HASH: undefined
Filehasher RangeError: WebAssembly.instantiate(): Out of memory: wasm memory
    at E:\test\node_modules\hash-wasm\dist\index.umd.js:215:50
    at Generator.next (<anonymous>)
    at fulfilled (E:\test\node_modules\hash-wasm\dist\index.umd.js:25:62)
TIME: file name: e7c17bzsd3yywkh8pfhj: 21.935ms
HASH: undefined
Filehasher RangeError: WebAssembly.instantiate(): Out of memory: wasm memory
    at E:\test\node_modules\hash-wasm\dist\index.umd.js:215:50
    at Generator.next (<anonymous>)
    at fulfilled (E:\test\node_modules\hash-wasm\dist\index.umd.js:25:62)
TIME: file name: lbnqpty95gelr0nya4mgp: 19.821ms
HASH: undefined
TIME: file name: edynbbpaky5q9bk2haswjr: 1.158s
HASH: 16d7104d28427ead
...

Questions

Surprisingly, this many parallel calls do not block the main JS thread even though the work is very computationally intensive, but as soon as I get this WASM error, it starts blocking the main thread and the UI starts freezing.

I tried creating a separate web worker for each computation but it takes up a lot of RAM and crashes the app at 20+ parallel computations.

  • Is there something I'm doing wrong?
  • Is there a way to force hash-wasm to wait until the other instances are done, so it doesn't exceed the WASM memory limits? (See the sketch below.)
  • I'm not familiar with WASM: is it possible to allocate a separate, isolated WASM instance with its own memory for each hash-wasm instance, so they don't share the same memory limit? And would running on multiple WASM instances overwhelm and block the main JS thread?
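One way to avoid piling up instances is to cap concurrency with a small queue, so that only a handful of hashing jobs exist at any moment. A sketch (the limit of 4 is arbitrary):

// Minimal concurrency limiter: at most `limit` hashing jobs run at a time.
function createLimiter (limit) {
  let active = 0
  const queue = []
  const next = () => {
    if (active >= limit || queue.length === 0) return
    active++
    const { task, resolve, reject } = queue.shift()
    task().then(resolve, reject).finally(() => { active--; next() })
  }
  return task => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject })
    next()
  })
}

const limitHashing = createLimiter(4)

// Usage with the Filehasher class above:
// limitHashing(() => new Filehasher(path).gen()).then(hash => console.log('HASH:', hash))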

Environment:
OS: Win10 x64
hash-wasm: v4.1.0
Exec env: the same results everywhere:

  • Node.js v14.5.0;
  • Electron renderer process (Chrome 85);
  • Electron web worker (Chrome 85).

CRC32 performance similar to JS

First off, I wanted to thank you for releasing hash-wasm for free; it's very well made and is one of the few libraries I've seen that has both a nice API and high performance. I really appreciate the Base64 builds by default; apparently, not enough WASM library authors understand how painful it is to get WASM working in bundlers :)

I'm considering using the CRC32 implementation from hash-wasm as an optional extension in fflate for GZIP and ZIP compression. However, I'd like to clarify a few things before I add official support.

In the benchmark, hash-wasm's CRC32, which uses Slice-by-8, is compared to byte-iterative implementations, which are obviously going to be much slower. Slice-by-8 and Slice-by-16 CRC32 in JS are actually quite similar in performance to hash-wasm, if the benchmark is to be believed. I added this to crc32.worker.js:

// CRC32 table
const crct = (() => {
  const t = new Int32Array(4096);
  for (let i = 0; i < 256; ++i) {
    let c = i,
      k = 9;
    while (--k) c = (c & 1 && -306674912) ^ (c >>> 1);
    t[i] = c;
  }
  for (let i = 0; i < 256; ++i) {
    let lv = t[i];
    for (let j = 256; j < 4096; j += 256)
      lv = t[i | j] = (lv >>> 8) ^ t[lv & 255];
  }
  return t;
})();

const t1 = crct.slice(0, 256),
  t2 = crct.slice(256, 512),
  t3 = crct.slice(512, 768),
  t4 = crct.slice(768, 1024);
const t5 = crct.slice(1024, 1280),
  t6 = crct.slice(1280, 1536),
  t7 = crct.slice(1536, 1792),
  t8 = crct.slice(1792, 2048);
const t9 = crct.slice(2048, 2304),
  t10 = crct.slice(2304, 2560),
  t11 = crct.slice(2560, 2816),
  t12 = crct.slice(2816, 3072);
const t13 = crct.slice(3072, 3328),
  t14 = crct.slice(3328, 3584),
  t15 = crct.slice(3584, 3840),
  t16 = crct.slice(3840, 4096);

// Slice-by-16 CRC32: processes 16 input bytes per loop iteration using the tables above
suite.addSync(`hyper`, (d) => {
  let cr = -1,
    i = 0;
  if (d.length > 256) {
    const max = d.length - 16;
    for (; i < max; ) {
      cr =
        t16[d[i++] ^ (cr & 255)] ^
        t15[d[i++] ^ ((cr >> 8) & 255)] ^
        t14[d[i++] ^ ((cr >> 16) & 255)] ^
        t13[d[i++] ^ (cr >>> 24)] ^
        t12[d[i++]] ^
        t11[d[i++]] ^
        t10[d[i++]] ^
        t9[d[i++]] ^
        t8[d[i++]] ^
        t7[d[i++]] ^
        t6[d[i++]] ^
        t5[d[i++]] ^
        t4[d[i++]] ^
        t3[d[i++]] ^
        t2[d[i++]] ^
        t1[d[i++]];
    }
  }
  for (; i < d.length; ++i) cr = t1[(cr & 255) ^ d[i]] ^ (cr >>> 8);
  cr = ~cr;
  if (cr < 0) return (0x100000000 + cr).toString(16);
  return cr.toString(16);
});

And here are my results (benchmark screenshot omitted):

Performance is within 10% for the 1MB variant, which for fflate isn't worth the extra bundle size that the WASM adds. Bundle size could be improved by computing the tables at runtime, at which point it is probably worth at least supporting hash-wasm, but the CRC32 implementation in hash-wasm would need to be either smaller or significantly (>30%) faster to justify using it by default. My devices are all quite powerful, so perhaps it's a rare case. Maybe you could run the benchmark to see if this JS version is actually fast or if it's just for me?

If you find that the JS version tends to be slower on most devices, or if you find a way to improve CRC32 performance, I'll look into implementing hash-wasm as the default for fflate with large files in async mode. Thanks again for creating this awesome library!

Expose the Argon2 version number constant?

I know this library currently implements version 0x13 (aka 19) of Argon2, which seems like the latest reference version (from 2016?).

I dunno if version 19 is the last version that will ever be released... perhaps?

But to be as future proof as possible, the encoded output already preserves the version number used with a hashed value (along with the salt and params), so that in the future you would either be able to select the proper version from multiple algorithm implementations, or AT LEAST be able to know that your version of the algorithm is mismatched to the version used to create a hashed value.

However, right now, to perform such a check/match of version number, it seems like there's no way to "extract" the current version of the algorithm from this library's implementation, except to attempt a throw-away hashing and parse out the 19 from the encoded output.

That seems rather hacky (and inefficient). I wonder if that version number could be exposed as a property or constant in some way, so that I could just look at the version in an encoded hash output of some message, and compare that to the version exposed on the library, to double-check that it's valid to attempt a match of two hashes.
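Until a constant is exposed, the version can at least be pulled out of an encoded hash without a throw-away computation, since the encoded format embeds it as v=<n> — a sketch:

// Encoded output looks like: $argon2id$v=19$m=65536,t=3,p=4$<salt>$<hash>
function argon2VersionOf (encoded) {
  const match = /^\$argon2(?:id|i|d)\$v=(\d+)\$/.exec(encoded)
  return match ? Number(match[1]) : null
}

// argon2VersionOf(encodedHash) === 19 for hashes produced by the current implementation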

I'd be happy to provide a simple PR for that, if that seems reasonable?

Uncaught Error: update() called before init()

Hey,

thanks for this nice library. I'm trying to implement chunk-based file hashing, which leads to the following error:

Uncaught Error: update() called before init()

This is strange, because I'm definitely calling init() before update(). I should mention that I'm using Svelte.

Code Snippet:

// global var where the output goes
let output = "";

// called "onchange / oninput" of the file element
async function fileUpdated(e) {
    if (e.detail && e.detail.files && e.detail.files[0]) {
        const file = e.detail.files[0];
        const sha1Hash = await createSHA1();
        sha1Hash.init();
        parseFile(file, 32 * 1024 * 1024 * 1024, (bytes) => {
            console.log(bytes.byteLength);
            if (bytes.byteLength > 0) {
                sha1Hash.update(bytes);
            }
        });
        output = sha1Hash.digest();
    }
}

function parseFile(file, chunkSize: number, callback) {
    const fileSize = file.size;
    let offset = 0;
    let chunkReaderBlock = null;
    let readEventHandler = function(evt) {
        if (evt.target.error == null) {
            offset += evt.target.result.length;
            callback(new Uint8Array(evt.target.result)); // callback for handling read chunk
        } else {
            console.log("Read error: " + evt.target.error);
            return;
        }
        if (offset >= fileSize) {
            console.log("Done reading file");
            return;
        }

        // off to the next chunk
        chunkReaderBlock(offset, chunkSize, file);
    }

    chunkReaderBlock = function(_offset, length, _file) {
        let r = new FileReader();
        let blob = _file.slice(_offset, length + _offset);
        r.onload = readEventHandler;
        r.readAsArrayBuffer(blob);
    }

    // now let's start the read with the first block
    chunkReaderBlock(offset, chunkSize, file);
}

Any ideas?
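One observation, hedged: FileReader delivers chunks asynchronously, so in the snippet above digest() runs before any chunk has been read, and the later update() calls then hit a hasher whose state was already finalized, which would likely produce exactly this error. A rework that awaits every chunk before digesting, using Blob.arrayBuffer() instead of FileReader (the 32 MiB chunk size is for illustration, and createSHA1 is assumed imported as before):

async function fileUpdated (e) {
  if (e.detail && e.detail.files && e.detail.files[0]) {
    const file = e.detail.files[0]
    const sha1Hash = await createSHA1()
    sha1Hash.init()
    const chunkSize = 32 * 1024 * 1024 // 32 MiB per read
    for (let offset = 0; offset < file.size; offset += chunkSize) {
      const buffer = await file.slice(offset, offset + chunkSize).arrayBuffer()
      sha1Hash.update(new Uint8Array(buffer))
    }
    output = sha1Hash.digest() // runs only after every chunk has been hashed
  }
}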

Output Uint8Array

Hex output is awkward for byte manipulation. Uint8Array output should be preferred.
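For what it's worth, the incremental API appears to support this already via digest('binary'), which returns a Uint8Array instead of a hex string — a hedged sketch (check the typings of your installed version):

// assuming: import { createXXHash64 } from 'hash-wasm'
const hasher = await createXXHash64()
hasher.init()
hasher.update(data)
const bytes = hasher.digest('binary') // Uint8Array rather than hex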

React-Native Support (Android and iOS)

I tried using the argon2id hashing function on React Native v0.70.5 and got an error.

Error:

Module must be either an ArrayBuffer or an Uint8Array (BufferSource), object given

I already have crypto and the other required libraries set up.

I have a dummy project set up on this repository.

sha1: all chunk file time > total file time

With SHA-1, calculating all the chunks of a file takes longer than calculating the hash of the entire file in one pass.


import { createSHA1 } from 'hash-wasm'

const chunkSize = 10 * 1024 * 1024
let wasmSha1 = null
const fileReader = new FileReader()

function getChunkCount(totalSize) {
	return Math.floor(totalSize / chunkSize)
}

function handleChunkSha1(chunk) {
	return new Promise((resolve, reject) => {
		fileReader.onload = async (e) => {
				if (!wasmSha1) {
					wasmSha1 = await createSHA1()
					wasmSha1.init()
				}
				const view = new Uint8Array(e.target.result)
				wasmSha1.update(view)
				resolve()
		}

		fileReader.readAsArrayBuffer(chunk)
	})
}

const readVideoFile = async (file) => {
	const maxCount = getChunkCount(file.size)

	for (let i = 0; i <= maxCount; i++) {
		const chunk = file.slice(chunkSize * 1, Math.max(chunkSize * (i + 1), file.size))
		await handleChunkSha1(chunk)
	}

	const hash = wasmSha1.digest()
	return Promise.resolve(hash)
}

readVideoFile(file)

The file is over 1 GB.

Who's using hash-wasm?

I've built a browser-based encryption tool based on the portable-secret project and replaced the default browser PBKDF2 key-derivation algorithm with argon2id, compiled as a WASM module and embedded into the secret.html file as a UMD bundle.

I really appreciate the work you've put into this package and the care given towards having a highly optimized implementation. I am able to tune the algorithm fairly aggressively to have good security while providing a fairly good experience for the majority of people (assuming the user has reasonably decent hardware).

Here are the parameters I'm using to tune the Argon2id implementation:

const ARGON2_PARALLELISM = 1; // browsers don't support multi-threading on the same isolate, so no point in increasing this
const ARGON2_ITERATIONS = 12; // key derivation (argon2id: https://soatok.blog/2022/12/29/what-we-do-in-the-etc-shadow-cryptography-with-passwords)
const ARGON2_MEMORY_SIZE = 131072; // 128 MiB
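For context, those constants feed straight into the argon2id call, roughly like this (a simplified sketch: the password/salt handling and the 32-byte hashLength are illustrative, not taken from the actual tool):

// assuming: import { argon2id } from 'hash-wasm'
const encoded = await argon2id({
  password: userPassword, // entered on the unlock form
  salt: crypto.getRandomValues(new Uint8Array(16)),
  parallelism: ARGON2_PARALLELISM,
  iterations: ARGON2_ITERATIONS,
  memorySize: ARGON2_MEMORY_SIZE, // in KiB, i.e. 128 MiB
  hashLength: 32,
  outputType: 'encoded'
})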

PS: Feel free to close this issue and move this to GitHub Discussions.

Random crash in Safari on demo page

URL: https://3w4be.csb.app/

According to @gojomo:

after entering a few different short strings, maybe 30 characters total, it manages to crash the Safari page, with a "This webpage was reloaded because a problem occurred" message.

I've never seen this issue with Chrome / Firefox, and I currently don't have a MacBook nearby to debug it myself.
I would be glad to see some detailed error messages / debugging info if somebody would manage to reproduce this.

Performance issues

My tests show different values:

jssha-sha1      - 135972.15 iterations / second (100000 / 0.74 sec.)
wasm-sha1       - 38596.03 iterations / second (100000 / 2.59 sec.)

The test is a simple hash-calculation function called in a loop.

jssha

const hash = new jsSHA( "SHA-1", "BYTES" );
hash.update( str );
hash.getHash( "HEX" );

hash-wasm

await sha1( str );

What am I doing wrong?
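Probably nothing is broken: the sha1() shorthand pays per-call overhead (promise scheduling plus copying the input into WASM memory), which dominates when the inputs are short strings. Reusing one incremental hasher avoids most of that overhead — a sketch:

// assuming: import { createSHA1 } from 'hash-wasm'
const sha1Hasher = await createSHA1() // create once, outside the benchmark loop

function sha1Hex (str) {
  sha1Hasher.init()
  sha1Hasher.update(str)
  return sha1Hasher.digest('hex')
}

// the benchmark loop then calls the synchronous sha1Hex(str) instead of await sha1(str)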

Feature Request: BLAKE3

Any chance https://github.com/BLAKE3-team/BLAKE3 could be implemented?
Though, most of the performance benefits would probably be from SIMD (I assume?).
I'm not sure how fleshed out SIMD is in WebAssembly; last time I checked, V8 had buggy support.
Either way, it would be interesting to see the performance difference with Blake2b in WASM.

Consider shipping wasm files to npm

I'm trying to use hash-wasm's argon2 implementation with Vercel's Edge runtime and I am running into a compilation error. It seems hash-wasm's dist files include a hard-coded base64-encoded string of the wasm, which is then run through WebAssembly.compile. Dynamic code evaluation isn't supported in that runtime (https://nextjs.org/docs/messages/edge-dynamic-code-evaluation).

Wasm files can be imported differently and handled by Next.js during bundling (e.g. import someWasm from './some.wasm?module'), but the wasm outputs from hash-wasm don't seem to be shipped as independent artifacts with the library code.

I imagine having access to the wasm files could be useful for other situations as well.

Would you be open to a PR to add this?

WebAssembly conflict with restrictive CSP in Chrome

First, thanks for this great package! I'm using it in my web application, where it works wonderfully, even for very large files (> 10 GB).

Unfortunately, if a page has a restrictive Content-Security-Policy that does not allow 'unsafe-eval' for script-src, WebAssembly.compile will fail with:

CompileError: WebAssembly.compile(): Wasm code generation disallowed by embedder

For more information see:
WebAssembly/content-security-policy#7
https://github.com/WebAssembly/content-security-policy/blob/master/proposals/CSP.md

I found a simple workaround which grabs the WebAssembly object from a page where 'unsafe-eval' is allowed and overwrites the object on the "main window". This is possible using this JavaScript:

// @ts-expect-error
window.WebAssembly = await (async () => new Promise((resolve) => {
	const iframe = document.createElement('iframe');

	// This page is using `Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-eval'`
	iframe.src = '/wasm/';

	iframe.addEventListener('load', () => {
		// @ts-expect-error
		const wasm = iframe.contentWindow?.WebAssembly ?? WebAssembly;

		document.body.removeChild(iframe);

		resolve(wasm);
	});

	document.body.appendChild(iframe);
}))();

So hash-wasm uses the WebAssembly object from the iframe with 'unsafe-eval' without even knowing about this "hack".

Since I don't like overwriting global objects, I think it would be a nice feature to be able to pass the WebAssembly object that hash-wasm should use directly to the library.

Weird performance of blake3 depending on presence of Chrome Dev Tools

Hi, thanks for your very useful library. I hit the following puzzling issues:

On https://daninet.github.io/hash-wasm-benchmark/ the blake3 1 MB benchmark gives me ~400 MB/s in both Chrome and Firefox.

But if I open the dev toolbar in Chrome, performance drops to 40 MB/s; in Firefox its dev tools do not affect performance.

Video:

hash-wasm-blake3-chrome-performance-depends-on-devtools.mp4

It gets even weirder, but perhaps it's the same issue.

My minimal example https://nh2.me/js-file-reading/js-file-slice-blob-arrayBuffer-blake3.html reads an actual real file from disk (which is based on your StackOverflow example).
If I open this in Chrome in a completely new tab, and select a 200 MiB file with random contents, I get 160 MB/s.

However, if I open it in Chrome in a completely new tab, then toggle the Dev Tools on+off (pressing F12 two times), then I get only 40 MB/s.

This means that the performance drops 4x depending on whether the Dev Tools were opened in this tab in the past.

Refreshing the tab with F5 does not fix the performance, only closing it re-opening the tab fully fixes it.

Firefox does not have this problem; its performance is unaffected by the Dev Tools.


Note how the two pages are different:

  • On your benchmark page, performance degrades only while Chrome Dev Tools are open; it recovers immediately once closed.
  • On my sample page, performance stays bad even after Chrome Dev Tools are closed.

Any clue why this difference might exist? Perhaps it's because your benchmark page is hashing in a worker?

It would be great to find out how to achieve reliable high-performance blake3 hashing.


Browser versions:

  • Chrome 111 on NixOS Linux, x86_64
  • Firefox 111 on NixOS Linux, x86_64

Thanks!

Feature Request: Calculate MD5 in parallel

It would be nice if we could compute the MD5 faster than is currently possible.

Is there any algorithm for calculating MD5 in parallel? Something like:

// worker1
hasher.update(new Uint8Array(buffer))
// worker 2
hasher.update(new Uint8Array(buffer))

// main worker
hasher.digest()

Thank you for your great work.

Performance issues on browser for pbkdf with sha512

This is my code:

var salt = 4;
var iterations = 16384;
var hashLenght = 256;
var newLabel = 'new time';
var oldLabel = 'old time';
var cachedHash = hashwasm.createSHA512();
var result = 'NXmIcqNIcO8iods77YNNtLk3PRlJ/KdCkhufUKbFr+d4w9pIyE8Kr/a87VZa8izvV6LLB35rIyEiK9HhKMRJcntx/AwKfzumBTAHuIdELIaMqXuKBv1dHhFTa+rL7KfMTp+F6LiLJY4LxYgEqmx5SgpRD2/s5qUaRJWWTGsAG19sY5aJ/2XhGRN1vhS55ziHwXx5xNfEj//77pGxU8lWMVYL/2eqbNN3DN8RC6KQPmfB+LXizfLfkJ6ZtYDLlvIul5ilz2o2IVFvibzaIpswwhoW1jVB+FL2GaZAXgW3+ce1AYrsW6TVZPbw1YH2VoCA6OTTOcbLONJhybZJdp95Nw==';

async function runOld(){
  
  console.log('starting old');
 var start = new Date();
  
  var textEncoder = new TextEncoder("utf-8");

  var passwordBuffer = textEncoder.encode(session);

  var importedKey = await
  crypto.subtle.importKey("raw", passwordBuffer, "PBKDF2", false,
      [ "deriveBits" ]);
    
  var params = {
    name : "PBKDF2",
    hash : "SHA-512",
    iterations : iterations
  };

  params.salt = textEncoder.encode(salt);

  console.log('old key: ', result === btoa(String.fromCharCode(...new Uint8Array(
      await crypto.subtle.deriveBits(params, importedKey, hashLenght * 8)))))

   var end = new Date();
  
  console.log((end - start) / 1000);
  
}

async function run(repeat) {

  console.log('starting new');

  var start = new Date();

  const key = await hashwasm.pbkdf2({
    password: session,
    salt: salt.toString(),
    iterations: iterations,
    hashLength: hashLenght,
    hashFunction: cachedHash,
    outputType: 'binary',
  });

  console.log('new key:', result === btoa(String.fromCharCode(...new Uint8Array(key))));
  var end = new Date();
  console.log((end - start) / 1000);
  

  
}

runOld();

run();

I'm getting very slow times from the WASM implementation: while Firefox's native implementation finishes in about 0.009 seconds, the WASM version takes over 3 seconds. Chromium gives me similar results for WASM, even if its native implementation is a bit slower than Firefox's. I tried running it twice in sequence, but that didn't change anything. You can see that I have cached createSHA512 too. My friend ran this code and got similar results. PS: note that the behavior differs between Firefox and Chrome: Firefox runs the WASM version only when the native one is done, while Chrome starts and finishes both at roughly the same time.

WebAssembly.instantiate(): Out of memory: Cannot allocate Wasm memory for new instance

Hi, I got this error: WebAssembly.instantiate(): Out of memory: Cannot allocate Wasm memory for new instance, while trying to calculate MD5 hashes of files in web workers in multiple tabs.

Here is my code snippet:

import { md5 } from 'hash-wasm';

onmessage = async (event) => {
  const files = event.data;
  console.log(`worker received ${files.length} files`);
  for (let i = 0; i < files.length; i++) {
    const file = files[i];
    const buffer = await file.arrayBuffer();
    const hash = await md5(new Uint8Array(buffer));
    postMessage([hash, file]);
  }
};

This works as expected while running a single application (single tab): it creates several web workers and calculates hashes in parallel without any issue. But when several instances of the same application are running (several tabs), I get that error.
Please help me figure out how to solve or avoid it, thank you.
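A hedged idea worth trying — it may or may not address the multi-tab allocation failure, but it reuses a single WASM instance per worker and avoids buffering each whole file in memory: create an incremental MD5 hasher once per worker and stream each file through it in slices.

import { createMD5 } from 'hash-wasm'

onmessage = async (event) => {
  const files = event.data
  const hasher = await createMD5() // one WASM instance per worker, reused for every file
  for (const file of files) {
    hasher.init()
    const sliceSize = 8 * 1024 * 1024 // 8 MiB slices instead of whole-file buffers
    for (let offset = 0; offset < file.size; offset += sliceSize) {
      const buffer = await file.slice(offset, offset + sliceSize).arrayBuffer()
      hasher.update(new Uint8Array(buffer))
    }
    postMessage([hasher.digest('hex'), file])
  }
}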

Distributing the minimal bundles

Hi, I'm using this library from ClojureScript with shadow-cljs, and to get proper code minification I believe I need to require the individual algorithms from their own separate files. It looks like you have a script, rollup-min.config.js, to build these, but the dist/ folder only seems to include the big index.*.js files. Would it be possible to distribute the small per-algorithm files too, or is there some explicit way to have yarn install the rollup-min output instead? I'm not very familiar with the JS tooling here, so thanks for the help.

Feature Request: 32 bit hash output

It would be nice if we could get a Number out of, e.g., xxHash. Right now I have to parse the output hex, which is rather slow. I'm using the hash to calculate an index into a form of hash map.
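As a stopgap until a numeric output exists, xxhash32 already yields exactly 32 bits, and an 8-character hex string converts to a Number exactly (the value stays below 2^32, well within double precision) — a sketch:

// assuming: import { xxhash32 } from 'hash-wasm'
async function hashToIndex (key, bucketCount) {
  const hex = await xxhash32(key) // 8 hex characters
  const n = parseInt(hex, 16) // unsigned 32-bit value as a Number
  return n % bucketCount
}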
