Using this library to read large zip files in browsers (2-80MB zipped, 15-300MB+ unzip

stuk commented on June 29, 2024

memory efficiency improvements

from jszip.

Comments (4)

dduponchel commented on June 29, 2024

Thanks for the report!

I was going to say that the indexOf method could find the wrong JSZip.signature.DATA_DESCRIPTOR, but the current method has the same flaw. If the file being extracted is itself a zip file with data descriptors, it won't be unzipped correctly.
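To illustrate the flaw described above: a scan for the first data descriptor signature stops too early when the entry's payload is itself a zip containing the same signature. This is only a minimal sketch, assuming the signature bytes `PK\x07\x08`; the `HDR:` prefix and the payload strings are made-up placeholders, not real zip structures.

```javascript
// Data descriptor signature 0x08074b50, as it appears in the byte stream.
var DATA_DESCRIPTOR = "PK\x07\x08";

// Hypothetical layout: the outer entry's payload is a nested zip that
// carries its own data descriptor, followed by the outer descriptor.
var innerZipBytes = "inner-data" + DATA_DESCRIPTOR + "inner-crc-and-sizes";
var outerPayloadStart = 4; // length of the placeholder "HDR:" prefix
var stream = "HDR:" + innerZipBytes + DATA_DESCRIPTOR + "outer-crc-and-sizes";

// Naive scan: take the first signature found after the payload start.
var naiveEnd = stream.indexOf(DATA_DESCRIPTOR, outerPayloadStart);

// Where the outer payload really ends.
var realEnd = outerPayloadStart + innerZipBytes.length;

// The naive scan stopped inside the nested zip.
console.log(naiveEnd < realEnd);
```

Any approach that searches for the signature, whether a hand-written loop or indexOf, shares this weakness; the sizes recorded in the central directory do not.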

Your patch is a nice improvement over the existing code, but the whole findDataUntilDataDescriptor method could be deleted. I have a related patch (which removes this method and fixes the nested data descriptors bug) waiting on my machine, but I never finished or pushed it, sorry about that :-(
I just pushed it to my branch issue30. I'll create a pull request for review as soon as I'm sure the unit tests pass everywhere (I'll do that tomorrow). Is that OK with you?

A note about the inflate and deflate files: the implementation might change (to a more robust one) or new compression methods might be added, so the compress/uncompress interface must remain generic and easy to implement.
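As a rough picture of what "generic and easy to implement" could mean, here is a sketch in which every compression method exposes the same two functions and is looked up by its zip method code. The registry shape and the names `compressions` and `findCompressionByMethod` are assumptions made for illustration, not the library's actual code; only the method codes (0 for stored, 8 for deflate) come from the zip format.

```javascript
// Hypothetical registry: each method exposes the same two functions.
var compressions = {
    STORE: {
        method: 0, // zip "stored" entries: no compression at all
        compress: function (content) { return content; },
        uncompress: function (content) { return content; }
    }
    // A DEFLATE entry (method 8) would plug in here with the same shape.
};

// Returns the registered compression for a zip method code, or null.
function findCompressionByMethod(method) {
    for (var name in compressions) {
        if (compressions.hasOwnProperty(name) &&
                compressions[name].method === method) {
            return compressions[name];
        }
    }
    return null;
}

var store = findCompressionByMethod(0);
var roundTrip = store.uncompress(store.compress("hello"));
var unknown = findCompressionByMethod(99);
```

With an interface like this, a caller never needs to know which algorithm it is talking to, which is why changing what `uncompress` receives affects every method at once.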

Lazily decompressed files are an interesting feature (and I don't have a patch lying around for this!). A way to convert a compressed string into an object without loading the whole decompressed string in memory could be nice too (a new method on ZipObject and a lazily decompressed file may be the easiest way to implement it).
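One way to picture a lazily decompressed file is an object that holds on to the compressed data and only inflates it on first access. The sketch below is an assumption about how that might look: `LazyEntry` and `asText` are hypothetical names, and the `uncompress` argument is a stand-in for whatever inflate function is in use.

```javascript
// Hypothetical lazy entry: decompression is deferred until first access.
function LazyEntry(compressedData, uncompress) {
    this._compressed = compressedData;
    this._uncompress = uncompress;
    this._content = null;
    this._inflated = false;
}

LazyEntry.prototype.asText = function () {
    if (!this._inflated) {
        this._content = this._uncompress(this._compressed);
        this._compressed = null; // let the compressed copy be collected
        this._inflated = true;
    }
    return this._content;
};

// Usage with a stand-in "inflate" that just counts how often it runs.
var calls = 0;
var entry = new LazyEntry("COMPRESSED", function (data) {
    calls++;
    return data.toLowerCase();
});
var first = entry.asText();  // inflates now
var second = entry.asText(); // served from the cached result
```

Dropping the compressed copy after the first access is a design choice in this sketch; keeping it instead would allow the decompressed string to be released and rebuilt later.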

martingraham commented on June 29, 2024

Thanks for the quick reply,

That sort of segues into the next "improvement" (in my mind) I did, which is not bothering to extract the compressed data as a substring, instead just passing the whole zip string and an offset to the inflate method.

I've put the change into your new jszip-load.js like so:

    // Basically a position in the entire zip file
    var fileStats = {start: reader.index, cdata: reader.stream};
    //this.compressedFileData = reader.readString(this.compressedSize);

    compression = findCompression(this.compressionMethod);
    if (compression === null) { // no compression found
        throw new Error("Corrupted zip : compression " + pretty(this.compressionMethod) +
                        " unknown (inner file : " + this.fileName + ")");
    }
    //this.uncompressedFileData = compression.uncompress(this.compressedFileData);
    this.uncompressedFileData = compression.uncompress(fileStats);

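and in jszip-inflate.js I changed the inflate method to do this:

function zip_inflate (fileStats) {
    console.log ("inflating zip file v2");
    var out, buff;
    var i, j;

    zip_inflate_start();
    // Read straight from the whole zip string, starting at this file's offset.
    zip_inflate_data = fileStats.cdata;
    zip_inflate_pos = fileStats.start;

    buff = new Array(1024);  // byte values of the current inflated chunk
    var bigout = [];         // one joined string per chunk
    out = [];                // single characters of the current chunk
    var k = 0;

    while((i = zip_inflate_internal(buff, 0, buff.length)) > 0) {
        // Stage 1: join the characters of this chunk into one string.
        out.length = 0;
        for(j = 0; j < i; j++) {
            out[j] = String.fromCharCode(buff[j]);
        }
        bigout[k] = out.join("");
        k++;
    }
    zip_inflate_data = null; // G.C.
    // Stage 2: join the chunk strings into the final result.
    return bigout.join("");
}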

Basically there are two changes here. The first changes the read-character routine to collect characters in buffers that are joined, rather than concatenating strings. Online sources say this is kinder to memory, especially in older browsers (though it's hard to find sources that discuss memory efficiency rather than speed). The join is a two-stage affair because, judging by memory use in Task Manager, it seemed to use less peak memory than a one-stage join in the five main browsers I've been trying to get this working in (Chrome, IE, FF, Opera, Safari).

The second change is that zip_inflate_data is set to the cdata field of the object I pass in (and that is just the entire zip file as a string), and I set zip_inflate_pos to the start position of the file I want decompressed. This, for my files at least, seems to work straight off the bat. I thought I'd have to go hunting for an end character or know the end point or something but that seems to be dealt with in the inflation routines. Again, this is just for the few big old zips I've tested... I'd guess you'd know better whether this trips up any other type of inflating... you did warn there are other, and will be other, ways of inflating data.
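The idea of handing over the whole zip string plus an offset, instead of a substring copy, can be sketched with a small reader. `makeByteReader` is a hypothetical helper, not part of the inflate code; it only assumes the `{cdata, start}` shape used above. One likely reason no explicit end point was needed is that a DEFLATE stream flags its own final block, so the inflater can stop by itself.

```javascript
// Hypothetical reader over { cdata, start }: no substring copy is made.
function makeByteReader(fileStats) {
    var data = fileStats.cdata;
    var pos = fileStats.start;
    return {
        // Returns the next byte (0-255), or -1 past the end of the input.
        next: function () {
            if (pos >= data.length) {
                return -1;
            }
            return data.charCodeAt(pos++) & 0xff;
        },
        // Current absolute position in the whole zip string.
        position: function () {
            return pos;
        }
    };
}

// Usage: read three bytes that start at offset 6 of a placeholder string.
var wholeZip = "HEADERabcTRAILER";
var byteReader = makeByteReader({cdata: wholeZip, start: 6});
var bytesRead = [byteReader.next(), byteReader.next(), byteReader.next()];
var endPosition = byteReader.position();
```

After decoding, the reader's position tells the caller where the compressed data ended, which is the information the substring approach needed up front.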

(From this point I'm now exploring sending each of those out[] buffers to a routine that strips out data it doesn't want before doing a join, mainly by doing delimiter counts, hopefully reducing the memory footprint - frankly the whole of my 'memory efficient' quest revolves around creating as few new strings as possible - and making them as small as they need to be if I do so)
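A per-chunk filter based on delimiter counts might look like the sketch below. The post does not say which data is kept, so the choice here (one column of comma-separated rows) and the name `makeColumnFilter` are purely illustrative assumptions. The point is that the column counter lives outside the per-chunk call, so rows that straddle a chunk boundary are still handled.

```javascript
// Hypothetical filter: keeps only column `wanted` of each delimited row.
// The current column is remembered across chunk boundaries.
function makeColumnFilter(wanted, delimiter, rowEnd) {
    var column = 0;
    return function (chunk) {
        var kept = [];
        for (var i = 0; i < chunk.length; i++) {
            var ch = chunk.charAt(i);
            if (ch === rowEnd) {
                column = 0;      // a new row starts after this character
                kept.push(ch);
            } else if (ch === delimiter) {
                column++;        // count delimiters to track the column
            } else if (column === wanted) {
                kept.push(ch);
            }
        }
        return kept.join("");
    };
}

// Usage: the second row ("d,ee,f") is split across two chunks.
var keepSecondColumn = makeColumnFilter(1, ",", "\n");
var inflatedChunks = ["a,bb,c\nd,e", "e,f\n"];
var filteredPieces = [];
for (var c = 0; c < inflatedChunks.length; c++) {
    filteredPieces.push(keepSecondColumn(inflatedChunks[c]));
}
var filtered = filteredPieces.join("");
```

Only the filtered pieces are joined at the end, so the full decompressed string never has to exist in memory.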

martingraham commented on June 29, 2024

One other reason I've fixated on the use of Strings within the code is that JavaScript strings use 2 bytes per character, whereas for a binary file such as a zip file just 1 should be sufficient. So I wondered what would happen if I read in the initial zip as an ArrayBuffer and, rather than turning it into a String in the JSZip.utils functions, changed the extraction to work on the ArrayBuffer (actually the Uint8Array view of it). I've managed to get this working with various modifications (that shouldn't break it for processing Strings), and I can now read in and conditionally unzip files Chrome et al. wouldn't touch a couple of weeks ago. Would you be interested in me branching the code, or should I just mail you what I've got with comments?
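A reader that accepts either a binary String or a Uint8Array could be sketched as follows. `makeDataReader`, `byteAt` and `readInt` are hypothetical names chosen for this illustration; the actual modifications are not shown in the thread. The example value is the zip local file header signature, 0x04034b50, stored little-endian as the bytes `PK\x03\x04`.

```javascript
// Hypothetical reader that hides whether the zip is a String or a Uint8Array.
function makeDataReader(data) {
    var isBytes = (typeof Uint8Array !== "undefined") &&
                  (data instanceof Uint8Array);
    return {
        length: data.length,
        // Returns the byte (0-255) at index i for either input type.
        byteAt: isBytes
            ? function (i) { return data[i]; }
            : function (i) { return data.charCodeAt(i) & 0xff; },
        // Reads a little-endian unsigned integer of `size` bytes at `offset`.
        readInt: function (offset, size) {
            var result = 0;
            for (var i = offset + size - 1; i >= offset; i--) {
                // Multiply instead of shifting to stay unsigned for 4 bytes.
                result = (result * 256) + this.byteAt(i);
            }
            return result;
        }
    };
}

// Usage: the same signature read from both representations.
var fromString = makeDataReader("PK\x03\x04").readInt(0, 4);
var fromBytes = makeDataReader(
    new Uint8Array([0x50, 0x4b, 0x03, 0x04])).readInt(0, 4);
```

Because the rest of the code only calls `byteAt` and `readInt`, the String path keeps working for browsers without typed arrays.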

dduponchel commented on June 29, 2024

If you can push your changes to a branch, that would be great!
