
Comments (4)

dduponchel commented on June 29, 2024

Thanks for the report!

I was going to say that the indexOf method could find the wrong JSZip.signature.DATA_DESCRIPTOR, but the current method has the same flaw. If the file being extracted is itself a zip file with data descriptors, the search stops at the inner descriptor and the file won't be unzipped correctly.
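To make the flaw concrete, here is a minimal sketch (not JSZip code). It assumes the data descriptor signature is the four bytes "PK\x07\x08"; the variable names and the fake payload are made up for illustration.

```javascript
// Hypothetical illustration: searching for the data descriptor signature
// with indexOf stops at the first match, even when that match is part of
// the entry's own payload (for example, a nested zip).
var DATA_DESCRIPTOR = "PK\x07\x08";

// Payload of the outer entry; it embeds the signature bytes itself.
var payload = "inner" + DATA_DESCRIPTOR + "more inner data";

// The outer stream: payload, then the outer entry's real data descriptor.
var stream = payload + DATA_DESCRIPTOR + "crc+sizes";

var naiveEnd = stream.indexOf(DATA_DESCRIPTOR); // first match, inside the payload
var realEnd = payload.length;                   // where the payload really ends

console.log("naive end:", naiveEnd, "real end:", realEnd);
```

The naive search cuts the payload short, so whatever is handed to the decompressor is truncated.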

Your patch is a nice improvement over the existing code, but the whole findDataUntilDataDescriptor method could be deleted. I have a related patch (which removes this method and fixes the nested data descriptors bug) waiting on my machine, but I hadn't finished or pushed it, sorry about that :-(
I just pushed it on my branch issue30. I'll create a pull request for review as soon as I'm sure the unit tests pass everywhere (I'll do that tomorrow). Is that OK for you?

A note about the inflate and deflate files: the implementation might change (for a more robust one) or new compression methods might be added, so the compress/uncompress interface must remain generic and easy to implement.
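As a rough sketch of what "generic and easy to implement" could look like (an assumption for illustration, not the actual JSZip source): each method is an object with a magic value plus a compress/uncompress pair, and callers look it up without knowing which method they get. The names `compressions` and `findCompressionByMagic` are hypothetical.

```javascript
// Hypothetical registry: every compression method exposes the same shape.
var compressions = {
    STORE: {
        magic: "\x00\x00", // compression method 0: stored, no compression
        compress: function (content) { return content; },
        uncompress: function (content) { return content; }
    }
    // A DEFLATE entry would plug in here with its own compress/uncompress.
};

// Look a method up by its magic value; null means "unknown compression".
function findCompressionByMagic(magic) {
    for (var name in compressions) {
        if (compressions.hasOwnProperty(name) &&
                compressions[name].magic === magic) {
            return compressions[name];
        }
    }
    return null;
}
```

With this shape, changing what uncompress accepts (a substring, or a whole stream plus an offset) affects every method at once, which is why the interface should stay small.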

Lazily decompressed files are an interesting feature (and I don't have any sleeping patch for this!). A way to convert a compressed string into an object without loading the whole decompressed string in memory could be nice too (a new method on ZipObject and a lazily decompressed file may be the easiest way to implement it).
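One possible shape for a lazily decompressed file, sketched under assumptions: `LazyFile` and `asText` are hypothetical names, and `uncompress` stands for whatever function the compression method provides.

```javascript
// Hypothetical sketch: keep the compressed data and only decompress
// the first time the content is actually requested.
function LazyFile(compressedData, uncompress) {
    var content = null;
    var decompressed = false;

    this.asText = function () {
        if (!decompressed) {
            content = uncompress(compressedData);
            decompressed = true;
            compressedData = null; // let the compressed copy be collected
        }
        return content;
    };
}

// Usage with a stand-in "uncompress" that just upper-cases its input.
var calls = 0;
var file = new LazyFile("hello", function (data) {
    calls++;
    return data.toUpperCase();
});
var first = file.asText();
var second = file.asText(); // served from the cached result
```

Files that are never read are never decompressed, so only the entries the caller touches cost memory.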

from jszip.

martingraham commented on June 29, 2024

Thanks for the quick reply,

That segues into the next "improvement" (in my mind) I made: not extracting the compressed data as a substring at all, and instead passing the whole zip string and an offset to the inflate method.

I've put the change into your new jszip-load.js like so:

    var fileStats = {start: reader.index, cdata: reader.stream}; // Basically a position in the entire zip file
    //this.compressedFileData = reader.readString(this.compressedSize);

    compression = findCompression(this.compressionMethod);
    if (compression === null) { // no compression found
        throw new Error("Corrupted zip : compression " + pretty(this.compressionMethod) +
                        " unknown (inner file : " + this.fileName + ")");
    }
    //this.uncompressedFileData = compression.uncompress(this.compressedFileData);
    this.uncompressedFileData = compression.uncompress(fileStats);

and in jszip-inflate.js I change the inflate method to do this:

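function zip_inflate (fileStats) {
    console.log ("inflating zip file v2");
    var out, buff;
    var i, j;

    zip_inflate_start();
    // Point the inflater at the whole zip string and the entry's offset
    // instead of a substring holding only the compressed data.
    zip_inflate_data = fileStats.cdata;
    zip_inflate_pos = fileStats.start;

    buff = new Array(1024);
    var bigout = []; // one joined string per 1024-byte chunk
    out = [];        // characters of the current chunk, reused each pass
    var k = 0;

    while((i = zip_inflate_internal(buff, 0, buff.length)) > 0) {
        out.length = 0;
        for(j = 0; j < i; j++) {
            out[j] = String.fromCharCode(buff[j]);
        }
        bigout[k] = out.join(""); // first-stage join: characters -> chunk
        k++;
    }
    zip_inflate_data = null; // G.C.
    return bigout.join("");  // second-stage join: chunks -> result
}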

Basically there are two changes here. The first is changing the read-character routine to use buffers that are joined rather than concatenating strings. Online sources say this is kinder to memory, especially in older browsers (though it's hard to find sources that discuss memory efficiency rather than speed). The join is a two-stage affair because, judging by memory use in Task Manager, it seemed to use less peak memory than a one-stage join in the five main browsers I've been trying to get this to work with (Chrome, IE, FF, Opera, Safari).
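Pulled out of the inflater, the two-stage join looks roughly like this. It is a standalone sketch: `bytesToString` and its `chunkSize` parameter are hypothetical names, and the input is assumed to be an array of byte values.

```javascript
// Hypothetical helper: convert byte values to a string in two stages.
// Stage 1 joins the characters of each chunk; stage 2 joins the chunks.
function bytesToString(bytes, chunkSize) {
    chunkSize = chunkSize || 1024;
    var pieces = []; // one string per chunk
    var chars = [];  // reused for every chunk
    for (var start = 0; start < bytes.length; start += chunkSize) {
        var end = Math.min(start + chunkSize, bytes.length);
        chars.length = 0;
        for (var i = start; i < end; i++) {
            chars[i - start] = String.fromCharCode(bytes[i]);
        }
        pieces.push(chars.join(""));
    }
    return pieces.join("");
}
```

The per-chunk array never grows beyond `chunkSize` entries, which is one plausible reason for a lower peak than a single join over one array holding every character. That explanation is a guess, not something measured here.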

The second change is that zip_inflate_data is set to the cdata field of the object I pass in (which is just the entire zip file as a string), and zip_inflate_pos is set to the start position of the file I want decompressed. For my files at least, this works straight away. I thought I'd have to go hunting for an end marker or know the end point, but the inflation routines seem to detect the end of the compressed data themselves. Again, this is only for the few big old zips I've tested. I'd guess you'd know better whether this trips up any other kind of inflating, since you did warn that there are, and will be, other ways of inflating data.

(From this point I'm now exploring sending each of those out[] buffers to a routine that strips out data it doesn't want before doing a join, mainly by doing delimiter counts, hopefully reducing the memory footprint - frankly the whole of my 'memory efficient' quest revolves around creating as few new strings as possible - and making them as small as they need to be if I do so)

from jszip.

martingraham commented on June 29, 2024

One other reason I've fixated on the use of Strings within the code is that JavaScript strings use 2 bytes per character, whereas for a binary file such as a zip file just 1 should be sufficient. So I wondered what would happen if I read in the initial zip as an ArrayBuffer and, rather than turning it into a String in the JSZip.utils functions, changed the extraction to work on the ArrayBuffer (actually the Uint8Array view of it). I've managed to get this working with various modifications (that shouldn't break it for processing Strings), and I can now read in and conditionally unzip files Chrome et al. wouldn't touch a couple of weeks ago. Would you be interested in me branching the code, or should I just mail you what I've got with comments?
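A minimal sketch of how one reader could serve both Strings and Uint8Arrays (my modifications are not shown here; `makeByteReader` and `readInt` are hypothetical names for illustration):

```javascript
// Hypothetical adapter: expose the same byteAt(i) over a binary string
// or a Uint8Array, so the parsing code does not care which it was given.
function makeByteReader(data) {
    if (typeof data === "string") {
        return {
            length: data.length,
            byteAt: function (i) { return data.charCodeAt(i) & 0xff; }
        };
    }
    return {
        length: data.length,
        byteAt: function (i) { return data[i]; }
    };
}

// Read a little-endian unsigned integer of `size` bytes at `offset`.
// Multiplication is used instead of << to avoid signed 32-bit overflow.
function readInt(reader, offset, size) {
    var result = 0;
    for (var i = offset + size - 1; i >= offset; i--) {
        result = result * 256 + reader.byteAt(i);
    }
    return result;
}

// The local file header signature, read from both representations.
var fromString = readInt(makeByteReader("PK\x03\x04"), 0, 4);
var fromBytes = readInt(makeByteReader(new Uint8Array([0x50, 0x4b, 0x03, 0x04])), 0, 4);
```

Both calls yield the same number, so everything above the reader can stay unchanged while the underlying storage drops to 1 byte per element.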

from jszip.

dduponchel commented on June 29, 2024

If you can push your changes to a branch, that would be great!

from jszip.
