decompress's Introduction

decompress

Extracting archives made easy

See decompress-cli for the command-line version.

Install

$ npm install decompress

Usage

const decompress = require('decompress');

decompress('unicorn.zip', 'dist').then(files => {
	console.log('done!');
});

API

decompress(input, [output], [options])

Returns a Promise for an array of files in the following format:

{
	data: Buffer,
	mode: Number,
	mtime: String,
	path: String,
	type: String
}
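
For example, to inspect the extracted entries without writing anything to disk, the output argument can simply be omitted (a minimal sketch; unicorn.zip is a placeholder):

decompress('unicorn.zip').then(files => {
	for (const file of files) {
		console.log(file.path, file.type, file.data.length);
	}
});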

input

Type: string | Buffer

File to decompress.

output

Type: string

Output directory.

options

filter

Type: Function

Filter out files before extracting. E.g.:

decompress('unicorn.zip', 'dist', {
	filter: file => path.extname(file.path) !== '.exe'
}).then(files => {
	console.log('done!');
});

Note that in the current implementation, filter is only applied after fully reading all files from the archive into memory. Do not rely on this option to limit the amount of memory used by decompress to the size of the files included by filter. decompress will read the entire compressed file into memory regardless.

map

Type: Function

Map files before extracting. E.g.:

decompress('unicorn.zip', 'dist', {
	map: file => {
		file.path = `unicorn-${file.path}`;
		return file;
	}
}).then(files => {
	console.log('done!');
});

plugins

Type: Array
Default: [decompressTar(), decompressTarbz2(), decompressTargz(), decompressUnzip()]

Array of plugins to use.
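
For example, to accept only gzipped tarballs (a sketch assuming the decompress-targz plugin is installed separately):

const decompress = require('decompress');
const decompressTargz = require('decompress-targz');

decompress('unicorn.tar.gz', 'dist', {
	plugins: [decompressTargz()]
}).then(files => {
	console.log('done!');
});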

strip

Type: number
Default: 0

Remove leading directory components from extracted files.
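
For example, with strip: 1 an entry archived as unicorn-dir/unicorn.png is written to dist/unicorn.png (a sketch; the names are placeholders):

decompress('unicorn.zip', 'dist', {
	strip: 1
}).then(files => {
	console.log('done!');
});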

License

MIT © Kevin Mårtensson

decompress's People

Contributors

doesdev, floatdrop, glasser, kevva, shinnn, sindresorhus, trptcolin, tschaub, vsonix-bub

decompress's Issues

mout is a hell heavy dependency

I tracked down this dependency from yeoman-generator and it takes about 150ms to require. Can we replace these lines with generic functions? It would speed up yo roughly twofold.

P.S. Here is the dependency path I traced: yo → yeoman-generator → fetch → download → decompress, and some numbers:

  decompress requires begin +0ms
  decompress required AdmZip +38ms
  decompress required fs +0ms
  decompress required mkdirp +2ms
  decompress required mout +135ms
  decompress required path +0ms
  decompress required stream-combiner +2ms
  decompress required rimraf +1ms
  decompress required tar +49ms
  decompress required temp +3ms
  decompress required zlib +0ms

Missing dependencies

Hello there,

First, thank you for building such a useful module.

Here is my problem. When I try to run decompress, it asks for the 'is-absolute' module. Is that normal?

Option to exclude files

Is there a way to exclude files from decompressing (like the --exclude flag in the tar command)?
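
A sketch of one way to approximate this with the current API, using the filter option (the archive name and pattern are placeholders; note the memory caveat documented above):

decompress('unicorn.zip', 'dist', {
	filter: file => !/\.exe$/.test(file.path)
}).then(files => {
	console.log('done!');
});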

endsWith error on zip file

OK in version 1.6, but broken in versions 1.9 and 2.x.

TypeError: Object function (obj, val) {
	var ret = find(Object.keys(obj), function (ret) {
		return endsWith(val, ret);
	});

	return ret ? obj[ret] : null;
} has no method 'endsWith'
    at Decompress._getExtractor (/Users/lcalvy/Documents/Boulot/Sources/coorpacademy/node_modules/decompress/index.js:89:19)
    at new Decompress (/Users/lcalvy/Documents/Boulot/Sources/coorpacademy/node_modules/decompress/index.js:40:27)
    at module.exports (/Users/lcalvy/Documents/Boulot/Sources/coorpacademy/node_modules/decompress/index.js:168:22)

Unzip & strip on Windows discards the complete folder structure

When I try to extract a *.zip archive on my Windows machine in combination with the { strip: 1 } option, the whole folder structure is discarded and files with identical names are overwritten.

How to reproduce:

var download = require('download');

download('https://github.com/vsonix-bub/github-download-repo-test/archive/master.zip','.',{});

var fs = require('fs');
var decompress = require('decompress');

fs.createReadStream('master.zip').pipe(decompress({ ext: '.zip', strip: 1 }));

The error is also present when decompress is invoked using the download module:

download('https://github.com/vsonix-bub/github-download-repo-test/archive/master.zip','.',{ extract: true, strip: 1 });

Expected result, and the actual result on Linux machines (the content of each file is shown after the colon):

interesting_file.txt : root
sub/interesting_file.txt : sub
sub/sub_a/interesting_file.txt : sub_a 
sub/sub_b/interesting_file.txt : sub_b

Result on Windows:

interesting_file.txt : sub_b

System Info: Windows 7 SP 1 64-bit, Node v0.10.28, decompress 0.2.3, download 0.1.17

Windows does not identify folders correctly in zip

I'm trying to extract https://github.com/HaxeFoundation/haxe/releases/download/3.2.1/haxe-3.2.1-win.zip using decompress 4.0.0, but get a whole bunch of errors like

Error: EISDIR: illegal operation on a directory, open '...\haxe-3.2.1\std\flash\geom'
    at Error (native)
  errno: -4068,
  code: 'EISDIR',
  syscall: 'open',
  path: '...\haxe-3.2.1\std\flash\geom'

from file entries like

{ mode: 420,
  mtime: 2015-10-11T11:00:44.000Z,
  path: 'haxe-3.2.1/std/flash/geom/',
  type: 'file',
  data: <Buffer > }

It looks like the directories in that archive are marked as type: file (instead of directory), but oddly enough it only happens for that particular archive.

A quick fix I found was to treat files as directories if their path ends with a '/' (i.e. if (x.type === 'directory' || x.path.endsWith('/')) { /* mkdir -p */ }), but I don't know if that's a good idea or not.
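
A minimal sketch of that workaround expressed with the map option, assuming directory entries in this archive always end in '/':

decompress('haxe-3.2.1-win.zip', 'dist', {
	map: file => {
		// Reclassify entries that are directories mislabeled as files.
		if (file.type === 'file' && file.path.endsWith('/')) {
			file.type = 'directory';
		}
		return file;
	}
}).then(files => {
	console.log('done!');
});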

started getting: TypeError: Invalid non-string/buffer chunk

This started last night.

Full stack trace:

TypeError: Invalid non-string/buffer chunk
    at chunkInvalid (node_modules/decompress/node_modules/stream-combiner2/node_modules/readable-stream/lib/_stream_readable.js:421:10)
    at readableAddChunk (node_modules/decompress/node_modules/stream-combiner2/node_modules/readable-stream/lib/_stream_readable.js:176:12)
    at DuplexWrapper.Readable.push (node_modules/decompress/node_modules/stream-combiner2/node_modules/readable-stream/lib/_stream_readable.js:162:10)
    at DestroyableTransform.<anonymous> (node_modules/decompress/node_modules/stream-combiner2/node_modules/duplexer2/index.js:50:15)
    at DestroyableTransform.emit (events.js:95:17)
    at DestroyableTransform.<anonymous> (node_modules/decompress/node_modules/vinyl-fs/node_modules/through2/node_modules/readable-stream/lib/_stream_readable.js:786:14)
    at DestroyableTransform.emit (events.js:92:17)
    at emitReadable_ (node_modules/decompress/node_modules/vinyl-fs/node_modules/through2/node_modules/readable-stream/lib/_stream_readable.js:448:10)
    at emitReadable (node_modules/decompress/node_modules/vinyl-fs/node_modules/through2/node_modules/readable-stream/lib/_stream_readable.js:444:5)
    at readableAddChunk (node_modules/decompress/node_modules/vinyl-fs/node_modules/through2/node_modules/readable-stream/lib/_stream_readable.js:187:9)

It happens with all the formats I am using (tar.bz2, tar.gz, tar). I am not using zip decompression, as it has the permissions issue.

Node <4 support with Babel

Would you consider supporting older versions of Node.js via Babel transpilation?

I'm offering help with PR on this, if you like.

In any case, thank you for this lib; it is the only one that works correctly with XLSX files for me.

decompress applies filter after fully reading all files into memory, and documentation is misleading

The documentation of the filter option is misleading:

Filter out files before extracting

That's not really true. When you use decompress (say, with decompress-tar or decompress-unzip), it fully decompresses the entire tarball into memory before applying the filter. If you're not using the output feature, the filter option is entirely equivalent to just using Array.filter yourself on the return value.
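
For illustration, a minimal sketch of that equivalence (the archive name is a placeholder; this assumes options may be passed in place of the optional output argument):

// These produce the same set of files when nothing is written to disk:
decompress('unicorn.tar', {filter: file => file.type === 'file'});
decompress('unicorn.tar').then(files => files.filter(file => file.type === 'file'));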

It would make a lot more sense to me if decompress plugins ran filter as soon as they started processing a file, and dropped data from filtered-out files instead of filling RAM with garbage.

I recognize that this would be a backwards-incompatible change (as currently filter callbacks have access to data), so probably a new "pre-filter" option should be added.

#57 is a PR to add documentation, though I'd suggest fixing the issue instead of documenting it.

Huge zip file leads to "ENFILE: file table overflow"

Hello,

Trying to decompress a zip file which itself contains many files (approx. 18k) leads to an ENFILE: file table overflow error if I specify an output path.

From what I understand, extracted files are written to disk simultaneously, which in my case leads to an overwhelming number of file descriptors being open at the same time.

Is there any known workaround?

Thanks!
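
One possible workaround, sketched under the assumption that the archive fits in memory: omit the output argument and write the entries to disk one at a time, so only one file is open at any moment ('huge.zip' and 'dist' are placeholders):

const fs = require('fs');
const path = require('path');
const decompress = require('decompress');

decompress('huge.zip').then(files =>
	files.reduce((chain, file) => chain.then(() => {
		const dest = path.join('dist', file.path);
		if (file.type === 'directory') {
			fs.mkdirSync(dest, {recursive: true});
		} else {
			// Ensure the parent directory exists, then write synchronously.
			fs.mkdirSync(path.dirname(dest), {recursive: true});
			fs.writeFileSync(dest, file.data, {mode: file.mode});
		}
	}), Promise.resolve())
).then(() => console.log('done'));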

Exception RangeError: out of range index

I got an error using decompress on AWS Lambda. Here is the stack trace from trying to unzip a file; AWS Lambda runs Node 6.10.

at RangeError (native)
at BufferSlicer.read (/var/task/src/handler.js:13590:16)
at readAndAssertNoEof (/var/task/src/handler.js:35297:11)
at ZipFile.readEntry (/var/task/src/handler.js:34927:4)
at Promise (/var/task/src/handler.js:12198:7)
at extractFile (/var/task/src/handler.js:12195:29)

Here is the source of the function; line 35297 is the this.buffer.copy call:

	BufferSlicer.prototype.read = function(buffer, offset, length, position, callback) {
	  var end = position + length;
	  var delta = end - this.buffer.length;
	  var written = (delta > 0) ? delta : length;
	  this.buffer.copy(buffer, offset, position, end);
	  setImmediate(function() {
	    callback(null, written);
	  });
	};

Incorrectly decompressing through multiple streams

When attempting to extract a tar.gz archive of WordPress, Decompress correctly used Decompress.targz to extract everything, but then continued to pass those files through the other decompression plugins. Eventually, it encountered a .xap file, which it attempted to unzip, resulting in an error due to the files being Windows files.

Obviously this is not the desired behavior. I can see addressing this in a few ways:

  1. Decompress.zip should not attempt to decompress .xap files. Perhaps isZip() should return false, or there should be some other short circuit here. While it technically is a zip file, it is not meant to be decompressed; I'm sure there are a lot of file extensions that fall into this category. Maybe there should be a blacklist for these types of extensions?
  2. Some better handling of the decompression plugins. Perhaps you could use the decompression plugins based on extension, or skip the remaining plugins once the first succeeds? I realize that this is not an issue that's going to be hit often: for example, changing the order of these plugins would also fix my problem (but that's definitely not a long-term solution).

If this 'multiple decompressions' behavior is desired, I would either make that an option that has to be enabled, or at least document that behavior explicitly.

FYI I originally hit this while using yeoman/generator's extract function.

Windows Decompress Does Not Always Trigger 'Then' Promise

Hi - great plugin, which we are using to decompress a lot of files. I came across a certain distribution (phpMyAdmin) which decompresses but fails to invoke the 'then' promise. This only happens on Windows; I verified it works fine on OS X (not verified on Linux, but I assume it is Windows-only). Another interesting fact is that the same failure happens with every format of their distribution (zip, tar.gz, tar.bz2). You can test with any of their downloads from:

https://www.phpmyadmin.net/downloads/

Here is the test case I used to show this issue:

var decompress = require('decompress');

decompress('c:\\temp\\phpMyAdmin-4.6.3-english.tar.bz2', 'c:\\temp').then(files => { 
  console.log('Finished!');
}).catch(function(err) {
  console.log(err);
});

console.log('End of Script');

On Windows this test script will extract the entire archive, but never log 'Finished!'. On other platforms it works as intended. I am suspicious of circular references and symlinks in the archive, but it should still invoke the promise. I also tried testing, extracting, and re-creating the archive, but the behavior remains the same.

Thanks and let me know if I can be of further assistance.

Simple example seems to not work

Am I doing it wrong? I have a zip file that does exist, and I'm doing this from the Node REPL:

> require('fs').createReadStream('/Users/justin/Downloads/bootstrap-3.1.1-dist-foo.zip').pipe(require('decompress')({ext: 'zip'}))
TypeError: Property 'extractor' of object #<Decompress> is not a function
    at Decompress.extract (/Users/justin/Downloads/node_modules/decompress/index.js:51:23)
    at module.exports (/Users/justin/Downloads/node_modules/decompress/index.js:169:23)
    at repl:1:115
    at REPLServer.self.eval (repl.js:110:21)
    at Interface.<anonymous> (repl.js:239:12)
    at Interface.EventEmitter.emit (events.js:95:17)
    at Interface._onLine (readline.js:202:10)
    at Interface._line (readline.js:531:8)
    at Interface._ttyWrite (readline.js:760:14)
    at ReadStream.onkeypress (readline.js:99:10)

Ubuntu - Decompressed folder is left without any permissions 0000

When I run my server on Ubuntu (14.04 LTS), the decompressed destination folder is left without any permissions.

var decompress = new Decompress({mode: '755'});
decompress.src(file).dest(destFolder).use(Decompress.zip).run(function (err) {
	if (err) {
		throw err;
	}
});

I get an Error: EACCES, open.

When I check destFolder I find it was created without any permissions (octal: 0000).

This forces me to run as sudo, which really isn't ideal. Please help.

Version backwards compatibility

I have a project that uses dependencies that leverage decompress. I noticed that yesterday you upgraded to version 3.0.0, and my builds have begun breaking. I do not use decompress directly; however, it looks like many of my direct dependencies reference it. I noticed that your repository only has one branch, and when I try to download version 2.3.0 (the most recent previous version) manually, I am unable to do so. I believe this is causing all of my other dependencies to break during my build.

I was just wondering if you had any insight into this issue?

Pre-POSIX.1-1988 (i.e. v7) tar

Hi,
I have an issue opening an old tar.bz file. It seems that the problem is that the https://github.com/sindresorhus/file-type library cannot recognize the "tar" format, because it is a pre-POSIX.1-1988 version and there is no "magic number" for such files. You can follow the issue I opened with that library.

However, if I bypass the file check in the decompress-tar plugin, tarStream seems to extract the files correctly. I think that adding a new option called "legacyTar" is a good compromise.

I'm going to open a pull request soon, for the module and also for the related plugins.

Thanks

New version for commit "Bump `vinyl-fs` dependency"?

Hey!
Great job with this module! Just one thing:

I run into an error when installing decompress, because of the vinyl-fs dependency. I saw you corrected the problem (commit fb43c2f), but I don't think the change has been published to npm (i.e. the latest version, 2.2.1, doesn't include this commit, seemingly making it impossible to install it through npm). Am I wrong?

Files are not extracted and no error is thrown.

When I am trying to unzip a file, the callback is executed but there are no extracted files in the destination directory and no error is thrown. The file to unzip exists (checked in Finder and with fs.existsSync(file)).

Environment

Decompress version: 3.0.0
Node.js version : 0.12.0
OS X El Capitan

Code:

var Decompress = require("decompress");

var file = "/Users/Username/Desktop/App/node_cache/0.12.0/package.zip";
var dest = "/Users/Username/Desktop/App/node_cache/0.12.0/";

new Decompress()
  .src(file)
  .dest(dest)
  .use(Decompress.zip({ stripe: 1 }))
  .run(function(error) {
    if (error) {
      console.log(error);
    } else {
      console.log("ok");
    }
  });

decompress and plugins are inconsistent about whether you can pass in a stream

Most decompress plugins allow input to be a stream, but decompress itself requires it to be a Buffer or a filename, and the filename is converted to a Buffer, not a stream. I get that, e.g., decompress-tar takes a stream because it might get one from decompress-targz, but it's unclear why decompress-targz (for example) is allowed to take a stream.

Moreover, it would be beneficial from a performance standpoint if you could pass in a stream to plugins like the tar ones that support it. Yes, it would mean that automatic type detection wouldn't work, but if you're explicitly specifying a single plugin like decompressTargz then that doesn't matter.
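
For reference, a sketch of calling a plugin directly with a stream, which the tar plugins accept (this assumes decompress-targz is installed; the file name is a placeholder):

const fs = require('fs');
const decompressTargz = require('decompress-targz');

decompressTargz()(fs.createReadStream('unicorn.tar.gz')).then(files => {
	console.log(files.map(file => file.path));
});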

#57 at least will document explicitly that this package always reads the whole file into memory. But it would be nice if this package (with such a nice API) didn't waste so much memory.

Allow using a stream as the source

Allow using a stream with .src(), so less memory is used instead of holding the whole file in a Buffer object.

It'd be very handy to have synchronous versions of the decompress API methods

I know, I know — you're probably all like:

Node world is all about ASYNC. WTF are you even thinking???

But hear me out! I'm actually working on some command line utilities for my workflow lately, and these workflows really are a series of steps. Which means serial. Which ultimately puts a high value on synchronous.

Yes, it's possible to implement even serial workflows with asynchronous APIs, but that really does introduce a level of complexity to the design and maintenance of the code base. Additionally, I anticipate sharing these tools with my colleagues (we have a Stash service at work), and look forward to getting PRs for improvements from folks, so simplicity and ease of understanding really are important to me for a variety of reasons.

In any case, it'd be handy if this library offered synchronous flavors of its methods, as do core Node libraries, like fs.

Support unzipping single gzipped file?

When using download with the extract option on a gzipped file that isn't also a tar file, it throws the 'Invalid tar header' error.

I made the following changes in decompress-targz from:

unzip.end(file.contents);
unzip.pipe(extract);

to:

unzip.end(file.contents);
unzip.pipe(through(function(chunk, enc, callback) {
  if (this.chunks === undefined) {
    this.chunks = [];
  }

  this.chunks.push(chunk);

  callback();
}, function (callback) {
  var buf = Buffer.concat(this.chunks);

  if (isTar(buf)) {
    this.push(buf);
    callback();
  } else {
    // `self` and `cb` refer to the enclosing plugin transform's stream
    // and callback in the surrounding (not shown) decompress-targz code.
    self.push(new File({
      contents: buf,
      path: file.path
    }));

    cb();
  }
})).pipe(extract);

This allowed it to bypass piping to extract if the output of unzip isn't a tar. Would it be possible to just convert decompress-targz into decompress-gz and have decompress-tar run against the output, instead of having two tar-related decompress plugins? I made some changes in my local decompress-targz to test the idea, but it seems that decompress-tar runs before decompress-targz.

As a temporary solution I've disabled the extract option and am using zlib.unzip on the file in the run callback.

Emit errors on streams

I've only tested this with .zip files, but when an error occurs, I don't think it's passed up properly.

I have a fix for the .zip plugin which could be applied to the others. I'll try to get a PR in if you'd like.

Decompression callback never seems to get invoked

I'm likely just doing something wrong, but I can't seem to get this library to trigger my callback function. As you'll see from my code snippet, that causes my program (a command line utility) to basically hang.

Here's what I got:

  var tarFile = './example.tar';

  var decompress = new Decompress()
      .src(tarFile)
      .dest(wcPath)
      .use(Decompress.tar({ strip: 1 }));

  decompress.decompress(function () {
    fs.rmrfSync(tarFile)
  });

  while (fs.existsSync(tarFile)) {
    console.log("decompressing...");
    sleep.sleep(5);
  }

The tar file in question is 2.3 MB, and I just wind up hitting ctrl-c after a minute of my program continuously writing decompressing... to stdout.

Mac OS X Mavericks and Node 0.10.28, for what it's worth.

Preserve file properties.

Since commit 090849b ("New pluggable API") the properties of extracted files are no longer preserved; this includes the mode, uid, mtime and, I think, links (basically most of the information contained in the inode).

This will most certainly break all modules relying on this to compile binaries, i.e. all modules using bin-builder, if it ever gets updated to use the latest version of this module. For example, trying to execute ./configure will fail since the file no longer has the exec permission.

Allow progress?

Is there a way to get progress events or something, so we can show users the progress visually?

Or even the currently extracted size vs. the final uncompressed size?

Deprecated engines property

As of npm v2.6.0, the following warning is shown when installing decompress:

npm WARN deprecation Per-package engineStrict will no longer be used
npm WARN deprecation in upcoming versions of npm. Use the config
npm WARN deprecation setting `engine-strict` instead.

Decouple built-in plugins?

Currently I just use decompress-unzip, but it needs some more workarounds because of the dest handling. I don't need the others. Yeah, they are pretty small deps, but decoupling would be better. :)

decompressing same zip file to the same dest directory more than once

Hi,

The following error occurs if an existing zip file is decompressed to the same directory more than once. Is there some setting for this? I could not find anything in the docs or in the code.

"err": {
"errno": -4048,
"code": "EPERM",
"path": "E:\app\test\dest1\my_file.txt"
}

Extracted files are incomplete

All the extracted files have nothing but the same 6 bytes of garbage in them. There is also another copy of the archive in the dest folder.

new Decompress()
  .src(src)
  .dest(dest)
  .use(Decompress.targz({strip: 1}))
  .run(callback);

Does the `decompress-bzip2` plugin need updating to the new plugin syntax?

In the decompress-bzip2 repo the example code is like this:

var decompress = new Decompress().src('foo.jpg.bz2').dest('dest').use(bzip2());
decompress.run(function (err, files) {
    if (err) { throw err; }
    console.log('Files extracted successfully!'); 
});

However, the main repo (this one) seems to suggest that the syntax should be like this:

decompress('unicorn.bz2', 'dist', {
  plugins: [decompressBz2()]
}).then(files => {
  console.log('done!');
});
// or presumably:
let files = await decompress('unicorn.bz2', 'dist', { plugins: [decompressBz2()] });

Is the bz2 plugin outdated? I noticed that it hasn't been edited in 2 years, whereas the others were updated 11 months ago. Cheers :)

Extraction events

A nice addition to this package would be emitting events such as progress and entry so that implementers can see how far along they are with their decompression and to know what the currently decompressing or decompressed file is.

const zip = new Decompress()
  .src(src)
  .dest(dest)
  .use(Decompress.zip())
  .on('progress', pe => {
    if (pe.lengthComputable) {
      const percent = Math.round(pe.decompressed / pe.total * 100);
      console.log(`${percent}% complete`);
    }
  })
  .on('entry', entry => {
    console.log(`Currently decompressing ${entry.fileName}.`);
  })
  .run();

I modeled the progress event data after the Web API ProgressEvent standard (https://developer.mozilla.org/en-US/docs/Web/API/ProgressEvent), renaming loaded to decompressed.

I modeled the entry event data after information found in the zip specification on Wikipedia.

Filter directories instead of files

Does decompress provide a way to filter out directories, rather than files, with the filter option? I am trying to drop all the *.lproj entries (these are actually directories, not files) from a macOS app:

// `path` (the archive location) and `cache` (the destination directory)
// are defined elsewhere in the surrounding code.
const fs = require("fs");
const {extname} = require("path");
const decompress = require("decompress");

const LOCALES = /example\.app\/Contents\/Resources\/(.*?)\.lproj/;

decompress(path, cache, {
  strip: 1,
  filter: file => extname(file.path) !== ".html" && !LOCALES.test(file.path)
}).then(files => {
  fs.unlinkSync(path);
});
