mafintosh / tar-stream
tar-stream is a streaming tar parser and generator.
License: MIT License
This is great, but it needs to error on invalid headers!
If you can give me some hints I can put in a pull request.
My guess is to error if the checksum is incorrect,
unless the block is all nulls (as you can have null blocks in between files).
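The proposed check can be sketched with Node built-ins only (assumptions: plain ustar layout with the 8-byte octal checksum at offset 148, and that an all-zero 512-byte block is legal padding; `classifyBlock` is an invented name, not tar-stream's API):

```javascript
// Sketch: classify a 512-byte tar block as a null block (legal padding),
// a header with a valid checksum, or an invalid header.
function classifyBlock (block) {
  if (block.every((b) => b === 0)) return 'null-block'
  let sum = 0
  for (let i = 0; i < 512; i++) {
    // the checksum field (bytes 148-155) is summed as if it were spaces
    sum += (i >= 148 && i < 156) ? 0x20 : block[i]
  }
  const stored = parseInt(block.toString('ascii', 148, 156), 8)
  return sum === stored ? 'ok' : 'invalid'
}
```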
GNU tar uses @LongLink entries to signal that the next entry has a long name. This is used in the GCC and Linux kernel .tar.gz files, so when using tar-stream to decompress them, some files are not extracted and at other times there are EACCES errors.
I perform tar file manipulation in my application: adding or replacing files in the tar archive. I built a wrapper that tries to keep it managed.
I'm manipulating tar files the way you suggest in the readme, by opening the tar, putting each entry into a new tar stream and writing that. However I don't really want a new tar file, so I've taken to moving the existing tar file to tmp (using fs.rename) and opening it from there. That way I can always just write to where the file is supposed to be.
It seems that when I try to move a tar to tmp while it is being accessed by tar-stream, I receive an error. So, if two requests happen almost immediately after one another, I get:
Error: ENOENT, rename '/<where-file-is-supposed-to-be>.tar'
at Error (native)
At the fs.rename call.
What should I do about this? Is there a way to check whether a file operation is currently being performed?
Should I wait in that case using setTimeout?
fs.rename(self.fullPath, tmpName, function (err) {
  if (err) {
    // TEMP: throw err
    throw err;
  }
  // ...
});
Is there a way to manipulate the tar file in place, without moving it? Wouldn't that just cause the same problem? I'm sorry if this is an easy question.
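Rather than polling with setTimeout, one way around the race, as a rough sketch (assumption: it's acceptable to serialize the rewrite jobs; `enqueue` is an invented helper): keep a queue so the rename/rewrite operations for a given tar file never overlap.

```javascript
// Minimal job queue: each job receives a `done` callback, and the next
// job only starts after the previous one calls it.
const jobs = []
let running = false

function enqueue (job) {
  jobs.push(job)
  if (!running) runNext()
}

function runNext () {
  const job = jobs.shift()
  if (!job) {
    running = false
    return
  }
  running = true
  job(runNext)
}
```

Each request would enqueue its whole move/extract/repack sequence as one job, so fs.rename never runs while a previous read stream still has the file open.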
Hi,
I must generate a dynamic tar file from an S3 directory and, as Matteo Collina suggested, I'm using your fantastic module together with pump in the following way:
// this contains all files present in the directory
var files = [];

async.eachSeries(files, function (data, callback) {
  // an S3 file stream created with this module: https://github.com/jb55/s3-blob-store
  var stream = store.createReadStream({ key: data.Key });
  var pack = tar.pack();
  var entry = pack.entry({ name: data.Key, size: data.Size }, function (err) {
    if (err) console.log(err);
  });
  pump(stream, entry, function (err) {
    if (err) console.log(err);
    if (tmp_count === files)
      pack.finalize();
    callback();
  });
  pump(pack, res, function (err) {
    if (err) console.log(err);
  });
}, function (err) {
  if (err) console.log(err);
  req.end();
});
This code generates the following stack trace:
Error: invalid header
at Object.exports.decode (/tar-fs/node_modules/tar-stream/headers.js:205:40)
at onheader (/tar-fs/node_modules/tar-stream/extract.js:103:39)
at Extract._write (/tar-fs/node_modules/tar-stream/extract.js:206:8)
at Extract._continue (/tar-fs/node_modules/tar-stream/extract.js:170:28)
at oncontinue (/tar-fs/node_modules/tar-stream/extract.js:61:10)
at ondrain (/tar-fs/node_modules/tar-stream/extract.js:81:5)
at Extract._write (/tar-fs/node_modules/tar-stream/extract.js:206:8)
at doWrite (/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:237:10)
at writeOrBuffer (/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:227:5)
at Writable.write (/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:194:11)
What do you think? Am I making errors in my code, or is there a problem in the module?
Regards
I've got a tar file with a few large (>10GB) files in it. When tar-stream gets to the first big file, it chokes with an Invalid tar header error. I can't provide the tar file itself, but hopefully this is helpful:
It hums along through a few small files, and then it hits the header for the first large file (provided here base64-encoded):
cnMtZHMwNTk1NDhfMjAxNi0xMS0yOFQxOTAwMTEuMDAwWi9nb29kZWdncy1nYXJiYW56by9vcmRlcl9pdGVtcy5ic29uAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAwMDA2NDQAMDAwMTc1MQAwMDAxNzUxAIAAAAAAAAACfZ6FHjEzMDE3MTAxMDc1ADAyMzMxNgAgMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB1c3RhciAgAG1hZG1pbgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAYWRtaW5zAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
which it claims to parse and tells me is {name: 'rs-ds059548_2016-11-28T190011.000Z/goodeggs-garbanzo/order_items.bson', size: 2}
... but the size according to bsdtar on OSX is 10697475358
...
and then it chokes parsing the next tar header (presumably because it has used the entirely wrong offset).
If I can provide more data (without providing the file itself), please let me know.
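The 0x80 first byte visible in that header's size field is the GNU base-256 extension for sizes that don't fit in 11 octal digits. A decoder sketch using only Node built-ins (`decodeSizeField` is a made-up name, not tar-stream's API):

```javascript
// Decode a ustar/GNU "size" field: plain octal, or base-256 when the
// high bit of the first byte is set (used for files >= 8 GiB).
function decodeSizeField (field) {
  if (field[0] & 0x80) {
    let n = field[0] & 0x7f
    for (let i = 1; i < field.length; i++) n = n * 256 + field[i]
    return n
  }
  return parseInt(field.toString('ascii'), 8) || 0
}
```

Running it over the size bytes from the dumped header above (80 00 00 00 00 00 00 02 7d 9e 85 1e) yields 10697475358, exactly the size bsdtar reports, which suggests the header is fine and the parser is simply missing this extension.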
hi,
the pack.entry function only accepts buffers, not streams. I'm not really sure if this is even implementable. Any ideas about this?
I've written simple code; it just copies a .tar to another one, logging file names:
const tar = require('tar-stream');
const pack = tar.pack();
const extract = tar.extract();
const path = require('path');
const fs = require('fs');
extract.on('entry', (header, stream, next) => {
  console.log(header.name);
  stream.pipe(pack.entry(header, next));
});

extract.on('finish', () => {
  // all entries done - lets finalize it
  pack.finalize();
});

const tarPath = './example.tar';
const tarPathParsed = path.parse(tarPath);
const outputPath = `${tarPathParsed.dir}/${tarPathParsed.name}.new${tarPathParsed.ext}`;

let oldTarballStream = fs.createReadStream(tarPath);
let newTarballStream = fs.createWriteStream(outputPath);

// pipe the old tarball to the extractor
oldTarballStream.pipe(extract);

newTarballStream.on('close', () => {
  console.log(`${outputPath} has been written`);
});

// pipe the new tarball to another stream
pack.pipe(newTarballStream);
Also, I've created an example.tar with a single file named Тестовый файл.txt (Cyrillic characters in the file name). When I ran my code above, I got example.new.tar with 2 files, both named PaxHeader. One of them contains:

38 path=Тестовый файл.txt

The other PaxHeader contains the full content of Тестовый файл.txt.

Moreover, once I re-ran the code on example.new.tar (with those 2 PaxHeaders), I got a tarball with, again, 2 PaxHeaders, but one of them was:

38 path=Тестовый файл.txt
38 path=Тестовый файл.txt

The other, again, was exactly my original Тестовый файл.txt.

I believe it's a bug in pack().
The following code tries to pipe some nonsense into an extractor:
var tar = require('tar-stream');
var stream = require('stream');
var extract = tar.extract();
extract.on('error', function (e) {
  console.log(e);
});
extract.on('entry', function (header, stream, next) {
  console.log(header);
});
extract.on('finish', function () {
  console.log('finish');
});

var input = new stream.PassThrough();
input.pipe(extract);
input.end(Buffer.from('some random content'));
The only output is finish, but it should really emit some sort of error.
I have tarballs from the registry (and elsewhere) that consistently fail with invalid tar header errors but are valid archives.
To replicate
wget https://registry.npmjs.org/eslint-config-metashop/-/eslint-config-metashop-1.5.0.tgz
Use the following test case
const Tar = require('tar-stream');
const Gunzip = require('gunzip-maybe');
const Fs = require('fs');
const gunzip = Gunzip();
const extract = Tar.extract();
const inputFile = Fs.createReadStream(process.argv[2]);
extract.on('error', function (err) {
  console.log(err);
});

extract.on('entry', function (header, stream, callback) {
  stream.on('end', function () {
    return callback();
  });
  stream.resume();
});

inputFile.on('error', function (err) {
  console.log(err);
});
inputFile.pipe(gunzip).pipe(extract);
The error I get is something like
Error: Invalid tar header. Maybe the tar is corrupted or it needs to be gunzipped?
at Object.exports.decode (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/headers.js:265:40)
at Extract.onheader [as _onparse] (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:124:39)
at Extract._write (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:248:8)
at Extract._continue (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:212:28)
at oncontinue (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:65:10)
at Extract.onheader [as _onparse] (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:132:7)
at Extract._write (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:248:8)
at Extract._continue (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:212:28)
at oncontinue (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:65:10)
at Extract.onheader [as _onparse] (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:132:7)
Doing some debugging, it appears that the stream has advanced too far into the archive, and the data it's trying to parse as the next header is actually file contents. That's as far as we've got so far.
I see that the default size is set to 0. Why is that? If you leave size undefined, shouldn't the size be set to the length of the input (string or stream)?
Apologies, still learning Node.js libraries. The Packing example does not run because myStream is not defined. I assume I'm supposed to open a file, get a stream for it, and assign it to myStream. Could you add that to the example so people can copy, paste, run, then edit?
var tar = require('tar-stream')
var pack = tar.pack() // pack is a streams2 stream
// add a file called my-test.txt with the content "Hello World!"
pack.entry({ name: 'my-test.txt' }, 'Hello World!')
// add a file called my-stream-test.txt from a stream
var entry = pack.entry({ name: 'my-stream-test.txt' }, function(err) {
// the stream was added
// no more entries
pack.finalize()
})
myStream.pipe(entry)
// pipe the pack stream somewhere
pack.pipe(process.stdout)
On iOS 10 (Mobile Safari 10.0) and Desktop Safari 9.1.2, I get:
Error: Invalid tar header. Maybe the tar is corrupted or it needs to be gunzipped?
This: decodeOct(buf, 148) is returning NaN.
In the README.md file, the Extracting example has

next(); // ready for next entry

If you do this it produces an error that next() is not defined. Looking at your test code and a simple test I wrote, this line should say

callback(); // ready for next entry
hi,
i encountered a bug with this package: when decompressing with gunzip-maybe or zlib.createGunzip(), the output stops after a few file entries. Inside the extract.on('entry') handler everything goes well: https://gist.github.com/chpio/6d0cedae59d8416d0aed
ohh, i don't know if it's relevant, but i noticed the stops occur after a large file entry.
It doesn't look like a tar pack is an EventEmitter, which means no 'error' event. So where do errors go? How do you handle them?
Hey, I've experienced a nasty bug. When supplying an input stream of length A while there is already directly supplied content of length B, the entry callback is never called. Working example:
var fs = require("fs");
var tar = require('tar-stream');
var fileName = "demo";
var pack = tar.pack();
fs.writeFileSync(fileName, new Array(1674).join("X"), "utf-8");
pack.entry({ name: "specific-length.txt" }, new Array(13399).join("X"));
fs.stat(fileName, function (err, stat) {
  if (err) {
    return console.log(err);
  }

  var packOptions = {
    mode: stat.mode,
    mtime: stat.mtime,
    name: fileName,
    size: stat.size
  };

  var rs = fs.createReadStream(fileName);
  var entry = pack.entry(packOptions, function (err) {
    console.log("This never happens");
    pack.finalize();
    pack.pipe(fs.createWriteStream("output.tar"));
  });

  console.log("We pipe the stream here and expect a callback...");
  return rs.pipe(entry);
});
13399 followed by 1674 are the lengths I stumbled upon. I presume this happens at specific intervals based on stream buffer sizes and such. Looking into the source, there seems to be a disconnect between the Sink's and Pack's drain dynamics: the callback is saved and never called. I didn't understand the code well enough to actually fix it. :-(
Tested on node v0.10.22, tar-stream 0.2.5.
Regarding this item in the README.md: https://github.com/mafintosh/tar-stream/blob/master/README.md#extracting
Shouldn't you call resume after registering the end/data handlers?
Calling resume first can flush the stream before the end listener is registered.
Thanks for this library!
Hi,
With the following archive :
http://download.oracle.com/otn-pub/java/jdk/7u79-b15/server-jre-7u79-windows-x64.tar.gz
The file type of the first entry is wrong. It should be a directory, not a file:
jdk1.7.0_79/ file
jdk1.7.0_79/COPYRIGHT file
jdk1.7.0_79/LICENSE file
jdk1.7.0_79/README.html file
jdk1.7.0_79/release file
Here's my small program :
var tar = require('tar-stream');
var fs = require('fs');
var extract = tar.extract();
extract.on('entry', function (header, stream, callback) {
  console.log(header.name + ' ' + header.type);
  stream.on('end', function () {
    callback(); // ready for next entry
  });
  stream.resume(); // just auto drain the stream
});

extract.on('finish', function () {
  // all entries read
  console.log('finish extract');
});

var readStream = fs.createReadStream('jdk.tar');
readStream.on('error', function (err) {
  console.log('read error', err);
});
readStream.pipe(extract);
Symlink entries behave differently depending on whether the link target is supplied as a stream or in the header: sometimes the symlink has zero size, and other times it points nowhere but has a size equal to the length of the destination path. We need to check what the spec says and make this behave consistently.
This is the strangest bug, but I've narrowed it down to Go binaries. tar-stream seems to silently fail to add them; anything else I add to the tar works fine.
Steps to reproduce:
Create file test.go:

package main

import "fmt"

func main() {
	fmt.Println("test")
}
Build go file:
$ go build test.go
Create file test.js:
var fs = require("fs");
var tar = require('tar-stream');
var fileName = "./test";
var pack = tar.pack();
fs.stat(fileName, function (err, stat) {
  if (err) {
    return console.log(err);
  }

  var packOptions = {
    mode: stat.mode,
    mtime: stat.mtime,
    name: 'test',
    size: stat.size
  };

  var rs = fs.createReadStream(fileName);
  var entry = pack.entry(packOptions, function (err) {
    console.log("This happens");
    pack.finalize();
  });

  console.log("We pipe the stream here and expect a callback...");
  return rs.pipe(entry);
});
Expected result: This happens gets logged and the pack is finalized.
Actual result: This happens is never logged and the pack is not finalized.
stream.entry({ name: `files/test.html`, mode: parseInt('777', 8) }, "test......");
But the result is:
➜ la files
total 16
-rwxr-xr-x@ 1 tony staff 1.4K 7 31 11:30 test.html
readable-stream is versioned so that ~1.0.0 is Streams2 and ~1.1.0 is Streams3. The prefix for the readable-stream dependency was changed to ^ in 60cef01, which allows tar-stream to depend on the 1.1.x Streams3 versions of readable-stream.
BufferList is not used anywhere. However 'bl' is still listed in package.json.
Hello, it would be awesome to support this feature, as described at http://www.gnu.org/software/tar/manual/html_node/Standard.html.
For example: tar a 1000MB folder into 500MB volumes...
Is it possible?
As shown in this pull request, I'm converting a cpio file generated with the get_init_cpio tool of the Linux kernel to a tar file. The generated tar file works correctly with vagga, but it crashes Docker with an "Invalid tar header" error, and the same file makes file-roller (the Ubuntu/Gnome archive manager) core dump.
Inspecting the content of the generated file directly with the tar command, I get the following output:
[piranna@Mabuk:~/Proyectos/NodeOS]
(vagga) > tar -tvf node_modules/nodeos-barebones/out/latest
tar: Sustituyendo `.' por un nombre miembro vacío ("Substituting `.' for empty member name")
d--x--x--x 0/0 0 2015-10-28 12:05
-r-xr-xr-x 0/0 651800 2015-10-28 12:05 lib/libc.so
lr-xr-xr-x 0/0 8 2015-10-28 12:05 lib/ld-musl-x86_64.so.1 -> libc.so
tar: Saltando a la siguiente cabecera ("Skipping to next header")
-r--r--r-- 0/0 1250352 2015-10-28 12:05 lib/libstdc++.so.6.0.17
lr--r--r-- 0/0 20 2015-10-28 12:05 lib/libstdc++.so.6 -> libstdc++.so.6.0.17
tar: Saltando a la siguiente cabecera ("Skipping to next header")
l--x------ 0/0 9 2015-10-28 12:05 init -> bin/node
tar: Un bloque de ceros aislado en 25824 ("A lone zero block at 25824")
tar: Saliendo con fallos debido a errores anteriores ("Exiting with failure status due to previous errors")
[piranna@Mabuk:~/Proyectos/NodeOS]
(vagga) > echo $?
2
There are two missing entries (the ones with the tar: Saltando a la siguiente cabecera ("Skipping to next header") message), corresponding to the lib/libgcc_s.so.1 and bin/node files. Their stat objects as given by cpio-stream are:
{ ino: 724,
mode: 33060,
uid: 0,
gid: 0,
nlink: 1,
mtime: Wed Oct 28 2015 12:05:19 GMT+0100 (CET),
size: 96712,
devmajor: 3,
devminor: 1,
rdevmajor: 0,
rdevminor: 0,
_nameLength: 18,
_sizeStrike: 96712,
_nameStrike: 18,
name: 'lib/libgcc_s.so.1' }
{ ino: 727,
mode: 33133,
uid: 0,
gid: 0,
nlink: 1,
mtime: Wed Oct 28 2015 01:40:48 GMT+0100 (CET),
size: 11216736,
devmajor: 3,
devminor: 1,
rdevmajor: 0,
rdevminor: 0,
_nameLength: 9,
_sizeStrike: 11216736,
_nameStrike: 10,
name: 'bin/node' }
I'm not sure what could be the reason for this problem, since it seems unrelated to file name length, file size, permissions, or the files being binary... :-/ You can find the tar file if you want to inspect it yourself at https://dropfile.to/gWBaf
I mentioned this previously, but I'd like to be able to use tar-stream in this manner (or similar):
intar.pipe(tarStream(onentry, onfinish)).pipe(transformedTar)
I made a feeble attempt to get this to work outside of tar-stream in the dockerify lazy-stream branch, but had only mixed success. The tests that all passed previously now only pass in node 0.10.
I hope, though, that this can serve as a start to figure out how the above could be achieved.
when doing
tar.pack('folder-a').pipe(tar.extract('folder-b'));
where the contents of folder-a are over roughly 1GB, I get FATAL ERROR: JS Allocation failed - process out of memory
Ideally, I would think I could add as many entries with streams as I want, then call finalize right afterward, and everything would work (i.e. it would automatically wait for all the input streams to complete before actually creating the package). The documentation seems to imply that this isn't the case, though. Can I or can't I do that? If not, why not? Can we make it so finalize can be called without explicitly waiting for the streams?
Hey from Node.js here!
Starting with Node 10, this package will emit deprecation warnings. See this guide on what you should do in order to migrate to Buffer.alloc/Buffer.from.
See nodejs/node#19079 for discussion around this change and why we can't make new Buffer work.
The result.tar generated by the following code fails to be unpacked:
const tar = require('tar-stream');
const writeStream = require('fs').createWriteStream('result.tar');
const pack = tar.pack();
pack.pipe(writeStream);
// the specific pattern I found:
// here, a '0' represents an ASCII character and a '哈' represents a unicode character
const directory = './0000000哈哈000哈哈0000哈哈00哈00哈0哈哈哈哈哈0哈/0000哈哈哈/';
const name = directory + 'somefile.txt';
const entry = pack.entry({ name }, 'any text', (...args) => console.log(args));
pack.finalize();
showing this after executing tar -xf result.tar in the terminal:
tar: Ignoring malformed pax extended attribute
tar: Error exit delayed from previous errors.
or something like this when double-clicked on Mac OS:
Error 1: Operation not allowed
I'm working on Mac OS and have tried the code on node versions 6.9.1 and 7.5.0, producing the same result.
tar-stream works perfectly with almost all other unicode patterns, so I think there might be a bug?
In the docs on extraction, the line pack.pipe(extract); is at the end of the example, but pack isn't defined and it doesn't make sense to me. Is that supposed to be there?
When the input data ends while tar-stream waits for more data to extract, it doesn't raise an error. This means that a truncated tar file will extract without reporting errors, while creating incomplete files.
If there is still missing data for a file or a partially read header, an error should be raised instead.
This issue is likely also the cause for #71. If only a short file is processed (shorter than a tar header), no errors are raised since the partial header data is never processed.
If I want to add a directory, do I have to traverse the whole directory tree and add in the directories and files inside there, or is there an easier way?
Hey there, the tar .pipe() command doesn't work as I expect, which would be similar to fs.createReadStream().pipe()... The problem is that it never closes.
So, for example, this program using tar-stream will hang:
var tar = require('tar-stream'),
    spawn = require('child_process').spawn;

var pack = tar.pack();
pack.entry({ name: 'hello.txt' }, 'Hello world!');

var cat = spawn('cat');
pack.pipe(cat.stdin);
cat.stdout.pipe(process.stdout);

pack.on('end', function () {
  console.log('This is never fired.');
});
While this program with fs.createReadStream will close as expected:
var fs = require('fs'),
    spawn = require('child_process').spawn;

var fileStream = fs.createReadStream('test.js');
var cat = spawn('cat');
fileStream.pipe(cat.stdin);
cat.stdout.pipe(process.stdout);

fileStream.on('end', function () {
  console.log('This does fire!');
});
It appears that your streaming tar parser does not support sparse files. Am I mistaken?
If it helps, GNU tar implements sparse support at:
http://git.savannah.gnu.org/cgit/tar.git/tree/src/sparse.c?id=63f2e969ddc162da7ae49a955bba9c6a2a0e77dc#n354
My question is relevant to this:
http://stackoverflow.com/questions/40166300/cross-platform-sparse-file-compression-with-nodejs
Also poking at cpio-stream for sparse support options too: finnp/cpio-stream#9
Why does this code, https://github.com/aaricpittman/yarn-pack-test/blob/master/stand-alone.js, corrupt the image files when the files are read with the 'binary' encoding, but work fine when I don't specify the encoding?
Can you confirm (or not) that there is an 8GB limit (per file) for tar creation with the implementation of tar that you use (ustar, is it)?
If so, is there any way around this that you know of? Or another library I could use?
Many thanks
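For context, the classic ustar size field is 12 bytes holding 11 octal digits plus a terminator, which caps a plain ustar entry at one byte short of 8 GiB; larger files need the GNU base-256 or PAX size extensions, so if a library only writes plain octal sizes, ~8 GiB is indeed the per-file ceiling:

```javascript
// Largest size representable in 11 octal digits: 8^11 - 1 bytes
const maxUstarSize = parseInt('7'.repeat(11), 8)

console.log(maxUstarSize)                       // 8589934591
console.log(maxUstarSize === 8 * 1024 ** 3 - 1) // true: one byte short of 8 GiB
```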
It seems that pack.js line 98 will always fail to pack streams unless the size is set in the header data.
I maintain ctalkington/node-archiver and would like to consolidate efforts on the creation of tar archives. Would you be open to an alternative method that collects the stream before writing the header, in cases where the size isn't passed along and the source is a stream? If so, I can work up a PR.
Is there an option to follow symlinks, similar to tar's -L or -h option?
I had a bit of trouble figuring out how to serve a directory as a .tar.gz in an Express app. Here is a snippet showing how I accomplished it.
As a gist, https://gist.github.com/MadLittleMods/7eedb4001c52acec104e91dbd80618b5
const Promise = require('bluebird');
const path = require('path');
const fs = require('fs-extra');
const stat = Promise.promisify(fs.stat);
const glob = Promise.promisify(require('glob'));
const tarstream = require('tar-stream');
const zlib = require('zlib');
const express = require('express');
function targzGlobStream(globString, options) {
  const stream = tarstream.pack();

  const addFileToStream = (filePath, size) => {
    return new Promise((resolve, reject) => {
      const entry = stream.entry({
        name: path.relative(options.base || '', filePath),
        size: size
      }, (err) => {
        if (err) return reject(err);
        resolve();
      });

      fs.createReadStream(filePath)
        .pipe(entry);
    });
  };

  const getFileMap = glob(globString, Object.assign({ nodir: true }, options))
    .then((files) => {
      const fileMap = {};
      const stattingFilePromises = files.map((file) => {
        return stat(file)
          .then((fileStats) => {
            fileMap[file] = fileStats;
          });
      });

      return Promise.all(stattingFilePromises)
        .then(() => fileMap);
    });

  getFileMap.then((fileMap) => {
    // We can only add one file at a time
    return Object.keys(fileMap).reduce((promiseChain, file) => {
      return promiseChain.then(() => {
        return addFileToStream(file, fileMap[file].size);
      });
    }, Promise.resolve());
  })
  .then(() => {
    stream.finalize();
  });

  return stream.pipe(zlib.createGzip());
}

const app = express();

app.get('/logs.tar.gz', function (req, res) {
  const logDirPath = path.join(process.cwd(), './logs/');
  const tarGzStream = targzGlobStream(path.join(logDirPath, '**/*'), {
    base: logDirPath
  });

  res
    .set('Content-Type', 'application/gzip')
    .set('Content-Disposition', 'attachment; filename="logs.tar.gz"');

  tarGzStream.pipe(res);
});
when trying to do the following:
var readStream = fs.createReadStream(tarballPath);
var extractStream = tar.extract(nodeModulesPath);
readStream
.pipe(zlib.createGunzip())
.pipe(extractStream);
readStream.on('error', callback);
extractStream.on('error', callback);
extractStream.on('finish', callback);
I keep getting the following error in some cases. The only difference between the failing and non-failing cases is whether the tarball already exists on disk before running the script (doesn't fail) or I create the tarball and then try to extract it, but only after the finish event on tar.pack(nodeModulesPath) (fails). I've run an fs.existsSync call before the code above to confirm that the tarball exists and the node modules path does not.
TypeError: Cannot read property 'corked' of undefined
at Writable.end (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:429:12)
at emptyStream (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:18:5)
at onheader (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:135:34)
at Extract._write (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:207:8)
at doWrite (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:279:12)
at writeOrBuffer (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:266:5)
at Writable.write (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:211:11)
at write (_stream_readable.js:601:24)
at flow (_stream_readable.js:610:7)
at Gunzip.pipeOnReadable (_stream_readable.js:642:5)
This appears to be caused by the fact that _writableState is on the _parent property of the Source instance and not on the stream object itself.
I tried adding a line to the source instantiation to pass this property through so it's available in s.end(), but then I get the following error:
Error: write after end
at writeAfterEnd (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:161:12)
at Writable.write (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:208:5)
at Writable.end (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:426:10)
at Extract._write (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:203:12)
at Extract._continue (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:171:28)
at oncontinue (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:62:10)
at onheader (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:143:5)
at Extract._write (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:207:8)
at Extract._continue (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:171:28)
at oncontinue (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:62:10)
My first intuition was that https://github.com/mafintosh/tar-stream/blob/master/extract.js#L31 should read PassThrough.call(this, self);, but that didn't fix the problem either (though it still might be something you want to add).
I tried checking the value of _writableState all the way through the instantiation stack. It's defined at the end of the constructor function for Writable(), but is undefined after the Writable.call(options) in the instantiation function of Duplex.
Any ideas on what could be causing this and how to fix it?
I only found it by accident. Maybe a Related
section or something.
Running a slightly modified version of the "Modifying existing tarballs" example code on the compressed tar file http://registry.npmjs.org/which/-/which-1.0.5.tgz, which other tar tools say is valid, generates:
processing entry package/bin/
_stream_readable.js:476
dest.on('unpipe', onunpipe);
^
TypeError: Cannot call method 'on' of undefined
at PassThrough.Readable.pipe (_stream_readable.js:476:8)
at null.<anonymous> (bug.js:27:9)
at EventEmitter.emit (events.js:106:17)
at onheader (node_modules\tar-stream\extract.js:101:9)
at Extract._write (node_modules\tar-stream\extract.js:172:7)
at doWrite (_stream_writable.js:226:10)
at writeOrBuffer (_stream_writable.js:216:5)
at Writable.write (_stream_writable.js:183:11)
at write (_stream_readable.js:583:24)
at flow (_stream_readable.js:592:7)
So this is either a documentation/example improvement to outline that pack.entry returns a stream or nothing depending on the header.type field,
or
a bug: extract.on('entry') does seem to provide a valid (but empty) stream to read from, but pack.entry does not return a sink to pipe this empty stream into, because the entry is a directory.
Modified example code below
// Location of problem tar file http://registry.npmjs.org/which/-/which-1.0.5.tgz
var tarStream = require('tar-stream');
var zlib = require('zlib');
var fs = require('fs');
var input = "which-1.0.5.tgz";
var output = "rewritten_which-1.0.5.tgz";
var inputTarfile = fs.createReadStream(input);
var outputTarfile = fs.createWriteStream(output);
var gunzip = zlib.createGunzip();
var gzip = zlib.createGzip();
// Stream copy the tar.gz
var tarExtract = tarStream.extract();
var tarPack = tarStream.pack();
tarExtract.on('entry', function (header, stream, callback) {
  console.log('processing entry ' + header.name);
  // write the unmodified entry to the pack stream
  stream.pipe(tarPack.entry(header, callback));
});

tarExtract.on('finish', function () {
  // all entries copied, add new entry
  tarPack.finalize();
});
//read input
inputTarfile.pipe(gunzip).pipe(tarExtract);
// write output
tarPack.pipe(gzip).pipe(outputTarfile);
Possibly a better example for the documentation, if this approach to fixing the bug is taken:
var extract = tar.extract();
var pack = tar.pack();
var path = require('path');

extract.on('entry', function (header, stream, callback) {
  // let's prefix all names with 'tmp'
  header.name = path.join('tmp', header.name);
  var entrySink = pack.entry(header, callback);
  // If no entrySink was returned then the entry was not a 'file' or 'contiguous-file',
  // so there is nothing to pipe data into
  if (typeof entrySink !== "undefined") {
    // write the new entry to the pack stream
    stream.pipe(entrySink);
  }
});

extract.on('finish', function () {
  // all entries done - lets finalize it
  pack.finalize();
});

// pipe the old tarball to the extractor
oldTarball.pipe(extract);

// pipe the new tarball to another stream
pack.pipe(newTarball);
Hi, sorry to raise this issue here, but I think you folks maintaining this module would understand it best, so I thought I might get some help.
I am using this module to read a .tgz file and to read every file's content from the tar. I am kind of stuck.
Here is what I am trying to do:
.tgz file structure:
root_folder
|-- _sub_folder1
| |-- file1
| |-- file2
| ....
|-- _sub_folder2
...
(in CoffeeScript) Read every sub folder and file:

extract = require('tar-stream').extract()
fs.createReadStream(FILE_PATH).pipe(zlib.createUnzip()).pipe(extract)
  .on 'entry', (header, stream, callback) ->
    console.log "header -->", header.name, header.size, header.type
    if header.type == "directory"
      # go inside this directory and find all files
      # read content of every file......
      # what should I do here ??
    else if header.type == "file"
      # read content of this file......
    stream.resume()
  .on 'error', ->
    console.log "error"
  .on 'finish', ->
    console.log "finished"
output:
header --> offline_2014-08-06_16:54:28/ 0 directory
this entry end
What version of the tar spec does tar-stream implement?
const stream = tar.pack()
stream.entry({ name: '/foo/test.txt' }, 'hello');
stream.finalize();
Now I have a tar stream; how do I gzip it?
Bumped my head against the wall for a while until I figured out that object serialization doesn't happen automatically when writing objects to an entry.
We could do something similar to Line 127 in 8e3b174: call JSON.stringify on the object to prevent the user from committing an empty buffer to the file?