fast-csv's People

Contributors

abidex4yemi, adammichaelwilliams, adamvoss, bryant1410, caleb-irwin, dbbring, derjust, doug-martin, dustinsmith1024, foxmicha, jonstacks, manoellribeiro, matthewhembree, mdoelker, memg92, micheletriaca, neychok, olleolleolle, outbackstack, ovax3, rajit, renovate-bot, renovate[bot], rishabh-c2fo, rubenamaury, shane-walker, sumitbando, technotronicoz, xavi-, zackerydev

fast-csv's Issues

Doc: Example output for writeToString() with transform() is wrong

csv.writeToString(
    [
        {a: "a1", b: "b1"},
        {a: "a2", b: "b2"}
    ],
    {
        headers: true,
        transform: function (row) {
            return {
                A: row.a,
                B: row.b
            };
        }
    },
    function (err, data) {
        console.log(data); //"a,b\na1,b1\na2,b2\n"
    }
);

I believe the console line should be:

        console.log(data); //"A,B\na1,b1\na2,b2\n"

Maximum Call Stack Size Exceeded When Piping into another stream

It appears that when using fast-csv with a 'data' listener, the stream cannot also be piped into another stream; otherwise you get this error:

/Users/bistsls/projects/vlh/blah/node_modules/fast-csv/lib/parser/parser_stream.js:287
    emit: function (event) {
                   ^
RangeError: Maximum call stack size exceeded

Here is the code I am using:

var myCsvStream = fs.createReadStream('csv_files/myCSVFile.csv');
var csv = require('fast-csv');
var myData = [];

var myFuncs = { 
 parseCsvFile: function (filepath) {

  var csvStream;

  csvStream = csv
  .parse({headers: true, objectMode: true, trim: true})
  .on('data', function (data) {

       myData.push(data);

  })
  .on('end', function () {

    console.log('done parsing counties')
  });

  return csvStream;

 }
}

myCsvStream
.pipe(myFuncs.parseCsvFile())
.pipe(process.stdout);

The process.stdout is just so I can see that the data can continue on to the next stream; however, when adding .pipe(process.stdout), or even a through2 duplex stream, I get this maximum call stack error.

The issue is also noted in this SO post:
http://stackoverflow.com/questions/30620756/node-streams-get-maximum-call-stack-exceeded?noredirect=1
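
One workaround sketch: process.stdout expects bytes, not row objects, so serializing the parsed rows back into CSV text before piping them onward avoids handing objects to a byte stream. This assumes the csv.parse and csv.createWriteStream calls that appear in other issues here; it has not been verified against the version that raises the RangeError.

var fs = require('fs');
var csv = require('fast-csv');

fs.createReadStream('csv_files/myCSVFile.csv')
    .pipe(csv.parse({headers: true, objectMode: true, trim: true}))  // emits row objects
    .pipe(csv.createWriteStream({headers: true}))                    // turns the rows back into CSV text
    .pipe(process.stdout);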

Giving objects to writeToString with {headers: false} fails

csv.writeToString(
  [
    {a: "a1", b: "b1"},
    {a: "a2", b: "b2"}
  ], {
      headers: true
  }, function (err, data) {
    console.log('err:', err, 'data: |' + data + '|');
  }
);

gives

err: null data: |a,b
a1,b1
a2,b2|

but

csv.writeToString(
  [
    {a: "a1", b: "b1"},
    {a: "a2", b: "b2"}
  ], {
      headers: false
  }, function (err, data) {
    console.log('err:', err, 'data: |' + data + '|');
  }
);

gives

err: null data: |
|
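
A workaround sketch, assuming the same csv.writeToString API: with {headers: false}, pass each row as an array of values rather than an object, so the formatter does not need header names to look up the values.

csv.writeToString(
  [
    ["a1", "b1"],
    ["a2", "b2"]
  ], {
      headers: false
  }, function (err, data) {
    console.log('err:', err, 'data: |' + data + '|'); // expected: |a1,b1\na2,b2|
  }
);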

Intermittently fails for large csv file

Hi,

I have a CSV file with 15000 entries (1.3 MB). CSV parsing fails randomly on an AWS Amazon Linux box, but sometimes it passes cleanly, parsing all the rows correctly. Memory is not an issue; the machine instance has 4 GB. The error received is:

Parse Error: expected: '\"' got: 'undefined'

Sometimes the parser cannot read the complete line, so it fails to match the ending quote in the parseEscapedItem call at parser.js:76.
The strange thing is that the issue happens intermittently, so it does read correctly some of the time.

The only option I pass to the fromStream method is {ignoreEmpty: true}; the rest are all defaults.

Any pointers would be helpful.
thanks.

Are comments supported?

Hello,
I'm a new user of fast-csv, and I want to know: are comments supported? Just like the csv module:
comment: treat all the characters after this one as a comment; defaults to '#'.
I just haven't seen any similar option here. Thanks.
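
Assuming there is indeed no comment option (the issue found none), a workaround sketch is to filter comment lines out before handing the text to the parser; stripComments here is a hypothetical helper, not part of fast-csv.

var fs = require('fs');
var csv = require('fast-csv');

// hypothetical helper: drop lines whose first character is '#'
function stripComments(text) {
    return text
        .split('\n')
        .filter(function (line) { return line.charAt(0) !== '#'; })
        .join('\n');
}

fs.readFile('data.csv', 'utf8', function (err, text) {
    if (err) throw err;
    csv
        .fromString(stripComments(text), {headers: true})
        .on('data', function (row) { console.log(row); })
        .on('end', function () { console.log('done'); });
});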

Slow performance generating CSV from array / object

Hi,

I am trying your library to write CSV data files from data I have in memory and found the performance very slow. At first I thought it was a disk access issue, but then I coded a small benchmark separating CSV generation from disk writing and found that the main delay is in generating the CSV.

I coded an alternative benchmark that just nests two for loops, appending commas and newlines, and it was about 4 times faster.

Mostly I'm interested in knowing whether I'm doing something wrong in the way I'm using your lib, or if this is just its normal performance.

You can see my tests here: http://pastebin.com/McwhxY9B , http://pastebin.com/cRKZRqp3 . They just write 1 million rows with four 32-bit integers each. Test data generation isn't optimal, but it's enough for the purpose of this test, I think.

On my local computer I got about 8 seconds with the lib and 2 seconds with my alternative to generate the million rows. I tried several times and the results were consistent. The time ratio (~4 to 1) was also consistent when testing with different numbers of rows.

error event

Hi,
The documentation states that if fast-csv runs into a parse error, the "parse-error" event will be emitted.

Based on the source code (and my tests :) ) the "error" event will be emitted.

It would be nice if either the event could be changed or the documentation updated.

fromStream?

I'm using the fromStream example in the README, and I get this error...

Object # has no method 'fromStream'

`data` or `record` deprecated?

The parser stream code clearly emits record: https://github.com/C2FO/fast-csv/blob/master/lib/parser/parser_stream.js#L192-L197

And the tests clearly all use record, e.g. https://github.com/C2FO/fast-csv/blob/master/test/fast-csv.test.js#L461 (or seem to use some weird mix of data and record)

But the Readme notes that record is deprecated: https://github.com/C2FO/fast-csv/blame/master/README.md#L37

So I'm confused: is the emitRecord function in parser_stream incorrect, or is the Readme?

New release including headers fix?

Do you plan on making a new release soon?

0.5.4 on npm does not include the pull request I made that was merged on Dec 2. It would be great if I could depend on a fixed, released version to get a working fast-csv.

Request: add callback to each row item, so that it can wait before sending next row

From the example:

var csvStream = csv()
    .on("data", function(data){
         console.log(data);
    })
    .on("end", function(){
         console.log("done");
    });

stream.pipe(csvStream);

It would be nice to have this as an option:

.on('data', function(data, callback) {} );

This way you could process each row in turn and not start on the next row until the current one is done.

Example use:

.on( 'data', function(data, callback) {
    getWeatherForDay( data, function( err, temp ) {  
               writeDayAndTemp();
               callback(null); 
    });
});
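
In the meantime, one workaround sketch is to pause the csv stream while the per-row async work runs and resume it afterwards; this assumes pause()/resume() behave on the version in use (see the "Pause does not work" issue below).

var csvStream = csv()
    .on("data", function (data) {
        csvStream.pause();
        getWeatherForDay(data, function (err, temp) {   // async work for this row
            writeDayAndTemp(data, temp);
            csvStream.resume();                         // let the next row through
        });
    })
    .on("end", function () {
        console.log("done");
    });

stream.pipe(csvStream);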

Thanks for this library it is very useful!

Bug in writer

Trying the example:
csv
.fromPath("my.csv", {headers: true, objectMode: true})
.pipe(csv.createWriteStream({headers: true}))
.pipe(fs.createWriteStream("out.csv", {encoding: "utf8"}));

formatter.js line 88 (headers = hash.keys(item);) expects an object, but item is a string, so the script crashes:

TypeError
at Object.keys (/Users/tbentz/code/nodejs/extracts/extract_process/node_modules/fast-csv/node_modules/object-extended/index.js:116:23)
at Readable.writer.write (/Users/tbentz/code/nodejs/extracts/extract_process/node_modules/fast-csv/lib/formatter.js:88:36)

Pause does not work

Why is this important?
MongoDB is in many cases the go-to database for Node. However, if events are emitted faster than Mongoose can handle them, it can result in duplication and/or index errors, and possibly more.

Attempts at solutions and work arounds

1. Pausing the stream directly inside of the "record" event
stream = fs.createReadStream(filepath);
csvstream = csv.fromStream(stream).on("record", function(data){
stream.pause();
console.log("shouldn't see this more than once");
})

stream.pause() doesn't seem to do anything.

2. Going into old mode, then pausing directly

stream = fs.createReadStream(filepath);
stream.pause();
csvstream = csv.fromStream(stream).on("record", function(data){
stream.pause();
console.log("shouldn't see this more than once");
})
stream.resume();

stream.pause() doesn't seem to do anything.

3. Using the pause method of the Parser_Stream
https://github.com/C2FO/fast-csv/blob/master/lib/parser_stream.js#L156

stream = fs.createReadStream(filepath);
csvstream = csv.fromStream(stream).on("record", function(data){
csvstream.pause();
console.log("shouldn't see this more than once");
})

csvstream.pause() doesn't seem to do anything.

4. Combining both
https://github.com/C2FO/fast-csv/blob/master/lib/parser_stream.js#L156

stream = fs.createReadStream(filepath);
stream.pause();
csvstream = csv.fromStream(stream).on("record", function(data){
stream.pause();
csvstream.pause();
console.log("shouldn't see this more than once");
})
stream.resume();

Neither pause call seems to do anything.

5. Using csvstream to resume

stream = fs.createReadStream(filepath);
stream.pause();
csvstream = csv.fromStream(stream).on("record", function(data){
stream.pause();
csvstream.pause();
console.log("shouldn't see this more than once");
})
csvstream.resume();

The process doesn't start.
https://github.com/C2FO/fast-csv/blob/master/lib/parser_stream.js#L163

6. Attempting to see if csvstream.pause() even matters (which it doesn't)

stream = fs.createReadStream(filepath);
stream.pause();
csvstream = csv.fromStream(stream).on("record", function(data){
stream.pause();
console.log("shouldn't see this");
})
csvstream.pause();
stream.resume();

Doesn't work.

7. Using fromPath

csvstream = csv.fromPath(filepath).on("record", function(data){
csvstream.pause();
console.log("shouldn't see this more than once");
})

Doesn't work.

8. Checking if fromPath at least prevents it

csvstream = csv.fromPath(filepath).on("record", function(data){
console.log("shouldn't see this");
})
csvstream.pause();

Doesn't work
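
An alternative sketch that sidesteps pause()/resume() entirely: pipe the parsed rows into an object-mode Writable and only invoke the _write callback once the Mongo insert has finished, so pipe backpressure throttles the parser. This assumes a streams2-capable Node and that the parser stream can be piped in object mode; MyModel is a hypothetical Mongoose model.

var fs = require('fs');
var csv = require('fast-csv');
var Writable = require('stream').Writable;

var inserter = new Writable({objectMode: true});
inserter._write = function (record, encoding, done) {
    // calling done() releases the next record, so inserts happen one at a time
    MyModel.create(record, done);
};

fs.createReadStream(filepath)
    .pipe(csv.parse({headers: true}))
    .pipe(inserter)
    .on('finish', function () {
        console.log('all records inserted');
    });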

Double Quoted Values invalid?

Excel's way to escape double quotes is like so.

"He said ""Hello World"""

Unfortunately, it looks like the regex checking for valid rows does not account for this. I'd take a stab at the regex for this, but I'd definitely screw it up. Thoughts on this? Agree that this is a valid row?

Missing rowDelimiter at end of last line

The last line doesn't get a finishing "\n", which messes up legacy code.

What's the best way to add that? (using a stream with transformation).

I didn't seem to be able to find the right place for a write or a push.
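
One sketch is the includeEndRowDelimiter option that comes up in the next issue, which appends the row delimiter after the final row (assuming it is available in the version in use):

var csvStream = csv.createWriteStream({headers: true, includeEndRowDelimiter: true});
csvStream.pipe(fs.createWriteStream("out.csv", {encoding: "utf8"}));
csvStream.write({a: "a1", b: "b1"});
csvStream.end();   // the written file should end with the row delimiter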

includeEndRowDelimiter ignored on windows

Hi, I'm using the latest version (0.5.1) on a node-webkit project.

The same code is used on Windows and Linux.

On Linux, when I use "includeEndRowDelimiter": true, I can see the rowDelimiter character at the end of the generated CSV; on Windows the character is missing.

It seems that the _flush method in formatter_stream.js is not called.

on('end') not being triggered

I am reading a csv file and streaming the data through a mongoose.js model.

myFile.js

var stream = fs.createReadStream("name of file");

fcsv(stream)
    .on('data', function (data) {
        ModelName.find(query, function (err, docs) {
            console.log('docs', docs);
        });
    })
    .on('end', function () {
        console.log('done');
    })
    .parse();

The script runs and a list of docs is printed out.
But the on('end') is not triggered.

What can I do to call the on('end')?

Error when parsing csv file with double quotes inside (comma as delimiter)

How to handle the exception with the invalid csv file like double quotes in the content?

I want to keep the double quotes in the end, or catching the exception and ignoring the invalid row would be acceptable as well. In the following example, I want the first column to be:

blah...Start by "stamping" powder on the outer edges of face and ...blah

For example, the "stamping" in the csv file:

"blah...Start by "stamping" powder on the outer edges of face and ...blah","blah","blah blah","blah blah blah"

Error:

error: Parse Error: expected: '"' got: 's'. at 'stamping"  Error: Parse Error: expected: '"' got: 's'. at 'stamping"
    at parseEscapedItem (/node_modules/fast-csv/lib/parser.js:73:19)
    at ParserStream.parseLine [as parser] (/node_modules/fast-csv/lib/parser.js:142:30)
    at ParserStream._parseLine [as _parse] (/node_modules/fast-csv/lib/parser_stream.js:109:25)
    at ParserStream.extended.extend._transform (/node_modules/fast-csv/lib/parser_stream.js:166:29)
    at ParserStream.Transform._read (_stream_transform.js:179:10)
    at ParserStream.Transform._write (_stream_transform.js:167:12)
    at doWrite (_stream_writable.js:221:10)
    at writeOrBuffer (_stream_writable.js:211:5)
    at ParserStream.Writable.write (_stream_writable.js:180:11)
    at write (_stream_readable.js:583:24) 

The error message below follows the one above:

verbose: Shutting down Aura...
verbose: Closing Http Server
error: Error: undefined
    at Object.serverErrorOccurred [as 500] (/config/500.js:19:28)
    at ServerResponse.respond500 [as serverError] (/node_modules/aura/lib/aura/http/hooks/request.js:127:23)
    at Domain. (/node_modules/aura/lib/aura/express/load.js:56:17)
    at Domain.EventEmitter.emit (events.js:95:17)
    at ParserStream.EventEmitter.emit (events.js:70:21)
    at spreadArgs (/node_modules/fast-csv/lib/parser_stream.js:20:21)
    at ParserStream.extended.extend.emit (/node_modules/fast-csv/lib/parser_stream.js:217:13)
    at ParserStream.onerror (_stream_readable.js:518:12)
    at ParserStream.EventEmitter.emit (events.js:95:17)
    at spreadArgs (/node_modules/fast-csv/lib/parser_stream.js:20:21)

pipe to a transform stream

I'm a little new to node streams, but I don't see how I can pipe the csvStream to another transform stream, since the data output is the stringified version of the record (not the parsed version).
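
For reference, a sketch of one way to get parsed objects into a custom transform: put an object-mode Transform directly in the pipeline. This assumes the objectMode option shown in other issues here makes the piped output objects rather than stringified records.

var fs = require('fs');
var csv = require('fast-csv');
var Transform = require('stream').Transform;

// example transform: upper-case every field of each row
var upcase = new Transform({objectMode: true});
upcase._transform = function (row, encoding, done) {
    var out = {};
    Object.keys(row).forEach(function (key) {
        out[key] = String(row[key]).toUpperCase();
    });
    done(null, out);
};

fs.createReadStream('in.csv')
    .pipe(csv.parse({headers: true, objectMode: true}))
    .pipe(upcase)
    .on('data', function (row) {
        console.log(row);
    });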

Stack errors with very large CSV files

I'm getting a RangeError when I read very large - 570k - CSV files.

Reading from myfile.csv

/..../node_modules/fast-csv/lib/parser/parser_stream.js:277
emit: function (event) {
^
RangeError: Maximum call stack size exceeded

I do not have the same problem when I write the file of this size - only when I turn around and read it back.

Here's the pattern I'm using :

var file_name = process.argv[2];
var instream = fs.createReadStream(file_name);
var readStream = csv.parse({headers:true});

readStream.on('data', function(data) {
// transform the data and write it to a new file
});

readStream.on('end', function() {
console.log('\ndone.');
});

instream.pipe(readStream);

end event not triggered

I was running v0.5.4 on node 0.10.32 without issues, but then decided to update to 0.5.6. Now 'record' and 'data-invalid' are triggered as expected, but when the file is done, no 'end' event occurs. I tried this with CSV files from different sources, so I'm sure there's nothing wrong with the input.

I also updated node to 0.12.0, but the problem persists. Rolling back to fast-csv v0.5.5 resolved it, though.

My use case:

var csv = require('fast-csv');
var Promise = require('bluebird');

var promisecsv = Promise.method(function(path, options) {
    return new Promise(function(resolve, reject) {
        var records = [];

        csv
            .fromPath(
                path,
                options
            )
            .validate(function(data) {
                // do stuff here, return true / false
            })
            .on('record', function(record) {
                console.log(records.length);
                records.push(record);
            })
            .on('data-invalid', function() {
                // validation failed, do nothing
            })
            .on('end', function() {
                console.log('parsing done: ' + records.length);
                resolve(records);
            })
        ;
    });
});

module.exports = promisecsv;

Piping output through zlib

I am attempting to output a compressed csv like this:

var csvStream = csv.createWriteStream({headers:true});
var writableStream = fs.createWriteStream("foo.csv.gz");
csvStream.pipe(zlib.createGzip()).pipe(writableStream);
csvStream.write({a:1,b:2});
csvStream.end();

I am finding that the file comes out zero length. If I remove the zlib stream in the middle, things work perfectly. Any ideas?

Thanks!

Chris

Typo in README

It seems that there is a typo in the documentation: you are calling the method fs.createWritableStream (which doesn't exist on the default fs module) instead of fs.createWriteStream.

download csv

We are using fast-csv to create a CSV report and we want to download this file once it's created. Any idea how we can download this with fast-csv?
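
One common pattern, sketched here for an Express-style app (app, req, res, and rows are assumptions for illustration, not part of fast-csv), is to stream the CSV straight to the HTTP response with a Content-Disposition header instead of writing a file first:

app.get('/report.csv', function (req, res) {
    res.setHeader('Content-Type', 'text/csv');
    res.setHeader('Content-Disposition', 'attachment; filename="report.csv"');

    var csvStream = csv.createWriteStream({headers: true});
    csvStream.pipe(res);                                      // the response is just a writable stream
    rows.forEach(function (row) { csvStream.write(row); });   // rows: the in-memory report data
    csvStream.end();
});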

Error: write after end

https://gist.github.com/devTristan/5aa69c2822a0da3f5e51

If the stream fast-csv is piped to doesn't always write immediately, I get a write after end error. This happens if it waits 100ms every 100 records, or every 10ms on every other record, or pretty much any other combination.

I've tried this with through and through2. I'm not familiar enough with streams to replicate this issue without using a helper module, so it's possible that this bug is present in both through and through2 instead of fast-csv. However, I'm pretty sure this is a fast-csv thing, because I can't replicate it using other csv readers.

Commenting out this line prevents this issue from happening, but it also causes a test to fail (specifically, the github issue test for #68).

Help me, @doug-martin, you're my only hope.

csv skips rows when writing

This code, which should copy a file, will not work for moderately large files (~50 MB):

var csv = require('fast-csv');

csv
.fromStream(fs.createReadStream("./input.csv"), {
    headers: true,
    ignoreEmpty: true
})
.transform(function(row) {
    return row;
})
.on("end", function() {
    console.log("done")
})
.pipe(csv.createWriteStream({
}))
.pipe(fs.createWriteStream("./output.csv", {
    encoding: "utf8"
}));

The result is that many rows are missing from the output file. My guess is you're not waiting for the 'drain' event before writing to the file.

Semicolon support

It would be a great feature to have semicolon support when reading csv files. In many countries the csv file format is delimited by semicolons and not commas.
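
For what it's worth, the delimiter option used in other issues here should already cover this; a sketch:

csv
    .fromPath('data.csv', {headers: true, delimiter: ';'})
    .on('data', function (row) { console.log(row); })
    .on('end', function () { console.log('done'); });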

How to transform input csv to output csv via pipes

I'm trying to add an extra column to a csv file and would like to do this via streams utilising a pipe. This is what I have so far:

var fs = require('fs');
var csv = require("fast-csv");

var input = fs.createReadStream("input.csv");
var output = fs.createWriteStream("output.csv", {encoding: 'utf-8'});

var count = 1;

var csvStream = csv({rtrim: true, headers: true})
    .on("record", function(data){
        data.ordinalValue = count;
        count = count + 1;
     })
     .on("end", function(){
        console.log("done");
     });

input.pipe(csvStream).pipe(output);

It just outputs stringified JSON objects, though.
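
A sketch of one way to keep everything in the pipe: mutate each row in transform() and then pipe through a formatting stream, so the output file is CSV text rather than stringified objects (assuming the transform and createWriteStream APIs shown in other issues here):

var count = 1;

csv
    .fromStream(input, {trim: true, headers: true})
    .transform(function (row) {
        row.ordinalValue = count;
        count = count + 1;
        return row;            // the returned row is what gets formatted
    })
    .pipe(csv.createWriteStream({headers: true}))
    .pipe(output);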

Writes [object Object] to csv file?

According to the docs I can write arrays of objects. However, the resulting .csv file has '[object Object]' field after field, instead of a header of the objects' fields followed by rows of the objects' values.

function makeReport(stamp, cb){
    var outputDir = getDir(stamp);
    var file = path.join(outputDir, 'shortcode_report.csv');
    var report = [];
    for(var k in shortcodeStats)
        report.push(shortcodeStats[k]);
    writeReport(file, report, cb);
}

function writeReport(file, data, cb){
    var existsAlready = fs.existsSync(file);
    var csvStream = csv.createWriteStream({
        headers: !existsAlready,
        includeEndRowDelimiter: true,
    });
    csvStream.on('finish', function(err){
        cb(err, file);
    });
    csvStream.pipe(fs.createWriteStream(file, {flags:'a'}));
    csvStream.write(data);
    csvStream.end();
}

stats object looks like this:

stats = {
    'shortcode': 'beftfit_325454',
    'boolean req failed occurance': 0,
    'eligibly failed occurance': 0,
    'boolean hois': '',
    'eligiblity hois': ''
};

output was:

[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
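
The likely trigger is that csvStream.write(data) is being handed the whole report array as if it were one row. A sketch of the probable fix, writing the rows one at a time (same createWriteStream API as above):

function writeReport(file, data, cb){
    var existsAlready = fs.existsSync(file);
    var csvStream = csv.createWriteStream({
        headers: !existsAlready,
        includeEndRowDelimiter: true,
    });
    csvStream.on('finish', function(){
        cb(null, file);
    });
    csvStream.pipe(fs.createWriteStream(file, {flags:'a'}));
    data.forEach(function(row){
        csvStream.write(row);   // one object per row
    });
    csvStream.end();
}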

Behavior change between 0.4.4 and 0.5.0

I recently updated an app using fast-csv from 0.4.4 to 0.5.0

It also uses async 0.9.0.

I have this code:

var async = require('async');
var csv = require('fast-csv');
var fs = require('fs');

var rows = [];

for(var i = 0; i< 100000; i++)
{
    rows.push({a:i, b:i+2, c:i+4});
}

var csvStream;
var fileCount = 0;
async.doWhilst(
    function(done) {
        var filename = "test-" + fileCount + ".csv";
        console.info("Writing to: ", filename);
        csvStream = csv.createWriteStream({headers:true});
        var writableStream = fs.createWriteStream(filename);
        csvStream.pipe(writableStream);

        async.doWhilst(
            function(done) {
                var row = rows.shift();
                console.info("Row: ", row);
                csvStream.write(row);
                setImmediate(done);
            },
            function() {
                console.info("Bytes: ", writableStream.bytesWritten);
                return writableStream.bytesWritten < 1024 * 512 && rows.length > 0;
            },
            function(err) {
                console.info("Done with: " + filename);
                csvStream.write(null);
                fileCount++;
            }
        );

        writableStream.on('finish', function() {
            console.info('finished');
            setImmediate(done);
        });
    },
    function() {
        return rows.length > 0;
    },
    function(err) {
        console.info("Done writing rows");
    }
);

I have found that with 0.4.4 this does as expected and creates two files test-0.csv and test-1.csv.

With 0.5.0 when the inner doWhilst() finishes the outer never restarts and execution stops. Very bizarre behavior. There must be some change in fast-csv causing this but I can't figure out what.

fast-csv error stream issues

I ran into an issue when using fast-csv and listening for the error event. It seems a fast-csv stream doesn't properly manage stream events. The example I have: I throw an exception outside of the stream, and the fast-csv stream's error event triggers instead of the exception being handled by the process.

Example can be found at https://gist.github.com/jgornick/cd2b8bf0d8ba76b5d747

How to skip first couple of lines?

We are parsing files from various sources, and for some of these files we need to skip the first couple of lines. Can you provide some suggestions on how to achieve this? I didn't see any way to do this in the module API.

Thanks!
Tom
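
A workaround sketch: count rows in the data handler and ignore the first N until a skip option exists. The numbers and the handleRow function here are just illustrative.

var skip = 2;   // how many leading rows to drop
var seen = 0;

csv
    .fromPath('file.csv')
    .on('data', function (row) {
        seen++;
        if (seen <= skip) { return; }   // skip the first couple of lines
        handleRow(row);                 // hypothetical per-row handler
    })
    .on('end', function () {
        console.log('done');
    });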

Empty last line breaks parser

Hi,

It seems that CSV files ending with an empty line fail to parse:

events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: Parse Error: expected: '"' got: 'undefined'. at '
    at parseEscapedItem ([...]\node_modules\fast-csv\lib\parser.js:70:23)
    at ParserStream.parseLine [as parser] ([...]\node_modules\fast-csv\lib\parser.js:142:30)
    at ParserStream._parseLine [as _parse] ([...]\node_modules\fast-csv\lib\parser_stream.js:109:25)
    at ParserStream.extended.extend._flush ([...]\node_modules\fast-csv\lib\parser_stream.js:184:18)
    at ParserStream.<anonymous> (_stream_transform.js:130:12)
    at ParserStream.g (events.js:175:14)
    at ParserStream.EventEmitter.emit (events.js:117:20)
    at spreadArgs ([...]\node_modules\fast-csv\lib\parser_stream.js:17:21)
    at ParserStream.extended.extend.emit ([...]\node_modules\fast-csv\lib\parser_stream.js:217:13)
    at finishMaybe (_stream_writable.js:354:12)

I removed the line break manually and voila - parsing worked fine. I understand that this is rather an issue of unclean CSV data, but fixing this would make the parser more robust.

throwing err with large tsv

I have a 2M-line TSV, and I'm getting this:

events.js:72
    throw er; // Unhandled 'error' event
          ^
Error: no writecb in Transform class
    at afterTransform (_stream_transform.js:90:33)
    at TransformState.afterTransform (_stream_transform.js:74:12)
    at /Users/nickdestefano/node_modules/fast-csv/lib/parser/parser_stream.js:206:21
    at /Users/nickdestefano/node_modules/fast-csv/lib/parser/parser_stream.js:121:17
    at asyncIterator (/Users/nickdestefano/node_modules/fast-csv/lib/extended.js:29:17)
    at Object._onImmediate (/Users/nickdestefano/node_modules/fast-csv/lib/extended.js:18:37)
    at processImmediate [as _immediateCallback] (timers.js:345:15)

My code is using the writeStream. I am trying to use a readStream in here as well but I'm wondering if this should be good enough, or is there a cap on size?

var formatStream = csv
  .createWriteStream({headers: true})
  .transform(function(data){
      return {
          'Author': data['Author'],
          'GUID': data['GUID'],
          'Contents': data['GUID'].split(',').join(''),
          'Date(GMT)': moment(new Date(data['Date(GMT)'])).format('MM/DD/YYYY HH:mm:ss A')
      };
  });
csv
    .fromPath(file, {delimiter: '\t', quote: '�', objectMode: true, headers: true, escape: '′'})
    .pipe(formatStream)
    .pipe(fs.createWriteStream(test, {encoding: "utf8"}));

Also, I tried a few different CSV parsing libraries and this is the best I've seen, so nice work.

Formatting Functions example does not work

Using [email protected]
The first example for "Formatting Functions" does not work.

The example:
var csvStream = csv.createWriteStream({headers: true}),
writableStream = fs.createWriteStream("my.csv");

writableStream.on("finish", function(){
console.log("DONE!");
});

csvStream.pipe(writableStream);
csvStream.write({a: "a0", b: "b0"});
csvStream.write({a: "a1", b: "b1"});
csvStream.write({a: "a2", b: "b2"});
csvStream.write({a: "a3", b: "b4"});
csvStream.write({a: "a3", b: "b4"});
csvStream.end();

Results in:
events.js:72
throw er; // Unhandled 'error' event
^
TypeError: Object # has no method 'end'

What am I doing wrong?

Thanks

npm install / update is bloated

Through the dependency chain, a very old version of grunt-contrib-jshint is being loaded that references a bloated tarball of esprima.

The chain occurs here:

fast-csv -> string-extended -> array-extended -> grunt-contrib-jshint (~0.4.3) -> jshint -> esprima

and also here:

fast-csv -> string-extended -> number-extended -> grunt-contrib-jshint (~0.4.3) -> jshint -> esprima

There are a number of dependent projects from Doug Martin that are installing grunt and grunt-related modules even on npm install --production.
I believe updates to these modules would fix the bloated dependency tree.

async doesn't seem to work

I'm having trouble with the async next function

If I have this code:
.validate(function (data, next) {
console.log('data=======', next);

The result of next is 0: data======= 0

and crashes the app:
next(valid);
^
TypeError: number is not a function

I'm using version: '0.5.2'

What am I doing wrong?

Thanks

Parser doesn't work after 2.2 million entries

I am trying to use fast-csv, but it stops working after around 2.3 million entries when reading a file.

var csvStream = csv
    .fromStream(stream, {delimiter : ';', ignoreEmpty: true})
    .validate(function(data, next) {
        //console.log('validating');
        if (videosToBeInserted.length >= 500) {
            Entity.collection.insert(videosToBeInserted, {}, function(error, docs) {
                if (error) {
                    console.log(error);
                }
                else {
                    console.info('%d entities were successfully stored.', docs.length);
                    success += docs.length;
                }
                videosToBeInserted = [];
                next(null, true);
            });
        }
        else {
            next(null, true);

        }

    })
    .on("data", function(data) {

        if (counter >= counterOffset) {
            var entity = createEntity(data);
            console.log('Entity: ' + counter + ' - ' + entity.externalId);
            videosToBeInserted.push(entity);
        }
        counter++;
    })
    .on("end", function(){
        console.log("done");
        console.log('Total entries: ' + counter);
        console.log('Success: ' + success);
        console.log('Errors: ' + (counter - success));
        console.log('Parsing errors: ' + parsingErrors);
    })
    .on('error', function(error) {
        console.log("Catch an invalid csv file!!!");
        console.log(error);
        parsingErrors++;

    });

stream.pipe(csvStream);

I tried multiple times to run it but it just freezes.

Streams 2 support

Really like this csv library compared to others I've found, but streaming is a little clunky. Would prefer to do this:

in.pipe(csv({headers: true})).pipe(process.stdout)

rather than this:

csv(in, {headers: true}).on('data', function(d){console.log(d)}).parse();

TypeError at Object.keys

This snippet does not work in webkit (desktop app):

var csvStream = csv
    .createWriteStream({headers: true})
    .transform(function(row){
        return {
           A: row.a,
           B: row.b
        };
    }),
    writableStream = fs.createWriteStream("my.csv");

writableStream.on("finish", function(){
  console.log("DONE!");
});

csvStream.pipe(writableStream);
csvStream.write({a: "a0", b: "b0"});
csvStream.write({a: "a1", b: "b1"});
csvStream.write({a: "a2", b: "b2"});
csvStream.write({a: "a3", b: "b4"});
csvStream.write({a: "a3", b: "b4"});
csvStream.write(null);

My stacktrace:

TypeError
    at Object.keys (/home/ivan/Workspace/js/neuroskyvis/app/node_modules/fast-csv/node_modules/object-extended/index.js:116:23)
    at CsvTransformStream.extended.extend.write (/home/ivan/Workspace/js/neuroskyvis/app/node_modules/fast-csv/lib/formatter.js:126:51)
    at HTMLInputElement.<anonymous> (file:///home/ivan/Workspace/js/neuroskyvis/app/dist/app.min.js:121:19)
    at HTMLInputElement.jQuery.event.dispatch (file:///home/ivan/Workspace/js/neuroskyvis/app/dist/vendor.min.js:4409:9)
    at HTMLInputElement.elemData.handle (file:///home/ivan/Workspace/js/neuroskyvis/app/dist/vendor.min.js:4095:28)

I'm falling back to csv module for now.

formatter.js: Disabling quote doesn't work

When I try to disable the quoting like this:

var csv = require('fast-csv');

var csvStream = csv.createWriteStream({
  quote : null
});

it's not working, because the null value of options.quote is ignored while setting QUOTE in formatter.js:

QUOTE = options.quote || '"',

When I use the following option

var csvStream = csv.createWriteStream({
  quoteColumns : false
});

the fields are escaped / quoted, too.

Example

What I want: my text;text with " in it;another text
What I get: my text;"text with "" in it";another text
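
The quoted line explains the behavior: a null (or any falsy) quote value is swallowed by the || default. A sketch of the shape of a possible fix in formatter.js, falling back to '"' only when the option is absent, so an explicit null can disable quoting:

// only default when the option was not provided at all
QUOTE = options.hasOwnProperty("quote") ? options.quote : '"',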

.transform() async callback?

Is it possible to do an async operation in the transform method?

.transform(function(data, done){
   mongoose.model('user').findOne({ username: data.username }, function(err, result){
     if(err) throw err;
     return done({ userId: result._id });
  });
});
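
For reference, a sketch of how this could look if the transform callback accepted a done callback in the Node (err, transformedRow) style; this is an assumption about the API shape, not something the docs promise at this point:

.transform(function (data, done) {
    mongoose.model('user').findOne({ username: data.username }, function (err, result) {
        if (err) { return done(err); }
        done(null, { userId: result._id });   // pass the transformed row to the formatter
    });
})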

No parsing occurs in certain contexts

In certain contexts, like Meteor publish functions, or io.sockets.on('connection') functions, it appears that .parse() doesn't run. By contrast, the csv module does parse data in the very same context. Here is sample code that reproduces the bug:

'use strict';
var fs = require('fs');
var stream = fs.createReadStream("bug.csv");
var csv = require("fast-csv");

var io = require('socket.io').listen(8901);

io.sockets.on('connection', function (socket) {

  console.log("On connection");
  csv(stream, {headers : true})
    .on("data", function (d) {
      console.log(d);  // never here
    })
    .on("end", function () {
      console.log("done");  // never here
    })
    .parse();
  console.log("After csv.parse(), which didn't happen");

});

The client code just connects to the server:

<!doctype html>
<html>

<head>
<script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/0.9.10/socket.io.min.js"></script>

<body>

<script>
var websocket = io.connect('ws://localhost:8901');
</script>
