
binary-csv

Update: you should use csv-parser instead; it has the same API as this library but is faster.

A fast CSV parser written in JavaScript.


Consumes Buffer in node or Uint8Array in the browser (thanks to bops). Whereas most CSV parsers parse String data, this library never converts binary data into non-binary data. It's fast because it never creates Numbers, Strings, Arrays or Objects -- only binary representations of the line and cell values in the CSV, meaning the JS VM spends less time doing things like decoding UTF8 strings and going back and forth between C++ and JS.
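The byte-level approach can be sketched with plain Node builtins (an illustration of the general technique, not the library's actual internals): scan the Buffer for the newline byte and emit slices, never decoding the bytes to a string.

```javascript
// Split a Buffer into line Buffers on the newline byte (0x0a),
// without ever decoding the bytes to a string.
function splitLines (buf) {
  var lines = []
  var start = 0
  var idx
  while ((idx = buf.indexOf(0x0a, start)) !== -1) {
    lines.push(buf.slice(start, idx))
    start = idx + 1
  }
  if (start < buf.length) lines.push(buf.slice(start)) // trailing partial line
  return lines
}
```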

By default it will only split lines, but you can use the provided .line and .cell methods to parse the cells and cell values.

Parses a 55 million line, 5.18GB CSV in a little over 1 minute.

demo

See a demo running in the browser on RequireBin:

http://requirebin.com/?gist=maxogden/7555664

You can also load any CSV on the internet via querystring, e.g.:

http://requirebin.com/embed?gist=maxogden/7555664&csv=http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv

Huge CSVs might be slow to render because of the terminal renderer used in the demo.

usage

You can use it two ways: programmatically in Node programs, or from the command line.

binaryCSV([options])
var binaryCSV = require('binary-csv')
var parser = binaryCSV()

parser is a duplex stream -- you can pipe CSV data to it and it will emit a Buffer for each line.

default options

{
  separator: ',',
  newline: '\n',
  detectNewlines: true,
  json: false
}

If json is truthy, the parser stream will emit fully decoded JSON objects representing each row of the CSV, with keys taken from the header row.
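Conceptually, json mode zips the header row with each data row. A sketch of that behavior (not the library's actual code; zipRow is a hypothetical helper):

```javascript
// Combine a header row with a data row into one object,
// the way a json-mode CSV parser typically does.
function zipRow (header, row) {
  var obj = {}
  for (var i = 0; i < header.length; i++) {
    obj[header[i]] = row[i]
  }
  return obj
}
```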

fs.createReadStream('data.csv').pipe(parser)
  .on('data', function(line) { })

parser.line(buf)

Parses cells from a line buffer. Returns an array of cell buffers.

var cells = parser.line(new Buffer('hello,world'))
// returns equivalent of [new Buffer('hello'), new Buffer('world')]

parser.cell(buf)

Parses a single cell buffer, returns the unescaped data in a buffer.

var cell = parser.cell(new Buffer('"this is a ""escaped"" csv cell value"'))
// returns equivalent of new Buffer('this is a "escaped" csv cell value')
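The unescaping rule itself is simple, shown here on strings for readability (the library does the equivalent work on Buffers; unescapeCell is an illustrative helper, not part of the API): strip the surrounding quotes and collapse doubled quotes.

```javascript
// Unescape one CSV cell: drop surrounding quotes, turn "" into ".
function unescapeCell (s) {
  if (s.length > 1 && s[0] === '"' && s[s.length - 1] === '"') {
    s = s.slice(1, -1).replace(/""/g, '"')
  }
  return s
}
```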

See test/test.js for more examples.

CLI API

To use it on the command line, install it globally:

$ npm install binary-csv -g

This should add the bcsv command to your $PATH.

Then, you either pipe data into it or give it a filename:

# pipe data in
$ cat some_data.csv | bcsv
# pass a filename
$ bcsv some_data.csv
# tell bcsv to read from + wait on stdin
$ bcsv -

run the test suite

$ npm install
$ npm test

Contributors

brianloveswords, brycebaril, max-mapper, olov, seldo, timhudson, zeke


Issues

TypeError: Invalid non-string/buffer chunk

Using {json:true} with the API, I get the above error.

Piping the same data to the bcsv CLI produces the expected results.

App code and csv data below.

I also tried with some data from /test/data, but the error still occurred.

var fs = require('fs')                                                                          
var Parser = require('binary-csv')                                                              
var catscsv= fs.createReadStream(__dirname + '/categories.csv')                                 
var catsjson = fs.createWriteStream(__dirname + '/categories.json', 'utf8')                     
catscsv.pipe(Parser({json:true})).pipe(catsjson) 

data:

D,Pillar,Categories,Sub-Category,Description,,,                                                
c0,Platforms,Mobility,,Lorem Ipsum,,,                                                           
c1,Platforms,Cloud,,Lorem Ipsum,,,                                                              
c2,Platforms,Cloud,Application,Lorem Ipsum,,,                                                   
c3,Platforms,Cloud,Server,Lorem Ipsum,,,                                                        
c4,Platforms,Main Frame,,Lorem Ipsum,,,                                                         
c5,Practices,Devops,,Lorem Ipsum,,,                                                             
c6,Practices,Management Cloud,,Lorem Ipsum,,,                                                   
c7,Practices,Security,,Lorem Ipsum,,,                                                           
c8,Practices,Security,Application,Lorem Ipsum,,,                                                
c9,Practices,Security,Router,Lorem Ipsum,,,                                                     
c10,Portfolio,IntelliCenter,,Lorem Ipsum,,,                                                     
c11,Portfolio,OpsCenter,,Lorem Ipsum,,,                                                         
c12,Portfolio,OpsCenter,Testing,Lorem Ipsum,,,                                                  
c13,Portfolio,DevCenter,,Lorem Ipsum,,,                                                         
c14,Portfolio,SecureCenter,,Lorem Ipsum,,,                                                      
,,,,,,,                                                                                         
,,,,,,,                                                                                         
,,,,,,,                                                                                         
,,,,,,,                                                                                         
,,,,,,, 

Enhancement: Add option for unescaped quotes in csv

Would be really nice if you could pass an option to the parser to NOT try to match quotes across multiple lines. I ran into this when a data feed I received didn't properly escape its double quotes, so the parser didn't process the following lines correctly: in the example below, everything is considered part of the first row.

For example:
row, some item, that is 16"
another row, that has something in it, no quotes
row3, has another, item with quote that is 12"
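Until such an option exists, one workaround is to repair the feed before parsing: per RFC 4180, any field containing a double quote must itself be quoted, with the inner quotes doubled. A sketch of that per-field fix (fixField is a hypothetical helper, not part of binary-csv):

```javascript
// Quote a raw field value so a strict CSV parser accepts it:
// wrap it in quotes and double any embedded quote characters.
function fixField (value) {
  if (/[",\n]/.test(value)) {
    return '"' + value.replace(/"/g, '""') + '"'
  }
  return value
}
```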

.line() issue with empty trailing column

I've got some data with missing values, and it appears it's not being parsed correctly:

// Row in csv:
2007-01-01,,
// Buffer after piping through csv()
<Buffer 32 30 30 37 2d 30 31 2d 30 31 2c 2c>
// output from csv.line()
[ <Buffer 32 30 30 37 2d 30 31 2d 30 31>, <Buffer > ]
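For comparison, a plain string split keeps the trailing empty field, so the expected output here would be three cell buffers, the last two empty:

```javascript
// '2007-01-01,,' has three fields; a naive string split shows the expected shape.
var cells = '2007-01-01,,'.split(',')
// cells is ['2007-01-01', '', ''] -- three entries, not two
```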

RequireBin example is broken in FF and needs update

I was playing around with your requirebin example, and it throws an error in FF because of max-mapper/binary-xhr#2. So you might want to resave the requirebin bundle.

BUT THEN! I copy/pasted into a new bin, and got errors about slice being undefined. Turns out slice is not present on ArrayBuffers, which is what was being passed into write from binary-xhr.

So after futzing with that (and investigating what library expects what data types) it turns out the fix is really easy: convert the ArrayBuffer to a node Buffer before passing it into binary-csv:

// Assumedly necessary because the main requirebin script is not
// passed to browserify-cdn.
// Definitely took me longer than it should have to figure that out :)
var Buffer = require('buffer').Buffer; 
...
var buff = new Buffer(new Uint8Array(data))

Full fixed bin: http://requirebin.com/?gist=kirbysayshi/9881537

I suppose it makes sense that binary-csv only accepts Buffers when streaming, but this was definitely confusing while debugging because of both "binary-csv" and "binary-xhr" having the "binary" prefix but requiring different binary transport types!

Not sure if there's anything to be done here, aside from updating the requirebin example, but wanted to let you know about the confusion.
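For what it's worth, in current Node the same conversion is written with Buffer.from, since the new Buffer(...) constructor is deprecated:

```javascript
// Convert an ArrayBuffer (e.g. from an XHR response) into a node Buffer.
function toNodeBuffer (arrayBuffer) {
  return Buffer.from(new Uint8Array(arrayBuffer))
}
```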

deprecate

you can do this with npm deprecate:

npm deprecate binary-csv "use csv-parser instead, it has the same API but is faster"
