
node-csvtojson's Introduction


CSVTOJSON

The csvtojson module is a comprehensive Node.js CSV parser that converts CSV to JSON or column arrays. It can be used as a Node.js library, a command line tool, or in the browser. Below are some features:

  • Strictly follows the CSV definition in RFC 4180
  • Works with millions of lines of CSV data
  • Provides comprehensive parsing parameters
  • Provides an out-of-the-box CSV parsing tool for the command line
  • Blazing fast -- focused on performance
  • Gives developers flexibility with 'pre-defined' helpers
  • Allows async / streaming parsing
  • Provides a CSV parser for both Node.js and browsers
  • Easy-to-use API

csvtojson online

Here is a free online CSV to JSON conversion service using the latest csvtojson module.

Upgrade to V2

csvtojson has released version 2.0.0.

The v1 API remains available via the csvtojson/v1 entry point:

// v1
const csvtojsonV1=require("csvtojson/v1");
// v2 -- "csvtojson" and "csvtojson/v2" resolve to the same v2 parser
const csvtojsonV2=require("csvtojson");

Quick Start

Library

Installation

npm i --save csvtojson

From CSV File to JSON Array

/** csv file
a,b,c
1,2,3
4,5,6
*/
const csvFilePath='<path to csv file>'
const csv=require('csvtojson')
csv()
.fromFile(csvFilePath)
.then((jsonObj)=>{
	console.log(jsonObj);
	/**
	 * [
	 * 	{a:"1", b:"2", c:"3"},
	 * 	{a:"4", b:"5", c:"6"}
	 * ]
	 */ 
})

// Async / await usage (must be inside an async function)
const jsonArray=await csv().fromFile(csvFilePath);

From CSV String to CSV Row

/**
csvStr:
1,2,3
4,5,6
7,8,9
*/
const csv=require('csvtojson')
csv({
	noheader:true,
	output: "csv"
})
.fromString(csvStr)
.then((csvRow)=>{ 
	console.log(csvRow) // => [["1","2","3"], ["4","5","6"], ["7","8","9"]]
})

Asynchronously process each row from a CSV URL

const request=require('request')
const csv=require('csvtojson')

csv()
.fromStream(request.get('http://mywebsite.com/mycsvfile.csv'))
.subscribe((json)=>{
	return new Promise((resolve,reject)=>{
		// long operation for each json e.g. transform / write into database.
		// Call resolve() once the operation completes, or reject(err) on failure.
	})
},onError,onComplete);

Convert to CSV lines

/**
csvStr:
a,b,c
1,2,3
4,5,6
*/

const csv=require('csvtojson')
csv({output:"line"})
.fromString(csvStr)
.subscribe((csvLine)=>{ 
	// csvLine =>  "1,2,3" and "4,5,6"
})

Use Stream

const csv=require('csvtojson');
const request=require('request');

const readStream=require('fs').createReadStream(csvFilePath);

const writeStream=request.put('http://mysite.com/obj.json');

readStream.pipe(csv()).pipe(writeStream);
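To write a complete JSON array to a local file instead, the downstreamFormat parameter (documented in the API section below) can be combined with a file write stream. A minimal sketch:

const fs=require('fs');
const csv=require('csvtojson');

// "array" writes a complete JSON array string downstream,
// which is suitable for a file writable stream.
fs.createReadStream('./source.csv')
.pipe(csv({downstreamFormat:"array"}))
.pipe(fs.createWriteStream('./converted.json'));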

To find more detailed usage, please see the API section.

Command Line Usage

Installation

$ npm i -g csvtojson

Usage

$ csvtojson [options] <csv file path>

Example

Convert csv file and save result to json file:

$ csvtojson source.csv > converted.json

Pipe in csv data:

$ cat ./source.csv | csvtojson > converted.json
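Pass parser parameters as flags. A hedged sketch: the API section below notes that all parser parameters can also be used in the command line tool, assuming the flag names mirror the parameter names:

$ csvtojson --delimiter=";" --noheader=true source.csv > converted.json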

Print Help:

$ csvtojson

API

Parameters

require('csvtojson') returns a constructor function which takes 2 arguments:

  1. Parser parameters
  2. Stream options

const csv=require('csvtojson')
const converter=csv(parserParameters, streamOptions)

Both arguments are optional.

For stream options, please read about Stream Options in the Node.js documentation.
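As an illustration, a minimal sketch passing stream options (here objectMode, which affects the data event as described in the Events section below):

const csv=require('csvtojson')
// The second argument is passed through to the underlying Node.js stream.
const converter=csv({noheader:true},{objectMode:true})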

parserParameters is a JSON object like:

const converter=csv({
	noheader:true,
	trim:true,
})

Following parameters are supported:

  • output: The format to be converted to. "json" (default) -- convert CSV to JSON. "csv" -- convert CSV to arrays of CSV rows. "line" -- convert CSV to CSV line strings.
  • delimiter: The delimiter used to separate columns. Use "auto" if the delimiter is unknown in advance; in this case the delimiter will be auto-detected (by best attempt). Use an array to give a list of potential delimiters, e.g. [",","|","$"]. Default: ","
  • quote: If a column contains the delimiter, the column content can be surrounded with the quote character so it won't be split, e.g. "hello, world" won't be split into two columns while parsing. Set to "off" to ignore all quotes. Default: " (double quote)
  • trim: Indicates whether the parser should trim whitespace surrounding column content, e.g. " content " will be trimmed to "content". Default: true
  • checkType: Turns field type checking on or off. Default: false. (The default was true in versions before 1.1.4.)
  • ignoreEmpty: Ignore empty values in CSV columns. Set to true to skip columns whose value is not given. Default: false
  • fork (experimental): Fork another process to parse the CSV stream. Useful when running many concurrent parsing sessions on large CSV files. Default: false
  • noheader: Indicates the CSV data has no header row and the first row is a data row. Default: false. See header row
  • headers: An array specifying the headers of the CSV data. If noheader is false, this value will override the CSV header row. Default: null. Example: ["my field","name"]. See header row
  • flatKeys: Don't interpret dots (.) and square brackets in header fields as nested object or array identifiers (treat them as regular characters in JSON field identifiers). Default: false
  • maxRowLength: The maximum number of characters a CSV row may contain. 0 means no limit. If the limit is exceeded, the parser will emit an "error" of "row_exceed". If possibly corrupted CSV data is provided, set a number like 65535 so the parser won't exhaust memory. Default: 0
  • checkColumn: Whether to check that the number of columns in a row matches the number of headers. On mismatch, an error of "mismatched_column" will be emitted. Default: false
  • eol: End-of-line character. If omitted, the parser will attempt to detect it from the first chunks of CSV data.
  • escape: The escape character used inside quoted columns. Default is the double quote (") according to RFC 4180. Change it to backslash (\) or another character as needed.
  • includeColumns: Instructs the parser to include only the columns matched by the regular expression. Example: /(name|age)/ will parse and include only columns whose header contains "name" or "age".
  • ignoreColumns: Instructs the parser to ignore the columns matched by the regular expression. Example: /(name|age)/ will ignore columns whose header contains "name" or "age".
  • colParser: Overrides the parsing logic for a specific column. Accepts a JSON object with fields like headName: <String | Function | ColParser>, e.g. {field1:'number'} will use the built-in number parser to convert values of the field1 column to numbers. For more information see details below.
  • alwaysSplitAtEOL: Always interpret each line (as defined by eol, like \n) as a row. This prevents eol characters from being used within a row (even inside a quoted field). Default: false. Change to true if you are confident there are no inline line breaks (such as a line break in a cell containing multi-line text).
  • nullObject: How to parse a CSV cell containing "null". The default, false, keeps "null" as a string. Change to true if a null object is needed.
  • downstreamFormat: Sets which JSON array format is sent downstream. "line" (also called ndjson format) writes lines of JSON without square brackets and commas. "array" writes a complete JSON array string downstream (suitable for a file writable stream, etc.). Default: "line"
  • needEmitAll: The parser builds the full JSON result when .then is called (or await is used). If this is not desired, set this to false. Default: true

All parameters can also be used in the command line tool.
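A sketch combining several of the parameters above (values are illustrative, not prescriptive):

const csv=require('csvtojson')
csv({
	delimiter: "auto",   // auto-detect the delimiter
	trim: true,          // strip spaces around cell content
	ignoreEmpty: true,   // skip columns with no value
	checkType: true      // infer field types from cell values
})
.fromFile('<path to csv file>')
.then((jsonArr)=>{
	console.log(jsonArr)
})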

Asynchronous Result Process

Since v2.0.0, asynchronous processing has been fully supported.

e.g. Process each JSON result asynchronously.

csv().fromFile(csvFile)
.subscribe((json)=>{
	return new Promise((resolve,reject)=>{
		// Async operation on the json
		// don't forget to call resolve and reject
	})
})

For more details, please read the sections below.

Events

The Converter class defines a series of events.

header

The header event is emitted once for each CSV file. It passes an array containing the names from the header row.

const csv=require('csvtojson')
csv()
.on('header',(header)=>{
	//header=> [header1, header2, header3]
})

header is always an array of strings without types.

data

The data event is emitted for each parsed CSV line. It passes a buffer of stringified JSON in ndjson format unless objectMode is set to true in the stream options.

const csv=require('csvtojson')
csv()
.on('data',(data)=>{
	//data is a buffer object
	const jsonStr= data.toString('utf8')
})
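With objectMode set to true in the stream options, the data event receives parsed JSON objects instead of buffers. A minimal sketch (the field name is hypothetical):

const csv=require('csvtojson')
csv({},{objectMode:true})
.on('data',(jsonObj)=>{
	// jsonObj is a plain object, e.g. jsonObj.name
})
.fromFile('<path to csv file>')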

error

The error event is emitted if any error happens during parsing.

const csv=require('csvtojson')
csv()
.on('error',(err)=>{
	console.log(err)
})

Note that if error is emitted, the process will stop, as node.js will automatically unpipe() the upper stream and chained downstream [1]. This causes the end event never to be emitted, because end is only emitted when all data has been consumed [2]. If you need to know when parsing has finished, use the done event instead of end.

  1. Node.JS Readable Stream
  2. Writable end Event

done

The done event is emitted either after parsing finishes successfully or when an error happens. It indicates the processor has stopped.

const csv=require('csvtojson')
csv()
.on('done',(error)=>{
	//do some stuff
})

If any error occurred during parsing, it will be passed to the callback.

Hook & Transform

Raw CSV Data Hook

The preRawData hook is called with the raw CSV string before it is passed to the parser.

const csv=require('csvtojson')
// synchronous
csv()
.preRawData((csvRawData)=>{
	var newData=csvRawData.replace('some value','another value');
	return newData;
})

// asynchronous
csv()
.preRawData((csvRawData)=>{
	return new Promise((resolve,reject)=>{
		var newData=csvRawData.replace('some value','another value');
		resolve(newData);
	})
	
})

CSV File Line Hook

The preFileLine hook is called each time a file line has been read from the CSV stream. lineIdx is the line number in the file, starting at 0.

const csv=require('csvtojson')
// synchronous
csv()
.preFileLine((fileLineString, lineIdx)=>{
	if (lineIdx === 2){
		return fileLineString.replace('some value','another value')
	}
	return fileLineString
})

// asynchronous
csv()
.preFileLine((fileLineString, lineIdx)=>{
	return new Promise((resolve,reject)=>{
		// async function processing the data.
		// Resolve with the (possibly modified) line when done.
		resolve(fileLineString);
	})
})

Result transform

To transform the result that is sent downstream, use the .subscribe method, which is called for each populated JSON object.

const csv=require('csvtojson')
csv()
.subscribe((jsonObj,index)=>{
	jsonObj.myNewKey='some value'
	// OR asynchronously
	return new Promise((resolve,reject)=>{
		jsonObj.myNewKey='some value';
		resolve();
	})
})
.on('data',(jsonObj)=>{
	console.log(jsonObj.myNewKey) // some value
});

Nested JSON Structure

csvtojson is able to convert a CSV line to nested JSON by appropriately defining the CSV header row. This is a default, out-of-the-box feature.

Here is an example. Original CSV:

fieldA.title, fieldA.children.0.name, fieldA.children.0.id,fieldA.children.1.name, fieldA.children.1.employee.0.name,fieldA.children.1.employee.1.name, fieldA.address.0,fieldA.address.1, description
Food Factory, Oscar, 0023, Tikka, Tim, Joe, 3 Lame Road, Grantstown, A fresh new food factory
Kindom Garden, Ceil, 54, Pillow, Amst, Tom, 24 Shaker Street, HelloTown, Awesome castle

The data above maps to nested JSON, including nested arrays of JSON objects and plain text fields.

Using csvtojson to convert, the result would be like:

[{
    "fieldA": {
        "title": "Food Factory",
        "children": [{
            "name": "Oscar",
            "id": "0023"
        }, {
            "name": "Tikka",
            "employee": [{
                "name": "Tim"
            }, {
                "name": "Joe"
            }]
        }],
        "address": ["3 Lame Road", "Grantstown"]
    },
    "description": "A fresh new food factory"
}, {
    "fieldA": {
        "title": "Kindom Garden",
        "children": [{
            "name": "Ceil",
            "id": "54"
        }, {
            "name": "Pillow",
            "employee": [{
                "name": "Amst"
            }, {
                "name": "Tom"
            }]
        }],
        "address": ["24 Shaker Street", "HelloTown"]
    },
    "description": "Awesome castle"
}]
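No special parameters are needed to produce the result above. A minimal sketch, assuming the CSV content is stored in csvStr:

const csv=require('csvtojson')
csv()
.fromString(csvStr)
.then((jsonArr)=>{
	console.log(JSON.stringify(jsonArr, null, 2))
})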

Flat Keys

In order to not produce nested JSON, simply set flatKeys:true in the parameters.

/**
csvStr:
a.b,a.c
1,2
*/
csv({flatKeys:true})
.fromString(csvStr)
.subscribe((jsonObj)=>{
	//{"a.b":1,"a.c":2}  rather than  {"a":{"b":1,"c":2}}
});

Header Row

csvtojson uses the CSV header row to generate JSON keys. However, it does not require the CSV source to contain a header row. There are 4 ways to define the header row:

  1. First row of the CSV source. Use the first row of the CSV source as the header row. This is the default.
  2. If the first row of the CSV source is a header row but it is incorrect and needs to be replaced, use the headers:[] and noheader:false parameters.
  3. If the original CSV source has no header row but the header definition is known, use the headers:[] and noheader:true parameters.
  4. If the original CSV source has no header row and the header definition is unknown, use noheader:true. This will automatically add fieldN headers to the CSV cells.

Example

// replace header row (first row) from original source with 'header1, header2'
csv({
	noheader: false,
	headers: ['header1','header2']
})

// original source has no header row. add 'field1' 'field2' ... 'fieldN' as csv header
csv({
	noheader: true
})

// original source has no header row. use 'header1' 'header2' as its header row
csv({
	noheader: true,
	headers: ['header1','header2']
})

Column Parser

Column Parser allows writing a custom parser for a column in CSV data.

What is Column Parser

When csvtojson walks through the CSV data, it converts the value in each cell. For example, if checkType is true, csvtojson will attempt to find a proper type parser according to the cell value. That is, if a cell value is "5", a numberParser will be used, and all values under that column will be transformed with the numberParser.
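A short sketch of this behavior with illustrative data (with checkType enabled, numeric-looking cells are converted to numbers):

/*csv string
name,score
Joe,5
*/
const csv=require('csvtojson')
csv({checkType:true})
.fromString(csvString)
.subscribe((jsonObj)=>{
	//jsonObj: {name:"Joe", score:5} -- score is a number, not a string
})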

Built-in parsers

The following built-in parsers are currently available:

  • string: Convert value to string
  • number: Convert value to number
  • omit: omit the whole column

These override types inferred from the checkType:true parameter. More built-in parsers will be added as requested on the issues page.

Example:

/*csv string
column1,column2
hello,1234
*/
csv({
	colParser:{
		"column1":"omit",
		"column2":"string",
	},
	checkType:true
})
.fromString(csvString)
.subscribe((jsonObj)=>{
	//jsonObj: {column2:"1234"}
})

Custom parsers function

Sometimes developers want to define a custom parser. A function can be passed for a specific column in colParser.

Example:

/*csv data
name, birthday
Joe, 1970-01-01
*/
csv({
	colParser:{
		"birthday":function(item, head, resultRow, row , colIdx){
			/*
				item - "1970-01-01"
				head - "birthday"
				resultRow - {name:"Joe"}
				row - ["Joe","1970-01-01"]
				colIdx - 1
			*/
			return new Date(item);
		}
	}
})

The example above converts the birthday column into a JS Date object.

The returned value will be used in the result JSON object. Returning undefined will not change the result JSON object.
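A sketch of conditional parsing, reusing the birthday column from the example above (returning undefined keeps the original value in place):

const csv=require('csvtojson')
csv({
	colParser:{
		"birthday":function(item){
			const d=new Date(item);
			// Replace the value only when it parses to a valid date;
			// returning undefined keeps the original string.
			return isNaN(d.getTime()) ? undefined : d;
		}
	}
})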

Flat key column

A column can also be marked as flat:

/*csv string
person.comment,person.number
hello,1234
*/
csv({
	colParser:{
		"person.number":{
			flat:true,
			cellParser: "number" // string or a function 
		}
	}
})
.fromString(csvString)
.subscribe((jsonObj)=>{
	//jsonObj: {"person.number":1234,"person":{"comment":"hello"}}
})

Contribution

Any type of donation and support is very much appreciated.

Code

csvtojson follows the GitHub convention for contributions. Here are the steps:

  1. Fork the repo to your GitHub account.
  2. Check out the code from your GitHub repo to your local machine.
  3. Make code changes, and don't forget to add related tests.
  4. Run npm test locally before pushing the code back.
  5. Create a Pull Request on GitHub.
  6. Code review and merge.
  7. Changes will be published to NPM in the next version.

Thanks to all the contributors.

Backers

Thank you to all our backers! [Become a backer]

Sponsors

Thank you to all our sponsors! (please ask your company to also support this open source project by becoming a sponsor)

Paypal

donate

Browser Usage

Using csvtojson in the browser is quite simple. There are two ways:

1. Embed script directly into script tag

There is a pre-built script located at browser/csvtojson.min.js. Simply include that file in a script tag in the index.html page:

<script src="node_modules/csvtojson/browser/csvtojson.min.js"></script>
<!-- or use cdn -->
<script src="https://cdn.rawgit.com/Keyang/node-csvtojson/d41f44aa/browser/csvtojson.min.js"></script>

Then use the global csv function:

<script>
csv({
	output: "csv"
})
.fromString("a,b,c\n1,2,3")
.then(function(result){

})
</script>

2. Use webpack or browserify

If a module bundler is preferred, simply require("csvtojson"):

var csv=require("csvtojson");

// or with import
import {csv} from "csvtojson";

// Then use csv as normal. You'll need to load the CSV first; this example uses Fetch:
// https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch
fetch('http://mywebsite.com/mycsvfile.csv')
  .then(response => response.text())
  .then(text => csv().fromString(text))
  .then(function(result){
    // result is the parsed output
  })

node-csvtojson's People

Contributors

0xflotus, atufkas, blakeblackshear, colarob, cwolfanderson, dependabot[bot], dogabudak, gabegorelick, geofflangenderfer, jason-cooke, jeremyrajan, jimihford, jondayft, joseexposito, josgraha, kakts, keyang, kriscarle, michaelshaffer37, nbelakovski, nivsherf, notslang, richardpringle, roodboi, sanchitbansal10, schoologysolutions, silid, tlhunter, trangtungn, vmanolas


node-csvtojson's Issues

Multi Thread Not working as expected

Hi Guys,

I pulled in the latest code change, which supports multiple CPUs. However, it somehow seems not to work as expected. Here is the sample code I am using.

var data = fs.readFileSync(config.destination, 'utf8');
parse_csv(data)
.then(function (object) {
	console.log(object)
});

function parse_csv_internal(data, onComplete){

var csvConverter = new Converter({constructResult:true, workerNum:4});
csvConverter.fromString(data,function(err,jsonObj) {
  if (err){
      log("Parse CSV Internal Function Error" + err);
      onComplete(err, null);
  } else {
      log('Parse CSV Internal function has finished!');
      onComplete(null, jsonObj);
  }
});

csvConverter.on("end_parsed", function(jsonObj) {
  //final result poped here as normal.
});

}

function parse_csv(csv_data) {
	return new Promise(function (fulfill, reject) {
		parse_csv_internal(csv_data, function(err, result) {
			if (err == null) {
				fulfill(result);
			} else {
				log('Parse CSV function has an error!');
				return reject(err);
			}
		});
	});
}

It takes around 20 minutes to finish. Abridged `top` output shows the extra node worker processes sleeping at 0.0% CPU while a single node process runs at 100% CPU.

I see 4 threads but only one thread is doing work. Others are sleeping. What am I doing wrong here?

Line breaks within cell causes erroneous parsing

If a CSV has fields containing line breaks, csvtojson will parse each line as if it were a new entry. It doesn't respect line breaks as part of the row value even though they are within quotation marks (").

For example

day,month,description
"1","12","something that works"
"3","05","something
that<hr>
doesn't"

The above should parse into only two objects; instead it generates four.

Mac-made .csv files won't parse

When creating a .csv file with the OSX version of Excel, you have to save the file specifically as a "Windows Comma Separated" .csv file to get it to parse; otherwise, the converter will return an empty array.

When saving it as the normal, "default" .csv type, it won't parse.

var Converter       = require('csvtojson').Converter;
var converter = new Converter({constructResult:true});
converter.on("end_parsed",function(json){
    if(json.length>0){
        report.json = json;
        defer.resolve(json);
    }else{
        //this happens when a mac .csv is sent
        defer.reject({message:'.csv could not be parsed. Please create a "Windows Comma Separated" .csv file'});
    }
});
fs.createReadStream(fileLocation).pipe(converter);

unable to convert to json

var fs = require("fs");
var Converter = require("csvtojson").Converter;
var csvFileName="./Sacramentorealestatetransactions.csv"
// Source http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv

self = module.exports ={
    getCsvAsJSON: function (callback) {
    //Converter Class
    var fileStream = fs.createReadStream(csvFileName);
    //new converter instance
    var converter = new Converter({ constructResult: true });
    //end_parsed will be emitted once parsing finished
    converter.on("end_parsed", function (jsonObj) {
        console.log(jsonObj); //here is your result json object
        callback(jsonObj);
    });
    //read from file
    fileStream.pipe(converter);
   }

}

self.getCsvAsJSON(function(){})

Output I get is always empty
version: latest

Special characters (éèàâ’...)?

Hello !

I must convert a .csv file written in French that contains special characters.
When I do, all the special characters are replaced by "�".
What can I do?

Thank you !

Conversion fails with CRLF

The following file:

Date, Type, Description, Value, Balance, Account Name, Account Number

11/04/2012,DPC,"'CALL REF.NO. 1778 , FROM A/C 12345678",50.00,149.47,"'Some Business Ltd","'123456-24438749",

Reasons:

  • The first line endings are CRLF (\r\n)
  • The CSV starts with a blank line (removing the first line solves the issue).

Parsing Large CSV file (90k) records

I have seen examples where {constructResult:false} helps reduce memory consumption. While I understand that, I have 16GB of RAM and I can afford to put the result in memory and do some processing. My main issue is that it took 24 minutes to finish this work. While I understand the load, 24 minutes sounds really high!!! Can this take advantage of multiple CPU cores for faster processing?

avoid dot(.) in headers as nested JSON?

Hi, how can I use the default parser but avoid dots (.) in a header being interpreted as nested JSON? Or is the only option to create a custom parser?

Thanks

Creates double objects and triggers `Error: CSV no longer writable` on resubmit

Hello,

I'm trying to use your module to convert a csv file to json and send said json as the response to a request.

A few issues have occurred.

First, it seems csvtojson is creating duplicates of my csv. My csv currently has 4 lines including the header line (subject to change), and the output contains 8 json objects with 2 headers.

Second, I am submitting a request for data through a form. The first time I submit the request, I get data back but receive the error above. If I change a few parameters and submit again, my express server dies and spits out the error Error: CSV no longer writable

Is there any insight you can provide to solve this problem?

Thanks!

Nesting beyond two levels does not work

Hi @Keyang

This looks like it could be a really useful library for me. But it's not working as expected with objects that are nested more than two levels.

Take a very simple example. Here is a CSV with a single header and single value.

leve1.level2.level3
hello

According to the docs it should produce the following JSON:

[ { leve1: { level2: { level3: "hello" } } } ]

However, it is actually producing:

[ { leve1: { level2: [Object] } } ]

Can you take a look at this - I can't really see in your code where it's going wrong.

Thanks,

James

Column Array in Multiple Lines

Is it possible to load column arrays by joining data from multiple lines? In cases where we have 1..n indexes to load, we cannot define multiple columns in advance; instead, it is easier to have multiple lines of data.

Example:

name, age, employer[].name, employer[].startDate, employer[].endDate
Paul, 38, Company 1, 01-01-2010, 20-08-2011
, , Company 2, 21-08-2011, 21-01-2015

Thanks,
DGKM

Nested array of objects

I have a complex json structure to replicate.
It is basically something like this:

csv: [{
    levelA : {
    title: 'foo',
    levelB: [
      {
      title : 'bar',
      levelC: [
        {title : 'deep blue'},
        {title : 'deep red'},
        { ... }
        ]
      },
      { ... }
      ]
    }
  }, 
  { ... }
]

As you can see it is a deep nested structure with arrays of objects.
Do you think it would be possible to achieve this just using csvtojson parser keys or should I go for a custom parser?

converter.fromString returning partial results with workerNum:1

I'm using "csvtojson": "^0.4.3",
When parsing a tsv string, I was only getting 7 out of 30 records. (A small MWS GetReportList)
When I set workerNum to 4 then it returned all 30 records.
My tsv string is only 12 KB. It fails silently with a partial result.
I could readStream.pipe(csvConverter).pipe(writeStream) but it is always a small file (<100k).

Questions: What is the best way to detect a partial result? I am concerned that I could have the same issue with workerNum:4 in production. I won't always have the expected number of records to test against. Should I just pipe to a writeStream all the time? Thanks for the help.

code:
function convertTsvToJson(val){
var deferred = Q.defer();
var CSVConverter = require("csvtojson").Converter;
var converter = new CSVConverter({
workerNum:4,
delimiter:"\t",
});

converter.fromString(val, function(err, result){
if (err) {
deferred.reject(err);
} else {
if (result === undefined || _.isEmpty(result) ) {
deferred.reject('Issue parsing tsv, result is invalid');
} else {
console.log('resultJsonLength', result.length);
deferred.resolve(result);
}
}
});
return deferred.promise;
}

trailing comma

I am getting the error "Cannot call method 'parse' of undefined" when the CSV rows have trailing commas. For example "field1,field2 \n value1,value2, \n value1,value2, \n"

Thanks,

Nathan

Add new feature to convert CSV to JSON where first row in csv is not a column name

You can see Demo here: http://www.convertcsv.com/csv-to-json.htm

This is an input data read from CSV :

productId,classification_id,mfgDate,gross_price,retail_price,color,size
CC102-PDMI-001,eClass_5.1.3,10/3/2014,12,40,green,40
CC200-009-001,eClass_5.1.3,11/3/2014,5,3,blue,38
CC200-070-001,eClass_5.1.3,10/4/2014,15,13,red,45
CC200-099,eClass_5.1.3,10/5/2014,20,17,orange,28
CC200-100,eClass_5.1.3,10/3/2014,5,4,black,32

And expected resulted JSON:
[
{
"FIELD1":"productId",
"FIELD2":"classification_id",
"FIELD3":"mfgDate",
"FIELD4":"gross_price",
"FIELD5":"retail_price",
"FIELD6":"color",
"FIELD7":"size"
},
{
"FIELD1":"CC102-PDMI-001",
"FIELD2":"eClass_5.1.3",
"FIELD3":"10/3/2014",
"FIELD4":"12",
"FIELD5":"40",
"FIELD6":"green",
"FIELD7":"40"
},
{
"FIELD1":"CC200-009-001",
"FIELD2":"eClass_5.1.3",
"FIELD3":"11/3/2014",
"FIELD4":"5",
"FIELD5":"3",
"FIELD6":"blue",
"FIELD7":"38"
},
{
"FIELD1":"CC200-070-001",
"FIELD2":"eClass_5.1.3",
"FIELD3":"10/4/2014",
"FIELD4":"15",
"FIELD5":"13",
"FIELD6":"red",
"FIELD7":"45"
},
{
"FIELD1":"CC200-099",
"FIELD2":"eClass_5.1.3",
"FIELD3":"10/5/2014",
"FIELD4":"20",
"FIELD5":"17",
"FIELD6":"orange",
"FIELD7":"28"
},
{
"FIELD1":"CC200-100",
"FIELD2":"eClass_5.1.3",
"FIELD3":"10/3/2014",
"FIELD4":"5",
"FIELD5":"4",
"FIELD6":"black",
"FIELD7":"32"
}
]

Can I omit a specific column before the conversion

Is it possible, in the current state of this module, to omit a specific column from the CSV before getting the JSON result?

e.g

id, name, age, address
 1, john,  25, 120 road ave

I want to omit address before converting the data to JSON.

writeAfterEnd being thrown

I have been getting the following error while using this module tonight. The relevant code is below the error. The error comes when, after a converted page is rendered, I go to a second URL to render another page.

Wed Jul 02 2014 23:14:36 GMT-0400 (Eastern Daylight Time): Node server stopped.

stream.js:94
      throw er; // Unhandled stream error in pipe.
           ^
Error: write after end
    at writeAfterEnd (_stream_writable.js:133:12)
    at csvAdv.Writable.write (_stream_writable.js:181:5)
    at Request.ondata (stream.js:51:26)
    at Request.emit (events.js:95:17)
    at IncomingMessage.<anonymous> (D:\repos\openshift-nodejs\node_modules\request\request.js:932:12)
    at IncomingMessage.emit (events.js:95:17)
    at IncomingMessage.<anonymous> (_stream_readable.js:748:14)
    at IncomingMessage.emit (events.js:92:17)
    at emitReadable_ (_stream_readable.js:410:10)
    at emitReadable (_stream_readable.js:406:5)

And the code...

var converter = require('csvtojson').core.Converter;
var express   = require('express');
var request   = require('request');
var router    = express.Router();

// Don't save everything to memory. This facilitates large CSV's
var csvConverter = new converter({constructResult:false});

/* 
 *  GET home page.
 */
router.get('/', function(req, res) {
  res.render('index');
});


/*
 *  URL's to pull data from
 */
var aSchool= require('./schools/aSchool');

router.get('/aSchool/bulletin/:term', function(req, res) {
  var term = req.params.term;
  if ( aSchool.validTerms.indexOf(term) >= 0 ) {
    res.writeHead(200, {"Content-Type": "application/json"});
    var url = '';
    switch (term) {
      case 'fall':
        var url = aSchool.bulletin + aSchool.fall;
        break;
      case 'spring':
        var url = westga.bulletin + westga.spring;
        break;
      case 'summer':
        var url = westga.bulletin + westga.summer;
        break;
      default:
        console.log('Something went wrong...');
    }
    console.log(url);
    request.get(url).pipe(csvConverter).pipe(res);
  } else {
    res.status(404);
  }
});


module.exports = router;

Column filter

How to filter only a desired set of columns during CSV to JSON conversion.
So I have columnA, columnB, ColumnC.
And I would like to display only ColumncC in the JSON file.

How to go about it.

Raising an issue as I didn't find any mailing list. Thanks~

Special Chars encoding

Hi!
I'm trying to convert a CSV to JSON.
Text value BEFORE to convert:

  • "O FUTEBOL.¿"
  • " emoção"

The same value AFTER conversion:

  • "O FUTEBOL.\ufffd"
  • " emo\ufffd\ufffdo"

Is there an option to disable encoding/decoding of special chars?

It is worth saying that all special chars (ç,ã,õ,í,¿, etc.) are being converted to the unicode char \ufffd, which is the 'REPLACEMENT CHARACTER'.
More info: http://www.fileformat.info/info/unicode/char/0fffd/index.htm

Can you help me?

Conversion chokes on single quotation

I receive the error "Incomplete CSV file detected. Quotes does not match in pairs." on a tab-delimited file, where this entry includes a " to indicate inches.

Is there a way to escape it or otherwise convince it to continue past it?

this.eol undefined reading empty csv file.

  • csvtojson version: 0.3.11
  • NodeJS version: v0.10.22
  • Mac OS version: 10.9.4

In case the csv file is empty, this.eol will not be set, which leads the _flush function to throw the following exception:
undefined:1
undefined]
^
SyntaxError: Unexpected token u
at Object.parse (native)
at Result.getBuffer (/Users/trangtungn/Working/Omnia_Git/om_file_uploader/node_modules/csvtojson/libs/core/Result.js:19:15)
at csvAdv. (/Users/trangtungn/Working/Omnia_Git/om_file_uploader/node_modules/csvtojson/libs/core/init_onend.js:8:70)
at csvAdv.EventEmitter.emit (events.js:117:20)
at _stream_readable.js:920:16
at process._tickDomainCallback (node.js:459:13)

By initializing this.eol, I can get past the problem.

Map headers to different field name for objects

How can we efficiently map headers to field names for outputting a JSON object with the desired keys rather than what comes from the CSV file as-is?

So going from:

a, b, c
1, 2, 3

to:

{
aAlt: 1,
bAlt: 2,
cAlt: 3
}

tab delimiter?

how do i specify a tab delim at the cmd line?
i tried

csvtojson --delimiter='\t' data.tsv
csvtojson --delimiter='¥t' data.tsv

also the | seems to get taken as a pipe in the cmd:

csvtojson --delimiter=| data/jgram/comments.tsv
zsh: permission denied: data/jgram/comments.tsv

types

hi Keyang,

Is there a way to tell the parser how to convert a column type?
for example, i have this csv data:
name, timestamp, value
a.b.c.d.e.f, 1397434200, 1.82

yield this json:

[ { name: 'a.b.c.d.e.f', timestamp: '1397434200', value: '1.82' } ]

I would like timestamp and value to be numbers rather than strings.

thanks in advance,
Shay

Failure to read large csv

This is my code to read the csv:

   var csvtojson = require("csvtojson").core.Converter;
   var fileStream = fs.createReadStream("my.csv");
    var csvConverter = new csvtojson({
      constructResult: true
    });
    fileStream.pipe(csvConverter);
    csvConverter.on("end_parsed", function(data) {
      next(null, data); //1
    });

The only problem I have here is that this works great for a small csv, but it fails to give any output for a large csv. The data I receive at the line marked //1 is an empty array.

You can get the sample csv here.
It's very common for me to parse very large CSVs, to the tune of up to 20k or even 40k records.

csv data

Hello Keyang, thanks for the great library.

I'm not able to understand how I can pass a simple csv string and get the JSON output.

Can you please show an example of how to pass raw csv data to the converter?

Stream does not return JSON array

Hi, first of all thank you for the great lib.

I have an issue parsing a CSV using the streaming API because it does not return a JSON array. I don't know if this is the "normal" behavior, but here is an example to explain my problem (with Node v0.12):

The original CSV file:

annee;jour;date;b_1;b_2;devise;
2015029;LUNDI   ;09/03/2015;35;31;eur;
2015028;SAMEDI  ;07/03/2015;48;9;eur;

The code to parse the CSV:

var rs = require('fs').createReadStream('source.csv');
var ws = require('fs').createWriteStream('destination.json');

 rs.pipe(new Converter({ constuctResult: false, delimiter: ';', trim: true })).pipe(ws);

The result file content:

{"annee":2015029,"jour":"LUNDI","date":"09/03/2015","b_1":35,"b_2":31,"devise":"eur","":""}
{"annee":2015028,"jour":"SAMEDI","date":"07/03/2015","b_1":48,"b_2":9,"devise":"eur","":""}

I would expect to have this json object:

[
  {"annee":2015029,"jour":"LUNDI","date":"09/03/2015","b_1":35,"b_2":31,"devise":"eur","":""},
  {"annee":2015028,"jour":"SAMEDI","date":"07/03/2015","b_1":48,"b_2":9,"devise":"eur","":""}
]

It works fine if I use the end_parsed event instead, but not with the stream and the pipe API.

Thank you for your help

Note: I don't know why, but it adds an empty field to the JSON result too.

Incorrectly escaping quotes when quotes are used inside a column

Brief summary:

CSV record is

... ,"[""Neil"", ""Bill"", ""Carl"", ""Richard"", ""Linus""]", ...

What's parsed is

..."[\"\"Neil\"\", \"\"Bill\"\", \"\"Carl\"\", \"\"Richard\"\", \"\"Linus\"\"]\", ...

In this case, the comma at the end is not treated as a delimiter...

Which means that the column is not delimited correctly, many columns run together, and the entire json result is incorrect. Seems to be an issue with determining quote escaping.

Version 0.4.4 broke existing code

Thanks for adding the error event Listener from 0.4.3 to 0.4.4

However, my code stopped working when I upgraded to this new version. What I mean by that is whenever I upload a csv, my code gets stuck now.

Current code, for reference:

var form = new formidable.IncomingForm();
form.parse(req, function(err, fields, files) {
 console.log(files.file.type);
 if (!files || !files.file || files.file.size == '0') {
   res.send("CSV must not be empty.");
 }else if( files.file.type == 'text/csv'){
   var Converter = require("csvtojson").Converter;

  var csvFileName = files.file.path;
  var fileStream = fs.createReadStream(csvFileName);
  //new converter instance

  var csvConverter = new Converter({constructResult:true});

  csvConverter.on("record_parsed",function(resultRow,rawRow,rowIndex) {
    //console.log(resultRow); //here is your result json object
    fileStream.unpipe(csvConverter);
    console.log('manually close the file stream2');
    csvConverter.end();
  });

  //end_parsed will be emitted once parsing finished
  csvConverter.on("end_parsed",function(jsonObj) {
    res.json(200, jsonObj);
  });

  //read from file
  fileStream.pipe(csvConverter, { end: false });
} else {
  res.send("Not a valid type! Must be CSV format");
}
});

global install of csvtojson -> bash can't find csvtojson command

even in the csvtojson/bin directory, I have to use ~$node csvtojson to get a reaction, and I get this error:

module.js:340
throw err;
^
Error: Cannot find module '../libs/interfaces'
at Function.Module._resolveFilename (module.js:338:15)
at Function.Module._load (module.js:280:25)
at Module.require (module.js:364:17)
at require (module.js:380:17)
at Object. (/Users/jeffmjack/csvtojson/bin/csvtojson:3:13)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)

commandline execution does not exit

I am trying to use csv2json from the command line to convert a csv file to json. I am doing the following: csvtojson test.csv>file.json. The entire file is converted but the program never exits; it rather hangs.

[deploy@shippinghub ~]$ csvtojson test.csv>file.json
^C[deploy@shippinghub ~]$

As you can see I had to do a ctrl+C to force it to quit. I think this is a bug. it should exit and give control back to shell after the file is converted.

Please advise.

Custom line ending

Hi there, it would be good if you enabled a custom line ending character to be passed in the params when creating the Converter.

Ignore blank fields

Is there a command line option to ignore fields in a record that do not have a value?
For example the following csv file

col1, col2, col3
d1,,d3

would become

{
  col1:"d1",
  col3:"d3"
}

I'm particularly interested in this feature for use with arrays

col1, col2[], col2[]
d1,,d3

would become

{
  col1: "d1",
  col2: ["d3"]
}

CLI checkType usage

What is the syntax for disabling the checkType logic via the command line? I have a large file to convert, with a large String field, however random lines are being detected as numeric. This is throwing off the import logic in the next app I'm using. My goal is to have all fields treated as strings so I may convert them in the next phase of my processing.

constructResult true or false

(This is a question so, please label it as that)

Hi, let me thank you for this repo, it is great!

I am using it a lot, and I was wondering: if I manage large files using {constructResult:false} and then need to load the entire JSON into memory right after the JSON file is complete, would that be the same as using {constructResult:true} and having the JSON directly in memory? Does that make sense?

Thanks!

option: fork new process

Converting CSV to JSON needs computing resources. It would be good to have an option that allows the whole conversion to happen in another process so it won't affect the main process, which is always busy.

How do i create an array of objects?

i.e. something like this

"addresses":[
{"type":"work", "address_line1":"val1", "address_line2":"val2"},
{"type"home","address_line1":"val3", "address_line2":"val4" }
]

Is that possible now? or does it require a custom parser to be written?

Thanks! Seems like a promising solution.

changing delimiter when using the api

i am using some code from your docs:

  //Converter Class
  var Converter = csvtojson.core.Converter;

  //CSV File Path or CSV String or Readable Stream Object
  var csvFileName = process.cwd()+'/sample_data/orders.csv';

  //new converter instance
  var csvConverter = new Converter();

  //end_parsed will be emitted once parsing finished
  csvConverter.on("end_parsed",function(jsonObj){
     console.log(jsonObj); //here is your result json object
  });

  //read from file
  csvConverter.from(csvFileName);

How can I change the delimiter for the orders.csv file? You're expecting `,` but it should be `;`.

Thanks for your help

Emit 'data' event

Is there a reason to emit 'record_parsed' instead of the 'data' event from the stream API?

This is an issue when trying to pipe parsed records.

Exception thrown "SyntaxError: Unexpected token ]" reading empty csv file

  • NodeJS version: v0.10.22
  • csvtojson version: 0.3.8 (also 0.3.11)
  • Mac OS version: 10.9.4
  • The exception is thrown when the application receives empty csv file. The exception is as follows:

undefined:2
]
^
SyntaxError: Unexpected token ]
at Object.parse (native)
at Result.getBuffer (/Users/trangtungn/Working/Omnia_Git/om_file_uploader/node_modules/csvtojson/libs/core/Result.js:19:15)
at csvAdv. (/Users/trangtungn/Working/Omnia_Git/om_file_uploader/node_modules/csvtojson/libs/core/init_onend.js:8:70)
at csvAdv.EventEmitter.emit (events.js:117:20)
at _stream_readable.js:920:16
at process._tickDomainCallback (node.js:459:13)

  • Looking at the source code (function csvAdv.prototype._flush(cb)) and found that the Converter only pushes eol and ] whenever the buffer length is 0; [ is not included which causes the issue.
  • By catching zero buffer length case as below, I can run the app without the issue:

csvAdv.prototype._flush = function(cb) {
if (this._buffer.length != 0) { //emit last line
this.emit("record", this._buffer, this.rowIndex++, true);
this.push(eol + "]");
} else { // incase of _buffer length is 0, add [, eol and ]
this.push("[" + eol + "]");
}
cb();
};

Error handle Options needed

Hi Keyang

The current csvtojson does not support error handling for invalid CSV file conversion. (See http://stackoverflow.com/questions/30028453/how-to-handle-incomplete-csv-file-detected-quotes-does-not-match-in-pairs)

Is it possible to let csvtojson emit an "invalid csv" error (e.g. for a binary file with a fake .csv name) when it detects the problem?

Currently nodejs will report the "Incomplete CSV file detected. Quotes does not match in pairs" message. It would be nice to emit an "invalid csv" message on this error so we can catch the error and handle it.

I know we should have our own separate CSV validation code, but most other CSV nodejs modules already have validation and conversion built in. I really want to use your module for the convenience of converting CSV to JSON, but it could be better by including this "emit invalid csv file message" feature instead of just printing a message to the console.

Thanks.

Non-delineating commas being removed from text

@Keyang I'm getting a weird issue where quoted csv text like
"Many types of citrus are available, including oranges, white grapefruit, tangelos, and lemons."
when converted into json looks like
"notes":"Many types of citrus are availableincluding orangeswhite grapefruittangelos,and lemons."
The object coming back from csvtojson has several, but not all, of the non-delineating commas removed and whitespace randomly trimmed. If I set trim: false I don't have the issue with the whitespace being removed but the issue with the commas being removed is still there. Any ideas? csvtojson is being used here https://github.com/azgs/az-agriculture/blob/master/data/csv2json.js#L85. Thanks.
