stream-csv-as-json's Introduction

stream-csv-as-json is a micro-library of node.js stream components with minimal dependencies for creating custom data processors geared toward processing huge CSV files with a minimal memory footprint. It can parse CSV files far exceeding available memory. Even individual primitive data items can be streamed piece-wise. A streaming, SAX-inspired, event-based API is included as well.

stream-csv-as-json is a companion project for stream-json and it is meant to be used with its filters, streamers and general infrastructure.

Available components:

  • Streaming CSV parser.
    • It produces a SAX-like token stream.
    • Optionally it can pack individual values.
    • The main module provides helpers to create a parser.
  • Essentials:
    • AsObjects uses the first row as a list of field names and produces rows as shallow objects with named fields.
    • Stringer converts a token stream back into a CSV text stream.
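
To make the AsObjects description above concrete, here is a minimal plain-JavaScript sketch of the row-to-object mapping. It is not the library's implementation, which operates on a token stream. `rowsToObjects` is a hypothetical helper that works on already-parsed rows, and the handling of rows with more values than field names is an assumption made only for this illustration.

```javascript
// Hypothetical helper, not part of stream-csv-as-json: it illustrates the idea of
// using the first row as field names and turning later rows into shallow objects.
function* rowsToObjects(rows) {
  let fields = null;
  for (const row of rows) {
    if (!fields) {
      fields = row; // the first row supplies the field names
      continue;
    }
    const object = {};
    // Assumption: values without a matching field name are ignored.
    row.forEach((value, index) => {
      if (index < fields.length) object[fields[index]] = value;
    });
    yield object;
  }
}

// CSV values are strings, so the sample rows are arrays of strings.
const rows = [
  ['name', 'department'],
  ['Ann', 'accounting'],
  ['Bob', 'engineering']
];
const objects = [...rowsToObjects(rows)];
console.log(objects);
```

Because the helper is a generator, rows are converted one at a time, which mirrors the streaming approach of the library: no more than one row needs to be held in memory.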

All components are meant to be building blocks to create flexible custom data processing pipelines. They can be extended and/or combined with custom code. They can be used together with stream-chain and stream-json to simplify data processing.

This toolkit is distributed under the New BSD license.

Introduction

const {chain}  = require('stream-chain');

const {parser} = require('stream-csv-as-json');
const {asObjects} = require('stream-csv-as-json/AsObjects');
const {streamValues} = require('stream-json/streamers/StreamValues');

const fs   = require('fs');
const zlib = require('zlib');

const pipeline = chain([
  fs.createReadStream('sample.csv.gz'),
  zlib.createGunzip(),
  parser(),
  asObjects(),
  streamValues(),
  data => {
    const value = data.value;
    return value && value.department === 'accounting' ? data : null;
  }
]);

let counter = 0;
pipeline.on('data', () => ++counter);
pipeline.on('end', () =>
  console.log(`The accounting department has ${counter} employees.`));

See the full documentation in Wiki.

Installation

npm install --save stream-csv-as-json
# or:
yarn add stream-csv-as-json

Use

The whole library is organized as a set of small components that can be combined to produce the most effective pipeline. All components are based on node.js streams and events, and they implement all required standard APIs. It is easy to add your own components to solve your unique tasks.
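
As an illustration of adding your own component, the following sketch builds a small object-mode Transform stream using only node.js's standard `stream` module. The names `isAccounting` and `makeFilter` are hypothetical, and the item shape `{key, value}` is assumed to match what a streamer such as streamValues() emits. Treat it as one possible approach, not as the way the library's own components are written.

```javascript
const {Transform} = require('stream');

// Hypothetical predicate: keeps items whose value belongs to accounting.
const isAccounting = data =>
  Boolean(data && data.value && data.value.department === 'accounting');

// Hypothetical custom component: an object-mode Transform that passes
// through only the items accepted by the predicate.
const makeFilter = predicate =>
  new Transform({
    objectMode: true,
    transform(chunk, _encoding, callback) {
      if (predicate(chunk)) this.push(chunk);
      callback();
    }
  });

const filter = makeFilter(isAccounting);
const kept = [];
// Attaching a 'data' listener switches the stream into flowing mode.
filter.on('data', item => kept.push(item));
filter.on('end', () => console.log(`Kept ${kept.length} item(s).`));

filter.write({key: 0, value: {name: 'Ann', department: 'accounting'}});
filter.write({key: 1, value: {name: 'Bob', department: 'engineering'}});
filter.end();
```

Since the component is a standard Transform stream, it can be connected with `.pipe()` or placed in a chain next to the other components.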

The code of all components is compact and simple. Please take a look at their source code to see how things are implemented, so you can produce your own components in no time.

If you find a bug, see a way to simplify an existing component, or create a new generic component that can be reused in a variety of projects, don't hesitate to open a ticket and/or create a pull request.

Release History

  • 1.0.4 technical release: updated deps.
  • 1.0.3 technical release: updated deps.
  • 1.0.2 technical release: updated deps, updated license's year.
  • 1.0.1 minor readme tweaks, added TypeScript typings and the badge.
  • 1.0.0 the first 1.0 release.

stream-csv-as-json's People

Contributors

uhop

stream-csv-as-json's Issues

An in-range update of stream-chain is breaking the build 🚨

The devDependency stream-chain was updated from 2.1.0 to 2.2.1.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

stream-chain is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ❌ continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 29 commits.

  • e0a041e New version: 2.2.1.
  • fa9807f Change symbols' namespace.
  • a1347f1 Merge pull request #1 from uhop/greenkeeper/initial
  • 95ab3ea Merge pull request #2 from kyliau/patch-1
  • a92f4cf Create LICENSE
  • ebf5b85 docs(readme): add Greenkeeper badge
  • 4b43a2e Simplified asGen.
  • c7c673c Allow iterables as first items in a chain.
  • 32c1678 Fixed a minor error.
  • c298ba0 Replaced unique objects with unique symbols.
  • 59f0a9e Fixed typos.
  • d35b74b Added asGen().
  • dff2d44 Restructured so asFun() is a stream-independent.
  • aede2ad Added FromIterable and regularized tests.
  • 1007bf4 Minor bugfix.

There are 29 commits in total.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

chain not getting an `end` callback

Hi!

Any suggestion on why a stream would not end?

const pipeline = chain([
  stream,
  parser(),
  asObjects(),
  streamValues(),
  (data: any) => {
    objectCount++;
    if (objectCount % 100 === 0) console.log(objectCount);
  },
]);

pipeline.on('error', (err: Error): void => {
  console.error('pipeline error', err);
  reject(`Csv Parse Error: ${err}`);
});
pipeline.on('end', (): void => {
  console.warn('pipeline end', objectCount);
});
