
Comments (10)

uhop avatar uhop commented on June 3, 2024

My notes:

Below is the code I used to test it:

const fs = require("fs");

const { chain } = require("stream-chain");
const { parser } = require("stream-csv-as-json");
const { asObjects } = require("stream-csv-as-json/AsObjects");
const { streamValues } = require("stream-json/streamers/StreamValues");

let objectCount = 0;

const pipeline = chain([
  fs.createReadStream("./sample.csv"),
  parser(),
  asObjects(),
  streamValues(),
  (data) => {
    objectCount++;
    console.log(data);
    if (objectCount % 100 === 0) console.log(objectCount);
    return data; // I added this line to continue streaming
  },
]);

pipeline.on("error", (err) => {
  console.error("pipeline error", err);
});
// The next event handler enables 'end' as well.
// pipeline.on("data", (data) => {
//   console.log("data", data);
// });
pipeline.on("end", () => {
  console.warn("pipeline end", objectCount);
});
pipeline.on("finish", () => {
  console.warn("pipeline finish", objectCount);
});

My sample file (sample.csv):

a,b,c
1,2,3
4,5,6

package.json:

{
  "name": "temp",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "stream-chain": "^2.2.1",
    "stream-csv-as-json": "^1.0.2",
    "stream-json": "^1.5.0"
  }
}

from stream-csv-as-json.

stevenroussey-privicy avatar stevenroussey-privicy commented on June 3, 2024

I originally had return data but it got stuck/paused at item 32.

I will investigate the use of the finish event. However, one thing I did not mention before is that the stream was a zip file entry from yauzl.

stevenroussey-privicy avatar stevenroussey-privicy commented on June 3, 2024

"finish" is not being called either in my case. Why does your code not show "pipeline end"?

I'm working on an example.

stevenroussey-privicy avatar stevenroussey-privicy commented on June 3, 2024

Change sample.csv to be this:

a,b,c
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
4,5,6
1,2,3
6,6,6

stevenroussey-privicy avatar stevenroussey-privicy commented on June 3, 2024

It has more than 32 rows of data, and the last row (6,6,6) is easy to recognize.

Output:

node index.js
{ key: 0, value: { a: '1', b: '2', c: '3' } }
{ key: 1, value: { a: '4', b: '5', c: '6' } }
{ key: 2, value: { a: '1', b: '2', c: '3' } }
{ key: 3, value: { a: '4', b: '5', c: '6' } }
{ key: 4, value: { a: '1', b: '2', c: '3' } }
{ key: 5, value: { a: '4', b: '5', c: '6' } }
{ key: 6, value: { a: '1', b: '2', c: '3' } }
{ key: 7, value: { a: '4', b: '5', c: '6' } }
{ key: 8, value: { a: '1', b: '2', c: '3' } }
{ key: 9, value: { a: '4', b: '5', c: '6' } }
{ key: 10, value: { a: '1', b: '2', c: '3' } }
{ key: 11, value: { a: '4', b: '5', c: '6' } }
{ key: 12, value: { a: '1', b: '2', c: '3' } }
{ key: 13, value: { a: '4', b: '5', c: '6' } }
{ key: 14, value: { a: '1', b: '2', c: '3' } }
{ key: 15, value: { a: '4', b: '5', c: '6' } }
{ key: 16, value: { a: '1', b: '2', c: '3' } }
{ key: 17, value: { a: '4', b: '5', c: '6' } }
{ key: 18, value: { a: '1', b: '2', c: '3' } }
{ key: 19, value: { a: '4', b: '5', c: '6' } }
{ key: 20, value: { a: '1', b: '2', c: '3' } }
{ key: 21, value: { a: '4', b: '5', c: '6' } }
{ key: 22, value: { a: '1', b: '2', c: '3' } }
{ key: 23, value: { a: '4', b: '5', c: '6' } }
{ key: 24, value: { a: '1', b: '2', c: '3' } }
{ key: 25, value: { a: '4', b: '5', c: '6' } }
{ key: 26, value: { a: '1', b: '2', c: '3' } }
{ key: 27, value: { a: '4', b: '5', c: '6' } }
{ key: 28, value: { a: '1', b: '2', c: '3' } }
{ key: 29, value: { a: '4', b: '5', c: '6' } }
{ key: 30, value: { a: '1', b: '2', c: '3' } }
{ key: 31, value: { a: '4', b: '5', c: '6' } }
pipeline finish 32

stevenroussey-privicy avatar stevenroussey-privicy commented on June 3, 2024

Removing the "return data;" line "fixes" it: all 44 rows are processed, ending in 6,6,6. Still no "end" event, though.

I don't see these behaviors with stream-json.

uhop avatar uhop commented on June 3, 2024

Take my original code above with your 44-row CSV file (or any other), uncomment the data event handler, and you will get all 44 rows.

I already gave this link before: https://nodejs.org/api/stream.html#stream_event_end. It explains why end is not emitted. This is not something specific to stream-*; it is how Node streams were designed to behave (don't ask me why). The "flowing mode" is explained on the same page: https://nodejs.org/api/stream.html#stream_two_reading_modes

After reading it, please take my original code above (with data commented out) and add these three lines at the end:

console.log("readable flowing", pipeline.readableFlowing);
pipeline.resume();
console.log("readable flowing", pipeline.readableFlowing);

This is what I see:

readable flowing null
readable flowing true
{ key: 0, value: { a: '1', b: '2', c: '3' } }
...omitted for brevity...
{ key: 43, value: { a: '6', b: '6', c: '6' } }
pipeline finish 44
pipeline end 44

You can read all about readableFlowing, its possible values, and what they mean using the link above. I hope it helps!

PS: The incomplete reads you saw before are the stream machinery filling its buffers in anticipation of future requests. It obviously doesn't read everything.

stevenroussey-privicy avatar stevenroussey-privicy commented on June 3, 2024

Just to be clear, and to make it easy for you to test:

Archive.zip

This is the result:

> node index.js
pipeline finish 32

For reference:
index.js:

const fs = require("fs");

const { chain } = require("stream-chain");
const { parser } = require("stream-csv-as-json");
const { asObjects } = require("stream-csv-as-json/AsObjects");
const { streamValues } = require("stream-json/streamers/StreamValues");

let objectCount = 0;

const pipeline = chain([
  fs.createReadStream("./sample.csv"),
  parser(),
  asObjects(),
  streamValues(),
  (data) => {
    objectCount++;
    return data; 
  },
]);

pipeline.on("error", (err) => {
  console.error("pipeline error", err);
});
pipeline.on("end", () => {
  console.warn("pipeline end", objectCount);
});
pipeline.on("finish", () => {
  console.warn("pipeline finish", objectCount);
});

stevenroussey-privicy avatar stevenroussey-privicy commented on June 3, 2024

So any of these works (just talking about finish, not end):

  1. pipeline.on("data", function(){})
  2. removing returning data in my callback
  3. pipeline.resume();

Anyhow, thank you for the references. I think my CSV code is slightly different from all my other code because it does not use pipe, and the documentation you referenced above mentions that this changes the reading mode. It also worries me that our code may have other bugs...

I am debugging someone else's code outside my area of expertise, but now I am getting there!

Thank you!

uhop avatar uhop commented on June 3, 2024

removing returning data in my callback

It works only because the pipe is broken and produces no values: the stream machinery tries to fill its internal buffers and none get filled. Some things work because they are documented to work that way, and some work because of obscure side effects; which to rely on is a judgment call.

I am glad that you were able to move forward with your project.
