line-by-line's People

Contributors

abelozerov, alexloi, gbejic, gogson, netoneko, osterjour, rustymarvin

line-by-line's Issues

close() method does not trigger end event

The close() method on its own seems to do very little. It does close the file, but it does not, by itself, stop subsequent 'line' events or emit the 'end' event as stated in the documentation (https://www.npmjs.com/package/line-by-line):

close()

Stops emitting 'line' events, closes the file and emits the 'end' event.

Code similar to the following

const LineByLineReader = require('line-by-line');

var lr = new LineByLineReader('file.txt', 
                              { encoding: 'utf8', skipEmptyLines: true });
var rx='';

lr.on('error', function(err) {
  // 'err' contains error object
  console.log(err.message);
  process.exit();
});

lr.on('line', function(line) {
    if (line.includes('===END MARKER===')) {
        console.log("Finito");
        lr.close();
        lr.pause();
    }
    console.log("Processing " + line);
    rx+='|\\.'+line.replace(/\./g,'\\.');
});

lr.on('end', function () {
    console.log("result regex "+ rx);
});

never prints a "result regex" line, and if I omit the pause() call, lines after the "END MARKER" line are also processed.

On the other hand code like this does emit an end event:

const LineByLineReader = require('line-by-line');

var lr = new LineByLineReader('file.txt', 
                              { encoding: 'utf8', skipEmptyLines: true });
var rx='';
var done=false;

lr.on('error', function(err) {
  // 'err' contains error object
  console.log(err.message);
  process.exit();
});

lr.on('line', function(line) {
    if (line.includes('===END MARKER===')) {
        console.log("Finito");
        done=true;
    }
    if (done ) return; // skip lines after end marker
    console.log("Processing " + line);
    rx+='|\\.'+line.replace(/\./g,'\\.');
});

lr.on('end', function () {
    console.log("result regex "+ rx);
});

This seems contrary to the documentation. It is also inefficient to keep processing, particularly if the END MARKER line appears near the start of the file.

Last Line

How can I identify the last line?

It is different from the 'end' event, since by the time 'end' fires the last line has already been passed.

Thanks

Skips a line and makes a line at the end??

This is some weird, weird stuff going on. I'm too tired to explain it right now, which is why I created this repo to show the bare basics of the problem I'm facing.

The outcome of running test.js is that output.txt (it will be created when you run the script, don't worry) contains 17,885 lines: two more than the 17,883 of Log.txt. The cause is two empty lines at :17150 and :17885, which are obviously not supposed to be there.

Node version: 8.11.2
line-by-line version: 0.1.6

Can you help me see why 'line' events are not being emitted?

Hi, I'm somewhat new to streams, so I assume I'm missing something basic here.

Assuming filePath is defined and points to a large file, why would 'end' events be emitted but 'line' events never fire? mutatedBuffer isn't really important here, just part of the context. Thanks in advance.

function writeFileLineByLine(filePath) {
  const mutatedBuffer = new MutatedFileBuffer(filePath);

  const lineReader = new LineByLineReader(filePath);

  lineReader.on('line', (line) => {
    // never called
    const mutatedLine = `${line} plus appended`;
    mutatedBuffer.push(mutatedLine);
  });

  lineReader.on('error', (err) => {
    // never called
    console.error(err);
  });

  lineReader.on('end', () => {
    // called
    mutatedBuffer.store();
  });

  return lineReader;
}

Unable to perform line.pause()

Hi,

I recently updated line-by-line to version 0.1.5 and realized that my program was not working as well as before. After debugging, it looks like a timing issue: the next line is processed before I can pause the reader. Below is a snippet of how my program calls line-by-line.

var lr = new linebyline(data.files);

lr.on('error', function (err) {
    console.log(err);
});

lr.on('line', function (line) {
    /* Pause emitting of lines */
    lr.pause();
    /* Process line */
});
I've reverted to version 0.1.4 for now; it works well. Kindly shed some light on this. Thanks!

After about 1.4M lines I hit memory issues

Even though this library claims it doesn't load the file into memory, there is either a memory leak or it IS loading the file into memory. I'm trying to parse a 7M+ line file, going line by line and inserting each line into a MongoDB database. I tried this a few times, and each time Node.js exited and told me there was a memory issue.

I ended up doing something like the following without the library, and I'm not getting a single memory issue.
With my method I was using at most 1 MB of extra RAM compared to idle, and I'm pretty sure that's just the MongoDB connection.
Using your library I was hitting 130 to 200 MB of RAM. When I first open Node with my project it sits at about 70 MB at idle.

var fs = require('fs'),
    es = require('event-stream');

var lineNumber = 1;

var s = fs.createReadStream('import.txt')
    .pipe(es.split())
    .pipe(es.mapSync(function (line) {
        // pause the readstream
        s.pause();

        lineNumber += 1;

        (function () {
            // Run my line-by-line code here,
            // then resume the readstream
            s.resume();
        })();
    }).on('error', function () {
        console.log('Error while reading file.');
    }).on('end', function () {
        console.log('Read entire file.');
    }));

Extra empty line outputted when CRLF straddles internal buffer

What I see:
When streaming a large file with CRLF (\r\n) line endings, occasionally an extra "empty" line is emitted by linereader.on('line', ...).

i.e. When "printing" the file/stream "line by line", an empty line appears in the output when there are no empty lines in the original file.

What I expect:
No extra blank lines are outputted.

Reproduction:
See the attached test case (I had to rename index.js and package.json to .txt for the GitHub upload).
The test will:

  • generate a file just bigger than 65,536 bytes, which I suspect is the internal buffer size
    • a sample data file is also attached, but the test makes it clear that there is nothing special about the data except where the last \r\n lands
  • then go through the file line by line, checking for an erroneous empty line

Other Notes:

  • Could be the cause of this: #26 (comment)
  • I didn't see an API to set/read the internal buffer size, which would have helped with writing a test and debugging #4
  • I didn't see an API to return the line endings to me (#11), which would help me inspect the original file: I would have streamed the file back out verbatim, then diffed the outputs and compared their hashes. This matters especially because I don't control the original input file, which could even have mixed line endings, so I can't presume to just add '\n' to the end of each emitted 'line'.

index.js.txt

package.json.txt

data.txt

Latest package not on NPM

Hi,

I don't know if you have the rights to update the package on NPM, but your latest commits, including 126bb11, are seriously needed there, as the bugs they fix are very noticeable.

I found this repository through here: https://www.npmjs.com/package/line-by-line

So I assume you have the right to update the package.

For now I have manually added the source to my project.

Incorrect parsing of data

In the attached CSV file (I had to zip it since GitHub does not allow attaching CSVs), it has names in the first column, like:

Robert Aragon
Ashley Borden
Thomas Conley

..and so on

After parsing this file using line-by-line with the encoding set to utf8 in the options, I am getting this data as:

Robert Aragon
Ashley Borden
Thomas Conley

Relevant part of my code:
const lineReader = new LineByLineReader(filePath, { encoding: 'utf8' });

How do I fix this?

The CSV I am using is the 1 MB Excel file downloaded from https://dlptest.com/sample-data/ and converted to CSV format using Google Sheets.

1-MB-Test.csv.zip

Set internal buffer size?

Hello,

First, thanks so much for a great package... Makes my life easier! I have a suggestion that could make this plugin a little more useful.

The problem: reading line by line from disk is probably significantly slower than reading the whole file at once when memory is big enough, and this module is most useful precisely when it is not. Still, it might improve performance if you could set an internal buffer size, so that the module internally read, say, 10 MB at a time but still provided one line at a time.

Do you think this would have a performance benefit? I'm no I/O expert, but it seems like it might.

Include a property to know which EOL has been detected

Hi @Osterjour ,

It would be very useful if the module could report which end-of-line sequence was detected when splitting the file into lines.

In our application we use your library to read a file line by line, and some parts of the file (just portions) will be recreated, but they must be recreated with the same line endings.

Happy new year 😄

Option to include line endings

I want to count the byte size of a file up to a specific point, line by line. Without knowing what the line endings are, this isn't possible unless I guess that their size is one byte.

'End' triggered before the end of the file

The file has 2,300 lines (CSV), from which I build a JSON object that is saved to a file at the end.

However, I realized that the output file was only halfway complete. Through some digging with the Node debugger, I realized that lr.on('end', ...) was called even though the file had not been read fully.

The CSV is not big, only 130 kB.

Pointing out the issue here; using this npm library instead solved my problem.

Skipped lines at start of file

I'm building an Angular app on Electron, using line-by-line to read some JSON.
The problem is that lines at the start of the file are skipped.
If the file is small enough, no events get caught at all.

The problem is maybe that the setImmediate starts too quickly (before lr.on('line', () => ...) is registered).
I tried an lr.pause() immediately after calling lr = new LineByLineReader('test.txt') and a resume() after registering all the event handlers, but it had no effect.

Why does the reader start immediately? Wouldn't it be more logical if we had to call a start method ourselves?

Event 'end' doesn't work....

When the CSV is too large, the 'end' event is dispatched before all rows have been read. It is as if 'end' were triggered before every 'line' event had been dispatched and handled.

The code below returns the number of lines in a file. I tested it on a file with 1,500 lines.

let linesCounter = function (path) {
    return function (callback) {
        let lr = new LineByLineReader(path);
        let count = 0;

        lr.on('line', function (line) {
            count++;
        });

        lr.on('end', function () {
            callback(null, count);
        });
    };
};

The result was 1,423 lines.
I am using version 0.1.4.

UTF-8 byte order marker not suppressed

line-by-line works very well, but I've noticed that the value returned for the first line read from the file begins with a special character. Each line of my file is a JSON object, and when I invoke JSON.parse(line), it fails on the first line. I inserted a blank line as the first line of the file and set skipEmptyLines: true; it did not interpret this line as empty, and generated the following error:

23 Sep 13:35:53 - line: 1 error: [SyntaxError: Unexpected token o]
23 Sep 13:35:53 - bad line: o;?

This is the relevant bit of code inside the 'line' handler:

try {
  line_obj = JSON.parse(line);
} catch (err) {
  util.log('line:', counter, 'error:', err);
  util.log('bad line:', line);
  return;
}

Is there a workaround to avoid this initial character (or characters)?
Thanks.
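The special character is almost certainly the UTF-8 byte order mark (U+FEFF), which line-by-line does not appear to strip. A minimal workaround is to remove it before parsing (stripBom is a hypothetical helper, not part of the module):

```javascript
// Strip a leading UTF-8 BOM (U+FEFF) if present; harmless on lines without one.
function stripBom(line) {
  return line.charCodeAt(0) === 0xfeff ? line.slice(1) : line;
}

const raw = '\uFEFF{"name":"first record"}';
console.log(JSON.parse(stripBom(raw)).name); // parses cleanly once the BOM is gone
// → first record
```

Applying stripBom to every line costs almost nothing, since only a line that actually starts with U+FEFF is sliced.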

Can't use promises or async functions on end event

Having trouble acquiring a DB connection (using knex) in the 'end' event callback.

Async and promises work in other places in my code, though.

.babelrc

{
  "presets": ["node6","stage-0"],
  "plugins": ["transform-async-to-generator"]
}

Code

  lr.on('end', async () => {
    const [header, ...users] = lines;
    await insertUsersToDb(db, users);
    debug(`Finished reading ${users.length} users from ${path}.`);
  });

Doesn't work while unit-testing with Mocha and Chai

Hi, the module seems to stop working the moment it hits Chai's expect call.
The following is the sample test code I tried:

const LineByLineReader = require('line-by-line');
const expect = require(`chai`).expect;
const path = require('path');

describe('Tests for line-by-line', function () {
  it('should be something sensible', function () {
    let filePath = path.resolve(`./test/sampleText.js`)
    let lr = new LineByLineReader(filePath);

    lr.on('error', function (err) {
      console.log('ERROR: ' + err);
    })
    lr.on('line', function (line) {
      if (line.includes(`Avada`)) {
        console.log(`line: ` + line);
        expect(line).to.equal('Voldemort');
        console.log('This line should be printed');
      }
    });
    lr.on('end', function () {
      console.log(`Line reading finished.`);
      console.log(`This should also be printed`);
    })
  });
});

Where the sampleText.js is as follows:

/*
Expelliarmus
Stupefy
Morsmordre
Avada Kedavra
*/

The test should definitely fail, because I am asking for 'Avada Kedavra' to equal 'Voldemort'. But the test still passes. The console prints the following:

$ npm run test

> [email protected] test C:\work\MyApp
> mocha || true

  Tests for line-by-line
    √ should be something sensible

  1 passing (24ms)

line: Avada Kedavra

Notice that the console output line: Avada Kedavra confirms that sampleText.js is read correctly.
Also, nothing below the expect statement gets executed, not even the code inside the 'end' handler in lr.on('end', function () { }).

Array outside of the "end" function is nonexistent

Hey there, I'm pretty new to JavaScript and I was fooling around with this module. Does anyone have an idea why I don't get any output from the "output" array in my second for loop? When I put the loop inside the lr.on('end') handler it works, but that's not what I want.
I would like to use line-by-line to fill an array with the lines and then use it elsewhere; is that not possible?

const LineByLineReader = require('line-by-line');
const path = require('path');
document.getElementById("inputFile").addEventListener("change", parseFiles);

function parseFiles(evt) {    
    var files = evt.target.files;
    var output = [];
    for (var i = 0, f; f = files[i]; i++) {        
        var lr = new LineByLineReader(path.join(__dirname,'/txtFile/',files[i].name))
        
        lr.on('error', function (err) {
            console.log(err);
        });        
        lr.on('line', function (line) {          
            output.push(line);
            console.log(line)
        });
        
        lr.on('end', function () {
            // if I put the loop here, it works
        });
    }
    // not working:
    for (var k = 0; k < output.length; k++) {
        console.log(k)
        document.getElementById('output').innerHTML += `<li class="list-group-item"> ${output[k]} </li>`;
    }
};
