binary-split's Introduction

binary-split

Split streams of binary data. Similar to split but for Buffers. Whereas split is String specific, this library never converts binary data into non-binary data.

How fast is it?

On an SSD with a Haswell i5 1.3 GHz CPU and 4 GB RAM, reading a 2.6 GB, 5.2 million entry line-delimited JSON file takes 15 seconds. Using split for the same benchmark takes 1m23s.

Example usage

const fs = require('fs')
const split = require('binary-split')

fs.createReadStream('log.txt')
  .pipe(split())
  .on('data', line => console.log(line)) // line is a Buffer

API

split([splitOn])

Returns a stream. You can .pipe other streams to it or .write to it yourself (if you .write, don't forget to .end).

The stream emits Buffers, one for each piece of the split data.

Pass in the optional splitOn argument to specify where to split the data. The default is your current operating system's EOL sequence (via require('os').EOL).
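To make the effect of splitOn concrete, here is a rough sketch of delimiter splitting at the Buffer level, using only Node's built-in Buffer API. It is not binary-split's implementation: splitBuffer is a hypothetical helper that works on a single Buffer rather than a stream, and the real module's edge-case behavior may differ.

```javascript
const os = require('os')

// Hypothetical helper, not part of binary-split: split one Buffer on a
// delimiter and return the pieces as Buffers (no string conversion).
function splitBuffer (buf, splitOn = os.EOL) {
  const delimiter = Buffer.from(splitOn)
  if (delimiter.length === 0) throw new Error('splitOn must not be empty')

  const pieces = []
  let start = 0
  let index = buf.indexOf(delimiter, start)
  while (index !== -1) {
    pieces.push(buf.subarray(start, index))
    start = index + delimiter.length
    index = buf.indexOf(delimiter, start)
  }
  // Keep whatever follows the last delimiter, if anything
  if (start < buf.length) pieces.push(buf.subarray(start))
  return pieces
}

const pieces = splitBuffer(Buffer.from('a|b|c'), '|')
console.log(pieces.map(piece => piece.toString()))
```

Every element of pieces is still a Buffer; the toString call is only there for printing.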

For more examples of usage see test.js.

Collaborators

binary-split is only possible due to the excellent work of the following collaborators:

binary-split's People

Contributors

damonoehlman, juliangruber, lpinca, max-mapper, mourner, tyrasd

binary-split's Issues

Deprecating binary-split-streams2

I sometimes remember to check up on the package when new node versions come out to make sure its claims are still accurate -- they aren't. I also noticed you switched to through2, which takes care of my original use case for lazy splitting. I'm just writing to mention I deprecated it in favor of this package :)

One thing I did notice is that I may have more thorough tests, so you're welcome to steal them if you like. They seem to pass out of the box with this module.

I was curious about the significant speed difference but unable to discern its cause; maybe it has something to do with optimizations in through2 itself. Anyway, thanks for maintaining this!

mode that only returns split positions

I thought it might be nice to have a flag that made this return the offsets of each newline. the default mode would still remain the same - it would split the buffers for you. but this new mode wouldn't split the buffers, it would just return the position in the byte stream of each newline that is found

it's an approach similar to what we use in this native module: https://github.com/maxogden/rabin
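a rough sketch of what such an offsets-only mode could look like, using only Node's Buffer API. nothing here exists in binary-split: createOffsetFinder and findOffsets are hypothetical names, and the sketch assumes a single-byte delimiter ("\n") so a match can never straddle two chunks.

```javascript
// Hypothetical sketch of an "offsets only" mode; not part of binary-split.
// Assumes a single-byte delimiter (default "\n", 0x0a).
function createOffsetFinder (delimiterByte = 0x0a) {
  let consumed = 0 // number of bytes seen in earlier chunks
  return function findOffsets (chunk) {
    const offsets = []
    let index = chunk.indexOf(delimiterByte)
    while (index !== -1) {
      // Translate the chunk-local index into a position in the whole stream
      offsets.push(consumed + index)
      index = chunk.indexOf(delimiterByte, index + 1)
    }
    consumed += chunk.length
    return offsets
  }
}

const findOffsets = createOffsetFinder()
console.log(findOffsets(Buffer.from('ab\ncd'))) // [ 2 ]
console.log(findOffsets(Buffer.from('e\nf\n'))) // [ 6, 8 ]
```

the chunks themselves are never sliced or copied; only the running byte count is kept between calls.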

mode that includes line ending in line buffer and start position in source stream

i would love to work on this if these changes are interesting for anyone.

  • adding start position as a property of each line buffer makes it really easy to process one line at a time and resume where you left off.
  • Including the line ending removes guess work as to the start of the next line

I'm working on a module to use a log file as a changes feed.
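A minimal sketch of the two ideas above, assuming "\n" line endings. This is not binary-split's API: createLineReader, readLines, and the start property are hypothetical names used only for illustration.

```javascript
// Hypothetical sketch, not binary-split's API: each returned line keeps its
// trailing "\n" and carries a `start` property holding its byte offset in
// the source stream.
function createLineReader () {
  let buffered = Buffer.alloc(0) // partial line carried between chunks
  let position = 0 // source offset of buffered[0]
  return function readLines (chunk) {
    buffered = Buffer.concat([buffered, chunk])
    const lines = []
    let start = 0
    let index = buffered.indexOf(0x0a)
    while (index !== -1) {
      const line = buffered.subarray(start, index + 1) // include the "\n"
      line.start = position + start
      lines.push(line)
      start = index + 1
      index = buffered.indexOf(0x0a, start)
    }
    // Whatever is left is an incomplete line; keep it for the next chunk
    buffered = buffered.subarray(start)
    position += start
    return lines
  }
}

const readLines = createLineReader()
for (const chunk of [Buffer.from('ab\nc'), Buffer.from('d\ne\n')]) {
  for (const line of readLines(chunk)) {
    console.log(line.start, JSON.stringify(line.toString()))
  }
}
```

With this shape, the offset to resume from after processing a line is line.start + line.length.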

Streams2 interface not splitting correctly?

I was just writing some one-off scripts and used this to split stdin -- only to find that I was getting combined data from the source. I suspect this might be an artifact of using through2 and possibly not setting the readable portion of the stream in object mode to guarantee one push = one readable chunk.

foo:

/volume1/data/keke
/volume1/data/lar
/volume1/dufus

test.js:

'use strict';

var split = require('binary-split'),
    fs = require('fs');

var stream = fs.createReadStream('./foo').pipe(split());

setTimeout(function () {
    console.log(stream.read().toString());
}, 500);

output:

admin@ds:~/merge-dirs$ node test
/volume1/data/keke/volume1/data/lar/volume1/dufus
admin@ds:~/merge-dirs$ node -v
v4.4.2
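The object mode difference suspected above can be shown with Node's stream module alone. This is only a sketch of the general Readable behavior, not a confirmed diagnosis of binary-split itself; firstRead is a hypothetical helper that pushes the three lines from foo into a paused Readable and returns what a single read() yields.

```javascript
const { Readable } = require('stream')

// Push three chunks into a paused Readable and return what one read() yields.
function firstRead (objectMode) {
  const stream = new Readable({ objectMode, read () {} })
  const lines = ['/volume1/data/keke', '/volume1/data/lar', '/volume1/dufus']
  for (const line of lines) {
    stream.push(Buffer.from(line))
  }
  return stream.read().toString()
}

console.log(firstRead(false)) // every buffered byte, concatenated
console.log(firstRead(true)) // only the first pushed chunk
```

Without object mode, read() with no size argument returns everything currently buffered as one Buffer, which matches the combined output shown above. With object mode, each push stays a separate chunk.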

remove bops

bops was written before buffer-browserify existed, but was made more or less obsolete by its existence (because nowadays Buffer code 'just works' with browserify). we can remove the bops dependency
