streaming-bytestring's People

Contributors

bgamari, michaelt, unkindpartition

streaming-bytestring's Issues

"fromHandle" should not close the handle upon reaching EOF.

fromHandle should not close the handle when it reaches EOF, as the corresponding function from pipes-bytestring doesn't do so.

Closing the handle can cause problems when we want to implement tail-like functionality for files, in combination with a library like hinotify. When new data is added to the file, we can't keep reading from the same handle because it has been closed already.

Closing the handle makes more sense for hGetContents because it mimics the lazy I/O function of the same name.

Does 'lines' work on Windows?

I was reading the code of lines in connection to #3 and noticed that it only cares about \ns.

I call lines on the result of getContents, which requires binary mode. That in turn means that the automatic newline conversion won't be happening.

So, if I'm reading this right, the code will not work correctly on Windows, will it?

null is O(n), not O(1)

The null function forces the entire stream, yet it is documented as an O(1) operation.

groupBy is missing

I'm not sure whether this is intended: the docs for group describe it as a special case of groupBy, but there is no groupBy.

null_ could return the remaining stream

For testing to see whether or not a stream is exhausted, there are currently two options. There is null, which evaluates all the effects and discards all elements in the stream. And there is null_, which evaluates effects up to the next non-empty chunk:

null :: Monad m => ByteString m r -> m (Of Bool r)
null_ :: Monad m => ByteString m r -> m Bool

I need something like this:

null_' :: Monad m => ByteString m r -> m (Of Bool (ByteString m r))

Which would tell me if the stream is exhausted, but if it's not, give it back to me, after having stripped off and evaluated any necessary monadic effects.
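
To make the request concrete, here is a minimal sketch on a local stand-in type. `ByteStream` is not the library's type; it only mirrors the Empty/Chunk/Go constructor shape quoted in the MMonad issue on this page. The name `nullRest` and the plain tuple in place of `Of` are assumptions made for illustration.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Functor.Identity (Identity (..))

-- Local stand-in for the library's stream type (an assumption, not the real definition).
data ByteStream m r
  = Empty r
  | Chunk B.ByteString (ByteStream m r)
  | Go (m (ByteStream m r))

-- | Hypothetical helper: run effects and skip empty chunks until either the
-- end of the stream or the first non-empty chunk, then hand back what is left.
nullRest :: Monad m => ByteStream m r -> m (Bool, ByteStream m r)
nullRest s = case s of
  Empty r -> return (True, Empty r)
  Chunk c rest
    | B.null c  -> nullRest rest                -- empty chunk: keep looking
    | otherwise -> return (False, Chunk c rest) -- data found: return it untouched
  Go m -> m >>= nullRest                        -- run one effect layer, then continue

-- Usage: an empty chunk followed by an effect and then the end counts as exhausted.
demo :: Bool
demo = fst (runIdentity (nullRest (Chunk B.empty (Go (Identity (Empty ()))))))
```

Only the effects in front of the first non-empty chunk are run, so the caller can keep consuming the returned stream without losing data.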

Lines has incorrect behavior on Windows

The CRLF handling introduced in #5 is incorrect in a small number of corner cases. When a chunk ends with a CR and the next chunk begins with LF, lines fails to remove the CR. I have put up a PR that resolves this issue.
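
The corner case can be sketched on a plain list of strict chunks. This is not the code from the PR; `linesChunked` is a hypothetical name, and the approach shown (carry the unfinished line across chunk boundaries and strip a trailing CR only once the LF has been seen) is one possible way to handle it.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Split chunked input into lines, accepting both "\n" and "\r\n" as
-- terminators, including the case where the "\r" and "\n" sit in different chunks.
linesChunked :: [B.ByteString] -> [B.ByteString]
linesChunked = go B.empty
  where
    -- 'acc' holds the part of the current line seen in earlier chunks.
    go acc [] = [stripCR acc | not (B.null acc)]
    go acc (c : cs) =
      case B.elemIndex '\n' c of
        Nothing -> go (acc <> c) cs
        Just i ->
          let (before, rest) = B.splitAt i c
           in stripCR (acc <> before) : go B.empty (B.drop 1 rest : cs)

    -- Remove a single trailing '\r', wherever the chunk boundary fell.
    stripCR s
      | not (B.null s) && B.last s == '\r' = B.init s
      | otherwise = s
```

For example, the chunks `"a\r"` and `"\nb\n"` yield the lines `"a"` and `"b"`, because the CR is only examined after the line has been completed by the LF in the next chunk.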

Chunk support for lazy ByteStrings

There are various times that you might want to convert from Stream (Of L.ByteString) m r into ByteString m r and vice-versa (e.g. aeson and cassava encode/decode from lazy bytestrings, so if you have a stream of records that you want to convert then they're going to be converted into lazy bytestrings).

Currently it's possible to get around this with B.fromChunks . S.map toStrict, but it would be preferable if this were built in. This matters especially when the lazy bytestrings are too large to convert into a single strict chunk; an alternative that avoids the copy is B.fromChunks . S.concat . S.map L.toChunks.
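
As a sketch, the chunk-preserving workaround can be wrapped up as below. It assumes the `streaming` and `streaming-bytestring` packages with the `Data.ByteString.Streaming` module name used elsewhere on this page; `fromLazyChunks` and `roundTrip` are hypothetical names, not part of the library.

```haskell
import qualified Data.ByteString.Lazy      as L
import qualified Data.ByteString.Streaming as B
import           Data.Functor.Identity     (runIdentity)
import           Streaming                 (Of, Stream)
import qualified Streaming.Prelude         as S

-- | Feed each lazy ByteString's existing strict chunks into the byte stream,
-- so that no lazy value is first copied into one large strict chunk.
fromLazyChunks :: Monad m => Stream (Of L.ByteString) m r -> B.ByteString m r
fromLazyChunks = B.fromChunks . S.concat . S.map L.toChunks

-- | Usage check: converting a list of lazy values and collecting the result
-- should give back their concatenation.
roundTrip :: [L.ByteString] -> L.ByteString
roundTrip = runIdentity . B.toLazy_ . fromLazyChunks . S.each
```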

"length" should return an Int64

Functions like drop, take and splitAt index into the string using Int64, so it stands to reason that length should also use Int64 instead of Int. This also applies to I/O functions like hGet.

readFile reads from closed handle

I'm not sure how much of a representative sample I can provide as this is from an internal application, but will try and document what happens.

I've written a streaming wrapper for cassava, and am using it to parse a CSV file. As such, this may very well be an error in my code there, but it appears correct. Using it to parse a file, I have the following:

runResourceT
. (either (liftIO . print) return =<<)
. runExceptT
. S.print
. decodeByName @_ @MyDataType
. SB.readFile
$ myCSVFile

If for some reason CSV parsing fails, then this "succeeds". However, if there are no problems with the CSV, I get the error of:

hGetBufSome: illegal operation (handle is closed)

(I've tried switching around the order of runResourceT and runExceptT but that has no effect.)

If, however, I use bracket from the exceptions package to implement this:

-- | A lifted variant of 'System.IO.withBinaryFile'.
withFile :: (MonadMask m, MonadIO m) => FilePath -> IOMode -> (Handle -> m r) -> m r
withFile fp md = bracket (liftIO (openBinaryFile fp md)) (liftIO . hClose)

Then this succeeds:

withFile myCSVFile ReadMode $
  (either (liftIO . print) return =<<)
  . runExceptT
  . S.print
  . decodeByName @_ @MyDataType
  . SB.hGetContents

So it appears that ResourceT isn't being used correctly here. As such, I wonder if this is related to michaelt/streaming/issues/23

lineSplit

I have a piece of code that I've needed recently in two unrelated projects, so I thought I'd see if you would be interested in adding it to streaming-bytestring. It's a function that splits up a stream after every nth newline. In the PR for lineSplit, documentation is included that explains this better and demonstrates it.

My uses for this have been (1) chunking syslog messages and (2) bulk inserts into elasticsearch. Both of these are newline-delimited protocols where I have needed to break the stream into 1000-line chunks.

MMonad instance

Would it be possible to provide an MMonad instance for ByteString?

It seems like it would be very similar to the one for Stream:

instance MMonad ByteString where
  embed phi = loop where
    loop bs = case bs of
      Empty r        -> Empty r
      Chunk bs' rest -> Chunk bs' (loop rest)
      Go m           -> phi m >>= loop
  {-# INLINABLE embed #-}

Happy to provide a PR if you think this is a sane thing.

Reading a file recursively seems to re-open file every loop

When I try to read a file recursively, I get the following error:

openBinaryFile: resource exhausted (Too many open files)

In the code sample below, "hello" is printed 1021 times regardless of how many bytes I drop each recursion.

Code sample:

import           Control.Monad.Trans (MonadIO)
import           Control.Monad.Trans.Resource (runResourceT)
import qualified Data.ByteString.Streaming          as BSS
import qualified Data.ByteString.Streaming.Char8    as BSSC
import           System.TimeIt

main :: IO ()
main = timeIt $ runResourceT $ dump $ BSS.readFile "filename"

dump :: MonadIO m => BSS.ByteString m r -> m ()
dump bs = do
    isEmpty <- BSS.null_ bs
    if isEmpty then return ()
    else do
        BSSC.putStrLn $ BSSC.string "hello" 
        dump $ BSS.drop 1 bs 

'lines' produces an extra empty line

% cat streaming-bug.hs 
import Streaming as S
import qualified Streaming.Prelude as S
import qualified Data.ByteString.Streaming.Char8 as SBS

main = do
  writeFile "test.txt" $ unlines ["a"]
  runResourceT $ streamFold
    (const $ return ())
    join
    (join . SBS.putStrLn)
    (SBS.lines $ SBS.readFile "test.txt")

% stack runghc --resolver=lts-7.0 --package streaming-0.1.4.3 --package streaming-bytestring-0.1.4.4 streaming-bug.hs | cat -n
     1  a
     2  
