michaelt / streaming-bytestring
Effectful sequences of bytes; an alternative no-lazy-io implementation of Data.ByteString.Lazy
License: BSD 3-Clause "New" or "Revised" License
fromHandle should not close the handle when it reaches EOF; the corresponding function in pipes-bytestring does not do so.
Closing the handle causes problems when implementing tail-like functionality for files in combination with a library like hinotify: when new data is appended to the file, we can't keep reading from the same handle because it has already been closed.
Closing the handle makes more sense for hGetContents, because it mimics the lazy I/O function of the same name.
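The behaviour requested above can be sketched with plain handles and strict chunks. This is only an illustration built on base and bytestring, not the library's fromHandle; the names readAvailable and readKeepingOpen and the 32 KiB chunk size are assumptions made for the example.

```haskell
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (ReadMode), hIsOpen, withBinaryFile)

-- Read chunks until hGetSome reports EOF (an empty chunk).
-- The handle is deliberately NOT closed here, so the caller decides
-- when its lifetime ends and may try reading again later.
readAvailable :: Handle -> IO [B.ByteString]
readAvailable h = do
  chunk <- B.hGetSome h 32768  -- chunk size is an arbitrary choice
  if B.null chunk
    then return []
    else (chunk :) <$> readAvailable h

-- Hypothetical helper: read everything currently in the file and report
-- whether the handle was still open after EOF was reached.
-- withBinaryFile closes the handle only when this helper returns.
readKeepingOpen :: FilePath -> IO (B.ByteString, Bool)
readKeepingOpen path =
  withBinaryFile path ReadMode $ \h -> do
    chunks <- readAvailable h
    stillOpen <- hIsOpen h
    return (B.concat chunks, stillOpen)
```

With this shape, a tail-like caller could keep the handle and call readAvailable again after a change notification, which should pick up data appended since the previous call.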
I was reading the code of the lines function in connection with #3 and noticed that it only handles \n characters.
I call lines on the result of getContents, which requires binary mode. That in turn means that automatic newline conversion won't happen.
So, if I'm reading this right, the code will not work correctly on Windows, will it?
The null function forces the entire stream, yet it is documented as an O(1) operation.
Not sure if this is intended, but the docs for group describe it as a special case of groupBy, yet there is no groupBy.
To test whether a stream is exhausted, there are currently two options. There is null, which evaluates all the effects and discards all elements in the stream. And there is null_, which evaluates effects up to the next non-empty chunk:
null :: Monad m => ByteString m r -> m (Of Bool r)
null_ :: Monad m => ByteString m r -> m Bool
I need something like this:
null_' :: Monad m => ByteString m r -> m (Of Bool (ByteString m r))
Which would tell me if the stream is exhausted, but if it's not, give it back to me, after having stripped off and evaluated any necessary monadic effects.
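A minimal sketch of such a function follows. It is written against a local stand-in type that mirrors the Empty, Chunk and Go constructors shown in the MMonad snippet further down this page, so that it compiles with only base and bytestring. The name testEmpty is hypothetical, and a plain tuple is used in place of Of.

```haskell
import qualified Data.ByteString as B

-- Local stand-in for the library's stream type (an assumption made
-- for illustration; the real type lives in the .Internal module).
data ByteStream m r
  = Empty r
  | Chunk B.ByteString (ByteStream m r)
  | Go (m (ByteStream m r))

-- Run effects and skip empty chunks until either the end of the stream
-- or the first non-empty chunk is found. The stream is handed back with
-- those effects already performed, so no data is lost.
testEmpty :: Monad m => ByteStream m r -> m (Bool, ByteStream m r)
testEmpty (Empty r) = return (True, Empty r)
testEmpty (Go m) = m >>= testEmpty
testEmpty (Chunk c rest)
  | B.null c = testEmpty rest
  | otherwise = return (False, Chunk c rest)
```

Because the returned stream starts at the first non-empty chunk (or at the final return value), the caller can continue consuming it without repeating the effects that were needed to answer the question.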
The CRLF handling introduced in #5 is incorrect in a small number of corner cases. When a chunk ends with a CR and the next chunk begins with LF, lines fails to remove the CR. I have put up a PR that resolves this issue.
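The corner case can be illustrated on a plain list of strict chunks. The sketch below is not the code from the PR; linesCRLF is a hypothetical name, and the approach (carrying the unfinished line across chunk boundaries before stripping a trailing CR) is just one way to avoid the problem.

```haskell
import qualified Data.ByteString.Char8 as B8

-- Split a chunked input into lines, removing a trailing LF or CRLF.
-- The unfinished line is carried over between chunks, so a CR at the
-- end of one chunk followed by an LF at the start of the next is
-- still recognised as a single CRLF terminator.
linesCRLF :: [B8.ByteString] -> [B8.ByteString]
linesCRLF = go B8.empty
  where
    -- 'pending' is the part of the current line seen so far.
    go pending [] = [pending | not (B8.null pending)]
    go pending (c : cs) =
      case B8.elemIndex '\n' c of
        Nothing -> go (B8.append pending c) cs
        Just i ->
          let line = stripCR (B8.append pending (B8.take i c))
          in line : go B8.empty (B8.drop (i + 1) c : cs)

    -- Drop one trailing CR, if present.
    stripCR l
      | not (B8.null l) && B8.last l == '\r' = B8.init l
      | otherwise = l
```

The CR is only stripped once the LF has been seen, so a lone CR in the middle of a line is left untouched.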
Is there any reason consChunk from the .Internal module isn't re-exported from the public module?
It's mentioned in the documentation for mwrap, unconsChunk is exported, etc.
@michaelt I would like to offer to be a comaintainer for this library.
There are various times that you might want to convert from Stream (Of L.ByteString) m r into ByteString m r and vice-versa (e.g. aeson and cassava encode/decode from lazy bytestrings, so if you have a stream of records that you want to convert then they're going to be converted into lazy bytestrings).
Currently it's possible to get around this with B.fromChunks . S.map toStrict, but it would be preferable if this were built in (especially if the lazy bytestrings are rather large to convert into a single strict chunk; an alternative may be to do B.fromChunks . S.concat . S.map L.toChunks).
Functions like drop, take and splitAt index into the string using Int64, so it stands to reason that length should also use Int64 instead of Int. This also applies to I/O functions like hGet.
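As a small illustration of the suggestion, a length computed over strict chunks can accumulate in Int64 even though each chunk reports its own length as Int. The function name totalLength is hypothetical and the list of chunks stands in for the stream.

```haskell
import Data.Int (Int64)
import Data.List (foldl')
import qualified Data.ByteString as B

-- Sum the chunk lengths into an Int64 accumulator, so the total does
-- not depend on the platform's Int width.
totalLength :: [B.ByteString] -> Int64
totalLength = foldl' (\acc c -> acc + fromIntegral (B.length c)) 0
```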
I'm not sure how much of a representative sample I can provide as this is from an internal application, but will try and document what happens.
I've written a streaming wrapper for cassava, and am using it to parse a CSV file. As such, this may very well be an error in my code there, but it appears correct. Using it to parse a file, I have the following:
runResourceT
  . (either (liftIO . print) return =<<)
  . runExceptT
  . S.print
  . decodeByName @_ @MyDataType
  . SB.readFile
  $ myCSVFile
If for some reason CSV parsing fails, then this "succeeds". However, if there are no problems with the CSV, I get the following error:
hGetBufSome: illegal operation (handle is closed)
(I've tried switching around the order of runResourceT and runExceptT, but that has no effect.)
If, however, I use bracket from the exceptions package to implement this:
-- | A lifted variant of 'System.IO.withBinaryFile'.
withFile :: (MonadMask m, MonadIO m) => FilePath -> IOMode -> (Handle -> m r) -> m r
withFile fp md = bracket (liftIO (openBinaryFile fp md)) (liftIO . hClose)
Then this succeeds:
withFile myCSVFile ReadMode $
  (either (liftIO . print) return =<<)
  . runExceptT
  . S.print
  . decodeByName @_ @MyDataType
  . SB.hGetContents
So it appears that ResourceT isn't being used correctly here. As such, I wonder if this is related to michaelt/streaming/issues/23
I have a piece of code that I've needed recently in two unrelated projects, so I thought I'd see if you would be interested in adding it to streaming-bytestring. It's a function that splits up a stream after every nth newline. In the PR for lineSplit, documentation is included that explains this better and demonstrates it.
My uses for this have been (1) chunking syslog messages and (2) bulk inserts into elasticsearch. Both of these are newline-delimited protocols where I have needed to break the stream into 1000-line chunks.
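The idea of splitting after every nth newline can be sketched on a single strict ByteString. This is not the lineSplit from the PR, which works on streams; splitEveryNLines is a hypothetical name used only for this example.

```haskell
import qualified Data.ByteString.Char8 as B8

-- Cut the input just after every nth newline. Each piece keeps its
-- newlines, and a final piece with fewer than n lines is returned as is.
splitEveryNLines :: Int -> B8.ByteString -> [B8.ByteString]
splitEveryNLines n bs
  | B8.null bs = []
  | n <= 0 = [bs]  -- nothing sensible to split on; return the input whole
  | otherwise =
      case drop (n - 1) (B8.elemIndices '\n' bs) of
        [] -> [bs]  -- fewer than n newlines remain
        (i : _) ->
          let (piece, rest) = B8.splitAt (i + 1) bs
          in piece : splitEveryNLines n rest
```

For a newline-delimited protocol, each resulting piece can be sent as one batch, since no line is ever split across two pieces.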
Would it be possible to provide an MMonad instance for ByteString?
It seems like it would be very similar to the one for Stream:
instance MMonad ByteString where
  embed phi = loop where
    loop bs = case bs of
      Empty r -> Empty r
      Chunk bs' rest -> Chunk bs' (loop rest)
      Go m -> phi m >>= loop
  {-# INLINABLE embed #-}
Happy to provide a PR if you think this is a sane thing.
When I try to read a file recursively, I get the following error:
openBinaryFile: resource exhausted (Too many open files)
In the code sample below, "hello" is printed 1021 times regardless of how many bytes I drop each recursion.
Code sample:
import Control.Monad.Trans (MonadIO)
import Control.Monad.Trans.Resource (runResourceT)
import qualified Data.ByteString.Streaming as BSS
import qualified Data.ByteString.Streaming.Char8 as BSSC
import System.TimeIt

main :: IO ()
main = timeIt $ runResourceT $ dump $ BSS.readFile "filename"

dump :: MonadIO m => BSS.ByteString m r -> m ()
dump bs = do
  isEmpty <- BSS.null_ bs
  if isEmpty
    then return ()
    else do
      BSSC.putStrLn $ BSSC.string "hello"
      dump $ BSS.drop 1 bs
% cat streaming-bug.hs
import Streaming as S
import qualified Streaming.Prelude as S
import qualified Data.ByteString.Streaming.Char8 as SBS

main = do
  writeFile "test.txt" $ unlines ["a"]
  runResourceT $ streamFold
    (const $ return ())
    join
    (join . SBS.putStrLn)
    (SBS.lines $ SBS.readFile "test.txt")
% stack runghc --resolver=lts-7.0 --package streaming-0.1.4.3 --package streaming-bytestring-0.1.4.4 streaming-bug.hs | cat -n
1 a
2