composewell / streamly-lz4
Streamly combinators for LZ4 compression.
License: Apache License 2.0
Improve the Array API instead.
See this document for the lz4 container format: https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md .
We can write a parser (a terminating fold) to decode a stream of frames into a stream of compressed blocks: decode the frame header and then decompress appropriately using the options/flags found in it. Similarly, we can write a serializer that emits a stream of compressed blocks as an LZ4 frame using user-supplied options.
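For instance, the frame parser's first step is to validate the magic number; the LZ4 frame format begins with 0x184D2204 stored little-endian. A minimal, list-based sketch of that check (the real implementation would work on streamly arrays, not lists):

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32)

-- Little-endian decode of a 32-bit word from the first four bytes.
le32 :: [Word8] -> Word32
le32 = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0 . take 4

-- The LZ4 frame format begins with this magic number (stored little-endian).
lz4FrameMagic :: Word32
lz4FrameMagic = 0x184D2204

-- A frame parser would check this before reading the frame descriptor.
isLZ4Frame :: [Word8] -> Bool
isLZ4Frame bs = length bs >= 4 && le32 bs == lz4FrameMagic
```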
Also, we have to handle the following in the block headers:
Test the following cases:
See #29 (comment)
compressChunk :: Int -> Fold m (Array.Array Word8) (Array.Array Word8)
-- Accept properly resized chunks
decompressChunk :: Fold m (Array.Array Word8) (Array.Array Word8)
We can then implement compress and decompress in terms of these folds.
In the code above, Ptr C_LZ4Stream would be freed ONLY along the following path (listed in reverse order):
c_freeStream was called
CompressDone ctx
Stream.Stop from the inner stream (the only way to produce CompressDone ctx)
Does that mean we'd get a memory leak in code like:
readFile f & compressChunks cfg speed & take 0 & fold drain :: IO ()
since we'll never get Stream.Stop from readFile, or am I missing another path that would clean up Ptr C_LZ4Stream properly?
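The usual fix for this class of leak is to tie the C allocation to a bracket so the release action runs even when the consumer stops early (streamly provides bracket-style stream combinators for this). A base-only sketch of the idea, with an IORef merely standing in for a real Ptr C_LZ4Stream:

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, newIORef, writeIORef, readIORef)

-- Sketch: bracket runs the release action even when the body consumes
-- only part of the input, which is what take 0 does to the stream.
-- The IORef Bool only simulates the alive/freed state of the C handle.
withCStream :: (IORef Bool -> IO a) -> IO a
withCStream = bracket alloc release
  where
    alloc = newIORef True               -- stands in for c_createStream
    release ref = writeIORef ref False  -- stands in for c_freeStream
```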
In streamly, we use the following CPP helpers,
#define INLINE_EARLY INLINE [2]
#define INLINE_NORMAL INLINE [1]
#define INLINE_LATE INLINE [0]
This does not work well with a .hsc file. This might not even be the right way to define these helpers in a .hsc file. The closest mechanism I can see in the documentation is #let.
This code is currently commented out but we should figure out a way to write informative INLINE pragmas.
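An untested sketch of how #let might express these helpers in a .hsc file; the macro names and the stringification of the argument are assumptions, not verified against hsc2hs:

```haskell
#let inline_early  f = "{-# INLINE [2] %s #-}", #f
#let inline_normal f = "{-# INLINE [1] %s #-}", #f
#let inline_late   f = "{-# INLINE [0] %s #-}", #f

-- Hypothetical usage at a definition site:
-- #{inline_normal compressChunk}
```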
We should use a specific byte ordering instead of the machine byte ordering; otherwise the code will not work across machines with different endianness. We can raise an issue for this and address it separately.
For example:
else if ((fromIntegral :: Int32 -> Int) srcLen < Array.byteLength arr)
Annotating the conversion makes it easier to verify safety and forces us to think about whether it is safe. Always upcast the smaller type rather than downcasting the larger one.
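To make serialized lengths portable we can write them in an explicit little-endian byte order rather than as machine words. A base-only sketch (helper names are illustrative, not from the library):

```haskell
import Data.Bits (shiftR)
import Data.Int (Int32)
import Data.Word (Word8)

-- Serialize a 32-bit length in an explicit little-endian byte order,
-- so the output is identical regardless of the host's endianness.
encodeLE32 :: Int32 -> [Word8]
encodeLE32 n = [ fromIntegral (n `shiftR` s) | s <- [0, 8, 16, 24] ]

-- Upcasting the 32-bit length to Int for comparisons never truncates
-- under GHC, where Int is at least 32 bits wide.
toLen :: Int32 -> Int
toLen = fromIntegral
```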
We should not use the whole absolute path of the corpus file in the benchmark name.
compress/files/bufsize(65536)/compress 5//home/harendra/composewell/streamly-lz4/corpora/large/bible.txt.normalized time 41.77 ms
compress/files/bufsize(65536)/compress 5//home/harendra/composewell/streamly-lz4/corpora/large/world192.txt.normalized time 33.97 ms
compress/files/bufsize(65536)/compress 5//home/harendra/composewell/streamly-lz4/corpora/cantrbry/alice29.txt.normalized time 39.86 ms
Format vs Config vs Options
All combinators fuse if they are the only combinators in the pipeline.
Adding any other combinator breaks fusion.
The problem seems to be with stuff not getting inlined.
Installation and Benchmarks
Compress:
compress: a convenience API that uses a default speedup factor and a default block size for compression
compressWith: ability to specify the speedup and block size
compressRaw: compress the arrays in the input stream as is, without resizing
Decompress:
decompress: resize the input arrays and decompress
decompressRaw: assume the arrays are already of the right size
Hi, as far as I know, streamly-lz4 still uses streamly == 0.8.2 currently. Do we have any plan to migrate streamly-lz4 to the new streamly API?
Check the size of the input array and fail if it is more than the max block size set in BlockConfig. Otherwise decompression would fail.
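A minimal sketch of such a guard; the BlockConfig type and its field name here are stand-ins, not the library's actual definition:

```haskell
-- Stand-in config; the real BlockConfig lives in streamly-lz4.
newtype BlockConfig = BlockConfig { maxBlockSize :: Int }

-- Reject chunks larger than the configured maximum block size up front,
-- instead of letting decompression fail later with an opaque error.
checkChunkSize :: BlockConfig -> Int -> Either String Int
checkChunkSize cfg len
  | len > maxBlockSize cfg =
      Left ("chunk of " ++ show len ++ " bytes exceeds max block size of "
            ++ show (maxBlockSize cfg))
  | otherwise = Right len
```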
For decompression, we could flatten the compressed input arrays into a stream and parse the stream into compressed chunks using a terminating fold. Check how this performs in comparison to array resizing.
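The parsing step can be sketched over plain lists; a streamly version would be a terminating parser/fold over the flattened byte stream. This sketch ignores the high bit of the size word, which the frame format uses to flag uncompressed blocks:

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8)

-- Split a flattened byte stream into compressed blocks using the 4-byte
-- little-endian block-size prefix from the LZ4 frame format. A zero size
-- is the EndMark terminating the frame.
splitBlocks :: [Word8] -> [[Word8]]
splitBlocks bs
  | length hdr < 4 = []   -- truncated header: stop
  | len == 0 = []         -- EndMark
  | otherwise = block : splitBlocks rest
  where
    (hdr, body) = splitAt 4 bs
    len = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) (0 :: Int) hdr
    (block, rest) = splitAt len body
```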
For compression we could chunk the stream into arrays of 64K by default and provide an option to change the default.
We can provide both array as well as stream based APIs if arrays show some perf advantage otherwise we can keep just stream based ones.
The CI for GHC 8.8.4 on Linux failed once. Unfortunately, the logs were not recorded and I don't know the right conditions to reproduce the error.
It was probably a memory issue but I'm not sure.
The compress/decompress APIs work on chunks (arrays). All the APIs in streamly that work on arrays are suffixed with chunk; we should use the same convention here as well:
compress
decompress