
K4os.Compression.LZ4

Name                           Description
K4os.Compression.LZ4           Block compression only
K4os.Compression.LZ4.Streams   Stream compression
K4os.Compression.LZ4.Legacy    Legacy compatibility

LZ4

LZ4 is a lossless compression algorithm, sacrificing compression ratio for compression/decompression speed. Its compression speed is ~400 MB/s per core, while decompression speed reaches ~2 GB/s, not far from RAM speed limits.

This library brings LZ4 to .NET Standard compatible platforms: .NET Core, .NET Framework, Mono, Xamarin, and UWP. Well... theoretically... kind of. Currently, it targets .NET Framework 4.6.2+, .NET Standard 2.0+ and .NET 5.0+.

As it targets .NET Standard 2.0+, all these platforms should be supported, although I have not tested it on all of them.

LZ4 was written by Yann Collet and the original C sources can be found here

Build

./build.ps1

NOTE: technically, it could be built on Linux as well, but the setup process downloads and uses some Windows tools, like 7z.exe and lz4.exe. It could be adapted, but hasn't been. Feel free to send a PR.

Changes

Change log can be found here.

Support

Maintaining this library is completely outside of my daily job. The company I work for does not even use it, so I do this entirely in my own free time.

So, if you think my work is worth something, you can support me by funding my daily caffeine dose:

"Buy Me A Coffee"

(or just use PayPal)

What is 'Fast compression algorithm'?

While the compression algorithms you use day-to-day to archive your data work at around 10 MB/s and give you quite decent compression ratios, 'fast algorithms' are designed to work 'faster than your hard drive', sacrificing compression ratio.

One of the most famous fast compression algorithms is Google's own Snappy, which is advertised as 250 MB/s compression and 500 MB/s decompression on an i7 in 64-bit mode. Fast compression algorithms help reduce network traffic and hard drive load by compressing data on the fly with no noticeable latency.

I just tried to compress some sample data (Silesia Corpus), receiving:

  • zlib (7zip) - 7.5M/s compression, 110MB/s decompression, 44% compression ratio
  • lzma (7zip) - 1.5MB/s compression, 50MB/s decompression, 37% compression ratio
  • lz4 - 280MB/s compression, 520MB/s decompression, 57% compression ratio

Note: values above are for illustration only. They are affected by HDD read/write speed (in fact, LZ4 decompression is much faster); the 'real' tests take HDD speed out of the equation. For detailed performance tests see [Performance Testing] and [Comparison to other algorithms].

Other 'Fast compression algorithms'

There are multiple fast compression algorithms, to name a few: LZO, QuickLZ, LZF, Snappy, FastLZ. You can find a comparison of them on the LZ4 webpage or here

Usage

This LZ4 library can be used in two distinct ways: to compress blocks and to compress streams.

Use as blocks

Compression levels

enum LZ4Level
{
    L00_FAST,
    L03_HC, L04_HC, L05_HC, L06_HC, L07_HC, L08_HC, L09_HC,
    L10_OPT, L11_OPT, L12_MAX,
}

There are multiple compression levels. LZ4 comes in 3 (4?) flavors of compression algorithms, indicated by the level suffixes: FAST, HC, OPT and MAX (where MAX is just OPT with "ultra" settings). Please note that compression speed drops rapidly when not using FAST mode, while decompression speed stays the same (actually, it is usually faster for high compression levels, as there is less data to process).
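
For example, a quick sketch (using the block Encode(...) API described below) comparing two levels on the same, highly compressible buffer; actual sizes and timings depend entirely on your data:

var source = new byte[10_000]; // all zeros: highly compressible
var target = new byte[LZ4Codec.MaximumOutputSize(source.Length)];

// FAST: best speed, decent ratio
var fastLength = LZ4Codec.Encode(source, target, LZ4Level.L00_FAST);

// HC: noticeably slower to encode, usually smaller output
var highLength = LZ4Codec.Encode(source, target, LZ4Level.L09_HC);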

Utility

static class LZ4Codec
{
    static int MaximumOutputSize(int length);
}

Returns the maximum size of a block after compression. Of course, most of the time compressed data will take less space than the source data, although in the case of incompressible (for example: already compressed) data it may take more.

Example:

var source = new byte[1000];
var target = new byte[LZ4Codec.MaximumOutputSize(source.Length)];
//...

Compression

A block can be compressed using the Encode(...) method family. These are relatively low-level functions, as it is your job to allocate all the memory.

static class LZ4Codec
{
    static int Encode(
        byte* source, int sourceLength,
        byte* target, int targetLength,
        LZ4Level level = LZ4Level.L00_FAST);

    static int Encode(
        ReadOnlySpan<byte> source, Span<byte> target,
        LZ4Level level = LZ4Level.L00_FAST);

    static int Encode(
        byte[] source, int sourceOffset, int sourceLength,
        byte[] target, int targetOffset, int targetLength,
        LZ4Level level = LZ4Level.L00_FAST);
}

All of them compress the source buffer into the target buffer and return the number of bytes actually used after compression. A negative value means that an error has occurred and compression failed; in most cases it means the target buffer is too small.

Please note that it might be tempting to use a target buffer the same size as (or even one byte smaller than) the source buffer, and use a plain copy as a fallback. This works just fine, yet compression into a buffer smaller than MaximumOutputSize(source.Length) is a little bit slower.

Example:

var source = new byte[1000];
var target = new byte[LZ4Codec.MaximumOutputSize(source.Length)];
var encodedLength = LZ4Codec.Encode(
    source, 0, source.Length,
    target, 0, target.Length);
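
If you do use a smaller target buffer, it is worth checking for the negative result and falling back to a plain copy. A minimal sketch; how you record the "stored, not compressed" fact is up to you (the boolean below is just an illustration):

var source = new byte[1000];
var target = new byte[source.Length - 1]; // deliberately smaller than MaximumOutputSize
var encodedLength = LZ4Codec.Encode(
    source, 0, source.Length,
    target, 0, target.Length);

var stored = encodedLength < 0; // negative result: data did not fit (incompressible)
if (stored)
{
    target = source;             // fallback: keep original bytes
    encodedLength = source.Length;
}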

Decompression

A previously compressed block can be decompressed with the Decode(...) functions.

static class LZ4Codec
{
    static int Decode(
        byte* source, int sourceLength,
        byte* target, int targetLength);

    static int Decode(
        ReadOnlySpan<byte> source, Span<byte> target);

    static int Decode(
        byte[] source, int sourceOffset, int sourceLength,
        byte[] target, int targetOffset, int targetLength);
}

You have to know upfront how much memory you need for decompression, as there is almost no way to guess it. I did not investigate the theoretical maximum compression ratio, yet an all-zero buffer gets compressed 255 times, so when decompressing, the output buffer may need to be up to 255 times bigger than the input buffer. Yet, the encoding itself does not store that information anywhere, therefore it is your job.

var source = new byte[1000];
var target = new byte[knownOutputLength]; // or source.Length * 255 to be safe
var decoded = LZ4Codec.Decode(
    source, 0, source.Length,
    target, 0, target.Length);

NOTE: If I told you that decompression potentially needs 100 times more memory than the original data, you would think this is insane. And it is not 100 times, it is 255 times more, so it actually is insane. Please don't do it. This was for demonstration only. What you need is a way to store the original size somehow (I'm not opinionated, do whatever you think is right) or... you can use LZ4Pickler (see below) or LZ4Stream.
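
One simple (but by no means canonical) way to store the original size is a 4-byte length prefix. A minimal sketch, assuming you control both sides of the format; error handling (negative Encode result, truncated input) is omitted:

using System.Buffers.Binary;

byte[] CompressWithLength(byte[] source)
{
    var target = new byte[4 + LZ4Codec.MaximumOutputSize(source.Length)];
    BinaryPrimitives.WriteInt32LittleEndian(target, source.Length); // remember original size
    var encodedLength = LZ4Codec.Encode(
        source, 0, source.Length, target, 4, target.Length - 4);
    return target.AsSpan(0, 4 + encodedLength).ToArray();
}

byte[] DecompressWithLength(byte[] encoded)
{
    var originalLength = BinaryPrimitives.ReadInt32LittleEndian(encoded); // read it back
    var target = new byte[originalLength];
    LZ4Codec.Decode(encoded, 4, encoded.Length - 4, target, 0, target.Length);
    return target;
}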

Pickler

Sometimes all you need is to quickly compress a small chunk of data, let's say a serialized message to send over the network. You can use LZ4Pickler in such cases. It encodes the original length within the message and handles incompressible data (by copying).

static class LZ4Pickler
{
    static byte[] Pickle(
        byte[] source,
        LZ4Level level = LZ4Level.L00_FAST);

    static byte[] Pickle(
        byte[] source, int sourceOffset, int sourceLength,
        LZ4Level level = LZ4Level.L00_FAST);

    static byte[] Pickle(
        ReadOnlySpan<byte> source,
        LZ4Level level = LZ4Level.L00_FAST);

    static byte[] Pickle(
        byte* source, int sourceLength,
        LZ4Level level = LZ4Level.L00_FAST);
}

Example:

var source = new byte[1000];
var encoded = LZ4Pickler.Pickle(source);
var decoded = LZ4Pickler.Unpickle(encoded);

Please note that this approach is slightly slower (copy after failed compression) and makes one extra memory allocation (as it resizes the buffer after compression).

Streams

The stream implementation is in a different package (K4os.Compression.LZ4.Streams) as it has a dependency on K4os.Hash.xxHash. It is fully compatible with the LZ4 Frame format, although not all features are supported on compression (they are "properly" ignored on decompression).

Stream compression settings

There are some things which can be configured when compressing data:

class LZ4EncoderSettings
{
    long? ContentLength { get; set; } = null;
    bool ChainBlocks { get; set; } = true;
    int BlockSize { get; set; } = Mem.K64;
    bool ContentChecksum { get; set; } = false;
    bool BlockChecksum { get; set; } = false;
    uint? Dictionary => null;
    LZ4Level CompressionLevel { get; set; } = LZ4Level.L00_FAST;
    int ExtraMemory { get; set; } = 0;
}

Default options are good enough, so most of the time you don't need to change anything. Refer to the original documentation for more detailed information.

Please note that ContentLength and Dictionary are not currently supported and trying to use values other than defaults will throw exceptions.
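
For illustration, a hedged sketch of tweaking a few of the settings listed above (Mem.M1 is assumed to exist alongside Mem.K64; use whatever constants your version exposes):

var settings = new LZ4EncoderSettings
{
    CompressionLevel = LZ4Level.L09_HC, // slower encode, better ratio
    BlockSize = Mem.M1,                 // larger blocks may improve ratio (assumed constant)
    ExtraMemory = Mem.K64               // allow the encoder some extra working memory
};

using var target = LZ4Stream.Encode(File.Create("data.lz4"), settings);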

Stream compression

The class responsible for compression is LZ4EncoderStream, but its usage is not obvious. For easy access two static factory methods have been created:

static class LZ4Stream
{
    static LZ4EncoderStream Encode(
        Stream stream, LZ4EncoderSettings settings = null, bool leaveOpen = false);

    static LZ4EncoderStream Encode(
        Stream stream, LZ4Level level, int extraMemory = 0,
        bool leaveOpen = false);
}

Both of them will take a stream (a file, a network stream, a memory stream) and wrap it adding compression on top of it.

Example:

using (var source = File.OpenRead(filename))
using (var target = LZ4Stream.Encode(File.Create(filename + ".lz4")))
{
    source.CopyTo(target);
}

Stream decompression settings

Decompression settings are pretty simple and the class has been added mostly for symmetry with LZ4EncoderSettings.

class LZ4DecoderSettings
{
    int ExtraMemory { get; set; } = 0;
}

Adding extra memory to the decompression process may increase decompression speed, although not significantly, so there is no reason to stress about it too much.

Stream decompression

Same as before, there are two static factory methods to wrap an existing stream and provide decompression.

static class LZ4Stream
{
    static LZ4DecoderStream Decode(
        Stream stream, LZ4DecoderSettings settings = null, bool leaveOpen = false);

    static LZ4DecoderStream Decode(
        Stream stream, int extraMemory, bool leaveOpen = false);
}

Example:

using (var source = LZ4Stream.Decode(File.OpenRead(filename + ".lz4")))
using (var target = File.Create(filename))
{
    source.CopyTo(target);
}

Please note that stream decompression is (at least I hope it is) fully compatible with the original specification. Well, it does not handle predefined dictionaries, but lz4.exe does not either. All the other features which are not implemented yet (ContentLength, ContentChecksum, BlockChecksum) are just gracefully ignored and do not cause decompression to fail.

Other stream-like data structures

As of version 1.3-beta, new stream abstractions have been added (note that they have both sync and async methods, but here I'm listing the sync ones only):

interface ILZ4FrameReader: IDisposable
{
    bool OpenFrame();
    long? GetFrameLength();
    int ReadOneByte();
    int ReadManyBytes(Span<byte> buffer, bool interactive = false);
    long GetBytesRead();
    void CloseFrame();
}

interface ILZ4FrameWriter: IDisposable
{
    bool OpenFrame();
    void WriteOneByte(byte value);
    void WriteManyBytes(ReadOnlySpan<byte> buffer);
    long GetBytesWritten();
    void CloseFrame();
}

This allows adapting any stream-like data structure to LZ4 compression/decompression. Currently that includes: Span and ReadOnlySpan (limited support), Memory and ReadOnlyMemory, ReadOnlySequence, BufferWriter, Stream, PipeReader, and PipeWriter.

This mechanism is extensible, so implementing the stream-like approach for other data structures is possible (although not trivial).

Factory methods for creating ILZ4FrameReader and ILZ4FrameWriter are available on the LZ4Frame class:

static class LZ4Frame
{
    // Decode
    
    static void Decode<TBufferWriter>(
        ReadOnlySpan<byte> source, TBufferWriter target, int extraMemory = 0);
    static ByteMemoryLZ4FrameReader Decode(
        ReadOnlyMemory<byte> memory, int extraMemory = 0);
    static ByteSequenceLZ4FrameReader Decode(
        ReadOnlySequence<byte> sequence, int extraMemory = 0);
    static StreamLZ4FrameReader Decode(
        Stream stream, int extraMemory = 0, bool leaveOpen = false);
    static PipeLZ4FrameReader Decode(
        PipeReader reader, int extraMemory = 0, bool leaveOpen = false);
    
    // Encode
        
    static int Encode(
        ReadOnlySequence<byte> source, Span<byte> target, LZ4EncoderSettings? settings = default);
    static int Encode(
        Span<byte> source, Span<byte> target, LZ4EncoderSettings? settings = default);
    static int Encode(
        Action<ILZ4FrameWriter> source, Span<byte> target, LZ4EncoderSettings? settings = default);
    static ByteSpanLZ4FrameWriter Encode(
        byte* target, int length, LZ4EncoderSettings? settings = default);
    static ByteMemoryLZ4FrameWriter Encode(
        Memory<byte> target, LZ4EncoderSettings? settings = default);
    static ByteBufferLZ4FrameWriter<TBufferWriter> Encode<TBufferWriter>(
        TBufferWriter target, LZ4EncoderSettings? settings = default);
    static ByteBufferLZ4FrameWriter Encode(
        IBufferWriter<byte> target, LZ4EncoderSettings? settings = default);
    static StreamLZ4FrameWriter Encode(
        Stream target, LZ4EncoderSettings? settings = default, bool leaveOpen = false);
    static PipeLZ4FrameWriter Encode(
        PipeWriter target, LZ4EncoderSettings? settings = default, bool leaveOpen = false);
}
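
For example, a hedged round-trip sketch using the in-memory overloads above (assuming the frame writer finishes the frame when disposed):

var data = new byte[10_000]; // something to compress

// encode: write bytes into an LZ4 frame backed by an ArrayBufferWriter
var buffer = new ArrayBufferWriter<byte>();
using (var writer = LZ4Frame.Encode(buffer))
{
    writer.WriteManyBytes(data.AsSpan());
}
var encoded = buffer.WrittenMemory;

// decode: read the frame back from memory
var decoded = new byte[data.Length];
using (var reader = LZ4Frame.Decode(encoded))
{
    reader.ReadManyBytes(decoded.AsSpan());
}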

Performance for small frames

A lot of LZ4 usage involves small frames, like network packets or small files. This work is not finished yet, and there is more memory allocation than I would like, but performance is getting much better.

Please note that LZ4Pickler is the fastest option; it is just not portable, as it has been developed by me, for my own needs.

If you need to use the LZ4 Frame format (the official streaming format) you will be pleased to know that with the new abstractions it can be much faster. So far, people needed to use a Stream even if the data was in memory already:

using var source = new MemoryStream(encoded);
using var decoder = LZ4Stream.Decode(source);
using var target = new MemoryStream();
decoder.CopyTo(target);
var decoded = target.ToArray();

Now it is simpler, and faster:

var decoded = LZ4Frame.Decode(encoded.AsSpan(), new ArrayBufferWriter<byte>()).WrittenMemory.ToArray();

ArrayBufferWriter<T> is the go-to implementation of IBufferWriter<T>, but you may want a specialized implementation if performance is critical. It is quite fast, but it seems to be relatively relaxed about allocating memory. If you want to reduce GC usage, implement your own IBufferWriter<T> and test it.
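
A minimal sketch of such a specialized implementation, assuming the caller can pre-size the buffer exactly (capacity checks and growth are omitted for brevity):

class FixedBufferWriter : IBufferWriter<byte>
{
    private readonly byte[] _buffer;
    private int _written;

    public FixedBufferWriter(int capacity) => _buffer = new byte[capacity];

    public ReadOnlyMemory<byte> WrittenMemory => _buffer.AsMemory(0, _written);

    // IBufferWriter<byte>: hand out the free tail, then record how much was used
    public Memory<byte> GetMemory(int sizeHint = 0) => _buffer.AsMemory(_written);
    public Span<byte> GetSpan(int sizeHint = 0) => _buffer.AsSpan(_written);
    public void Advance(int count) => _written += count;
}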

Below decoding small blocks:

using var source = new MemoryStream(_encoded);
using var decoder = LZ4Stream.Decode(source);
_ = decoder.Read(_decoded, 0, _decoded.Length);

vs

using var decoder = LZ4Frame.Decode(_encoded);
_ = decoder.ReadManyBytes(_decoded.AsSpan());

shows that the frame reader is much faster and allocates less memory; also note there are no Gen1 or Gen2 allocations.

BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.1889/21H2/November2021Update)
AMD Ryzen 5 3600, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.300
  [Host]     : .NET 5.0.17 (5.0.1722.21314), X64 RyuJIT AVX2
  DefaultJob : .NET 5.0.17 (5.0.1722.21314), X64 RyuJIT AVX2
| Method         | Size |       Mean | Ratio |   Gen0 |   Gen1 |   Gen2 | Alloc Ratio |
|----------------|-----:|-----------:|------:|-------:|-------:|-------:|------------:|
| UseStream      |  128 | 1,173.6 ns |  1.00 | 1.2703 | 1.2074 | 1.2074 |        1.00 |
| UseFrameReader |  128 |   466.2 ns |  0.40 | 0.0525 |      - |      - |        0.85 |
| UseStream      | 1024 | 1,593.2 ns |  1.00 | 1.6575 | 1.5945 | 1.5945 |        1.00 |
| UseFrameReader | 1024 |   756.7 ns |  0.47 | 0.0525 |      - |      - |        0.85 |
| UseStream      | 8192 | 5,081.0 ns |  1.00 | 5.1956 | 5.1270 | 5.1270 |        1.00 |
| UseFrameReader | 8192 | 3,836.3 ns |  0.76 | 0.0458 |      - |      - |        0.82 |

Legacy (lz4net) compatibility

There is a separate package for those who used lz4net before and still need to access files generated with it:

static class LZ4Legacy
{
    static LZ4Stream Encode(
        Stream innerStream,
        bool highCompression = false,
        int blockSize = 1024 * 1024,
        bool leaveOpen = false);

    static LZ4Stream Decode(
        Stream innerStream, 
        bool leaveOpen = false);
}

This provides access to streams written by lz4net. Please note that the interface is not compatible, but the file format is.

Example:

using (var source = LZ4Legacy.Decode(File.OpenRead(filename + ".old")))
using (var target = LZ4Stream.Encode(File.Create(filename + ".new")))
{
    source.CopyTo(target);
}

The code above will read the old (lz4net) format and write the new format (LZ4 frame format).

Memory pooling

I've added memory block pooling to most of the classes. It is enabled by default, but it comes with a potential danger: pooled memory gets pinned, which may be problematic in some scenarios, for example with long-lived streams.

Most of the time LZ4 is used for small packets, an "in-and-out, 20 minutes adventure", so pinning pooled memory is not a problem.

As usual: it depends.

If you want to change the maximum size of pooled arrays, use PinnedMemory.MaxPooledSize. You can set it to 0 to disable pooling.
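
For example, to opt out completely:

PinnedMemory.MaxPooledSize = 0; // 0 = no pooling (and no pinned memory)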

ARMv7, IL2CPP, Unity

Apparently ARMv7 does not handle unaligned access:

[...] It looks like the code here will do an unaligned memory access, which is not allowed on armv7, hence the crash. Mono works in this case because it generates code that is less efficient than IL2CPP, and does only aligned memory access. With IL2CPP, we have chosen to convert the C# code as-is, so that the generated code will do unaligned access if the C# code does. [...]

I've adapted the 32-bit algorithm to use aligned access only (the 64-bit version still tries to maximise speed by using unaligned access). Think of 32-bit mode as a "compatibility mode". In case of alignment-related problems, force 32-bit mode as early as possible with:

LZ4Codec.Enforce32 = true;

Issues

Please use this template when raising one. Try to be as helpful as possible to help me reproduce it.

Contributors

codetherapist, miloszkrajewski, sewer56, starburst997, warrenfalk, xen2


Reported Issues

Issue upgrading from lz4net to K4os.Compression.LZ4

Hi, I'm upgrading to the new version of your library, but I'm struggling to understand what I'm doing wrong. This is the code I updated:

public static byte[] ObjToBytes(object obj)
 {
     using (MemoryStream ms = new MemoryStream())
     using (Stream compressionStream =
         //new LZ4Stream(ms, LZ4StreamMode.Compress)  // This works
         LZ4Stream.Encode(ms) // this does not work
         )
     {
         using (StreamWriter sw = new StreamWriter(compressionStream, Utf8Encoding))
         using (JsonTextWriter writer = new JsonTextWriter(sw))
         {
             var serializer = JsonSerializer.CreateDefault();
             serializer.Serialize(writer, obj);
             writer.Flush();

             return ms.ToArray();
         }
     }
 }

 public static object BytesToObj(Type type, byte[] bytes)
 {
     
     using (MemoryStream ms = new MemoryStream(bytes))
     using (Stream decompressionStream = 
         //new LZ4Stream(ms, LZ4StreamMode.Decompress)  // This works
         LZ4Stream.Decode(ms) // this does not work
         )
     {
         using (StreamReader sr = new StreamReader(decompressionStream, Utf8Encoding))
         {
             using (JsonReader reader = new JsonTextReader(sr))
             {
                 var serializer = JsonSerializer.CreateDefault();
                 return serializer.Deserialize(reader, type); // Here is where I get System.IO.EndOfStreamException: 'Unexpected end of stream. Data might be corrupted.'

             }
         }
     }
 }

I get a System.IO.EndOfStreamException: 'Unexpected end of stream. Data might be corrupted.' when trying to deserialize the object.

Any suggestion will be appreciated!
Thanks in advance

DataContractSerializer causes InvalidDataException

Hello, I'm having problems using this library with a DataContractSerializer. This example throws System.IO.InvalidDataException: LZ4 frame magic number expected. Can't figure it out, and it works fine copying to an intermediate MemoryStream for some reason (but of course I want to avoid that).

at K4os.Compression.LZ4.Streams.LZ4DecoderStream.ReadFrame()
   at K4os.Compression.LZ4.Streams.LZ4DecoderStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.BufferedStream.Read(Byte[] array, Int32 offset, Int32 count)
   at System.Xml.EncodingStreamWrapper.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.Xml.XmlBufferReader.TryEnsureBytes(Int32 count)
   at System.Xml.XmlUTF8TextReader.BufferElement()
   at System.Xml.XmlUTF8TextReader.ReadStartElement()
   at System.Xml.XmlUTF8TextReader.Read()
   at System.Xml.XmlBaseReader.IsStartElement()
   at System.Xml.XmlBaseReader.IsStartElement(XmlDictionaryString localName, XmlDictionaryString namespaceUri)
   at System.Runtime.Serialization.XmlReaderDelegator.IsStartElement(XmlDictionaryString localname, XmlDictionaryString ns)
   at System.Runtime.Serialization.XmlObjectSerializer.IsRootElement(XmlReaderDelegator reader, DataContract contract, XmlDictionaryString name, XmlDictionaryString ns)
   at System.Runtime.Serialization.DataContractSerializer.InternalIsStartObject(XmlReaderDelegator reader)
   at System.Runtime.Serialization.DataContractSerializer.InternalReadObject(XmlReaderDelegator xmlReader, Boolean verifyObjectName, DataContractResolver dataContractResolver)
   at System.Runtime.Serialization.XmlObjectSerializer.ReadObjectHandleExceptions(XmlReaderDelegator reader, Boolean verifyObjectName, DataContractResolver dataContractResolver)
   at System.Runtime.Serialization.XmlObjectSerializer.ReadObject(XmlDictionaryReader reader)
   at System.Runtime.Serialization.XmlObjectSerializer.ReadObject(Stream stream)
   at CitybreakOnlineWS.Tests.App.Utils.LZ4Tests.LibEncodeDecode() in LZ4Tests.cs:line 38
    [TestClass]
    public class LZ4Tests
    {
        public class TestClass
        {
            public string Test1 { get; set; }
        }

        [TestMethod]
        public void LibEncodeDecode()
        {
            var obj = new TestClass { Test1 = "1" };
            var serializer = new DataContractSerializer(typeof(TestClass));

            byte[] bytes;
            using (var ms = new MemoryStream())
            {
                using (var compressionStream = LZ4Stream.Encode(ms))
                {
                    serializer.WriteObject(compressionStream, obj);
                }
                bytes = ms.ToArray();
            }

            using (var ms = new MemoryStream(bytes))
            {
                using (var decompressionStream = LZ4Stream.Decode(ms))
                {
                    var o = serializer.ReadObject(decompressionStream) as TestClass;

                    Assert.AreEqual(obj.Test1, o.Test1);
                }
            }
        }
    }

xxhash dependency

@MiloszKrajewski
Curious, what is it for?
And, can it be done via some common interface instead of lock-in to xxHash?
Like a IHasher with doHash() function or something like this, so I can provide my own hash library that I use to do the required hash job?
For example, in my game engine I prefer FarmHash (https://github.com/nickbabcock/Farmhash.Sharp this port particullary, which seems faster to me) instead of xxHash.

unable to decompress data received in API

Hi, I am getting LZ4-compressed data in an API request body. The data is compressed on iOS, but when I try to decompress it I get the following error: "inputBuffer size is invalid or has been corrupted".

I am using the following method to decompress the data:

var lorems = Encoding.UTF8.GetString(LZ4Codec.Unwrap(Convert.FromBase64String(compressed)));

It works when I compress data with LZ4Codec.Wrap and decompress with Unwrap, but when the data comes from other platforms (iOS & Android) I get that issue.

Please help me figure out what I am missing here.

Thanks in anticipation

Compression is not working

Hi,

I have tried many tests, none of them is working.

var dataSize = 1000000;
var input = new byte[dataSize];
new Random().NextBytes(input);

var outputBuffer = new byte[LZ4Codec.MaximumOutputSize(input.Length)];
var bytes = LZ4Codec.Encode(input, 0, input.Length, outputBuffer, 0, outputBuffer.Length);
// bytes > 1000000 !!!?

outputBuffer = LZ4Pickler.Pickle(input);
// outputBuffer.Length > 1000000 !!!?

outputBuffer = LZ4Pickler.Unpickle(outputBuffer);
// outputBuffer.Length > 1000000 !!!?

My Bests,
Hung Tran

Question: ReadAsync on LZ4EncoderStream?

Is there a way how to set-up stream compression where something is reading from the compressing stream?
In similar fashion to this:

var blobClient = containerClient.GetBlobClient(blobName);
using var fileStream = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.Read);
using var lz4Stream = LZ4Stream.Encode(fileStream, lz4Opts, leaveOpen: true);

// blobClient is Azure.Storage.Blobs.BlobClient
// it will try to read lz4Stream and it will fail with InvalidOperationException: "Operation ReadAsync is not allowed for LZ4EncoderStream"
//    at K4os.Compression.LZ4.Streams.Internal.LZ4StreamBase.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
var result = await blobClient.UploadAsync(lz4Stream, overwrite: true).ConfigureAwait(false);

lz4Stream.Close();
fileStream.Close();

The solution in this case might be an intermediary MemoryStream to which lz4Stream would be copied entirely, but that would kind of defeat the whole purpose of the streaming approach to compression, no?

This works, but feels wrong:

var blobClient = containerClient.GetBlobClient(blobName);
var fileInfo = new FileInfo(filePath);
using var fileStream = fileInfo.Open(FileMode.Open, FileAccess.Read, FileShare.Read);
using var intermediaryStream = new MemoryStream((int)fileInfo.Length / 4);
using var lz4Stream = LZ4Stream.Encode(intermediaryStream, lz4Opts, leaveOpen: true);
fileStream.CopyTo(lz4Stream);
lz4Stream.Flush();
intermediaryStream.Seek(0, SeekOrigin.Begin);

var result = await blobClient.UploadAsync(intermediaryStream, overwrite: true).ConfigureAwait(false);

lz4Stream.Close();
fileStream.Close();

Add true async stream implementations for use in aspnetcore3+

I recently attempted to use the LZ4EncoderStream with aspnetcore3.1 by writing a custom ICompressionProvider. It worked great until I started using it in services with AllowSynchronousIO disabled.

dotnet/aspnetcore#7644

At that point, I got the following exception.

"Message":"Synchronous operations are disallowed. Call WriteAsync or set AllowSynchronousIO to true instead.",
"StackTrace":"   at Microsoft.AspNetCore.Server.IIS.Core.HttpResponseStream.Write(Byte[] buffer, Int32 offset, Int32 count)\r\n
   at Microsoft.AspNetCore.Server.IIS.Core.WrappingStream.Write(Byte[] buffer, Int32 offset, Int32 count)\r\n
   at K4os.Compression.LZ4.Streams.LZ4EncoderStream.FlushStash()\r\n
   at K4os.Compression.LZ4.Streams.LZ4EncoderStream.WriteFrame()\r\n
   at K4os.Compression.LZ4.Streams.LZ4EncoderStream.Write(Byte[] buffer, Int32 offset, Int32 count)\r\n  
   at System.IO.Stream.<>c.<BeginWriteInternal>b__51_0(Object <p0>)\r\n
   at System.Threading.Tasks.Task`1.InnerInvoke()\r\n
   at System.Threading.Tasks.Task.<>c.<.cctor>b__274_0(Object obj)\r\n
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)\r\n
   --- End of stack trace from previous location where exception was thrown ---\r\n
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)\r\n
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)\r\n
   --- End of stack trace from previous location where exception was thrown ---\r\n
   at Microsoft.AspNetCore.ResponseCompression.ResponseCompressionBody.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)

It would be great if you could write stream implementations that are compatible with this new asp.net configuration option.

Frame decompression with LZ4Stream.Decode seems to ignore all blocks after the first one

Please see discussion of this issue.

To summarize: I have an incoming stream which is compressed with LZ4F_compressBegin/Update/End in a C program running on a Xilinx FPGA (Microblaze processor.) Options for the frame are set to use the 4MB block size, but can be reproduced with any block size.

The problem is that when the size of the entire uncompressed buffer exceeds the block size (even if broken up into multiple calls to LZ4F_compressUpdate), then when the frame is decompressed in C#, the decompressed data in the stream ends after the block size is reached (1MB, 4MB, or whatever.)

Code performing the decompression:

using (var source = LZ4Stream.Decode(incomingData, 0, false))
{
    source.Read(drawBuffer, 0, ImageBufferLength);
}

ImageBufferLength is the full size in bytes required for the decompressed data (height * width * sizeof(short)) since in this case each data element is 2 bytes, and drawBuffer is more than large enough to hold it.

If I write the compressed data out to a file and decompress it with the LZ4 command line utility, I am able to verify that ALL of the data is decompressed from the frame into the resulting file and matches the original source buffer on the FPGA.

I'm attaching a binary file containing the sample compressed frame as received by the C# app prior to decompression.
compressedFrac.zip

At this point I can only assume that the stream API is somehow ignoring the data in the frame that exceeds the specified block length.

Your help would be greatly appreciated-

Thanks,
Andy

XA3001: Cannot AOT the assembly: K4os.Compression.LZ4.dll

Description
I got this error when compile Xamarin.Android app with llvm & AOT enabled:

    <AotAssemblies>true</AotAssemblies>
    <EnableLLVM>true</EnableLLVM>

To reproduce
Steps to reproduce the behavior:
Just compile a Xamarin.Android project with llvm and AOT enabled.

Expected behavior
Should be able to AOT

Actual behavior
XA3001: Cannot AOT the assembly: K4os.Compression.LZ4.dll

Environment

  • CPU: Intel 64 bit
  • OS: Windows
  • .NET: Xamarin
  • LZ4: 1.2.6

Additional context

Problem decoding data

Hi!

I'm trying to decode a message which I have encoded in python using py-lz4framed, but having some issues.

Have I understood it correctly that I cannot use the K4os.Compression.LZ4 package, since that says "Block compression only", and the py-lz4framed lib uses frames? This leaves me with the K4os.Compression.LZ4.Streams package.

My encoded message is not a stream, so what I've done is to use MemoryStream like this in order to get a stream:

Stream incomingStream = new MemoryStream(Encoding.UTF8.GetBytes(compressedStringMessage));

But I honestly don't understand how to continue to decode the message, I feel like I am misunderstanding something very basic here.

What I've got is:

byte[] source = Encoding.UTF8.GetBytes(compressedStringMessage)
MemoryStream incomingStream = new MemoryStream(source);
MemoryStream outgoingStream = new MemoryStream(source.Length * 255);
using (LZ4DecoderStream decodeSource = LZ4Stream.Decode(incomingStream, 0, false))
{
    decodeSource.CopyTo(outgoingStream);
}

which, depending on what I give it as compressedStringMessage, will either give me a
System.ArgumentException: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection
or an
System.InvalidOperationException: Operation is not valid due to the current state of the object.

Any pointers are very welcome 😅

Cannot use ReadAsync

var lz4_input = LZ4Stream.Decode(input);

When I call ReadAsync, I can't call the inner stream's ReadAsync.

Stream is not properly disposed

Example:

var path = "test.txt";
var f = new FileStream (path, FileMode.Create, FileAccess.Write, FileShare.Read);
var z = LZ4Stream.Encode (f);
var w = new BinaryWriter (z, Encoding.UTF8, false);
w.Write ("asdf");
w.Dispose ();
//z.Dispose ();
File.Move ("test.txt", $"test2.txt");

results in IOException:

The process cannot access the file because it is being used by another process.

Commenting in the z.Dispose solves the issue.

However, it should not be necessary due to the leaveOpen: false parameter.

Note: Using Close instead of Dispose on z does NOT work. It seems BinaryWriter calls Close on its base stream, expecting it to be equivalent to Dispose. But that seems to not be the case for LZ4Stream.

Segfault in Unity 2018.3.8 compiled on Android using IL2CPP

When running a Unity 2018.3.8 app on Android compiled with IL2CPP, I'm getting a SIGSEGV when calling LZ4Codec.Decode(). The encoded data is coming from a server running .Net Core 2.1 on Linux, encoded with LZ4Codec.Encode(). Everything works fine when running Unity in editor, I'm experience the segfault only on Android devices. Do you have any idea what could be going wrong in this build scenario?

Error occurs when async decompressing in v1.2.5

Description
LZ4DecoderStream.CopyToAsync throws an ArgumentOutOfRangeException.

To reproduce
Steps to reproduce the behavior:

using (var source = LZ4Stream.Decode(File.OpenRead(filename + ".lz4")))
using (var target = File.Create(filename))
{
	await source.CopyToAsync(target);
}

Exception Details

Message: 
	System.ArgumentOutOfRangeException : Index was out of range. Must be non-negative and less than the size of the collection. (Parameter 'startIndex')
Stack Trace: 
	BitConverter.ToInt32(Byte[] value, Int32 startIndex)
	BitConverter.ToUInt32(Byte[] value, Int32 startIndex)
	Stash.Last4()
	LZ4DecoderStream.TryPeek4(CancellationToken token)
	LZ4DecoderStream.ReadFrame(CancellationToken token)
	LZ4DecoderStream.EnsureFrame(CancellationToken token)
	LZ4DecoderStream.ReadImpl(CancellationToken token, Memory`1 buffer)
	Stream.CopyToAsyncInternal(Stream destination, Int32 bufferSize, CancellationToken cancellationToken)

Environment

  • K4os.Compression.LZ4.Streams: Version 1.2.5

OverflowException: Arithmetic operation resulted in an overflow

Description
I found that for large files this exception occurs on decompression, with a period of 8 megabytes!
For example, for files with an actual size (before compression) between 1 and 8 MB decompression works, but for files with an actual size between 9 and 16 MB the exception occurs on decompression. 17 to 24 MB is OK again, 25 to 32 MB is not, and so on...

To reproduce
Check out this sample.

[Feature] In memory LZ4Frame decompression

Could we have an API to encode/decode frame blocks directly to arrays/spans ?

something like this:

Byte[] lz4bytes = { ... };

int flattenedSize = LZ4Stream.GetFlattenedSize(lz4bytes);

byte[] flattenedBytes = new byte[flattenedSize];
LZ4Stream.Decode(lz4bytes,flattenedBytes);

ZStandard

Hello,

I've really enjoyed working with your .net standard 2.0 library for LZ4 on our network stack. We're interested in using zstd in our network stack as well, but given that we cross compile windows/linux/ios/android, it really needs to be a .net standard, and not a wrapper of the native library (as all of the C# implementations are right now). I'm wondering if you'd be interested in either a sponsored open source effort, or direct consulting to bring such a library to life. Message me at karl at taniustech.com.

Thanks!

-Karl

DecoderStream just reads first 65520 bytes

hey there,

I have a strange problem by using the lz4decoder stream. It seems as if the decoder just decodes the first 65520 bytes. By reading further the decoder gives me random useless bytes. The underlying stream is a FileInputStream.
I created an own file format from which I read different sized byte blocks by using the LZ4Decoder Stream.

        //open base filestream
        FileStream sourceStream = new FileStream(System.IO.Path.Combine(this.path, file), FileMode.Open);

        //open decoder stream
        LZ4DecoderStream diffStream = LZ4Stream.Decode(sourceStream); 

        //read block count
        byte[] buffer = new byte[4];
        diffStream.Read(buffer, 0, 4);
        UInt32 blockCount = BitConverter.ToUInt32(buffer, 0);

        //iterate through all blocks
        for (int i = 0; i < blockCount; i++)
        {
            ulong offset;
            ulong length;

            //read block offset
            buffer = new byte[8];
            diffStream.Read(buffer, 0, 8);
            offset = BitConverter.ToUInt64(buffer, 0);

            //read block length
            int b = diffStream.Read(buffer, 0, 8);
            length = BitConverter.ToUInt64(buffer, 0);

            //read data block
            buffer = new byte[length];
            int a = diffStream.Read(buffer, 0, buffer.Length);

            //do sth with data block
        }
        diffStream.Close();

Now the first "for" iterations work fine, and after that the "offset" and "length" vars contain strange, incorrect values. As mentioned before, this happens after I have read 65520 bytes.
Before using your LZ4 implementation I used the .net built-in ZipArchive implementation. By using almost the same code as above the problem didn't exist.

What am I doing wrong or where is the problem here?
Thanks in advance.

Strongly named assembly

Hi Milosz, can your assemblies be strongly named once again like the old lz4net version was?

"Invalid IL code" exception on Unity/MacOS (lz4net)

Unity 2018.4 for macOS. Visual Studio Community.

I downloaded the lz4net-1.0.10.93-portable package and added it to my Unity project. A few tests worked fine, then I hit this.

Code:

            return LZ4Codec.Wrap(obj.Serialize());

where Serialize() using normal binary serialization on a serializable object.

Exception:

System.InvalidProgramException: Invalid IL code in LZ4pn.LZ4Codec:Encode32 (byte*,byte*,int,int): IL_003c: stloc.3   


  at LZ4pn.LZ4Codec.Encode32 (System.Byte[] input, System.Int32 inputOffset, System.Int32 inputLength, System.Byte[] output, System.Int32 outputOffset, System.Int32 outputLength) [0x00025] in <a08746723a4945d7a8b051e6098cecda>:0
  at LZ4.Services.Unsafe32LZ4Service.Encode (System.Byte[] input, System.Int32 inputOffset, System.Int32 inputLength, System.Byte[] output, System.Int32 outputOffset, System.Int32 outputLength) [0x00000] in <4136ef42ba0e48c18b0ca481cdc84fc8>:0
  at LZ4.LZ4Codec.AutoTest (LZ4.ILZ4Service service) [0x0001e] in <4136ef42ba0e48c18b0ca481cdc84fc8>:0
  at LZ4.LZ4Codec.TryService[T] () [0x00020] in <4136ef42ba0e48c18b0ca481cdc84fc8>:0

Migration from lz4net to K4os causes System.AccessViolationException

Hi,

I have tried migrating my code from lz4net to K4os.Compression.LZ4 and unfortunately managed to run into some errors when using the Apex serializer and an LZ4 stream over a FileStream.

This is how I used the lz4net streams to save a file:

using (FileStream writeFile = File.Create(@"D:\Test\CompressedOLD.res"))
{
    using(LZ4.LZ4Stream compressionStream = new LZ4.LZ4Stream(writeFile, LZ4.LZ4StreamMode.Compress))
    {
        using (IBinary apex = Binary.Create())
        {
            apex.Write(simResults, compressionStream);
        }
    }
}

New code (compression works fine, it gives slightly smaller file sizes but I have verified these and the bytes written to file are correct):

using (FileStream writeFile = File.Create(@"D:\Test\CompressedNEW.res"))
{
    using (LZ4EncoderStream compressionStream = LZ4Stream.Encode(writeFile))
    {
        using (IBinary apex = Binary.Create())
        {
            apex.Write(simResults, compressionStream);
        }
    }
}

Now for reading the file back I used to use:

SimResults res2;
using (FileStream readFile = File.OpenRead(@"D:\Test\CompressedOLD.res"))
{
    using (LZ4.LZ4Stream decompressionStream = new LZ4.LZ4Stream(readFile, LZ4.LZ4StreamMode.Decompress))
    {
        using (IBinary apex = Binary.Create())
        {
            res2 = apex.Read<SimResults>(decompressionStream);
        }
    }
}

Which with the new library becomes this:

SimResults res3;
using (FileStream readFile = File.OpenRead(@"D:\Test\CompressedNEW.res"))
{
    using (LZ4DecoderStream decompressionStream = LZ4Stream.Decode(readFile))
    {
        using (IBinary apex = Binary.Create())
        {
            res3 = apex.Read<SimResults>(decompressionStream);
        }
    }
}

Unfortunately it gives me an error which is due to the LZ4DecoderStream returning a stream that's not long enough compared to the original data.

Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at Apex.Serialization.Internal.BufferedStream.Flush()
   at Apex.Serialization.Read_APG.SimResults(Closure , BufferedStream& , Binary`1 )
   at Apex.Serialization.Binary`1.ReadSealedInternal[T]()
   at Apex.Serialization.Binary`1.ReadObjectEntry[T]()
   at Apex.Serialization.Binary`1.Read[T](Stream inputStream)
   at Program.Main(String[] args) in D:\ConsoleApp1\Program.cs:line 277

I can fix this by copying the LZ4DecoderStream to a MemoryStream and use that when deserialising:

using (FileStream readFile = File.OpenRead(@"D:\Test\CompressedNEW.res"))
{
    using (LZ4DecoderStream decompressionStream = LZ4Stream.Decode(readFile))
    {
        MemoryStream decompressedSim2 = new MemoryStream();
        decompressionStream.CopyTo(decompressedSim2);

        using (IBinary apex = Binary.Create())
        {
            decompressedSim2.Position = 0;
            res3 = apex.Read<SimResults>(decompressedSim2);
        }
    }
}

However, this only works up to a certain file size, which is not big enough for my case. Is there something I am doing wrong here, or is there a bug or a compatibility issue with the Apex serialization?

Code style + some questions

Hello
Just a few questions:

  1. LZ4_64.cs:

private static readonly uint[] DeBruijnBytePos

DeBruijn?

  2. Can you please comment the declarations of important functions? What they do, what params they expect, what output/result is expected... the current project has no comments in the code at all :-\

  3. Also, can you please provide a simple compression/decompression test/example? One which, for example, takes a folder of files at a specified path, compresses them, measures compression speed, then decompresses the resulting archive to another folder and measures decompression speed.

  4. I would like to start using your library in my game engine project as the compression lib for the archive format that keeps game assets. So I'm curious: will the current source code structure change much, or will it stay mostly the same? Also, do comp/decomp work fine in the library's current in-dev state? :)

Anyway, thanks for your fantastic job on adapting LZ4 to .NET Core 2.0. Looking forward to when the project becomes mature enough :)

End of Stream Behavior

When we reach the end of a stream but call a read anyway, I would expect it to return immediately with zero bytes. Instead I either get an exception related to an invalid magic number, or an EndOfStreamException. I believe this could be handled in the GetFrame method: when it asks for a magic number but gets 0/null instead, it could recognize this as end of stream and surface a return of 0 to the read method.

Issue report template

  1. Description

...

  2. Expected result

...

  3. Actual result

...

  4. Steps to reproduce

...

  5. Example project (optional)

...

  6. Environment
  • LZ4: _ (Version, Alpha/Beta)
  • .NET: _ (Version, Framework, Core, Mono, Unity, Xamarin)
  • OS: _ (Windows, MacOS, Android)
  • CPU: _ (Intel, ARM, Samsung, 32/64 bit)

Interface for hash functions

@MiloszKrajewski
Hello.
Can you provide here:

var HC = (byte) (XXH32.DigestOf(_buffer16, 0, _index16) >> 8);

and here:

var actualHC = (byte) (Farmhash.Sharp.Farmhash.Hash32(_buffer16, _index16) >> 8);

Ability to use other hash function via some kind of interface, or something like this? For those, who want to use something else than xxhash.
Specifically I would like to use Farmhash here which is faster than xxhash (depends on payload, though).

Unable to build

Description
I downloaded LZ4 project, however, as stated in instructions, I am unable to build and receiving attached error.

I have .NET Core SDKs installed.

To reproduce
Download package and run paket restore on command prompt

Expected behavior
I expected build would proceed normally as stated in instructions.

Actual behavior
A clear and concise description of what actually happens.

Environment

  • CPU: Intel 64-bit
  • OS: Windows 10
  • .NET: .NET Core 2.1.519
  • LZ4: 1.2.8-beta
    LZ4_Build_Error

Additional context
Add any other context about the problem here.

LZ4Codec.Decode returns -1 instead of correct length

Hi, Milosz!
I hope reporting this issue will be helpful. Here is a source code example which exhibits the issue:

        string text = "The quick brown fox jumps over the lazy dog";
        byte[] textBytes = System.Text.Encoding.UTF8.GetBytes(text);

        byte[] encoded = new byte[LZ4Codec.MaximumOutputSize(textBytes.Length)];
        int encodedLength = LZ4Codec.Encode(
            textBytes, 0, textBytes.Length,
            encoded, 0, encoded.Length);

        byte[] decoded = new byte[textBytes.Length * 2];
        var decodedLength = LZ4Codec.Decode(
            encoded, 0, encoded.Length,
            decoded, 0, decoded.Length);

        Assert.Equal(textBytes.Length, decodedLength); // -1 instead of 43

Thank you!

Sorry for the spam, but really just wanted to leave a comment to tell you how much I appreciate this lib. I managed to compress my bit-stream of images up to 95% at virtually no cost. It's insane. I wouldn't believe it if I didn't see the images on the other side of the wire.

What's next...

1.0.3

  • block fast encoder
  • block high encoder
  • block decoder
  • stream encoder (dependent blocks)
  • stream decoder (dependent blocks)
  • frame decoder
  • frame encoder
  • checksum in header
  • friendly(ier?) API
  • provide read-only Position and Length (if available)

1.1.0

  • stream encoder (independent blocks)
  • stream decoder (independent blocks)
  • signing assemblies

1.1.3

  • add LegacyLz4Stream (compatible with lz4net, but not lz4 frame)

1.2.1-beta

  • porting 1.9.2
  • explicit support for both 32 and 64 environments

1.2.5

  • provide read-only Position and Length for compression stream (nothing fancy, just symmetry)
  • add true async read/write interface
  • add full async support (ie: DisposeAync for .NET Standard 2.1)

1.3.0

  • reduce allocations / use memory pool
  • stream abstraction

1.3.3

  • block (encoded) checksum
  • content (decoded) checksum

vNext

  • abstract frame encoder/decoder state machine
  • predefined dictionaries

Not planned

  • fast decoder loop (gotos across scopes, not a thing in C#)

Note: block checksum and content checksum are not required (and may be ignored)
Note: the frame encoder/decoder now pulls data from the stream, which means it needs to "understand" async. I would love to remove this dependency and make it a pure state machine, which would still allow building an "async" solution on top of it.

Different Output After Copying Stream

Description
I am getting strange results with a specific lz4 file. I have a tar.lz4 where the tar contains a bunch of json files. When I try to deserialize the json files it is failing with some garbage data. I found that with the particular lz4 file I am using, if I pass the LZ4DecoderStream directly into a TarInputStream (SharpZipLib) I see the garbage data (actually repeat of a section of the end of one of the files because ReadBlock is returning 0 so TarInputStream is continuing with the same buffer that was filled on the previous call). However, if I first copy the LZ4DecoderStream into a MemoryStream using CopyTo it reads correctly. This same lz4 file can successfully be decoded by lz4.exe and 7-zip-zstd (although I guess it's not a 1-1 test because LZ4DecoderStream can also do it successfully if I am going to a file on disk vs passing it into another stream). After I observed this behavior I compared the stream after being copied to a MemoryStream and the original and they differ.

To reproduce
Apologies this is a little vague but so far I have only been able to reproduce this issue with one file that contains proprietary data so I cannot share. The same file works if I decompress it with 7-zip or lz4.exe and then re-compress it.

Steps to reproduce the behavior:

string lz4File_2 = Path.Combine(outputDir, "TestFiles.tar.lz4");
File.Copy(lz4File, lz4File_2);
using (FileStream inputFileStream = File.OpenRead(lz4File))
using (LZ4DecoderStream decompressionStream = LZ4Stream.Decode(inputFileStream))
using (FileStream inputFileStream_2 = File.OpenRead(lz4File_2))
using (LZ4DecoderStream decompressionStream_2 = LZ4Stream.Decode(inputFileStream_2))
using (var intermediateStream = new MemoryStream())
{
    decompressionStream_2.CopyTo(intermediateStream);
    intermediateStream.Position = 0;

    int originalByte, intermediateByte;
    do
    {
        originalByte = decompressionStream.ReadByte();
        intermediateByte = intermediateStream.ReadByte();

        if (originalByte != intermediateByte)
        {
            throw new Exception("Bytes are not equal");
        }
    }
    while (originalByte != -1 && intermediateByte != -1);
}

Expected behavior
Should not throw exception as both streams should be identical

Actual behavior
Throws exception. Here are some details from the break point of the exception:

  • originalByte = -1, intermediateByte = 44
  • decompressionStream.Position=89774080, decompressionStream_2.Position=89784320

Environment

  • CPU: Intel Core i5-10310U
  • OS: Windows version 10.0.18362 Build 18362
  • .NET: .NET 6.0.0-preview.5.21301.5
  • LZ4: 1.2.6 (also reproduced from source)

[Solved] System.TypeInitializationException starting from 1.2.5

Hi,

Starting from version 1.2.5, using LZ4Pickler.Pickle(), I get a System.TypeInitializationException. Downgrading to 1.1.11 fixes this.

Steps to reproduce the behavior:
var compressed = LZ4Pickler.Pickle(source);

Exception: The type initializer for 'K4os.Compression.LZ4.Engine.LL' threw an exception.
Inner Exception: Could not load file or assembly 'System.Runtime.CompilerServices.Unsafe, Version=4.0.6.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.

I'm clueless how to fix this. Searching the net, it seems to be a binding issue. But the proposed solution(s) didn't work for me. Or I just failed to apply them the right way. ;-)

.NET: 4.7.2 Framework
LZ4: 1.2.X

Decode with endOnOutputSize

It doesn't appear that decoding with endOnOutputSize is exposed. Is there any plan to make this available?

I'm after partial decoding of data, where I know the uncompressed and compressed sizes, but would like to do partial decoding to get at a file header before I choose to decode the entire file. Currently if I try to do this with LZ4Codec.Decode it returns -1 without any bytes written to my output buffer.

The relevant underlying C# implementation is all set to internal, so I can't just readily use that without making my own build of this library, I think.

LZ4DecoderStream followed by MD5 CryptoStream in read mode is insanely slow

It's barely an issue, it's quite interesting though.

using (var hasher = MD5.Create())
{
    using (SomeStream ss = new SomeStream()) // Not important, assume this stream is fast; you can use a FileStream.
    using (LZ4DecoderStream lz4ds = LZ4Stream.Decode(ss))
    using (CryptoStream md5HashStream = new CryptoStream(lz4ds, hasher, CryptoStreamMode.Read))
    using (FileStream writeFs = new FileStream(destPath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite,  1024 * 1024))
    {
        md5HashStream.CopyTo(writeFs, 1024 * 1024);
    }
}

The above code is extremely slow, approximately 14MB/s on 8700K with nvme ssd.
Simply use CryptoStream in write mode as following

using (var hasher = MD5.Create())
{
    using (SomeStream ss = new SomeStream())
    using (LZ4DecoderStream lz4ds = LZ4Stream.Decode(ss))
    using (FileStream writeFs = new FileStream(destPath, FileMode.OpenOrCreate,
                                FileAccess.ReadWrite, FileShare.ReadWrite,1024 * 1024))
    using (CryptoStream md5HashStream = new CryptoStream(writeFs, hasher, CryptoStreamMode.Write))   
    {
        lz4ds.CopyTo(md5HashStream,  1024 * 1024);
    }
}

The stream can reach 400MB/s on the same hardware.
I used visual studio and dottrace to profile the code, none of them worked, they reported 80% the cpu time was spent with System Code. The dottrace snapshot is attached here
dottrace.zip

My wild guess is that the low-level optimization used in K4os.Compression.LZ4 conflicts with the MD5 implementation, a CPU cache thing? Hope you can find the reason.

Warnings in LZ4_Frame.cs

@MiloszKrajewski
Hi. Suggest to add #pragma warning disable CS0169 in LZ4_Frame.cs to disable these warnings:
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(143,20,143,33): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.maxBufferSize' is never used
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(147,19,147,30): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.totalInSize' is never used
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(146,20,146,29): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.tmpInSize' is never used
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(144,19,144,26): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.tmpBuff' is never used
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(148,27,148,30): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.xxh' is never used
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(150,18,150,29): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.lz4CtxLevel' is never used
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(138,32,138,37): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.prefs' is never used
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(142,20,142,32): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.maxBlockSize' is never used
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(139,18,139,25): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.version' is never used
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(145,19,145,24): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.tmpIn' is never used
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(140,18,140,24): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.cStage' is never used
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(141,25,141,30): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.cdict' is never used
1>3rdParty\LZ4\Source\Internal\LZ4_Frame.cs(149,19,149,28): warning CS0169: The field 'LZ4_Frame.LZ4F_cctx_t.lz4CtxPtr' is never used

can't decode

Description
A JSON-formatted string can't be decoded when I set the byte[] length.

To reproduce
Steps to reproduce the behavior:

 var s = "{\"ID\":\"59fd4cb7-dd79-4f9d-953a-ac91dc0b00f9\",\"Key\":\"init\",\"Step\":2147483647,\"Datas\":null}";             var srcList = new List<byte>(); 
            srcList.AddRange(System.Text.Encoding.UTF8.GetBytes(s));
            var src = srcList.ToArray();
            var target = new byte[LZ4Codec.MaximumOutputSize(src.Length) ];
       
            LZ4Codec.Encode(src, 0, src.Length, target, 0, target.Length, LZ4Level.L03_HC);
           
            var rarray = new byte[target.Length];
            Array.Copy(target, rarray, rarray.Length);

            var r = new byte[src.Length];
            var rbuffer = LZ4Codec.Decode(rarray, r);

            var rs = System.Text.Encoding.UTF8.GetString(r);
            Assert.IsTrue(rs == s);

Expected behavior
After decompression, the result should equal s.

Actual behavior
The actual result after decompression is an empty string.

Environment

  • CPU: Intel, 64-bit
  • OS: Windows 10
  • .NET: .NET 5
  • LZ4: 1.2.6

Additional context
When compressing and decompressing non-JSON strings, the code above passes, but with JSON-formatted strings it does not: rs after decompression is string.Empty.
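For reference, a hedged sketch of the likely fix (an educated guess, not a confirmed diagnosis): LZ4Codec.Encode returns the number of bytes it actually wrote, and only that many bytes should be handed to LZ4Codec.Decode; copying the whole oversized target buffer leaves trailing garbage behind the compressed data.

    var src = System.Text.Encoding.UTF8.GetBytes(s);
    var target = new byte[LZ4Codec.MaximumOutputSize(src.Length)];
    var encodedLength = LZ4Codec.Encode(
        src, 0, src.Length, target, 0, target.Length, LZ4Level.L03_HC);

    // Slice to the actual compressed size before decoding.
    var compressed = new byte[encodedLength];
    Array.Copy(target, compressed, encodedLength);

    var decoded = new byte[src.Length];
    LZ4Codec.Decode(compressed, 0, compressed.Length, decoded, 0, decoded.Length);
    var rs = System.Text.Encoding.UTF8.GetString(decoded); // should now equal s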

Migrate compressed files from lz4net

We are updating our application from lz4net to K4os.Compression.LZ4 and are looking for a way to decompress old data compressed with lz4net (in stream mode) using K4os.Compression.LZ4.
However, decompression always throws an InvalidDataException with the message "LZ4 frame magic number expected".

Compression in lz4net has been done by calling

new LZ4Stream(fileStream, CompressionMode.Compress)

the code for decompression is

LZ4Stream.Decode(fileStream)

Here is an example file which has been compressed with lz4net: loremIpsum-lz4net.txt
Here is the same file uncompressed and compressed with K4os.Compression.LZ4: loremIpsum-k4osLZ4.txt loremIpsum-uncompressed.txt

Am I missing any settings necessary to decompress files that were compressed with lz4net, or is this not possible at all?

Thanks heaps :)
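A hedged pointer rather than a confirmed answer: files written by lz4net's LZ4Stream use lz4net's own framing, not the LZ4 frame format that K4os.Compression.LZ4.Streams.LZ4Stream.Decode expects, which would explain the magic-number error. The K4os.Compression.LZ4.Legacy package exists for this compatibility case; a minimal sketch (file names taken from the attachments above):

    using System.IO;
    using K4os.Compression.LZ4.Legacy;

    using var input = File.OpenRead("loremIpsum-lz4net.txt");
    using var decoded = LZ4Legacy.Decode(input);
    using var output = File.Create("loremIpsum-decoded.txt");
    decoded.CopyTo(output);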

Support for `ReadOnlySequence<byte>` and `IBufferWriter<byte>`

I'm happy to see support for ReadOnlySpan<byte> already in the library.

I want to be able to effectively stream compression/decompression, but without using the Stream class. That is, I don't want to have to allocate contiguous memory for the input and output of the algorithm.

If the algorithm accepted a ReadOnlySequence<byte> as input and wrote its output to a given IBufferWriter<byte> instance, then I could, for instance, process 1 GB of data without having to find two 1 GB blocks of contiguous memory for arrays. Instead, these two types allow the input and output to be broken into much more conveniently sized chunks.

Can such support be added? I can already do this on top of your Stream APIs, but doing it natively would presumably be at least a bit more efficient, and might avoid the extra dependency on your stream-supporting package.
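For context, a sketch of the kind of workaround currently possible on top of the Stream API (SequenceCompression and its method are hypothetical names; this assumes the Encode overload with a leaveOpen flag). Note the output side still buffers contiguously in a MemoryStream, which is exactly the limitation that native IBufferWriter<byte> support would remove:

    using System.Buffers;
    using System.IO;
    using K4os.Compression.LZ4;
    using K4os.Compression.LZ4.Streams;

    static class SequenceCompression
    {
        // Feeds each segment of the sequence into the encoder, so the *input*
        // never needs to be one contiguous array.
        public static byte[] Compress(ReadOnlySequence<byte> source)
        {
            using var output = new MemoryStream();
            using (var encoder = LZ4Stream.Encode(output, LZ4Level.L00_FAST, leaveOpen: true))
            {
                foreach (var segment in source)
                    encoder.Write(segment.Span); // Stream.Write(ReadOnlySpan<byte>), .NET Core 2.1+
            }
            return output.ToArray();
        }
    }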

Is it possible to create a "backward compatible" stream?

Background

In my current system I use the deprecated lz4net library. I'm planning to migrate to K4os.Compression.LZ4, but I already have hundreds of thousands of files compressed using the old LZ4Stream. The files are scattered across many locations and I don't want to migrate them all at once to the new LZ4 stream format.

In an ideal world I'd like newly created files in my system to use the new Stream format, i.e. K4os.Compression.LZ4.Streams.LZ4Stream.Encode.

Question

Is it possible to decode data in the following way:

  • if a file was encoded using K4os.Compression.LZ4.Streams.LZ4Stream.Encode, use K4os.Compression.LZ4.Streams.LZ4Stream.Decode;
  • otherwise, try using K4os.Compression.LZ4.Legacy.LZ4Legacy.Decode? (A detection sketch follows below.)
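One possible approach, sketched under explicit assumptions (CompatDecoder is a hypothetical helper; the input stream must be seekable; a little-endian host is assumed): the LZ4 frame format begins with the magic number 0x184D2204, so the first four bytes reveal which decoder to use.

    using System;
    using System.IO;

    static class CompatDecoder
    {
        public static Stream Decode(Stream input)
        {
            var header = new byte[4];
            var read = input.Read(header, 0, 4);
            input.Seek(-read, SeekOrigin.Current); // rewind after sniffing
            var isFrame = read == 4 && BitConverter.ToUInt32(header, 0) == 0x184D2204;
            return isFrame
                ? (Stream) K4os.Compression.LZ4.Streams.LZ4Stream.Decode(input)  // new frame format
                : K4os.Compression.LZ4.Legacy.LZ4Legacy.Decode(input);           // old lz4net format
        }
    }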

[Question] Is this library thread safe?

Are the static classes LZ4Codec, LZ4Pickler, etc. safe to use from multiple threads concurrently?

What about the instance classes LZ4EncoderStream, LZ4DecoderStream, etc.? (I would assume not.)
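A hedged illustration of what is being asked (the maintainer's answer is authoritative, not this sketch): if the static block API is stateless, as its signature suggests, compressing independent buffers concurrently would look like this, while stream instances hold internal state and should stay confined to a single thread.

    using System;
    using System.Threading.Tasks;
    using K4os.Compression.LZ4;

    static class ParallelBlocks
    {
        static byte[][] CompressAll(byte[][] inputs)
        {
            var results = new byte[inputs.Length][];
            Parallel.For(0, inputs.Length, i =>
            {
                // Each iteration uses only its own buffers; nothing is shared.
                var target = new byte[LZ4Codec.MaximumOutputSize(inputs[i].Length)];
                var written = LZ4Codec.Encode(
                    inputs[i], 0, inputs[i].Length, target, 0, target.Length, LZ4Level.L00_FAST);
                results[i] = target.AsSpan(0, written).ToArray();
            });
            return results;
        }
    }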

Issue migrating from lz4net to K4os.Compression.LZ4.Legacy

Description
I'm trying to migrate a code base from lz4net to K4os.Compression.LZ4.Legacy, since lz4net is no longer supported. I receive an "Unexpected end of stream" exception when calling CopyTo to copy the decoded stream into a MemoryStream.

            var array = buf.ReadBytes((int) header.DecompressedSize);
            byte[] result = { };
            switch (header.CompressionType)
            {
                case CompressionType.NONE:
                    result = array;
                    break;
                case CompressionType.LZ4:
                case CompressionType.LZ4HC:
                    // Works fine - lz4net 1.0.15.93 (last version from nuget)
                    //result = LZ4Codec.Decode(array, 0, array.Length, (int) header.CompressedSize);

                    // Throws 'inputBuffer size is invalid or has been corrupted' - K4os.Compression.LZ4.Legacy 1.2.6
                    // Throws 'inputBuffer size is invalid or has been corrupted' - K4os.Compression.LZ4.Legacy 1.2.8-beta
                    //result = LZ4Legacy.Unwrap(array);

                    // Throws 'Unexpected end of stream' - K4os.Compression.LZ4.Legacy 1.2.6
                    // Throws 'Unexpected end of stream' - K4os.Compression.LZ4.Legacy 1.2.8-beta
                    using (var target = new MemoryStream())
                    {
                        using (var source = LZ4Legacy.Decode(new MemoryStream(array)))
                        {
                            source.CopyTo(target);
                        }
                        result = target.ToArray();
                    }
                    break;
                case CompressionType.LZMA:
                    break;
                default:
                    result = null;
                    break;
            }

To reproduce
Steps to reproduce the behavior:

            const string testString = "This is a test.";
            var stringBytes = Encoding.ASCII.GetBytes(testString);
            var lz4NetBytes = LZ4Codec.Encode(stringBytes, 0, stringBytes.Length); // Using lz4net to compress the string
            byte[] result;
            using (var target = new MemoryStream())
            {
                using (var source = LZ4Legacy.Decode(new MemoryStream(lz4NetBytes)))
                {
                    source.CopyTo(target);
                }
                result = target.ToArray();
            }
            Console.WriteLine(result.SequenceEqual(stringBytes)); // round-trip result should equal the original bytes

Expected behavior
Data returned from the LZ4Legacy.Decode stream should be copied to the target MemoryStream.

Actual behavior
An Unexpected end of stream exception occurs when trying to use the CopyTo function to a MemoryStream.

Unhandled Exception: System.IO.EndOfStreamException: Unexpected end of stream
   at K4os.Compression.LZ4.Legacy.LZ4Stream.AcquireNextChunk()
   at K4os.Compression.LZ4.Legacy.LZ4Stream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.Stream.InternalCopyTo(Stream destination, Int32 bufferSize)
   at System.IO.Stream.CopyTo(Stream destination)
   at UBPU.Program.Main(String[] args)

Environment

  • CPU: Intel, 64-bit
  • OS: Windows 10
  • .NET: .NET Framework 4.7.2
  • LZ4: 1.2.6 & 1.2.8-beta
  • lz4net: 1.0.15.93

Additional context
Feel free to ask me any questions if you need any further context.
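A speculative note and sketch, clearly hedged: lz4net's three-argument LZ4Codec.Encode produces a raw LZ4 block with no stream framing, while LZ4Legacy.Decode expects lz4net's chunked LZ4Stream format, which would explain "Unexpected end of stream". If that diagnosis is right and the original length is known, decoding as a raw block may work (Unblock is a hypothetical helper):

    using K4os.Compression.LZ4;

    // 'compressed' holds the output of lz4net's block-mode LZ4Codec.Encode;
    // block mode does not store the original length, so it must be known.
    static byte[] Unblock(byte[] compressed, int originalLength)
    {
        var decoded = new byte[originalLength];
        LZ4Codec.Decode(compressed, 0, compressed.Length, decoded, 0, decoded.Length);
        return decoded;
    }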

Unaligned memory access on ARM/Unity/IL2CPP

It seems unaligned memory access is still an issue for Unity.
My testing on a Raspberry Pi gave me a false sense of security: the .NET Core runtime (which I used for testing) handles unaligned access properly, while IL2CPP translates the access "as is", and it simply ends with a segmentation fault.

The easiest way to handle unaligned access is byte-aligned, byte-by-byte access with shifting, but it hurts performance a lot. Since it is the easiest thing to do, it will be my first approach, to unblock people affected by this issue, and I will try to fine-tune it later (a sketch follows below).
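An illustrative sketch of that byte-by-byte fallback (the names are mine, not the library's): compose a 32-bit value from single-byte loads and shifts, so the CPU never issues a 4-byte load at an unaligned address.

    static class Mem
    {
        // Safe on armv7 because every load is a single byte.
        public static unsafe uint PeekU32ByteByByte(byte* p) =>
            (uint)(p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24));
    }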

Thanks for reporting this issue. It looks like the code here performs an unaligned memory access, which is not allowed on armv7, hence the crash. Mono works in this case because it generates code that is less efficient than IL2CPP and performs only aligned memory accesses. With IL2CPP, we have chosen to convert the C# code as-is, so the generated code will perform unaligned accesses if the C# code does.

Compression is 0%

Hi, is my code right?

    var source = File.ReadAllBytes(@"Example.dll");
    byte[] output_compressed = new byte[source.Length];

    LZ4Codec.Encode(source, 0, source.Length, output_compressed, 0, source.Length, LZ4Level.L12_MAX);
    File.WriteAllBytes("Compressed.dll", output_compressed);

    Console.WriteLine(" DONE !");
    Console.ReadKey();

The file is 5 MB; with 7-Zip it compresses to 980 KB and with WinRAR to 1 MB, but with LZ4 it stays at 5 MB!

What am I doing wrong?
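The likely culprit, hedged rather than confirmed: Encode returns the number of compressed bytes it wrote, and writing out the whole target buffer (sized at source.Length) keeps the file at its original size. A sketch of the fix:

    var source = File.ReadAllBytes(@"Example.dll");
    var target = new byte[LZ4Codec.MaximumOutputSize(source.Length)];
    var encodedLength = LZ4Codec.Encode(
        source, 0, source.Length, target, 0, target.Length, LZ4Level.L12_MAX);

    // Persist only the bytes the encoder actually produced.
    var compressed = new byte[encodedLength];
    Array.Copy(target, compressed, encodedLength);
    File.WriteAllBytes("Compressed.dll", compressed);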

Can support net45?

I already have an app that runs on net45 and my client won't upgrade to net46. Can you support net45?

Thanks.
