
aircompressor's Introduction

Airlift


Airlift is a framework for building REST services in Java.

This project is used as the foundation for distributed systems like Trino (formerly PrestoSQL).

Airlift pulls together stable, mature libraries from the Java ecosystem into a simple, lightweight package that lets you focus on getting things done. It includes built-in support for configuration, metrics, logging, dependency injection, and much more, enabling you and your team to ship a production-quality web service in the shortest time possible.

Airlift takes best-of-breed libraries from the Java ecosystem and glues them together based on years of experience building high-performance Java services, without getting in your way and without forcing you into a large, proprietary framework.

  • Getting Started
  • Reference
  • Recipes

aircompressor's People

Contributors

algirdasrascius, arnej27959, bholdt, chirino, dain, dependabot[bot], electrum, findepi, fokko, highker, luohao, mariusvolkhart, martint, mszabo-wikia, sineaggi, stevenschlansker


aircompressor's Issues

Streaming compression/decompression for Zstd

Hi. Is there support for Zstd streaming compression/decompression in this library (like in native zstd's streaming_compression/streaming_decompression examples)?

If not, can you mark this as an enhancement request please?
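For reference, here is a rough sketch of what framed streaming usage could look like with the ZstdOutputStream and ZstdInputStream classes that show up elsewhere in this issue list. Whether these cover the native streaming_compression/streaming_decompression use cases is exactly what this issue asks, so treat the snippet as an assumption rather than an answer:

import io.airlift.compress.zstd.ZstdInputStream;
import io.airlift.compress.zstd.ZstdOutputStream;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class ZstdStreamingSketch
{
    static byte[] compress(byte[] data) throws IOException
    {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // try-with-resources closes the stream, which finishes the zstd frame
        try (ZstdOutputStream zstd = new ZstdOutputStream(sink)) {
            zstd.write(data);
        }
        return sink.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws IOException
    {
        try (ZstdInputStream zstd = new ZstdInputStream(new ByteArrayInputStream(compressed))) {
            return zstd.readAllBytes();
        }
    }
}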

Decompression: How to determine the size of the output buffer?

I'm trying out this library for snappy compression and decompression. Currently I'm using the Xerial Snappy library. This may be a very dumb question, but I can't seem to figure this out:

How do we know the required size of the output buffer?

public byte[] decompress(byte[] compressed) {
    var decompressor = new SnappyDecompressor();
    var outputBuffer = ByteBuffer.allocate(?????); // <-- what size should this be?
    decompressor.decompress(ByteBuffer.wrap(compressed), outputBuffer);
    return outputBuffer.array();
}

ByteBuffers by design cannot resize dynamically. And I don't know anything about the input array, except that it has been previously compressed by snappy. So... how should I size the output buffer?
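For what it's worth, a minimal sketch of one approach, assuming the array-based Decompressor API and the static SnappyDecompressor.getUncompressedLength helper that appears in another issue below (Snappy stores the uncompressed length in its stream header):

import io.airlift.compress.snappy.SnappyDecompressor;

public static byte[] decompress(byte[] compressed)
{
    SnappyDecompressor decompressor = new SnappyDecompressor();
    // Snappy records the uncompressed length in the stream header
    int uncompressedLength = SnappyDecompressor.getUncompressedLength(compressed, 0);
    byte[] output = new byte[uncompressedLength];
    decompressor.decompress(compressed, 0, compressed.length, output, 0, output.length);
    return output;
}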

.git directory could not be found

I'm getting this error when I try to build:
Failed to execute goal pl.project13.maven:git-commit-id-plugin:2.1.13:revision (default) on project aircompressor: .git directory could not be found! Please specify a valid [dotGitDirectory] in your pom.xml

ZstdOutputStream writes empty stream

  • io.airlift.aircompressor v0.24
  • JDK 11.0.16.1 (Amazon Corretto)
  • macOS 12.5

When I write bytes through ZstdOutputStream, it emits none of the compressed bytes (or any bytes at all). Analogous code with Snappy works fine. Compressing bytes via new ZstdCompressor() and then calling the compress() method also works fine.

Code example:

byte[] someTextBytes = new byte[]{90,115,116,100,79,117,116,112,117,116,83,116,114,101,97,109,32,
                    105,115,32,110,111,116,32,119,111,114,107,105,110,103,44,32,98,117,116,32,83,110,97,112,
                    112,121,70,114,97,109,101,100,79,117,116,112,117,116,83,116,114,101,97,109,32,105,115,
                    32,119,111,114,107,105,110,103,46,90,115,116,100,79,117,116,112,117,116,83,116,114,101,97,109,32,
                    105,115,32,110,111,116,32,119,111,114,107,105,110,103,44,32,98,117,116,32,83,110,97,112,
                    112,121,70,114,97,109,101,100,79,117,116,112,117,116,83,116,114,101,97,109,32,105,115,
                    32,119,111,114,107,105,110,103,46};

System.out.println("\n\nUnCompressed bytes to SnappyCompressOut -> System.out:");
System.out.println("----------------------------------------------------------");
SnappyFramedOutputStream snappyOS = new SnappyFramedOutputStream(System.out);
snappyOS.write(someTextBytes);
snappyOS.flush();
System.out.println("\n----------------------------------------------------------");
System.out.println("UnCompressed bytes to ZstdCompressOut -> System.out:");
System.out.println("----------------------------------------------------------");
ZstdOutputStream zstdOS = new ZstdOutputStream(System.out);
zstdOS.write(someTextBytes);
zstdOS.flush();
System.out.println("\n----------------------------------------------------------");

Result:

UnCompressed bytes to SnappyCompressOut -> System.out:
----------------------------------------------------------
)�.�I��I
----------------------------------------------------------
UnCompressed bytes to ZstdCompressOut -> System.out:
----------------------------------------------------------

----------------------------------------------------------

I tried a file output stream, and also tried wrapping various output streams in BufferedOutputStream, just to see where the issue lies. It looks like the issue lies in the private method ZstdOutputStream.compressIfNecessary(), where it decides not to call writeChunk().
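Not a confirmed diagnosis, but one thing worth ruling out: a zstd frame is typically only finished when the stream is closed, so flushing alone may legitimately emit nothing. A minimal variation of the snippet above that closes the stream (note this also closes the underlying System.out):

ZstdOutputStream zstdOS = new ZstdOutputStream(System.out);
zstdOS.write(someTextBytes);
zstdOS.close(); // closing, rather than only flushing, should finish and emit the frame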

Version 0.19 can't work with heap buffer on JDK1.8

To reproduce:

ByteBuf source = ByteBufAllocator.DEFAULT.heapBuffer(1024);
byte[] bytes = new byte[1024];
source.writeBytes(bytes); 
ZstdCompressor compressor = new ZstdCompressor();
int maxLength = compressor.maxCompressedLength(source.readableBytes());
ByteBuf target = ByteBufAllocator.DEFAULT.heapBuffer(10240);
compressor.compress(source.nioBuffer(), target.nioBuffer(0, maxLength));

Output:

[java launch command with IntelliJ debug agent and full Maven classpath omitted; the classpath includes io/airlift/aircompressor/0.19/aircompressor-0.19.jar, running io.streamnative.test.AirCompressorTest on JDK 1.8.0_281]
Connected to the target VM, address: '127.0.0.1:51570', transport: 'socket'
Exception in thread "main" java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
	at io.airlift.compress.zstd.ZstdCompressor.compress(ZstdCompressor.java:101)
	at io.streamnative.test.AirCompressorTest.main(AirCompressorTest.java:19)

Context

JDK: 1.8.0_281
Aircompressor: 0.19
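For context, an assumption on my part rather than anything stated in the issue: this kind of NoSuchMethodError usually means the library was compiled against JDK 9+ without --release 8, so calls such as position(int) were linked against the covariant ByteBuffer override that does not exist on JDK 8. Code that has to stay JDK 8 compatible commonly casts to Buffer first, roughly like this:

import java.nio.Buffer;
import java.nio.ByteBuffer;

ByteBuffer buffer = ByteBuffer.allocate(16);
// Casting to Buffer links against Buffer.position(int), which exists on JDK 8,
// instead of the JDK 9+ covariant ByteBuffer.position(int) override.
((Buffer) buffer).position(0);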

ZstdCompressor OutputStream and InputStream

The zstd-jni and Apache Commons libraries offer OutputStream and InputStream classes for zstd compression. This is similar to the enhancement request brought up in 2020 in issue 112. Would this project welcome this contribution now, or have the authors already implemented this feature but not yet released it? If it's the former, I am happy to contribute this functionality to the project.

Thoughts on using ByteBuffer instead of UNSAFE in zstd?

Hi,
Thanks for the tool!

I'm wondering if there are any pitfalls or other considerations in trying to implement the same thing, but using standard ByteBuffers instead of the UNSAFE.* family of functions? It seems like a straightforward change, but maybe I'm missing something.

Pros would be:

  1. No dependency on sun.misc.*, which currently prevents compiling with --release for modern JDK versions (sun.misc is unavailable).
  2. Potentially easier on AOT compilation.
  3. In some sense it would be more "pure java".
  4. Would support sliced heap byte buffers (with .arrayOffset > 0).

The code of course supports ByteBuffers, but only with a zero arrayOffset, and internally it just uses them as a byte[].
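As a rough illustration of the proposal (my sketch, not how the library currently works), the little-endian primitive reads that now go through UNSAFE could be expressed with absolute ByteBuffer accessors, which also handle sliced heap buffers with a non-zero arrayOffset:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Absolute, little-endian reads without sun.misc.Unsafe; note that order()
// mutates the buffer's byte-order setting and returns the same buffer.
static int readInt(ByteBuffer buffer, int index)
{
    return buffer.order(ByteOrder.LITTLE_ENDIAN).getInt(index);
}

static long readLong(ByteBuffer buffer, int index)
{
    return buffer.order(ByteOrder.LITTLE_ENDIAN).getLong(index);
}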

JNI error

I tried running:
java -cp aircompressor-0.12-SNAPSHOT-tests.jar io.airlift.compress.benchmark.CompressionBenchmark

Got the following error:
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/openjdk/jmh/runner/options/Options
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:522)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:504)
Caused by: java.lang.ClassNotFoundException: org.openjdk.jmh.runner.options.Options
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

ZstdCompressor.maxCompressedLength is unnecessarily pessimistic

ZstdCompressor.maxCompressedLength isn't an accurate translation of the ZSTD_COMPRESSBOUND macro due to some missing and misplaced parentheses. I believe the following diff would be more accurate. Not sending a PR because I haven't looked into the tests.

diff --git a/src/main/java/io/airlift/compress/zstd/ZstdCompressor.java b/src/main/java/io/airlift/compress/zstd/ZstdCompressor.java
index 23ace52..b714d63 100644
--- a/src/main/java/io/airlift/compress/zstd/ZstdCompressor.java
+++ b/src/main/java/io/airlift/compress/zstd/ZstdCompressor.java
@@ -30,7 +30,7 @@ public class ZstdCompressor
         int result = uncompressedSize + (uncompressedSize >>> 8);
 
         if (uncompressedSize < MAX_BLOCK_SIZE) {
-            result += MAX_BLOCK_SIZE - (uncompressedSize >>> 11);
+            result += ((MAX_BLOCK_SIZE - uncompressedSize) >>> 11);
         }
 
         return result;
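For a concrete sense of how pessimistic the current expression is (assuming MAX_BLOCK_SIZE is the usual zstd block limit of 128 KiB, i.e. 131072): for uncompressedSize = 1024 the current code returns 1024 + (1024 >>> 8) + (131072 - (1024 >>> 11)) = 1024 + 4 + 131072 = 132100, while the corrected expression returns 1024 + 4 + ((131072 - 1024) >>> 11) = 1024 + 4 + 63 = 1091, which matches ZSTD_COMPRESSBOUND.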

Snappy Decompress throws MalformedInputException

my code:

    private final static Compressor compressor = new SnappyCompressor();
    private final static Decompressor decompressor = new SnappyDecompressor();

    public byte[] compress(String s) {
        ByteBuffer byteBuffer = ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
        int length = compressor.maxCompressedLength(s.getBytes(StandardCharsets.UTF_8).length);
        ByteBuffer out = ByteBuffer.allocate(length);
        compressor.compress(byteBuffer, out);
        return out.array();
    }

    public String decompress(byte[] s) {
        ByteBuffer in = ByteBuffer.wrap(s);
        int length = SnappyDecompressor.getUncompressedLength(s, 0);
        ByteBuffer out = ByteBuffer.allocate(length);
        decompressor.decompress(in, out);
        return new String(out.array());
    }


 public static void main(String[] args) {
        String str = "123";
        Logger log = LoggerFactory.getLogger("compress");
        log.info("char length={},byte array length={}", str.length(), str.getBytes(StandardCharsets.UTF_8).length);
        MessageCompresser compresser = new LzwMessageCompresser();
        byte[] compressedBytes = compresser.compress(str);
        String value = new String(compressedBytes);
        log.info("compressed:char length={},byte array length={}", value.length(), value.getBytes(StandardCharsets.UTF_8).length);
        log.info("value={}", value);
        String strs = compresser.decompress(compressedBytes);
        log.info("restore:{}", strs);
    }

Why is MalformedInputException thrown when decompressing the compressed bytes?
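Not a confirmed diagnosis, but one likely pitfall in the snippet above: out.array() returns the whole backing array sized by maxCompressedLength, including unused trailing bytes, so the decompressor later sees garbage after the real compressed data. A sketch of the compress side that keeps only the bytes actually written, assuming the array-based Compressor API whose int return value is the compressed length:

import io.airlift.compress.snappy.SnappyCompressor;

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public static byte[] compress(String s)
{
    SnappyCompressor compressor = new SnappyCompressor();
    byte[] input = s.getBytes(StandardCharsets.UTF_8);
    byte[] output = new byte[compressor.maxCompressedLength(input.length)];
    int compressedLength = compressor.compress(input, 0, input.length, output, 0, output.length);
    // Truncate to the bytes actually produced instead of returning the whole scratch array
    return Arrays.copyOf(output, compressedLength);
}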

Hive fails opening split for zst compressed files

This issue is copied from trinodb/trino#17792 since I believe this repo is where zstd de/compression is handled.

I have a Hive table built on top of zst compressed data. On a Trino 419 cluster I get the following error when trying to read this table from Trino. That version of Trino uses aircompressor 0.23.

Query 20230607_172621_00003_5hpz7 failed: Error opening Hive split s3://path/to/file.csv.access.log.zst (offset=0, length=1544108): Window size too large (not yet supported): offset=3084

We are currently running Trino 405 and this query executes without an issue. We have also been running previous versions of Trino/Presto, and this executed without an issue in the past. That version of Trino uses aircompressor 0.21.

Did something change between aircompressor 0.21 and 0.24 that might have caused this? And is there anything I can do to get past this error? Thanks in advance for your help!

Full stack trace

io.trino.spi.TrinoException: Error opening Hive split s3://path/to/file.csv.access.log.zst (offset=0, length=1544108): Window size too large (not yet supported): offset=3084
at io.trino.plugin.hive.line.LinePageSourceFactory.createPageSource(LinePageSourceFactory.java:179)
at io.trino.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:218)
at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:156)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:48)
at io.trino.split.PageSourceManager.createPageSource(PageSourceManager.java:61)
at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:298)
at io.trino.operator.Driver.processInternal(Driver.java:402)
at io.trino.operator.Driver.lambda$process$8(Driver.java:305)
at io.trino.operator.Driver.tryWithLock(Driver.java:701)
at io.trino.operator.Driver.process(Driver.java:297)
at io.trino.operator.Driver.processForDuration(Driver.java:268)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:888)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:187)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:556)
at io.trino.$gen.Trino_18f7842____20230607_162211_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: io.airlift.compress.MalformedInputException: Window size too large (not yet supported): offset=3084
at io.airlift.compress.zstd.Util.verify(Util.java:45)
at io.airlift.compress.zstd.ZstdFrameDecompressor.decodeCompressedBlock(ZstdFrameDecompressor.java:303)
at io.airlift.compress.zstd.ZstdIncrementalFrameDecompressor.partialDecompress(ZstdIncrementalFrameDecompressor.java:236)
at io.airlift.compress.zstd.ZstdInputStream.read(ZstdInputStream.java:89)
at io.airlift.compress.zstd.ZstdHadoopInputStream.read(ZstdHadoopInputStream.java:53)
at com.google.common.io.CountingInputStream.read(CountingInputStream.java:64)
at java.base/java.io.InputStream.readNBytes(InputStream.java:506)
at io.trino.hive.formats.line.text.TextLineReader.fillBuffer(TextLineReader.java:248)
at io.trino.hive.formats.line.text.TextLineReader.<init>(TextLineReader.java:67)
at io.trino.hive.formats.line.text.TextLineReaderFactory.createLineReader(TextLineReaderFactory.java:77)
at io.trino.plugin.hive.line.LinePageSourceFactory.createPageSource(LinePageSourceFactory.java:171)
... 17 more

How should ZstdDecompressor.getDecompressedSize work?

Hi,

I am trying to port ZstdDecompressor to C# and I am having a hard time understanding how ZstdDecompressor.getDecompressedSize should work.

If I modify testDecompressWithOutputPaddingAndChecksum() to the following

public void testDecompressWithOutputPaddingAndChecksum()
            throws IOException
    {
        int padding = 1021;

        byte[] compressed = Resources.toByteArray(getClass().getClassLoader().getResource("data/zstd/with-checksum.zst"));
        byte[] uncompressed = Resources.toByteArray(getClass().getClassLoader().getResource("data/zstd/with-checksum"));

        byte[] output = new byte[uncompressed.length + padding * 2]; // pre + post padding
        int decompressedSize = getDecompressor().decompress(compressed, 0, compressed.length, output, padding, output.length);

        long decompressedSize2 = ZstdDecompressor.getDecompressedSize(compressed, 0, compressed.length);

        assertEquals(decompressedSize2, 11359, "Should be equal");
        assertByteArraysEqual(uncompressed, 0, uncompressed.length, output, padding, decompressedSize);
    }

then decompressedSize2 has the value -1 and the test fails. Am I missing something here?

HadoopLzoCompressor vs LzoCompressor?

We are trying to add LzoCodec to Apache Hadoop based on the implementation of aircompressor. apache/hadoop#2159

When we try to integrate it into Hadoop, we get a couple of test failures due to java.lang.UnsupportedOperationException: LZO block compressor is not supported. We found that this is because the LzoCodec in aircompressor has a static class HadoopLzoCompressor that returns a dummy implementation when getCompressor is called. Why don't we return LzoCompressor instead?

Set up OSS-Fuzz

What do you think about setting up OSS-Fuzz for this project?

Given that aircompressor uses sun.misc.Unsafe quite a lot, it is probably important that all of this usage is safe since otherwise the JVM could crash, or worse. OSS-Fuzz might be able to help find issues with that.

I assume aircompressor fulfills the requirements to be included into OSS-Fuzz, but that can also be clarified with the maintainers beforehand.

For Zstd it might be necessary to add a hook to disable checksum verification, otherwise fuzzing might not be that effective there, see jazzer documentation for some information. I don't have any experience with that yet.

Fuzzing with OSS-Fuzz / jazzer might not support detecting out-of-bounds Unsafe reads and writes yet though, see CodeIntelligenceTesting/jazzer#891.

Java 9 illegal reflective access warning

Warning on Java 9:
(probably known, since the class is called UnsafeUtil)

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by io.airlift.compress.snappy.UnsafeUtil (.../repository/io/airlift/aircompressor/0.9/aircompressor-0.9.jar) to field java.nio.Buffer.address
WARNING: Please consider reporting this to the maintainers of io.airlift.compress.snappy.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

Zstd compressionLevel

Currently the Zstd compressionLevel is hard-coded. Can you add a method to modify the compression level? Thanks.

Active development? Enhance ZstdFrameDecoder for max awesomeness

Currently the ZstdFrameDecompressor has a lot of the functionality required for someone to build a streaming decoder (I'm thinking of integrating it into a Netty channel). However, it could use a little refactoring to give visibility into the block header data, and, critically, https://github.com/airlift/aircompressor/blob/master/src/main/java/io/airlift/compress/zstd/ZstdFrameDecompressor.java is not a public class!

If this library is meant to be actively maintained, I would be happy to contribute, but won't if it's unlikely to get merged or be supported.

SIGBUS on SunOS pm1 5.10 Generic_150400-59 sun4v sparc sun4v

This happened with aircompressor-0.9.
Is aircompressor not supported on 64-bit SPARC platforms?

Here is the native stack:

----------------- lwp# 2 / thread# 2 --------------------
ffffffff7e7dccbc _lwp_kill (6, 0, ffffffff7e949968, ffffffffffffffff, ffffffff7e93e000, 0) + 8
ffffffff7e74c250 abort (1, 1d8, ffffffff7db9b1cc, 1f1ebc, 0, 0) + 118
ffffffff7db84a48 _1cCosFabort6Fb_v (1, 1, 4b318, ffffffff7e0dda20, 55902c, 4b000) + 58
ffffffff7ddfb2d4 _1cHVMErrorOreport_and_die6M_v (1, ffffffff7e1eeb35, 100110800, ffffffff7db98100, ffffffff7e2d5a80, ffffffff7e2751c0) + 10ac
ffffffff7db9574c JVM_handle_solaris_signal (a, ffffffff7affa2a0, ffffffff7dd7cff0, ffffffff7aff9b30, ffffffffffaaf928, ffffffff7aff9fc0) + c0c
ffffffff7db8d364 signalHandler (a, ffffffff7affa2a0, ffffffff7aff9fc0, ffffffff7e94ec38, 1001042d0, 2) + 1c
ffffffff7e7d8d6c __sighndlr (a, ffffffff7affa2a0, ffffffff7aff9fc0, ffffffff7db8d348, 0, 9) + c
ffffffff7e7cc8d4 call_user_handler (ffffffff7e600a00, ffffffff7e600a00, ffffffff7aff9fc0, c, 0, 0) + 3e0
ffffffff7e7ccae0 sigacthandler (0, ffffffff7affa2a0, ffffffff7aff9fc0, ffffffff7e600a00, 0, ffffffff7e93e000) + 68
--- called from signal handler with signal 10 (SIGBUS) ---
ffffffff7dd7cff0 Unsafe_GetInt (ffffffff7affc5e8, ffffffff7e26f240, ffffffff7affa510, 11, 100110800, 191800) + 174
ffffffff6bb450a0 * *sun/misc/Unsafe.getInt(Ljava/lang/Object;J)I [compiled]
ffffffff6b007b18 * io/airlift/compress/snappy/SnappyRawCompressor.compress(Ljava/lang/Object;JJLjava/lang/Object;JJ[S)I+462
ffffffff6b007b18 * io/airlift/compress/snappy/SnappyCompressor.compress([BII[BII)I+57
ffffffff6b007b18 * io/airlift/compress/snappy/SnappyFramedOutputStream.writeCompressed([BII)V+59
ffffffff6b008068 * io/airlift/compress/snappy/SnappyFramedOutputStream.flushBuffer()V+34
ffffffff6b008068 * io/airlift/compress/snappy/SnappyFramedOutputStream.flush()V+32

Android

Is it meant to work on Android devices?

I'm getting

java.lang.NoSuchFieldError: No static field ARRAY_BYTE_BASE_OFFSET of type I in class Lsun/misc/Unsafe; or its superclasses (declaration of 'sun.misc.Unsafe' appears in /apex/com.android.art/javalib/core-oj.jar)

Lz4 double buffering

Hello, first of all thanks a lot for your library. I have already tested it with Snappy, Zstd, and LZ4, and indeed it's much faster. I read that LZ4 compression with double buffering is much more efficient; do you know if it's supported? I didn't try to implement it yet, it's just a genuine question before implementing it.

API to compress to multiple buffers

The current API requires that the output of compress be put into a scratch buffer and copied over afterwards. It would be great if we could either provide a second buffer to use if needed, or an allocator to create a new buffer. So something like:

void compress(ByteBuffer input, ByteBuffer output, ByteBuffer overflow);

where the compressed bytes get put into output until it is full and then put into overflow. Instead of requiring that output is large enough, it would require that output + overflow is large enough.

Support Zstd seekable format

Hi there,
I was wondering if there are any plans to support the additional functionality that the original zstd library has in /contrib (https://github.com/facebook/zstd/tree/dev/contrib/seekable_format).

It allows for random read access to files without the need to decompress the entire file. I am not familiar with C nor with the zstd codebase, but if someone would be willing to give me a bit of guidance I am happy to try and implement it myself.

Thanks

LZO not compatible with Hadoop LZO

Comment from @nezihyigitbasi

Stepping through the Hadoop LZO implementation (which implements LZO1X) I noticed something different from aircompressor's LZO (I don't really know whether aircompressor implements the same algorithm). Hadoop LZO at the beginning reads two integers from the input stream (4 bytes for the original block size + 4 bytes for the compressed chunk length), then the rest of the stream is interpreted as compressed data, and it succeeds. I did the same and consumed 8 bytes before passing the data to aircompressor's LZO decompressor, and then most of the test cases passed (there were still failures).

Anyway, here is the minimal code that shows how the Airlift LZO decompressor fails while Hadoop's LZO decompressor succeeds with the same input.
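(The minimal code referenced above is not included in this dump.) Purely as an illustration of the header-skipping workaround described in the comment, here is a sketch with hypothetical names, assuming the two header integers are big-endian as is usual for Hadoop and that the array-based LzoDecompressor API is available:

import io.airlift.compress.lzo.LzoDecompressor;

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

static byte[] decompressHadoopLzoBlock(byte[] block) throws IOException
{
    DataInputStream header = new DataInputStream(new ByteArrayInputStream(block));
    int uncompressedLength = header.readInt(); // 4 bytes: original block size
    int compressedLength = header.readInt();   // 4 bytes: compressed chunk length
    byte[] output = new byte[uncompressedLength];
    // Skip the 8 header bytes and hand only the compressed chunk to the decompressor
    new LzoDecompressor().decompress(block, 8, compressedLength, output, 0, output.length);
    return output;
}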

Recent test on the compressor comparison

Hi all, yesterday I ran a compressor comparison through the benchmark tool provided in our test code. The table below shows part of the result; a more detailed log can be found at the end of this comment.

comparison ratio                 compress   decompress
airlift-snappy/xerial-snappy     2.294      0.706
airlift-lz4/jpountz-lz4          0.86       1.07
airlift-lzo/hadoop-lzo           1.92       2.6
airlift-zstd/luben-zstd          0.998      1.08

Any comments are welcome.

The full table is here: compress_log.xlsx

The full runlog of the benchmark is here: compressor.log

ZStd JNI vs Aircompressor pure java performance question

OpenSearch is evaluating the pure Java implementation of Zstd using aircompressor. I stumbled on a thoughtful comment here which certainly aligns with the reasons we avoid JNI code in "top level" modules or plugins in OpenSearch core (we have similar reasons in Lucene). We realize that comment is now five years old, and so @reta ran benchmarks that seem to show significant performance differences between the pure Java implementation (HotSpot has gotten better, of course) and JNI. Do these numbers look valid to folks on this project? What are the potential pitfalls in running those benchmarks that we need to be aware of? Are there certain config conditions that should be followed to squeeze out better performance?

Thanks in advance for any assistance that can be provided.

Big Endian support missing in snappy

The current codebase does UNSAFE.getXXX and UNSAFE.putXXX for the short, int, and long datatypes.
The subsequent code then only works on LITTLE_ENDIAN machines.

You need to conditionally call reverseBytes() in these methods for this to work on BIG_ENDIAN machines,
i.e. translate to LITTLE_ENDIAN on getInt etc., and translate from LITTLE_ENDIAN on putInt etc.

I posted the solution to the identical issue in the snappy-java project, which you should ideally duplicate:

xerial/snappy-java#254
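A rough sketch of the conditional byte-swapping being suggested (names like UNSAFE and ARRAY_BYTE_BASE_OFFSET stand in for the library's existing sun.misc.Unsafe plumbing; this is illustrative, not the project's actual code):

import java.nio.ByteOrder;

private static final boolean BIG_ENDIAN = ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;

// Read a little-endian int regardless of the machine's native byte order
static int getIntLittleEndian(byte[] data, long offset)
{
    int value = UNSAFE.getInt(data, ARRAY_BYTE_BASE_OFFSET + offset);
    return BIG_ENDIAN ? Integer.reverseBytes(value) : value;
}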

LzopCodec (.lzo) failure: "Unsupported LZO flags 50331649"

What I did:

  1. Take Hive from Presto product tests (presto-product-tests/conf/docker/singlenode/compose.sh up, i.e. hdp2.6-hive)
  2. yum install -y lzop
  3. create a text file with abc\n and compress it with lzop -o output.lzo inputfile
  4. create a table with format = 'TEXTFILE' in Presto, add the file to it

Observed

  • Hive can read the file if I add .m2/repository/org/anarres/lzo/lzo-core/1.0.5/lzo-core-1.0.5.jar and .m2/repository/org/anarres/lzo/lzo-hadoop/1.0.5/lzo-hadoop-1.0.5.jar to the classpath and enable the codec in site.xml
  • Presto fails with:
Caused by: java.io.IOException: Unsupported LZO flags 50331649
	at io.airlift.compress.lzo.HadoopLzopInputStream.<init>(HadoopLzopInputStream.java:93)
	at io.airlift.compress.lzo.LzopCodec.createInputStream(LzopCodec.java:91)
	at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:122)
	at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
	at io.prestosql.plugin.hive.HiveUtil.createRecordReader(HiveUtil.java:220)
	... 20 more

Remove dependency on sun.misc.Unsafe?

As I understand it, sun.misc.Unsafe is removed in JDK 17 and later. I wonder if there are any plans to remove the use of this class in aircompressor, allowing it to be used with the latest JDKs as well?

Lz4Codec fails on silesia/x-ray

The Hadoop version of Lz4Codec can not decode silesia/x-ray, compressed by the Airlift Lz4Codec. This appears to be a problem with the native Lz4 code in Hadoop, but it could be a problem with the Airlift Lz4 block compressor, or the Airlift implementation of the Hadoop block stream encoding.

To reproduce, remove the check in TestLz4Codec.testCompress in #34.

Missing documentation

As far as I can tell, this project has practically zero documentation, and I have no idea how I would go about using it, or even whether it is usable at all, or a work in progress, a proof of concept, or some company's internal code. Am I missing something, or am I looking in the wrong place? If there is documentation, a link should be placed on the front page / README.md.

Implement Hadoop CompressionInputStream resetState

Hi,

Please remove the exception below. It is OK to reset state that does not exist :)
Otherwise your SnappyCodec will not work with Hadoop's SequenceFile implementation.

java.lang.UnsupportedOperationException: resetState not supported for Snappy
	at io.airlift.compress.snappy.HadoopSnappyInputStream.resetState(HadoopSnappyInputStream.java:81)
	at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:2134)
	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2217)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:78)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
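A minimal sketch of what this issue is asking for (not the project's actual code): letting resetState() in HadoopSnappyInputStream be a no-op instead of throwing:

@Override
public void resetState()
{
    // No internal state needs to be discarded here; a no-op keeps
    // Hadoop's SequenceFile.Reader.seekToCurrentValue() working.
}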

The result is different

Why is my data compressed in Java different from the data compressed by the LZO library implemented in ANSI C, used via JNI on the mobile side?

Presto 0.157.1 + Lzop: NullPointerException

Hey guys,

So I've been successfully using your library with EMR (emr-5.3.1) & Hive (2.1.1) with LZOP_X1 (no constraints) and now moving to Presto (0.157.1) I get the following stack trace:

com.facebook.presto.spi.PrestoException: java.lang.reflect.InvocationTargetException
	at com.facebook.presto.hive.HiveSplitSource.propagatePrestoException(HiveSplitSource.java:137)
	at com.facebook.presto.hive.HiveSplitSource.isFinished(HiveSplitSource.java:115)
	at com.facebook.presto.split.ConnectorAwareSplitSource.isFinished(ConnectorAwareSplitSource.java:63)
	at com.facebook.presto.split.BufferingSplitSource.fetchSplits(BufferingSplitSource.java:59)
	at com.facebook.presto.split.BufferingSplitSource.lambda$fetchSplits$1(BufferingSplitSource.java:65)
	at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580)
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160)
	at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:276)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:246)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:78)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:179)
	at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:45)
	at com.facebook.presto.hive.util.ResumableTasks.lambda$submit$1(ResumableTasks.java:33)
	... 4 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:273)
	... 9 more
Caused by: java.lang.NullPointerException
	at com.hadoop.mapred.DeprecatedLzoTextInputFormat.isSplitable(DeprecatedLzoTextInputFormat.java:101)
	... 14 more

Now, the query I'm getting this exception with works well in Hive. It's basically:
select * from table limit 10;

I've added an .lzo.index next to my lzop file in S3, but to no avail.

As far as I can tell, DeprecatedLzoTextInputFormat.class has a member called indexes which, if not populated correctly, gets an NPE here: https://github.com/twitter/hadoop-lzo/blob/master/src/main/java/com/hadoop/mapred/DeprecatedLzoTextInputFormat.java#L101,
as no check is being made on the LzoIndex index.

Now, I presumed that with your library I could get past that check, but it seems like it's not working.
I'm using aircompressor-0.9.jar. I've copied it to /usr/lib/presto/plugin/hive-hadoop2 and removed any older version that was in there.

I am confident that your code is actually being called (from the stack trace, and from the many tests I've done with and without the aircompressor jar).

So for my question: did you guys ever manage to resolve this?

Relevant EMR cluster configuration:

{
    "classification": "core-site",
    "properties": {
      "io.compression.codec.lzo.class": "io.airlift.compress.lzo.LzopCodec",
      "io.compression.codecs": "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,io.airlift.compress.lzo.LzoCodec,io.airlift.compress.lzo.LzopCodec"
    },
    "configurations": []
}

Thank you very much!

  • Idan

Does Zstd decompression support files compressed by a newer C version?

I was recently trying to use this library to decode a compressed HDF5 file. The compression side uses the native library version 1.4.5; when I tried to decode it, it reported "Input is corrupted: offset=2305". Then I switched to zstd-jni and it worked. I wonder if the later versions or some strategies (or levels) are not supported?

Got "Overflow detected" at presto-orc with zstd

I got "Overflow detected" at creating an ORC file through presto-orc from large input data so I could not easily reproduce. That might be because I'm using a bit old version of presto 317.

java.lang.IllegalStateException: Overflow detected
	at io.airlift.compress.zstd.Util.checkState(Util.java:59)
	at io.airlift.compress.zstd.BitOutputStream.close(BitOutputStream.java:85)
	at io.airlift.compress.zstd.HuffmanCompressor.compressSingleStream(HuffmanCompressor.java:130)
	at io.airlift.compress.zstd.HuffmanCompressor.compress4streams(HuffmanCompressor.java:75)
	at io.airlift.compress.zstd.ZstdFrameCompressor.encodeLiterals(ZstdFrameCompressor.java:333)
	at io.airlift.compress.zstd.ZstdFrameCompressor.compressBlock(ZstdFrameCompressor.java:224)
	at io.airlift.compress.zstd.ZstdFrameCompressor.compressFrame(ZstdFrameCompressor.java:172)
	at io.airlift.compress.zstd.ZstdFrameCompressor.compress(ZstdFrameCompressor.java:145)
	at io.airlift.compress.zstd.ZstdCompressor.compress(ZstdCompressor.java:45)
	at io.prestosql.orc.OrcOutputBuffer.writeChunkToOutputStream(OrcOutputBuffer.java:445)
	at io.prestosql.orc.OrcOutputBuffer.flushBufferToOutputStream(OrcOutputBuffer.java:425)
	at io.prestosql.orc.OrcOutputBuffer.close(OrcOutputBuffer.java:146)
	at io.prestosql.orc.stream.LongOutputStreamV2.close(LongOutputStreamV2.java:739)
	at io.prestosql.orc.writer.SliceDirectColumnWriter.close(SliceDirectColumnWriter.java:139)
	at io.prestosql.orc.writer.SliceDictionaryColumnWriter.close(SliceDictionaryColumnWriter.java:324)
	at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:407)
	at io.prestosql.orc.OrcWriter.bufferStripeData(OrcWriter.java:369)
	at io.prestosql.orc.OrcWriter.flushStripe(OrcWriter.java:331)
	at io.prestosql.orc.OrcWriter.close(OrcWriter.java:444)

But I have a question about these two lines, as I'm not familiar with zstd. If currentAddress < outputLimit is expected at line 85, then currentAddress = outputLimit looks confusing at line 73. Just raising an exception there might be less confusing.

https://github.com/airlift/aircompressor/blob/master/src/main/java/io/airlift/compress/zstd/BitOutputStream.java#L73

https://github.com/airlift/aircompressor/blob/master/src/main/java/io/airlift/compress/zstd/BitOutputStream.java#L85

Does not work with Java 16+?

Hi,

We are very happy users of your library, many thanks for this awesome piece of code.

We are however having initialization issues on machines using Java 16 and above, which seems to be caused by a change in java.nio.Buffer. A stack trace is below. Do you have an idea of how to circumvent this?

Many thanks for your help,

Marc

Exception in thread "ImportThread" java.lang.ExceptionInInitializerError
	at io.airlift.compress.zstd.ZstdFrameCompressor.writeMagic(ZstdFrameCompressor.java:57)
	at io.airlift.compress.zstd.ZstdFrameCompressor.compress(ZstdFrameCompressor.java:143)
	at io.airlift.compress.zstd.ZstdCompressor.compress(ZstdCompressor.java:45)
	...
        Suppressed: java.lang.NoClassDefFoundError: Could not initialize class io.airlift.compress.zstd.UnsafeUtil
		at io.airlift.compress.zstd.ZstdFrameCompressor.writeMagic(ZstdFrameCompressor.java:57)
		at io.airlift.compress.zstd.ZstdFrameCompressor.compress(ZstdFrameCompressor.java:143)
		at io.airlift.compress.zstd.ZstdCompressor.compress(ZstdCompressor.java:45)
		...
Caused by: io.airlift.compress.IncompatibleJvmException: Zstandard requires access to java.nio.Buffer raw address field
	at io.airlift.compress.zstd.UnsafeUtil.<clinit>(UnsafeUtil.java:53)
	...
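Not an official recommendation from the maintainers, but the usual workaround for JDK 16+ strong-encapsulation failures like this one is to re-open java.nio to the unnamed module when launching the JVM (your-app.jar below is a placeholder):

java --add-opens=java.base/java.nio=ALL-UNNAMED -jar your-app.jar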
