
yalzma

Yet another LZMA Python wrapper

This time with support for LZMA_SYNC_FLUSH 🎉

It works directly with liblzma.so via ctypes. No other dependencies are needed.

What is LZMA?

LZMA is a lossless compression algorithm, like the ones used by gzip or bzip2.

The default configuration (of both the Python lzma module and this library) uses the LZMA2 filter and the XZ container format. So you can save the compressed data directly to a file with the .xz suffix, and it will work with many other programs, utilities and systems, for example xzcat or xzgrep.
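The format claim can be checked with the standard library alone. `lzma.compress()` defaults to `lzma.FORMAT_XZ`, so its output begins with the six-byte XZ magic number `FD 37 7A 58 5A 00` and can be written straight to a `.xz` file. This sketch uses only the stdlib `lzma` module, not yalzma; the file name `hello.xz` is purely illustrative.

```python
import lzma
import os
import tempfile

# Every XZ stream starts with this 6-byte magic number.
XZ_MAGIC = b"\xfd7zXZ\x00"

data = b"Hello, World!\n"
compressed = lzma.compress(data)  # format defaults to lzma.FORMAT_XZ
assert compressed.startswith(XZ_MAGIC)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.xz")  # illustrative file name
    # A file saved like this can also be read by tools such as xzcat.
    with open(path, "wb") as f:
        f.write(compressed)
    # Read it back through the standard library to confirm the round trip.
    with lzma.open(path, "rb") as f:
        assert f.read() == data
```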

Wikipedia: Lempel–Ziv–Markov chain algorithm

A lot of software uses LZMA compression internally, for example many package managers.

But why? The Python standard library already contains the lzma module…

Yes, but it does not support the SYNC FLUSH operation.

There is LZMACompressor.flush(), but it does something different: it finishes the compression process and closes the compressor. It is not possible to compress more data after flush().
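The difference is easy to observe with the standard library. In the sketch below, `flush()` makes the data decodable only by ending the XZ stream, and any later `compress()` call is rejected. The `ValueError` shown is what CPython's implementation raises; the exact message is an implementation detail.

```python
import lzma

data = b"first line\n"
comp = lzma.LZMACompressor()  # FORMAT_XZ with an LZMA2 filter by default

# compress() may keep part of the input in internal buffers, so these bytes
# alone are not guaranteed to decode to all of `data`.
out = comp.compress(data)

# flush() ends the XZ stream, after which everything is decodable...
out += comp.flush()
assert lzma.decompress(out) == data

# ...but the compressor is finished and rejects any further input.
try:
    comp.compress(b"second line\n")
    rejected = False
except ValueError:  # CPython: "Compressor has been flushed"
    rejected = True
assert rejected
```

Continuing the stream after making its data readable is exactly what `flush()` cannot do, and what LZMA_SYNC_FLUSH is for.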

For some of my use cases I need to use "sync flush". The constant LZMA_SYNC_FLUSH does not even appear in the CPython source code.

What is LZMA_SYNC_FLUSH?

From lzma/base.h (by Lasse Collin, public domain):

        LZMA_SYNC_FLUSH = 1,
                /**<
                 * \brief       Make all the input available at output
                 *
                 * Normally the encoder introduces some latency.
                 * LZMA_SYNC_FLUSH forces all the buffered data to be
                 * available at output without resetting the internal
                 * state of the encoder. This way it is possible to use
                 * compressed stream for example for communication over
                 * network.
                 *
                 * Only some filters support LZMA_SYNC_FLUSH. Trying to use
                 * LZMA_SYNC_FLUSH with filters that don't support it will
                 * make lzma_code() return LZMA_OPTIONS_ERROR. For example,
                 * LZMA1 doesn't support LZMA_SYNC_FLUSH but LZMA2 does.
                 *
                 * Using LZMA_SYNC_FLUSH very often can dramatically reduce
                 * the compression ratio. With some filters (for example,
                 * LZMA2), fine-tuning the compression options may help
                 * mitigate this problem significantly (for example,
                 * match finder with LZMA2).
                 *
                 * Decoders don't support LZMA_SYNC_FLUSH.
                 */
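To make the header comment concrete, here is a minimal ctypes sketch of driving `lzma_code()` with `LZMA_RUN`, `LZMA_SYNC_FLUSH` and `LZMA_FINISH`. It is an illustration under stated assumptions, not yalzma's actual implementation: the class names `LzmaStream` and `SyncFlushEncoder` are hypothetical, the struct layout mirrors `lzma_stream` from liblzma 5.x, and the library is located with `ctypes.util.find_library("lzma")`, falling back to `liblzma.so.5`. The encoder is created with liblzma's `lzma_easy_encoder()` (preset 6, CRC64 check), which produces an XZ stream with the LZMA2 filter, one of the filters that supports sync flush.

```python
import ctypes
import ctypes.util
import lzma

# Constants from liblzma's public headers (lzma/base.h, lzma/check.h, lzma/container.h).
LZMA_OK, LZMA_STREAM_END = 0, 1
LZMA_RUN, LZMA_SYNC_FLUSH, LZMA_FINISH = 0, 1, 3
LZMA_CHECK_CRC64 = 4
LZMA_PRESET_DEFAULT = 6


class LzmaStream(ctypes.Structure):
    # Hypothetical mirror of `lzma_stream` from lzma/base.h (liblzma 5.x).
    # A zero-initialised instance is equivalent to LZMA_STREAM_INIT.
    _fields_ = [
        ("next_in", ctypes.c_void_p), ("avail_in", ctypes.c_size_t),
        ("total_in", ctypes.c_uint64),
        ("next_out", ctypes.c_void_p), ("avail_out", ctypes.c_size_t),
        ("total_out", ctypes.c_uint64),
        ("allocator", ctypes.c_void_p), ("internal", ctypes.c_void_p),
        ("reserved_ptr1", ctypes.c_void_p), ("reserved_ptr2", ctypes.c_void_p),
        ("reserved_ptr3", ctypes.c_void_p), ("reserved_ptr4", ctypes.c_void_p),
        ("reserved_int1", ctypes.c_uint64), ("reserved_int2", ctypes.c_uint64),
        ("reserved_int3", ctypes.c_size_t), ("reserved_int4", ctypes.c_size_t),
        ("reserved_enum1", ctypes.c_int), ("reserved_enum2", ctypes.c_int),
    ]


# Assumption: liblzma is findable by ctypes (fallback name is the Linux soname).
_lib = ctypes.CDLL(ctypes.util.find_library("lzma") or "liblzma.so.5")
_strm_p = ctypes.POINTER(LzmaStream)
_lib.lzma_easy_encoder.argtypes = [_strm_p, ctypes.c_uint32, ctypes.c_int]
_lib.lzma_easy_encoder.restype = ctypes.c_int
_lib.lzma_code.argtypes = [_strm_p, ctypes.c_int]
_lib.lzma_code.restype = ctypes.c_int
_lib.lzma_end.argtypes = [_strm_p]
_lib.lzma_end.restype = None


class SyncFlushEncoder:
    """Hypothetical XZ/LZMA2 encoder sketch; not yalzma's actual code."""

    def __init__(self, preset=LZMA_PRESET_DEFAULT):
        self._strm = LzmaStream()
        ret = _lib.lzma_easy_encoder(ctypes.byref(self._strm), preset, LZMA_CHECK_CRC64)
        if ret != LZMA_OK:
            raise RuntimeError(f"lzma_easy_encoder() returned {ret}")

    def _code(self, data, action):
        # Keep the input buffer referenced until lzma_code() is done with it.
        in_buf = (ctypes.c_char * len(data)).from_buffer_copy(data) if data else None
        self._strm.next_in = ctypes.addressof(in_buf) if data else None
        self._strm.avail_in = len(data)
        out_buf = ctypes.create_string_buffer(64 * 1024)
        chunks = []
        while True:
            self._strm.next_out = ctypes.addressof(out_buf)
            self._strm.avail_out = len(out_buf)
            ret = _lib.lzma_code(ctypes.byref(self._strm), action)
            chunks.append(out_buf.raw[: len(out_buf) - self._strm.avail_out])
            if ret == LZMA_STREAM_END:
                break  # LZMA_SYNC_FLUSH or LZMA_FINISH has completed
            if ret != LZMA_OK:
                raise RuntimeError(f"lzma_code() returned {ret}")
            if action == LZMA_RUN and self._strm.avail_in == 0 and self._strm.avail_out:
                break  # input consumed; the encoder may still be holding some of it
        return b"".join(chunks)

    def run(self, data):
        return self._code(data, LZMA_RUN) if data else b""

    def sync_flush(self):
        return self._code(b"", LZMA_SYNC_FLUSH)

    def finish(self):
        out = self._code(b"", LZMA_FINISH)
        _lib.lzma_end(ctypes.byref(self._strm))
        return out


enc = SyncFlushEncoder()
head = enc.run(b"first line\n") + enc.sync_flush()
# After the sync flush the first line is decodable although the stream is still open.
assert lzma.LZMADecompressor().decompress(head) == b"first line\n"
tail = enc.run(b"second line\n") + enc.finish()
assert lzma.decompress(head + tail) == b"first line\nsecond line\n"
```

The key detail from the header is that a flush is repeated with the same action until `lzma_code()` returns `LZMA_STREAM_END`, after which the encoder goes back to accepting `LZMA_RUN` input.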

Installation

Tested on Debian Linux and macOS.

You need to have liblzma installed. That means there should be a liblzma.so (Linux) or liblzma.dylib (macOS) file somewhere in a library directory (/usr/lib or similar). It is usually already installed.
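One way to check whether ctypes can see liblzma at all is the sketch below. It only shows what `ctypes.util.find_library()` reports on your system and asks the library for its version through `lzma_version_string()`, part of liblzma's public API. yalzma's own lookup logic may differ.

```python
import ctypes
import ctypes.util

# find_library() returns e.g. "liblzma.so.5" on Linux or a path to
# liblzma.dylib on macOS, and None if the library cannot be found.
name = ctypes.util.find_library("lzma")
version = None
if name is not None:
    lib = ctypes.CDLL(name)
    # lzma_version_string() is declared in lzma/version.h.
    lib.lzma_version_string.argtypes = []
    lib.lzma_version_string.restype = ctypes.c_char_p
    version = lib.lzma_version_string().decode()
print("liblzma:", name, version)
```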

Install yalzma from the current GitHub master:

$ pip install git+https://github.com/messa/yalzma

Install a specific version:

$ pip install git+https://github.com/messa/[email protected]

Or add this line to your requirements.txt:

git+https://github.com/messa/[email protected]#egg=yalzma==0.0.4

Usage

from yalzma import LZMAEncoder
import lzma

text = b'Hello, World!'
enc = LZMAEncoder()
xz_data = enc.run(text)   # feed input; compressed output may lag behind
xz_data += enc.finish()   # finish the XZ stream
assert lzma.decompress(xz_data) == text

Demonstration of the sync flush functionality:

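from io import BytesIO
import lzma

from yalzma import LZMAEncoder

enc = LZMAEncoder()
xz_data = enc.run(b'first line\n')
xz_data += enc.sync_flush()  # make everything written so far decodable
# The stream is not finished yet, but the first line can already be read back.
assert lzma.open(BytesIO(xz_data), mode='rb').readline() == b'first line\n'

# The same encoder keeps working after a sync flush.
xz_data += enc.run(b'second line\n')
xz_data += enc.finish()
assert lzma.decompress(xz_data) == b'first line\nsecond line\n'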
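On the receiving side of such a stream (for example a network socket), the standard library's `lzma.LZMADecompressor` can be fed chunks as they arrive and returns whatever is already decodable. The generator below is a sketch; the name `iter_decompress` is hypothetical, and the input is an ordinary `lzma.compress()` result cut into 7-byte pieces to stand in for network reads.

```python
import lzma


def iter_decompress(chunks):
    """Hypothetical helper: yield decompressed bytes as compressed chunks arrive."""
    dec = lzma.LZMADecompressor()
    for chunk in chunks:
        out = dec.decompress(chunk)
        if out:
            yield out
        if dec.eof:
            return  # end-of-stream marker reached; ignore anything after it
    # The sender never finished the stream, so the XZ footer is missing.
    raise EOFError("compressed stream ended before the end-of-stream marker")


payload = lzma.compress(b"first line\nsecond line\n")
pieces = [payload[i:i + 7] for i in range(0, len(payload), 7)]
print(b"".join(iter_decompress(pieces)))
```

With a sync-flushing sender, each flush guarantees that the receiver's `decompress()` call can return all data written before that flush.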
