arithmetic-compressor's Introduction

Arithmetic Coding Library

This library is an implementation of the Arithmetic Coding algorithm in Python, along with adaptive statistical data compression models like PPM (Prediction by Partial Matching), Context Mixing and Simple Adaptive models.

Installation

To install the library, you can use pip:

pip install arithmetic_compressor

Usage

from arithmetic_compressor import AECompressor
from arithmetic_compressor.models import StaticModel

# create the model
model = StaticModel({'A': 0.5, 'B': 0.25, 'C': 0.25})

# create an arithmetic coder
coder = AECompressor(model)

# encode some data
data = "AAAAAABBBCCC"
N = len(data)
compressed = coder.compress(data)

# print the compressed data
print(compressed) # => [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1]

And here's an example of how to decode the encoded data:

decoded = coder.decompress(compressed, N)

print(decoded) # -> ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']

API Reference

Arithmetic Compressing

  • AECompressor:
    • compress(data: list|str, model: Model) -> List[str]: Takes a string or list of symbols, encodes it with arithmetic coding, and returns the encoded bits as a list.

    • decompress(encoded_data: List[str], length: int) -> List: Takes the encoded bits and the length of the original data, and returns the decoded symbols as a list.
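
To make the idea behind compress and decompress concrete, here is a minimal sketch of arithmetic coding with exact fractions. It is not the library's implementation: the library returns a list of bits, while this sketch only returns the final interval, and the names build_cdf, encode and decode are made up for the illustration.

```python
from fractions import Fraction


def build_cdf(probabilities):
    """Map each symbol to its [low, high) slice of the unit interval."""
    cdf, low = {}, Fraction(0)
    for symbol, p in probabilities.items():
        high = low + Fraction(p)
        cdf[symbol] = (low, high)
        low = high
    return cdf


def encode(data, probabilities):
    """Narrow [0, 1) once per symbol and return the final interval."""
    cdf = build_cdf(probabilities)
    low, width = Fraction(0), Fraction(1)
    for symbol in data:
        sym_low, sym_high = cdf[symbol]
        low, width = low + width * sym_low, width * (sym_high - sym_low)
    return low, low + width


def decode(value, length, probabilities):
    """Recover `length` symbols from any number inside the final interval."""
    cdf = build_cdf(probabilities)
    decoded = []
    for _ in range(length):
        for symbol, (sym_low, sym_high) in cdf.items():
            if sym_low <= value < sym_high:
                decoded.append(symbol)
                # rescale so the chosen slice becomes [0, 1) again
                value = (value - sym_low) / (sym_high - sym_low)
                break
    return decoded


probabilities = {'A': Fraction(1, 2), 'B': Fraction(1, 4), 'C': Fraction(1, 4)}
low, high = encode("AAAAAABBBCCC", probabilities)
decoded = decode(low, 12, probabilities)
print(high - low)        # 1/262144, i.e. 2**-18: about 18 bits for 12 symbols
print(''.join(decoded))  # AAAAAABBBCCC
```

The decoder in this sketch cannot tell where the message ends, because any number of further symbols could be read from the same value. That is presumably why decompress asks for the original length.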

Models

In addition to the arithmetic coding algorithm, this library also includes adaptive statistical models that can be used to improve compression.

  • StaticModel: A class which implements a static model that doesn't adapt to input data or statistics.
  • BaseBinaryModel: A class which implements a simple adaptive compression algorithm for binary symbols (0 and 1)
  • BaseFrequencyTable: This implements a basic adaptive frequency table that incrementally adapts to input data.
  • SimpleAdaptiveModel: A class that implements a simple adaptive compression algorithm.
  • PPMModel: A class that implements the PPM compression algorithm.
  • MultiPPM: A class which uses weighted averaging to combine several PPM Models of different orders to make predictions.
  • BinaryPPM: A class that implements the PPM compression algorithm for binary symbols (0 and 1).
  • MultiBinaryPPM: A class which uses weighted averaging to combine several BinaryPPM models of different orders to make predictions.
  • ContextMix_Linear: A class which implements the Linear Evidence Mixing variant of the Context Mixing compression algorithm.
  • ContextMix_Logistic: A class which implements the Neural network (Logistic) Mixing variant of the Context Mixing compression algorithm.

All models implement these common methods:

  • update(symbol): Updates the model's statistics given a symbol
  • probability(): Returns the probabilities of the next symbol
  • cdf(): Returns a cumulative distribution of the next symbol probabilities
  • test_model(): Tests the efficiency of the model to predict symbols
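
As a rough illustration of this interface, the sketch below implements update, probability and cdf for a hypothetical count-based model. CountingModel is not one of the library's classes, and it returns plain (low, high) tuples from cdf where the library's own models use Range objects.

```python
class CountingModel:
    """Hypothetical adaptive model: probabilities follow observed counts."""

    def __init__(self, symbols):
        # start every symbol at 1 so that nothing has zero probability
        self.counts = {s: 1 for s in symbols}

    def update(self, symbol):
        """Record one more occurrence of `symbol`."""
        self.counts[symbol] += 1

    def probability(self):
        """Return the current probability of each symbol."""
        total = sum(self.counts.values())
        return {s: c / total for s, c in self.counts.items()}

    def cdf(self):
        """Return cumulative integer ranges, one (low, high) pair per symbol."""
        ranges, low = {}, 0
        for s, c in self.counts.items():
            ranges[s] = (low, low + c)
            low += c
        return ranges


model = CountingModel(['A', 'B', 'C'])
for symbol in "AAB":
    model.update(symbol)

print(model.probability()['A'])  # 0.5
print(model.cdf())               # {'A': (0, 3), 'B': (3, 5), 'C': (5, 6)}
```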

Models Usage

A closer look at all the models.

Simple Models

  • BaseFrequencyTable(symbol_probabilities: dict)
  • SimpleAdaptiveModel(symbol_probabilities: dict, adaptation_rate: float)

The Simple Adaptive models adjust the probability of each symbol according to how often that symbol appears in the data.

Here's an example of how to use the Simple Adaptive models included in the library:

from arithmetic_compressor import AECompressor
from arithmetic_compressor.models import\
   BaseFrequencyTable,\
   SimpleAdaptiveModel

# create the model
# model = SimpleAdaptiveModel({'A': 0.5, 'B': 0.25, 'C': 0.25})
model = BaseFrequencyTable({'A': 0.5, 'B': 0.25, 'C': 0.25})

# create an arithmetic coder
coder = AECompressor(model)

# encode some data
data = "AAAAAABBBCCC"
compressed = coder.compress(data)

# print the compressed data
print(compressed) # => [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1]

The BaseFrequencyTable adapts incrementally to the statistics of the input data, while the SimpleAdaptiveModel is essentially an exponential moving average whose speed of adaptation is set by the adaptation_rate.
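
The exponential moving average can be sketched in a few lines. This is an illustration of the general technique, not the library's code; ema_update is a hypothetical helper and the rate of 0.1 is an arbitrary choice.

```python
def ema_update(probabilities, symbol, adaptation_rate):
    """Move every probability a fraction of the way toward the observed symbol."""
    return {
        s: (1 - adaptation_rate) * p + adaptation_rate * (1.0 if s == symbol else 0.0)
        for s, p in probabilities.items()
    }


probabilities = {'A': 0.5, 'B': 0.25, 'C': 0.25}
for symbol in "AAAA":
    probabilities = ema_update(probabilities, symbol, adaptation_rate=0.1)

# 'A' has risen from 0.5 toward 1, while 'B' and 'C' have decayed
print(probabilities)
```

A higher rate makes the estimate follow recent symbols more closely and forget older ones faster. A count-based table, by contrast, gives every past symbol the same weight.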

PPM models

https://en.wikipedia.org/wiki/Prediction_by_partial_matching

  • PPMModel(symbols: list, context_size: int)
  • MultiPPM(symbols: list, models: int)
  • BinaryPPM(context_size: int)
  • MultiBinaryPPM(models: int)

PPM (Prediction by Partial Matching) models are context models: they use the most recent symbols (the context) to predict the probability of the next symbol. Here's an example of how to use the PPM models included in the library (the context size is passed as k):

from arithmetic_compressor import AECompressor
from arithmetic_compressor.models import\
   PPMModel,\
   MultiPPM

# create the model
model = PPMModel(['A', 'B', 'C'], k = 3) # no need to pass in probabilities, only symbols

# create an arithmetic coder
coder = AECompressor(model)

# encode some data
data = "AAAAAABBBCCC"
compressed = coder.compress(data)

# print the compressed data
print(compressed) # => [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1]
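
The idea of predicting from context can be sketched without the library. The class below is a deliberately simplified stand-in, not the library's PPMModel and not full PPM: it has no escape probabilities and simply falls back to a shorter context when the current one has never been seen. ContextCounter and its add-one smoothing are assumptions made for the illustration.

```python
from collections import Counter, defaultdict


class ContextCounter:
    """Simplified order-k context model with fallback to shorter contexts."""

    def __init__(self, symbols, k):
        self.symbols = list(symbols)
        self.k = k
        self.history = []
        # tables[n] maps a length-n context tuple to counts of the next symbol
        self.tables = [defaultdict(Counter) for _ in range(k + 1)]

    def _context(self, n):
        return tuple(self.history[len(self.history) - n:])

    def update(self, symbol):
        """Count `symbol` under every context length available so far."""
        for n in range(self.k + 1):
            if len(self.history) >= n:
                self.tables[n][self._context(n)][symbol] += 1
        self.history.append(symbol)

    def probability(self):
        """Predict from the longest context that has been seen before."""
        for n in range(min(self.k, len(self.history)), -1, -1):
            seen = self.tables[n].get(self._context(n))
            if seen:
                total = sum(seen.values()) + len(self.symbols)
                return {s: (seen[s] + 1) / total for s in self.symbols}
        return {s: 1 / len(self.symbols) for s in self.symbols}


model = ContextCounter("AB", k=2)
for symbol in "ABABAB":
    model.update(symbol)

# after the context ('A', 'B') the model has only ever seen 'A'
print(model.probability())  # {'A': 0.75, 'B': 0.25}
```

An order-0 model would see three of each symbol here and predict 0.5 for both, which is the gain that context modeling is after.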

The MultiPPM model uses weighted averaging to combine the predictions of several PPM models into a single prediction. It gives better compression when the input is large.

# create the model
model = MultiPPM(['A', 'B', 'C'], models = 4) # will combine PPM models with context sizes of 0 to 4

# create an arithmetic coder
coder = AECompressor(model)

# encode some data
data = "AAAAAABBBCCC"
compressed = coder.compress(data)

# print the compressed data
print(compressed) # => [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1]

Binary version

The Binary PPM models BinaryPPM and MultiBinaryPPM behave just like normal PPM models demonstrated above, except that they only work for binary symbols 0 and 1.

from arithmetic_compressor import AECompressor
from arithmetic_compressor.models import\
   BinaryPPM,\
   MultiBinaryPPM

# create the model
model = BinaryPPM(k = 3)

# create an arithmetic coder
coder = AECompressor(model)

# encode some data
data = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1]
compressed = coder.compress(data)

# print the compressed data
print(compressed) # => [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1]

Likewise, the MultiBinaryPPM combines several Binary PPM models and makes predictions using weighted averaging.

Context Mixing models

  • ContextMix_Linear(models: List)
  • ContextMix_Logistic(learning_rate: float)

Context mixing is a type of data compression algorithm in which the next-symbol predictions of two or more statistical models are combined to yield a prediction that is often more accurate than any of the individual predictions.

Two general approaches have been used: linear and logistic mixing. Linear mixing uses a weighted average of the predictions, weighted by evidence. Logistic (or neural network) mixing first transforms the predictions into the logistic domain, log(p/(1-p)), before averaging.
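
The two approaches can be compared on a pair of predictions. The sketch below is illustrative only: the function names and the fixed equal weights are assumptions, whereas a real mixer would derive or learn its weights.

```python
import math


def stretch(p):
    """Map a probability into the logistic domain: log(p / (1 - p))."""
    return math.log(p / (1 - p))


def squash(x):
    """Inverse of stretch: map a logistic-domain value back to a probability."""
    return 1 / (1 + math.exp(-x))


def mix_linear(predictions, weights):
    """Weighted average of the probabilities themselves."""
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)


def mix_logistic(predictions, weights):
    """Weighted sum in the logistic domain, mapped back to a probability."""
    return squash(sum(w * stretch(p) for w, p in zip(weights, predictions)))


predictions = [0.9, 0.6]  # two models' estimates that the next bit is 1
weights = [0.5, 0.5]

linear = mix_linear(predictions, weights)
logistic = mix_logistic(predictions, weights)
print(linear, logistic)  # the logistic result is the higher of the two here
```

With these inputs the logistic mix lands closer to the more confident model than the plain average does, because stretching exaggerates predictions near 0 and 1.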

The library contains a minimal implementation of the algorithm. Only the core algorithm is implemented, and it doesn't include as many contexts / models as PAQ does.

Note: They only work for binary symbols (0 and 1).

Linear Mixing

https://en.wikipedia.org/wiki/Context_mixing#Linear_Mixing

The mixer computes a probability by a weighted summation of the N models.

from arithmetic_compressor import AECompressor
from arithmetic_compressor.models import ContextMix_Linear

# create the model
model = ContextMix_Linear()

# create an arithmetic coder
coder = AECompressor(model)

# encode some data
data = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
compressed = coder.compress(data)

# print the compressed data
print(compressed) # => [0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

The Linear Mixing model lets you combine other models:

from arithmetic_compressor import AECompressor
from arithmetic_compressor.models import ContextMix_Linear,\
   SimpleAdaptiveModel,\
   PPMModel,\
   BaseFrequencyTable

# create the model
model = ContextMix_Linear([
  SimpleAdaptiveModel({0: 0.5, 1: 0.5}),
  BaseFrequencyTable({0: 0.5, 1: 0.5}),
  PPMModel([0, 1], k = 10)
])

# create an arithmetic coder
coder = AECompressor(model)

# encode some data
data = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
compressed = coder.compress(data)

# print the compressed data
print(compressed) # => [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Logistic (Neural Network) Mixing

https://en.wikipedia.org/wiki/PAQ#Neural-network_mixing

A neural network is used to combine models.

from arithmetic_compressor import AECompressor
from arithmetic_compressor.models import ContextMix_Logistic

# create the model
model = ContextMix_Logistic()

# create an arithmetic coder
coder = AECompressor(model)

# encode some data
data = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
compressed = coder.compress(data)

# print the compressed data
print(compressed) # => [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1]
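
How such a mixer can learn its weights is sketched below, using the update rule commonly described for PAQ-style mixers: each weight moves in proportion to the prediction error and to that model's stretched input. LogisticMixer, its zero initial weights and the learning rate of 0.02 are assumptions for the illustration, not the library's ContextMix_Logistic.

```python
import math


def stretch(p):
    return math.log(p / (1 - p))


def squash(x):
    return 1 / (1 + math.exp(-x))


class LogisticMixer:
    """Hypothetical single-layer mixer trained online, one bit at a time."""

    def __init__(self, n_inputs, learning_rate=0.02):
        self.weights = [0.0] * n_inputs
        self.learning_rate = learning_rate
        self.inputs = [0.0] * n_inputs
        self.output = 0.5

    def predict(self, predictions):
        """Combine the models' probabilities that the next bit is 1."""
        self.inputs = [stretch(p) for p in predictions]
        self.output = squash(sum(w * x for w, x in zip(self.weights, self.inputs)))
        return self.output

    def update(self, bit):
        """Shift each weight by learning_rate * error * stretched input."""
        error = bit - self.output
        self.weights = [w + self.learning_rate * error * x
                        for w, x in zip(self.weights, self.inputs)]


mixer = LogisticMixer(n_inputs=2)
for bit in [1] * 100:
    mixer.predict([0.9, 0.1])  # first model is right, second is wrong
    mixer.update(bit)

final = mixer.predict([0.9, 0.1])
print(mixer.weights, final)
```

After training, the weight of the model that kept predicting correctly is positive and the weight of the misleading model is negative, so the mixer ends up trusting the first and inverting the second.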

Note: This library is intended for learning and educational purposes only. The implementation may not be optimized for performance or memory usage and may not be suitable for use in production environments. Please consider the performance and security issues before using it in production. Please also note that you should thoroughly test the library and its models with real-world data and use cases before deploying it in production.

More Examples

You can find more detailed examples in the /examples folder in the repository. These examples demonstrate the capabilities of the library and show how to use the different models.

Contribution

Contributions are very much welcome to the library. If you have an idea for a new feature or have found a bug, please submit an issue or a pull request.

License

This library is distributed under the MIT License. See the LICENSE file for more information.

arithmetic-compressor's People

Contributors

kodejuice


arithmetic-compressor's Issues

AssertionError: Low or high out of range

I'm trying to use this module on enwik5 data (10 000 bytes). But I encounter this error:

AssertionError: Low or high out of range

Are there any additional limitations in the implementation? Or do I do something wrong?

The script below works ok with enwik4 data (1 000 bytes).

I count statistics myself and then use StaticModel, but I encounter either this Low or high out of range error, or ValueError: Symbol has zero frequency error.

enwik5.zip

fn = 'enwik5'

print(fn)

def read_bytes(path):
    with open(path, 'rb') as f:
        return list(f.read())

data = read_bytes(fn)
nsyms = 256
stats = [0] * nsyms
for c in data:
    stats[c] += 1

from arithmetic_compressor import AECompressor

from arithmetic_compressor.models.base_adaptive_model import BaseFrequencyTable
from arithmetic_compressor.util import *

SCALE_FACTOR = 4096

class StaticModel:
  """A static model, which does not adapt to input data or statistics."""

  def __init__(self, counts_dict):
    #vals = (v for k, v in counts_dict.items())
    #counts_sum = sum(vals)
    #probability = {k: v / counts_sum for k, v in counts_dict.items()}
    #print(probability)
    probability = counts_dict

    symbols = list(probability.keys())

    self.name = "Static"
    self.symbols = symbols
    self.__prob = dict(probability)

    # compute cdf from given probability
    cdf = {}
    prev_freq = 0
    self.freq = freq = {sym: round(SCALE_FACTOR * prob)
                        for sym, prob in probability.items()}
    for sym, freq in freq.items():
      cdf[sym] = Range(prev_freq, prev_freq + freq)
      prev_freq += freq
    self.cdf_object = cdf

  def cdf(self):
    return self.cdf_object

  def probability(self):
    return self.__prob

  def predict(self, symbol):
    assert symbol in self.symbols
    return self.probability()[symbol]

  def update(self, symbol):
    pass

  def test_model(self, gen_random=True, N=10000, custom_data=None):
    self.name = "Static Model"
    return BaseFrequencyTable.test_model(self, gen_random, N, custom_data)

freq_map = {
    sym: freq for sym, freq in enumerate(stats)
    if freq > 0
}

model = StaticModel(freq_map)
coder = AECompressor(model)

N = len(data)
compressed = coder.compress(data)
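
One possible reading of the two errors, going only by the script above and not confirmed against the library's source: StaticModel multiplies each value by SCALE_FACTOR, so passing raw counts yields frequencies 4096 times larger than the counts, while the commented-out normalization would give any sufficiently rare byte a frequency of round(4096 * p) = 0. A common workaround for both is to scale the counts to integers that sum to a fixed total and never drop below 1. The helper below is a sketch of that idea; scale_counts is a hypothetical name and the total of 4096 simply mirrors SCALE_FACTOR.

```python
def scale_counts(counts, total=4096):
    """Scale positive counts to integer frequencies >= 1 that sum to `total`."""
    if len(counts) > total:
        raise ValueError("more symbols than available frequency slots")
    count_sum = sum(counts.values())
    # floor-scale each count, but keep every symbol at a frequency of at least 1
    freqs = {s: max(1, c * total // count_sum) for s, c in counts.items()}
    # spread the rounding drift over the currently most frequent symbols
    diff = total - sum(freqs.values())
    step = 1 if diff > 0 else -1
    while diff != 0:
        largest = max(freqs, key=freqs.get)
        freqs[largest] += step
        diff -= step
    return freqs


counts = {0: 1_000_000, 1: 1, 2: 1}
freqs = scale_counts(counts)
print(freqs)  # {0: 4094, 1: 1, 2: 1}
```

Dividing each result by 4096 gives probabilities that round(SCALE_FACTOR * prob) maps back to the same integers, so they could be passed to the StaticModel shown above. Whether the coder then accepts the input is untested here.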
