
compressjs's Introduction

compressjs

compressjs contains fast pure-JavaScript implementations of various de/compression algorithms, including bzip2, Charles Bloom's LZP3, a modified LZJB, PPM-D, and an implementation of Dynamic Markov Compression. compressjs is written by C. Scott Ananian. The range coder used is a JavaScript port of Michael Schindler's C range coder. Bits were also borrowed from Yuta Mori's SAIS implementation, and from Eli Skeggs, Kevin Kwok, Rob Landley, James Taylor, and Matthew Francis for the bzip2 compression and decompression code. "Bear" wrote the original JavaScript LZJB; the version here is based on the node lzjb module.

Compression benchmarks

Here are some representative speeds and sizes for the various algorithms implemented in this package. Times are with node 0.8.22 on my laptop, but they should be valid for inter-algorithm comparisons.

test/sample5.ref

This is the Taoism article from the Simple English wikipedia, in HTML format as generated by the Wikipedia Parsoid project.

| Type | Level | Size (bytes) | Compress time (s) | Decompress time (s) |
| ---- | ----- | ------------ | ----------------- | ------------------- |
| bwtc | 9 | 272997 | 13.10 | 1.85 |
| bzip2 | 9 | 275087 | 22.57 | 1.21 |
| lzp3 | - | 292978 | 1.73 | 1.74 |
| ppm | - | 297220 | 42.05 | 44.04 |
| bzip2 | 1 | 341615 | 22.63 | 1.40 |
| bwtc | 1 | 345764 | 12.34 | 0.80 |
| dmc | - | 434182 | 6.97 | 9.00 |
| lzjbr | 9 | 491476 | 3.19 | 1.92 |
| lzjbr | 1 | 523780 | 2.76 | 2.02 |
| lzjb | 9 | 706210 | 1.02 | 0.30 |
| lzjb | 1 | 758467 | 0.66 | 0.29 |
| context1 | - | 939098 | 5.20 | 4.69 |
| fenwick | - | 1440645 | 3.06 | 3.72 |
| mtf | - | 1441763 | 1.92 | 3.86 |
| huffman | - | 1452055 | 7.15 | 6.56 |
| simple | - | 1479143 | 0.72 | 2.42 |
| defsum | - | 1491107 | 3.19 | 1.46 |
| no | - | 2130648 | 0.80 | 0.92 |
| - | - | 2130640 | - | - |

enwik8

This test data is the first 10^8 (100,000,000) bytes of the English Wikipedia XML dump from March 3, 2006. It is the data set used for the Large Text Compression Benchmark and can be downloaded from that site.

| Type | Level | Size (bytes) | Compress time (s) | Decompress time (s) |
| ---- | ----- | ------------ | ----------------- | ------------------- |
| ppm | - | 26560169 | 2615.82 | 2279.17 |
| bzip2 | 9 | 28995650 | 1068.51 | 66.95 |
| bwtc | 9 | 29403626 | 618.63 | 112.00 |
| bzip2 | 1 | 33525893 | 1035.29 | 66.98 |
| lzp3 | - | 34305420 | 123.69 | 167.77 |
| bwtc | 1 | 34533422 | 618.61 | 43.52 |
| lzjbr | 9 | 43594841 | 242.60 | 141.51 |
| lzjbr | 1 | 44879071 | 207.38 | 147.14 |
| context1 | - | 48480225 | 253.48 | 223.30 |
| huffman | - | 62702157 | 301.50 | 267.31 |
| fenwick | - | 62024449 | 143.49 | 164.15 |
| mtf | - | 62090746 | 83.62 | 168.03 |
| simple | - | 63463479 | 27.79 | 92.84 |
| defsum | - | 64197615 | 75.48 | 32.05 |
| lzjb | 9 | 64992459 | 63.75 | 5.90 |
| lzjb | 1 | 67828511 | 29.26 | 5.89 |
| no | - | 100000008 | 26.29 | 31.98 |
| - | - | 100000000 | - | - |

Algorithm descriptions

  • compressjs.Bzip2 (-t bzip2) is the bzip2 algorithm we have all come to know and love. Its block size is selectable between 100k and 900k, corresponding to compression levels 1 through 9.
  • compressjs.BWTC (-t bwtc) is substantially the same, but with a few simplifications/improvements which make it faster, smaller, and not binary-compatible. In particular, the unnecessary initial RLE step of bzip2 is omitted, and we use a range coder with an adaptive context-0 model after the MTF/RLE2 step, instead of the static huffman codes of bzip2.
  • compressjs.PPM (-t ppm) is a naive/simple implementation of the PPMD algorithm with a 256k sliding window.
  • compressjs.Lzp3 (-t lzp3) is an algorithm similar to Charles Bloom's LZP3 algorithm. It uses a 1M sliding window, a context-4 model, and a range coder.
  • compressjs.Dmc (-t dmc) is a partial implementation of Dynamic Markov Compression. Unlike most DMC implementations, our implementation is bytewise (not bitwise). There is currently no provision for shrinking the Markov model (or throwing it out when it grows too large), so be careful with large inputs! I may return to twiddle with this some more; see the source for details.
  • compressjs.Lzjb (-t lzjb) is a straight copy of the fast LZJB algorithm from https://github.com/cscott/lzjb.
  • compressjs.LzjbR (-t lzjbr) is a hacked version of LZJB which uses a range coder and a bit of modeling instead of the fixed 9-bit literal / 17-bit match format of the original.

The remaining algorithms are self-tests for various bits of compression code, not real compressors. Context1Model is a simple adaptive context-1 model using a range coder. Huffman is an adaptive Huffman coder using Vitter's algorithm. MTFModel, FenwickModel, and DefSumModel are simple adaptive context-0 models with escapes, implemented using a move-to-front list, a Fenwick tree, and Charles Bloom's deferred summation algorithm, respectively. Simple is a static context-0 model for the range coder. NoModel encodes the input bits directly; it shows the basic I/O overhead, as well as the few bytes of overhead due to the file magic and a variable-length encoding of the uncompressed size of the file.
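
For a quick side-by-side comparison of the real compressors on your own data, each algorithm's compressFile method (described under Usage and Documentation below) can be called in turn. The following is a minimal sketch, assuming the package is installed and using the test/sample5.ref sample mentioned above:

var compressjs = require('compressjs');
var fs = require('fs');

// Read a sample file; a Buffer is accepted directly as input.
var data = fs.readFileSync('test/sample5.ref');

// Compare the real compressors at their default settings and report
// the compressed size each produces.
['Bzip2', 'BWTC', 'Lzp3', 'Dmc', 'Lzjb', 'LzjbR'].forEach(function(name) {
  var compressed = compressjs[name].compressFile(data);
  console.log(name + ': ' + compressed.length + ' bytes');
});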

How to install

npm install compressjs

or

volo add cscott/compressjs

This package uses Typed Arrays if available, which are present in node.js >= 0.5.5 and many modern browsers. A full browser compatibility table is available at caniuse.com; briefly: IE 10, Firefox 4, Chrome 7, or Safari 5.1.

Testing

npm install
npm test

Usage

There is a binary available in bin:

$ bin/compressjs --help
$ echo "Test me" | bin/compressjs -t lzp3 -z > test.lzp3
$ bin/compressjs -t lzp3 -d test.lzp3
Test me

The -t argument can take a number of different strings to specify the various compression algorithms available. Use --help to see the various options.

From JavaScript:

var compressjs = require('compressjs');
var algorithm = compressjs.Lzp3;
var data = Buffer.from('Example data', 'utf8');
var compressed = algorithm.compressFile(data);
var decompressed = algorithm.decompressFile(compressed);
// convert from the returned array back to a string
var data2 = Buffer.from(decompressed).toString('utf8');
console.log(data2);

There is a streaming interface as well. Use Uint8Array or normal JavaScript arrays when running in a browser.
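
For example, here is a minimal sketch of the streaming style using the duck-typed "stream" objects described under Documentation below: an output stream only needs a writeByte method, and an input stream would likewise only need readByte.

var compressjs = require('compressjs');
var data = Buffer.from('Example data', 'utf8');

// Any object with a writeByte(byte) method can serve as an output stream.
var compressedBytes = [];
var outStream = {
  writeByte: function(b) { compressedBytes.push(b); }
};

// Compress from a Buffer (use a Uint8Array or plain array in a browser)
// into the custom output stream.
compressjs.Lzp3.compressFile(data, outStream);

// Decompress the collected byte array into another custom output stream.
var decompressedBytes = [];
compressjs.Lzp3.decompressFile(compressedBytes, {
  writeByte: function(b) { decompressedBytes.push(b); }
});

console.log(Buffer.from(decompressedBytes).toString('utf8')); // "Example data"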

See the tests in the tests/ directory for further usage examples.

Documentation

require('compressjs') returns a compressjs object. Its fields correspond to the various algorithms implemented; each algorithm exports one of two different interfaces, depending on whether it is a "compression method" or a "model/coder".

Compression Methods

Compression methods (like compressjs.Lzp3) export two methods. The first is a function accepting one, two or three parameters:

cmp.compressFile = function(input, [output], [Number compressionLevel] or [props])

The input argument can be a "stream" object (which must implement the readByte method), or a Uint8Array, Buffer, or array.

If you omit the second argument, compressFile will return a JavaScript array containing the byte values of the compressed data. If you pass a second argument, it must be a "stream" object (which must implement the writeByte method).

The third argument may be omitted, or a number between 1 and 9 indicating a compression level (1 being largest/fastest compression and 9 being smallest/slowest compression). Some algorithms also permit passing an object for finer-grained control of various compression properties.
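
For example, a short sketch of the three-argument form (only the numeric level is shown here, since the exact properties accepted by a props object vary per algorithm):

var compressjs = require('compressjs');
var input = Buffer.from('Example data to compress, repeated a few times. '.repeat(20), 'utf8');

// Collect the compressed bytes with a duck-typed output stream.
function collector(arr) { return { writeByte: function(b) { arr.push(b); } }; }

var fast = [], small = [];
compressjs.Bzip2.compressFile(input, collector(fast), 1);  // level 1: largest/fastest
compressjs.Bzip2.compressFile(input, collector(small), 9); // level 9: smallest/slowest
console.log('level 1: ' + fast.length + ' bytes; level 9: ' + small.length + ' bytes');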

The second exported method is a function accepting one or two parameters:

cmp.decompressFile = function(input, [output])

The input parameter is as above.

If you omit the second argument, decompressFile will return a Uint8Array, Buffer or JavaScript array with the decompressed data, depending on what your platform supports. For most modern platforms (modern browsers, recent node.js releases) the returned value will be a Uint8Array.

If you provide the second argument, it must be a "stream", implementing the writeByte method.
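
A short sketch of both forms (assuming a platform with Typed Array support, so the no-output form returns a Uint8Array):

var compressjs = require('compressjs');
var compressed = compressjs.Lzjb.compressFile(Buffer.from('hello hello hello', 'utf8'));

// Form 1: omit the output argument and receive the decompressed bytes
// back (a Uint8Array on modern platforms).
var result = compressjs.Lzjb.decompressFile(compressed);
console.log(Buffer.from(result).toString('utf8')); // "hello hello hello"

// Form 2: supply any object with a writeByte method as the output stream.
var bytes = [];
compressjs.Lzjb.decompressFile(compressed, { writeByte: function(b) { bytes.push(b); } });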

Models and coders

The second type of object implemented is a model/coder. Huffman and RangeCoder share the same interface as the simple context-0 probability models MTFModel, FenwickModel, LogDistanceModel, and DeflateDistanceModel.

model.factory = function(parameters)

This method returns a function which can be invoked with a size argument to create a new instance of this model with the given parameters (which usually include the input/output stream or coder).

model.encode = function(symbol, [optional context])

This method encodes the given symbol, possibly with the given additional context, and then updates the model or adaptive coder if necessary. The symbol is usually in the range [0, size), although some models allow adding "extra symbols" to the possible range, which are usually given negative values. For example, you might want to create a LogDistanceModel with one extra state to encode "same distance as the last one encoded".

model.decode = function([optional context])

Decodes the next symbol and updates the model or adaptive coder. The values returned are usually in the range [0, size), although negative numbers may be returned if you requested "extra symbols" when you created the model.

Related articles and projects

Other JavaScript compressors

License (GPLv2)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

compressjs's Issues

BZip2 DecompressFile: Trace: TypeError: Cannot use 'in' operator to search for 'readByte' in <file>

I'm not sure why I'm getting this error. I can upload the file somewhere if you would like, it's 8MB

node -v v8.9.4
node_modules/.bin/compressjs -V 0.0.1
Real version: 1.0.3

Here is the error:

Trace: TypeError: Cannot use 'in' operator to search for 'readByte' in cache/complete.json.bz2
    at Object.Util.coerceInputStream (/home/drazisil/github/minepack/node_modules/compressjs/lib/Util.js:10:26)
    at Object.Bunzip.decode [as decompressFile] (/home/drazisil/github/minepack/node_modules/compressjs/lib/Bzip2.js:456:26)
    at unbzip2Modlist (/home/drazisil/github/minepack/main.js:107:34)
    at /home/drazisil/github/minepack/main.js:132:11
    at tryCatcher (/home/drazisil/github/minepack/node_modules/bluebird/js/release/util.js:16:23)

Here is the code:

  var algorithm = compressjs.Bzip2;

  try {
    var decompressed = algorithm.decompressFile(compressedModlistPath);
  } catch (error) {
}

Bzip2: Possible oversight?

Hi there,

I'm not sure if it was an oversight or not, but the documentation is misleading. I cannot use Bzip2 to compress anything but a stream. I forked your repo and did the needful. Should I submit a PR, or was the omission intentional?

Only 64K is being compressed

du -hs words.txt
6.6M words.txt
cat words.txt | compressjs -t lzp3 -z > words1.lzp3
du -hs words1.lzp3
20K words1.lzp3
cat words1.lzp3 | compressjs -t lzp3 -d > words1.txt
du -hs words1.txt
64K words1.txt

Use compressjs in a browser

Hi,

I want to use your module in a browser, but I could not find any example how to do this. Is there one?

Best,
Jochen

bzip2 CRC error: incompatibility with command line bunzip2 v 1.0.6

Using [email protected] and node v 11.4.0

compressjs.Bzip2 can produce output that the bunzip2 command line program, v 1.0.6 as distributed with Ubuntu 18.10, declares to have a CRC error.

The attached repro case compresses a large json file using Bzip2, once with blocksize 800k, and again with blocksize 900k.

bunzip2 of the compressed file of block size 800k succeeds.
bunzip2 of the compressed file of block size 900k fails.

error Error: Command failed: bunzip2 -t message.9.json.bz2
bunzip2: message.9.json.bz2: data integrity (CRC) error in data

To run the repro case, bunzip2 must be on your path. Then run

node index.js

repro.zip

Very slow to load in nodejs

On a PC:

Windows 10, 64-bit
Node.js 4.x
Processor: AMD E1 dual-core
RAM: 4 GB

require('compressjs') takes from 2 to 4 seconds.

handling streams

Hi,

Does it support streams? If so, could you please provide an example?

I mean I would like to read from a file, compress it, and write the result to another stream/file.

Help needed for contribution

Hi, I have created a new project, arithmetic-coding, and finished the implementation. But now the biggest problem is reading/writing files: the speed is very slow. I'm going to look for another method of reading/writing files.

As shown in the Travis build log, my algorithm takes 1357ms to encode and decode a 61,357-byte file.

And so I find your project! I have a few questions now:

  1. Is it okay for me to add my arithmetic coding algorithm to your project?
  2. Also, should I reuse https://github.com/cscott/compressjs/blob/master/lib/Stream.js?
  3. Should I keep it as a separate project, or integrate it into this project directly?

Thanks for your patience in advance!

LZ4

Would you consider adding LZ4? I haven't been able to find a working JS-only implementation that does not corrupt data.

Excellent timing...quick question

This might be naive, but since the docs don't explicitly say, I'll ask.

I'm looking to zip up a folder's worth of CSS and JS files and stream the resulting file to the user for download. Nowhere in the readme does it specifically say that you support standard ZIP. Is this just a misunderstanding on my part, or is that actually the case?

AMD module definition breaks Webpack

This isn't really an issue with the code, but it does hurt integration with Webpack. Webpack uses its own custom "version" of AMD, which injects a pseudo-object that takes over from AMD.

Since the fake AMD is not a function, this line fails:

if (typeof define !== 'function') { var define = require('amdefine')(module); }

Simply taking this line out of every single file allows Webpack to handle this module. I already have a copy of the module with that line removed from every file; let me know if you would be interested in merging my PR.

Problem with compress.js Bzip2 compress

Hello,

When I try to decompress data compressed with compressjs using the bunzip2 command-line tool, I get an error. However, I can decompress the data using compressjs itself.

Environment:

  • Mac OS 10.9.1
  • Node.js v0.10.25
  • Compress.js 1.0.1
>>> compressjs -t bzip2 data > data.bz2
>>> bunzip2 data.bz2

bunzip2: Data integrity error when decompressing.
        Input file = data.dmg.bz2, output file = data.dmg

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

bunzip2: Deleting output file data.dmg, if it exists.

>>> bunzip2 -tvv data.bz2
  data.dmg.bz2: 
    [1: huff+mtf rt+rld]
    [2: huff+mtf rt+rld]
    [3: huff+mtf rt+rlddata integrity (CRC) error in data

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

Unneeded code in DMC

Line 109: this.nodes.push(newNode);
I think that nodes is not used. Is it only for PRINT_STATS?
If so, you should use a node counter instead of the nodes array.
