
Comments (6)

cowtowncoder commented on July 21, 2024

Interesting. Yes, this sounds like a very interesting idea. Thank you for suggesting it. I had actually seen a reference to this, but had forgotten to read more.

Btw, I was unable to use Unsafe tricks for single-threaded compression so far -- I tried, but in the end failed. :-)

Another sort-of-related thing that I have been toying with (the idea, that is, not an implementation) is the ability to define a "blindly splittable" format, meaning that one would be able to find a block boundary starting from an arbitrary point in the file. This is theoretically possible, with some caveats: for one, the compressor must guarantee that a specific byte sequence never occurs in its output (easiest to guarantee for a sequence that would always be compressed, like a run of 4 or more instances of the same byte); for another, all blocks must go through the compressor (otherwise "raw" data could contain such a sequence).
But that warrants another issue, obviously.
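
As a rough illustration of the idea (not code from this project), a reader dropped at an arbitrary offset could scan forward for the reserved sequence and treat the byte after it as the start of the next block. The marker value (four zero bytes), the class name and the method name are all assumptions made up for this sketch. It also assumes that block payloads never begin or end with the marker byte, so a match cannot straddle a block boundary.

```java
final class BlockBoundaryScan {
    // Hypothetical sync marker; the real sequence would depend on the format.
    static final byte[] MARKER = {0, 0, 0, 0};

    /**
     * Scans forward from 'from' and returns the offset of the first byte
     * after the next complete marker, or -1 if no marker is found.
     */
    static int nextBlockStart(byte[] data, int from) {
        for (int i = Math.max(0, from); i + MARKER.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i + MARKER.length;
            }
        }
        return -1;
    }
}
```

A splitter would call `nextBlockStart(data, splitOffset)` once per split and start decoding from the returned position; a marker that the split offset lands in the middle of is skipped, and that block is left to the previous split.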

from compress.

whoschek commented on July 21, 2024

Interesting. I once wrote a byte stuffing codec that might be usable for such a "blindly splittable" format. I still haven't figured out how to attach code to this issue tracking system. I can email you the two small (BSD licensed) Java files, if you're interested.

/**
 * Encoder/Decoder implementing Consistent Overhead Byte Stuffing (COBS) for
 * efficient, reliable, unambiguous packet framing regardless of packet content,
 * making it easy for applications to recover from malformed packet payloads.
 * For details, see the <a
 * href="http://www.stuartcheshire.org/papers/COBSforToN.pdf">paper</a>. In
 * case the link is broken, get it from the <a
 * href="http://www.stuartcheshire.org">paper's author</a>.
 * Quoting from the paper: "When packet data is sent over any serial medium, a
 * protocol is needed by which to demarcate packet boundaries. This is done by
 * using a special bit-sequence or character value to indicate where the
 * boundaries between packets fall. Data stuffing is the process that transforms
 * the packet data before transmission to eliminate any accidental occurrences
 * of that special framing marker, so that when the receiver detects the marker,
 * it knows, without any ambiguity, that it does indeed indicate a boundary
 * between packets.
 * COBS takes an input consisting of bytes in the range [0,255] and produces an
 * output consisting of bytes only in the range [1,255]. Having eliminated all
 * zero bytes from the data, a zero byte can now be used unambiguously to mark
 * boundaries between packets.
 * This allows the receiver to synchronize reliably with the beginning of the
 * next packet, even after an error. It also allows new listeners to join a
 * broadcast stream at any time and without failing to receive and decode the
 * very next error free packet.
 * With COBS all packets up to 254 bytes in length are encoded with an overhead
 * of exactly one byte. For packets over 254 bytes in length the overhead is at
 * most one byte for every 254 bytes of packet data. The maximum overhead is
 * therefore roughly 0.4% of the packet size, rounded up to a whole number of
 * bytes. COBS encoding has low overhead (on average 0.23% of the packet size,
 * rounded up to a whole number of bytes) and furthermore, for packets of any
 * given length, the amount of overhead is virtually constant, regardless of the
 * packet contents."
 * This class implements the original COBS algorithm, not the COBS/ZPE variant.
 * There holds: decode(encode(src)) = src.
 * Performance Note: The JDK 1.5 server VM runs decode(encode(src))
 * at about 125 MB/s throughput on a commodity PC (2 GHz Pentium 4). Encoding is
 * the bottleneck, decoding is extremely cheap. Obviously, this is way more
 * efficient than Base64 encoding or similar application level byte stuffing
 * mechanisms.
 * @author [email protected]
 * @author $Author: hoschek3 $
 * @version $Revision: 1.4 $, $Date: 2005/06/09 22:44:05 $
 */
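
For readers who want to see the byte stuffing described in that comment, here is a minimal sketch of basic COBS in Java. It is not the class the comment belongs to (that code was not posted in this thread); the class name `CobsSketch` and its method names are made up for illustration. Each group starts with a code byte `n` followed by `n - 1` non-zero data bytes, and a code below 0xFF stands for a zero byte after the group, except at the very end of the packet.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

final class CobsSketch {
    /** Encodes src so that the result contains no zero bytes. */
    static byte[] encode(byte[] src) {
        // Worst case: one code byte up front plus one per full 254-byte group.
        byte[] dst = new byte[src.length + src.length / 254 + 1];
        int codeIdx = 0; // where the current group's code byte goes
        int w = 1;       // next write position
        int code = 1;    // 1 + number of data bytes in the current group
        for (int i = 0; i < src.length; i++) {
            byte b = src[i];
            if (b != 0) {
                dst[w++] = b;
                code++;
            }
            if (b == 0 || code == 0xFF) {
                dst[codeIdx] = (byte) code; // close the current group
                code = 1;
                codeIdx = w;
                // Reserve the next code byte unless a full group ended the input.
                if (b == 0 || i + 1 < src.length) {
                    w++;
                }
            }
        }
        if (codeIdx < w) {
            dst[codeIdx] = (byte) code; // close the final group
        }
        return Arrays.copyOf(dst, w);
    }

    /** Reverses encode(); rejects input with zero bytes or truncated groups. */
    static byte[] decode(byte[] enc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(enc.length);
        int i = 0;
        while (i < enc.length) {
            int code = enc[i++] & 0xFF;
            if (code == 0 || i + code - 1 > enc.length) {
                throw new IllegalArgumentException("malformed COBS data");
            }
            out.write(enc, i, code - 1);
            i += code - 1;
            // A short group implies a zero byte, except after the last group.
            if (code != 0xFF && i < enc.length) {
                out.write(0);
            }
        }
        return out.toByteArray();
    }
}
```

With this sketch, the bytes `11 22 00 33` encode to `03 11 22 02 33`, and a zero byte appended after each encoded packet can then serve as the frame delimiter.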


cowtowncoder commented on July 21, 2024

Cool, thanks. I'll have a look. For what it's worth, it looks like the Snappy format (alas!) might actually work with a simple sequence of 4 zero bytes... for LZF, a change or two might be needed. But it too has a couple of unused byte values for the first byte of each sequence.


javabean commented on July 21, 2024

Hi all,

I may have a go at writing a parallel version of LZF if no one has started working on one yet.
If street cred is required, I have implemented a parallel GZip compressor in Java (similar to pigz); I hope this is enough! :-) (Ping me for the URL; I am not writing here to advertise.)
Tatu, would you be interested in such a contribution?
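
For context, the general shape of a pigz-style parallel compressor can be sketched as follows: split the input into fixed-size blocks, compress each block on a worker thread, then write the results in the original order. This is only an illustration under stated assumptions, not the contribution discussed here. It uses the JDK's `GZIPOutputStream` as a stand-in codec (concatenated gzip members form a valid gzip stream), and the class and method names are hypothetical. An LZF version would swap in the LZF chunk encoder for `gzipBlock`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPOutputStream;

final class ParallelBlockCompressor {
    /** Compresses one block into a self-contained gzip member. */
    static byte[] gzipBlock(byte[] data, int off, int len) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data, off, len);
        }
        return bos.toByteArray();
    }

    /** Compresses blocks in parallel and concatenates them in input order. */
    static byte[] compress(byte[] data, int blockSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> parts = new ArrayList<>();
            for (int off = 0; off < data.length; off += blockSize) {
                final int start = off;
                final int len = Math.min(blockSize, data.length - off);
                parts.add(pool.submit(() -> gzipBlock(data, start, len)));
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Futures are consumed in submission order, so output order is preserved.
            for (Future<byte[]> part : parts) {
                out.write(part.get());
            }
            return out.toByteArray();
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every block is compressed independently, the ratio is slightly worse than a single stream, and empty input produces empty output in this sketch rather than a valid empty gzip member.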


cowtowncoder commented on July 21, 2024

I would be absolutely thrilled to get such a contribution! Please let me know if you need help with block-level handling or such. And obviously you can add accessors if/as necessary.

I would also be interested in a link to the project, if that's OK; maybe tweet to 'cowtowncoder'? I am OK with adding the link in this issue as well, unless you'd rather not.

Finally: this package has a small gzip wrapper, so if you have improvements to that, those would be welcome.
But it's mostly just there for my own use (I handle smallish payloads with gzip, larger ones with LZF).


cowtowncoder commented on July 21, 2024

Will be in 0.9.9.

