
Comments (8)

GoogleCodeExporter commented on July 21, 2024
This is really inherent in the format; you couldn't change this without also 
breaking existing decompressors.

In any case, compressing such large chunks is sort of meaningless, given that the format internally breaks all data up into 64 kB parts that are processed individually anyway. In other words, it will buy you nothing in increased compressibility.

Original comment by [email protected] on 14 May 2013 at 9:21


GoogleCodeExporter commented on July 21, 2024
The point is that this bug is undocumented and hidden from the user. One of two resolutions seems reasonable to me:
1. Change all the APIs from size_t to uint32 to avoid false advertising
2. Add automatic 4GB chunking to your wrapper layer

Thanks,
-Dan

Original comment by [email protected] on 15 May 2013 at 1:38

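As an aside, wrapper-level chunking along the lines of option 2 can be done outside the format itself. Below is a minimal sketch, assuming only the plain snappy::Compress() API; the chunk size, the CompressChunked() helper, and the 8-byte compressed-length prefix are all invented for the example and are not part of snappy or its format:

    #include <snappy.h>

    #include <algorithm>
    #include <cstdint>
    #include <cstring>
    #include <string>

    // 1 GiB per chunk, comfortably below the 4 GB format limit (made up for this sketch).
    static const size_t kChunkSize = static_cast<size_t>(1) << 30;

    std::string CompressChunked(const char* data, size_t len) {
      std::string out;
      for (size_t off = 0; off < len; off += kChunkSize) {
        const size_t n = std::min(kChunkSize, len - off);

        std::string compressed;
        snappy::Compress(data + off, n, &compressed);

        // Invented framing: prefix each chunk with its compressed size
        // (written in host byte order to keep the sketch short).
        const uint64_t clen = compressed.size();
        out.append(reinterpret_cast<const char*>(&clen), sizeof(clen));
        out.append(compressed);
      }
      return out;
    }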

GoogleCodeExporter commented on July 21, 2024
#2 is out of the question, really, since that would break the format (the 
length is stored only once). We could probably change the types in the APIs, 
but that would mean ABI breakage, which is also bad.

Original comment by [email protected] on 15 May 2013 at 1:42


GoogleCodeExporter commented on July 21, 2024
Thanks. For a future version I would suggest expanding the format as follows: check the leading uint32; if it is 0, assume it's the new format followed by a 64-bit size, and otherwise fall back to the existing 32-bit algorithm. That would salvage all existing stored data and allow you to extend the format. I would also add an extra version number to the header so that you can change the format in the future without breaking backwards compatibility.

Original comment by [email protected] on 15 May 2013 at 1:54

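To make the proposal concrete, a rough sketch of the dispatch it describes follows. This is purely hypothetical and is not how snappy actually encodes the stored length; ReadProposedLength() is an invented helper that only illustrates the "leading zero means a 64-bit length follows" idea:

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Reads the uncompressed length under the proposed scheme and reports how
    // many header bytes were consumed.  The caller must supply at least 12 bytes.
    // Values are read in host byte order to keep the sketch short.
    uint64_t ReadProposedLength(const char* buf, size_t* header_bytes) {
      uint32_t lead;
      std::memcpy(&lead, buf, sizeof(lead));
      if (lead == 0) {
        // New format: a zero marker followed by a 64-bit length.
        uint64_t len;
        std::memcpy(&len, buf + sizeof(lead), sizeof(len));
        *header_bytes = sizeof(lead) + sizeof(len);
        return len;
      }
      // Old data: the leading value is itself the 32-bit length.
      *header_bytes = sizeof(lead);
      return lead;
    }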

GoogleCodeExporter commented on July 21, 2024
We're really not going to break the format for something people shouldn't 
really do anyway, sorry -- if you have that much data, stream it somehow 
instead of doing one Compress() call. The format has been stable for over eight 
years now, and forwards- and backwards-compatibility is an important feature.

I could probably add a comment saying that there's a hard limit of 4GB, though?

Original comment by [email protected] on 15 May 2013 at 1:57


GoogleCodeExporter commented on July 21, 2024
I would argue that 8 years ago there was not such a proliferation of server hardware with tens or hundreds of GB of RAM. Compressing an array larger than 4 GB in memory is commonplace today, which is why I was so stunned by such a "year 2000" bug. But anyway, we get the message. No change in sight. Will pursue other alternatives. Thanks for your quick responses.

Original comment by [email protected] on 15 May 2013 at 2:20


GoogleCodeExporter commented on July 21, 2024
I suppose the 32-bit limitation is there because it would be a waste of space to use 64-bit pointers internally for the relatively small number of cases that would need them, and for a compression library that kind of waste is generally unacceptable.

If you want to compress large datasets exceeding 4 GB, the normal thing is to split them into chunks, compress them separately, and then decompress them and do the join by hand. It is more work, but the compression ratios will benefit quite a lot.

Original comment by [email protected] on 20 May 2013 at 5:22

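Continuing the earlier chunking sketch, the "decompress them and do the join by hand" side could look roughly like this, again assuming the invented 8-byte compressed-length prefix per chunk; UncompressChunked() is not a snappy API, only snappy::Uncompress() is:

    #include <snappy.h>

    #include <cstdint>
    #include <cstring>
    #include <string>

    // Walks a buffer of [8-byte compressed length][compressed bytes] records,
    // decompresses each record, and appends the results in order.
    bool UncompressChunked(const char* data, size_t len, std::string* out) {
      out->clear();
      size_t off = 0;
      while (off + sizeof(uint64_t) <= len) {
        uint64_t clen;
        std::memcpy(&clen, data + off, sizeof(clen));
        off += sizeof(clen);
        if (clen > len - off) return false;  // truncated chunk

        std::string chunk;
        if (!snappy::Uncompress(data + off, clen, &chunk)) return false;
        out->append(chunk);  // the "join by hand" step
        off += clen;
      }
      return off == len;  // every byte accounted for
    }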

GoogleCodeExporter commented on July 21, 2024
faltet: It's not actually about that; we already use 16-bit pointers rather than 32-bit for the compression itself (we split the input into 64 kB chunks that are compressed individually), so the encoder could handle larger inputs with minimal extra work. But since the decoder happened not to support it at some point, we don't want to introduce a backwards incompatibility for a very small gain.

In any case, it's pretty clear that this is not going to change, so I'm closing the bug.

Original comment by [email protected] on 21 May 2013 at 9:34

  • Changed state: WontFix

