Comments (8)
This is really inherent in the format; you couldn't change this without also
breaking existing decompressors.
In any case, it's sort of meaningless to compress such large chunks in one
call, given that the format internally breaks all data up into 64 kB blocks
that are processed individually anyway. In other words, it will buy you nothing
in increased compressibility.
Original comment by [email protected]
on 14 May 2013 at 9:21
from snappy.
The point is that this bug is undocumented and hidden from the user. One of two
resolutions seems reasonable to me:
1. Change all the APIs from size_t to uint32 to avoid false advertising
2. Add automatic 4GB chunking to your wrapper layer
Thanks,
-Dan
Original comment by [email protected]
on 15 May 2013 at 1:38
#2 is out of the question, really, since that would break the format (the
length is stored only once). We could probably change the types in the APIs,
but that would mean ABI breakage, which is also bad.
Original comment by [email protected]
on 15 May 2013 at 1:42
Thanks. For a future version I would suggest extending the format as follows:
check the leading uint32; if it is 0, assume the new format with a 64-bit size
follows, and otherwise fall back to the existing 32-bit algorithm. That would
salvage all existing stored data while still allowing you to extend the format.
I would also add a version number to the header so that you can change the
format in the future without breaking backwards compatibility.
Original comment by [email protected]
on 15 May 2013 at 1:54
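The sentinel scheme proposed above can be sketched as follows. This is a hypothetical header layout for illustration only, not the real snappy wire format: a leading 32-bit length of 0 signals that a 64-bit length follows, while any non-zero value is read as a legacy 32-bit length, so existing data still parses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical backward-compatible length header (NOT the actual snappy
// format): non-zero leading uint32 = legacy 32-bit length; zero = sentinel,
// a 64-bit length follows. Zero-length payloads always take the 64-bit path
// so the sentinel stays unambiguous.
std::string EncodeLength(uint64_t n) {
  std::string out;
  if (n != 0 && n < (1ull << 32)) {
    uint32_t n32 = static_cast<uint32_t>(n);  // legacy 4-byte header
    out.append(reinterpret_cast<const char*>(&n32), sizeof n32);
  } else {
    uint32_t sentinel = 0;                    // 0 marks the extended header
    out.append(reinterpret_cast<const char*>(&sentinel), sizeof sentinel);
    out.append(reinterpret_cast<const char*>(&n), sizeof n);
  }
  return out;
}

bool DecodeLength(const std::string& in, uint64_t* n, size_t* header_bytes) {
  if (in.size() < 4) return false;
  uint32_t n32;
  std::memcpy(&n32, in.data(), 4);
  if (n32 != 0) {                       // legacy 32-bit path
    *n = n32;
    *header_bytes = 4;
    return true;
  }
  if (in.size() < 12) return false;     // sentinel seen: need 8 more bytes
  std::memcpy(n, in.data() + 4, 8);     // extended 64-bit path
  *header_bytes = 12;
  return true;
}
```

The encoding assumes the same byte order on both ends; a real design would fix the endianness, and (as the comment above suggests) reserve a version field as well.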
We're really not going to break the format for something people shouldn't
really do anyway, sorry -- if you have that much data, stream it somehow
instead of doing one Compress() call. The format has been stable for over eight
years now, and forwards- and backwards-compatibility is an important feature.
I could probably add a comment saying that there's a hard limit of 4GB, though?
Original comment by [email protected]
on 15 May 2013 at 1:57
I would argue that 8 years ago there was not such a proliferation of server
hardware with tens or hundreds of GB of RAM. Compressing an array greater than
4GB in memory today is commonplace, which is why I was so stunned by such a
"year 2000" bug. But anyway, we get the message. No change in sight. Will
pursue other alternatives. Thanks for your quick responses.
Original comment by [email protected]
on 15 May 2013 at 2:20
I suppose the 32-bit limitation exists because it would be a waste of space to
use 64-bit pointers internally for the relatively small number of cases that
need them; for a compression library, that kind of overhead is generally
unacceptable.
If you want to compress large datasets exceeding 4 GB, the normal thing is to
split them into chunks, compress each chunk separately, and then decompress the
chunks and join them by hand. It is more work, but the compression ratios will
benefit quite a lot.
Original comment by [email protected]
on 20 May 2013 at 5:22
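The split-compress-rejoin approach described above can be sketched like this. The framing here is hypothetical (a simple [uint64 length][payload] frame per chunk), and the codec is left pluggable so the sketch stays self-contained; in practice each chunk would go through snappy's one-shot Compress/Uncompress calls.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>

// Minimal sketch of "split, compress per chunk, rejoin". The codec is
// pluggable so the framing logic can be exercised without linking snappy;
// in real use you would pass thin wrappers around snappy::Compress and
// snappy::Uncompress. Each frame is: [uint64 compressed_size][payload].
using Codec = std::function<std::string(const std::string&)>;

constexpr size_t kChunkSize = 1u << 20;  // 1 MiB chunks; anything < 4 GB works

std::string CompressChunked(const std::string& input, const Codec& compress) {
  std::string out;
  for (size_t pos = 0; pos < input.size(); pos += kChunkSize) {
    std::string packed = compress(input.substr(pos, kChunkSize));
    uint64_t len = packed.size();
    out.append(reinterpret_cast<const char*>(&len), sizeof len);
    out += packed;
  }
  return out;
}

std::string DecompressChunked(const std::string& input, const Codec& decompress) {
  std::string out;
  size_t pos = 0;
  while (pos + sizeof(uint64_t) <= input.size()) {
    uint64_t len;
    std::memcpy(&len, input.data() + pos, sizeof len);
    pos += sizeof len;
    out += decompress(input.substr(pos, len));
    pos += len;
  }
  return out;
}
```

Because snappy already blocks its input into 64 kB pieces internally, chunking at a megabyte or larger costs essentially nothing in compression ratio, while lifting the per-call size limit.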
faltet: It's not actually about that; we already use 16-bit offsets rather than
32-bit for the compression itself (we split the input into 64 kB chunks that
are compressed individually), so the encoder could handle this with minimal
extra work. But since the decoder happened not to support it at some point, we
don't want to introduce a backwards incompatibility for a very small gain.
In any case, it's pretty clear that this is not going to change, so I'm closing
the bug.
Original comment by [email protected]
on 21 May 2013 at 9:34
- Changed state: WontFix