ronanh / intcomp Goto Github PK
View Code? Open in Web Editor NEWFast integer compression library
License: Apache License 2.0
Fast integer compression library
License: Apache License 2.0
Encoding as uint32 or uint64 slices is a bit cumbersome. We can use byte slices instead without any loss of performance. Go even optimizes the binary.ByteOrder into native operations.
Decoding needs an error return. The Uncompress functions are panic bombs now. This is bad for performance, e.g, pull request #13.
The output destination should always be the first argument, like a C copy
or fprintf
, and like a Go io.Copy
or fmt.Fprintf
. Append functions are better of by naming their behaviour explicitly, e.g., strconv.AppendInt
, utf8.AppendRune
or time.AppendFormat
.
DeltaBin and DeltaVar describe the internal mechanism, which is not useful to the API reader/user.
The trailing-zero optimization is just as useful on signed and unsigned integers.
I propose the following API.
const BlockSize32, BlockSize64 = 128, 256
func AppendEnc64[T uint64|int64](dst []byte, values ...T) []byte
func AppendBlockEnc64[T uint64|int64](dst []byte, values ...T) (out []byte, remain []T)
func AppendDec64[T uint64|int64](dst []T, src []byte) (out []T, err error)
func AppendBlockDec64[T uint64|int64](dst []T, src []byte) (out []T, remain []byte, err error)
Force creation of blocks at fixed boundaries to enable arbitrary position decoding and reversed iteration
The idea of fixed offsets doesn't sound like a great solution to seeking.
The easiest is to limit the "frames" (or whatever you call the collection of blocks) you generate. This will allow you to skip forward just by reading the first byte and skipping that number of bytes.
For sorted seeking, however you would need to add an optional min/max value. You only really need 1 bit to signify a min/max value follows the size. So if you limit the size of frames (for 32 bits), you have plenty of bits left to add an indicator that a min/max value follows.
Say your frames are max 1<<20 entries each. This means that you have 4MB of uint32 data. Considering the decompression speed you don't really need more fine grained seeking. The frame size can easily be configurable on the compression side.
This will as a side effect also allow you to compress and decompress these frames concurrently.
If you want reverse seeking you need an additional block at the end with the frame size, so you can skip backwards.
You can of course also have the index separately, where you for each frame record uncompressed length, compressed length and min/max. This would be similar to the index I made for S2 - except for the min/max part.
A separate index of course means that you don't have to modify anything, except make frames smaller.
When small blocks are added to a compressed buffer, the end of the compressed buffer is rewritten to make a bigger block with better compression.
This behaviour is not transparent to the user and easily leads to panics when one needs to read and add to compressed buffers concurrently.
A better approach would be to recompress small blocks when the output buffer needs to grow.
A lot of the packing/unpacking code has variable shifts for ntz. These are relatively expensive on Intel.
However, it is quite likely that ntz==0
, so I suggest you generate a version of your code that omits these.
When you have an incompressible block that yields the same or more than BitPackingBlockSizeXX+1, you should have a fast uncompressed mode.
You have plenty of unused space in your blocks headers:
bitlen
gets 7 bits, uses 5 or 6.ntz
gets 8 bits, only uses 6.You can either have a "magic" all ones header to indicate that the block is uncompressed or you can pack the values tighter and use one of the leftover bits.
That will make compression (after and deltaBitLenAndSign
) and decompression a memcopy.
Is it theoretically possible?
Example:
comp := intcomp.compress([10,20,30])
// get number by value
incomp.Search(comp, 10) // 1
incomp.Search(comp, 20) // 2
incomp.Search(comp, 30) // 3
// find value >= v
incomp.Seek(comp, 8) // 10
incomp.Seek(comp, 10) // 10
incomp.Search(comp, 17) // 20
incomp.Search(comp, 21) // 30
incomp.Search(comp, 31) // -1
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.