Giter Club home page Giter Club logo

intcomp's People

Contributors

klauspost avatar pascaldekloe avatar ronanh avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

intcomp's Issues

Function signatures

Encoding as uint32 or uint64 slices is a bit cumbersome. We can use byte slices instead without any loss of performance. Go even optimizes the binary.ByteOrder into native operations.

Decoding needs an error return. The Uncompress functions are panic bombs now. This is bad for performance, e.g, pull request #13.

The output destination should always be the first argument, like a C copy or fprintf, and like a Go io.Copy or fmt.Fprintf. Append functions are better of by naming their behaviour explicitly, e.g., strconv.AppendInt, utf8.AppendRune or time.AppendFormat.

DeltaBin and DeltaVar describe the internal mechanism, which is not useful to the API reader/user.

The trailing-zero optimization is just as useful on signed and unsigned integers.

I propose the following API.

const BlockSize32, BlockSize64 = 128, 256
func AppendEnc64[T uint64|int64](dst []byte, values ...T) []byte
func AppendBlockEnc64[T uint64|int64](dst []byte, values ...T) (out []byte, remain []T)
func AppendDec64[T uint64|int64](dst []T, src []byte) (out []T, err error)
func AppendBlockDec64[T uint64|int64](dst []T, src []byte) (out []T, remain []byte, err error)

Seekable streams

Force creation of blocks at fixed boundaries to enable arbitrary position decoding and reversed iteration

The idea of fixed offsets doesn't sound like a great solution to seeking.

The easiest is to limit the "frames" (or whatever you call the collection of blocks) you generate. This will allow you to skip forward just by reading the first byte and skipping that number of bytes.

For sorted seeking, however you would need to add an optional min/max value. You only really need 1 bit to signify a min/max value follows the size. So if you limit the size of frames (for 32 bits), you have plenty of bits left to add an indicator that a min/max value follows.

Say your frames are max 1<<20 entries each. This means that you have 4MB of uint32 data. Considering the decompression speed you don't really need more fine grained seeking. The frame size can easily be configurable on the compression side.

This will as a side effect also allow you to compress and decompress these frames concurrently.

If you want reverse seeking you need an additional block at the end with the frame size, so you can skip backwards.

You can of course also have the index separately, where you for each frame record uncompressed length, compressed length and min/max. This would be similar to the index I made for S2 - except for the min/max part.

A separate index of course means that you don't have to modify anything, except make frames smaller.

Appending to compressed buffer should not mutate the buffer

When small blocks are added to a compressed buffer, the end of the compressed buffer is rewritten to make a bigger block with better compression.

This behaviour is not transparent to the user and easily leads to panics when one needs to read and add to compressed buffers concurrently.

A better approach would be to recompress small blocks when the output buffer needs to grow.

Consider optimizing ntz=0

A lot of the packing/unpacking code has variable shifts for ntz. These are relatively expensive on Intel.

However, it is quite likely that ntz==0, so I suggest you generate a version of your code that omits these.

Fast uncompressed block

When you have an incompressible block that yields the same or more than BitPackingBlockSizeXX+1, you should have a fast uncompressed mode.

You have plenty of unused space in your blocks headers:

  • bitlen gets 7 bits, uses 5 or 6.
  • ntz gets 8 bits, only uses 6.

You can either have a "magic" all ones header to indicate that the block is uncompressed or you can pack the values tighter and use one of the leftover bits.

That will make compression (after and deltaBitLenAndSign) and decompression a memcopy.

Is it possible to implement Search/Seek methods?

Is it theoretically possible?
Example:

comp := intcomp.compress([10,20,30])

// get number by value
incomp.Search(comp, 10) // 1
incomp.Search(comp, 20) // 2
incomp.Search(comp, 30) // 3

// find value >= v
incomp.Seek(comp, 8) // 10
incomp.Seek(comp, 10) // 10
incomp.Search(comp, 17) // 20
incomp.Search(comp, 21) // 30
incomp.Search(comp, 31) // -1

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.