Comments (6)

wrobstory commented on May 11, 2024

A question: how constrained can Chest's API be in terms of types?

I ask because I think it would be interesting/fun/useful to implement either Thrift or Protobuf (or maybe we could try both and benchmark them) as a backend for Chest and be able to avoid pickle, because it's pickle.

Another thought: If we only need mapping between keys and arrays, we might be able to create a custom serialization format around bcolz, and then we get all the benefits that entails.
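For the keys-to-arrays case, here is a minimal sketch of a pickle-free store. It swaps in NumPy's `.npy` format (`np.save`/`np.load` with `allow_pickle=False`) as a stand-in for bcolz, which it does not use. The class name `NpyArrayStore` and the one-file-per-key layout are hypothetical, and keys are assumed to be filename-safe strings.

```python
import os
import tempfile

import numpy as np


class NpyArrayStore:
    """Hypothetical key -> ndarray store: one .npy file per key, no pickle."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def _filename(self, key):
        # Assumption: keys are simple strings that are safe to use as file names.
        return os.path.join(self.path, key + ".npy")

    def __setitem__(self, key, value):
        if not isinstance(value, np.ndarray):
            raise TypeError("only numpy arrays are supported")
        # allow_pickle=False makes np.save reject object arrays, so nothing is pickled.
        np.save(self._filename(key), value, allow_pickle=False)

    def __getitem__(self, key):
        try:
            return np.load(self._filename(key), allow_pickle=False)
        except FileNotFoundError:
            raise KeyError(key) from None


store = NpyArrayStore(tempfile.mkdtemp())
store["x"] = np.arange(10)
print(store["x"].sum())  # 45
```

Restricting values to arrays is what makes dropping pickle possible: the `.npy` header records only dtype, shape and memory order, followed by the raw buffer.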

Thoughts? This is a problem I'm interested in hacking on.

from dask.

mrocklin commented on May 11, 2024

One could parametrize chest with strict key and value types and do some type checking. Not sure if this should live outside or inside the core Chest class.
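A minimal sketch of the "outside the core class" option: a hypothetical `TypedStore` wrapper that type-checks keys and values before delegating to any backing mapping. A plain dict is used here; wrapping a Chest instance is the assumed real target. None of this is part of chest's API.

```python
from collections.abc import MutableMapping


class TypedStore(MutableMapping):
    """Hypothetical wrapper that enforces key/value types on any mapping."""

    def __init__(self, key_type, value_type, backend=None):
        self.key_type = key_type
        self.value_type = value_type
        self.backend = {} if backend is None else backend

    def __setitem__(self, key, value):
        # Check both types up front so a bad write never reaches the backend.
        if not isinstance(key, self.key_type):
            raise TypeError(f"key must be {self.key_type!r}, got {type(key)!r}")
        if not isinstance(value, self.value_type):
            raise TypeError(f"value must be {self.value_type!r}, got {type(value)!r}")
        self.backend[key] = value

    def __getitem__(self, key):
        return self.backend[key]

    def __delitem__(self, key):
        del self.backend[key]

    def __iter__(self):
        return iter(self.backend)

    def __len__(self):
        return len(self.backend)


s = TypedStore(str, bytes)
s["a"] = b"payload"
```

Keeping the checks in a wrapper leaves the storage class untouched; the cost is one extra layer of indirection on every access.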

Chest currently can be parametrized by dump/load functions so swapping in other protocols can be made to work that way.
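To illustrate parametrizing by dump/load functions, here is a hypothetical, self-contained `FileStore` (not chest's actual class). It accepts any `dump(obj, file)`/`load(file)` pair that follows the same calling convention as `pickle.dump`/`pickle.load`. Swapping in `json.dump`/`json.load` shows how another protocol plugs in; a Thrift or Protobuf backend would presumably need small adapter functions of the same shape.

```python
import json
import os
import pickle
import tempfile


class FileStore:
    """Hypothetical disk-backed mapping with pluggable serialization.

    dump(obj, file) and load(file) follow the pickle.dump / pickle.load
    convention, so any serializer offering that pair can be swapped in.
    """

    def __init__(self, path, dump=pickle.dump, load=pickle.load, mode="b"):
        self.path = path
        self.dump = dump
        self.load = load
        self.mode = mode  # "b" for binary serializers, "" for text ones like json
        os.makedirs(path, exist_ok=True)

    def _filename(self, key):
        return os.path.join(self.path, str(key))

    def __setitem__(self, key, value):
        with open(self._filename(key), "w" + self.mode) as f:
            self.dump(value, f)

    def __getitem__(self, key):
        try:
            f = open(self._filename(key), "r" + self.mode)
        except FileNotFoundError:
            raise KeyError(key) from None
        with f:
            return self.load(f)


pickled = FileStore(tempfile.mkdtemp())
pickled["a"] = {"x": [1, 2]}

as_json = FileStore(tempfile.mkdtemp(), dump=json.dump, load=json.load, mode="")
as_json["a"] = {"x": [1, 2]}
```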

One reason that I actually like pickle is that it handles numpy and pandas very nicely (as long as you use pickle.dump(..., protocol=2)). I'm pretty confident that this does almost no manipulation of the actual data bytes (though I'm sure that the metadata is entirely mangled).
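A quick sketch for checking the numpy claim. One caveat: `protocol=2` was the highest protocol on Python 2. On Python 3, protocol 2 has no native bytes opcode (that arrived in protocol 3), so raw buffers are routed through a latin-1 string and can grow. The sketch therefore uses `pickle.HIGHEST_PROTOCOL`, under which the pickle should be essentially the array's raw bytes plus a small amount of metadata.

```python
import pickle

import numpy as np

x = np.random.default_rng(0).random(1_000_000)  # 8 MB of float64 data

# With protocol 3+ the array buffer is written as a bytes payload, nearly verbatim.
data = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
y = pickle.loads(data)

assert np.array_equal(x, y)
print(x.nbytes, len(data))  # raw buffer size vs. pickle size; the gap is the metadata
```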

Using bcolz (or compression generally) also makes sense. One thing to beware of is that bcolz often trades CPU for memory bandwidth. In the context of dask we may not have excess CPU to spare, so minimal compression may be a better choice than it would be in a normal, single-core case.
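The CPU-versus-bandwidth trade-off can be made a tunable knob. A minimal sketch, using the standard library's `zlib` as a stand-in for bcolz/Blosc (which it is not): a hypothetical `make_compressed_serializers(level)` returns `dumps`/`loads` functions, where a low level spends little CPU and a high level spends more for usually smaller output.

```python
import pickle
import time
import zlib


def make_compressed_serializers(level):
    """Hypothetical helper: pickle wrapped in zlib at a chosen compression level.

    level=0 stores the data uncompressed (cheapest CPU), 1 is fast,
    9 is the slowest and usually the smallest.
    """
    def dumps(obj):
        raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        return zlib.compress(raw, level)

    def loads(blob):
        return pickle.loads(zlib.decompress(blob))

    return dumps, loads


payload = list(range(200_000))
for level in (0, 1, 9):
    dumps, loads = make_compressed_serializers(level)
    start = time.perf_counter()
    blob = dumps(payload)
    elapsed = time.perf_counter() - start
    assert loads(blob) == payload
    print(f"level={level}: {len(blob):,} bytes in {elapsed:.4f} s")
```

Which level wins depends on whether the workload is CPU-bound or I/O-bound, so measuring on real data is the only reliable guide.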

mrocklin commented on May 11, 2024

@wrobstory if you're at all interested in any of this I'd be thrilled.

I think that efficient spill-to-disk data structures are a potentially very useful and relatively untapped development space. I'd love to see someone revamp chest or even invent something newer and better (chest was a quick-and-dirty evening project).

mrocklin commented on May 11, 2024

I've added a naive LRU solution for chest in mrocklin/chest@24210a5
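The commit is not reproduced here. As a generic illustration of the idea, here is a hypothetical `LRUSpillDict` that keeps at most `max_in_memory` values in RAM and pickles the least recently used one to a temporary directory when that limit is exceeded. Budgeting by item count is a simplification; a real spill-to-disk store would more likely budget by bytes.

```python
import itertools
import os
import pickle
import tempfile
from collections import OrderedDict


class LRUSpillDict:
    """Hypothetical spill-to-disk mapping with naive LRU eviction."""

    def __init__(self, max_in_memory, path=None):
        if max_in_memory < 1:
            raise ValueError("max_in_memory must be at least 1")
        self.max_in_memory = max_in_memory
        self.path = path or tempfile.mkdtemp()
        self.inmem = OrderedDict()  # key -> value, least recently used first
        self.ondisk = {}            # key -> filename of the pickled value
        self._counter = itertools.count()

    def __contains__(self, key):
        return key in self.inmem or key in self.ondisk

    def __setitem__(self, key, value):
        if key in self.ondisk:  # a fresh value supersedes the spilled copy
            os.remove(self.ondisk.pop(key))
        self.inmem[key] = value
        self.inmem.move_to_end(key)  # newest entry is most recently used
        while len(self.inmem) > self.max_in_memory:
            self._spill_oldest()

    def __getitem__(self, key):
        if key in self.inmem:
            self.inmem.move_to_end(key)  # touching a key refreshes it
            return self.inmem[key]
        if key not in self.ondisk:
            raise KeyError(key)
        filename = self.ondisk.pop(key)
        with open(filename, "rb") as f:
            value = pickle.load(f)
        os.remove(filename)
        self[key] = value  # reload into memory; this may spill another key
        return value

    def _spill_oldest(self):
        key, value = self.inmem.popitem(last=False)
        filename = os.path.join(self.path, f"{next(self._counter)}.pkl")
        with open(filename, "wb") as f:
            pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)
        self.ondisk[key] = filename


d = LRUSpillDict(max_in_memory=2)
d["a"] = 1
d["b"] = 2
d["a"]      # touch "a", so "b" becomes the least recently used key
d["c"] = 3  # exceeds the limit and spills "b" to disk
```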

mrocklin commented on May 11, 2024

And significantly reduced write costs by being less dumb in mrocklin/chest@ba15e5e

mrocklin commented on May 11, 2024

And it's now threadsafe (maybe). I'm going to close this for now. Chest meets immediate requirements for use with numpy+dask things. Blogpost forthcoming.
