Comments (8)

daviddias avatar daviddias commented on May 20, 2024

0.4.0 should respect whatever is defined in the spec. If the spec is missing something, now is a good time to add it, so that we can all trust the spec when creating our implementations :) @whyrusleeping might have some thoughts though :)

from specs.

greglook avatar greglook commented on May 20, 2024

In the Clojure blocks library the file-store implementation follows the spec, with three subdirectory levels. I'm definitely interested to hear if there are any changes I should be making to stay in sync.

masylum avatar masylum commented on May 20, 2024

oh! that's relevant to my interests @greglook.

jbenet avatar jbenet commented on May 20, 2024

flatfs in go-ipfs ended up deviating from the spec. it's my fault for not pressing for it -- or not updating the spec.

i think we can migrate go-ipfs to follow the 3 tiers. (ls-ing huge directories is annoying anyway. not all filesystems are good)

jbenet avatar jbenet commented on May 20, 2024

@tv42 was there a strong reason you opted for single tiering instead of a larger fanout? i recall you mentioning it might be slower (more dirs to traverse?), but this likely varies by fs?

greglook avatar greglook commented on May 20, 2024

For filesystem-based stores with good performance you'll probably want to use something like Camlistore's diskpacked store (or the logical version, blobpacked) anyway.

tv42 avatar tv42 commented on May 20, 2024

@jbenet It really comes down to this: single level split is easier to understand and easier to program.

A typical modern Linux FS performs well all the way up to hundreds of thousands or even a few million files in one dir. The current setting of 4-byte fanout (the actual amount of entropy depends on whether the slash prefix gets stripped or not; that's been changed by others enough that I'm no longer clear on it) oughta work well enough up to tens of terabytes, so something else becomes a problem first. I made the split size configurable, so people can fiddle with that if needed.

https://github.com/ipfs/go-ipfs/blob/b9e8c001cf58b59eadea48cdb691c48924d44355/repo/fsrepo/fsrepo.go#L356-L364
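For readers who want to see what a single-level split looks like in practice, here is a minimal sketch. It is not the flatfs code linked above: the function name `shardPath`, the `_` padding for short keys, and the `.data` suffix are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"path"
)

// shardPath maps a key to root/<prefix>/<key>.data, where <prefix> is the
// first prefixLen bytes of the key. Keys shorter than prefixLen are padded
// with '_' so every shard directory name has the same length.
// The padding character and the ".data" suffix are illustrative assumptions.
func shardPath(root, key string, prefixLen int) string {
	prefix := key
	if len(prefix) > prefixLen {
		prefix = prefix[:prefixLen]
	}
	for len(prefix) < prefixLen {
		prefix += "_"
	}
	return path.Join(root, prefix, key+".data")
}

func main() {
	// A hypothetical hex-encoded key with a 4-character prefix.
	fmt.Println(shardPath("blocks", "1220abcdef", 4))
	// A key shorter than the prefix length gets a padded directory name.
	fmt.Println(shardPath("blocks", "12", 4))
}
```

The whole mapping is one slice operation and one join, which is the "easier to understand and easier to program" point: a lookup never has to walk or probe more than one directory level.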

Multi-level split only makes sense if the one-level split would result in the top-level dir containing too many entries (once again, I'd expect >> 100k). By that time I expect you'll have other problems: each of those directories ought to contain at least 100k items too, which puts the total storage easily over 100k * 100k * 256kB ≈ 2.5PB.

Personally, I've seen affordable large storage only work well in JBOD mode, so the amount of data in a single flatfs is likely in the low tens of TB anyway for the near future. That's < 100M objects tops at 256kB per object; even a 256-way split brings that to ~400k objects per dir, which I expect to show no significant deterioration in performance (= even that split is wide enough). You'll suffer more from things like FS inode record keeping overhead than from the fact that it's single-level sharding.
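The back-of-the-envelope numbers above can be reproduced with a few lines of integer arithmetic. This is only a sketch: 25 TB is an assumed stand-in for "low tens of TB", and kB/TB/PB are taken as decimal units.

```go
package main

import "fmt"

const (
	objectSize = int64(256_000)           // 256 kB per object (decimal units assumed)
	terabyte   = int64(1_000_000_000_000) // 10^12 bytes
)

// objectCount returns how many objectSize-byte objects fit in totalBytes.
func objectCount(totalBytes int64) int64 {
	return totalBytes / objectSize
}

// perDir returns the average number of objects per shard directory for a
// single-level split with the given fanout (number of directories).
func perDir(objects, fanout int64) int64 {
	return objects / fanout
}

// twoLevelBytes returns the total size at which a two-level split holds
// dirs top-level directories with itemsPerDir objects in each.
func twoLevelBytes(dirs, itemsPerDir int64) int64 {
	return dirs * itemsPerDir * objectSize
}

func main() {
	objects := objectCount(25 * terabyte) // assumed store size: 25 TB
	fmt.Println("objects in store:", objects)
	fmt.Println("objects per dir at 256-way split:", perDir(objects, 256))
	fmt.Println("bytes at 100k x 100k objects:", twoLevelBytes(100_000, 100_000))
}
```

Under these assumptions the store holds just under 100M objects, a 256-way split leaves roughly 380k objects per directory, and the 100k x 100k case comes out at 2.56 * 10^15 bytes, i.e. about 2.5 PB.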

@greglook That's yet another variation of what I've been calling arena storage. The decision to go with flatfs was because of the combo 1) it's simple 2) we can get it going fast. I agree arena storage can smoke it in performance, mostly because done right, it can manage disk syncs better.

tv42 avatar tv42 commented on May 20, 2024

As for the spec:

  1. "N tiers", nothing specifies N as far as I can see
  2. If N is a configured value (basically like the current flatfs prefixLen, but a/b/c/ instead of abc/, cannot change while there are objects stored), the current single-tier setup with a good prefixLen is just as good until the top-level dir gets too crowded. By which time, I want to take a photo of your hard drive.
  3. If N is not a configured value but dynamic, lookups get costly as they need to probe all possible locations.
  4. If N is not a configured value but dynamic, the spec has a possibility for object and prefix dir name collision.
  5. Even if N is a configured value, for short object names the spec has a possibility for object and prefix dir name collision.
