Giter Club home page Giter Club logo

flatbush's Introduction

Stand with Ukraine

Hi! I'm Volodymyr Agafonkin.

I'm a software engineer. I created Leaflet, the number one library for interactive web maps, and maintain 40+ other open source projects with a focus on algorithms, computational geometry and data visualization. I'm building the future of maps at Mapbox. Here are a few of my best tech articles: https://agafonkin.com

I'm a musician. I write songs, play guitar and sing in Obiymy Doschu. If you like beautiful, evocative rock music with lush string arrangements, check out our last album.

I'm a father to beautiful 9-year-old twin girls, I'm happily married and live in Kyiv, Ukraine. I love baking, photography, strength training, martial arts, reading sci-fi and fantasy, and exploring quiet parks. You can find tidbits of my life on X (Twitter), Instagram and Facebook.

flatbush's People

Contributors

0xflotus avatar jbuckmccready avatar jdesboeufs avatar kylebarron avatar luciopaiva avatar mourner avatar msfstef avatar orangejulius avatar tomchadwin avatar tyrasd avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

flatbush's Issues

Try STR packing instead of Hilbert

STR packing should make indexing around ~30–40% slower (more sorting to do), but theoretically make searches ~30% faster (less node overlap), which might be a great tradeoff.

resize (shrink) during finish()?

hi @mourner,

i was wondering if it is possible to resize the index before calling finish()? looking at the code, i think it's possible.

my situation is that i have 100k "objects" (stored as flat arrays of facets, e.g.: prices: [12,19,25,...], gdps: [1000,500,2300,...] ) that i need to render on an x/y chart. i don't know the coords up front as they are calculated from the facet values during rendering. most of these objects may be omitted or clustered if they are too small to render, so i don't need them in the spatial index for hit testing. i may only end up rendering 10k, but i don't know in advance without wasting mem or cpu which ones may end up in this subset. so what i'd like to do is specify a 100k maximum number of items when instantiating Flatbush, but then trim() it to the final size during calling finish().

does this sound like something that can be supported?

thanks!

document what search returns

I think it would be helpful for the documentation to explain when a search is found.

Though my testing, I found that inside, exact match, intersecting, single corner touching, and edge touching are all found, only strictly outside was not. Do you think we should add a diagram to illustrate this or just document as text?

const Flatbush = require('flatbush')

const index = new Flatbush(1)
index.add(0, 0, 10, 10)
index.finish()

// inside
console.log(index.search(1, 1, 9, 9))

// exact match
console.log(index.search(0, 0, 10, 10))

// intersecting
console.log(index.search(5, 5, 15, 15))

// single corner touching
console.log(index.search(10, 10, 20, 20))

// edge touching
console.log(index.search(0, 10, 10, 20))

// outside
console.log(index.search(20, 20, 30, 30))

Why generate/store tree nodes in bottom-up order?

As noted:

// generate nodes at each tree level, bottom-up

Why I ask is, if you utilize the flatbush structure for very large indexes and you are interested in doing a tree search remotely via network or other high-latency it will require fetching the nodes backwards from the end which is an I/O pattern not usually optimized for using prefetch/cache techniques and even more problematic for sources that cannot seek backwards efficiently.

For reference these thoughts was triggered by observations on the flatbush structure as implemented in FlatGeobuf, tracked by flatgeobuf/flatgeobuf#39.

KNN with flatbush?

Hi there,

Sorry to be jumping on this so soon in the development process, but the performance improvements over rbush make this package very exciting 😁

  1. Will it be possible to use rbush-knn with flatbush instead?
  2. Is there anything I can do to help?

Thanks for all the great work!

Investigate radix sorting for faster indexing

From @jbuckmccready:

I wonder if another sorting algorithm, e.g. a type of radix/bin sort, could be used to partially sort down to the nodeSize buckets more quickly. Since we're sorting integers and we know the bucket size ahead of time.

I did some research after finding this article, and adapted fast MSB in-place radix sort to JavaScript here, also improving it to avoid memory allocations: https://gist.github.com/mourner/a1e0e8f48e0125c541d88199ce0b6479

It seems to work ~2x faster than native sort and ~3x faster than manual quicksort, which is very promising! The tricky part is adapting this code to parallel sorting of multiple arrays like we do in Flatbush — ideally that would mean refactoring it so that there are no direct value assignments, only calls to swap(arr, i, j).

.neighbors() call results in "out of memory" error

Hi @mourner, I hope you are well!

Upgrading to the latest flatbush version from 3.3.0 produced a regression in .neighbors() when k = Infinity, resulting in an 'out of memory' error:

const Flatbush = require('[email protected]');
const bush = new Flatbush(1);
bush.add(1,2,3,4);
bush.finish();
bush.neighbors(1, 2);

It's related to the change in 6903d4b but I didn't trace the root cause — please let me know if I should look into that.

Serialization??

Hi @mourner,
Thank you for a great library and all the spatial indexing libraries. It has been great help. Is it possible to serialize & persist the index to be used later? and your other libraries spatial libraries in general?

If not implemented is it a good Idea worth if I look in to it?

Thanks

search performance improvement: remove upperBound binary search

Hello, I was looking through the Flatbush code and wondered whether there is some performance to be gained by eliminating the upperBound call in search. As you build the queue, you can track the current level of each node, or iterate over the tree level by level instead. Since you’d know the current level, end would be an index lookup into _levelBoundaries instead of the binary search over it.

I don’t know a lot about Javascript performance nuances, but it seems like that could lead to a significant search performance improvement - at least for larger trees/queries.

Mostly just curious :)

replace FlatQueue with Heapify?

hey @mourner,

this looks quite promising: https://github.com/luciopaiva/heapify

Flatbush already uses typed arrays, so there should be no new compat issues.

  • 3x faster initial build (assuming it can be used instead of a push loop), otherwise...
  • 2x faster pushes

looks like it requires knowing the queue capacity in advance, but since Flatbush is static anyhow, this should already be known?

Nested rects, get innermost neighbor first?

Hi there,

thanks for the great lib!

I'm using it to do cursor-over detection, where naturally a lot of boxes are nested within each other. Now when I retrieve the next neighbor for any given position I'd want to get the innermost box first.

E.g.:
image

Assuming the red dot is between blue-right and green-right, I'd want to get blue anywhere left of it and green anywhere right from it.

Is that possible / within scope for this lib?

Can it work with rotated rectangles?

I am finally ready to try out an implementation for rbush-knn :) and I was wondering about another issue: can flatbush made to work with rotated rectangles? I think I can write a custom collision function, but is there anything about how the rectangles are indexed (assuming indexing on the bounding box of each rectangle, then collision based on the actual shape) that would make this impossible?

Thank you!

`Flatbush.from` with `Uint8Array` input viewing external `ArrayBuffer`

👋 In kylebarron/geo-index I created a new Rust port of Flatbush and Kdbush. This is different from all previous ports I've seen in that a requirement is maintaining FFI compatibility with your reference JS implementations. This means that the Rust library (or a JS binding to it) can produce an index that Flatbush can read, and vice versa.

One interesting use case is creating a Rust-based WebAssembly library which is able to pass your Flatbush JS a zero-copy slice that Flatbush JS is able to read from. That is, WebAssembly has its own linear memory space contained in an ArrayBuffer, and I'd love to have zero-copy interoperability between a Uint8Array view exported by the Wasm module and Flatbush JS. This doesn't work today because Flatbush requires ArrayBuffer input, so a copy outside of Wasm's memory is needed.

The question here is: would you be interested in a PR that refactors to store a Uint8Array internally instead of an ArrayBuffer? I think the only changes would be to ensure the array views (like here) are created on the specific slice of the underlying buffer referenced by the Uint8Array (taking into account the view's byteOffset).

Risk of divide by zero

At

flatbush/index.js

Lines 130 to 131 in a71e9a5

const x = Math.floor(hilbertMax * ((minX + maxX) / 2 - this.minX) / width);
const y = Math.floor(hilbertMax * ((minY + maxY) / 2 - this.minY) / height);
width or height could be zero.

Error [ERR_REQUIRE_ESM]: require() of ES Module when using remix

I am trying to use this package with https://github.com/remix-run/remix, but I'm getting the error:

Error [ERR_REQUIRE_ESM]: require() of ES Module /home/vedantroy/Desktop/pdf-testing/remix-pdf/node_modules/flatbush/index.js from /home/vedantroy/Desktop/pdf-testing/remix-pdf/build/index.js not supported.
Instead change the require of /home/vedantroy/Desktop/pdf-testing/remix-pdf/node_modules/flatbush/index.js in /home/vedantroy/Desktop/pdf-testing/remix-pdf/build/index.js to a dynamic import() which is available in all CommonJS modules.
    at Object.<anonymous> (/home/vedantroy/Desktop/pdf-testing/remix-pdf/build/index.js:868:31)

From reading this error, it seems like "index.js" is not transpiled (i.e it's an ES module and not UMD), but flatbush.js inside of node_modules is transpiled.

I guess the fix would be to have index.js be UMD format. Although, I assume there's a reason you haven't done this.

maxDistance units when using wgs84 data

Hi, i cannot find the maxDistance unit from neighbors method in readme.

what is the value i need to set if i want neighbors in max 100 meters from x,y using wgs84 bboxs?

thank you

simple existence check

hi @mourner — i think there may be some utility in having a function similar to search but is more efficient for the use case of when we want to simply check if at least 1 result exists such that the first value found in the search that passes the filter function makes it return true. this might only really be useful if the filterFn we're sending does somewhat heavy computation, and i guess it might just make more sense to accept the results and filter outside the hash. i don't know how the search works internally, but if there's any optimizations for an exists function, that'd be awesome!

Consider implementing search as a generator?

I have a collection of features in a map. When the user clicks somewhere on the map, I need to be able to determine if a feature I care about is within n distance from the coordinate they clicked. So I create a bounding box around the coordinate and then run search.

However, currently search returns an array of features within the bounding box I've provided (similar to how rbush works. In some use cases I just need to determine if there is a feature within the extent and then exit. It may be interesting in some cases to be able to be able to yield the values in the bounding box instead so that a higher layer can abort the search early when the search would otherwise produce multiple items.

Not sure if you've played with these ideas already, I was planning to experiment with this concept with a fork of rbush.

Feel free to close if you'd not like to discuss this here. Best!

Proper way to index points

This may be a silly question, but is it correct to use degenerate bounding boxes for indexing points? This module seems otherwise spot-on, but if I'm interested in points only, is there a variant of the *bush modules that would be more appropriate?

Contain numItems, nodeSize and possibly array type in data buffer?

Currently, transferring and recreating the index is a bit cumbersome because of the need to also transfer other parameters associated with the index:

self.postMessage({
  numItems: index.numItems,
  nodeSize: index.nodeSize,
  data: index.data
}, [index.data]);
...
const index = new FlatBush(e.data.numItems, e.data.nodeSize, Int32Array, e.data.data);

What if we include numItems and nodeSize to the data buffer so they don't need to be transferred? We can also go further and include a numeric indicator of the typed array, so that we only need an array buffer and nothing else to recreate the full index:

self.postMessage(index.data, [index.data]);
...
const index = FlatBush.from(e.data);

The number of items and nodeSize will have to be Uint32Array because we can't guess the index type from the buffer alone. We would probably put them at the start of data. The type would be a number from 0 to 8.

@tyrasd what do you think, worth it?

getting back bounding box coords of found indices?

hey @mourner

i'm currently reading back bounding box coords to render a hover rect over the matched index, it looks something like below bound to mousemove. this works well enough for my-sized datasets. i assume there isn't anything more efficient than let offs = index._indices.indexOf(idx) * 4; besides wasting mem/cpu to store some kind of additional mapping between returned index and coords externally?

function searchFlat(xMin, yMin, xMax, yMax) {
  let flatIdxs = index.search(xMin, yMin, xMax ?? xMin + 1, yMax ?? yMin + 1);

  if (flatIdxs.length > 0) {
    for (let j = 0; j < flatIdxs.length; j++) {
      let idx = flatIdxs[j];

      let offs = index._indices.indexOf(idx) * 4;
      let minX = index._boxes[offs++];
      let minY = index._boxes[offs++];
      let maxX = index._boxes[offs++];
      let maxY = index._boxes[offs++];

      if (hoveredIdx !== idx) {
        hoverStyle.display = "block";

        hoverStyle.top = `${Math.round(minY / pxRatio)}px`;
        hoverStyle.left = `${Math.round(minX / pxRatio)}px`;
        hoverStyle.width = `${Math.round((maxX - minX) / pxRatio)}px`;
        hoverStyle.height = `${Math.round((maxY - minY) / pxRatio)}px`;

        hoveredIdx = idx;
      }

      break;
    }
  }
  else {
    hoverStyle.display = "none";
    hoveredIdx = null;
  }
}

Special condensed packing for point data

I wonder if it's possible to use twice less memory specifically for point data, where we know that maxX and maxY duplicate minX and minY. This could look like an option in the constructor (onlyPoints = false), and then adding special case handling for indexed leaves.

Need babel transform

When I pack it with webpack, I found that the exported file is not transformed by babel。
Uglifyjs caught an error because of es6 "class":
image
image

version: 3.1.1

Performance Improvements

Related to investigating the radix sort mentioned in #30 I found a few other ways to improve performance in my c++ usage.

In the quicksort function when the sub array is small enough (I found 8 to be a good size) switch to insertion sort, this resulted in about a 3% improvement when indexing.

// still stop sorting if within same node
if (left / NodeSize >= right / NodeSize) {
  return;
}

if (right - left < 8) {
  // faster than continuing with the quick sort
  insertionSort(values, boxes, indices, left, right);
  return;
}

When searching/querying the index I allow an existing container to be passed in to be used for both the results and the stack for traversal. This was a huge performance boost (20-30%) since I often perform the search in tight loops which would involve many allocation/deallocations for both the results and stack for traversal.

Default add() arguments to zero-width rects?

Flatbush is often used to index points, i.e., zero-width rectangles.

How do you feel about changing the signature of add to

add(minX, minY, maxX = minX, maxY = minY)

as a minor DX improvement for point indexing, so that repeating the coordinates can be avoided in this case?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.