Giter Club home page Giter Club logo

rtree's Introduction

The rtree package offers fast Euclidean within-distance checks and KNN calculations for points in 2D space. It offers significant speed-ups vis-a-vis simple implementations by relying on the R-tree data structure implemented by the Boost geometry library.

rtree was inspired by this example in the Rcpp gallery.

Installation

From CRAN

install.packages("rtree")

Development version

# install.packages("remotes") # Install if needed
remotes::install_github("akoyabio/rtree")

Note: As of version 0.2.0, rtree requires R version 4.0.0 or higher. This is because version 1.75 of boost::geometry requires C++14 which is not well supported in Windows R versions before 4.0.0.

Usage

Say we have two large sets of points, A and B, stored as 2-column matrices of Cartesian coordinates:

## Simulate point coordinates
set.seed(0)
A_n <- 10^4
A <- cbind(runif(A_n), runif(A_n))
B_n <- 10^4
B <- cbind(runif(B_n), runif(B_n))
colnames(A) <- colnames(B) <- c('x', 'y')

Within-Distance Calculation

For each point of set A, ai, we want to know all points of set B that are within distance d of ai. To compute this, we first create an R-Tree index on B:

library(rtree)

## Set index
B_rtree <- RTree(B)

The RTree() function creates an S3 object of class RTree,

inherits(B_rtree, 'RTree')
## [1] TRUE

which essentially just points to a C++ object of class RTreeCpp.

Using the RTree object, we can now perform our query efficiently:

## Within distance calculation
d <- 0.05
wd_ls <- withinDistance(B_rtree, A, d)

wd_ls is a list of length nrow(A)

nrow(A)==length(wd_ls)
## [1] TRUE

…whereby the ith list element contains the row-indices of the points in B that are within distance d of point ai:

print(wd_ls[[1]])
##  [1] 9999 5736  819 2654 7768 8949 2849 5398 7940 1856  622 2151 5223 5964 6410
## [16] 3520 5320 2265 8569 3385 7011  246 4380 9875 9627 2508 6440 2678 4310 1207
## [31] 8408 4945 4402 6573  979 3394 8919 8232 7790 5144 2819 5167 6514 4973 5952
## [46] 8468 1283 7806  900 1277 1233  514 4225 7512 5313 8187 5626 4013 1661 9721
## [61] 4004  475 6321 1632 1772 6458 2379  686 1082 1629 1931 8422 8945  739 9470
## [76] 2515 1459 7517 1151 3991 3070 6498 5770 9752 7770

We can also check the sanity of the result visually:

## Plot points in B within distance d of point a_1
a_1 <- A[1,]  # Get coords of a_1
plot(a_1[1], a_1[2], xlim=c(a_1[1]-d, a_1[1]+d), ylim=c(a_1[2]-d, a_1[2]+d), 
     col='black', asp=1, pch=20, xlab='x', ylab='y')  # Plot a_1
points(B[,1], B[,2], col='grey')  # Plot B in grey
symbols(a_1[1], a_1[2], circles=d, add=TRUE, inches=FALSE)  # Draw circle of radius d
b_wd <- B[wd_ls[[1]],]  # Get relevant points in B
points(b_wd[,1], b_wd[,2], col='red', pch=20)  # Plot relevant points in red

Within distance sanity check.

Nearest Neighbor Calculation

For each point of set A, ai, we want to know the k points in B closest to ai. Recycling the RTree object created above, we perform the knn computation…

## KNN calculation
k <- 10L
knn_ls <- knn(B_rtree, A, k)

…which returns a list of the same format as above, with the exception that each element of knn_ls is exactly of length k.

Again, we may plot the result to inspect its veracity:

## Plot points in B within distance d of point a_1
a_1 <- A[1,]  # Get coords of a_1
plot(a_1[1], a_1[2], xlim=c(a_1[1]-d, a_1[1]+d), ylim=c(a_1[2]-d, a_1[2]+d), 
     col='black', asp=1, pch=20, xlab='x', ylab='y')  # Plot a_1
points(B[,1], B[,2], col='grey')  # Plot B in grey
b_knn <- B[knn_ls[[1]],]  # Get relevant points in B
points(b_knn[,1], b_knn[,2], col='red', pch=20) # Plot relevant points in red

KNN sanity check.

Benchmarking

Within-Distance Benchmarks

We first compare the within-distance functionality to the gWithinDistance() function offered in rgeos (version 0.5.5).

## Load packages
library(sp)
library(rgeos)
library(rbenchmark)

## Simulate data
set.seed(0)
A_n <- 10^3
A <- cbind(runif(A_n), runif(A_n))
B_n <- 10^3
B <- cbind(runif(B_n), runif(B_n))
d <- 0.05

## Encapsulate wd operations in functions, then benchmark
rgeos.wd <- function() {
  wd_mat <- gWithinDistance(spgeom1=SpatialPoints(A), spgeom2=SpatialPoints(B), 
                            dist=d, byid=TRUE)
}
rtree.wd <- function() {
  wd_ls <- withinDistance(RTree(B), A, d)
}
bm.wd <- benchmark(rtree=rtree.wd(),
                   rgeos=rgeos.wd(),
                   replications=10,
                   columns=c("test", "replications", "elapsed", "relative"))

## Print output
print(bm.wd)
##    test replications elapsed relative
## 2 rgeos           10    4.67   116.75
## 1 rtree           10    0.04     1.00
## Plot
barplot(bm.wd$relative, names.arg=bm.wd$test,
        ylab="Relative Time Elapsed", cex.main=1.5)
mtext("within distance", line=3, cex=1.5, font=2)
speedup <- round(bm.wd$relative[bm.wd$test=="rgeos"], 1)
mtext(paste("rtree ", speedup, "x faster than rgeos", sep=""), 
      line=1.5, cex=1.25)

KNN Benchmarks

Next we compare the KNN functionality with the KNN implementation based on d-trees offered in the FNN package (version 1.1). We don’t offer benchmarking statistics against a linear search KNN implementation, which would obviously be much, much slower.

## Load packages
library(FNN)

## Simulate data
set.seed(0)
A_n <- 10^4
A <- cbind(runif(A_n), runif(A_n))
B_n <- 10^4
B <- cbind(runif(B_n), runif(B_n))
k <- 100L

## Encapsulate knn operations in functions, then benchmark
kdtree.knn <- function() {
  nn.idx <- get.knnx(data=B, query=A, k=k, algorithm=c("kd_tree"))
}
rtree.knn <- function() {
  nn_ls <- rtree::knn(RTree(B), A, k)
}
bm.knn <- benchmark(rtree=rtree.knn(),
                    kdtree=kdtree.knn(),
                    replications=10,
                    columns=c("test", "replications", "elapsed", "relative"))

## Print output
print(bm.knn)
##     test replications elapsed relative
## 2 kdtree           10    1.45    1.667
## 1  rtree           10    0.87    1.000
## Plot
barplot(bm.knn$relative, names.arg=bm.knn$test,
        ylab="Relative Time Elapsed", cex.main=1.5)
mtext("KNN", line=3, cex=1.5, font=2)
speedup <- round(bm.knn$relative[bm.knn$test=="kdtree"], 1)
mtext(paste("rtree ", speedup, "x faster than FNN (kd-tree)", sep=""), 
      line=1.5, cex=1.25)

rtree's People

Contributors

ab-kent avatar hunzikp avatar

Stargazers

Andrew Allen Bruce avatar Ivann avatar  avatar  avatar  avatar Yunfei Li avatar Viraj Noorithaya avatar  avatar  avatar Srikanth K S avatar Andrew Gene Brown avatar  avatar César Herrera avatar Siqi Zhang avatar Martin Haringa avatar  avatar Dave avatar Luke Smith avatar  avatar John Sheffield avatar  avatar HarryZhu avatar Roberto Salas avatar Carl Müller-Crepon avatar

Watchers

HarryZhu avatar  avatar

rtree's Issues

can't compile

Hi! can you add the Makefile please?

> library(devtools)
> install_github("hunzikp/rtree")
Downloading GitHub repo hunzikp/rtree@master
from URL https://api.github.com/repos/hunzikp/rtree/zipball/master
Installing rtree
'/usr/local/Cellar/r/3.4.1_1/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL  \
  '/private/var/folders/97/zcd8xrfd5bx569t_md9b37yr0000gn/T/RtmpGGTFq7/devtools594427b81a0a/hunzikp-rtree-8017a0b'  \
  --library='/usr/local/lib/R/3.4/site-library' --install-tests 

* installing *source* package ‘rtree’ ...
** libs
make: Nothing to be done for `all'.
installing to /usr/local/lib/R/3.4/site-library/rtree/libs
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Error: package or namespace load failed for ‘rtree’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/usr/local/lib/R/3.4/site-library/rtree/libs/rtree.so':
  dlopen(/usr/local/lib/R/3.4/site-library/rtree/libs/rtree.so, 6): no suitable image found.  Did find:
	/usr/local/lib/R/3.4/site-library/rtree/libs/rtree.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x03
	/usr/local/lib/R/3.4/site-library/rtree/libs/rtree.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x03
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/usr/local/lib/R/3.4/site-library/rtree’
Installation failed: Command failed (1)

installation issues

Just discovered this package and interested in exploring it. I am having trouble installing from github. Has anyone else reported a similiar issues? Using R version 4.0.2 on Windows machine. Any suggestions? Thanks in advance.

CRAN submisssion?

@hunzikp Could I please ask whether this package is being considered for submission to CRAN? With the BH version patch from @AB-Kent in the @akoyabio fork if possible? Motivation: faster finding of spatial point neighbours.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.