spThin (ver. 0.2.0)
This branch contains the source code for the spThin R package, version 0.2.0, an update of the original package described in Aiello-Lammens et al. 2015.
I am working with a dataset of occurrence locations for an endangered species, so I am not able to share the data.
I have an Excel file with three columns: species, longitude, and latitude.
I am working with it in R, trying to thin the >400 points because many of them are repeats (individuals within a single population). I have used spThin successfully in the past but am unable to get it to work now. My code is:
# Reading in occurrences (read.xlsx() here is from the xlsx package)
library(xlsx)
occs <- read.xlsx("Bapmeg/BAPMEG_FL.xlsx", sheetIndex = 1)
# Thinning occurrence points
thin(loc.data = occs,
lat.col = "latitude", long.col = "longitude",
spec.col = "name",
thin.par = 0.2, reps = 100,
locs.thinned.list.return = TRUE,
write.files = TRUE,
max.files = 1,
out.dir = "Bapmeg/", out.base = "Bapmeg_Thin",
write.log.file = TRUE,
log.file = "BapMegThin.txt")
However, I keep getting the same error: Error in .subset2(x, i, exact = exact) :
attempt to select less than one element in get1index
I can't figure out why. I saw another thread where someone added a species column and the code worked, but I already have a species column. Is there a minimum number of occurrences necessary? This is an extremely rare species, with probably fewer than 12 occurrences in the area in question.
Thanks! And if I have failed to include information, I apologize. I am very new to all this.
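An editorial note, hedged: this error commonly appears when spec.col names a column that does not exist in loc.data. Given the three columns described above (species, longitude, latitude), a minimal sketch of the likely fix, assuming the column is literally named "species":

```r
library(spThin)

# Check the actual column names first; spec.col must match one exactly.
names(occs)

# Illustrative call: the sheet described above has a "species" column,
# so pass that name rather than "name".
thinned <- thin(loc.data = occs,
                lat.col = "latitude", long.col = "longitude",
                spec.col = "species",   # must match a column name exactly
                thin.par = 0.2, reps = 100,
                locs.thinned.list.return = TRUE,
                write.files = FALSE)
```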
When trying to run thin() I receive this error:
Error in .subset2(x, i, exact = exact) :
attempt to select less than one element in get1index
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
From Cory - "I got an error when i didn't have a species name column, though the help says its an optional column. Easy to fix by adding a column, but seems better to leave it optional."
I am working with data for two species (cine, with 14 locations, and atrip, with 45) separated by c. 350 km and in quite different climates.
I used all 19 BioClim layers.
My code:
enmeval_results <- ENMevaluate(atrip, env, bg = NULL,
                               tune.args = list(fc = c("L", "LQ", "H", "LQH", "LQHP", "LQHPT"),
                                                rm = 1:5),
                               partitions = "jackknife",
                               algorithm = "maxnet")
enmeval_results@results
write.csv(enmeval_results@results, "enmeval_results.csv")
The enmeval_results.csv is exactly the same for both species! The first time it happened I assumed I had accidentally used the same species list, but I have now redone it very carefully and I am still getting exactly the same result. I am clearing the lists from R between each run.
Any ideas please?
@paleo13 Stingy has a negative connotation, in general. So why use this term to describe the heuristic algorithm? Is this a common term for such algorithms?
Hello,
I am running severely out of memory when using thin() on a very large number of occurrence points. I looked into thin.algorithm() and saw that you compute the full set of distances among all points. I experienced a similar problem when trying to compute distances during a data-processing step. The RANN::nn2() function solved my previous issues by (I guess) only looking at neighbors within a user-defined radius. In doing so, there is no need to compute the full distance matrix, which saves both time and memory.
I am writing to check whether you think it is possible to perform the same steps of thin() when substituting
rdist.earth(x1=rec.df.orig, miles=FALSE) < thin.par
with RANN::nn2() using the arguments searchtype = "radius" and radius = thin.par.
I am sorry for bugging you with this. I am an inexperienced student and do not really have anyone else to ask. I would of course not expect you to assist with the function; I just wanted to check whether the full distances are strictly required for steps I am not aware of.
Very much appreciate your time!
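For readers, the idea in this question can be sketched as follows. This is illustrative only, not a tested patch to thin(), and it assumes coords is a two-column matrix of projected coordinates in km: nn2() uses Euclidean distance, whereas rdist.earth() in the current code computes great-circle distance, so unprojected long/lat would need projection first.

```r
library(RANN)

# Radius-restricted neighbor search, avoiding the full pairwise matrix.
# `coords` and `thin.par` are assumed to be defined as in the question.
nn <- RANN::nn2(data = coords, query = coords,
                k = min(nrow(coords), 50),     # cap on neighbors per point
                searchtype = "radius", radius = thin.par)

# Rows of nn$nn.idx list each point's neighbors within `radius`; slots
# past the true neighbor count are padded with 0, so only nonzero
# entries should be counted when tallying neighbors for thinning.
```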
@paleo13 the vignette in your recent push doesn't work because each example uses the arguments lon.col and lat.col, while in the spThin function the arguments are x.col and y.col. My preference is for spThin to have arguments lon.col and lat.col, though I'm not fully married to that idea. Do you have an opinion, and how hard a change would it be to edit spThin, as opposed to the vignette?
Hi, spThin team!
My code is the following:
thinned_dataset_full <-
thin( loc.data = clean_df,
lat.col = "decimalLatitude", long.col = "decimalLongitude",
thin.par = 20, reps = 1,
locs.thinned.list.return = TRUE,
write.files = TRUE,
max.files = 1,
out.dir = "spthin/", out.base = "name",write.log.file = FALSE)
Beginning Spatial Thinning.
Script Started at: Sat Sep 24 15:43:26 2022
Error in vectbl_as_col_location2():
! Must extract column with a single valid subscript.
✖ Subscript `which(names(locs.df) == spec.col)` has size 0 but must be size 1.
Run `rlang::last_error()` to see where the error occurred.
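A hedged reading of this error: the subscript which(names(locs.df) == spec.col) has size 0 when no column matches spec.col, and the call above does not pass spec.col at all. A minimal illustrative workaround, assuming clean_df has no species column ("my_species" is a placeholder name, not anything from the original post):

```r
# Add a constant species column and point spec.col at it.
clean_df$species <- "my_species"

thinned <- thin(loc.data = clean_df,
                lat.col = "decimalLatitude", long.col = "decimalLongitude",
                spec.col = "species",
                thin.par = 20, reps = 1,
                locs.thinned.list.return = TRUE,
                write.files = FALSE)
```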
Hi,
Thanks for creating the package; it has been so helpful. I'm working on a species for which the available data have varying accuracy. The dataset has a column indicating accuracy in meters. Is there a way to include an argument requesting that the most accurate record be retained when thinning?
Thanks,
Fernanda.
From Cory - "It would be very nice if it could just take spatialPointsDataFrames and handle them automatically, rather than specifying all the columns. might be nice to be allowed to write those out as shape files too, rather than csv, in case people want to keep all the spatial metadata."
I'm pretty sure the next version does this, but have to check.
My dataset includes a unique ID column, which I need in order to merge the thinned dataset with columns containing sampling event data. Is there a way to preserve this column while thinning?
I have tried setting this ID column as the spec.col; however, this just replaces all unique ID numbers with one ID number.
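One possible workaround, sketched under the assumption that the thinned data frames returned with locs.thinned.list.return = TRUE carry columns named Longitude and Latitude holding the original coordinate values (worth verifying on your installed version); occs and its ID column here stand in for the poster's data:

```r
# Recover the ID column by joining one thinned replicate back to the
# original data on the coordinate pair. Because thin() keeps the
# original coordinate values, an exact floating-point match should hold.
thinned <- thinned_list[[1]]   # one replicate from the returned list
kept <- merge(thinned, occs,
              by.x = c("Longitude", "Latitude"),
              by.y = c("decimalLongitude", "decimalLatitude"))
```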
From Cory - "I got an error when i didn't provide an out.dir; seems like i shouldn't have to write out the results if i don't want to."
but it does work if you use data.frame(myTibble). Perhaps do that conversion internally in the function and return the same class that was passed in?
Hello,
I am trying to run the thin function on a dataset with ~15,000 data points. I expected the run time to grow steeply as the number of points increased, but this call:
thin(loc.data = V, lat.col = "Latitude", long.col = "Longitude",
     spec.col = "Colony", thin.par = 10, reps = 100,
     locs.thinned.list.return = TRUE, write.files = TRUE, max.files = 5,
     out.dir = "spthin_test/", out.base = "V_thinned",
     write.log.file = TRUE, log.file = "V_spThin_log_file.txt", verbose = TRUE)
where "V" is the dataset with ~15,000 positions and Colony contains just one level, had been running for 14 hours and hadn't finished when I force-stopped it.
**********************************************
Beginning Spatial Thinning.
Script Started at: Thu Aug 23 19:19:13 2018
Timing stopped at: 3.911e+04 1.37e+04 5.287e+04
I am running this on an i7 machine with 16 GB of RAM.
Is this normal behaviour, or is something weird happening?
Thanks!
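A quick back-of-envelope suggests this is expected rather than weird: thin() builds the full pairwise distance matrix, so memory and work grow roughly with the square of the number of points (multiplied across reps):

```r
# Size of one n-by-n matrix of doubles for n = 15000 points.
n <- 15000
n^2 * 8 / 1024^3   # roughly 1.7 GiB per matrix copy, before any reps
```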
@paleo13 is there a way to easily identify which of the datasets resulting from the heuristic thinning match the greatest number of records returned?
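For the original thin() output this can be done directly on the returned list; whether the heuristic method returns the same structure is an assumption to check:

```r
# Illustrative sketch, assuming `thinned_list` is the list returned
# when locs.thinned.list.return = TRUE.
n.kept <- sapply(thinned_list, nrow)      # records retained per replicate
best   <- which(n.kept == max(n.kept))    # indices of the largest dataset(s)
```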
Hi,
I experienced a problem saving files from thin(), which I describe below together with the modification I made to the "thin.R" script to change how the files are named.
PROBLEM: saving thinned data by setting the option write.files = TRUE in spThin::thin().
The thin() function has the option to save each species' thinned dataset as it is generated. However, in the source code (https://github.com/mlammens/spThin/blob/master/R/thin.R), the way the file names are created makes subsequent csv file names keep growing in length: if the first csv is saved as "new.csv", the second is saved as "new_new.csv", the third as "new_new_new.csv", etc. (line 185 of the source code). For datasets with very many levels in "spec.col", the file names become too long as the number of thinned species increases, causing an error in path.expand(path).
This naming system is used to prevent overwriting, since the base naming system (line 170) has no unique identifier for the species and may therefore give different csv files the same name. At line 185, the names are modified by appending "_new" to each subsequent thinned dataset.
SOLUTION PROPOSED
Modify the thin() function's source code by changing how the file names are built so that the species name is included in the file name.
At line 170, add the species name to the thinned output file, i.e. replace:
csv.files <- paste( out.dir, out.base, "_thin", rep(1:n.csv), ".csv", sep="")
with:
csv.files <- paste(out.dir, out.base, "_", gsub(" ", "_", as.character(species)), "_thin", rep(1:n.csv), ".csv", sep = "")
This will ensure every file name is unique, so line 185, which appends "_new" to each subsequent file name, becomes unnecessary and can be removed.
RESULTS:
Regards
Hello,
I am having a few issues attempting to thin species records. I am working with a species that has a relatively small range and would like to thin records in a projection that preserves distance (in this case a UTM zone; the range falls within a single zone). My first question: is it possible to use coordinates other than lat/long (UTM in this case) while thinning? The algorithm runs but reports that locations were not thinned correctly.
Because that wasn't working, I projected my data to WGS84 with lat/long coordinates. The data were thinned correctly; however, when I compared the remaining location points with my WGS84-projected data, the points preserved after thinning were anywhere from 9 to 15 m away from any original location point. I am not sure why they would shift during the thinning process. Any suggestions?
Thanks!
@paleo13 I can't figure out how to save the outputs of spThin, and currently the vignette only has an example of saving the output of rarefy. I think the vignette needs an example of saving the output of spThin too.
The write.SpThin function saved all of the files to the current working directory, rather than the temporary directory provided.
The thinning distance used in the vignette should be 10 000 m, rather than 100 000 m, to match the example used in the original spThin manuscript.
@paleo13 I'm testing out the latest pull request. When I run the stingy algorithm, the "best thinned dataset" returned has 201 locations, that is, none were removed. When I use the lp_solve method, the "best thinned dataset" contains 123 records. Neither of these matches what the original thin algorithm produced or what the "by hand" method yielded. Any idea what's going wrong?
Dear Aiello-Lammens
Are there any criteria for setting the thinning parameter? The default is 10 kilometers. I mean, should I set this parameter using another software package that gives me this value (distance)? Or should I choose any value and afterwards test my dataset for spatial autocorrelation?
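One common, informal heuristic (an assumption on my part, not an official criterion from the package) is to inspect the distribution of nearest-neighbor distances and match thin.par to the resolution of the environmental layers, then re-test for residual autocorrelation. A sketch, where coords is assumed to be a two-column long/lat matrix:

```r
library(fields)

# Great-circle distances in km among all points (fine for small n).
d <- fields::rdist.earth(coords, miles = FALSE)
diag(d) <- NA

# Distribution of nearest-neighbor distances, to see at what scale
# points begin to cluster.
summary(apply(d, 1, min, na.rm = TRUE))
```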
Does thin() loop through multiple species when length(unique(spec.col)) > 1? Can regions be used to separate out species instead?