spThin (ver. 0.2.0)
This branch contains the source code for the spThin R package, version 0.2.0, an update of the original package described in Aiello-Lammens et al. 2015.
I am working with a dataset of occurrence locations for an endangered species, so I am not able to share the data.
I have an Excel file with three columns: species, longitude, and latitude.
I am working with it in R, trying to thin the >400 points because many of them are repeats (individuals within a single population). I have used spThin successfully in the past but am unable to get it to work now. My code is:
# Reading in occurrences (read.xlsx() here is from the xlsx package)
library(xlsx)
occs <- read.xlsx("Bapmeg/BAPMEG_FL.xlsx", sheetIndex = 1)
# Thinning occurrence points
thin(loc.data = occs,
lat.col = "latitude", long.col = "longitude",
spec.col = "name",
thin.par = 0.2, reps = 100,
locs.thinned.list.return = TRUE,
write.files = TRUE,
max.files = 1,
out.dir = "Bapmeg/", out.base = "Bapmeg_Thin",
write.log.file = TRUE,
log.file = "BapMegThin.txt")
However, I keep getting the same error: Error in .subset2(x, i, exact = exact) :
attempt to select less than one element in get1index
I can't figure out why. I saw another thread where someone added a species column and the code worked, but I already have a species column. Is there a minimum number of occurrences necessary? This is an extremely rare species, with probably fewer than 12 occurrences in the area in question.
Thanks! And if I have failed to include information, I apologize. I am very new to all this.
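An editorial note, hedged: this error commonly appears when spec.col names a column that does not exist in loc.data. Given the three columns described above (species, longitude, latitude), a minimal sketch of the likely fix, assuming the column is literally named "species":

```r
library(spThin)

# Check the actual column names first; spec.col must match one exactly.
names(occs)

# Illustrative call: the sheet described above has a "species" column,
# so pass that name rather than "name".
thinned <- thin(loc.data = occs,
                lat.col = "latitude", long.col = "longitude",
                spec.col = "species",   # must match a column name exactly
                thin.par = 0.2, reps = 100,
                locs.thinned.list.return = TRUE,
                write.files = FALSE)
```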
When trying to run thin() I receive this error:
Error in .subset2(x, i, exact = exact) :
attempt to select less than one element in get1index
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
From Cory - "I got an error when i didn't have a species name column, though the help says its an optional column. Easy to fix by adding a column, but seems better to leave it optional."
I am working with data for two species (cine, with 14 locations, and atrip, with 45) separated by c. 350 km and in quite different climates.
I used all 19 BioClim layers.
My code:
enmeval_results <- ENMevaluate(atrip, env, bg = NULL,
                               tune.args = list(fc = c("L", "LQ", "H", "LQH", "LQHP", "LQHPT"),
                                                rm = 1:5),
                               partitions = "jackknife",
                               algorithm = "maxnet")
enmeval_results@results
write.csv(enmeval_results@results, "enmeval_results.csv")
The enmeval_results.csv is exactly the same for both species! The first time it happened I assumed I had accidentally used the same species list, but I have now redone it very carefully and I am still getting exactly the same result. I am clearing the lists from R between each run.
Any ideas please?
@paleo13 Stingy has a negative connotation, in general. So why use this term to describe the heuristic algorithm? Is this a common term for such algorithms?
Hello,
I am running severely out of memory when using thin() on a very large number of occurrence points. I looked into thin.algorithm() and saw that you compute the full set of distances among all points. I experienced a similar problem when trying to compute distances during a data-processing step. The RANN::nn2() function solved my previous issues by (I guess) only looking at neighbors within a user-defined radius. In doing so, there is no need to compute the full distance matrix, which saves both time and memory.
I am writing to check whether you think it is possible to perform the same steps of thin() when substituting
rdist.earth(x1=rec.df.orig, miles=FALSE) < thin.par
with RANN::nn2() using the arguments searchtype = "radius" and radius = thin.par.
I am sorry for bugging you with this. I am an inexperienced student and do not really have anyone else to ask. I would of course not expect you to assist with the function; I just wanted to check whether the full distances are strictly required for steps I am not aware of.
Very much appreciate your time!
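For readers, the idea in this question can be sketched as follows. This is illustrative only, not a tested patch to thin(), and it assumes coords is a two-column matrix of projected coordinates in km: nn2() uses Euclidean distance, whereas rdist.earth() in the current code computes great-circle distance, so unprojected long/lat would need projection first.

```r
library(RANN)

# Radius-restricted neighbor search, avoiding the full pairwise matrix.
# `coords` and `thin.par` are assumed to be defined as in the question.
nn <- RANN::nn2(data = coords, query = coords,
                k = min(nrow(coords), 50),     # cap on neighbors per point
                searchtype = "radius", radius = thin.par)

# Rows of nn$nn.idx list each point's neighbors within `radius`; slots
# past the true neighbor count are padded with 0, so only nonzero
# entries should be counted when tallying neighbors for thinning.
```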
@paleo13 the vignette in your recent push doesn't work because each example uses the arguments lon.col and lat.col, while in the spThin function the arguments are x.col and y.col. My preference is for spThin to have arguments lon.col and lat.col, though I'm not fully married to that idea. Do you have an opinion, and how hard a change would it be to edit spThin, as opposed to the vignette?
Hi, spThin team!
My code is the following:
thinned_dataset_full <-
thin( loc.data = clean_df,
lat.col = "decimalLatitude", long.col = "decimalLongitude",
thin.par = 20, reps = 1,
locs.thinned.list.return = TRUE,
write.files = TRUE,
max.files = 1,
out.dir = "spthin/", out.base = "name",write.log.file = FALSE)
Beginning Spatial Thinning.
Script Started at: Sat Sep 24 15:43:26 2022
Error in vectbl_as_col_location2():
! Must extract column with a single valid subscript.
✖ Subscript `which(names(locs.df) == spec.col)` has size 0 but must be size 1.
Run `rlang::last_error()` to see where the error occurred.
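A hedged reading of this error: the subscript which(names(locs.df) == spec.col) has size 0 when no column matches spec.col, and the call above does not pass spec.col at all. A minimal illustrative workaround, assuming clean_df has no species column ("my_species" is a placeholder name, not anything from the original post):

```r
# Add a constant species column and point spec.col at it.
clean_df$species <- "my_species"

thinned <- thin(loc.data = clean_df,
                lat.col = "decimalLatitude", long.col = "decimalLongitude",
                spec.col = "species",
                thin.par = 20, reps = 1,
                locs.thinned.list.return = TRUE,
                write.files = FALSE)
```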
Hi,
Thanks for creating the package; it has been so helpful. I'm working on a species for which the available data have varying accuracy. The dataset has a column indicating accuracy in meters. Is there a way to include an argument requesting that the most accurate record be retained when thinning?
Thanks,
Fernanda.
From Cory - "It would be very nice if it could just take spatialPointsDataFrames and handle them automatically, rather than specifying all the columns. might be nice to be allowed to write those out as shape files too, rather than csv, in case people want to keep all the spatial metadata."
I'm pretty sure the next version does this, but have to check.
My dataset includes a unique ID column, which I need in order to merge the thinned dataset with columns containing sampling event data. Is there a way to preserve this column while thinning?
I have tried setting this ID column as the spec.col; however, this just replaces all unique ID numbers with one ID number.
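One possible workaround, sketched under the assumption that the thinned data frames returned with locs.thinned.list.return = TRUE carry columns named Longitude and Latitude holding the original coordinate values (worth verifying on your installed version); occs and its ID column here stand in for the poster's data:

```r
# Recover the ID column by joining one thinned replicate back to the
# original data on the coordinate pair. Because thin() keeps the
# original coordinate values, an exact floating-point match should hold.
thinned <- thinned_list[[1]]   # one replicate from the returned list
kept <- merge(thinned, occs,
              by.x = c("Longitude", "Latitude"),
              by.y = c("decimalLongitude", "decimalLatitude"))
```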
From Cory - "I got an error when i didn't provide an out.dir; seems like i shouldn't have to write out the results if i don't want to."
but it does work if you use data.frame(myTibble). Perhaps do that conversion internally in the function and return the same class that was passed in?
Hello,
I am trying to run the thin function on a dataset with ~15,000 data points. I expected the run time to grow steeply as the number of points increased, but this call:
thin(loc.data = V, lat.col = "Latitude", long.col = "Longitude",
     spec.col = "Colony", thin.par = 10, reps = 100,
     locs.thinned.list.return = TRUE, write.files = TRUE, max.files = 5,
     out.dir = "spthin_test/", out.base = "V_thinned",
     write.log.file = TRUE, log.file = "V_spThin_log_file.txt", verbose = TRUE)
where "V" is the dataset with ~15,000 positions and Colony contains just one level, had been running for 14 hours and hadn't finished when I force-stopped it.
**********************************************
Beginning Spatial Thinning.
Script Started at: Thu Aug 23 19:19:13 2018
Timing stopped at: 3.911e+04 1.37e+04 5.287e+04
I am running this on an i7 machine with 16 GB of RAM.
Is this normal behaviour, or is something weird happening?
Thanks!
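A quick back-of-envelope suggests this is expected rather than weird: thin() builds the full pairwise distance matrix, so memory and work grow roughly with the square of the number of points (multiplied across reps):

```r
# Size of one n-by-n matrix of doubles for n = 15000 points.
n <- 15000
n^2 * 8 / 1024^3   # roughly 1.7 GiB per matrix copy, before any reps
```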
@paleo13 is there a way to easily identify which of the datasets resulting from the heuristic thinning match the greatest number of records returned?
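For the original thin() output this can be done directly on the returned list; whether the heuristic method returns the same structure is an assumption to check:

```r
# Illustrative sketch, assuming `thinned_list` is the list returned
# when locs.thinned.list.return = TRUE.
n.kept <- sapply(thinned_list, nrow)      # records retained per replicate
best   <- which(n.kept == max(n.kept))    # indices of the largest dataset(s)
```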
Hi,
I experienced a problem saving files from thin(), which I describe below together with the modification I made to the "thin.R" script to change how the files are named.
PROBLEM: saving thinned data by setting the option write.files = TRUE in spThin::thin().
The thin() function has the option to save each species' thinned dataset as it is generated. However, in the source code (https://github.com/mlammens/spThin/blob/master/R/thin.R), the way the file names are created makes subsequent csv file names keep growing in length: if the first csv is saved as "new.csv", the second is saved as "new_new.csv", the third as "new_new_new.csv", etc. (line 185 of the source code). For datasets with very many levels in "spec.col", the file names become too long as the number of thinned species increases, causing an error in path.expand(path).
This naming system is used to prevent overwriting, since the base naming system (line 170) has no unique identifier for the species and may therefore give different csv files the same name. At line 185, the names are modified by appending "_new" to each subsequent thinned dataset.
SOLUTION PROPOSED
Modify the thin() function's source code by changing how the file names are built so that the species name is included in the file name.
At line 170, add the species name to the thinned output file, i.e. replace:
csv.files <- paste( out.dir, out.base, "_thin", rep(1:n.csv), ".csv", sep="")
with:
csv.files <- paste(out.dir, out.base, "_", gsub(" ", "_", as.character(species)), "_thin", rep(1:n.csv), ".csv", sep = "")
This will ensure every file name is unique, so line 185, which appends "_new" to each subsequent file name, becomes unnecessary and can be removed.
RESULTS:
Regards
Hello,
I am having a few issues attempting to thin species records. I am working with a species that has a relatively small range and would like to thin records in a projection that preserves distance (in this case a UTM zone; the range falls within a single zone). My first question: is it possible to use coordinates other than lat/long (UTM in this case) while thinning? The algorithm runs but reports that locations were not thinned correctly.
Because that wasn't working, I projected my data to WGS84 with lat/long coordinates. The data were thinned correctly; however, when I compared the remaining location points with my WGS84-projected data, the points preserved after thinning were anywhere from 9 to 15 m away from any original location point. I am not sure why they would shift during the thinning process. Any suggestions?
Thanks!
@paleo13 I can't figure out how to save the outputs of spThin, and currently the vignette only has an example of saving the output of rarefy. I think the vignette needs an example of saving the output of spThin too.
The write.SpThin function saved all of the files to the current working directory, rather than the temporary directory provided.
The thinning distance used in the vignette should be 10 000 m, rather than 100 000 m, to match the example used in the original spThin manuscript.
@paleo13 I'm testing out the latest pull request. When I run the stingy algorithm, the "best thinned dataset" returned has 201 locations, that is, none were removed. When I use the lp_solve method, the "best thinned dataset" contains 123 records. Neither of these matches what the original thin algorithm produced or what the "by hand" method yielded. Any idea what's going wrong?
Dear Aiello-Lammens
Are there any criteria for setting the thinning parameter? The default is 10 kilometers. I mean, should I set this parameter using another software package that gives me this value (distance)? Or should I choose any value and afterwards test my dataset for spatial autocorrelation?
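One common, informal heuristic (an assumption on my part, not an official criterion from the package) is to inspect the distribution of nearest-neighbor distances and match thin.par to the resolution of the environmental layers, then re-test for residual autocorrelation. A sketch, where coords is assumed to be a two-column long/lat matrix:

```r
library(fields)

# Great-circle distances in km among all points (fine for small n).
d <- fields::rdist.earth(coords, miles = FALSE)
diag(d) <- NA

# Distribution of nearest-neighbor distances, to see at what scale
# points begin to cluster.
summary(apply(d, 1, min, na.rm = TRUE))
```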
Does thin() loop through multiple species when length(unique(spec.col)) > 1? Can regions be used to separate out species instead?