
spthin's Introduction

spthin (ver. 0.2.0)

This branch includes the source code for the spThin R package, version 0.2.0. This version is an update of the original package described in Aiello-Lammens et al. 2015.

spthin's People

Contributors

brunovilela, jeffreyhanson, mlammens, samuelbosch


spthin's Issues

Error in .subset2(x, i, exact = exact) : attempt to select less than one element in get1index

I am working with a dataset of occurrence locations for an endangered species, so I am not able to share data.

I have an Excel file with three columns: species, longitude, and latitude.

I am working with that in R, and trying to thin the >400 points, because many of them are repeats (individuals within just one population). I have used spThin successfully in the past but am unable to get it to work now. My code is:

#Reading in Occurrences

occs <- read.xlsx("Bapmeg/BAPMEG_FL.xlsx", sheetIndex = 1)

#Thinning Occurrence Points

thin(loc.data = occs,
     lat.col = "latitude", long.col = "longitude",
     spec.col = "name",
     thin.par = 0.2, reps = 100,
     locs.thinned.list.return = TRUE,
     write.files = TRUE,
     max.files = 1,
     out.dir = "Bapmeg/", out.base = "Bapmeg_Thin",
     write.log.file = TRUE,
     log.file = "BapMegThin.txt")

However, I keep getting the same error: Error in .subset2(x, i, exact = exact) :
attempt to select less than one element in get1index

I can't figure out why. I saw another thread where someone added a species column and the code worked, but I already have a species column. Is there a minimum number of occurrences required? This is an extremely rare species, with probably fewer than 12 occurrences in the area in question.

Thanks! And if I have failed to include information, I apologize. I am very new to all this.
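A likely cause here (an assumption from the details above, not a confirmed diagnosis) is that spec.col = "name" does not match the actual column name, species. spThin selects the species column by name, and a zero-length match produces exactly this kind of .subset2 error. A minimal sanity-check sketch of that logic in Python (hypothetical helper, purely illustrative):

```python
# Illustrative sanity check (not part of spThin): verify that the column
# names passed to thin() exist in the occurrence table before thinning.
# The data described above has columns species/longitude/latitude, yet the
# call passes spec.col = "name" -- a mismatch like this selects zero
# columns, which is consistent with the reported error.

def missing_columns(table_columns, spec_col, lat_col, long_col):
    """Return the requested column names absent from the table."""
    return [c for c in (spec_col, lat_col, long_col)
            if c not in table_columns]

occs_columns = ["species", "longitude", "latitude"]
missing_columns(occs_columns, "name", "latitude", "longitude")
# -> ["name"]: the spec.col argument does not match any column
```

If the check flags spec.col, renaming the column or passing spec.col = "species" should resolve the error.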

Error in .subset2

When trying to run thin, I receive this error:

Error in .subset2(x, i, exact = exact) : 
  attempt to select less than one element in get1index

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Make species column optional

From Cory - "I got an error when I didn't have a species name column, though the help says it's an optional column. Easy to fix by adding a column, but it seems better to leave it optional."

Using ENMeval to tune Maxent settings and getting exactly the same results for two disjunct species

I am working with data for two species (cine with 14 locations and atrip with 45 locations) separated by c. 350 km and in quite different climates.
I used all 19 BioClim layers.
My code:

enmeval_results <- ENMevaluate(atrip, env, bg = NULL, 
        tune.args = list(fc = c("L","LQ","H", "LQH", "LQHP", "LQHPT"), rm = 1:5), partitions = "jackknife",
        algorithm = "maxnet")

enmeval_results@results

write.csv(enmeval_results@results, "enmeval_results.csv")

The enmeval_results.csv is exactly the same for both species! The first time this happened I assumed that I had accidentally used the same species list, but I have redone it very carefully now and I am still getting exactly the same result. I am clearing the lists from R between each run.
Any ideas please?

Avoid full distance matrix within thin.algorithm() to allow for thin() on a large number of points?

Hello,
I am running severely out of memory when using thin() on a very large number of occurrence points. I looked into thin.algorithm() and saw that you compute the full set of distances among all points. I experienced a similar problem when trying to compute distances during a data-processing step. The RANN::nn2() function solved my previous issues by (I guess) only looking at neighbours within a user-defined radius. In doing so, there is no need to compute the full distance matrix, which saves both time and memory.

I am writing to check whether you think it is possible to perform the same steps of thin() when substituting

rdist.earth(x1=rec.df.orig, miles=FALSE) < thin.par

with RANN::nn2() using the arguments searchtype = "radius" and radius = thin.par.

I am sorry for bugging you with this. I am an inexperienced student and do not really have anyone else to ask. I would of course not expect you to assist with the function. I just wanted to check whether the full distances are strictly required for steps I am not aware of.

Very much appreciate your time!
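For context, the memory saving described above comes from radius-limited neighbour search. A rough sketch of the idea in Python (RANN::nn2 uses a kd-tree; this sketch uses a simpler uniform grid, and planar Euclidean distance for brevity, whereas thin() itself uses great-circle distances via rdist.earth):

```python
import math
from collections import defaultdict

# Idea behind a radius-limited search: bucket points into a grid whose
# cell size equals the thinning radius, so each point is compared only
# against points in its own and adjacent cells. The full n x n distance
# matrix is never materialised, and memory stays O(n) rather than O(n^2).

def neighbors_within(points, radius):
    """Map each point index to the set of indices closer than `radius`."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // radius), int(y // radius))].append(i)
    result = {i: set() for i in range(len(points))}
    for i, (x, y) in enumerate(points):
        cx, cy = int(x // radius), int(y // radius)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid[(cx + dx, cy + dy)]:
                    if j != i and math.dist(points[i], points[j]) < radius:
                        result[i].add(j)
    return result

pts = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0)]
neighbors_within(pts, 0.2)
# points 0 and 1 are mutual neighbours; point 2 has none
```

Since thin.algorithm() only asks which pairs fall below thin.par, a neighbour map like this carries all the information the thinning loop needs.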

lon.col and lat.col in Vignette should be x.col and y.col (or vice versa)

@paleo13 the vignette in your recent push doesn't work because each example uses the arguments lon.col and lat.col, while in the spThin function the arguments are x.col and y.col. My preference is for spThin to have the arguments lon.col and lat.col, though I'm not fully married to that idea. Do you have an opinion, and how hard a change would it be to edit spThin, as opposed to the vignette?

error

Hi, spThin team!
My code is the following:

thinned_dataset_full <-
  thin(loc.data = clean_df,
       lat.col = "decimalLatitude", long.col = "decimalLongitude",
       thin.par = 20, reps = 1,
       locs.thinned.list.return = TRUE,
       write.files = TRUE,
       max.files = 1,
       out.dir = "spthin/", out.base = "name",
       write.log.file = FALSE)

Beginning Spatial Thinning.
Script Started at: Sat Sep 24 15:43:26 2022
Error in vectbl_as_col_location2():
! Must extract column with a single valid subscript.
✖ Subscript which(names(locs.df) == spec.col) has size 0 but must be size 1.
Run rlang::last_error() to see where the error occurred.

Is there a way to keep the most recent record?

Hi,

Thanks for creating the package; it has been so helpful. I'm working on a species for which the available data have varying accuracy. The dataset has a column indicating accuracy in meters. Is there a way to include an argument requesting that the most accurate record be retained when thinning?

Thanks,

Fernanda.
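As far as the documentation shows, thin() has no such argument, but the behaviour asked for above can be sketched as a greedy pass (hypothetical helper, not part of spThin): visit records from most to least accurate and keep one only if it is at least the thinning distance from everything already kept. Planar distance is used for brevity; real occurrence data would need great-circle distances.

```python
import math

def thin_keep_accurate(records, thin_dist):
    """records: (x, y, accuracy_m) tuples; smaller accuracy = better.
    Greedily keeps the most accurate record in each neighbourhood."""
    kept = []
    for x, y, acc in sorted(records, key=lambda r: r[2]):
        if all(math.dist((x, y), (kx, ky)) >= thin_dist
               for kx, ky, _ in kept):
            kept.append((x, y, acc))
    return kept

recs = [(0.0, 0.0, 100.0), (0.1, 0.0, 5.0), (2.0, 0.0, 30.0)]
thin_keep_accurate(recs, 0.5)
# the 5 m point wins over its 100 m neighbour; the distant point survives
```

Note this greedy ordering trades away spThin's random-replicate search for a deterministic preference, so it may retain fewer points overall.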

SUGGESTION - output spatial data object

From Cory - "It would be very nice if it could just take SpatialPointsDataFrames and handle them automatically, rather than specifying all the columns. It might also be nice to allow writing those out as shapefiles, rather than CSV, in case people want to keep all the spatial metadata."

I'm pretty sure the next version does this, but I have to check.

Output additional columns?

My dataset includes a unique ID column, which I need in order to merge the thinned dataset with columns containing sampling event data. Is there a way to preserve this column while thinning?

I have tried setting this ID column as the spec.col; however, this just replaces all the unique ID numbers with a single ID number.
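Since thin() returns only the retained coordinates, extra columns are dropped. One workaround (a sketch, not a spThin feature; the data below are invented for illustration) is to join the thinned coordinates back to the original table on the (longitude, latitude) pair, assuming each record's coordinates are unique and returned by thin() unmodified:

```python
# Recover dropped columns by joining thinned coordinates back to the
# original records on the exact coordinate pair.

records = [  # illustrative data, not from the thread
    {"id": "A1", "lon": -81.2, "lat": 29.5, "event": "survey-1"},
    {"id": "A2", "lon": -81.2, "lat": 29.5001, "event": "survey-2"},
    {"id": "A3", "lon": -80.9, "lat": 29.9, "event": "survey-3"},
]
thinned = [(-81.2, 29.5), (-80.9, 29.9)]  # stand-in for thin() output

by_coord = {(r["lon"], r["lat"]): r for r in records}
recovered = [by_coord[c] for c in thinned]
# recovered rows keep every original column, including "id"
```

The same join works in R with merge() on the coordinate columns; the caveat either way is that duplicate or rounded coordinates would break the one-to-one match.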

Unexpectedly (?) long run time for a large dataset

Hello,

I am trying to run the thin function on a dataset with ~15,000 data points. I expected the run time to grow steeply as the number of points increased, but this call:
thin(loc.data = V,
     lat.col = "Latitude", long.col = "Longitude",
     spec.col = "Colony",
     thin.par = 10, reps = 100,
     locs.thinned.list.return = TRUE,
     write.files = TRUE, max.files = 5,
     out.dir = "spthin_test/", out.base = "V_thinned",
     write.log.file = TRUE, log.file = "V_spThin_log_file.txt",
     verbose = TRUE)

where "V" is the dataset with ~15,000 positions and Colony contains just one level, had been running for 14 hours and hadn't finished when I force-stopped it.

********************************************** 
 Beginning Spatial Thinning.
Script Started at: Thu Aug 23 19:19:13 2018

Timing stopped at: 3.911e+04 1.37e+04 5.287e+04

I am running this on an i7 machine with 16 GB of RAM.

Is this normal behaviour, or is something weird happening?

Thanks!
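The run time reported above is consistent with quadratic scaling: thin() builds a full pairwise distance matrix, which the removal loop then repeatedly rescans for each of the 100 replicates. A back-of-envelope sketch (assuming 8 bytes per double-precision distance):

```python
# Rough estimate of the memory needed for the full n x n distance matrix
# that thin() computes (8 bytes per double-precision entry assumed).

def dist_matrix_bytes(n, bytes_per_entry=8):
    """Memory footprint of an n x n matrix of pairwise distances."""
    return n * n * bytes_per_entry

dist_matrix_bytes(15_000) / 1e9
# -> 1.8, i.e. ~1.8 GB for the matrix alone, before any replicate runs
```

So 14 hours is not obviously a bug: the workload grows with n² per replicate, times 100 replicates, which is roughly a thousandfold more work than a typical few-hundred-point dataset.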

Error (error: In path.expand(path)) from spThin::thin(..., write.files = TRUE) resulting from long file names when "spec.col" has many levels

Hi,

I experienced a challenge saving files from thin(), which I describe below, and share the modification I made to the "thin.R" script, changing how the files are named, to both:

  • have unique file names
  • avoid file names getting too long

CURRENT STATE

PROBLEM: saving thinned data by setting the option write.files = TRUE in spThin::thin(...)

  • The thinning function spThin::thin(...) has the option to save each species' thinned dataset as it is generated. However, from the source code (at https://github.com/mlammens/spThin/blob/master/R/thin.R), the way the file names are created makes each subsequent csv file name longer than the last, i.e.

  • if the first csv is saved as "new.csv", the second is saved as "new_new.csv", the third as "new_new_new.csv", etc. (line 185 in the source code). The problem with this is that for datasets with very many levels under "spec.col", the file names become too long as the number of species thinned increases, causing the error: In path.expand(path).

  • This naming system exists to prevent overwriting, since the base naming scheme (line 170) has no unique identifier for the species and may therefore give different csv files the same name. At line 185, the names are modified by appending "_new" to each subsequent thinned dataset.

SOLUTION PROPOSED

Modify the thin() function's source code by changing how the file names are built, such that the name of the species is included in the file name, i.e.:

At line 170, add the species name to the thinned output file, i.e. replacing:

csv.files <- paste( out.dir, out.base, "_thin", rep(1:n.csv), ".csv", sep="")

With:

csv.files <- paste( out.dir, out.base, "thin", gsub(" ", "", as.character(species)), rep(1:n.csv), ".csv", sep="")

This will ensure every file name is unique, and line 185, which adds "_new" to every subsequent file name, becomes unnecessary and can be removed.

RESULTS:

  • This has worked, and the bias removal completed on several datasets that had failed before. Could this change be made in the package source code?
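The two naming schemes discussed above can be contrasted with a small sketch (hypothetical helper names; the real logic lives around lines 170 and 185 of thin.R):

```python
def legacy_name(base, k):
    """Original scheme: the k-th colliding file repeats the base k times,
    so file names grow without bound as more species are written."""
    return "_".join([base] * k) + ".csv"

def species_based_name(out_base, species, rep):
    """Proposed scheme: embedding the (space-stripped) species name keeps
    names unique and of bounded length, assuming species names differ."""
    return f"{out_base}_thin_{species.replace(' ', '')}_{rep}.csv"

legacy_name("new", 3)                          # 'new_new_new.csv'
species_based_name("out", "Genus species", 1)  # 'out_thin_Genusspecies_1.csv'
```

The bounded-length property is what avoids the path.expand(path) failure: name length now depends only on the species name and replicate number, not on how many files were written before.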

Regards

Location points moving

Hello,

I am having a few issues attempting to thin species records. I am working with a species that has a relatively small range and would like to thin records in a projection that preserves distance (in this case a UTM zone; the range falls within a single zone). My first question is: is it possible to use coordinates other than lat/long (UTM in this case) while thinning? The algorithm runs but returns a "locations not thinned correctly" message.

Because that wasn't working, I projected my data to WGS84 with lat/long coordinates. The data were thinned correctly; however, when I compared the remaining location points with my WGS84-projected data, the points preserved after thinning were anywhere from 9 to 15 m away from any original location point. I am not sure why they would shift during the thinning process; any suggestions?

Thanks!

write.SpThin did not save to dir

The write.SpThin function saved all of the files to the current working directory, rather than the temporary directory provided.

Stingy algorithm returning unthinned datasets

@paleo13 I'm testing out the latest pull request. When I run the stingy algorithm the "best thinned dataset" returned has 201 locations - that is, none were removed. When I used the lp_solve method, the "best thinned dataset" contains 123 records. Neither of these match what the original thin algorithm produced or the "by hand" method yielded. Any idea what's going wrong?

thin.par

Dear Aiello-Lammens,
Are there any criteria for setting the "Thinning parameter"? The default is 10 kilometers. I mean, should I set this parameter using another software/package that gives me this value (a distance)? Or should I choose any value and afterwards test my dataset for spatial autocorrelation?
