
fwspp's People

Contributors

adamdsmith, mccrea-cobb, timothymfox


fwspp's Issues

Ensure compatibility with FWSpecies sanity checks...

  • The FWSpecies database won't permit abundance values unless a species is "Present". Options are to upgrade the occurrence value OR move abundance values to the Abundance Notes field.

  • A record needs a Nativeness value to be approved. Set all those with missing values to "unknown" or equivalent (see tags in #22)...

add_taxonomy function is returning an error

The add_taxonomy function is returning the error "Taxonomy retrieval failed with the following error:
at least one vector element is required"

The error may be the result of the left_join in the nested join_taxonomy function.
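A minimal sketch of a defensive guard for this case, assuming the failure happens when an empty vector of scientific names reaches the taxonomy lookup (the wrapper name below is hypothetical):

# Hypothetical guard: skip taxonomy retrieval when no scientific names
# survive filtering, instead of passing an empty vector to the lookup
add_taxonomy_guarded <- function(occ_data) {
  sci_names <- unique(occ_data$sci_name)
  if (length(sci_names) == 0) {
    warning("No scientific names to look up; returning input unchanged.")
    return(occ_data)
  }
  fwspp::add_taxonomy(occ_data)  # proceed with the normal retrieval
}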

New fxn to check and report problems from a `fws_occ` run

We've taken great pains to catch errors during a fws_occ run. Let's create a function to review a fws_occ run and report to the user which properties had issues.

fwspp_combine fails reasonably (if vaguely) when trying to combine fwspp objects with captured errors, but fwspp_review (and specifically xlsx_review) does not...
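A minimal sketch of what such a checker might look like, assuming failed properties are stored as captured condition objects in the fwspp list (as in the Sachuest example further down this page); the function name is hypothetical:

# Hypothetical checker: report which properties in a fwspp object
# captured an error during the fws_occ run
fwspp_check <- function(fwspp_obj) {
  failed <- vapply(fwspp_obj, inherits, logical(1), what = "condition")
  if (!any(failed)) {
    message("All properties completed without captured errors.")
    return(invisible(character(0)))
  }
  for (prop in names(fwspp_obj)[failed])
    message("Problem with ", prop, ": ", conditionMessage(fwspp_obj[[prop]]))
  invisible(names(fwspp_obj)[failed])
}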

Import reviewed spreadsheets; export for FWSpecies upload

Once the review spreadsheet has been reviewed, and records updated/accepted/rejected, we need functionality to (sketched after the list):

  • import spreadsheet into R
  • drop unaccepted records
  • update taxonomy (in case any taxon codes were added or changed)
  • output in the FWSpecies upload format? (NRPC suggests CSV will work as well)
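A minimal sketch of that pipeline; the file names are placeholders, and whether add_taxonomy accepts a plain data frame at this stage is an assumption:

library(readxl)
library(dplyr)

# 1. Import the reviewed spreadsheet into R
reviewed <- read_excel("refuge_review.xlsx")

# 2. Drop records the reviewer did not accept
accepted <- filter(reviewed, toupper(accept_record) == "YES")

# 3. Update taxonomy in case taxon codes were added or changed
#    (assumes add_taxonomy can run on the imported records)
accepted <- fwspp::add_taxonomy(accepted)

# 4. Export in a flat format for upload (NRPC suggests CSV will work)
write.csv(accepted, "refuge_fwspecies_upload.csv", row.names = FALSE)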

Proposed edits to the FWSpecies tags

Expect some changes to the Seasonality (Occurrence Class), Origin (Nativeness), and Management tags in the FWSpecies application that will need to be accounted for, most notably in the xlsx_review_tags and add_review_validation functions but possibly elsewhere (e.g., review_helpers.R, fwspp_review.R)...

I'm not sure we incorporate Seasonality (i.e., "Occurrence Class") tags yet, though we probably should... ditto for "Management"

UpdatedFWSpeciesTags_5-4-2018.docx

Update fws_occ function to pull occurrence data from ServCat

The fws_occ function uses the public-facing ServCat API to extract taxonomic names for a given refuge using its unit code. Records are pulled from ServCat using the following constraints (a query sketch follows the list):

  • The record includes the bounding box for the given refuge
  • The record contains no other bounding boxes associated with other properties
  • The record is one of the following: Book Chapter, Conference Proceeding, Conference Proceeding Paper, Geospatial Dataset, Journal Article, Published Report, Published Report Section, Published Report Series, Resource Brief, Tabular Dataset, or Unpublished Report
  • The record is associated with at least one digital file
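A rough sketch of how such a constrained query might be issued over ServCat's REST interface with httr; the endpoint path and body fields below are assumptions, not the documented API:

library(httr)

# Hypothetical ServCat search; the URL and parameter names are
# placeholders, not the documented ServCat API
servcat_search <- function(unit_code) {
  resp <- POST(
    "https://ecos.fws.gov/ServCatServices/servcat/v4/rest/AdvancedSearch/Composite",
    body = list(units = list(unit_code), mustHaveDigitalFiles = TRUE),
    encode = "json"
  )
  stop_for_status(resp)
  content(resp, as = "parsed")
}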

fws_occ gives error with smaller refuges

Sachuest <- find_fws("Sachuest")
Sachuest_occ <- fws_occ(Sachuest)

causes the following error:
1 properties will be queried:
Sachuest Point NWR (R5)

Processing Sachuest Point NWR
Spherical geometry (s2) switched off
Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is FALSE
Splitting property for more efficient queries.
Spherical geometry (s2) switched on
Server request timeout set to 3 seconds (x4 for GBIF).
Querying the Global Biodiversity Information Facility (GBIF)...
Retrieving 437 records.
Querying Integrated Digitized Biocollections (iDigBio)...
No records found.
Taxonomy retrieval failed with the following error:
at least one vector element is required
Skipping taxonomy. Please send the resulting fwspp object to the maintainer of the
fwspp package. You may also try again later using fwspp::add_taxonomy.

Sachuest_occ
$SACHUEST POINT NATIONAL WILDLIFE REFUGE
<simpleError in if (nrow(idb_recs) > 0) idb_recs <- clean_iDigBio(idb_recs) else idb_recs <- NULL: argument is of length zero>

attr(,"class")
[1] "fwspp"
attr(,"boundary")
[1] "admin"
attr(,"scrubbing")
[1] "strict"
attr(,"buffer_km")
[1] 0
attr(,"query_dt")
[1] "2023-07-20 09:07:05 EDT"

sf (1.0-13) and s2 (1.1.4) update causing error in install_fws_cadastral function

After updating sf and s2, the following error is returned when running install_fws_cadastral:
USFWS Cadastral Database downloaded and installed successfully.
Spherical geometry (s2) switched off
Storing USFWS cadastral geodatabase in a more efficient format. This will take several additional
minutes.
Processing USFWS Interest boundaries.
Error in scan(text = lst[[length(lst)]], quiet = TRUE) :
scan() expected 'a real', got 'ParseException:'
Error in (function (msg) : ParseException: Unknown WKB type 12
The legacy packages maptools, rgdal, and rgeos, underpinning this package
will retire shortly. Please refer to R-spatial evolution reports on
https://r-spatial.org/r/2023/05/15/evolution4.html for details.
This package is now running under evolution status 0
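WKB type 12 is the ISO MultiSurface geometry type, which GEOS cannot parse; a possible workaround, sketched under the assumption that the offending layer is polygonal and can be cast on read (the layer name below is hypothetical):

library(sf)

# Cast MULTISURFACE geometries to MULTIPOLYGON so downstream
# GEOS-based operations can parse them; the layer name is an assumption
interest <- st_read("FWSCadastral.gdb", layer = "FWSInterest")
interest <- st_cast(interest, "MULTIPOLYGON")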

Consider retaining subspecies

Retaining subspecies may be as easy as modifying clean_sci_name to retain trinomials, but accommodating changes will also be necessary in several of the taxonomy functions. For example, we may have to count words in the scientific name to know whether to retain species or subspecies records (see the sketch below)...
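A minimal sketch of the word-count idea, assuming cleaned scientific names are whitespace-delimited:

# Count words in a scientific name to distinguish binomials (species)
# from trinomials (subspecies/varieties) when deciding what to retain
name_depth <- function(sci_name) {
  n <- lengths(strsplit(trimws(sci_name), "\\s+"))
  ifelse(n >= 3, "infraspecific", "species")
}

name_depth(c("Cornus amomum", "Cornus amomum ssp. obliqua"))
#> [1] "species"       "infraspecific"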

Tetlin NWR topology exception

fwspp::prep_cadastral(fwspp::find_fws("tetlin"), "admin", T)
#> Error in CPL_geos_union(st_geometry(x), by_feature): Evaluation error: TopologyException: Input geom 0 is invalid: Hole lies outside shell at or near point -141.55936108399999 62.771209905000035 at -141.55936108399999 62.771209905000035.
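A likely workaround, sketched with sf: repair the invalid ring before the union that throws the exception. st_make_valid handles the "hole lies outside shell" class of invalidity; `tetlin_sf` below stands in for the geometry that prep_cadastral loads:

library(sf)

# Repair invalid polygons (e.g., holes lying outside their shell)
# before the union step where the TopologyException is thrown
tetlin_sf <- st_make_valid(tetlin_sf)
tetlin_union <- st_union(tetlin_sf)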

Consider not extracting direct media URLs

This mainly affects GBIF and BISON queries. Is it generally true that the direct media URL is accessible from the general record URL (e.g., as in iNaturalist)? If so, going fishing for direct media URLs may be unnecessary, and skipping it would be faster in the case of rgbif::occ_data...

Update method for `fwspp` object

Store the datetime a query was initiated as an attribute. Subsequently pass it to the get_* functions to, I suspect, profoundly reduce the size and increase the speed of query updates (see the sketch below).
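A minimal sketch of the idea, reusing the query_dt attribute already visible in the Sachuest output above; how the get_* functions would consume it is an assumption:

# Stamp the object when the query starts...
stamp_query <- function(fwspp_obj) {
  attr(fwspp_obj, "query_dt") <- Sys.time()
  fwspp_obj
}

# ...then an update can ask providers only for records added since then
update_window <- function(fwspp_obj) {
  since <- attr(fwspp_obj, "query_dt")
  format(since, "%Y-%m-%d")  # e.g., a lower bound passed to get_* queries
}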

`NA` getting tacked on to combined common names when updating invalid taxa

fwspp::retrieve_taxonomy("Solidago graminifolia")
#>                sci_name          acc_sci_name
#> 1 Solidago graminifolia Euthamia graminifolia
#>                                                       com_name    rank
#> 1 NA, flattop goldentop, flat-top goldentop, slender goldentop Species
#>         category taxon_code   tsn note
#> 1 Vascular Plant     140446 37352 <NA>

This appears to occur when no common name is found for the original taxon but common names are found for the accepted taxon. Probably a simple na.omit fix, sketched below...
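A minimal sketch of that fix, using the common names from the example above:

# Drop NAs before collapsing common names, so a missing name for the
# original taxon doesn't become a literal "NA" in the combined string
com_names <- c(NA, "flattop goldentop", "flat-top goldentop", "slender goldentop")
paste(na.omit(com_names), collapse = ", ")
#> [1] "flattop goldentop, flat-top goldentop, slender goldentop"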

Split properties with widely-spaced polygons

Some properties are relatively small in actual area compared to the area subsumed by their convex hulls. Two very good examples are Great Thicket and Blackwater.

Maybe partition a MULTIPOLYGON into component polygons if the component polygon area is below some threshold of the MULTIPOLYGON bounding box area? For example, the corresponding percentages for Great Thicket and Blackwater are ~1.5% and 4%, respectively. Could possibly ignore this complication if the number of records was relatively small (< 500K maybe) or the absolute bounding box area was relatively small as well...

This split should occur prior to, and not affect, possible temporally-split queries by get_GBIF.
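A minimal sketch of the proposed test with sf (the 5% threshold echoes the Great Thicket/Blackwater figures above; function and argument names are illustrative):

library(sf)

# Split a MULTIPOLYGON property into component polygons when its area
# is a small fraction of its bounding-box area
maybe_split <- function(prop_sf, frac_threshold = 0.05) {
  bbox_area <- st_area(st_as_sfc(st_bbox(prop_sf)))
  frac <- as.numeric(sum(st_area(prop_sf)) / bbox_area)
  if (frac < frac_threshold) st_cast(prop_sf, "POLYGON") else prop_sf
}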

Some refuges causing trouble with `gbif_count`

The Alaska Maritime problem seems related to crossing the international date line, and thus may be best handled together with the solution to #2.

Items below without parenthetical details spawned a 500 Server error:

  • ALASKA MARITIME NATIONAL WILDLIFE REFUGE
  • BAKER ISLAND NATIONAL WILDLIFE REFUGE
  • BRETON NATIONAL WILDLIFE REFUGE (Request Entity Too Large: WKT too large)
  • HOWLAND ISLAND NATIONAL WILDLIFE REFUGE
  • IZEMBEK NATIONAL WILDLIFE REFUGE (Request Entity Too Large)
  • JARVIS ISLAND NATIONAL WILDLIFE REFUGE
  • JOHNSTON ATOLL NATIONAL WILDLIFE REFUGE
  • KINGMAN REEF NATIONAL WILDLIFE REFUGE
  • MARIANA ARC OF FIRE NATIONAL WILDLIFE REFUGE (Request-URI Too Long)
  • MIDWAY ATOLL NATIONAL WILDLIFE REFUGE
  • NAVASSA ISLAND NATIONAL WILDLIFE REFUGE (Request Entity Too Large)
  • PALMYRA ATOLL NATIONAL WILDLIFE REFUGE
  • SUSQUEHANNA NATIONAL WILDLIFE REFUGE (Request Entity Too Large)
  • WAKE ATOLL NATIONAL WILDLIFE REFUGE

Reduce the over-the-top error handling

The error handling is far too convoluted. The way it should work: if some stage of manage_gets fails, that failure is captured and the process for that property is broken off, saved, and moved along. Thus, it seems that performing manage_gets safely will be adequate on its own. Removing purrr::safely from get_verb_N will affect several files in many locations, but it's worth it for clarity's sake (see the sketch below)...
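A minimal sketch of the simplification: capture failures once at the top level rather than inside every get_* verb (manage_gets is the package's internal name; the wrapper is illustrative):

# Wrap only manage_gets with purrr::safely, instead of wrapping
# each get_* function individually
safe_manage_gets <- purrr::safely(manage_gets)

res <- safe_manage_gets(prop)
if (!is.null(res$error)) {
  # store the captured error for this property and move along
  message("Query failed: ", conditionMessage(res$error))
}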

Update xlsx_submission function so each record is unique

The workbook created by the xlsx_submission function now contains the "SpeciesListForImport" tab. Each row in this tab is associated with a unique taxon observation, and the data are formatted to be consistent with the FWSpecies bulk submission template.

Get cadastral data from AGOL

Users are currently required to download the entire FWS cadastral dataset from ServCat. It would be more efficient to download data for specific refuges using the ArcGIS Online REST API (see the sketch below).
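A rough sketch of such a query with httr and sf; the service URL, layer index, and field name below are placeholders, not the confirmed location of the FWS cadastral service:

library(httr)

# Hypothetical ArcGIS REST query for a single refuge's boundary;
# the service URL and ORGNAME field are assumptions
get_refuge_boundary <- function(orgname) {
  resp <- GET(
    "https://gis.fws.gov/arcgis/rest/services/FWSInterest/MapServer/0/query",
    query = list(where = sprintf("ORGNAME = '%s'", orgname),
                 outFields = "*", f = "geojson")
  )
  stop_for_status(resp)
  sf::st_read(content(resp, as = "text"), quiet = TRUE)
}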

Update spatial function to sf

The package currently depends on sp and rgeos, which will be retired in October 2023. We need to update all spatial functions to sf.

Accommodate strict nativeness requirement

From Sarah Shultz:

The database has a slightly picky rule requiring that a nativeness value be assigned if a record is approved. Here are a couple of options; let me know if either is agreeable:

  • Set nativeness to "unknown" (where currently null) and then update in the future if/when someone has time to seek out this information
  • Leave nativeness blank and set the record status to "in review" (but remember, then the records won't show up on the basic checklist)

In short, when processing FWSpecies reviews for submission with fwspp_submission, we need to offer the user one of the above options, with the first as the default (see the sketch below).
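A minimal sketch of the default (first) option, assuming a nativeness column in the processed review data:

library(dplyr)

# Default behavior: fill missing nativeness with "unknown" so approved
# records pass the FWSpecies rule; values can be refined later
fill_nativeness <- function(review_df, default = "unknown") {
  mutate(review_df, nativeness = coalesce(nativeness, default))
}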

Review output

THIS ISSUE IS A WORK IN PROGRESS

Currently, the output fwspp object is a list of data frames (one per property) that may have taxonomic information. EDIT: Require them to have taxonomy; review is frustrating without it...

We need an option to take this object and export it to a spreadsheet for review with the following columns (an export sketch follows the list):

  • org_name (narrow column since it'll be superfluous, but necessary when re-importing)
  • category
  • taxon_code
  • sci_name
  • com_name
  • occurrence (as evaluated when compared against existing FWSpecies records for property)
  • nativeness (imported from existing FWSpecies [?] or blank for new records)
  • accept_record (defaults to YES for all retained records)
  • evidence (or ExternalLinks)
  • note (notes will be useful for identifying records that may be corrected at this stage)
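A minimal sketch of the export step with openxlsx, using the column set above and one sheet per property (function name illustrative):

library(openxlsx)

# Write one review sheet per property, keeping the columns listed above
export_review <- function(fwspp_obj, file) {
  cols <- c("org_name", "category", "taxon_code", "sci_name", "com_name",
            "occurrence", "nativeness", "accept_record", "evidence", "note")
  wb <- createWorkbook()
  for (prop in names(fwspp_obj)) {
    df <- fwspp_obj[[prop]]
    df$accept_record <- "YES"     # default for all retained records
    sheet <- substr(prop, 1, 31)  # Excel's sheet-name length limit
    addWorksheet(wb, sheet)
    writeData(wb, sheet, df[intersect(cols, names(df))])
  }
  saveWorkbook(wb, file, overwrite = TRUE)
}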

grbio data is useful, but not unique

There are several hundred instances of GRBio institutions sharing the same acronym, and these are causing problems during cleaning. There's no obvious way to assign unique acronyms and still link to the occurrence data based on the institution code.

Conclusion: cut the grbio linkage and deal with it...

iDigBio seems to be letting varieties and subspecies through?

See, e.g., Erie NWR (a filter sketch follows the output):

#>                        org_name
#> 1 ERIE NATIONAL WILDLIFE REFUGE
#> 2 ERIE NATIONAL WILDLIFE REFUGE
#> 3 ERIE NATIONAL WILDLIFE REFUGE
#>                                      sci_name       lon      lat loc_unc_m
#> 1       Symphyotrichum puniceum var. puniceum -80.00139 41.78611        14
#> 2 Symphyotrichum lanceolatum var. lanceolatum -79.95472 41.56987        NA
#> 3                  Cornus amomum ssp. obliqua -79.98504 41.58978        14
#>   year month day
#> 1 1994     9   7
#> 2 1969     9  27
#> 3 2005     8   4
#>                                                                 evidence
#> 1 portal.idigbio.org/portal/records/bfb1c8bf-b196-48bd-a908-43a5c85cf51a
#> 2 portal.idigbio.org/portal/records/65130d25-9fc9-4171-92f7-836c02bac556
#> 3 portal.idigbio.org/portal/records/d7b5d622-3024-4b7d-a819-3cd2291b8094
#>   bio_repo                com_name       rank       category taxon_code
#> 1  iDigBio        purplestem aster    Variety Vascular Plant     295996
#> 2  iDigBio NA, white panicle aster    Variety Vascular Plant     290904
#> 3  iDigBio           silky dogwood Subspecies Vascular Plant     130553
#>      tsn note
#> 1 566343 <NA>
#> 2 566832 <NA>
#> 3  27801 <NA>
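A minimal sketch of a post-hoc filter for such records, assuming infraspecific names carry "var.", "ssp.", or "subsp." markers as in the output above:

# Drop records whose scientific names include infraspecific markers
# that strict scrubbing should have caught
drop_infraspecific <- function(df) {
  df[!grepl("\\b(var|ssp|subsp)\\.", df$sci_name), , drop = FALSE]
}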

datecollected vs eventDate

Currently, ridigbio returns datecollected by default, which we do not recommend be used in scientific research. When a data provider does not provide a full date in the Darwin Core eventDate field, the complete value or the missing parts (i.e., month and/or day) are randomly generated and thus may lack any real meaning. The generated dates are difficult to detect, as they are randomly distributed. We are currently working to modify our ingestion pipeline to avoid randomly generating dates. However, dates remain an issue across biodiversity aggregators, and the solution is not clear (see GBIF, for example).

Why does this matter for fwspp?
I found that datecollected is used by this repository as if it were a real value. This may lead to artificial dates being used to make management decisions!

How to use other fields:
We plan to update the ridigbio package to instead return "data.dwc:eventDate", "data.dwc:year", "data.dwc:month", and "data.dwc:day", which are all text fields rather than dates. These fields are not randomly generated; the values come directly from data providers and therefore may carry real meaning in biological research. See the current issue and pull request.

Since this package currently downloads "all" fields, I hoped this fix might involve only your clean_iDigBio function and not your get_iDigBio function. Sadly, not all fields are returned when "all" fields are specified; instead, you will need to specify which fields to download. From your code, I believe you want scientificname, lat/lon, coordinate uncertainty, catalognumber, UUID, and date. To obtain these fields, this is how you would modify the download:

fields2get <- c("data.dwc:scientificName",
                "data.dwc:decimalLatitude",
                "data.dwc:decimalLongitude",
                "data.dwc:coordinateUncertaintyInMeters",
                "catalognumber",
                "uuid",
                "data.dwc:eventDate",
                "data.dwc:year",
                "data.dwc:month",
                "data.dwc:day")
idb_recs <- try_idb(type = "records", mq = FALSE, rq = rq, fields = fields2get,
                    max_items = 100000, limit = 0, offset = 0, sort = FALSE,
                    httr::config(timeout = timeout))

Additional modification to clean_iDigBio will also be needed, since the dates downloaded here will not be in date format; instead, all dates will be text strings. There are many ways to convert these to dates; for example, see the gatoRs remove_duplicate function or the ridigbio proposed solution here (one approach is sketched below).
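A minimal sketch of one such conversion with lubridate, assuming mostly ISO 8601 text in data.dwc:eventDate; partial dates are left NA rather than invented:

library(lubridate)

# Parse eventDate text; truncated = 3 tolerates missing time components,
# and unparseable (e.g., year-only) strings become NA instead of fake dates
parse_event_date <- function(x) {
  suppressWarnings(as_date(ymd_hms(x, truncated = 3)))
}

parse_event_date(c("1994-09-07", "1994-09-07T10:30:00Z", "1994"))
#> [1] "1994-09-07" "1994-09-07" NA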

Hope this helps and please let me know if you have any questions or want more specific code suggestions.
