usfws / akaerial

An R package for quality control and analysis of Alaska waterfowl aerial survey data

License: Creative Commons Zero v1.0 Universal

R 100.00%

akaerial's People


cfrost3

akaerial's Issues

release new version for 2021

merge dev branch to main and release new version that contains 2021 data

add spatial QC process to greenlight

Add a process to check spatial data in transcribed observation files. The purpose is to recreate the flight lines (survey effort) from the observations (or from track files, if they exist). Step one is to add a function that makes a line for every transect:

library(sf)
library(dplyr)

points2line <- function(x, Year = unique(x$Year), Transect = unique(x$Transect), crs = 4326) {
  # Accepts an sf object of points and returns an sf object with ONE linestring.
  # x must describe a single transect; it is never split into more than one line.
  # x should have attributes named Year and Transect; if not, supply them as arguments.
  # crs must match the CRS of x (callers transform to 4326 first).
  linestring <- x %>%
    cbind(st_coordinates(.)) %>%
    as.data.frame() %>%
    select(-geometry) %>%
    # order vertices west to east (then south to north); assumes east-west transects
    arrange(X, Y) %>%
    select(X, Y) %>%
    as.matrix() %>%
    st_linestring() %>%
    st_sfc(crs = crs) %>%  # use the crs argument rather than a hard-coded 4326
    st_sf(geometry = .) %>%
    mutate(Year = Year, Transect = Transect)
  return(linestring)
}

Then apply this over all the transects in the data:
# Now apply the function across all transects
library(purrr)
lines <- birds %>%
  st_transform(crs = 4326) %>%
  group_split(Year, Transect, Day) %>%
  map(points2line) %>%
  do.call(rbind, .)  # rbind.sf keeps the combined result an sf object

Then plot them on an interactive map for the observer to inspect:
library(tmap)
tmap_mode("view")  # interactive map; basemaps are drawn in view mode
# Y is the survey year to inspect; acp is the stratum polygon layer
df <- filter(lines, Year == Y)
bdf <- filter(birds, Year == Y) %>% mutate(Day = as.character(Day))
tm <- tm_shape(acp) + tm_polygons(col = "STRATNAME", alpha = 0.5) +
  tm_shape(df, name = paste(Y, "Flown Track")) + tm_lines() +
  tm_text("Transect", size = 2) +
  tm_shape(bdf, name = paste(Y, "Bird Obs")) + tm_dots(col = "Day") +
  tm_basemap(server = "Esri.WorldGrayCanvas") +
  tm_scale_bar()
tm

We might also add QC flags such as calculated speed and/or change in direction. Problems with transect numbering, or any other problem that would result in bad effort reconstruction, should be "red light" issues.
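A minimal base-R sketch of those two flags follows. It assumes the points for one transect are already sorted by time, with `Time` in seconds and `Lat`/`Lon` in decimal degrees (assumed column names), and the 250 km/h and 45 degree thresholds are placeholders rather than survey standards. `flag_track` is a hypothetical helper, not an AKaerial function.

```r
# Great-circle distance in km between two points (haversine formula).
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Initial compass bearing (0-360 degrees) from point 1 to point 2.
bearing_deg <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  y <- sin((lon2 - lon1) * to_rad) * cos(lat2 * to_rad)
  x <- cos(lat1 * to_rad) * sin(lat2 * to_rad) -
    sin(lat1 * to_rad) * cos(lat2 * to_rad) * cos((lon2 - lon1) * to_rad)
  (atan2(y, x) / to_rad + 360) %% 360
}

# Hypothetical QC helper: speed arriving at each point and turn angle at each
# interior point; flags points that exceed either (placeholder) threshold.
# Expects at least two points sorted by Time.
flag_track <- function(pts, max_kmh = 250, max_turn_deg = 45) {
  i <- seq_len(nrow(pts) - 1)
  dist <- haversine_km(pts$Lat[i], pts$Lon[i], pts$Lat[i + 1], pts$Lon[i + 1])
  hours <- (pts$Time[i + 1] - pts$Time[i]) / 3600
  brg <- bearing_deg(pts$Lat[i], pts$Lon[i], pts$Lat[i + 1], pts$Lon[i + 1])
  turn <- abs(((diff(brg) + 180) %% 360) - 180)  # smallest angle between legs
  pts$speed_kmh <- c(NA, dist / hours)
  pts$turn_deg <- c(NA, turn, NA)
  pts$flag <- (!is.na(pts$speed_kmh) & pts$speed_kmh > max_kmh) |
    (!is.na(pts$turn_deg) & pts$turn_deg > max_turn_deg)
  pts
}

# Three points flown east, then a sharp turn north at the third point.
pts <- data.frame(
  Time = c(0, 60, 120, 180),
  Lat = c(61, 61, 61, 61.025),
  Lon = c(-164, -163.95, -163.90, -163.90)
)
res <- flag_track(pts)
print(res)
```

Speed checks catch time or position typos, while turn angles catch points assigned to the wrong transect; either kind of flag could feed the "red light" list.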

expand definition of data table to include index definitions from 'AdjustCounts'

In the documentation for the historic data tables, expand the definitions of the various metrics so that a user can fully understand them without referring to other pages of documentation. Specifically, include the following:

itotal - Indicated total. Singles doubled, pairs doubled, opens added, flkdrake 1-4 doubled, flkdrake 5+ added.
ibb - Indicated breeding birds. Singles doubled, pairs doubled, opens removed, flkdrake 1-4 doubled, flkdrake 5+ removed.
total - Total birds. Singles added, pairs doubled, opens added, flkdrake added.
sing1pair2 - Singles and pairs. Singles added, pairs doubled, opens removed, flkdrake removed.
flock - Flocks. Singles removed, pairs removed, opens added, flkdrake added.
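The five rules above can be written as one small function, which may help when drafting the documentation. This is a sketch of the rules as listed, not AKaerial's `AdjustCounts` code; the `Obs_Type` values ("single", "pair", "open", "flkdrake") and the `Num` column (birds, pairs, or drakes recorded) are assumptions, and `index_counts` is a hypothetical name.

```r
# Hypothetical helper applying the index rules listed above.
index_counts <- function(obs) {
  type <- obs$Obs_Type
  n <- obs$Num
  small_flk <- type == "flkdrake" & n <= 4  # flocked drakes, 1-4
  big_flk <- type == "flkdrake" & n >= 5    # flocked drakes, 5+
  c(
    itotal = sum(2 * n[type %in% c("single", "pair")]) + sum(n[type == "open"]) +
      sum(2 * n[small_flk]) + sum(n[big_flk]),
    ibb = sum(2 * n[type %in% c("single", "pair")]) + sum(2 * n[small_flk]),
    total = sum(n[type == "single"]) + sum(2 * n[type == "pair"]) +
      sum(n[type %in% c("open", "flkdrake")]),
    sing1pair2 = sum(n[type == "single"]) + sum(2 * n[type == "pair"]),
    flock = sum(n[type %in% c("open", "flkdrake")])
  )
}

obs <- data.frame(
  Obs_Type = c("single", "pair", "open", "flkdrake", "flkdrake"),
  Num = c(1, 2, 10, 3, 6)
)
res <- index_counts(obs)
print(res)
```

For this toy input the indices are itotal = 28, ibb = 12, total = 24, sing1pair2 = 5, and flock = 19.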

Also, include any variations from other "standard" waterfowl index calculations, for example, the "indicated breeding bird" index for scaup, if it varies from the NAWBPHS or other FWS surveys.

add strata and transect to QC obs files

Add a useful transect number and stratum identifier to the QC obs files so that users can summarize observations and compute densities directly from them.
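With those two columns present, a per-transect summary becomes a single base-R `aggregate()` call. The column names `Stratum`, `Transect`, `Species`, and `Num` below are assumptions about the QC obs layout.

```r
# Toy QC obs table with the proposed Stratum and Transect columns.
obs <- data.frame(
  Stratum = c("High", "High", "Low", "Low"),
  Transect = c(1, 1, 2, 2),
  Species = c("SPEI", "SPEI", "SPEI", "CCGO"),
  Num = c(2, 1, 4, 3)
)

# Total birds observed by stratum, transect, and species.
counts <- aggregate(Num ~ Stratum + Transect + Species, data = obs, FUN = sum)
print(counts)
```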

Create package vignette(s)

Determine which and how many separate vignettes to create. This could be one vignette with several sections or several separate vignettes on different topics. Topics covered should be:

Greenlighting (quality control process)
Estimation
Plotting and visualization

The estimation topic might be further split into a 'modelling' section to describe the state-space model and any spatial or other models.

add 'seat' to summary data sets

Add the 'seat' variable for the observer to the internal data sets where it is relevant: $expanded.table and $output.table.

add check for duplication during transcription in QC process

In the 2019 Heather Wilson (HMW) ACP data, there appears to be duplication of transcribed data on Day = 11, transect = 33. In all data from 2007 to 2023, there are > 5000 duplicate observations that share the exact time, location, species, and all other data. Many of these are justified cases where the observer recorded multiple observations of the same species and observation type or number on the same WAV file. Another justified case might be when one observer copies the start or end locations from another observer. The occasion noted above had two start locations recorded, followed by many (~8) duplicate observations of various species, suggesting either a cut-and-paste error or that the same WAV file was transcribed multiple times.

There appear to be three occurrences of this in the data set from 2007 to 2023:
2007, RMD, transect 420 (maybe only 6 observations);
2015, HMW, transect 3 (about 34 observations);
2019, HMW, transect 33 (found above).

All of the above have two START points recorded at the same time. Maybe the best way to check for this is to produce a warning when multiple starts are present at the same position. Not sure whether the above duplicate data should be deleted or left in.
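The proposed warning could look like the base-R sketch below. It only flags transects and deletes nothing, since many exact duplicates are legitimate. The column names (`Year`, `Observer`, `Transect`, `Species`, `Time`, `Lat`, `Lon`) and the `"START"` code in `Species` are assumptions about the obs files, and `flag_duplicate_starts` is a hypothetical name.

```r
# Hypothetical QC check: return transects with more than one START record at
# the same time and position, and warn if any are found.
flag_duplicate_starts <- function(obs) {
  starts <- obs[obs$Species == "START", ]
  key <- starts[, c("Year", "Observer", "Transect", "Time", "Lat", "Lon")]
  dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  flagged <- unique(starts[dup, c("Year", "Observer", "Transect")])
  if (nrow(flagged) > 0) {
    warning(nrow(flagged), " transect(s) have repeated START records; ",
            "check for duplicated transcription.")
  }
  flagged
}

obs <- data.frame(
  Year = 2019, Observer = "HMW",
  Transect = c(33, 33, 33, 34),
  Species = c("START", "START", "SPEI", "START"),
  Time = c(100, 100, 105, 200),
  Lat = c(61.1, 61.1, 61.2, 61.5),
  Lon = c(-164.2, -164.2, -164.1, -164.0)
)
flagged <- suppressWarnings(flag_duplicate_starts(obs))
print(flagged)
```

Flagged transects could then be reviewed against the original WAV files before deciding whether any rows should be removed.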

two copies of 2022 estimate in data tables

There appear to be two copies (rows) of the 2022 estimate for each species in the data tables; see AKaerial::YKGHistoric$combined and AKaerial::YKGHistoric$output.table. They appear to have the same estimates, so one should be removed.
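A base-R sketch for finding and dropping such repeated rows is below. The key columns are an assumption: `Year` and `Species` fit `$combined`, while `$output.table` would also need `Observer` in `keys`. Both helper names are hypothetical.

```r
# List every row whose key columns appear more than once.
find_duplicate_estimates <- function(tab, keys = c("Year", "Species")) {
  dup <- duplicated(tab[keys]) | duplicated(tab[keys], fromLast = TRUE)
  tab[dup, , drop = FALSE]
}

# Keep only the first row for each key combination.
drop_duplicate_estimates <- function(tab, keys = c("Year", "Species")) {
  tab[!duplicated(tab[keys]), , drop = FALSE]
}

tab <- data.frame(
  Year = c(2021, 2022, 2022),
  Species = c("SPEI", "SPEI", "SPEI"),
  itotal = c(10, 12, 12)
)
dups <- find_duplicate_estimates(tab)
clean <- drop_duplicate_estimates(tab)
print(dups)
print(clean)
```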

AKaerial not in compliance with DGEC required content

Update the repo to contain the required content. See https://github.com/USFWS/r7-repo-template

implement workflow for Scribe

Modify AKaerial workflow (and documentation?) for Scribe.

standardize names in output estimates tables

Standardize the naming convention in the output estimate tables. Use a consistent case and the same name for variables that are the same type of estimate; e.g., itotal in the "combined" data frame should have the same name in the "output.table" data frame, where it is currently named itotal.est. I would remove the ".est" suffix. Do this for all data sets. Also, give the data sets meaningful names; "output.table" doesn't really mean much.

names(AKaerial::YKGHistoric$combined)
[1] "Year" "Species" "total" "total.var" "total.se"
[6] "itotal" "itotal.var" "itotal.se" "ibb" "ibb.var"
[11] "ibb.se" "sing1pair2" "sing1pair2.var" "sing1pair2.se" "flock"
[16] "flock.var" "flock.se" "area"
names(AKaerial::YKGHistoric$output.table)
[1] "Year" "Observer" "Species" "total.est" "itotal.est"
[6] "ibbtotal.est" "sing1pair2.est" "flock.est" "var.N" "var.Ni"
[11] "var.Nib" "var.Nsing1pair2" "var.Nflock" "SE" "SE.i"
[16] "SE.ibb" "SE.sing1pair2" "SE.flock" "area"
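One way to apply a single convention is a lookup table from the current `output.table` names to the `combined` style. The correspondence below is an assumption read off the two name lists (for example, that `var.N` and `SE` belong to `total.est`), so it should be checked against the code that builds the tables. `standardize_names` is a hypothetical helper.

```r
# Assumed correspondence from output.table names to combined-style names.
rename_map <- c(
  total.est = "total", itotal.est = "itotal", ibbtotal.est = "ibb",
  sing1pair2.est = "sing1pair2", flock.est = "flock",
  var.N = "total.var", var.Ni = "itotal.var", var.Nib = "ibb.var",
  var.Nsing1pair2 = "sing1pair2.var", var.Nflock = "flock.var",
  SE = "total.se", SE.i = "itotal.se", SE.ibb = "ibb.se",
  SE.sing1pair2 = "sing1pair2.se", SE.flock = "flock.se"
)

# Rename any column found in the map; leave the others unchanged.
standardize_names <- function(df, map = rename_map) {
  old <- names(df)
  hit <- old %in% names(map)
  old[hit] <- unname(map[old[hit]])
  names(df) <- old
  df
}

out <- data.frame(Year = 2022, Observer = "ABC", Species = "SPEI",
                  total.est = 1, SE.i = 2, area = 3)
res <- standardize_names(out)
print(names(res))
```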

can't install from GitHub

I tried to install AKaerial from GitHub. The first time I tried, I got an error about not being able to update the package 'rlang'. I checked and there was no package 'rlang' in my directory, so I shut down R and tried again. The second try gave me this:

devtools::install_github("USFWS/AKaerial", ref = "master", build_vignettes = TRUE)
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
there is no package called ‘rlang’

Is this a namespace issue or is it on my side? Using R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

And RStudio Version 1.1.453

standardize and simplify package data sets and column names

Data column names are not case-standardized and do not share an intuitive naming convention. For example, the point estimate for indicated breeding birds is ibb in $combined and ibbtotal.est in $expanded.table or $output.table, and in the latter, ibb refers to the observed number of ibb. Likewise, standard errors follow different naming conventions.

Also, there seems to be a lot of unnecessary information in the data tables, e.g., the 'cov' or cross-product terms.

separate function code into separate files under /R

Follow guidance at https://r-pkgs.org/r.html#code-organising. Group functions as appropriate.

add design transect number to quality controlled output Obs data files

A column should be added to observation data files that gives the transect identifier for each observation. The name of this field should match that in the design geometry file, so that transect length can be extracted from that for analysis.
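Once the key matches, transect length can be joined with base R's `merge()` and used for a simple strip-transect density. The design columns `Transect` and `Length_km` and the 0.4 km strip width are hypothetical values for illustration only, not AKaerial's design parameters.

```r
# Hypothetical design table: one row per design transect.
design <- data.frame(Transect = c(1, 2), Length_km = c(20, 15))

# Birds counted per transect (e.g. summed from a QC obs file).
counts <- data.frame(Transect = c(1, 2), Num = c(3, 9))

strip_width_km <- 0.4  # hypothetical strip width
dens <- merge(counts, design, by = "Transect")
dens$Density_per_km2 <- dens$Num / (dens$Length_km * strip_width_km)
print(dens)
```

Here transect 1 gives 3 / (20 x 0.4) = 0.375 birds per km² and transect 2 gives 9 / (15 x 0.4) = 1.5.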

develop process to incorporate track files (GPS bread crumb trails) into QC obs files

The GPS track files should be part of the normal record of aerial survey data, as they record the survey effort independently of any design files. As such, they are a record of the actual survey effort and do not depend on the density of bird observations. A record of the plane position can also serve as a check on transect mislabeling or other data quality problems. The easiest way to distinguish GPS records from human bird observations would be to add a code under Species or Obs_Type, using GPS as the code.

Put another way, the GPS record of the plane track records the "zero bird" observations, whereas human observers record only the positions of bird observations and do not record the frequency or density of "no birds". Also, in displays of the data, it would encourage users to display survey effort as well as bird observations, and thus highlight differences in survey effort across space.
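The "GPS as a Species code" idea can be sketched as appending track points to the obs table with `Species = "GPS"` and `Num = 0`. The track and obs column names (`Transect`, `Time`, `Lat`, `Lon`, `Observer`) are assumptions, and `add_track` is a hypothetical helper.

```r
# Hypothetical helper: append GPS track points as zero-bird effort rows.
add_track <- function(obs, track) {
  gps <- data.frame(
    Transect = track$Transect, Time = track$Time,
    Lat = track$Lat, Lon = track$Lon,
    Species = "GPS", Num = 0
  )
  # Any obs column the track lacks (e.g. Observer) is filled with NA.
  for (col in setdiff(names(obs), names(gps))) gps[[col]] <- NA
  out <- rbind(obs, gps[names(obs)])
  out[order(out$Transect, out$Time), ]
}

obs <- data.frame(
  Observer = "HMW", Transect = c(1, 1), Time = c(10, 30),
  Lat = c(61, 61.01), Lon = c(-164, -163.9),
  Species = c("SPEI", "CCGO"), Num = c(2, 5)
)
track <- data.frame(
  Transect = 1, Time = c(5, 20, 40),
  Lat = c(61, 61.005, 61.02), Lon = c(-164.05, -163.95, -163.85)
)
res <- add_track(obs, track)
print(res)
```

Sorting by transect and time interleaves effort and observations, so a plot of the combined table shows flown track even where no birds were seen.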

add package version and GitHub commit hash to estimate files and QC Obs files

For any output estimate or data file, add the package version (release number) and GitHub commit ID/hash to the data file. Use devtools::package_info("AKaerial") to extract this. This might require a change in the workflow for releasing package versions or data files.
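An alternative that needs only base R reads the installed DESCRIPTION through `utils::packageVersion()` and `utils::packageDescription()`. When a package is installed from GitHub with remotes/devtools, its DESCRIPTION gains a `RemoteSha` field holding the commit hash; otherwise the field is absent and the sketch records `NA`. `stamp_provenance` and the two output column names are hypothetical.

```r
# Hypothetical helper: add package version and commit hash columns to a table.
stamp_provenance <- function(df, pkg = "AKaerial") {
  desc <- utils::packageDescription(pkg)
  sha <- if (is.null(desc$RemoteSha)) NA_character_ else desc$RemoteSha
  df$pkg_version <- as.character(utils::packageVersion(pkg))
  df$pkg_commit <- sha
  df
}

# Demonstrated on a base package, which has a version but no RemoteSha.
res <- stamp_provenance(data.frame(x = 1), pkg = "stats")
print(res)
```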

source from geopackages and other open source formats

add functionality to source geographic information from geopackages, Esri shapefiles, or GeoJSON.
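`sf::st_read()` already reads all three formats through GDAL, picking the driver from the file. A thin wrapper (hypothetical name `read_geo`) could also return every layer in one CRS; the 4326 default mirrors the spatial QC code above and is an assumption.

```r
library(sf)

# Hypothetical wrapper: read a .gpkg, .shp, or .geojson file and transform it.
read_geo <- function(path, layer = NULL, crs = 4326) {
  x <- if (is.null(layer)) {
    st_read(path, quiet = TRUE)
  } else {
    st_read(path, layer = layer, quiet = TRUE)
  }
  st_transform(x, crs)
}

# Round-trip two points through a GeoPackage and a GeoJSON file.
pts <- st_sf(
  id = 1:2,
  geometry = st_sfc(st_point(c(-164, 61)), st_point(c(-163, 61.5)), crs = 4326)
)
f_gpkg <- tempfile(fileext = ".gpkg")
f_json <- tempfile(fileext = ".geojson")
st_write(pts, f_gpkg, quiet = TRUE)
st_write(pts, f_json, quiet = TRUE)
a <- read_geo(f_gpkg)
b <- read_geo(f_json)
```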

documentation for DubMatch function

The 'Details' and 'Value' sections of the DubMatch documentation give the details and value for ShowMeDouble instead.

release version (1.0.0) does not match main

Release version 1.0.0 lags behind the content on the main branch. For example, the release does not have the function point2line.R. We should either update the release or make releases automatic on a push from dev to main. See https://github.com/marketplace/actions/automatic-releases

incorporate data validation against a current data dictionary as part of annual QC process.

Before writing or submitting QC data to the final repository, validate it against the current data dictionary for the project. If mismatches occur, determine whether it is a data QC issue or whether the dictionary needs updating. Update the data or dictionary, or assign and resolve tasks, before data are submitted to the repository. Use the validation function found here if useful: https://hdvincelette.github.io/mdJSONdictio/
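Independently of mdJSONdictio, the core of such a check is comparing field names and types. The base-R sketch below uses a hypothetical two-column dictionary (`field`, `class`); a real project dictionary would carry more (units, allowed values), and an integer column would be reported against an expected "numeric".

```r
# Hypothetical data dictionary: one row per field and its expected R class.
dictionary <- data.frame(
  field = c("Year", "Species", "Num"),
  class = c("numeric", "character", "numeric"),
  stringsAsFactors = FALSE
)

# Return a character vector describing every mismatch (empty if none).
validate_against_dictionary <- function(df, dictionary) {
  problems <- character(0)
  missing <- setdiff(dictionary$field, names(df))
  extra <- setdiff(names(df), dictionary$field)
  if (length(missing) > 0) problems <- c(problems, paste("missing field:", missing))
  if (length(extra) > 0) problems <- c(problems, paste("field not in dictionary:", extra))
  for (f in intersect(dictionary$field, names(df))) {
    expected <- dictionary$class[dictionary$field == f]
    found <- class(df[[f]])[1]
    if (found != expected) {
      problems <- c(problems, sprintf("%s: expected %s, found %s", f, expected, found))
    }
  }
  problems
}

qc <- data.frame(Year = 2022, Species = "SPEI", Num = "2", Extra = 1,
                 stringsAsFactors = FALSE)
p <- validate_against_dictionary(qc, dictionary)
print(p)
```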

This issue might be moved to the repo data manager and outside of AKaerial if that seems like a better solution.

add documentation for package data sets

see http://r-pkgs.had.co.nz/data.html

We might also need to remove some data sets or make them invisible to users. See the link above.

fix links in Readme

Links to historic estimates give a 404 error. Remove or fix.

convert to sf

convert AKaerial spatial processing to sf to the extent possible.

remove duplication of QC obs data set csv files

In 2010, the ACP bird observation data are written to the output legacy data twice. AKaerial should write the obs data only once for each observer. In years when observers swap out, so that design-based estimates require combining observers for one side of the plane, still write the data only once per observer. Leave the combining as an internal process within AKaerial, as part of the estimation process.

Look at other years and surveys where this issue might apply and fix.
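Writing exactly one file per observer can be made explicit by looping over observers. This sketch deliberately does not drop duplicate rows, since many exact duplicates are legitimate. The file-name pattern and the `Year`/`Observer` columns are hypothetical.

```r
# Hypothetical writer: one QC obs CSV per observer; returns the file paths.
write_obs_by_observer <- function(obs, dir, survey = "ACP") {
  paths <- character(0)
  for (o in unique(obs$Observer)) {
    one <- obs[obs$Observer == o, , drop = FALSE]
    path <- file.path(dir, sprintf("%s_%s_%s_QCObs.csv", survey, one$Year[1], o))
    utils::write.csv(one, path, row.names = FALSE)
    paths <- c(paths, path)
  }
  paths
}

obs <- data.frame(
  Year = 2010, Observer = c("A", "A", "A", "B"),
  Species = c("SPEI", "CCGO", "SPEI", "SPEI"), Num = c(1, 2, 3, 4)
)
paths <- write_obs_by_observer(obs, tempdir())
print(basename(paths))
```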

Improve Readme file

Add content to the README file according to https://r-pkgs.org/release.html#readme

Add USFWS disclaimer.
