usfws / akaerial
An R package for quality control and analysis of Alaska waterfowl aerial survey data
License: Creative Commons Zero v1.0 Universal
Merge the dev branch to main and release a new version that contains the 2021 data.
Add a process to check spatial data in transcribed observation files. The purpose is to recreate the flight line (survey effort) from observations (or from track files if they exist). Step one is to add a function that makes a line for every transect:
library(dplyr)
library(sf)

points2line <- function(x, Year = unique(x$Year), Transect = unique(x$Transect), crs = 4326) {
  # Accepts an sf object of points and returns an sf object holding one linestring.
  # The points must define ONE linestring (a single transect), not more than one.
  # x should have attributes named Year and Transect;
  # if not, supply them as parameter values.
  # Points are joined in order of X, then Y.
  linestring <- x %>%
    cbind(st_coordinates(.)) %>%
    as.data.frame() %>%
    select(-geometry) %>%
    arrange(X, Y) %>%
    select(X, Y) %>%
    as.matrix() %>%
    st_linestring() %>%
    st_sfc(crs = crs) %>%  # use the crs argument rather than a hard-coded 4326
    st_sf(geometry = .) %>%
    mutate(Year = Year, Transect = Transect)
  return(linestring)
}
And then apply this over all the transects in the data:
library(purrr)
# Now apply the function across all transects (one group per Year, Transect, Day)
lines <- birds %>%
  st_transform(crs = 4326) %>%
  group_split(Year, Transect, Day) %>%
  map(points2line) %>%
  map_dfr(rbind)
And then plot them in an interactive map for inspection by the observer:
library(tmap)
tmap_mode("view")  # interactive map
# Y is the survey year to inspect; acp holds the stratum polygons (with a STRATNAME attribute)
df <- filter(lines, Year == Y)
bdf <- filter(birds, Year == Y) %>% mutate(Day = as.character(Day))
tm <- tm_shape(acp) + tm_polygons(col = "STRATNAME", alpha = 0.5) +
  tm_shape(df, name = paste(Y, "Flown Track")) + tm_lines() +
  tm_text("Transect", size = 2) +
  tm_shape(bdf, name = paste(Y, "Bird Obs")) + tm_dots(col = "Day") +
  tm_basemap(server = "Esri.WorldGrayCanvas") +
  tm_scale_bar()
tm
Might also add some QC flags such as calculated speed and/or change in direction. Problems with transect numbering or any other problem that would result in bad effort reconstruction should be "red light" issues.
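A rough sketch of what the speed and change-in-direction flags could look like, using base R only. The function name flag_track(), its arguments, and the default thresholds are all hypothetical and are not part of AKaerial; the thresholds in particular would need to be set from real aircraft speeds.

```r
# Hypothetical helpers for QC flags; names and thresholds are illustrative only.
deg2rad <- function(d) d * pi / 180

# Great-circle distance in km between point pairs (haversine formula).
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  dlon <- deg2rad(lon2 - lon1)
  dlat <- deg2rad(lat2 - lat1)
  a <- sin(dlat / 2)^2 + cos(deg2rad(lat1)) * cos(deg2rad(lat2)) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Initial bearing in degrees (0 to 360) from point 1 to point 2.
bearing_deg <- function(lon1, lat1, lon2, lat2) {
  dlon <- deg2rad(lon2 - lon1)
  y <- sin(dlon) * cos(deg2rad(lat2))
  x <- cos(deg2rad(lat1)) * sin(deg2rad(lat2)) -
    sin(deg2rad(lat1)) * cos(deg2rad(lat2)) * cos(dlon)
  (atan2(y, x) * 180 / pi) %% 360
}

# lon, lat in decimal degrees; time_s in seconds, sorted in time order.
flag_track <- function(lon, lat, time_s, max_speed_kmh = 250, max_turn_deg = 45) {
  n <- length(lon)
  stopifnot(n >= 3, length(lat) == n, length(time_s) == n)
  i <- seq_len(n - 1)
  dist_km <- haversine_km(lon[i], lat[i], lon[i + 1], lat[i + 1])
  # Speed of the segment arriving at each point (NA for the first point).
  speed <- c(NA, dist_km / (diff(time_s) / 3600))
  brg <- bearing_deg(lon[i], lat[i], lon[i + 1], lat[i + 1])
  # Change in bearing between consecutive segments, folded into 0 to 180 degrees.
  turn <- c(NA, NA, abs(((diff(brg) + 180) %% 360) - 180))
  data.frame(
    speed_kmh = speed,
    turn_deg = turn,
    speed_flag = speed > max_speed_kmh,
    turn_flag = turn > max_turn_deg
  )
}

# Made-up track: two segments east along the equator, then a turn north.
qc <- flag_track(
  lon = c(0, 0.1, 0.2, 0.2),
  lat = c(0, 0, 0, 0.1),
  time_s = c(0, 300, 600, 900)
)
```

Rows with speed_flag or turn_flag set to TRUE would be candidates for the "red light" review.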
In the documentation for the historic data tables, expand the definitions of the various metrics so that a user can fully understand them without referring to other pages of documentation. Specifically, include the following:
itotal - Indicated total. Singles doubled, pairs doubled, opens added, flkdrake 1-4 doubled, flkdrake 5+ added.
ibb - Indicated breeding birds. Singles doubled, pairs doubled, opens removed, flkdrake 1-4 doubled, flkdrake 5+ removed.
total - Total birds. Singles added, pairs doubled, opens added, flkdrake added.
sing1pair2 - Singles and pairs. Singles added, pairs doubled, opens removed, flkdrake removed.
flock - Flocks. Singles removed, pairs removed, opens added, flkdrake added.
Also, include any variations from other "standard" waterfowl index calculations. For example, "indicated breeding bird" index for scaup, if these vary from the NAWBPHS or other FWS surveys.
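The five index definitions listed above could also be shown in the documentation as a small worked example. This is only a sketch of the rules as written in this issue: the column names Obs_Type and Num, the type codes, and the assumption that Num on a pair record counts pairs (not birds) are all hypothetical and should be checked against the real data.

```r
# Hypothetical sketch: per-record contribution to each index, following the
# rules as listed in the issue. Obs_Type codes and Num are assumed names.
index_contributions <- function(Obs_Type, Num) {
  single <- Obs_Type == "single"
  pair   <- Obs_Type == "pair"      # assumes Num counts pairs, so pairs are doubled
  open   <- Obs_Type == "open"
  flk    <- Obs_Type == "flkdrake"
  small  <- flk & Num <= 4          # flkdrake 1-4
  large  <- flk & Num >= 5          # flkdrake 5+
  data.frame(
    itotal     = 2 * Num * (single | pair | small) + Num * (open | large),
    ibb        = 2 * Num * (single | pair | small),
    total      = Num * (single | open | flk) + 2 * Num * pair,
    sing1pair2 = Num * single + 2 * Num * pair,
    flock      = Num * (open | flk)
  )
}

# Made-up records: one single, one pair, an open flock of 10,
# and flocked drakes of 3 and 6.
contrib <- index_contributions(
  Obs_Type = c("single", "pair", "open", "flkdrake", "flkdrake"),
  Num = c(1, 1, 10, 3, 6)
)
index_sums <- colSums(contrib)
```

For these five made-up records the sums are itotal = 26, ibb = 10, total = 22, sing1pair2 = 3, and flock = 19.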
Add a useful transect number and stratum identifier to QC obs file so that users can summarize observations and compute densities directly from QC obs files.
Determine which and how many separate vignettes to create. Could be one with several sections or several separate vignettes on different topics. Topics covered should be:
The estimation topic might be further split into a 'modelling' section to describe the state-space model and any spatial or other models.
Add the 'seat' variable for the observer in the internal data sets where this could be relevant: $expanded.table and $output.table
In the 2019 Heather Wilson (HMW) ACP data there appears to be duplication of transcribed data on Day = 11, transect = 33. Across all data from 2007 to 2023 there are > 5000 duplicate observations that share exact time, location, species, and all other data. Many of these are justified cases where the observer recorded multiple observations of the same species and observation type or number on the same WAV file. Another justified case might be when one observer copies the start or end locations from another observer. The occasion noted above had two start locations recorded and then many (~8) duplicate observations of various species, which suggests a cut-and-paste error or that the same WAV file was transcribed multiple times.
There appear to be three occurrences of this in the data set from 2007 to 2023:
2007, RMD, transect 420 (maybe only 6 observations);
2015, HMW, transect 3 (about 34 observations);
2019, HMW, transect 33 (found above).
All of the above have two START points recorded at the same time. Maybe the best way to check for this is to produce a warning when multiple starts are present at the same position? Not sure whether the duplicate data above should be deleted or left in.
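One possible shape for the multiple-starts warning, as a base R sketch. The function name and the column names (Year, Observer, Transect, Code, Time, Lat, Lon) are hypothetical, as is the assumption that start points are coded "START" in a Code column; adjust to the real transcribed files.

```r
# Hypothetical check: warn when an observer has more than one START record
# with identical time and position on the same transect.
check_duplicate_starts <- function(obs, start_code = "START") {
  key_cols <- c("Year", "Observer", "Transect", "Time", "Lat", "Lon")
  starts <- obs[obs$Code == start_code, key_cols, drop = FALSE]
  # duplicated() marks the second and later copies of an identical row.
  dup <- unique(starts[duplicated(starts), , drop = FALSE])
  if (nrow(dup) > 0) {
    warning(
      "Multiple START points at the same time and position for: ",
      paste(dup$Year, dup$Observer, "transect", dup$Transect, collapse = "; ")
    )
  }
  invisible(dup)
}

# Made-up example data: transect 33 has two identical START records.
obs <- data.frame(
  Year = c(2019, 2019, 2019, 2019),
  Observer = c("HMW", "HMW", "HMW", "HMW"),
  Transect = c(33, 33, 33, 34),
  Code = c("START", "START", "OBS", "START"),
  Time = c(100, 100, 120, 500),
  Lat = c(70.1, 70.1, 70.1, 70.2),
  Lon = c(-150.0, -150.0, -150.1, -150.0),
  stringsAsFactors = FALSE
)
```

Calling check_duplicate_starts(obs) on this example raises a warning and invisibly returns the one repeated START row, leaving the decision to delete or keep the duplicates to the reviewer.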
There appear to be two copies (rows) for each species in the data tables, see AKaerial::YKGHistoric$combined and AKaerial::YKGHistoric$output.table. They appear to have the same estimates. One should be removed.
Update repo to contain required content. See, https://github.com/USFWS/r7-repo-template
Modify AKaerial workflow (and documentation?) for Scribe.
Standardize the naming convention in the output estimate tables. Use a consistent case and the same name for variables that are the same type of estimate, e.g., itotal in the "combined" data frame should have the same name in the "output.table" data frame, where it is now named itotal.est. I would remove the ".est" suffix. Do this for all data sets. Also, give the data sets meaningful names; "output.table" doesn't really mean much.
names(AKaerial::YKGHistoric$combined)
[1] "Year" "Species" "total" "total.var" "total.se"
[6] "itotal" "itotal.var" "itotal.se" "ibb" "ibb.var"
[11] "ibb.se" "sing1pair2" "sing1pair2.var" "sing1pair2.se" "flock"
[16] "flock.var" "flock.se" "area"
names(AKaerial::YKGHistoric$output.table)
[1] "Year" "Observer" "Species" "total.est" "itotal.est"
[6] "ibbtotal.est" "sing1pair2.est" "flock.est" "var.N" "var.Ni"
[11] "var.Nib" "var.Nsing1pair2" "var.Nflock" "SE" "SE.i"
[16] "SE.ibb" "SE.sing1pair2" "SE.flock" "area"
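A sketch of how the renaming could be done with a lookup vector. The mapping below is my guess at which output.table columns correspond to which combined columns (for example, that var.Ni is the variance of itotal and SE is the standard error of total); it should be confirmed before use. standardize_names() is a hypothetical helper, not an existing AKaerial function.

```r
# Assumed mapping from current output.table names to the names used in combined.
name_map <- c(
  total.est = "total", itotal.est = "itotal", ibbtotal.est = "ibb",
  sing1pair2.est = "sing1pair2", flock.est = "flock",
  var.N = "total.var", var.Ni = "itotal.var", var.Nib = "ibb.var",
  var.Nsing1pair2 = "sing1pair2.var", var.Nflock = "flock.var",
  SE = "total.se", SE.i = "itotal.se", SE.ibb = "ibb.se",
  SE.sing1pair2 = "sing1pair2.se", SE.flock = "flock.se"
)

# Rename only the columns found in the map; leave all others untouched.
standardize_names <- function(df, map = name_map) {
  hit <- names(df) %in% names(map)
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}

# Stand-in data frame with the column names printed above for output.table.
old_names <- c(
  "Year", "Observer", "Species", "total.est", "itotal.est",
  "ibbtotal.est", "sing1pair2.est", "flock.est", "var.N", "var.Ni",
  "var.Nib", "var.Nsing1pair2", "var.Nflock", "SE", "SE.i",
  "SE.ibb", "SE.sing1pair2", "SE.flock", "area"
)
output_table <- as.data.frame(matrix(0, nrow = 1, ncol = length(old_names),
                                     dimnames = list(NULL, old_names)))
renamed <- standardize_names(output_table)
```

After renaming, the only column in the stand-in table that is not also in combined is Observer.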
I tried to install AKaerial from GitHub. The first time I tried, I got an error about not being able to update the package 'rlang'. I checked and there was no package 'rlang' in my directory, so I shut down R and tried again. The second try gave me this:
devtools::install_github("USFWS/AKaerial", ref = "master", build_vignettes = TRUE)
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
there is no package called ‘rlang’
Is this a namespace issue or is it on my side? Using R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
And RStudio Version 1.1.453
Data column names are not case-standardized and do not share an intuitive naming convention. For example, the point estimate for indicated breeding birds is ibb in $combined and ibbtotal.est in $expanded.table or $output.table, and in the latter ibb refers to the observed number of ibb. Likewise, standard errors have different naming conventions.
Also, there seems to be a lot of unnecessary information in the data table, e.g., the 'cov' or cross product terms.
Follow guidance at https://r-pkgs.org/r.html#code-organising. Group functions as appropriate.
A column should be added to observation data files that gives the transect identifier for each observation. The name of this field should match that in the design geometry file, so that transect length can be extracted from that for analysis.
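To illustrate why the field names should match, here is a base R sketch of pulling transect length from a design table by a shared Transect field. The data, the Length_km column, and the birds-per-km summary are made up for illustration and are not the AKaerial estimator.

```r
# Made-up observation and design tables that share a Transect field.
obs <- data.frame(Transect = c("A", "A", "B"), Num = c(2, 1, 4),
                  stringsAsFactors = FALSE)
design <- data.frame(Transect = c("A", "B", "C"), Length_km = c(10, 20, 5),
                     stringsAsFactors = FALSE)

# Sum birds per transect, then join onto the design so that transects
# flown with no birds seen (here "C") are kept with a count of zero.
counts <- aggregate(Num ~ Transect, data = obs, FUN = sum)
effort <- merge(design, counts, by = "Transect", all.x = TRUE)
effort$Num[is.na(effort$Num)] <- 0
effort$birds_per_km <- effort$Num / effort$Length_km
```

With a shared field name the join is a single merge() call; with mismatched names every user would need a hand-built crosswalk first.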
The GPS track files should be part of the normal record of aerial survey data, as they record the survey effort independently of any design files. As such, they are a record of the actual survey effort and do not depend on the density of bird observations. A record of the plane position can also serve as a check on transect mislabeling or data quality problems. The easiest way to distinguish GPS observations from human bird observations would be to make a code under Species or Obs_Type and use GPS as the code.
Put another way, the GPS record of the plane track records the "zero bird" observations, whereas the human observer records only positive observations of birds and does not record the frequency or density of "no birds". Also, in displays of the data, it would encourage users to display survey effort as well as bird observations and thus highlight differences in survey effort across space.
For any output estimate or data file, add the package version (release number) and GitHub commit ID/hash to the data file. Use devtools::package_info("AKaerial") to extract this. This might require a change in workflow for release of package versions or data files.
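A base R alternative sketch, in case a devtools dependency is not wanted. stamp_version() and the column names pkg_version and pkg_commit are hypothetical. It assumes the package was installed from GitHub with remotes or devtools, which record the commit in a RemoteSha field of the installed DESCRIPTION; for other install methods the commit comes back as NA.

```r
# Hypothetical helper: add package version and commit hash columns to a data frame.
stamp_version <- function(df, pkg) {
  sha <- utils::packageDescription(pkg)$RemoteSha
  if (is.null(sha)) sha <- NA_character_  # not installed from a remote
  df$pkg_version <- as.character(utils::packageVersion(pkg))
  df$pkg_commit <- sha
  df
}

# Demonstrated with the base package "stats" so the example runs anywhere;
# in practice pkg would be "AKaerial".
stamped <- stamp_version(data.frame(Year = c(2021, 2022)), "stats")
```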
Add functionality to source geographic information from geopackages, ESRI shapefiles, or GeoJSON.
The 'Details' and 'Value' sections for the DubMatch function give the details and value for ShowMeDouble.
Release version 1.0.0 lags the content on the main branch. For example, the release does not have the function point2line.R. Should update the release or make releases automatic with a push to main from dev. See https://github.com/marketplace/actions/automatic-releases
Before writing or submitting QC data to final repository, validate it against the current data dictionary for the project. If mismatches occur, determine whether it is a data QC issue or if the dictionary needs updating. Update or assign and resolve tasks before data is submitted to repository. Use the validation function found here if useful: https://hdvincelette.github.io/mdJSONdictio/
This issue might be moved to the repo data manager and outside of AKaerial if that seems like a better solution.
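If a lightweight check is wanted before adopting a dedicated tool, the comparison could be sketched in base R as below. This does not use the mdJSONdictio API; the dictionary layout (a data frame with name and type columns), the function name, and the example values are all assumptions made for illustration.

```r
# Hypothetical validator: compare a data frame against a simple dictionary
# given as a data frame with columns `name` and `type` (expected R class).
validate_against_dictionary <- function(df, dict) {
  shared <- intersect(dict$name, names(df))
  actual <- vapply(df[shared], function(col) class(col)[1], character(1))
  expected <- dict$type[match(shared, dict$name)]
  list(
    missing = setdiff(dict$name, names(df)),      # in dictionary, not in data
    unexpected = setdiff(names(df), dict$name),   # in data, not in dictionary
    wrong_type = shared[actual != expected]       # class differs from dictionary
  )
}

# Made-up dictionary and QC data for illustration.
dict <- data.frame(
  name = c("Year", "Species", "Num"),
  type = c("integer", "character", "integer"),
  stringsAsFactors = FALSE
)
qc_data <- data.frame(
  Year = 2019L, Species = "SPEI", Num = 2, Notes = "x",
  stringsAsFactors = FALSE
)
problems <- validate_against_dictionary(qc_data, dict)
```

Here Notes is reported as unexpected and Num as the wrong type (numeric rather than integer); each mismatch then needs a decision on whether the data or the dictionary is at fault.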
see http://r-pkgs.had.co.nz/data.html
Might also need to remove some data sets or make them not visible to users. See link above.
Links to historic estimates give a 404 error. Remove or fix.
Convert AKaerial spatial processing to sf to the extent possible.
In 2010 the bird observation data for the ACP are written to the output legacy data twice. AKaerial should only write the obs data once for each observer. In years where observers swap out, so that combining observers for a side of the plane is needed for design-based estimates, only write the data once per observer. Leave the combining as an internal process within AKaerial as part of the estimation process.
Look at other years and surveys where this issue might apply and fix.
Add content to readme file according to https://r-pkgs.org/release.html#readme
Add USFWS disclaimer.