akaerial's People

Contributors

cfrost3, eosnas

Forkers

cfrost3

akaerial's Issues

add spatial QC process to greenlight

Add a process to check spatial data in transcribed observation files. The purpose is to recreate the flight line (survey effort) from observations (or track files if they exist). Step one is to add a function that makes a line for every transect:

library(sf)
library(dplyr)

points2line <- function(x, Year=unique(x$Year), Transect=unique(x$Transect), crs=4326){
  # This function accepts an sf object of points and returns an sf linestring.
  # The sf object must define ONE linestring, not more than one!
  # The sf object should have attributes named Year and Transect;
  # if not, supply them as parameter values.
  linestring <- x %>% cbind(st_coordinates(.)) %>%
    as.data.frame() %>%
    select(-geometry) %>%
    arrange(X, Y) %>%        # sort points by X (longitude), then Y (latitude)
    select(X, Y) %>%
    as.matrix() %>%
    st_linestring() %>%
    st_sfc(crs=crs) %>%      # use the crs argument instead of a hard-coded 4326
    st_sf(geometry=.) %>%
    mutate(Year = Year, Transect=Transect)
  return(linestring)
}

And then apply this over all the transects in the data:
# Now apply the function across all transects (map and map_dfr come from purrr)
library(purrr)
lines <- birds %>% st_transform(crs=4326) %>%
  group_split(Year, Transect, Day) %>%
  map(points2line) %>%
  map_dfr(rbind)

And then plot them in an interactive map for inspection by the observer:
library(tmap)
tmap_mode("view")  # interactive viewing mode
# Y is the survey year to inspect; acp is the polygon layer carrying STRATNAME
df <- filter(lines, Year==Y)
bdf <- filter(birds, Year==Y) %>% mutate(Day=as.character(Day))
tm <- tm_shape(acp) + tm_polygons(col = "STRATNAME", alpha = 0.5) +
  tm_shape(df, name=paste(Y, "Flown Track")) + tm_lines() +
  tm_text("Transect", size=2) +
  tm_shape(bdf, name=paste(Y,"Bird Obs")) + tm_dots(col="Day") +
  tm_basemap(server = "Esri.WorldGrayCanvas") +
  tm_scale_bar()
tm

Might also add some QC flags such as calculated speed and/or change in direction. Problems with transect numbering or any other problem that would result in bad effort reconstruction should be "red light" issues.
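
The speed and change-in-direction flags mentioned above could be sketched roughly as follows, using base R only. This is a minimal sketch, not the package's method: the function name `qc_flags`, its arguments, and the default thresholds (250 km/h and 45 degrees) are all hypothetical and would need to be set from actual survey practice. It assumes the points of one transect are already in time order, with longitude and latitude in decimal degrees and time as POSIXct.

```r
# Hypothetical helper: flag implausible speed or sharp turns along one transect.
# lon, lat: decimal degrees; time: POSIXct, sorted in flight order.
qc_flags <- function(lon, lat, time, max_speed_kmh = 250, max_turn_deg = 45) {
  n <- length(lon)
  stopifnot(n >= 2, length(lat) == n, length(time) == n)
  rad <- pi / 180
  lon1 <- lon[-n] * rad; lon2 <- lon[-1] * rad
  lat1 <- lat[-n] * rad; lat2 <- lat[-1] * rad
  dlon <- lon2 - lon1

  # Haversine great-circle distance of each segment (km), Earth radius 6371 km
  a <- sin((lat2 - lat1) / 2)^2 + cos(lat1) * cos(lat2) * sin(dlon / 2)^2
  dist_km <- 2 * 6371 * asin(pmin(1, sqrt(a)))

  # Initial bearing of each segment, degrees clockwise from north
  bearing <- (atan2(sin(dlon) * cos(lat2),
                    cos(lat1) * sin(lat2) - sin(lat1) * cos(lat2) * cos(dlon)) / rad) %% 360

  # Speed over each segment and smallest angle between consecutive bearings
  hours <- as.numeric(difftime(time[-1], time[-n], units = "hours"))
  speed <- dist_km / hours
  d <- abs(diff(bearing))
  turn <- pmin(d, 360 - d)

  out <- data.frame(speed_kmh = c(NA, speed), turn_deg = c(NA, NA, turn))
  out$speed_flag <- !is.na(out$speed_kmh) & out$speed_kmh > max_speed_kmh
  out$turn_flag  <- !is.na(out$turn_deg)  & out$turn_deg  > max_turn_deg
  out
}
```

Each row of the result lines up with the input point that ends the segment, so the first point never has a speed and the first two never have a turn angle.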

expand definition of data table to include index definitions from 'AdjustCounts'

In the documentation for the historic data tables, expand the definitions of the various metrics so that a user can fully understand them without referring to other pages of documentation. Specifically, include the following:

itotal - Indicated total. Singles doubled, pairs doubled, opens added, flkdrake 1-4 doubled, flkdrake 5+ added.
ibb - Indicated breeding birds. Singles doubled, pairs doubled, opens removed, flkdrake 1-4 doubled, flkdrake 5+ removed.
total - Total birds. Singles added, pairs doubled, opens added, flkdrake added.
sing1pair2 - Singles and pairs. Singles added, pairs doubled, opens removed, flkdrake removed.
flock - Flocks. Singles removed, pairs removed, opens added, flkdrake added.
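
The five definitions above can be written out as arithmetic. The sketch below is only an illustration of the listed rules, not the code in 'AdjustCounts': the function name `index_counts` is hypothetical, and it assumes observation types are coded "single", "pair", "open" and "flkdrake" with `num` holding the recorded number (number of pairs for "pair").

```r
# Hypothetical helper: apply the index definitions to one set of observations.
# obs_type: character vector of "single", "pair", "open", "flkdrake" (assumed codes)
# num: recorded number for each observation
index_counts <- function(obs_type, num) {
  single <- obs_type == "single"
  pair   <- obs_type == "pair"
  open   <- obs_type == "open"
  flk    <- obs_type == "flkdrake"
  small  <- flk & num <= 4   # flkdrake 1-4
  large  <- flk & num >= 5   # flkdrake 5+
  c(
    itotal     = sum(2 * num[single]) + sum(2 * num[pair]) + sum(num[open]) +
                 sum(2 * num[small]) + sum(num[large]),
    ibb        = sum(2 * num[single]) + sum(2 * num[pair]) + sum(2 * num[small]),
    total      = sum(num[single]) + sum(2 * num[pair]) + sum(num[open]) + sum(num[flk]),
    sing1pair2 = sum(num[single]) + sum(2 * num[pair]),
    flock      = sum(num[open]) + sum(num[flk])
  )
}
```

For example, one single, two pairs, an open flock of 10, and flkdrake groups of 3 and 6 would give itotal = 2 + 4 + 10 + 6 + 6 = 28 and ibb = 2 + 4 + 6 = 12 under these rules.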

Also, include any variations from other "standard" waterfowl index calculations. For example, document the "indicated breeding bird" index for scaup if it varies from the NAWBPHS or other FWS surveys.

add strata and transect to QC obs files

Add a useful transect number and stratum identifier to QC obs file so that users can summarize observations and compute densities directly from QC obs files.

Create package vignette(s)

Determine which and how many separate vignettes to create. Could be one with several sections or several separate vignettes with different topics. Topics covered should be:

  • Greenlighting (quality control process)
  • Estimation
  • Plotting and visualization

The estimation topic might be further split into a 'modelling' section to describe the state-space model and any spatial or other models.

add 'seat' to summary data sets

Add the 'seat' variable for the observer in the internal data sets where this could be relevant: $expanded.table and $output.table

add check for duplication during transcription in QC process

In the 2019 Heather Wilson (HMW) ACP data there appears to be duplication of transcribed data on Day = 11, transect = 33. In all data from 2007 to 2023 there are > 5000 duplicate observations that share exact time, location, species and all other data. Many of these are justified cases where the observer recorded multiple observations of the same species and observation type or number on the same WAV file. Another justified case might be when one observer copies the start or end locations from another observer. The occasion noted above had two start locations recorded and then many (~8) duplicate observations of various species, which suggests a cut-and-paste error or that the same WAV file was transcribed multiple times.

There appear to be three occurrences of this in the data set from 2007 to 2023:
2007, RMD, transect 420 (maybe only 6 observations)
2015, HMW, transect 3 (about 34 observations)
2019, HMW, transect 33 (found above)

All of the above have two START points recorded at the same time. Maybe the best way to check for this is to produce a warning when multiple starts are present at the same position? Not sure if the above duplicate data should be deleted or left in?
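
The multiple-start warning suggested above might look something like this in base R. It is a sketch under assumptions: the function name `flag_duplicate_starts` is hypothetical, start points are assumed to be coded as "START" in a `Species` column, and the column names `Year`, `Observer`, `Transect`, `Time`, `Lat` and `Lon` are assumed rather than taken from the QC obs file specification.

```r
# Hypothetical check: warn when a transect has more than one START record
# at the same time and position (column names and "START" code are assumptions).
flag_duplicate_starts <- function(obs, start_code = "START") {
  starts <- obs[which(obs$Species == start_code), , drop = FALSE]
  key_cols <- c("Year", "Observer", "Transect", "Time", "Lat", "Lon")
  key <- do.call(paste, c(starts[key_cols], sep = "|"))

  # Mark every member of a duplicated group, not just the later copies
  dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  flagged <- unique(starts[dup, c("Year", "Observer", "Transect"), drop = FALSE])

  if (nrow(flagged) > 0) {
    warning(nrow(flagged),
            " transect(s) have more than one START at the same time and position")
  }
  flagged
}
```

The function only reports the affected Year, Observer and Transect combinations; whether the duplicated observations are deleted or kept remains a decision for the reviewer.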

two copies of 2022 estimate in data tables

There appear to be two copies (rows) for each species in the data tables, see AKaerial::YKGHistoric$combined and AKaerial::YKGHistoric$output.table. They appear to have the same estimates. One should be removed.
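
A quick way to list such repeated rows is to look for duplicated key combinations. The sketch below uses base R; the function name `find_duplicate_estimates` is hypothetical, and the default keys (`Year`, `Species`) are an assumption that suits a table like $combined. For a per-observer table such as $output.table, `Observer` would presumably need to be added to the keys.

```r
# Hypothetical helper: return every row whose key combination appears more than once.
find_duplicate_estimates <- function(tbl, keys = c("Year", "Species")) {
  k <- tbl[, keys, drop = FALSE]
  dup <- duplicated(k) | duplicated(k, fromLast = TRUE)
  tbl[dup, , drop = FALSE]
}

# Hypothetical helper: drop rows that are exact copies of an earlier row.
drop_exact_duplicates <- function(tbl) {
  tbl[!duplicated(tbl), , drop = FALSE]
}
```

Rows that share keys but differ in their estimates would still be returned by the first function after the second has been applied, which is a useful signal that the copies are not identical.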

standardize names in output estimates tables

Standardize the naming convention in the output estimate tables. Use a consistent case and the same name for variables that are the same type of estimate, e.g., itotal in the "combined" data frame should have the same name in the "output.table" data frame, where it is currently named itotal.est. I would remove the ".est" suffix. Do this for all data sets. Also, give the data sets meaningful names; "output.table" doesn't really mean much.

names(AKaerial::YKGHistoric$combined)
[1] "Year" "Species" "total" "total.var" "total.se"
[6] "itotal" "itotal.var" "itotal.se" "ibb" "ibb.var"
[11] "ibb.se" "sing1pair2" "sing1pair2.var" "sing1pair2.se" "flock"
[16] "flock.var" "flock.se" "area"
names(AKaerial::YKGHistoric$output.table)
[1] "Year" "Observer" "Species" "total.est" "itotal.est"
[6] "ibbtotal.est" "sing1pair2.est" "flock.est" "var.N" "var.Ni"
[11] "var.Nib" "var.Nsing1pair2" "var.Nflock" "SE" "SE.i"
[16] "SE.ibb" "SE.sing1pair2" "SE.flock" "area"
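
One possible way to bring the two sets of names into line is an explicit lookup from the $output.table names to the $combined names. The mapping below is a guess at which columns correspond (for example, that `var.N` and `SE` are the variance and standard error of `total`); it would need to be confirmed against the estimation code, and `standardize_names` is a hypothetical helper.

```r
# Assumed correspondence between $output.table names and $combined names.
rename_map <- c(
  total.est = "total", itotal.est = "itotal", ibbtotal.est = "ibb",
  sing1pair2.est = "sing1pair2", flock.est = "flock",
  var.N = "total.var", var.Ni = "itotal.var", var.Nib = "ibb.var",
  var.Nsing1pair2 = "sing1pair2.var", var.Nflock = "flock.var",
  SE = "total.se", SE.i = "itotal.se", SE.ibb = "ibb.se",
  SE.sing1pair2 = "sing1pair2.se", SE.flock = "flock.se"
)

# Hypothetical helper: rename any column found in the map, leave others alone.
standardize_names <- function(nms, map = rename_map) {
  hit <- nms %in% names(map)
  nms[hit] <- unname(map[nms[hit]])
  nms
}
```

Applied as `names(tbl) <- standardize_names(names(tbl))`, this would leave `Year`, `Observer`, `Species` and `area` unchanged and rename the rest.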

can't install from GitHub

I tried to install AKaerial from GitHub. The first time I tried, I got an error about not being able to update the package 'rlang'. I checked and there was no package 'rlang' in my directory, so I shut down R and tried again. The second try gave me this:

devtools::install_github("USFWS/AKaerial", ref = "master", build_vignettes = TRUE)
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
there is no package called ‘rlang’

Is this a namespace issue or is it on my side? Using R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

And RStudio Version 1.1.453

standardize and simplify package data sets and column names

Data column names are not case-standardized and do not share an intuitive naming convention. For example, the point estimate for indicated breeding birds is ibb in $combined and ibbtotal.est in $expanded.table or $output.table, and in the latter ibb refers to the observed number of indicated breeding birds. Likewise, standard errors have different naming conventions.

Also, there seems to be a lot of unnecessary information in the data table, e.g., the 'cov' or cross product terms.

develop process to incorporate track files (GPS bread crumb trails) into QC obs files

The GPS track files should be part of the normal record of aerial survey data, as they record the survey effort independently of any design files. As such, they are a record of the actual survey effort and do not depend on the density of bird observations. A record of the plane position can also serve as a check on transect mislabeling or data quality problems. The easiest way to distinguish GPS observations from human bird observations would be to make a code under Species or Obs_Type and use GPS as the code.

Put another way, the GPS record of the plane track records the "zero bird" observations, whereas the human observer records only their positive observations of birds and does not record the frequency or density of "no birds". Also, in displays of the data, it would encourage users to display survey effort as well as bird observations and thus highlight differences in survey effort across space.
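
As a rough illustration of the coding idea, track points could be appended to the observation records with GPS as the code. Everything in this sketch is an assumption: the function name `append_track`, the column names, the choice to put the code under `Obs_Type` (rather than `Species`), and recording `Num` as 0 for track points.

```r
# Hypothetical helper: append GPS track points to bird observations,
# marking them with Obs_Type = "GPS" and Num = 0 (assumed conventions).
append_track <- function(obs, track, code = "GPS") {
  cols <- c("Year", "Transect", "Time", "Lat", "Lon", "Species", "Obs_Type", "Num")
  n <- nrow(track)
  track_rows <- data.frame(
    Year = track$Year, Transect = track$Transect, Time = track$Time,
    Lat = track$Lat, Lon = track$Lon,
    Species = rep(NA_character_, n),   # no species for a track point
    Obs_Type = rep(code, n),
    Num = rep(0, n),
    stringsAsFactors = FALSE
  )
  combined <- rbind(obs[, cols], track_rows[, cols])
  combined[order(combined$Time), ]     # interleave by time
}
```

Filtering on `Obs_Type == "GPS"` would then recover the effort record, and excluding it would recover the bird observations.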

incorporate data validation against a current data dictionary as part of annual QC process.

Before writing or submitting QC data to final repository, validate it against the current data dictionary for the project. If mismatches occur, determine whether it is a data QC issue or if the dictionary needs updating. Update or assign and resolve tasks before data is submitted to repository. Use the validation function found here if useful: https://hdvincelette.github.io/mdJSONdictio/
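
Independently of the linked package, the kind of mismatch report described above can be sketched in base R. This does not use or imitate the mdJSONdictio API; the function name `validate_against_dictionary` and the dictionary layout (a data frame with a `name` column and a `type` column holding the expected R class) are hypothetical.

```r
# Hypothetical check of a data frame against a simple data dictionary.
# dict: data frame with columns `name` (column name) and `type` (expected R class).
validate_against_dictionary <- function(data, dict) {
  shared <- intersect(dict$name, names(data))
  actual <- vapply(shared, function(n) class(data[[n]])[1], character(1))
  expected <- dict$type[match(shared, dict$name)]
  list(
    missing_columns      = setdiff(dict$name, names(data)),   # in dictionary, not in data
    undocumented_columns = setdiff(names(data), dict$name),   # in data, not in dictionary
    type_mismatch        = shared[actual != expected]
  )
}
```

An empty result in all three elements would mean the data match the dictionary on these points; anything else would need a decision on whether the data or the dictionary should change.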

This issue might be moved to the repo data manager and outside of AKaerial if that seems like a better solution.

convert to sf

Convert AKaerial spatial processing to sf to the extent possible.

remove duplication of QC obs data set csv files

In 2010 the bird observation data for the ACP is written to the output legacy data twice. AKaerial should only write the obs data once for each observer. In years where observers swap out, so that design-based estimates require combining observers for one side of the plane, still write the data only once per observer. Leave the combining as an internal process within AKaerial as part of the estimation process.

Look at other years and surveys where this issue might apply and fix.
