regional_sdm's Issues

presence file names

Maybe this should be included in #14, but what is our input file naming convention? Are we planning on using the CuteCode or EGT?

dbDisconnect problem with rubric INSERT

@ChristopherTracey are you seeing this at the bottom of script 4c?

> dbSendStatement(db, SQLquery)
<SQLiteResult>
  SQL  INSERT INTO tblRubric (model_run_name, spdata_dataqual, spdata_abs, spdata_eval, envvar_relevance, envvar_align, process_algo, process_sens, process_rigor, process_perform, process_review, products_mapped, products_support, products_repo,interative,spdata_dataqualNotes,spdata_absNotes,spdata_evalNotes,envvar_relevanceNotes,envvar_alignNotes,process_algoNotes,process_sensNotes,process_rigorNotes,process_performNotes,process_reviewNotes,products_mappedNotes,products_supportNotes,products_repoNotes,interativeNotes) VALUES ('anaxexsu_20190109_084552','I','A','A','A','A','I','A','A','A','I','A','I','A','A','','','','','','','','','','','','','','');
  ROWS Fetched: 0 [complete]
       Changed: 1
> ## clean up ----
> dbDisconnect(db)
Warning message:
In connection_release(conn@ptr) :
  There are 1 result in use. The connection will be released when they are closed
> 

Note that I've tried dbSendQuery, dbExecute, and dbSendStatement, and I seem to get the same result.
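For what it's worth, the DBI docs attribute this warning to a pending result set. A minimal sketch of a fix (assuming the same db connection and SQLquery as above) is to clear the result explicitly before disconnecting:

    res <- dbSendStatement(db, SQLquery)
    dbClearResult(res)   # release the pending result set
    dbDisconnect(db)     # should now disconnect without the warning

dbExecute() is documented to do the send/clear cycle in one call, so if it still warns, the held result may be coming from somewhere else in the session.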

presence file schema

Have there been some thoughts on a presence file schema? It would be great to have the same file setup for both aquatic and terrestrial presences.

Right now for aquatics the scripts expect a csv file with the columns:

COMID, huc12, group_id, EO_ID_ST, SCOMNAME, SNAME, OBSDATE

And for terrestrial we expect a shapefile with the columns:

EO_ID_ST, SNAME, SCOMNAME, SFRACalc, OBSDATE

It would be nice to have shapefiles (e.g., points, lines, or polygons) as expected input, and the same table schema for both. I'd propose:

UID: unique feature ID
SPECIES_CODE: character code for the species
EO_ID_ST: Biotics EO_ID (can be NA)
GROUP_ID: numeric group id
SFRACalc: RA value; numeric?
OBSDATE: yyyy-mm-dd or NA
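To make that expectation explicit, a minimal validation sketch (assuming nm_presFile points at the shapefile, as in user_run_SDM.R) could check for the proposed columns on read:

    library(sf)
    req_cols <- c("UID", "SPECIES_CODE", "EO_ID_ST", "GROUP_ID", "SFRACalc", "OBSDATE")
    pres <- st_read(nm_presFile, quiet = TRUE)   # works for points, lines, or polygons
    missing <- setdiff(req_cols, names(pres))
    if (length(missing) > 0)
      stop("Presence file is missing columns: ", paste(missing, collapse = ", "))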

Any other columns or ideas for this?

Look into mapview package for metadata maps?

As a follow up to the metadata MoBI tech team call, I thought you might like to check out the mapview package: https://r-spatial.github.io/mapview/index.html. There is a function mapshot that allows you to take a static snapshot of a Leaflet map. Among other things, this would allow you to use any tiled basemap as the background. For instance, I've pasted two different mapshot outputs with different basemaps, saved as PNG files. mapview handles raster, sp, and sf classes, so there's some good flexibility on what final maps could contain. Just some food for thought, and might save some heavy lifting on map output.
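A minimal sketch of that workflow (function names from the linked mapview docs; the input object here is hypothetical):

    library(mapview)
    m <- mapview(model_output)             # accepts raster, sp, and sf objects
    mapshot(m, file = "metadata_map.png")  # static PNG snapshot of the Leaflet map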

By the way, you all have done an excellent job with the metadata template and typesetting. Working with LaTeX is no picnic! 👍

[Two mapshot PNG outputs of the same map with different tiled basemaps.]

clusterR fails with extent argument

In script 4, if I set an extent and then try to use it in the call like this:

    outRas <- clusterR(envStack, predict,
                       args = list(model = rf.full, type = "prob", index = 2,
                                   ext = hucExtent, filename = fileNm),
                       verbose = TRUE)

it fails, while the single-core call works fine:

    outRas <- predict(object=envStack, model=rf.full, type = "prob", index=2, ext = hucExtent,
                      filename=fileNm, format = "GTiff", overwrite=TRUE)
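One possible workaround (a sketch, not verified against script 4): crop the stack to the extent up front, so that ext can be left out of the parallel call entirely:

    # pre-crop the stack instead of passing ext to the parallel predict
    envStackHuc <- crop(envStack, hucExtent)
    beginCluster()
    outRas <- clusterR(envStackHuc, predict,
                       args = list(model = rf.full, type = "prob", index = 2),
                       filename = fileNm, verbose = TRUE)
    endCluster()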

Does anyone know any other way to get this to work? Is there any chance we can leverage Microsoft R Open here?

Tim

database schema

Since I originally volunteered to handle this, I'm assigning myself to it. It seems like we need to settle on at least a working database schema, or we'll be running into a lot of merge conflicts...

Right now in the sqlite folder, there is Tim's latest database schema (sqliteDBDump.txt) and the one I had recently updated (sqlite_template_db_nodata.sql), which has a tracking scheme for input presence files and is used in all the recent updates I've made.

I can merge these, keeping the naming in favor of sqliteDBDump.txt but also including the features and tables I've added (tblVarsUsed and lkpEnvVarsAqua). Does that seem like a good plan? Are any features still missing from either database that we'd like to add?

Metadata: issue with \end{minipage}

@tghoward, on line 322 of the .rnw in the terrestrial branch there is an \end{minipage} statement that removes the comments and rubric from the PDF when the same code is applied in the aquatic branch. Is it working OK for you?

user_run_SDM

I like this wrapper, but I'd also like to make sure we can easily run the scripts as pieces. Any suggestions for how to resurrect that ability (or for the scripts to serve both alternatives at once)?

populate ModelerID

@ChristopherTracey in order to address the requested metadata citation refinement (ID the modeler), we should populate the lkpModelers table and the ModelerID field in the lkpSpecies table. Use ModelerID = 3 for PA, and I'll use 4 for NY, as those were the values from the Eastern Regional Project.

metadata: Table 1

Anne suggests that we change the order as follows:

  • aquatic - It makes more sense to me to list PR reaches, Reach groups, then BG reaches.
  • terrestrial - Makes more sense to me to keep presence related numbers together. Suggest polys (capitalize the P), EOs, PR points, and BG points.

model_comments vs metaData_comments

Quick confirmation needed on the difference between model_comments and metaData_comments: model comments are only stored in the SQLite database, and metadata comments are displayed in the metadata for public view, right?

Is there any specific way we need to display them?

Training/presence flowlines removed from model output

I don't think this is how it's supposed to work:
[Screenshot: training flowlines vs. predicted flowlines.]

The thick red lines represent the training inputs to the model, whereas the blue lines are the results (classified by probability). No result flowlines overlap the training inputs, so it seems like we are deleting them somewhere (script 4?).

Paths and Settings?

Do we still need the 0_pathsAndSettings.R now that we're using here() and the wrapper?

Metadata creation - GUI framework cannot be initialized

When running on MoBIprep, the metadata step was erroring out without creating the PDF, with the following error:
In system(sprintf("%s %s.sty", kpsewhich(), name), intern = TRUE) : running command 'kpsewhich framed.sty' had status 1
Tracking it down, it looks like MiKTeX wanted to install packages, but the pop-up window was blocked.

To fix, in the MiKTeX Console, change the package installation setting to "Always install missing packages on the fly":

[Screenshot: MiKTeX Console package installation setting.]

Just for your info, one of you can close whenever.

full filename for nm_presfile?

nm_presFile <- here("_data", "occurrence", model_species)

@dnbucklin all of the "nm_" objects in user_run_SDM.R include the file extension, with the exception of the line quoted above (nm_presFile). Any reason this one shouldn't also have '.shp' appended? It seems inconsistent to me; i.e., like this:

nm_presFile <- here("_data", "occurrence", paste0(model_species, ".shp"))

timing with crop_mask_rast.R

This is interesting: crop speed is related to the number of cores, but not as I'd expect. I wonder if it has to do with the number of env vars being cropped; in this case I am clipping only 4 rasters. The following timings cover only (approximately) lines 55-56 in script 4.
This timing is using the 11 cores originally in the code

> start_time <- Sys.time()
> source(paste0(loc_scripts, "/helper/crop_mask_rast.R"), local = TRUE)
Reading layer `HUC10' from data source `E:\mobi_repo_tgh_clean\Regional_SDM\_data\other_sp\HUC10.shp' using driver `ESRI Shapefile'
Simple feature collection with 4 features and 17 fields
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: 1831561 ymin: 2366316 xmax: 1881251 ymax: 2433963
epsg (SRID):    NA
proj4string:    +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs
Writing layer `clipshp' to data source `E:/mobi_repo_tgh_clean/Regional_SDM/_data/species/bombferv/inputs/temp_rasts' using driver `ESRI Shapefile'
features:       1
fields:         1
geometry type:  Polygon
Creating raster subsets for species for 4 environmental variables...
> envStack <- stack(newL)
> (diff <- Sys.time() - start_time)
Time difference of 7.505285 secs

This timing ups the cores to 30 (35 available on mobiprep)

> start_time <- Sys.time()
> source(paste0(loc_scripts, "/helper/crop_mask_rast.R"), local = TRUE)
Reading layer `HUC10' from data source `E:\mobi_repo_tgh_clean\Regional_SDM\_data\other_sp\HUC10.shp' using driver `ESRI Shapefile'
Simple feature collection with 4 features and 17 fields
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: 1831561 ymin: 2366316 xmax: 1881251 ymax: 2433963
epsg (SRID):    NA
proj4string:    +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs
Writing layer `clipshp' to data source `E:/mobi_repo_tgh_clean/Regional_SDM/_data/species/bombferv/inputs/temp_rasts' using driver `ESRI Shapefile'
features:       1
fields:         1
geometry type:  Polygon
Creating raster subsets for species for 4 environmental variables...
> envStack <- stack(newL)
> (diff <- Sys.time() - start_time)
Time difference of 11.99645 secs

and this timing reduces the cores to 4

> start_time <- Sys.time()
> source(paste0(loc_scripts, "/helper/crop_mask_rast.R"), local = TRUE)
Reading layer `HUC10' from data source `E:\mobi_repo_tgh_clean\Regional_SDM\_data\other_sp\HUC10.shp' using driver `ESRI Shapefile'
Simple feature collection with 4 features and 17 fields
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: 1831561 ymin: 2366316 xmax: 1881251 ymax: 2433963
epsg (SRID):    NA
proj4string:    +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs
Writing layer `clipshp' to data source `E:/mobi_repo_tgh_clean/Regional_SDM/_data/species/bombferv/inputs/temp_rasts' using driver `ESRI Shapefile'
features:       1
fields:         1
geometry type:  Polygon
Creating raster subsets for species for 4 environmental variables...
> envStack <- stack(newL)
> (diff <- Sys.time() - start_time)
Time difference of 5.81722 secs

We need more testing, and perhaps in real runs we'll always have more rasters than cores, so the point may be moot, but it appears that requesting more cores than layers slows things down.
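If that holds, a simple guard would cap the worker count at the number of layers (a sketch; newL is the layer list from the transcripts above):

    library(raster)
    # don't request more workers than there are rasters to crop
    n_cores <- min(length(newL), parallel::detectCores() - 1)
    beginCluster(n_cores)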

NA values in aquatic variables

What happens in the scripts if there are NAs in the EnvVar data? Will it fail? Or just not predict for that flowline?

Related to this, I believe there are areas of the US where we may have more NAs in certain variables (e.g., the desert southwest). Do we need to add something to the code so that after the project area is limited to certain HUCs (based on the species range), it drops any variables that have NA values?
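A sketch of that drop step (the data frame and column names here are hypothetical):

    # after limiting to the HUCs in the species range, drop EnvVars with any NAs
    env_sub <- env_df[env_df$huc12 %in% model_hucs, ]
    keep <- colSums(is.na(env_sub)) == 0
    env_sub <- env_sub[, keep, drop = FALSE]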

batch the wrapper?

@ChristopherTracey @dnbucklin , what's your setup to batch process 0_user_run_SDM.r? We definitely don't want to be manually changing the entries in that script and waiting for it to run before running a new spp.
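One pattern that could work (a sketch; it assumes 0_user_run_SDM.r were adjusted to read model_species from the calling environment rather than hard-coding it):

    library(here)
    spp_list <- c("bombferv", "anaxexsu")   # hypothetical list of species codes
    for (model_species in spp_list) {
      source(here("0_user_run_SDM.r"), local = TRUE)
    }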

Automatically create a range map for aquatics based on selected HUCs

Started work on this yesterday:

  • created a range_huc12 table in background.sqlite that contains the HUC12 id and the WKT for the polygon
  • in 4_predictModelToStudyArea.R, I added a step at the end to query this table based on the selected HUCs, dissolve the HUCs into one polygon, and then convert it to a shapefile. It writes it to the model predictions folder and names it using model_run_name. (See the sketch below.)

Still need to integrate this into the map as the study area.
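For reference, a minimal sketch of the query-dissolve-write step described above (table and column names assumed; the CRS is hypothetical):

    library(DBI); library(sf)
    db <- dbConnect(RSQLite::SQLite(), "background.sqlite")
    hucs <- dbGetQuery(db, paste0(
      "SELECT huc12, wkt FROM range_huc12 WHERE huc12 IN ('",
      paste(selected_hucs, collapse = "','"), "');"))
    dbDisconnect(db)
    range_sf   <- st_as_sf(hucs, wkt = "wkt", crs = 5070)  # CRS assumed
    range_poly <- st_union(range_sf)                        # dissolve to one polygon
    st_write(st_sf(geometry = range_poly),
             paste0(model_run_name, "_studyarea.shp"))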

handle polys, points, or both

The terrestrial scripts are going to need to be able to handle polygon only inputs (EO data), points only inputs (observations), and cases where we have both.

'repo_head' not found

repo_head isn't being found when I try to write to the modelrun_meta_data table.

Add G-rank to metadata header

The NatureServe standard is "Scientific name, common name, G-rank (with definition, e.g., 'G1 – Critically Imperiled')".

st_write: values not successfully written

It looks like I am running into this problem a few times throughout the scripts. The first is at the bottom of script 2, in the st_write call:

> st_write(points_attributed, paste0("model_input/", filename), delete_layer = T)
Writing layer `bombferv_20181220_140139_att' to data source `model_input/bombferv_20181220_140139_att.shp' using driver `ESRI Shapefile'
features:       1271
fields:         51
geometry type:  Point
There were 12 warnings (use warnings() to see them)
> warnings()[1:2]
Warning messages:
1: In CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options),  ... :
  GDAL Message 1: Value -13706267 of field crvprox100 of feature 533 not successfully written. Possibly due to too larger number with respect to field width
2: In CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options),  ... :
  GDAL Message 1: Value -12236988 of field crvslpx100 of feature 540 not successfully written. Possibly due to too larger number with respect to field width
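These warnings come from the ESRI Shapefile driver's fixed DBF field widths. One workaround (a sketch, assuming the same objects as above) is to write to a format without that limit, e.g. GeoPackage:

    # GeoPackage has no DBF field-width limits, so large values write cleanly
    st_write(points_attributed,
             paste0("model_input/", sub("\\.shp$", ".gpkg", filename)),
             delete_layer = TRUE)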

metadata: figure 2

Possibly do away with the abbreviations (e.g., "dist"), as they are now included in the EnvVar definition table.

show group captures in table 3?

@ChristopherTracey - I notice you only show results for reaches in Table 3. I've dropped the EO and poly columns but have kept a column for groups. If you are still validating by groups, wouldn't you still want to have a groups column in this table?

Table 3. Thresholds {\protect\NoHyper\cite{LiuEtAl2005, LiuEtAl2015}\protect\endNoHyper} calculated from the final model. The Value column reports the threshold; Pct indicates the percentage of PR reaches predicted having suitable habitat. Total numbers of PR reaches and contiguous PR reach groups used in the final model are reported in Table 1.

metadata: do we really need to abbreviate PR and BG in Table 1

We have plenty of room to write out "Presence Points" instead of "PR Points" in the table 1 legend:
[Screenshot: Table 1 legend.]

While the abbreviation is used several times in other parts of the metadata, we could easily replace it, make a shorter legend for Table 1, and eliminate some jargon. Cool?

folder/file structure

@ChristopherTracey @dnbucklin Since the PR where we started talking about this is closed (#4), opening up the discussion again here.

If I'm following correctly, the excellent wrapper system in aquatic (and master) creates folders within a folder named by species code, and it also puts, and references, a full copy of the repository (the scripts) in there (species/sppcode/inputs/scripts/Regional_SDM_date).

Since the wrapper builds this file structure for you, I find myself running the wrapper from one copy of the repository, but then somewhere along the line (right away?) it drops into and uses the scripts from that copied repository.

I think I understand the reasoning for wanting to save a set of the scripts that were used in that particular run, but it also seems a little funny to not be using the scripts that you have loaded in RStudio!

Perhaps it might make more sense to save the version (commit #?) of the repository but not the entire repository? Can one of you discuss the reasoning here?
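For example, a one-liner could record just the commit in use (a sketch; this may be what repo_head is meant to capture):

    # record the current commit hash instead of copying the whole repository
    repo_head <- system("git rev-parse HEAD", intern = TRUE)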

Thanks,
Tim

background samples

Should we change how we generate and store background points/reaches tables, given we're using ranges to define model domains? A couple questions related to this:

Is it necessary (or just a good idea) to use the same background points for every species? Alternatively, we could generate them on the fly in step 1. This would allow us to vary the density/number of background points by species, if desired.

We could also attribute background points (extract values from EVs) within the model process. I think this is a more flexible way to go and shouldn't add much extra time, given we're already loading the data to attribute presence points.

Should we alter the exclusion distance for background points? It's only 30m now - it could be at least increased to the raster resolution.
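A sketch of the on-the-fly approach (object names hypothetical; assumes a projected CRS in meters):

    library(sf)
    bg_pts <- st_sample(range_poly, size = n_bg, type = "random")
    # enforce the exclusion distance from presence points (30 m currently;
    # could be raised to the raster resolution)
    too_close <- apply(st_distance(bg_pts, pres_pts), 1, min) <= 30
    bg_pts <- bg_pts[!too_close]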

map sources to spp?

@ChristopherTracey any thoughts on how we'll use (or whether we'll need to use) the Data Sources table (lkpDataSources) and the map-to-species table (mapDataSourcesToSpp) in the tracking DB for this project? If the mjd counts as one source, we have only a few sources, but it will vary by species. How might we get these tables filled?

location of aquatic variable sqlite db?

As we work to move the lotic EnvVars to a SQLite version instead of a .csv file in order to improve model speed/performance, I have a quick question about location: should it be a table in the main SQLite db, or should it live in its own separate SQLite file?

The current aquatic EnvVars table for CONUS is about 16 GB.
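Either way, SQLite's ATTACH lets a separate file behave like part of the main db, so a split shouldn't complicate queries much. A sketch (file and table names hypothetical):

    library(DBI)
    db <- dbConnect(RSQLite::SQLite(), "tracking.sqlite")
    dbExecute(db, "ATTACH DATABASE 'aqua_envvars.sqlite' AS aqua;")
    vars <- dbGetQuery(db, "SELECT * FROM aqua.envvars WHERE COMID = 12345;")
    dbDisconnect(db)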
