weiming-hu / analogsensemble

The C++ and R packages for parallel ensemble forecasts using Analog Ensemble

Home Page: https://weiming-hu.github.io/AnalogsEnsemble/

License: MIT License

C++ 72.06% R 20.21% CMake 6.79% Shell 0.66% Makefile 0.01% M4 0.28%

cpp11 weather-forecast probabilistic r

analogsensemble's Introduction

PAnEn: Parallel Analog Ensemble

Overview
Citation
Installation
Tutorials
References
Feedbacks

Overview

Parallel Analog Ensemble (PAnEn) generates accurate forecast ensembles relying on a single deterministic model simulation and the historical observations. The technique was introduced by Luca Delle Monache et al. in the paper Probabilistic Weather Prediction with an Analog Ensemble. Developed and maintained by GEOlab at Penn State, PAnEn aims to provide an efficient implementation for this technique and user-friendly interfaces in R and C++ for researchers who want to use this technique in their own research.
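
To make the idea concrete, here is a minimal C++ sketch of the core analog search for a single station and lead time. It is not the PAnEn implementation: the `Window` type and the names `dissimilarity` and `select_analogs` are hypothetical, and the metric (a weighted, standard-deviation-normalized distance over a small window of lead times) is written from the general description of the technique.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical type for this sketch: window[p][j] is the forecast value of
// parameter p at the j-th lead time inside a small window around the target.
using Window = std::vector<std::vector<double>>;

// Weighted distance between the target forecast and one past forecast.
// Smaller values mean the past forecast is a better analog.
double dissimilarity(const Window& target, const Window& past,
                     const std::vector<double>& weights,
                     const std::vector<double>& sds) {
    assert(target.size() == past.size());
    assert(weights.size() == target.size() && sds.size() == target.size());
    double total = 0.0;
    for (std::size_t p = 0; p < target.size(); ++p) {
        assert(target[p].size() == past[p].size());
        double sum_sq = 0.0;
        for (std::size_t j = 0; j < target[p].size(); ++j) {
            const double diff = target[p][j] - past[p][j];
            sum_sq += diff * diff;
        }
        // Normalize each parameter by its standard deviation, then weight it.
        total += weights[p] / sds[p] * std::sqrt(sum_sq);
    }
    return total;
}

// Return the indices of the num_members most similar past forecasts,
// ordered from most to least similar.
std::vector<std::size_t> select_analogs(const Window& target,
                                        const std::vector<Window>& history,
                                        const std::vector<double>& weights,
                                        const std::vector<double>& sds,
                                        std::size_t num_members) {
    std::vector<double> score(history.size());
    for (std::size_t i = 0; i < history.size(); ++i) {
        score[i] = dissimilarity(target, history[i], weights, sds);
    }
    std::vector<std::size_t> order(history.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    const std::size_t k = std::min(num_members, order.size());
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(k),
                      order.end(),
                      [&score](std::size_t a, std::size_t b) {
                          return score[a] < score[b];
                      });
    order.resize(k);
    return order;
}
```

The ensemble members are then the historical observations that correspond to the returned indices.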

The easiest way to use this package is to install the R package, 'RAnEn'. C++ libraries are also available, but they are intended for intermediate users who need higher performance. For installation guidance, please refer to the installation section.

Citation

To cite this package, you have several options:

Using LaTeX: Please use this file for citation.
Using R: Simply type citation('RAnEn') and the citation message will be printed.
Using plain text: Please use the following citation format:

Weiming Hu, Guido Cervone, Laura Clemente-Harding, and Martina Calovi. (2019). Parallel Analog Ensemble. Zenodo. http://doi.org/10.5281/zenodo.3384321

Installation

RAnEn is very easy to install if you are already using R. This is the recommended way to start.

RAnEn

The command is the same for RAnEn installation and update.

To install RAnEn, please install the following packages first:

BH: install.packages('BH')
Rcpp: install.packages('Rcpp')
If you are using Windows, please also install the latest version of Rtools.

The following R command installs the latest RAnEn.

install.packages("https://github.com/Weiming-Hu/AnalogsEnsemble/raw/master/RAnalogs/releases/RAnEn_latest.tar.gz", repos = NULL)

That's it. You are good to go. Please refer to the tutorials or the R documentation to learn more about using RAnEn. You might also want to install the RAnEnExtra package, which provides functions for visualization and verification. After installing RAnEn, you can simply run devtools::install_github("Weiming-Hu/RAnEnExtra").

Mac users: if the package reports that OpenMP is not supported, you can do one of the following:

Avoid using Clang compilers and switch to GNU compilers. To change the compilers used by R, create the file ~/.R/Makevars if you do not have it already and add the following content to it. Change the compiler names to match what you have installed. If you do not have any compilers other than Clang, Homebrew is your friend.

CC=gcc-8
CXX=g++-8
CXX1X=g++-8
CXX14=g++-8

You can also follow the instructions here provided by data.table. They provide similar solutions but stick with Clang compilers.

After the installation, you can always revert to your original setup and RAnEn will keep its OpenMP support.

CAnEn

Docker/Singularity

No installation is needed if you are already using Docker or Singularity. The Docker images available here can be downloaded and used directly.

# Download and run the docker image within docker
docker container run -it weiminghu123/panen:default

# Run the docker image with a local folder mounted inside the container
docker container run -it -v ~/Desktop:/Desktop weiminghu123/panen:default

# Download and run the docker image within singularity
singularity run docker://weiminghu123/panen:default

From Source

To install the C++ libraries, please check the following dependencies.

Required: CMake, the build system generator.
Required: NetCDF, which provides file I/O for NetCDF files.
Required: ecCodes, which provides file I/O for GRIB2 files.
Optional: Boost, which provides high-performance data structures. Boost is a very large library. If you don't want to install the entire package, PAnEn can build the required components automatically.
Optional: CppUnit, which provides the test framework. If CppUnit is found on the system, test programs will be compiled.

To set up the dependencies, it is recommended to use conda. I chose miniconda instead of anaconda simply because miniconda is the lightweight version. If you already have anaconda, you are fine as well.

The following code sets up the environment from scratch:

# Pin the Python version because of Boost compatibility issues
conda create -n venv_anen python==3.8 -y

# Keep your environment activated during the entire installation process, including CAnEn
conda activate venv_anen

# Required dependencies
conda install -c anaconda cmake boost -y
conda install -c conda-forge netcdf-cxx4 eccodes doxygen -y

# Optional dependency: LibTorch
# If you need LibTorch, go to https://pytorch.org/get-started/locally/ and select
# Stable -> [Your OS] -> LibTorch -> C++/Java -> [Compute Platform] -> cxx11 ABI version
#
# Please see https://github.com/Weiming-Hu/AnalogsEnsemble/issues/86#issuecomment-1047442579 for instructions
# on how to include LibTorch during the cmake process.

# Optional dependency: MPI
conda install -c conda-forge openmpi -y

After the dependencies are installed, let's build CAnEn:

# Download the source files (~10 MB)
wget https://github.com/Weiming-Hu/AnalogsEnsemble/archive/master.zip

# Unzip
unzip master.zip

# Create a separate folder to store all intermediate files during the installation process
cd AnalogsEnsemble-master/
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=~/AnalogEnsemble ..

# Compile
make -j 4

# Install
make install

CMake Parameters

Below is a list of parameters you can change and customize.

Parameter	Explanation	Default
CMAKE_C_COMPILER	The C compiler to use.	[System dependent]
CMAKE_CXX_COMPILER	The C++ compiler to use.	[System dependent]
CMAKE_INSTALL_PREFIX	The installation directory.	[System dependent]
CMAKE_PREFIX_PATH	Which folder(s) should cmake search for packages besides the default. Paths are surrounded by double quotes and separated with semicolons.	[Empty]
CMAKE_INSTALL_RPATH	The run-time library path. Paths are surrounded by double quotes and separated with semicolons.	[Empty]
CMAKE_BUILD_TYPE	`Release` for release mode; `Debug` for debug mode.	Release
INSTALL_RAnEn	Build and install the `RAnEn` library.	OFF
BUILD_BOOST	Build `Boost` regardless of whether it exists in the system.	OFF
BOOST_URL	The URL for downloading Boost. This is only used when `BUILD_BOOST` is `ON`.	[From SourceForge]
ENABLE_MPI	Build the MPI supported libraries and executables. This requires the MPI dependency.	OFF
ENABLE_OPENMP	Enable multi-threading with OpenMP	ON
ENABLE_AI	Enable PyTorch integration and the power of AI.	OFF

You can override the default value of any parameter on the command line, for example, `cmake -DCMAKE_INSTALL_PREFIX=~/AnalogEnsemble ..` (the trailing `..` points to the source directory). Don't forget the extra letter D when specifying argument names.

High-Performance Computing and Supercomputers

Here is a list of instructions to build and install AnEn on supercomputers.

MPI and OpenMP

TL;DR

Launching an MPI-OpenMP hybrid program can be tricky.

If the performance with MPI is acceptable,
disable OpenMP (`cmake -DENABLE_OPENMP=OFF ..`).

If the hybrid solution is desired,
make sure you have the proper setup.

When ENABLE_MPI is turned on, MPI programs will be built. These MPI programs are hybrid programs (unless you set -DENABLE_OPENMP=OFF for cmake) that use both MPI and OpenMP. Please check the documentation of your supercomputer platform to find the proper configuration for launching an MPI + OpenMP hybrid program. Users are responsible for not launching too many processes and threads at the same time, which would overload the machine and might cause the program to hang (as I have seen on XSEDE Stampede2).

On NCAR Cheyenne, the proper way to launch a hybrid program can be found here. If you use mpirun instead of mpiexec_mpt, you will lose the multi-threading performance improvement.

To dive deeper into the hybrid parallelization design: MPI is used for the computationally expensive portions of the code, e.g. file I/O and analog generation, while OpenMP is used by the master process during the bottleneck portions of the code, e.g. data reshaping and information queries.

When analogs with long search and test periods are desired, MPI is used to distribute forecast files across processes. Each process reads a subset of the forecast files. This avoids the slowdown caused by serial I/O.

When a large number of stations/grids are present, MPI is used to distribute analog generation for different stations across processes. Each process takes charge of generating analogs for a subset of stations.
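
The distribution of files or stations across processes described above can be sketched as a simple block decomposition. This is an illustration only, not the scheme PAnEn necessarily uses; the function name `owned_range` is hypothetical. In a real MPI program, `rank` and `num_procs` would come from `MPI_Comm_rank` and `MPI_Comm_size`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Half-open range [first, second) of item indices (forecast files or
// stations) owned by `rank` when `num_items` items are split as evenly as
// possible across `num_procs` processes.
std::pair<std::size_t, std::size_t> owned_range(std::size_t num_items,
                                                std::size_t num_procs,
                                                std::size_t rank) {
    assert(num_procs > 0 && rank < num_procs);
    const std::size_t base = num_items / num_procs;
    // The first `extra` ranks each take one additional item.
    const std::size_t extra = num_items % num_procs;
    const std::size_t begin = rank * base + (rank < extra ? rank : extra);
    const std::size_t count = base + (rank < extra ? 1 : 0);
    return {begin, begin + count};
}
```

With 10 stations and 3 processes, this assigns stations [0, 4) to rank 0, [4, 7) to rank 1, and [7, 10) to rank 2, so no process holds more than one extra station.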

Sitting between the file I/O and the analog generation is the bottleneck, which is hard to parallelize with MPI, e.g. reshaping the data and querying test/search times. Therefore, these steps are parallelized with OpenMP on the master process only.

So if the platform supports a heterogeneous task layout, users can theoretically allocate one core per worker process and more cores to the master process for its multi-threaded portion. But again, only do this when you find that the bottleneck takes much longer than file I/O and analog generation. Use --profile to print profiling information to the standard message output.

Tutorials

Tutorials can be accessed on binder or found in this directory.

Some tips and caveats are also collected in this ticket.

References

Feedbacks

We appreciate collaboration and feedback from users. Please contact the maintainer Weiming Hu through [email protected] or submit tickets if you have any problems.

Thank you!

# "`-''-/").___..--''"`-._
#  (`6_ 6  )   `-.  (     ).`-.__.`)   WE ARE ...
#  (_Y_.)'  ._   )  `._ `. ``-..-'    PENN STATE!
#    _ ..`--'_..-_/  /--'_.' ,'
#  (il),-''  (li),'  ((!.-'
# 
# Authors: 
#     Weiming Hu <[email protected]>
#     Guido Cervone <[email protected]>
#     Laura Clemente-Harding <[email protected]>
#     Martina Calovi <[email protected]>
#
# Contributors: 
#     Luca Delle Monache
#         
# Geoinformatics and Earth Observation Laboratory (http://geolab.psu.edu)
# Department of Geography and Institute for CyberScience
# The Pennsylvania State University

analogsensemble's Issues

Improve the performance of MPI I/O of CAnEnIO

This looks like a solution to the I/O-bound problem.

Select Analog function failed after update to 3.0.9

Selection function failed after updating to 3.0.9. It is already known that it is due to the newly introduced parameter extend_observations.

Adding the issue logo to website

RAnEn with different predictor weights for each point

The RAnEn package uses the same predictor weights for all stations/points. However, the relevant predictors and their weights might depend on location. Is it possible to change the weight for each predictor-station combination? It seems to be easy for the independent search case (i.e., a separate search for each location), but I don't know how to handle it with search space extension. Thank you very much in advance.
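
As a rough illustration of what this request could look like in code, the following C++ sketch stores one weight per predictor-station combination. It is not part of RAnEn or CAnEn; the class name `StationWeights` and its methods are hypothetical. For search space extension, one possible convention (an assumption, not something the package defines) is to always look up the weights of the target station, even when comparing against forecasts from neighboring search stations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: one weight per (station, parameter) pair, stored
// row-major in a single flat vector.
class StationWeights {
public:
    StationWeights(std::size_t num_stations, std::size_t num_parameters,
                   double initial = 1.0)
        : num_parameters_(num_parameters),
          values_(num_stations * num_parameters, initial) {}

    void set(std::size_t station, std::size_t parameter, double weight) {
        values_[index(station, parameter)] = weight;
    }

    double get(std::size_t station, std::size_t parameter) const {
        return values_[index(station, parameter)];
    }

private:
    // Map a (station, parameter) pair to its position in the flat vector.
    std::size_t index(std::size_t station, std::size_t parameter) const {
        assert(parameter < num_parameters_);
        const std::size_t i = station * num_parameters_ + parameter;
        assert(i < values_.size());
        return i;
    }

    std::size_t num_parameters_;
    std::vector<double> values_;
};
```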

Memory usage inspection before writing data files

A small change is needed in the code before writing data files to disk: remove unnecessary objects before writing to reduce the memory requirement.

AnEn Search Space Extension: Discrete output

Describe the bug
When performing verification on the AnEn SSE, histograms show the output to be discrete. See sample plot below:

Code to reproduce: (Or check out /AnalogsSpatial/SSE_test1.R on svn)

# Generate Analogs

source('~/geolab/projects/AnalogsSpatial/code/SSE_CoreSetUp.R')
# Variables
members.size    <- 25
# Identify the day to use
dayi <- 735
# For two year search history:
search.ID.start <- 1
search.ID.end <- 730
paramBeingForecasted <- 2
#############################################################################
# # Set up parameters to compute analogs
# # weights            <- c(1,1,0,0,0) #  "wdir","ws","2T","2DPT","MSLP"
xs <- as.numeric(lon)
ys <- as.numeric(lat)
nx <- 51; ny <- 49
icounter1 <- dayi
# Generate AnEn output, one day at a time.
# for ( icounter1 in 735:735 ) {  # Now, 730; Eventually do this through to 1095 for the 3rd year. So this is training on the first year and testing on the second two years
test.ID.start <- icounter1
test.ID.end <- test.ID.start + 20  # Generate the analogs for one day at a time.
# A sampling of 100 stations
stations.ID <- stations_ID <- sort(sample(1:2499,100))
config2 <- generateConfiguration('independentSearch')
config2$observation_id <- paramBeingForecasted
config2$test_forecasts <- fcst.aligned[,stations.ID,test.ID.start:test.ID.end,, drop = F]
config2$search_forecasts <- fcst.aligned[,stations.ID,search.ID.start:search.ID.end, , drop = F]
config2$search_times <- as.vector(times[search.ID.start:search.ID.end])
config2$search_flts <- flts[1:dim(config2$search_forecasts)[4]]
tmp.search.observations2 <- obsv.aligned[,,search.ID.start:search.ID.end,,drop=F]  # create a new copy of it
search.observations2 <- aperm(tmp.search.observations2, c(4, 3, 2, 1)) # Reorganizing the structure
search.observations2 <- array(search.observations2,
                              dim = c(dim(tmp.search.observations2)[3]
                                      * dim(tmp.search.observations2)[4],
                                      dim(tmp.search.observations2)[2],
                                      dim(tmp.search.observations2)[1]))
search.observations2 <- aperm(search.observations2, c(3, 2, 1))
config2$search_observations <- search.observations2
config2$observation_times = rep(config2$search_times, each = length(config2$search_flts)) + config2$search_flts
config2$num_members <- members.size
num.parameters <- dim(config2$search_forecasts)[1]
config2$weights <- rep(1, num.parameters)
config2$test_stations_x <- xs[stations.ID]
config2$test_stations_y <- ys[stations.ID]
config2$search_stations_x <- xs[stations.ID]
config2$search_stations_y <- ys[stations.ID]
config2$preserve_mapping <- T
config2$verbose <- 3
config2$max_flt_nan <- 1
config2$max_par_nan <- 0
config2$extend_observations <- T
# Validate first before using
validateConfiguration(config2)
# Generate analogs
AnEn.ind <- generateAnalogs(config2)
config <- generateConfiguration('extendedSearch')
config$test_forecasts <- fcst.aligned[,stations.ID,test.ID.start:test.ID.end,, drop = F]
config$observation_id <- paramBeingForecasted
config$search_forecasts <- fcst.aligned[,stations.ID,search.ID.start:search.ID.end, , drop = F]
config$search_times <- as.vector(times[search.ID.start:search.ID.end])
config$search_flts <- flts[1:dim(config$search_forecasts)[4]]  # Want this to match with config$search_forecasts
# # We need to convert obsv.aligned from 4 dimensions to 3 dimensions:
tmp.search.observations <- obsv.aligned[,,search.ID.start:search.ID.end,,drop=F]  # create a new copy of it
search.observations <- aperm(tmp.search.observations, c(4, 3, 2, 1)) # Reorganizing the structure
search.observations <- array(search.observations,
                             dim = c(dim(tmp.search.observations)[3]
                                     * dim(tmp.search.observations)[4],
                                     dim(tmp.search.observations)[2],
                                     dim(tmp.search.observations)[1]))
# Combined (collapsed) the first two dimensions (multiplied the first two dimensions to bring them together )
search.observations <- aperm(search.observations, c(3, 2, 1))  # Flip the location back
# search.observations <- aperm(search.observations, c(3, 2, 1))
#
config$search_observations <- search.observations   # search_observations[parameter, stations, time]
#
# # Need to go do the dim(config$search_observations)[3]
config$observation_times = rep(config$search_times, each = length(config$search_flts)) + config$search_flts
#
#
config$num_members <- members.size
#
num.parameters <- dim(config$search_forecasts)[1]
config$weights <- rep(1, num.parameters)
#
# # Right now, these first four lines are all the same. (because the test stations are the search stations but in other examples, the test and search stations could be different )
config$test_stations_x <- xs[stations.ID]
config$test_stations_y <- ys[stations.ID]
#
# # config$search_stations_x <- xs
# # config$search_stations_y <- ys
config$search_stations_x <- xs[stations.ID]
config$search_stations_y <- ys[stations.ID]
#
config$preserve_mapping <- T
config$verbose <- 3
config$max_flt_nan <- 1
config$max_par_nan <- 0
config$extend_observations <- T  # added on 20181212 <- analog from the point (target) being forecasted for
# # save search stations in the output
config$preserve_search_stations <- T  # Tells you which search stations are being looked into
# # save metrics in the output
config$preserve_similarity <- T  # Tells you which stations were most similar
#
config$num_nearest <- 8  # This is an option you can vary
config$max_num_search_stations <- 10  # Can decrease this if you want. # Can set this to be the same as number of nearest if/when using # nearest.
# # config$max_num_search_stations <- config$num_nearest
# # config$distance <- 1
#
# # Validate first before using
validateConfiguration(config)
#
# # w/ search extension
AnEn <- generateAnalogs(config)


# Look at anen.ver for SSE and IS: 
anen.ver.sse <- array(AnEn$analogs[,,,,1], dim=dim(AnEn$analogs)[1:4])

anen.ver.ind <- array(AnEn.ind$analogs[,,,,1], dim=dim(AnEn.ind$analogs)[1:4])



# Observations 
# # Observations (Analysis Fields)
nc.analy.file <- '~/geolab_storage_V3/data/Analogs/ECMWF_Italy/ItalyAnalysis.nc'
nc.analysis   <- nc_open(nc.analy.file)
obsv          <- ncvar_get(nc.analysis, 'Data')
# dim(obsv) -- 3 parameters   2499 stations   1102 days    4 flt
# Parameter 3 is temperature 
parameter <- 2
dir <- UVtoDir(obsv[1,,,], obsv[2,,,])
spd  <- UVtoSpd(obsv[1,,,], obsv[2,,,])
obs  <- array(spd,dim=dim(obsv)[2:4])


# Source Verification Functions 
source('~/geolab/projects/ExtremeHeat/code/Verification_Functions.R')
source('~/geolab/projects/ExtremeHeat/code/AnEn_functions.R')
library(ncdf4)

# # 2 days every 6 hours
time      <- (seq(0,48,6) )*60*60   ; time <- time[-length(time)]
# rhist.ver=function(anen.ver, obs.ver)

# anen.ver <- array(AnEn.ind$analogs[,,,,1], dim=dim(AnEn.ind$analogs)[1:4])
obs.ver  <- array( NA, dim=c(nrow(obs), dim(anen.ver.ind)[2], dim(obs)[3]*2 ))
for ( d in 1:dim(anen.ver.ind)[2] ) {
  obs.ver[,d,] = array( cbind( obs[ , test.ID.start+d-1, ], obs[ , test.ID.start+d, ] ), dim=c(nrow(obs), 1, dim(obs)[3]*2 ))
}
# Subselect down to the stations chosen. 
# stations.ID are the stations that are randomly kept 
obs.ver <- obs.ver[stations.ID,,]

rankhist.ind <- rhist.ver(anen.ver = anen.ver.ind ,obs.ver = obs.ver )
barplot(rankhist.ind, main = "AnEn.ind")

rankhist.sse <- rhist.ver(anen.ver = anen.ver.sse ,obs.ver = obs.ver )
barplot(rankhist.sse, main = "AnEn SSE")

Change the R interface input parameters to const reference for better performance

Add tests for similarity calculation

Compare the results from C++ and R for similarity calculation

Missing values found in RAnEn

I am using the RAnEn package and found missing values in the results.
Here is the data:

data.dir  <- "~/geolab_storage_V3/data/ExtremeHeat/NYdata/"
gfs.fname <- "gfs_209_68.Rdata"
pws.fname <- "pws.Rdata"

Code to run (SVN repository: geolab/projects/ExtremeHeat) AnEn_GFS_PWS.R

I found NAs in the results:

AnEn$analogs[1, 256, 12, , ]
      [,1] [,2] [,3]
 [1,]  NaN  NaN  NaN
 [2,]  NaN  NaN  NaN
 [3,]  NaN  NaN  NaN
 [4,]  NaN  NaN  NaN
 [5,]  NaN  NaN  NaN
 [6,]  NaN  NaN  NaN
 [7,]  NaN  NaN  NaN
 [8,]  NaN  NaN  NaN
 [9,]  NaN  NaN  NaN
[10,]  NaN  NaN  NaN
[11,]  NaN  NaN  NaN
[12,]  NaN  NaN  NaN
[13,]  NaN  NaN  NaN
[14,]  NaN  NaN  NaN
[15,]  NaN  NaN  NaN
[16,]  NaN  NaN  NaN
[17,]  NaN  NaN  NaN
[18,]  NaN  NaN  NaN
[19,]  NaN  NaN  NaN
[20,]  NaN  NaN  NaN
[21,]  NaN  NaN  NaN

Improve File IO for SimilarityMatrices

Reading and writing similarity matrices is very slow because of the data structure.

A SimilarityMatrix is a vector of vectors, which is not optimized for writing to and reading from a NetCDF file.
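
One common remedy for this kind of layout problem is to copy the nested vectors into a single contiguous buffer before doing file I/O. The sketch below shows the idea in C++; it does not use the actual SimilarityMatrix class, and the function name `flatten` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a rectangular vector of vectors into one contiguous row-major buffer.
std::vector<double> flatten(const std::vector<std::vector<double>>& rows) {
    const std::size_t num_cols = rows.empty() ? 0 : rows.front().size();
    std::vector<double> flat;
    flat.reserve(rows.size() * num_cols);
    for (const auto& row : rows) {
        assert(row.size() == num_cols);  // this sketch assumes a rectangular matrix
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}
```

The pointer returned by `data()` on the result refers to one block of memory, so the whole matrix can be passed to a single write call (for example, a call that takes a `const double*`, such as `nc_put_var_double` in the NetCDF C API) instead of one call per row.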

Correctness test of AnEn programs

Test the correctness of

RAnEn
analogGenerator
similarityCalculator + analogSelector

Refactor IO library

Here is the file type standard.

Codes tend to have more potential issues lately. Try to resolve them.

couldn't load data

https://hub.mybinder.org/user/weiming-hu-analogsensemble-sfodl18m/notebooks/demo-1_RAnEn-basics.ipynb

Hi,

I have a problem running the code at the lines "load('../analysis.RData')
load('../forecasts.RData')". The error says that there is no such file or directory.

Index out of bounds

Index out of bounds. Probably caused by a difference in analog.index.day between observations and forecasts.

New functionality required: station subset for gribConverter

It turns out that gribConverter needs a function for subsetting stations.

RAnEn config validation efficiency problem

Validation function is really inefficient. This should be improved after issue #50 is complete.

Basic function development in progress

This issue is created for the first release of the package.

Insert documentation in the github page

Error during RAnEn installation

Describe the bug
I got the following error when installing the RAnEn package.

> install.packages("https://github.com/Weiming-Hu/AnalogsEnsemble/raw/master/RAnalogs/releases/RAnEn_latest.tar.gz", repos = NULL)
Installing package into ‘/Users/wuh20/Library/R/3.5/library’
(as ‘lib’ is unspecified)
trying URL 'https://github.com/Weiming-Hu/AnalogsEnsemble/raw/master/RAnalogs/releases/RAnEn_latest.tar.gz'
Content type 'application/octet-stream' length 115962 bytes (113 KB)
==================================================
downloaded 113 KB

* installing *source* package ‘RAnEn’ ...
Checking whether R_HOME is already set? R_HOME = /usr/local/Cellar/r/3.5.0_1/lib/R
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... configure: error: in `/private/var/folders/z2/qq0ntf292kj8hy14ckfrmlp80000gp/T/Rtmpmbn6Nq/R.INSTALL27d92b7a8188/RAnEn':
configure: error: cannot run C++ compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
ERROR: configuration failed for package ‘RAnEn’
* removing ‘/Users/wuh20/Library/R/3.5/library/RAnEn’
* restoring previous ‘/Users/wuh20/Library/R/3.5/library/RAnEn’
Warning message:
In install.packages("https://github.com/Weiming-Hu/AnalogsEnsemble/raw/master/RAnalogs/releases/RAnEn_latest.tar.gz",  :
  installation of package ‘/var/folders/z2/qq0ntf292kj8hy14ckfrmlp80000gp/T//Rtmpf4qdEz/downloaded_packages/RAnEn_latest.tar.gz’ had non-zero exit status

My Makevars

CC=gcc-8
CXX=g++-8
CXX1X=g++-8
CXX11=g++-8

Interface Help Following 3.2.1 release (Operational Search added)

Following use of Parallel Ensemble help page (https://weiming-hu.github.io/AnalogsEnsemble/2019/02/12/operational-search.html) and binder (https://hub.mybinder.org/user/weiming-hu-analogsensemble-bmqvhvn1/notebooks/demo-3_operational-search.ipynb) documentation,

request help utilizing new commands added/revised with the inclusion of the Operational Search option
Suggest revision of user interface to reduce duplication and streamline use

RAnEn function generateAnalogs failed with RStudio

I have the following script.

library(RAnEn)
library(maps)

# load("forecasts_ocean.RData")
# load("observations_ocean.RData")

cat("Loading data ...\n")
if ('forecasts' %in% ls()) {
  # Don't reload data
} else {
  load("forecasts_Utah.RData")
  # load("forecasts_Denver.RData")
  # load("forecasts_ocean.RData")
}

if ('observations' %in% ls()) {
  # Don't reload data
} else {
  load("observations_Utah.RData")
  # load("observations_Denver.RData")
  # load("observations_ocean.RData")
}

# Only keep the first 4 FLTs because they are perfect forecasts
if (length(forecasts$FLTs) == 53) {
  #flts.to.keep <- c(1:4)
  #flts.to.keep <- c(1:4, 7)
  flts.to.keep <- c(1:4)
  forecasts$FLTs <- forecasts$FLTs[flts.to.keep]
  forecasts$Data <- forecasts$Data[, , , flts.to.keep, drop = F]
  rm(flts.to.keep)
} else {
  cat("FLTs have already been truncated. No changes are made to the current FLTs.")
}

# Shift Xs range
if (range(forecasts$Xs)[2] > 180) {
  forecasts$Xs <- forecasts$Xs - 360
}

# Configure start and end time
test.start <- 2997
test.end <- 3027
search.end <- test.start - 2
search.start <- search.end - 364
observation_id <- 8
# weights <- c(1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1)
weights <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0)
# weights <- rep(1, length(forecasts$ParameterNames))
names(weights) <- forecasts$ParameterNames

if (T) {
  cat("Range of test times:", format(as.POSIXct(forecasts$Times[c(
    test.start, test.end)], origin = '1970-01-01', tz = 'UTC'), format = "%Y-%m-%d"),
    "\nRange of search times:", format(as.POSIXct(forecasts$Times[c(
      search.start, search.end)], origin = '1970-01-01', tz = 'UTC'), format = "%Y-%m-%d"),
    "\nPredicted variable is", observations$ParameterNames[observation_id],
    "\nweights are:\n")
  print(weights)
}

# Generate AnEn
config <- generateConfiguration('independentSearch')

config$forecasts <- forecasts$Data
config$forecast_times <- forecasts$Times
config$flts <- forecasts$FLTs
config$search_observations <- observations$Data
config$observation_times <- observations$Times
config$observation_id <- observation_id
config$weights <- weights
config$num_members <- 20
config$verbose <- 6
config$max_par_nan <- 3
config$max_flt_nan <- 1
config$quick <- F
config$circulars <- unlist(lapply(forecasts$ParameterCirculars, function (x) {
  return(which(x == forecasts$ParameterNames))}))

config$test_times_compare <- forecasts$Times[test.start:test.end]
config$search_times_compare <- forecasts$Times[search.start:search.end]

AnEn <- generateAnalogs(config)

obs <- alignObservations(observations$Data, observations$Times, forecasts$Times, forecasts$FLTs)

I get the following results when I'm running it over NAM data. The place that generates this error message is not consistent.

OpenMP is supported.
Package 'RAnEn' version 3.2.4
Copyright (c) 2018 Weiming Hu
Loading data ...
Range of test times: 2017-04-12 2017-05-12 
Range of search times: 2016-04-10 2017-04-10 
Predicted variable is SurfaceTemperature 
weights are:
    2MetreRelativeHumidity             2MetreDewpoint 
                         0                          0 
         2MetreTemperature            SoilTemperature 
                         1                          1 
             SurfaceAlbedo         1000IsobaricInhPaU 
                         0                          0 
        1000IsobaricInhPaV         SurfaceTemperature 
                         0                          1 
           SurfacePressure            TotalCloudCover 
                         1                          0 
        TotalPrecipitation DownwardShortWaveRadiation 
                         0                          0 
 DownwardLongWaveRadiation   UpwardShortWaveRadiation 
                         1                          0 
   UpwardLongWaveRadiation     1000IsobaricInhPaSpeed 
                         1                          0 
      1000IsobaricInhPaDir 
                         0 
Convert R objects to C++ objects ...
A summary of test forecast parameters:
[Parameters] size: 17
[Parameter] ID: 0, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 1, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 2, name: UNDEFINED, weight: 1, circular: 0
[Parameter] ID: 3, name: UNDEFINED, weight: 1, circular: 0
[Parameter] ID: 4, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 5, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 6, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 7, name: UNDEFINED, weight: 1, circular: 0
[Parameter] ID: 8, name: UNDEFINED, weight: 1, circular: 0
[Parameter] ID: 9, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 10, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 11, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 12, name: UNDEFINED, weight: 1, circular: 0
[Parameter] ID: 13, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 14, name: UNDEFINED, weight: 1, circular: 0
[Parameter] ID: 15, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 16, name: UNDEFINED, weight: 0, circular: 1
Computing standard deviation ... 
corrupted size vs. prev_size
Aborted (core dumped)

Multi-threaded read of NetCDF files

SSIM as similarity measure

Integrate SSIM for similarity computation.

Available package: OpenCV.

Distinguish NAN values

When NAN values are assigned, store a unique integer code that records the reason the value is NAN.
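
A minimal C++ sketch of this idea is to pair each value with a reason code. The enum values and the names `TaggedValue`, `valid`, and `missing` are hypothetical and only illustrate the proposal; they are not taken from the code base.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical reason codes; the names are illustrative only.
enum class NanReason : int {
    None = 0,
    MissingForecast = 1,
    MissingObservation = 2,
    TooManyNanParameters = 3
};

// A value together with the reason it is NAN (None when it is valid).
struct TaggedValue {
    double value;
    NanReason reason;
};

inline TaggedValue valid(double v) {
    return {v, NanReason::None};
}

inline TaggedValue missing(NanReason why) {
    assert(why != NanReason::None);
    return {std::numeric_limits<double>::quiet_NaN(), why};
}
```

Keeping the code next to the value means a user who finds NAN in the output can tell whether it came from missing input data or from a filtering rule.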

RAnEn installation on Mac OS

I get the following errors while installing RAnEn on a MacBook Air.

> install.packages("https://github.com/Weiming-Hu/AnalogsEnsemble/raw/master/RAnalogs/releases/RAnEn_latest.tar.gz", repos = NULL)
trying URL 'https://github.com/Weiming-Hu/AnalogsEnsemble/raw/master/RAnalogs/releases/RAnEn_latest.tar.gz'
Content type 'application/octet-stream' length 146604 bytes (143 KB)
==================================================
downloaded 143 KB

Warning in strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
  unknown timezone 'zone/tz/2018i.1.0/zoneinfo/America/New_York'
* installing *source* package ‘RAnEn’ ...
Checking whether R_HOME is already set? R_HOME = /Library/Frameworks/R.framework/Resources
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
[the same xcrun error line repeated 17 more times]
checking whether the C++ compiler works... no
configure: error: in `/private/var/folders/h2/sgb6vf0j61554sqdq_8glx280000gn/T/RtmpjOM3Zt/R.INSTALL17f3f1cf1ddb9/RAnEn':
configure: error: C++ compiler cannot create executables
See `config.log' for more details
ERROR: configuration failed for package ‘RAnEn’
* removing ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library/RAnEn’
Warning in install.packages :
  installation of package ‘/var/folders/h2/sgb6vf0j61554sqdq_8glx280000gn/T//RtmpnfzJLt/downloaded_packages/RAnEn_latest.tar.gz’ had non-zero exit status

AnEn Selection Visualization for RAnEn Package

I wrote new code to visualize AnEn selection and would like to add it to the RAnEn repo as a function for others to use. Could you please provide directions on how to contribute?
AnEn_ItalyGrid_AnalogSelection_sample_1.pdf

A demo for the upcoming lab meeting

A demo post that shows functions in the RAnEn package.

Independent search
Search space extension
Visualization
Some other functions

Insufficient memory when dealing with large objects

When dealing with large objects in R, the memory is exhausted.

> AnEn <- generateAnalogs(config)
Convert R objects to C++ objects ...
Computing standard deviation ... 
Computing mapping from forecast [Time, FLT] to observation [Time]  ... 
Computing search space extension ... 
Computing search windows for FLT ... 
Computing similarity matrices ... 
Error in .generateAnalogs(configuration$test_forecasts, dim(configuration$test_forecasts),  : 
  std::bad_alloc
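
A `std::bad_alloc` means that a memory allocation request could not be satisfied. Before running, it can help to estimate how much memory a dense array of doubles would need. The sketch below is only an illustration: `denseDoubleBytes` is a hypothetical helper, not part of RAnEn, and the shape in the usage note is an assumed example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bytes needed to hold a dense array of doubles with the given dimension
// lengths. Hypothetical helper for a back-of-the-envelope estimate only.
std::uint64_t denseDoubleBytes(const std::vector<std::uint64_t>& dims) {
    assert(!dims.empty());
    std::uint64_t numElements = 1;
    for (std::uint64_t d : dims) {
        numElements *= d;  // total element count is the product of the dims
    }
    return numElements * sizeof(double);
}
```

For an assumed array of shape 5 x 2499 x 1095 x 8, `denseDoubleBytes({5, 2499, 1095, 8})` gives 875,649,600 bytes (roughly 0.8 GiB) where a double is 8 bytes. Intermediate copies and similarity matrices would come on top of that.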

Workflow for working with large datasets

Demonstrate a workflow that uses the C++ executables to generate AnEn from a large dataset.

RAnEn::generateAnalogs returns error messages about object types not matching

Hi Alon, could you please upload two things here to assist the debugging?

Run all the code up to the line that gives you the error message, then save the environment. You can remove variables that won't be used to reduce the file size; normally it is enough to save all the inputs to the function that generates the error. Upload the file here or send it directly to me.
Attach the line of code that you didn't run and that gives you the error.

Thank you

NetCDF: Unknown file format when reading with MPI

$ forecastsToObservations -i ItalyAnalysis20190204.nc -o asdfasdfItalyAnalysis20190204_tmp1.nc -v 3
Parallel Ensemble Forecasts --- Forecasts to Observations v 1.0.2
Copyright (c) 2018 Weiming Hu @ GEOlab
Converting Observations to Forecasts
Reading forecast file ...
Reading Parameters from file (ItalyAnalysis20190204.nc) ...
Reading dimension (num_parameters) length ...
Warning: Optional variable (ParameterCirculars) is missing in file (ItalyAnalysis20190204.nc)!
Warning: Optional variable (ParameterWeights) is missing in file (ItalyAnalysis20190204.nc)!
Reading Stations from file (ItalyAnalysis20190204.nc) ...
Reading dimension (num_stations) length ...
Warning: Optional variable (StationNames) is missing in file (ItalyAnalysis20190204.nc)!
Error at line=166: (-51) NetCDF: Unknown file format
-------------------------------------------------------
Child job 2 terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
^C
--------------------------------------------------------------------------
(null) detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[36550,2],17]
  Exit code:    205
--------------------------------------------------------------------------

Test the code with Intel Cluster OpenMP

Intel Cluster OpenMP is a promising direction for deploying on clusters.

Add tests for combining observations and forecasts along different dimensions

Issue with RAnEn installation

I have the following errors while installing the RAnEn package on Linux.

> install.packages('/users/yyang/code/R/script/AnalogsEnsemble-master/RAnalogs/releases/RAnEn_latest.tar.gz',repos=NULL)

Installing package into ‘/users/yyang/code/R/library’
(as ‘lib’ is unspecified)
* installing *source* package ‘RAnEn’ ...
Checking whether R_HOME is already set? R_HOME = /opt/R/R-3.5.3-intel2017-icc-ifort/lib64/R
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether icpc -std=gnu++11 accepts -g... yes
checking for icpc -std=gnu++11 option to support OpenMP... -fopenmp
configure: creating ./config.status
config.status: creating src/Makevars
** libs
icpc -std=gnu++11 -I"/opt/R/R-3.5.3-intel2017-icc-ifort/lib64/R/include" -DNDEBUG  -I"/users/yyang/code/R/library/Rcpp/include" -I"/users/yyang/code/R/library/BH/include" -I/usr/local/include  -fopenmp -fpic  -g -O2 -c AnEn.cpp -o AnEn.o
In file included from /users/yyang/code/R/library/BH/include/boost/noncopyable.hpp(15),
                 from /users/yyang/code/R/library/BH/include/boost/multi_index/detail/auto_space.hpp(20),
                 from /users/yyang/code/R/library/BH/include/boost/multi_index/detail/rnd_index_ptr_array.hpp(19),
                 from /users/yyang/code/R/library/BH/include/boost/multi_index/detail/rnd_index_ops.hpp(18),
                 from /users/yyang/code/R/library/BH/include/boost/multi_index/random_access_index.hpp(35),
                 from Stations.h(13),
                 from Forecasts.h(16),
                 from Analogs.h(12),
                 from Functions.h(11),
                 from AnEn.h(11),
                 from AnEn.cpp(8):
/users/yyang/code/R/library/BH/include/boost/core/noncopyable.hpp(42): error: defaulted default constructor cannot be constexpr because the corresponding implicitly declared default constructor would not be constexpr
        BOOST_CONSTEXPR noncopyable() = default;
                        ^

compilation aborted for AnEn.cpp (code 2)
make: *** [AnEn.o] Error 2
ERROR: compilation failed for package ‘RAnEn’
* removing ‘/users/yyang/code/R/library/RAnEn’
Warning message:
In install.packages("/users/yyang/code/R/script/AnalogsEnsemble-master/RAnalogs/releases/RAnEn_latest.tar.gz",  :
  installation of package ‘/users/yyang/code/R/script/AnalogsEnsemble-master/RAnalogs/releases/RAnEn_latest.tar.gz’ had non-zero exit status

RAnEn Help Documentation

I updated to the latest version of RAnEn (using R version 3.6), and when I try to open the documentation for the RAnEn package through the normal means (?RAnEn or ??RAnEn), I receive an error. Specifically, ?RAnEn prints "Error in fetch(key) : lazy-load database '/usr/local/lib/R/3.6/site-library/RAnEn/help/RAnEn.rdb' is corrupt" and ??RAnEn returns "No results found". Suggestions? Thanks!

Missing cycle time when constructing the posixct time during gribConverter

The gribConverter considers only the year, month, and day, but not the model cycle time. This matters because the POSIXct time will be wrong for any model run initialized at a time other than 00.

Maybe this line should be changed.
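
To make the effect of the missing cycle time concrete, here is a minimal sketch of how seconds since 1970-01-01 00:00 UTC could be built from a date plus a cycle hour. The function names and the `cycleHour` parameter are hypothetical and are not taken from gribConverter; the day count uses the well-known civil-date arithmetic for the proleptic Gregorian calendar.

```cpp
#include <cassert>

// Days between 1970-01-01 and the given proleptic Gregorian date.
long long daysFromCivil(long long y, unsigned m, unsigned d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    if (m <= 2) y -= 1;                                // count years from March
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);   // [0, 399]
    const unsigned mp = (m > 2) ? m - 3 : m + 9;                 // March = 0
    const unsigned doy = (153 * mp + 2) / 5 + d - 1;             // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;  // [0, 146096]
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// POSIX time (UTC seconds) for a model run initialized at cycleHour.
// Dropping the cycleHour term reproduces the behavior described above.
long long posixSeconds(long long y, unsigned m, unsigned d, unsigned cycleHour) {
    assert(cycleHour < 24);
    return daysFromCivil(y, m, d) * 86400LL + cycleHour * 3600LL;
}
```

For example, `posixSeconds(2019, 2, 4, 0)` is 1549238400, while a 12 UTC cycle on the same day, `posixSeconds(2019, 2, 4, 12)`, is 1549281600. Ignoring the cycle hour would make the two runs indistinguishable.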

Switch off standard deviation if possible

Provide the features to accept standard deviation from the R interface. @franchg

Add fileSlice for slicing data set

The utility should be able to slice data files based on configuration files.

RAnEn implementation change for non-advanced configuration

The current configuration copies the data, which is very inefficient. Change the implementation to pass by reference in C++ and avoid copying data in R.

The number of similarities in the result

I noticed that, although by default twice as many similarity values as ensemble members should be kept, the actual result keeps only as many as there are ensemble members. This should be fixed.
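
The intended behavior can be sketched as follows. This is an illustration under two assumptions that are not confirmed by the package source: that a smaller similarity value means a better match, and that the multiplier is a simple integer factor (2 by default). `keepBestSimilarities` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Return the (numMembers * factor) smallest similarity values, in ascending
// order. If fewer values are available, all of them are returned.
std::vector<double> keepBestSimilarities(std::vector<double> sims,
                                         std::size_t numMembers,
                                         std::size_t factor = 2) {
    assert(numMembers > 0 && factor > 0);
    const std::size_t keep = std::min(sims.size(), numMembers * factor);
    // Only the first `keep` positions need to be sorted.
    std::partial_sort(sims.begin(),
                      sims.begin() + static_cast<std::ptrdiff_t>(keep),
                      sims.end());
    sims.resize(keep);
    return sims;
}
```

With 2 ensemble members and the default factor, six input values are reduced to the four smallest, not two.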

Automatically generate a NEW page for updates

Question: Independent Search file size and IS vs SSE computing time

These are just questions/observations and I am curious about your thoughts. No hurry and no worries if you don't have time to respond.

Independent Search file size question: Under the new RAnEn v3.2.5, the Independent Search (IS) output file is around 1.9 MB for analogs generated for one day, whereas under earlier versions it was around 4 MB for one day. (The AnEn IS I am using for comparison was computed under the most current version as of 20190129. I think it was pre-3.2.x, but it may have been 3.1.x.) I have compared the files, and both seem to have reasonable analog results (nothing missing, etc.). Beyond the "Changelog" page or the "Issues" page, can you let me know if you changed something so that the file size is smaller? I am just curious.
Computing time: The IS took a little over an hour to generate analogs for 365 days, while the SSE took about 14.75 hours for 365 days. Does this seem appropriate to you? It seems like the code may be running faster now, but I do not have numerical evidence. I am just curious whether you have posted on this or have any thoughts to share.

Thanks!

Just in case you want to see the code used to configure/execute the AnEnIS and AnEnSSE.

# Objective:  Code to generate the analogs for various scenarios/cases. 
#             Generate AnEnIS output, AnEnSSE output, and configuration files for each.  
# Author:         Laura Clemente-Harding ([email protected]) 
# Collaborators:  Guido Cervone, Weiming Hu
# Note! Weiming Hu is the author of the original RAnEn package. 

# Load libraries, source functions file
library(ncdf4); library(RAnEn)
source('~/geolab/projects/AnalogsSpatial/code/SSE_functions.R', echo=TRUE)

# Load ECMWF forecasts, analysis field, coordinates, calculate WS and WD from U and V components
source('~/geolab/projects/AnalogsSpatial/code/SSE_loadBasicData.R', echo=TRUE)

# Generation Options 

generateAnEnSSE   <- FALSE # If TRUE, generates AnEnSSE
generateAnEnIS    <- TRUE  # If TRUE, generates AnEnIS
currentDate       <- "20190301"
# Save path and directory 
# savePath          <- "~/geolab_storage_V3/data/Analogs/AnEn-SSE/"   # 
savePath          <- "/Volumes/blackeye/geolab_storage_V3/data/Analogs/AnEn-SSE/"
saveDir           <- "operational_TEMP_20190301/"
operationalCase   <- T   # If TRUE, then it activates the operational case in the config generation files below
# operational       <- TRUE  # By default this is FALSE. 


# ANEN PARAMETERS
# Define variables here: 
members.size       <- 21
# Choose variable to be predicted (predictand) 
predictandParam   <- 3  # parameter 1 is WD; param 2 is WS; param 3 is Temperature
stations.ID       <- 1:2499 
weights           <- rep(1, dim(fcst.aligned)[1])     # <- c(1,1,0,0,0) #  "wdir","ws","2T","2DPT","MSLP"
verbosity         <- 3
preserve_mapping  <- TRUE 
extObs            <- FALSE 
test.start <- 730 # once looping, don't need to define this here
# test.end   <- 730 # once looping, don't need to define this here
search.start <- 1
search.end <- 730
AnEnGen.startDate <- test.start
AnEnGen.endDate   <- 1095 # end of the testing period, all the days to generate information for 


xs <- as.numeric(lon)
ys <- as.numeric(lat)
nx <- 51; ny <- 49   # Preset for Italy dataset 

# icounter1 <- 733 
# Generate AnEn output, one day at a time. 
for ( icounter1 in AnEnGen.startDate:AnEnGen.endDate ) {  # Now, 730; Eventually do this through to 1095 for the 3rd year. So this is training on the first year and testing on the second two years 
  print(paste("Generating AnEn for ", icounter1))
   test.start <- icounter1
  # test.end <- test.start# Generate the analogs for one day at a time. 
  
  # A sampling of 500 stations 
  # stations.ID <- stations_ID <- sort(sample(1:2499,500))
   if ( generateAnEnSSE ){
    config                     <- generateConfiguration('extendedSearch')
    config$observation_id      <- predictandParam
    config$forecasts           <- fcst.aligned   # changed    dim(fcst.aligned)=>  5 2499 1095    8
    config$forecast_times      <- fcst.times # NEW   
    config$flts                <- flts.subset# Want this to match with config$search_forecasts  # NEW-ISH bc search_flts is now flts 
                                  # 14Feb - So this no longer needs to be subset, 
                                  #    but we still subset it simply so it doesn't take the program as long (give it less to search through) 
    config$search_observations <- obsv  # search_observations[parameter, stations, time]
                                   # can constrain this to save time 
    config$observation_times   <- obsv.times
    config$num_members         <- members.size
    config$weights             <- weights
    config$forecast_stations_x <- xs[stations.ID]
    config$forecast_stations_y <- ys[stations.ID]
    config$verbose             <- verbosity
    config$extend_observations <- extObs 
    # Set up test times to be compared
    config$test_times_compare   <- config$forecast_times[test.start] # One single point in time that the comparing starts from? 
    config$search_times_compare <- config$forecast_times[search.start:search.end]  # This means nothing if operational is changed to TRUE. However, operational is FALSE by default. 
    # Specific to SSE 
    config$preserve_search_stations  <- T  # Tells you what the search stations are that you're looking into 
    config$preserve_similarity       <- T  # Tells you which stations were most similar 
    config$num_nearest               <- 8  # This is an option you can vary
    config$max_num_search_stations   <- 10  # Can decrease this if you want. # Can set this to be the same as number of nearest if/when using # nearest.
    # config$max_num_search_stations <- config$num_nearest 
    # config$distance <- 1
    
    # if ( operationalCase == TRUE ){
    #   config$operational  <- operationalCase
    #   # Additional parameters? 
    # }
    # 
    # Validate first before using 
    if ( validateConfiguration(config) == FALSE ){
      stop("Stop Program: Validation Failed ")
    }
    
    # w/ search extension
    AnEn <- generateAnalogs(config)
    
    # Save the AnEn 
    fname.SSE <- print(paste("AnEn_SSE_nn-",config$num_nearest, "_NumEns_", config$num_members,"_train_",search.start,"-",
                             search.end,"_testday_",test.start,"_op",operationalCase,sep=""))
    save(AnEn, file = print(paste(savePath,saveDir,fname.SSE,".Rdata", sep = "")))
    
    if ( generateAnEnSSE && test.start == 731 ){
      save(config, AnEn, file= print(paste(savePath,"config_AnEnSSE_",currentDate,"_day_",test.start,".Rdata", sep="")) )
    }
    rm(AnEn,fname.SSE)
  } # Ends if statement for AnEnSSE 
  
  
  # AnEn IS ( without search space extension )
  if ( generateAnEnIS ){
    config2                     <- generateConfiguration('independentSearch')
    config2$observation_id      <- predictandParam
    config2$forecasts           <- fcst.aligned   # changed    dim(fcst.aligned)=>  5 2499 1095    8
    config2$forecast_times      <- fcst.times # NEW   
    config2$flts                <- flts.subset# Want this to match with config$search_forecasts  # NEW-ISH bc search_flts is now flts 
    # 14Feb - So this no longer needs to be subset, 
    #    but we still subset it simply so it doesn't take the program as long (give it less to search through) 
    config2$search_observations <- obsv  # search_observations[parameter, stations, time]
    # can constrain this to save time 
    config2$observation_times   <- obsv.times
    config2$num_members         <- members.size
    config2$weights             <- weights
    config2$forecast_stations_x <- xs[stations.ID]
    config2$forecast_stations_y <- ys[stations.ID]
    config2$verbose             <- verbosity
    config2$extend_observations <- extObs 
    # Set up test times to be compared
    config2$test_times_compare   <- config2$forecast_times[test.start]
    config2$search_times_compare <- config2$forecast_times[search.start:search.end]  # This means nothing if operational is changed to TRUE. However, operational is FALSE by default. 
    
    config2$preserve_mapping    <- preserve_mapping
    config2$verbose             <- verbosity
    
    # Validate first before using 
    validateConfiguration(config2)
    # Generate analogs 
    AnEn.ind <- generateAnalogs(config2)
    
    fname.nonSSE <- print(paste("AnEn_IS_NumEns_", config2$num_members,"_train_",search.start,"-",
                                search.end,"_testday_",test.start,"_op",operationalCase, sep=""))
    save(AnEn.ind, file = print(paste(savePath,saveDir,fname.nonSSE,".Rdata", sep = "")))
    
    
    # Save configuration file for whichever AnEn (IS or SSE) was generated 
    
    if ( generateAnEnIS && test.start == 731 ){
      save(config2, AnEn.ind, file = print(paste(savePath,"config2_AnEnIS_",currentDate,"_day_",test.start,".Rdata", sep="")) )
    }
    
    rm(AnEn.ind,fname.nonSSE)
  } # Ends T/F for AnEn IS generation 
   
  
} # End of anen generation calculator

Different results from RAnEn when subsetting stations

When IS is used, the AnEn results for each station should be independent of the other stations. However, when computing only a portion of the stations, the results for these stations differ from those obtained when computing all stations at once.

For example, the left three columns come from computing all stations, and the right three columns come from computing a subset of the stations. This might be caused by the handling of NA values.

> cbind(AnEn.all$similarity[5, 2, 3, order(AnEn.all$similarity[5, 2, 3, , 3]), ], AnEn$similarity[5, 2, 3, order(AnEn$similarity[5, 2, 3, ,3]),])[1:20, ]
          [,1] [,2] [,3]     [,4] [,5] [,6]
 [1,]      NaN    5    1      NaN    5    1
 [2,] 7.212896    5    2      NaN    5    2
 [3,] 2.776645    5    3      NaN    5    3
 [4,]      NaN    5    4      NaN    5    4
 [5,] 3.268617    5    5 3.268617    5    5
 [6,] 1.744200    5    6 1.744200    5    6
 [7,] 4.941840    5    7 4.941840    5    7
 [8,] 4.340009    5    8 4.340009    5    8
 [9,]      NaN    5    9 3.855888    5    9
[10,]      NaN    5   10 5.344926    5   10
[11,] 5.490306    5   11 5.490306    5   11
[12,] 3.994862    5   12 3.994862    5   12
[13,] 3.262656    5   13      NaN    5   13
[14,] 3.588668    5   14      NaN    5   14
[15,]      NaN    5   15      NaN    5   15
[16,]      NaN    5   16      NaN    5   16
[17,] 3.961612    5   17      NaN    5   17
[18,] 3.799170    5   18      NaN    5   18
[19,] 3.386876    5   19      NaN    5   19
[20,] 4.393606    5   20      NaN    5   20
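
If NaN similarity values are involved, one thing worth checking is how they are ordered: comparing with plain `<` is not a valid ordering when NaN is present, so `std::sort` with the default comparison has undefined behavior on such data. The following is a minimal sketch of a consistent comparator, assuming that smaller similarity values are better and that NaN should sort last. It is an illustration only and is not taken from the package's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Strict weak ordering on doubles that treats every NaN as equivalent and
// larger than any number, so NaN values end up at the back.
bool lessNanLast(double a, double b) {
    if (std::isnan(a)) return false;  // NaN is never "less than" anything
    if (std::isnan(b)) return true;   // any number comes before NaN
    return a < b;
}

// Sort in place: numbers ascending, NaN values last.
void sortNanLast(std::vector<double>& values) {
    std::sort(values.begin(), values.end(), lessNanLast);
    assert(std::is_sorted(values.begin(), values.end(), lessNanLast));
}
```

Sorting a vector that mixes NaN with ordinary values this way gives the same order for the non-NaN entries regardless of where the NaN values started.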

License of the program

This program has the following dependencies:

NetCDF
Boost

CC=tau_cc.sh CXX=tau_cxx.sh cmake -DENABLE_MPI=ON -DCMAKE_PREFIX_PATH=/home/graduate/wuh20/packages/release/ -DBOOST_TYPE=SYSTEM -DCMAKE_BUILD_TYPE=Debug ..
make -j 16

I encountered the following error.

OMP_NUM_THREADS=3 mpirun -np 1 /home/graduate/wuh20/github/AnalogsEnsemble/output/bin/standardDeviationCalculator -v 6 -i /home/graduate/wuh20/exfat-hu/Data/2019_Hu_AnEn-bias-correction/forecasts/201712.nc /home/graduate/wuh20/exfat-hu/Data/2019_Hu_AnEn-bias-correction/forecasts/201801.nc -o ~/exfat-hu/Data/2019_Hu_AnEn-bias-correction/sds/sds-0001.nc --start 0 0 0 0 0 0 0 0 --count 17 100 31 53 17 100 31 53

Parallel Ensemble Forecasts --- Standard Deviation Calculator v 3.2.1
Copyright (c) 2018 Weiming Hu @ GEOlab
Input parameters:
in_files: /home/graduate/wuh20/exfat-hu/Data/2019_Hu_AnEn-bias-correction/forecasts/201712.nc,/home/graduate/wuh20/exfat-hu/Data/2019_Hu_AnEn-bias-correction/forecasts/201801.nc,
out_file: /home/graduate/wuh20/exfat-hu/Data/2019_Hu_AnEn-bias-correction/sds/sds-0001.nc
verbose: 6
config_file: 
start: 0,0,0,0,0,0,0,0,
count: 17,100,31,53,17,100,31,53,
Checking mode ...
Checking file (/home/graduate/wuh20/exfat-hu/Data/2019_Hu_AnEn-bias-correction/sds/sds-0001.nc) ...
Combining forecasts along the time dimension...
Checking mode ...
Checking file (/home/graduate/wuh20/exfat-hu/Data/2019_Hu_AnEn-bias-correction/forecasts/201712.nc) ...
Checking file type (Forecasts) ...
Checking dimension (num_parameters) ...
Checking dimension (num_stations) ...
Checking dimension (num_times) ...
Checking dimension (num_flts) ...
Checking dimension (num_chars) ...
Checking variable (Data) ...
Checking variable (FLTs) ...
Checking variable (Times) ...
Checking variable (ParameterNames) ...
Checking variable (Xs) ...
Checking variable (Ys) ...
Processing partial meta information ...
Reading Parameters from file (/home/graduate/wuh20/exfat-hu/Data/2019_Hu_AnEn-bias-correction/forecasts/201712.nc) ...
Reading dimension (num_parameters) length ...
Checking variable (ParameterCirculars) ...
Checking variable (ParameterWeights) ...
Reading Stations from file (/home/graduate/wuh20/exfat-hu/Data/2019_Hu_AnEn-bias-correction/forecasts/201712.nc) ...
Reading dimension (num_stations) length ...
Spawning 3 processes to read StationNames ...
Broadcasting variables ...
Child rank #0 received from the parent's broadcast ...
Child rank #1 received from the parent's broadcast ...
Child rank #2 received from the parent's broadcast ...
Child rank #0 reading StationNames with start/count ( 0,33 0,50 ) ...
Child rank #2 reading StationNames with start/count ( 66,34 0,50 ) ...
Child rank #1 reading StationNames with start/count ( 33,33 0,50 ) ...
Parent waiting to gather data from processes ...
Rank #0 sending data (1650) back to the parent ...
Rank #2 sending data (1700) back to the parent ...
Rank #1 sending data (1650) back to the parent ...
[sapphire:02637] *** Process received signal ***
[sapphire:02637] Signal: Segmentation fault (11)
[sapphire:02637] Signal code: Address not mapped (1)
[sapphire:02637] Failing at address: (nil)
[sapphire:02637] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x1288f)[0x7f93c43cf88f]
[sapphire:02637] [ 1] mpiAnEnIO(MPI_Gatherv+0x120)[0x56347a6ffc00]
[sapphire:02637] [ 2] mpiAnEnIO(main+0x10ef)[0x56347a63ab1d]
[sapphire:02637] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6)[0x7f93c3fedb96]
[sapphire:02637] [ 4] mpiAnEnIO(_start+0x29)[0x56347a638f09]
[sapphire:02637] *** End of error message ***
Reading Times from file (/home/graduate/wuh20/exfat-hu/Data/2019_Hu_AnEn-bias-correction/forecasts/201712.nc)   ...
Reading dimension (num_times) length ...
Reading FLTs from file (/home/graduate/wuh20/exfat-hu/Data/2019_Hu_AnEn-bias-correction/forecasts/201712.nc) ...
Reading dimension (num_flts) length ...
Combining times ...
...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node sapphire exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
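
The log shows three child ranks sending 1650, 1650, and 1700 characters (33, 33, and 34 station names of 50 characters each) before the crash inside `MPI_Gatherv`. On the root, `MPI_Gatherv` needs a receive-count array and a displacement array that match this layout. The sketch below only computes such arrays; it does not call MPI and does not diagnose the crash. `GathervLayout` and `gathervLayout` are hypothetical names, and giving the remainder to the last rank is an assumption read off the log.

```cpp
#include <cassert>
#include <vector>

struct GathervLayout {
    std::vector<int> counts;  // number of elements received from each rank
    std::vector<int> displs;  // offset of each rank's block in the receive buffer
};

// Split numItems items of itemWidth elements each across numRanks ranks.
GathervLayout gathervLayout(int numItems, int numRanks, int itemWidth) {
    assert(numItems >= 0 && numRanks > 0 && itemWidth > 0);
    GathervLayout layout;
    const int base = numItems / numRanks;
    int offset = 0;
    for (int rank = 0; rank < numRanks; ++rank) {
        // The last rank also takes the remainder, as in the log (33, 33, 34).
        const int items = (rank == numRanks - 1) ? numItems - base * rank : base;
        layout.counts.push_back(items * itemWidth);
        layout.displs.push_back(offset);
        offset += items * itemWidth;
    }
    return layout;
}
```

For 100 station names of width 50 on 3 ranks, the counts are 1650, 1650, 1700 and the displacements are 0, 1650, 3300, so the receive buffer on the root must hold 5000 characters.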

Finding the NetCDF C++ library when it is not in the same directory as the NetCDF C library

Improve this file to account for the above situation.

New visualization functions

plot.AnEn.TS

plot.AnEn.map

from AnEn_functions.R in extreamHeat.project.