hubvis's Introduction

hubVis

The goal of hubVis is to provide plotting methods for hub model outputs, following the hubverse format. The hubverse is a collection of open-source software and data tools, developed by the Consortium of Infectious Disease Modeling Hubs. For more information, please consult the hubDocs website.

Installation

You can install the development version of hubVis like so:

remotes::install_github("hubverse-org/hubVis")

Usage

The package currently contains one function, plot_step_ahead_model_output(), which plots 50%, 80%, and 95% quantile intervals, with a specific color per "model_id".

The function can output 2 types of plots:

  • interactive (Plotly object)
  • static (ggplot2 object)

library(hubVis)
library(hubExamples)
head(scenario_outputs)
head(scenario_target_ts)
projection_data <- dplyr::mutate(scenario_outputs,
     target_date = as.Date(origin_date) + (horizon * 7) - 1)

target_data_us <- dplyr::filter(scenario_target_ts, location == "US",
                                date < min(projection_data$target_date) + 21,
                                date > "2020-10-01")
projection_data_us <- dplyr::filter(projection_data,
                                    scenario_id == "A-2021-03-05",
                                    location == "US")
plot_step_ahead_model_output(projection_data_us, target_data_us)

Faceted plots can be created for multiple scenarios, locations, targets, models, etc.

projection_data_us <- dplyr::filter(projection_data,
                                    location == "US")
plot_step_ahead_model_output(projection_data_us, target_data_us, 
                             use_median_as_point = TRUE,
                             facet = "scenario_id", facet_scales = "free_x", 
                             facet_nrow = 2, facet_title = "bottom left")


Code of Conduct

Please note that the hubVis package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Contributing

Interested in contributing back to the open-source Hubverse project? Learn more about how to get involved in the Hubverse Community or how to contribute to the hubVis package.

hubvis's Issues

[ORG NAME CHANGE]: Update repo to hubverse-org organisation name

❗ Please do not merge anything into main until the 19th June

  • Create new branch called to-hubverse from latest dev branch (most likely related to the v3 schema PRs). (Make sure to pull first). If none exist, branch off from main.
  • Find and replace Infectious-Disease-Modeling-Hubs with hubverse-org throughout repo.
  • Replace the origin remote to point to the new organisation, either from the terminal:
    git remote set-url origin https://github.com/hubverse-org/<REPO_NAME>.git
    # Check remotes
    git remote -v
    or from R:
    url <- paste0('https://github.com/hubverse-org/', basename(getwd()), '.git')
    usethis::use_git_remote(name = "origin", url, overwrite = TRUE)
    # Check remotes
    usethis::git_remotes()
  • Push and open PR to original branch.

Update authors

Use definition below
Definitions of roles (Author, maintainer, contributor, etc…)

Going forward, ensure that PRs from new contributors include an update to authorship roles (for R packages, in the DESCRIPTION file).

`plot_step_ahead_model_output` returns error for model outputs that include PMF output type with `chr` `output_type_id`

The example hub used in the hubEnsembles vignette/manuscript includes mean, median, quantile, and PMF output types. The PMF output types have output_type_id = c("large decrease", "decrease", "stable", "increase", "large increase"), which forces the entire output_type_id column to be chr.
Example hub data: https://github.com/Infectious-Disease-Modeling-Hubs/hubEnsembles/tree/software-manuscript/inst/example-data/example-simple-forecast-hub

I tried to use this model output (filtered to only quantile output type) in plot_step_ahead_model_output(), and I received the following error: Error in x - y : non-numeric argument to binary operator. I traced this back to the step in the function where the interval ribbons are set up (I think the issue was because the function is trying to do something with "0.25" instead of 0.25).

Reproducible example

hub_path <- system.file("example-data/example-simple-forecast-hub",
                        package = "hubEnsembles")
model_outputs <- hubUtils::connect_hub(hub_path) |>
  dplyr::collect()
target_data_path <- file.path(hub_path, "target-data", "covid-hospitalizations.csv")
target_data <- read.csv(target_data_path) |>
    dplyr::mutate(time_idx = as.Date(time_idx))

This code gives the error.

hubVis::plot_step_ahead_model_output(model_output_data = model_outputs |>
                                                  dplyr::filter(location == "US",
                                                                output_type %in% c("quantile"),
                                                                origin_date == "2022-12-12") |>
                                                  dplyr::mutate(target_date =  origin_date + horizon),
                                              truth_data = target_data |>
                                                  dplyr::filter(location == "US",
                                                                time_idx >= "2022-11-01",
                                                                time_idx <= "2023-03-01"),
                                              facet = "model_id", 
                                              facet_nrow = 1, 
                                              interactive = FALSE,
                                              intervals = 0.5,
                                              one_color = "black",
                                              pal_color = NULL, 
                                              show_legend = FALSE, 
                                              use_median_as_point = TRUE,)

Adding dplyr::mutate(output_type_id = as.double(output_type_id)) solves the problem.

hubVis::plot_step_ahead_model_output(model_output_data = model_outputs |>
                                                  dplyr::filter(location == "US",
                                                                output_type %in% c("quantile"),
                                                                origin_date == "2022-12-12") |>
                                                  dplyr::mutate(target_date =  origin_date + horizon, 
                                                       output_type_id = as.double(output_type_id)),
                                              truth_data = target_data |>
                                                  dplyr::filter(location == "US",
                                                                time_idx >= "2022-11-01",
                                                                time_idx <= "2023-03-01"),
                                              facet = "model_id", 
                                              facet_nrow = 1, 
                                              interactive = FALSE,
                                              intervals = 0.5,
                                              one_color = "black",
                                              pal_color = NULL, 
                                              show_legend = FALSE, 
                                              use_median_as_point = TRUE,)
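The failure and the workaround can be reproduced without the hub data. The sketch below uses a made-up three-row table (not the hubEnsembles example hub) to show that filtering to quantile rows leaves output_type_id as character, that arithmetic on it raises the reported error, and that coercing after filtering restores numeric quantile levels. The table contents and variable names are assumptions chosen for illustration.

```r
# Hypothetical miniature of a model-output table mixing quantile and pmf rows.
# The pmf label forces the whole output_type_id column to be character.
model_out <- data.frame(
  output_type = c("quantile", "quantile", "pmf"),
  output_type_id = c("0.25", "0.75", "stable"),
  value = c(10, 20, 0.4)
)

# Filtering to quantile rows does not change the column type.
quantiles <- subset(model_out, output_type == "quantile")
class(quantiles$output_type_id)

# Arithmetic on the character column fails; capture the message instead of stopping.
err <- tryCatch(
  1 - quantiles$output_type_id,
  error = function(e) conditionMessage(e)
)

# Coercing after filtering makes the quantile levels numeric again.
quantiles$output_type_id <- as.double(quantiles$output_type_id)
interval_width <- diff(quantiles$output_type_id)
```

Coercing before filtering would also work, but it would turn the pmf labels into NA with a warning, which is why the conversion is applied to the filtered rows here.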

Input data - Date information

From Issue #1

Arguments specifying input data
forecast_data required data.frame with forecasts in the hub_mdl_out_df format. Noting that maybe we want to start by supporting only certain output types (e.g. quantiles, means, medians, maybe samples once we implement
All forecasts in forecast_data will be plotted, i.e. all filtering needs to happen outside this function. We note per validation specified below that this data.frame must have either a target_date column or both an origin_date and a horizon column.

Currently, the input data is required to have a target_date column. It would be useful to also accept input with origin_date and horizon columns and, using the associated formula, calculate the target_date column when it is missing.
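One possible shape for that fallback is sketched here in base R. The function name add_target_date() and its arguments days_per_horizon and offset_days are hypothetical and not part of hubVis. They are exposed as arguments because the formula depends on the hub: the README example uses as.Date(origin_date) + (horizon * 7) - 1, while another example in these issues uses origin_date + horizon.

```r
# Hypothetical helper: derive target_date from origin_date and horizon when it
# is missing. The step size and offset are hub-specific, so they are arguments.
add_target_date <- function(df, days_per_horizon = 1, offset_days = 0) {
  # Nothing to do when the column is already present.
  if ("target_date" %in% names(df)) {
    return(df)
  }
  missing_cols <- setdiff(c("origin_date", "horizon"), names(df))
  if (length(missing_cols) > 0) {
    stop(
      "Input must contain `target_date`, or both `origin_date` and `horizon`. ",
      "Missing: ", paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  df$target_date <- as.Date(df$origin_date) +
    df$horizon * days_per_horizon + offset_days
  df
}

# Made-up weekly example: horizons 0, 1 and 2 from a single origin date.
example <- data.frame(origin_date = "2022-12-12", horizon = 0:2)
weekly <- add_target_date(example, days_per_horizon = 7)
```

Passing days_per_horizon = 7 and offset_days = -1 would reproduce the README formula.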

create plot_step_ahead_forecasts()

We would like to have a plot_step_ahead_forecasts() function as part of the hubUtils package. Ideally, this would have similar functionality to the "original" covidHubUtils::plot_forecasts() function which can be found here. The scope of this function is limited to plotting what we are calling step-ahead forecasts, that is forecasts for a single "target variable" at different horizons in the future.

Based on the original function, and integrating knowledge of the new hubverse toolkit, here is a proposal for what the new function would do and look like. This depends in some ways on hubverse-org/hubUtils#33 which relates to standardized structures for collections of predictions

General functionality

This function will plot forecasts and optional truth data for only one selected step-ahead target variable (which might be represented by one or more task_id variables). Optionally, faceted plots for multiple models, locations and forecast dates can be created with a specified facet formula.

Input parameters

(I've copied some text from the original plot_forecasts() function)

Arguments specifying input data

  • forecast_data required data.frame with forecasts in the hub_mdl_out_df format. Noting that maybe we want to start by supporting only certain output types (e.g. quantiles, means, medians, maybe samples once we implement hubverse-org/hubData#8). All forecasts in forecast_data will be plotted, i.e. all filtering needs to happen outside this function. We note per validation specified below that this data.frame must have either a target_date column or both an origin_date and a horizon column.
  • truth_data optional data.frame with required columns as follows:
    • time_idx
    • value
    • [collection of columns that are task_id variables, but not the ones that define the target date]

Arguments about plotting details

  • facet interpretable facet option for ggplot. Function will error if multiple values of some task_id variables are passed in without the corresponding column in the facet formula.
  • facet_scales argument for scales in [ggplot2::facet_wrap]. Default to "fixed".
  • facet_nrow number of rows for facetting; optional.
  • facet_ncol number of columns for facetting; optional.
  • intervals values indicating which central prediction interval levels to plot. NULL means only plotting point forecasts. If not provided, it will default to c(.5, .8, .95). When plotting 6 models or more, the plot will be reduced to show .95 interval only.
  • use_median_as_point logical for using median quantiles as point forecasts in plot. Default to FALSE.
  • plot_truth logical for showing truth data in plot. Default to TRUE. Data used in the plot is either truth_data or data loaded from truth_source.
  • plot logical for showing the plot. Default to TRUE.
  • fill_by_model logical for specifying colors in plot. If TRUE, separate colors will be used for each model. If FALSE, only blues will be used for all models. Default to FALSE.
  • fill_transparency numeric value used to set transparency of intervals. 0 means fully transparent, 1 means opaque.
  • top_layer character vector, where the first element indicates the top layer of the resulting plot. Possible options are "forecast" and "truth".
  • title optional text for the title of the plot. If left as "default", the title will be automatically generated. If "none", no title will be plotted.
  • subtitle optional text for the subtitle of the plot. If left as "default", the subtitle will be automatically generated. If "none", no subtitle will be plotted.
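To make the intervals argument concrete: a central prediction interval at level p is bounded by the quantiles at (1 - p) / 2 and 1 - (1 - p) / 2, so the 0.5 interval spans the 0.25 and 0.75 quantiles. The base R sketch below computes those bounds for the proposed defaults; interval_to_quantiles() is a hypothetical helper name, not a hubVis function.

```r
# Hypothetical helper: quantile levels bounding each central prediction interval.
interval_to_quantiles <- function(intervals) {
  alpha <- 1 - intervals
  data.frame(
    interval = intervals,
    lower = alpha / 2,
    upper = 1 - alpha / 2
  )
}

# Bounds for the proposed default interval levels.
bounds <- interval_to_quantiles(c(0.5, 0.8, 0.95))
print(bounds)
```

A model output therefore needs both quantile levels of a pair to be present before the corresponding ribbon can be drawn.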

Input validations

  • at least one target_metadata entry for the hub must have is_step_ahead set to TRUE.
  • hub target ids include "temporal ID variables" which are either (a) target_date or (b) origin_date and horizon
  • truth data has
    • time_idx
    • value
    • if specified by hub, columns for all task_id variables that are not target_date, origin_date, horizon
  • forecast data is in the hub_mdl_out_df format
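The truth-data checks in this list could look roughly like the base R sketch below. The function validate_truth_data() and its task_id_cols argument are hypothetical names used for illustration; the sketch only covers the column checks, not the target_metadata or hub_mdl_out_df checks.

```r
# Hypothetical validation of the truth data columns described above.
validate_truth_data <- function(truth_data, task_id_cols = character()) {
  if (!is.data.frame(truth_data)) {
    stop("`truth_data` must be a data.frame.", call. = FALSE)
  }
  # Temporal task_id variables are not expected in the truth data.
  temporal_cols <- c("target_date", "origin_date", "horizon")
  required <- c("time_idx", "value", setdiff(task_id_cols, temporal_cols))
  missing_cols <- setdiff(required, names(truth_data))
  if (length(missing_cols) > 0) {
    stop(
      "`truth_data` is missing required column(s): ",
      paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(truth_data)
}

# Made-up truth data with one non-temporal task_id variable (location).
truth <- data.frame(
  time_idx = as.Date("2022-11-05"),
  location = "US",
  value = 100
)
validate_truth_data(truth, task_id_cols = c("location", "origin_date", "horizon"))
```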

interactive vis does not respect the `group` argument

Working with data from hubExamples, the following code produces a static plot that looks as expected:

library(dplyr)
library(hubExamples)
library(hubVis)

plot_step_ahead_model_output(
    model_output_data = forecast_outputs |> filter(output_type %in% c("quantile", "median")),
    target_data = forecast_target_ts |>
        filter(location %in% c("25", "48"),
               date >= "2022-10-01", date <= "2023-04-01"),
    use_median_as_point = TRUE,
    x_col_name = "target_end_date",
    intervals = c(0.5, 0.8, 0.9),
    facet = "location",
    group = "reference_date",
    interactive = FALSE
)

[Image: static plot]

However, if we set interactive = TRUE, forecasts from different reference dates are connected:

plot_step_ahead_model_output(
    model_output_data = forecast_outputs |> filter(output_type %in% c("quantile", "median")),
    target_data = forecast_target_ts |>
        filter(location %in% c("25", "48"),
               date >= "2022-10-01", date <= "2023-04-01"),
    use_median_as_point = TRUE,
    x_col_name = "target_end_date",
    intervals = c(0.5, 0.8, 0.9),
    facet = "location",
    group = "reference_date",
    interactive = TRUE
)

[Screenshot: interactive plot with forecasts from different reference dates connected]

Legend for `model_id` facet plot

Found this generally very clear and easy to follow.

I find the attached plot (faceted by model) a bit confusing, with the legend only showing the one model (the first panel). In the interactive plotly, if you then click to not display the model, only the first panel projection disappears (pictured). I would propose to have the default of this plot to not have a legend, but not sure exactly how to set this up practically (tricky with the option default being true for other plots). Another option might be to allow each of the panels to still be a different colour so the legend corresponds to something meaningful?

[Screenshot: plot faceted by model, with the legend showing only the first model]

Originally posted by @saraloo in #2 (comment)

In this case the legend should either be:

  • if fill_by = "model_id": each model_id gets a different color and its own legend item. For plotly, each one can be shown or hidden individually.
  • if fill_by is not set to "model_id": all model_id values share the same color and the legend refers to the fill_by variable. For plotly, all of them can be shown or hidden at once.

Documentation

It would be nice to have more documentation on the package:

  • examples with the function
  • more complete README with small example

Unit testing

It would be nice to add tests to the package and, once they exist, to report test coverage.

Upgrade Docs to hubStyle

Update package config

  • Create enhancement/hubstyle branch
  • Run hubDevs::use_hubdev_pkgdown(add_logo = TRUE) to update pkgdown
  • Add logo to your README
  • Append standard footer to README with hubDevs::append_hubdev_readme_footer(). Render
  • Run hubDevs::use_hubdev_community() to update community docs
  • Run hubDevs:::hubdev_ignore() to ignore std files
  • Check authors
  • Add hubverse & r-package topics to repos

Set up Netlify PR Previews

  • Log into the Hubverse Netlify account and drag in the pkg docs directory to create a new site.
  • Edit site name with [pkg_name]-pr-previews and copy NETLIFY_SITE_ID
  • Create a NETLIFY_AUTH_TOKEN if required.
  • Add NETLIFY_AUTH_TOKEN & NETLIFY_SITE_ID to repository secrets

Push to Github, open a PR and check Preview 🎉

Error message unclear when number of facets too high.

The current error message is not very informative when the number of facets requested with facet_nrow and facet_ncol is too high:

Error in (function (obj, domains)  : 
  'list' object cannot be coerced to type 'double'
In addition: Warning messages:
1: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
  longer argument not a multiple of length of shorter
2: In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
  data length is not a multiple of split variable
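A clearer message would require checking the layout before plotting. The exact condition that triggers the failure is not stated in this report, so the base R sketch below assumes, purely for illustration, that a layout is invalid when facet_nrow or facet_ncol exceeds the number of panels. The function check_facet_layout() and its n_panels argument are hypothetical names.

```r
# Hypothetical pre-check. ASSUMPTION: a layout is treated as invalid when the
# requested number of rows or columns exceeds the number of facet panels.
check_facet_layout <- function(n_panels, facet_nrow = NULL, facet_ncol = NULL) {
  # NULL arguments drop out of c(), so only supplied values are checked.
  requested <- c(facet_nrow = facet_nrow, facet_ncol = facet_ncol)
  too_many <- requested[requested > n_panels]
  if (length(too_many) > 0) {
    stop(
      "Requested facet layout (",
      paste(sprintf("%s = %s", names(too_many), too_many), collapse = ", "),
      ") exceeds the number of facet panels (", n_panels, ").",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Made-up case: 3 panels but 5 rows requested.
msg <- tryCatch(
  check_facet_layout(n_panels = 3, facet_nrow = 5),
  error = function(e) conditionMessage(e)
)
```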

pkgdown site

Would be nice to make a pkgdown site for this package, with examples and documentation.

y-axis label

On the facet plots, the y-axis label is not consistent across all subplots

Change action message in numeric coercion warning

I was running through an example and I got a confusing warning message:

! `output_type_id` column must be a numeric. Class applied by default.

I wasn't sure what "Class applied by default" meant in this context. Should it be "Converting to numeric?"

hubVis/R/validate_format.R

Lines 142 to 146 in 7b2bd6c

model_output_data$output_type_id <-
as.numeric(model_output_data$output_type_id)
cli::cli_warn(c("!" = "{.arg output_type_id} column must be a numeric.
Class applied by default."))
}
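A reworded warning along the lines suggested above could state the action being taken. The sketch below is a dependency-free stand-in that uses base warning() rather than cli::cli_warn(); the helper name coerce_output_type_id() and the exact wording are assumptions, not the package's code.

```r
# Hypothetical stand-in for the coercion step, with a message that says what
# is being done to the column.
coerce_output_type_id <- function(model_output_data) {
  if (!is.numeric(model_output_data$output_type_id)) {
    warning(
      "`output_type_id` column must be numeric. Converting to numeric.",
      call. = FALSE
    )
    model_output_data$output_type_id <-
      as.numeric(model_output_data$output_type_id)
  }
  model_output_data
}

# Made-up input with quantile levels stored as character.
converted <- suppressWarnings(
  coerce_output_type_id(data.frame(output_type_id = c("0.25", "0.75")))
)
```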

Point and interval forecasts are not clearly separated by forecast date when plotted

When forecasts made on multiple forecast dates are plotted at the same time, all of the forecasts are connected into a single line, even though each forecast date should be drawn separately. See the attached images at the bottom for the desired behavior (achievable using covidHubUtils::plot_forecasts()) vs the current behavior demonstrated by hubVis::plot_step_ahead_model_output.

This behavior is particularly noticeable when there is a large gap between forecast dates, like during the off season for flu forecasting from around June to the end of September, when there should be no plotted forecasts.

A simple, reproducible example is given below using the data in the attached zip file:

library(tidyverse)
library(lubridate)
library(hubUtils)
library(hubVis)

flu_truth_all <- readr::read_rds("flu_truth_all.rds")
flu_forecasts_baseline <- readr::read_rds("flu_baseline_hub.rds")
flu_dates_21_22 <- as.Date("2022-01-24") + weeks(0:21)
flu_dates_22_23 <- as.Date("2022-10-17") + weeks(0:30)
all_flu_dates <- c(flu_dates_21_22, flu_dates_22_23)

select_dates <- all_flu_dates[seq(1, 53, 4)]

baseline_ca <- flu_forecasts_baseline |> 
  dplyr::filter(location == "06", forecast_date %in% select_dates)

truth_ca <- flu_truth_all |> 
  dplyr::filter(location == "06")
  
plot_step_ahead_model_output(
  baseline_ca,
  truth_ca,
  use_median_as_point=TRUE,
  show_plot=TRUE,
  x_col_name="target_end_date",
  x_truth_col_name = "target_end_date",
  show_legend=FALSE,
  facet="model_id",
  facet_nrow=5,
  interactive=FALSE,
  fill_transparency = 0.45,
  intervals=c(0.5, 0.8, 0.95),
  title="Weekly Incident Hospitalizations for Influenza in California"
)

[Images: ca_forecasts_distinct (desired behavior), ca_forecasts_connected (current behavior)]

Attachment: flu_baseline_truth.zip

add ability to plot sample output_type data

Currently, a limitation of plot_step_ahead_model_output() is that it only accepts median and quantile output_types.

We should support the plotting of sample output_type data by inferring quantiles from it. Currently, a user would have to manually transform the data to quantiles. Possible approach: add sample output_type to the list of valid types and make the appropriate transformation to quantile output_type data.

Acceptance criteria:

  • a user passes sample output_type data to plot_step_ahead_model_output() and gets a plot returned, with intervals if specified
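Until that is supported, the manual transformation can be done along the lines of this base R sketch. It groups made-up sample rows by model_id and target_date and summarises each group with stats::quantile(). The column names follow the examples in this document, while the simulated data and the chosen quantile levels are assumptions.

```r
# Made-up sample output: 100 draws for each of two target dates.
set.seed(1)
samples <- data.frame(
  model_id = "model-a",
  target_date = rep(as.Date("2023-01-07") + c(0, 7), each = 100),
  output_type = "sample",
  output_type_id = rep(1:100, times = 2),
  value = rnorm(200, mean = 50, sd = 5)
)

# Quantile levels needed for the median and the 50% and 95% intervals.
quantile_levels <- c(0.025, 0.25, 0.5, 0.75, 0.975)

# One group per model and target date.
group_key <- paste(samples$model_id, samples$target_date)
quantile_rows <- lapply(split(samples, group_key), function(g) {
  data.frame(
    model_id = g$model_id[1],
    target_date = g$target_date[1],
    output_type = "quantile",
    output_type_id = quantile_levels,
    value = unname(stats::quantile(g$value, probs = quantile_levels))
  )
})
quantile_output <- do.call(rbind, quantile_rows)
rownames(quantile_output) <- NULL
```

In a real hub the grouping would need to include every task_id column, not only model_id and target_date.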

Interactive plotly - hovertext

The hovertext of the plot should format numeric values consistently (for example, 2 digits) and include a space after ":".
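As a small illustration of that convention, the base R sketch below builds a hover label with a fixed number of decimals and a space after the colon. The helper name format_hover() and the labels are hypothetical.

```r
# Hypothetical helper: "<label>: <value>" with a fixed number of decimals.
format_hover <- function(label, value, digits = 2) {
  sprintf("%s: %s", label, formatC(value, format = "f", digits = digits))
}

median_text <- format_hover("Median", 1234.5678)
upper_text <- format_hover("95% upper", 3)
```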
