
cdc-flusight-ensemble's People

Contributors

brookslogan, craigjmcgowan, d-osthus, dependabot[bot], elray1, hbiegel, katiehouse3, khoale1096, lcbrooks, lepisma, lpmi-13, nickreich, nutchaw, sjfox, srinivvenkat, tkcy, tomcm39, xinyuexiong


cdc-flusight-ensemble's Issues

modify make-cv-ensemble-forecast-files script

  • needs to be able to take in multiple weight files
  • change over to using standard model_ids
  • change to use all weeks, not just the common subset, since we will only be using complete models

investigate 2016/17 ensemble outputs

Need to compare the 2016/17 ensemble outputs to all submitted models to see how they would have compared. We could also compare to an unweighted average of just the models submitted by teams that are participating in the ensemble project (i.e. the Delphi, CU, KoT, and LANL models).

make leave-one-season-out cross-validated weights file

Each ensemble specification will have its own CSV file with weights in it. Each file should have the following three columns:

  • season: the left-out season for which the weights apply
  • component_model_id: the folder name that has the forecasts in it
  • weight: the weight

The file may also have the following columns, depending on how finely the ensemble weights are stratified:

  • target: these should be in the same format as expected in an entry file
  • location: again, the same format as an entry file

If the file doesn't have one or both of these columns, then we assume the weights are the same across all targets or all locations.

We can impose the check that, for each fixed target $t$ and location $l$ (if specified), $\sum_{i=1}^{M} w_{i, t, l} = 1$, where $M$ is the number of component models.

Currently, the script that turns weights into CV ensemble entries needs the above format for the weights file. See, as an example, this file that has a functioning set of example weights.
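The sum-to-one check could be sketched like this (a plain-Python sketch using hypothetical row dicts keyed by the column names above; the actual script may operate on data frames instead):

```python
from collections import defaultdict

def check_weight_sums(rows, tol=1e-8):
    """Check that weights sum to 1 within each stratum defined by the
    stratification columns present: season, plus target/location when given.
    Rows are dicts with the columns described in this issue."""
    sums = defaultdict(float)
    for row in rows:
        # Missing target/location columns collapse to a single stratum
        key = (row["season"], row.get("target"), row.get("location"))
        sums[key] += row["weight"]
    return all(abs(total - 1.0) < tol for total in sums.values())

# Example: equal weights for two component models in one left-out season
rows = [
    {"season": "2016/2017", "component_model_id": "model_a", "weight": 0.5},
    {"season": "2016/2017", "component_model_id": "model_b", "weight": 0.5},
]
```

The model ids here are placeholders; real files would use the standard component_model_id folder names.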

reorganize file and folder structure

Need to create three sets of folders for forecasts

  • component model forecasts (to be included in visualization, scores)
  • CV ensemble model forecasts (to be included in scores)
  • Real-time ensemble model forecasts (to be included in visualization)

Also, this will require changing file-paths in other scripts (visualizations, score calculations, etc...) that are dependent on these files.

minor discrepancy in spot-checked distributions

I'm getting very small differences in my spot-checks of ensemble distributions, particularly in 1-4 week ahead forecasts with TRW, TW, TTW (see example image below).

Very possibly a problem on my end, which I'm checking, but also wanted to ask whether any rounding happens in creating the new distribution?

[image: example spot-check]

Needs more written description of how to interpret

What is "Weighted ILI (%)"?
What does a probability of 0.3 mean? Any individual has a 30% chance of getting the flu? A 0.3% chance?
What does it mean that the mean log score for 3 wk is -7.93?
Perhaps a blurb or some hover-over text with overall information would help. I clicked through a few different GitHub pages and figured out that it's some sort of competition hosted by the CDC. Is this one team's effort? A visualization of all of them put together into an ensemble model?

minor fixes to new scores table

  • remove disclaimer about "final data" at bottom
  • add explicit "NA" for missing scores
  • @brookslogan why are scores for Delphi Uniform? related to #24 ?
  • add sorting feature on table
  • For this table, why does only week 1 have a bold "first place"?
  • Once I am at the scores tab, I am unable to re-navigate back to another tab.


Potential small bug in Travis scoring

I've finished checking the scores generated in Travis against scores calculated using the FluSight R package. We're down to 110 discrepancies greater than 10^-12, all related to peak week in the 2014/15 season. The errors occur in Regions 2, 3, 5, and 7, as well as US National, all of which have week 52 as the peak week.

I looked at one particular error in detail: the target-based model forecasts for Epiweek 53. The specific target is HHS Region 3 peak week. The correct score should be -0.627, summing the probabilities of weeks 51, 52, and 53. Travis is assigning -1.11, apparently from summing the probabilities of weeks 51, 52, and 1.
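For illustration, the wrap-around logic likely at fault can be sketched as follows (a hypothetical helper, not the actual Travis script):

```python
def peak_week_bins(observed_week, weeks_in_year):
    """Bins that count toward a peak-week multi-bin log score: the observed
    week plus one week on either side, wrapping past the end of the year.
    Hypothetical helper for illustration only."""
    bins = []
    for offset in (-1, 0, 1):
        wk = observed_week + offset
        if wk > weeks_in_year:   # wrap forward past the final week
            wk -= weeks_in_year
        elif wk < 1:             # wrap backward before week 1
            wk += weeks_in_year
        bins.append(wk)
    return bins

# 2014/15 had 53 MMWR weeks, so the window around week 52 is 51, 52, 53;
# hard-coding a 52-week year instead reproduces the erroneous 51, 52, 1.
```

If the scoring script assumes every year has 52 weeks, exactly the observed pattern of errors (peak week 52 in a 53-week season) would follow.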


CUBMA model missing files

Some files from the CUBMA model are missing. When running the validate_predictions file, I got an error because the CUBMA/EW51-2010-CUBMA.csv file doesn't exist. There may be others that don't exist as well; this was just the first one it ran across.
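A quick way to enumerate all missing files ahead of validation might look like this (a hypothetical helper; the file-naming pattern is taken from the error above, and the week/year ranges would need to match the actual submission schedule):

```python
import os

def missing_forecast_files(model_dir, model_id, epiweeks, years):
    """Return the expected EW{week}-{year}-{model_id}.csv filenames
    that are absent from model_dir. Hypothetical helper for illustration."""
    missing = []
    for year in years:
        for ew in epiweeks:
            name = f"EW{ew:02d}-{year}-{model_id}.csv"
            if not os.path.exists(os.path.join(model_dir, name)):
                missing.append(name)
    return missing
```

Running this over every component model folder would surface all gaps at once instead of failing on the first missing file.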

link to README file on main visualization page

Could we have the "FluSight Network" and "CDC FluSight Network" text in the top left of the visualization homepage link directly to the README file, so folks who find this page could understand the context?

Fix visualization of week targets ranges

[screenshot of the week-target visualization]

Point predictions are outside the range. Maybe this is because the visualizer prefers the point prediction written in the Point row, which does not match the distributions in the CSVs.

fix broken links

  • links to model description files (e.g. from scores tables) still point to old folder locations
  • link to "source" at bottom of page points to the app, not the github repo

update import code to handle new metadata format

I updated the metadata files to have these new fields:

  • team_name (max 10 chars)
  • model_name (max 50 chars?)
  • model_abbr (max 15 chars)

These fields should be used to populate the visualization legend.

Funny point forecasts for peak week

The point forecast for US National peak week doesn't match up at all with the underlying distribution. The point forecast is for week 17, but the probabilistic values put it in the week 51-7 range. Could the code to generate the point forecast not be dealing with the New Year's transition correctly?
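One plausible fix is to order weeks by their position within the season rather than by raw week number. A sketch, with assumed season boundaries (not the actual point-forecast code):

```python
def season_order(epiweek, season_start=40, weeks_in_year=52):
    """Map an epiweek to its position within a flu season starting at
    season_start, so that e.g. week 52 sorts before week 1 of the next
    calendar year. Hypothetical sketch; the real code may differ."""
    if epiweek >= season_start:
        return epiweek - season_start              # 40 -> 0, ..., 52 -> 12
    return epiweek + weeks_in_year - season_start  # 1 -> 13, ..., 20 -> 32

# Sorting raw week numbers puts week 1 before week 51 and can drag a
# modal/mean point forecast toward mid-season labels like week 17;
# sorting by season_order keeps the weeks in chronological order.
weeks = [51, 52, 1, 2]
assert sorted(weeks, key=season_order) == [51, 52, 1, 2]
```

Any point forecast computed on the raw week labels (a mean, in particular) will be distorted whenever the probability mass straddles the New Year.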

decide which EW files to include in scoring and weighting calculations

Previously, in #14, we had decided that files from EW40 of year k through EW20 of year k+1 would be submitted. However, @tkcy brought up the point that the challenge does not run for those weeks this year, so we are training on weeks that are not in the competition. A question for @craigjmcgowan: what is the "EW" label for the first and last files that will be submitted for the 2017/2018 season?

CU Week 53

The bin for CU week 53 was not empty, as it was expected to be. This probably means the probabilities assigned to weeks 1 through 20 need to be bumped up by 1 bin. Will check in with Sasi about a fix.

zero log scores

Log scores of 0 in the summary statistics table are probably wrong, e.g. for CU-BMA and ReichLab-SARIMA1 in HHS Region 3 in 2012/2013.
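Since a log score of exactly 0 means the forecast assigned probability 1.0 to the observed bin (log(1.0) == 0), which almost never happens for a real forecast, these entries are easy to flag mechanically (a hypothetical helper):

```python
def suspicious_log_scores(scores, tol=1e-12):
    """Return indices of log scores that are exactly (or numerically) 0.
    A 0 log score implies probability 1.0 on the observed bin, which is
    implausible and likely indicates a scoring or data-handling bug."""
    return [i for i, s in enumerate(scores) if abs(s) < tol]
```

Running this over the summary table would list every model/region/season cell worth a manual look.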
