
forecasts-ncov's Introduction


This repository is archived. It contains the content used to build the documentation and splash page on nextstrain.org. This content can now be found here.

License and copyright

Copyright 2014-2018 Trevor Bedford and Richard Neher.

Source code to Nextstrain is made available under the terms of the GNU Affero General Public License (AGPL). Nextstrain is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

forecasts-ncov's People

Contributors

corneliusroemer, dependabot[bot], huddlej, jameshadfield, joverlee521, marlinfiggins, trvrb, tsibley, victorlin

forecasts-ncov's Issues

Requested viz updates

Internal review from Slack:

  • start plots 10 days after first estimate from model to avoid boundary issues where Rt values are artifactual (viz-updates branch)
  • cases estimates are off for Netherlands (resolved in #11)
  • smoothing of Rt needs to be updated to match notebooks (resolved in #13)
  • omit pivot category in Growth Advantage plots and update panel abstract to insert pivot category (viz-updates branch)
  • display frequencies y-axis as percentages, e.g. "10%", "20%" instead of "0.1", "0.2" (viz-updates branch)
  • use sans serif Helvetica font for static plots (viz-updates branch)
  • change ordering of plots (should match the order of static plots in nextstrain.org) (viz-updates branch)
  • change Rt point estimates to one or two significant digits (viz-updates branch)
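Three of these items (starting plots 10 days after the first estimate, percentage tick labels, and rounding Rt) are small, self-contained rules. The following is a minimal Python sketch of them, purely for illustration. The viz may be written in another language, and the function names are hypothetical rather than taken from the forecasts-ncov code.

```python
from datetime import date, timedelta


def trim_boundary(points, burn_in_days=10):
    """Drop (date, value) pairs within `burn_in_days` of the first estimate.

    Early Rt estimates can be artifactual near the boundary of the model fit.
    """
    if not points:
        return []
    start = min(d for d, _ in points) + timedelta(days=burn_in_days)
    return [(d, v) for d, v in points if d >= start]


def format_frequency(freq):
    """Render a frequency in [0, 1] as a whole-number percentage."""
    return f"{freq:.0%}"


def format_rt(rt, sig_digits=2):
    """Round an Rt point estimate to `sig_digits` significant digits."""
    return f"{rt:.{sig_digits}g}"


points = [(date(2023, 1, 1) + timedelta(days=i), 1.0) for i in range(15)]
print(len(trim_boundary(points)))  # 5: only days 10-14 remain
print(format_frequency(0.2), format_rt(1.234))  # 20% 1.2
```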

Forecast Automation: Provision counts files

(Original issue copied over for public view)

Here's an outline of my plan:

  • Create a new nextstrain/counts repo for this work
  • Port over the Python script from John's PR in blab/rt-from-frequency-dynamics
    • Remove date, country, clade cutoffs to get full counts (cutoffs would be added to the modeling scripts)
    • Separate case counts and clade counts scripts. (Side note: there is an API for the CDC state case counts data that allows us to select and filter data before download, e.g. we can select specific columns and filter out 0 counts)
      curl 'https://data.cdc.gov/resource/9mfq-cb36.csv?$select=submission_date,state,new_case&$where=new_case>0'
  • Set up GitHub actions for
    • scheduled daily case counts
    • ncov-ingest triggered GISAID clade counts
    • ncov-ingest triggered Open clade counts
  • Update ncov-ingest to trigger clade counts actions once updated metadata.tsv.gz has been uploaded to S3
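The curl example above pushes column selection and zero-count filtering to the CDC server. The sketch below builds the same query in Python so that the daily case-counts script could construct it programmatically. Only the endpoint and query come from the curl line; the function name is hypothetical. Socrata endpoints usually cap the number of rows returned per request, so a full download may also need `$limit`/`$offset` paging.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Dataset endpoint taken from the curl example above.
CDC_CASES_ENDPOINT = "https://data.cdc.gov/resource/9mfq-cb36.csv"


def build_case_counts_url(columns=("submission_date", "state", "new_case"),
                          where="new_case>0"):
    """Build a SODA query URL that selects columns and filters rows server-side."""
    query = urlencode({"$select": ",".join(columns), "$where": where})
    return f"{CDC_CASES_ENDPOINT}?{query}"


url = build_case_counts_url()
# The query string is percent-encoded; decoding it recovers the SoQL clauses.
print(parse_qs(urlsplit(url).query))
# {'$select': ['submission_date,state,new_case'], '$where': ['new_case>0']}
```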

A couple questions I have:

  1. Should these full count files continue to exclude rows with 0 cases/sequences? In past experience, it's better to be explicit about 0 counts to differentiate 0 vs NA.
  2. Do we want Slack notifications for the automated count updates? (I think yes since we don't have a good monitoring system set up yet.)
  3. If yes to Slack notifications, would #forecasting-automation be the appropriate channel for update messages?
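On question 1, being explicit about zeros means expanding sparse counts into a full grid while still marking dates with no data at all as missing. Here is a minimal sketch, assuming counts keyed by (date, category) and a separately known set of dates that had any reported data. The names are hypothetical.

```python
from itertools import product


def complete_counts(counts, dates, categories, reported_dates):
    """Expand sparse counts into a dense (date, category) grid.

    A missing pair becomes an explicit 0 when data were reported for that
    date, and None (NA) when nothing was reported, so 0 and NA stay distinct.
    """
    grid = {}
    for day, category in product(dates, categories):
        if (day, category) in counts:
            grid[(day, category)] = counts[(day, category)]
        elif day in reported_dates:
            grid[(day, category)] = 0
        else:
            grid[(day, category)] = None
    return grid


grid = complete_counts({("2023-01-01", "22B"): 5},
                       ["2023-01-01", "2023-01-02"], ["22B", "23A"],
                       reported_dates={"2023-01-01"})
```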

Add new Nextstrain clade definitions

Context

This is dependent on the designation of new clades and the release of a new Nextclade dataset that includes the designations.

Once the ncov-ingest workflow has been run with the new Nextclade dataset, the new clades will be available in the metadata TSV on S3 and will be summarized in the clade counts. If they pass our data thresholds, then they will be included in the model results. Without the clade definition config, the new Nextstrain clades will appear as the default dark grey color in the viz.

Solution

Add the new Nextstrain clades to the config file so that they are labeled and colored appropriately in the forecasts viz.

Counts update doesn't invalidate CloudFront

Current Behavior

When the update-ncov-gisaid-clade-counts.yaml or update-ncov-open-clade-counts.yaml workflows run, they execute:

S3_DST=s3://nextstrain-data/files/workflows/forecasts-ncov/gisaid
./ingest/bin/upload-to-s3 global_clade_counts.tsv "$S3_DST"/nextstrain_clades/global.tsv.gz

to push the updated counts file to the S3 bucket nextstrain-data. However, despite the existence of the cloudfront-invalidate script, I'm fairly sure the invalidation is not called as part of the GitHub Action workflow.

I discovered this today when I was trying to update the forecasts to include clade 23B. The clade was in the available metadata, but running the GitHub Action (or walking through the commands it calls) resulted in a push to S3 without an invalidation. I fixed today's issue with a manual invalidation from the AWS Console.

Expected behavior

When files are pushed to the S3 bucket nextstrain-data, these files should be invalidated on the CloudFront domain fronting data.nextstrain.org.
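The most likely fix is to call the existing cloudfront-invalidate script after each upload. For illustration only, the following sketch shows the underlying CloudFront CreateInvalidation request made through boto3. It assumes that the S3 key is served at the same path on data.nextstrain.org. The distribution ID and function names are hypothetical.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch payload for CloudFront's CreateInvalidation.

    CloudFront paths must start with '/'; CallerReference must be unique per request.
    """
    items = ["/" + path.lstrip("/") for path in paths]
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        "CallerReference": caller_reference or str(time.time_ns()),
    }


def invalidate(distribution_id, paths, client=None):
    """Request an invalidation; `client` defaults to a boto3 CloudFront client."""
    if client is None:
        import boto3  # imported lazily so the payload builder works without boto3

        client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )


# Assumption: the S3 key from the workflow maps 1:1 to a path on data.nextstrain.org.
s3_key = "files/workflows/forecasts-ncov/gisaid/nextstrain_clades/global.tsv.gz"
batch = build_invalidation_batch([s3_key], caller_reference="example-run")
```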

Country include description out of sync?

Current Behavior

The website currently states for clade frequencies:

Only locations with more than 100 sequences from samples collected in the previous 150 days are included.


We show the following countries:

  • Australia
  • Belgium
  • Canada
  • China
  • Denmark
  • Finland
  • France
  • Germany
  • Iceland
  • Ireland
  • Italy
  • Japan
  • Netherlands
  • Singapore
  • South Korea
  • Spain
  • Switzerland
  • Sweden
  • USA
  • UK

This doesn't seem to be correct, or at least it is missing important context. When I look on covSpectrum for countries with more than 100 sequences collected less than 150 days ago (https://cov-spectrum.org/explore/World/AllSamples/from%3D2023-07-02%26to%3D2023-11-22/variants/international-comparison?&), I get the following countries:

Country | Total variant sequences | First seq. found at | Last seq. found at
--- | --- | --- | ---
United States | 99114 | 2023-26 | 2023-47
Canada | 33981 | 2023-26 | 2023-46
United Kingdom | 22897 | 2023-26 | 2023-46
Japan | 22771 | 2023-26 | 2023-45
South Korea | 18858 | 2023-26 | 2023-45
France | 17394 | 2023-26 | 2023-46
Spain | 14246 | 2023-26 | 2023-46
China | 13271 | 2023-26 | 2023-46
Australia | 7386 | 2023-26 | 2023-46
Sweden | 6758 | 2023-26 | 2023-47
Italy | 5333 | 2023-26 | 2023-47
Denmark | 4696 | 2023-27 | 2023-46
Singapore | 4517 | 2023-26 | 2023-44
Germany | 3514 | 2023-26 | 2023-46
Netherlands | 3139 | 2023-26 | 2023-46
Belgium | 3077 | 2023-26 | 2023-47
Brazil | 2781 | 2023-26 | 2023-45
New Zealand | 2668 | 2023-26 | 2023-43
Israel | 2617 | 2023-26 | 2023-45
Greece | 2469 | 2023-27 | 2023-40
Ireland | 2343 | 2023-26 | 2023-47
Russia | 1963 | 2023-26 | 2023-44
Switzerland | 1916 | 2023-27 | 2023-46
Finland | 1668 | 2023-26 | 2023-45
Austria | 1411 | 2023-27 | 2023-46
Peru | 1254 | 2023-26 | 2023-43
Luxembourg | 1213 | 2023-27 | 2023-43
Portugal | 1198 | 2023-27 | 2023-45
Mexico | 1074 | 2023-26 | 2023-42
Croatia | 858 | 2023-27 | 2023-43
Chile | 787 | 2023-27 | 2023-43
Thailand | 773 | 2023-26 | 2023-43
Slovenia | 752 | 2023-26 | 2023-42
Iceland | 676 | 2023-27 | 2023-46
Colombia | 653 | 2023-26 | 2023-43
Ukraine | 652 | 2023-27 | 2023-44
Taiwan | 581 | 2023-26 | 2023-45
South Africa | 493 | 2023-27 | 2023-41
Turkey | 465 | 2023-28 | 2023-40
Poland | 459 | 2023-28 | 2023-45
Norway | 441 | 2023-26 | 2023-44
Romania | 364 | 2023-27 | 2023-40
Argentina | 359 | 2023-26 | 2023-38
Malaysia | 359 | 2023-26 | 2023-43
Costa Rica | 341 | 2023-26 | 2023-43
Guatemala | 321 | 2023-27 | 2023-40
India | 285 | 2023-26 | 2023-44
Georgia | 272 | 2023-27 | 2023-40
Mauritius | 270 | 2023-27 | 2023-44
Bulgaria | 254 | 2023-27 | 2023-43
Dominican Republic | 200 | 2023-27 | 2023-35

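Expected behavior

Based on the stated rule, the following countries would also be expected to appear:

Country | Total variant sequences | First seq. found at | Last seq. found at
--- | --- | --- | ---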
Brazil | 2781 | 2023-26 | 2023-45
New Zealand | 2668 | 2023-26 | 2023-43
Israel | 2617 | 2023-26 | 2023-45
Greece | 2469 | 2023-27 | 2023-40
Russia | 1963 | 2023-26 | 2023-44
Austria | 1411 | 2023-27 | 2023-46
Peru | 1254 | 2023-26 | 2023-43
Luxembourg | 1213 | 2023-27 | 2023-43
Portugal | 1198 | 2023-27 | 2023-45
Mexico | 1074 | 2023-26 | 2023-42
Croatia | 858 | 2023-27 | 2023-43
Chile | 787 | 2023-27 | 2023-43
Thailand | 773 | 2023-26 | 2023-43
Slovenia | 752 | 2023-26 | 2023-42
Colombia | 653 | 2023-26 | 2023-43
Ukraine | 652 | 2023-27 | 2023-44
Taiwan | 581 | 2023-26 | 2023-45
South Africa | 493 | 2023-27 | 2023-41
Turkey | 465 | 2023-28 | 2023-40
Poland | 459 | 2023-28 | 2023-45
Norway | 441 | 2023-26 | 2023-44
Romania | 364 | 2023-27 | 2023-40
Argentina | 359 | 2023-26 | 2023-38
Malaysia | 359 | 2023-26 | 2023-43
Costa Rica | 341 | 2023-26 | 2023-43
Guatemala | 321 | 2023-27 | 2023-40
India | 285 | 2023-26 | 2023-44
Georgia | 272 | 2023-27 | 2023-40
Mauritius | 270 | 2023-27 | 2023-44
Bulgaria | 254 | 2023-27 | 2023-43
Dominican Republic | 200 | 2023-27 | 2023-35

Notably, we include Iceland with only 676 sequences but exclude Brazil with 2,781.
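The rule as stated on the website is a simple count-and-filter. The following minimal sketch encodes only that stated rule. The function name is hypothetical, and the window is assumed to include both endpoints. Running it against the metadata shows which countries the description alone would admit. If the result differs from the published list, the pipeline presumably applies extra criteria that the description omits.

```python
from collections import Counter
from datetime import date, timedelta


def countries_meeting_threshold(samples, today, min_sequences=100, window_days=150):
    """Countries with more than `min_sequences` samples collected in the
    `window_days` days up to and including `today`.

    `samples` is an iterable of (country, collection_date) pairs.
    """
    start = today - timedelta(days=window_days)
    counts = Counter(
        country for country, collected in samples if start <= collected <= today
    )
    return sorted(country for country, n in counts.items() if n > min_sequences)


today = date(2023, 11, 22)
samples = (
    [("Brazil", today - timedelta(days=10))] * 101
    + [("Iceland", today - timedelta(days=200))] * 500  # outside the window
    + [("Peru", today - timedelta(days=10))] * 100  # not *more* than 100
)
print(countries_meeting_threshold(samples, today))  # ['Brazil']
```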

Allow force include of Pango lineages that might not meet threshold

Context

Requested by @trvrb in today's forecasts-automation meeting:

I’d also like to add functionality to ensure specific Pango lineages are included even if they don’t meet count threshold (important for JN.1 at the moment)

Possible solution

We can use the force-include-clades config for scripts/prepare-data.py to prevent lineages from being collapsed into "other".

We will have to update scripts/collapse-lineage-counts.py with a new option to force include lineages and prevent them from being collapsed into their parent lineages.
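The following is a minimal sketch of what such a force-include option could look like. It is not the actual interface of scripts/collapse-lineage-counts.py. It also makes a simplifying assumption: a lineage's parent is found by dropping its last dot-separated component. That only holds for unaliased names, so aliased names such as JN.1 would need to be expanded first (e.g. with a Pango alias resolver). All names are hypothetical.

```python
def parent_of(lineage):
    """Parent of an *unaliased* Pango name, e.g. 'BA.2.86.1' -> 'BA.2.86'."""
    return lineage.rsplit(".", 1)[0]


def collapse_lineage_counts(counts, threshold, force_include=frozenset()):
    """Fold lineages with counts below `threshold` into their parents, deepest
    first, never collapsing lineages listed in `force_include`."""
    result = dict(counts)
    max_depth = max((name.count(".") for name in result), default=0)
    # Depth 0 names (e.g. 'BA') have no parent and are never collapsed.
    for depth in range(max_depth, 0, -1):
        for lineage in [name for name in result if name.count(".") == depth]:
            if lineage in force_include or result[lineage] >= threshold:
                continue
            parent = parent_of(lineage)
            result[parent] = result.get(parent, 0) + result.pop(lineage)
    return result


counts = {"BA.2": 500, "BA.2.86": 30, "BA.2.86.1": 20}
print(collapse_lineage_counts(counts, 100, force_include={"BA.2.86.1"}))
# {'BA.2': 530, 'BA.2.86.1': 20}
```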

Flag potential lineages for new Nextstrain clades

Context

Requested by @trvrb in today's forecasting-automation meeting:

Separately, I’m still running Mathematica version of fitting linear regression to logit-transformed frequencies to demarcate our >0.05 per day growth in frequency threshold

  • Could we write a small script to scan forecast-ncov results for this threshold? It really should be clade-in-question vs “other” however.
  • Should be able to convert relative growth advantage to per-day growth rate (site: delta in models)
  • Multiply Pango lineage frequency by relative fitness will give mean fitness
  • Per day relative fitness is in the model already
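The threshold check described above is a linear regression on logit-transformed frequencies, flagging clades whose slope exceeds 0.05 per day. Here is a standard-library sketch of that check. It assumes each clade's frequency (versus "other") is given as a per-day series strictly between 0 and 1; frequencies of exactly 0 or 1 would need to be dropped or clamped first. The function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def logit_growth_rate(days, freqs):
    """Ordinary least-squares slope of logit(frequency) against time in days."""
    xs = list(days)
    ys = [logit(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def flag_fast_growers(series, threshold=0.05):
    """Clades whose logit-frequency slope exceeds `threshold` per day.

    `series` maps a clade name to (days, frequencies vs. "other").
    """
    return sorted(
        name for name, (days, freqs) in series.items()
        if logit_growth_rate(days, freqs) > threshold
    )
```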

Reduce disk write of 20+GB metadata file by filtering on the fly

Context

Ingest runs quite slowly, partly because it writes around 20GB to disk rather than streaming the metadata directly into tsv-select here:

rule fetch_metadata:
    output:
        metadata = temp("data/{data_provenance}/metadata.tsv")
    params:
        s3_src = lambda w: config[w.data_provenance]["s3_metadata"]
    benchmark:
        "benchmarks/{data_provenance}/fetch_metadata.txt"
    shell:
        """
        ./vendored/download-from-s3 {params.s3_src:q} {output.metadata}
        """

rule subset_metadata:
    input:
        metadata = "data/{data_provenance}/metadata.tsv"
    output:
        subset_metadata = "data/{data_provenance}/subset_metadata.tsv"
    params:
        subset_columns = lambda w: ",".join(config[w.data_provenance]["subset_columns"])
    benchmark:
        "benchmarks/{data_provenance}/subset_metadata.txt"
    shell:
        """
        tsv-select -H \
            -f {params.subset_columns:q} \
            {input.metadata} > {output.subset_metadata}
        """

The two rules could be turned into one and do the filtering on the fly.
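The merged rule would decompress and subset in a single pass, so the full uncompressed file never lands on disk. The following Python stand-in for that pass assumes the S3 object is gzip-compressed (as the metadata.tsv.gz upload suggests) and arrives as a byte stream, for example piped from `aws s3 cp <s3-url> -`. Like tsv-select, it splits on tabs without interpreting quotes. The function name is hypothetical.

```python
import gzip
from typing import BinaryIO, Sequence, TextIO


def subset_columns_streaming(compressed: BinaryIO, columns: Sequence[str], out: TextIO) -> int:
    """Decompress a gzipped TSV stream and write only `columns` to `out`.

    Rows are processed one at a time, so memory use stays flat and nothing
    large is written to disk. Returns the number of data rows written.
    """
    with gzip.open(compressed, mode="rt", encoding="utf-8", newline="\n") as text:
        header = text.readline().rstrip("\r\n").split("\t")
        # Raises ValueError if a requested column is missing from the header.
        indices = [header.index(name) for name in columns]
        out.write("\t".join(columns) + "\n")
        rows = 0
        for line in text:
            fields = line.rstrip("\r\n").split("\t")
            out.write("\t".join(fields[i] for i in indices) + "\n")
            rows += 1
    return rows
```

For command-line use, the function could be called with `sys.stdin.buffer` and `sys.stdout`. The Snakemake rule would then pipe the S3 download into it instead of writing metadata.tsv first.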

TypeError: Object of type ArrayImpl is not JSON serializable

Current Behavior

When executing the workflow using my ambient nextstrain conda environment, I get the following error:

$ nextstrain build --ambient . --configfile config/config.yaml --config data_provenances=gisaid variant_classification=nextstrain_clades geo_resolutions=global
...
# Relevant snakemake rule:
python -u ./scripts/run-mlr-model.py             --config config/mlr-config.yaml             --seq-path data/gisaid/nextstrain_clades/global/prepared_seq_counts.tsv             --export-path results/gisaid/nextstrain_clades/global/mlr             --pivot '22B (Omicron)'             --data-name 2023-05-03 2>&1 | tee logs/gisaid/nextstrain_clades/global/mlr/2023-05-03.txt

# Fails with:
sample: 100%|██████████| 400/400 [00:31<00:00, 12.75it/s, 511 steps of size 7.34e-03. acc. prob=0.94] 
Traceback (most recent call last):
  File "/Users/corneliusromer/code/forecasts-ncov/./scripts/run-mlr-model.py", line 272, in <module>
    multi_posterior = fit_models(
  File "/Users/corneliusromer/code/forecasts-ncov/./scripts/run-mlr-model.py", line 151, in fit_models
    posterior.save_posterior(f"{path}/models/{location}.json")
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/evofr/posterior/posterior_handler.py", line 37, in save_posterior
    json.dump(self.samples, file, cls=EvofrEncoder)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/evofr/posterior/posterior_helpers.py", line 163, in default
    return json.JSONEncoder.default(self, obj)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ArrayImpl is not JSON serializable
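The traceback shows evofr's EvofrEncoder falling through to json.JSONEncoder.default for a JAX `ArrayImpl` object. That suggests the encoder does not recognize the array type produced by this JAX version. The listed environment has jax 0.4.1 alongside jaxlib 0.3.22, which may be worth checking. One hedged workaround, not evofr's own fix, is an encoder that converts anything exposing `tolist()`. JAX arrays, NumPy arrays, and NumPy scalars all provide that method. The class name is hypothetical.

```python
import json


class ArrayFriendlyEncoder(json.JSONEncoder):
    """JSON encoder that serializes array-like objects via their tolist() method."""

    def default(self, obj):
        if hasattr(obj, "tolist"):
            # Arrays become nested lists; NumPy scalars become Python numbers.
            return obj.tolist()
        # Anything else still raises the usual TypeError.
        return super().default(obj)
```

It would be used as `json.dump(samples, file, cls=ArrayFriendlyEncoder)` in place of the failing call.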

Expected behavior

No error

How to reproduce

Steps to reproduce the current behavior:

  1. Execute nextstrain build --ambient . --configfile config/config.yaml --config data_provenances=gisaid variant_classification=nextstrain_clades geo_resolutions=global
  2. See error

Versions

macOS 13.3.1 (ARM)

Full env details
$ nextstrain version --verbose
nextstrain.cli 6.2.1

Python
  /opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/bin/python3.10
  3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:31:57) [Clang 14.0.6 ]

Runners
  ambient (default)
    augur 21.1.0
    auspice 2.45.1
$ pip list
Package                       Version
----------------------------- ---------
absl-py                       1.4.0
aiobotocore                   2.4.2
aioeasywebdav                 2.4.0
aiohttp                       3.8.4
aioitertools                  0.11.0
aiosignal                     1.3.1
amply                         0.1.5
appdirs                       1.4.4
appnope                       0.1.3
asttokens                     2.2.1
async-timeout                 4.0.2
attmap                        0.13.2
attrs                         22.2.0
autopep8                      2.0.2
backcall                      0.2.0
backports.functools-lru-cache 1.6.4
backrefs                      5.2
bcbio-gff                     0.6.9
bcrypt                        3.2.2
biopython                     1.80
black                         23.3.0
blackjax                      0.9.6
boltons                       23.0.0
boto3                         1.24.59
botocore                      1.27.59
bracex                        2.2.1
brotlipy                      0.7.0
bx-python                     0.9.0
cachetools                    5.3.0
certifi                       2022.12.7
cffi                          1.15.1
charset-normalizer            2.1.1
click                         8.1.3
colorama                      0.4.6
comm                          0.1.3
commonmark                    0.9.1
conda                         23.3.1
conda-package-handling        2.0.2
conda_package_streaming       0.7.0
ConfigArgParse                1.5.3
connection-pool               0.0.3
constellations                0.1.10
contourpy                     1.0.7
crc32c                        2.3.post0
cryptography                  39.0.0
cvxopt                        1.3.0
cycler                        0.11.0
dataclasses                   0.8
datrie                        0.8.2
debugpy                       1.6.7
decorator                     5.1.1
deepdiff                      6.3.0
defusedxml                    0.7.1
distlib                       0.3.6
docutils                      0.19
dpath                         2.1.5
dropbox                       11.36.0
entrypoints                   0.4
epiweeks                      2.2.0
evofr                         0.1.18
exceptiongroup                1.1.1
executing                     1.2.0
fasteners                     0.17.3
fastjsonschema                2.16.3
fastprogress                  1.0.3
filechunkio                   1.8
filelock                      3.12.0
fonttools                     4.39.3
frozenlist                    1.3.3
fsspec                        2023.4.0
ftputil                       5.0.4
future                        0.18.3
gitdb                         4.0.10
GitPython                     3.1.31
google-api-core               2.10.0
google-api-python-client      2.86.0
google-auth                   2.17.3
google-auth-httplib2          0.1.0
google-cloud-core             2.3.2
google-cloud-storage          2.8.0
google-crc32c                 1.1.2
google-resumable-media        2.5.0
googleapis-common-protos      1.57.0
grpcio                        1.46.3
httplib2                      0.22.0
humanfriendly                 10.0
idna                          3.4
importlib-metadata            6.6.0
importlib-resources           5.12.0
iniconfig                     2.0.0
ipdb                          0.13.13
ipykernel                     6.22.0
ipython                       8.7.0
isal                          1.1.0
isodate                       0.6.1
isort                         5.12.0
jax                           0.4.1
jaxlib                        0.3.22
jaxopt                        0.5.5
jedi                          0.18.2
Jinja2                        3.1.2
jmespath                      1.0.1
joblib                        1.2.0
jsonpatch                     1.32
jsonpointer                   2.0
jsonschema                    3.2.0
jupyter_client                8.2.0
jupyter_core                  5.3.0
kiwisolver                    1.4.4
libcst                        0.4.9
libmambapy                    1.2.0
llist                         0.7.1
logmuse                       0.2.6
mamba                         1.2.0
markdown-it-py                2.2.0
MarkupSafe                    2.1.2
matplotlib                    3.7.1
matplotlib-inline             0.1.6
mdurl                         0.1.0
memory-profiler               0.61.0
MonkeyType                    23.3.0
multidict                     6.0.4
multipledispatch              0.6.0
munkres                       1.1.4
mypy                          1.2.0
mypy-extensions               1.0.0
natsort                       8.3.1
nbformat                      5.8.0
nest-asyncio                  1.5.6
networkx                      2.8.8
nextstrain-augur              21.1.0
nextstrain-cli                6.2.1
nodeenv                       1.7.0
numpy                         1.24.3
numpyro                       0.11.0
nwkfmt                        0.1.1
oauth2client                  4.1.3
opt-einsum                    3.3.0
ordered-set                   4.1.0
orjson                        3.8.11
packaging                     23.1
pandas                        1.5.3
pango-aliasor                 0.3.0
pango-designation             1.19
paramiko                      3.1.0
parso                         0.8.3
pathspec                      0.11.1
peppy                         0.35.5
pexpect                       4.8.0
phylo-treetime                0.9.6
pickleshare                   0.7.5
Pillow                        9.2.0
pip                           23.1.2
pipenv                        2023.4.29
plac                          1.3.5
platformdirs                  3.5.0
pluggy                        1.0.0
ply                           3.11
polars                        0.17.11
pooch                         1.7.0
prettytable                   3.7.0
prompt-toolkit                3.0.38
protobuf                      3.20.3
psutil                        5.9.5
ptpython                      3.0.20
ptyprocess                    0.7.0
PuLP                          2.7.0
pure-eval                     0.2.2
py                            1.11.0
pyasn1                        0.4.8
pyasn1-modules                0.2.7
pycodestyle                   2.10.0
pycosat                       0.6.4
pycparser                     2.21
pyfastx                       0.8.4
Pygments                      2.15.1
pygraphviz                    1.10
PyJWT                         2.6.0
pyllist                       0.3
PyNaCl                        1.5.0
pyOpenSSL                     23.1.1
pyparsing                     3.0.9
pyright                       1.1.306
pyrsistent                    0.19.3
pysam                         0.20.0
pysftp                        0.2.9
PySocks                       1.7.1
pytest                        7.3.1
python-dateutil               2.8.2
python-irodsclient            1.1.6
python-lzo                    1.14
pytz                          2023.3
pyu2f                         0.1.5
PyYAML                        6.0
pyzmq                         25.0.2
ratelimiter                   1.2.0
regex                         2023.5.4
requests                      2.29.0
reretry                       0.11.8
retry                         0.9.2
rich                          13.3.5
rsa                           4.9
ruamel.yaml                   0.17.22
ruamel.yaml.clib              0.2.7
s3fs                          2023.3.0
s3transfer                    0.6.0
scikit-learn                  1.2.2
scipy                         1.10.1
setuptools                    67.7.2
setuptools-scm                7.1.0
shellingham                   1.5.1
six                           1.16.0
slacker                       0.14.0
smart-open                    6.3.0
smmap                         3.0.5
snakefmt                      0.8.4
snakemake                     7.25.2
stack-data                    0.6.2
stone                         3.3.1
stopit                        1.1.2
tabulate                      0.9.0
threadpoolctl                 3.1.0
throttler                     1.2.1
toml                          0.10.2
tomli                         2.0.1
toolz                         0.12.0
toposort                      1.10
tornado                       6.3
tqdm                          4.65.0
traitlets                     5.9.0
typer                         0.9.0
typing_extensions             4.5.0
typing-inspect                0.8.0
ubiquerg                      0.6.2
unicodedata2                  15.0.0
uritemplate                   4.1.1
urllib3                       1.26.15
veracitools                   0.1.3
virtualenv                    20.23.0
virtualenv-clone              0.5.4
wcmatch                       8.3
wcwidth                       0.2.6
wheel                         0.40.0
wrapt                         1.15.0
xopen                         1.7.0
xxhash                        0.0.0
yarl                          1.9.1
yte                           1.5.1
zipp                          3.15.0
zstandard                     0.19.0

Broader viz suggestions

Internal review from Slack:

  • Toggle for frequencies panel between natural y-axis and logit y-axis
  • Toggle for cases panel between natural y-axis and logit y-axis

These are broader suggestions that can be implemented after the launch of the automated site.

Document which `geo_resolution` values are available

Working through the README, I didn't find information on which values geo_resolution accepts. Only global is mentioned in the README, but there should be others; otherwise this config would be pointless.

For example, europe didn't work, but usa does.

Edit: silly me, it is documented...

Send Slack notifications for clades without definition

Context

As noted in #73, it may be helpful to have internal notifications of clades that lack clade definitions in our configs. This can serve as a reminder to us to add these clade definitions.

Possible solution

We previously had Slack notifications for clade-without-variant, which have since been removed from the pipeline.

Our prepare-data script had an --output-clade-without-variant option that emitted a list of clades to a file. A Snakemake rule then called a custom notify script that only sent the Slack notification if the file was not empty.
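The removed mechanism amounts to writing a list and notifying only if it is non-empty. Here is a minimal sketch of that logic applied to missing clade definitions. The function names are hypothetical, and so is the idea of comparing the clades observed in the counts against those defined in the config. The actual Slack message is left to the notify script.

```python
from pathlib import Path


def clades_without_definition(observed_clades, defined_clades):
    """Clades present in the counts but missing from the clade definition config."""
    return sorted(set(observed_clades) - set(defined_clades))


def write_report(path, clades):
    """Write one clade per line; an empty list produces an empty file."""
    Path(path).write_text("".join(f"{clade}\n" for clade in clades))


def should_notify(path):
    """Notify only when the report exists and lists at least one clade."""
    report = Path(path)
    return report.exists() and report.read_text().strip() != ""
```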
