
Comments (6)

huddlej commented on June 28, 2024

@joverlee521 Can we close this given that #2 has been merged?

from forecasts-ncov.

joverlee521 commented on June 28, 2024

Comment from @trvrb:

Should these full count files continue to exclude rows with 0 cases/sequences? In past experience, it's better to be explicit about 0 counts to differentiate 0 vs NA.

I like dropping 0s in this case as otherwise the file sizes are much larger than they need to be. There are a bunch of short lived variants.
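If the files keep dropping 0s, a consumer that needs explicit zeros can rebuild them after download. A minimal sketch of that, assuming hypothetical column names (`location`, `clade`, `date`, `sequences`) and made-up sample rows; note it treats every absent combination as a true 0, so it cannot recover the 0 vs NA distinction on its own:

```python
from itertools import product


def fill_zero_counts(rows, key_fields=("location", "clade", "date"),
                     count_field="sequences"):
    """Return one row per key combination, using 0 where a row was dropped.

    The field names are assumptions for illustration, not the real schema.
    """
    # Index the rows that are actually present in the sparse file.
    observed = {tuple(r[k] for k in key_fields): r[count_field] for r in rows}
    # Collect every value seen for each key field, sorted for stable output.
    levels = [sorted({r[k] for r in rows}) for k in key_fields]
    filled = []
    for combo in product(*levels):
        row = dict(zip(key_fields, combo))
        row[count_field] = observed.get(combo, 0)
        filled.append(row)
    return filled


# Made-up sparse input: two of the four possible combinations are absent.
sparse = [
    {"location": "USA", "clade": "21L", "date": "2022-03-01", "sequences": 5},
    {"location": "USA", "clade": "22A", "date": "2022-03-02", "sequences": 2},
]
dense = fill_zero_counts(sparse)
```

This keeps the published files small while still letting downstream models work with a dense table.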

Do we want Slack notifications for the automated count updates? (I think yes since we don't have a good monitoring system set up yet.)

I'd make a new channel for this.

Note that I added an additional criterion in count provisioning: https://github.com/blab/rt-from-frequency-dynamics/tree/master/data/variants-us

I drop samples that have QC_overall_status listed as bad.
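As a rough illustration of that QC filter (not the actual provisioning code), the sketch below drops rows whose `QC_overall_status` is `bad`. The helper name, the other column, and the sample rows are made up for the example:

```python
import csv
import io


def drop_bad_qc(rows, status_field="QC_overall_status", bad_value="bad"):
    """Keep only rows whose QC status is not `bad_value` (case-insensitive)."""
    return [
        r for r in rows
        if (r.get(status_field) or "").strip().lower() != bad_value
    ]


# Made-up metadata with a strain name and a QC status per row.
metadata_tsv = "strain\tQC_overall_status\nA\tgood\nB\tbad\nC\tmediocre\n"
rows = list(csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t"))
kept = drop_bad_qc(rows)
```

Rows with a missing or empty status are kept here; whether they should be is a separate decision.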


We should also be clear with data.nextstrain.org/files locations for open data and s3://nextstrain-ncov-private/ locations for GISAID data so that we have a hopefully stable pseudo-API (like we've been trying to do with files/).

We need a stable system for global targets that define location as country-level vs country targets that define location as division-level. Initial datasets would be:

  • GISAID global
  • GISAID US
  • open global
  • open US
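One way to picture the global vs country split is a single counting routine that switches the column used as location. This is only a sketch: the `country` and `division` column names, the `USA` value, and the target names are assumptions, and the rows are invented:

```python
from collections import Counter

# Assumed mapping from target type to the metadata column used as "location".
LOCATION_FIELD = {"global": "country", "usa": "division"}


def count_by_location(rows, target):
    """Count sequences per location at the resolution the target calls for."""
    field = LOCATION_FIELD[target]
    if target == "usa":
        # Country-level targets only look at rows from that country.
        rows = [r for r in rows if r["country"] == "USA"]
    return Counter(r[field] for r in rows)


# Invented example rows.
rows = [
    {"strain": "s1", "country": "USA", "division": "Washington"},
    {"strain": "s2", "country": "USA", "division": "Oregon"},
    {"strain": "s3", "country": "Japan", "division": "Tokyo"},
]
global_counts = count_by_location(rows, "global")
usa_counts = count_by_location(rows, "usa")
```

The same routine would run once per dataset (GISAID or open) and once per target.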


joverlee521 commented on June 28, 2024

We should also be clear with data.nextstrain.org/files locations for open data and s3://nextstrain-ncov-private/ locations for GISAID data so that we have a hopefully stable pseudo-API (like we've been trying to do with files/).

Currently files in data.nextstrain.org/files/ncov/open/ match files in s3://nextstrain-ncov-private/.
To keep this consistent, I propose the following:

# Public
data.nextstrain.org/files/ncov/open/counts/global/case-counts.tsv.gz
data.nextstrain.org/files/ncov/open/counts/global/clade-counts.tsv.gz
data.nextstrain.org/files/ncov/open/counts/usa/case-counts.tsv.gz
data.nextstrain.org/files/ncov/open/counts/usa/clade-counts.tsv.gz

# Private
s3://nextstrain-ncov-private/counts/global/clade-counts.tsv.gz
s3://nextstrain-ncov-private/counts/usa/clade-counts.tsv.gz
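To show how a client might rely on this layout as a pseudo-API, here is a small sketch that builds the locations listed above. The function name and the `open`/`gisaid` source labels are hypothetical, and the `https://` scheme on the public base is an assumption:

```python
# Bases taken from the proposal above; the https:// scheme is assumed.
PUBLIC_BASE = "https://data.nextstrain.org/files/ncov/open/counts"
PRIVATE_BASE = "s3://nextstrain-ncov-private/counts"

# Files listed in the proposal for each source.
FILES = {
    "open": ("case-counts", "clade-counts"),
    "gisaid": ("clade-counts",),
}
GEOS = ("global", "usa")


def counts_location(source, geo, kind):
    """Return the proposed location for one counts file (hypothetical helper)."""
    if source not in FILES:
        raise ValueError(f"unknown source: {source!r}")
    if geo not in GEOS:
        raise ValueError(f"unknown geo: {geo!r}")
    if kind not in FILES[source]:
        raise ValueError(f"{kind!r} is not listed for source {source!r}")
    base = PUBLIC_BASE if source == "open" else PRIVATE_BASE
    return f"{base}/{geo}/{kind}.tsv.gz"


open_usa_cases = counts_location("open", "usa", "case-counts")
gisaid_global_clades = counts_location("gisaid", "global", "clade-counts")
```

Keeping the path logic in one place would make it easier to change the prefix later if the layout moves.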


tsibley commented on June 28, 2024

I'd suggest not putting this data under https://data.nextstrain.org/files/… as the current usage and intent of that prefix is for pathogen-build related files that directly correspond to https://nextstrain.org/… paths. These counts are a separate thing, right? (IIUC, some of the counts are downstream of the same ncov data, but not part of the actual input/build?) I'm missing a lot of context here but what about https://data.nextstrain.org/counts/…?

curl 'https://data.cdc.gov/resource/9mfq-cb36.csv?…

I'd recognize a Socrata URL anywhere! If you haven't seen them yet, there are lots of dev/API docs at https://dev.socrata.com/. Socrata (which looks to have been acquired by "Tyler Technologies"??) for years led big pushes for public orgs at all levels to use the Socrata data portal and run it at data.X domains.


tsibley commented on June 28, 2024

@joverlee521 and I chatted about this a bit in our 1:1 today. The takeaway is that this question relates back to the larger questions around the structure/organization of data.nextstrain.org/files/… and how it relates (or doesn't) to nextstrain.org/… URLs, which also intersects with larger questions of nextstrain remote download/upload behaviour. We'll fold discussion of this counts data specifically into a (pending) larger discussion of those issues, as they also arose recently with nextstrain/ncov#910.


joverlee521 commented on June 28, 2024

Closed by #2

