desh-data's Introduction

Pango lineage information for German SARS-CoV-2 sequences

This repository contains a join of the metadata and pango lineage tables of all German SARS-CoV-2 sequences published by the Robert-Koch-Institut on Github.

The data here is updated automatically every hour by a GitHub Action, so new data in the RKI repo shows up here within an hour at most.

The resulting dataset is currently around 50 MB in size and can be downloaded here: https://raw.githubusercontent.com/corneliusroemer/desh-data/main/data/meta_lineages.csv

Omicron share plot

Type N means representative surveillance. Type X means the sequencing reason is unknown; since type X samples are unlikely to be heavily targeted and come from quite a number of labs, I now include them in the main plot (hence type NX).

Omicron Logit Plot

Omicron share by zip code area

Description of data

Column description:

  • IMS_ID: Unique identifier of the sequence
  • DATE_DRAW: Date the sample was taken from the patient
  • SEQ_REASON: Reason for sequencing, one of:
    • X: Unknown
    • N: Random sampling
    • Y: Targeted sequencing (exact reason unknown)
    • A[<reason>]: Targeted sequencing because variant PCR indicated VOC
  • PROCESSING_DATE: Date the sample was processed by the RKI and added to Github repo
  • SENDING_LAB_PC: Postcode (PLZ) of lab that did the initial PCR
  • SEQUENCING_LAB_PC: Postcode (PLZ) of lab that did the sequencing
  • lineage: Pango lineage as reported by pangolin
  • scorpio_call: Alternative, coarser variant call as determined by scorpio (part of pangolin); it is less precise but a bit more robust than the pangolin lineage.
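
The SEQ_REASON codes listed above can be split into a one-letter code and an optional bracketed detail. The following is a minimal sketch, assuming every value is a single capital letter optionally followed by `[...]`; the names `parse_seq_reason` and `is_type_nx` are hypothetical and not part of this repository.

```python
import re

# One capital letter, optionally followed by a bracketed detail, e.g. "N" or "A[B.1.1.529]".
# Assumption: all SEQ_REASON values follow this shape.
_REASON_PATTERN = re.compile(r"^([A-Z])(?:\[(.*)\])?$")


def parse_seq_reason(value):
    """Split a SEQ_REASON value into (code, detail); detail is None when absent."""
    match = _REASON_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised SEQ_REASON: {value!r}")
    return match.group(1), match.group(2)


def is_type_nx(value):
    """Return True for N (random sampling) and X (unknown), the 'type NX' group."""
    code, _detail = parse_seq_reason(value)
    return code in {"N", "X"}
```

For example, `parse_seq_reason("A[B.1.1.529]")` yields the code `A` with the detail `B.1.1.529`, and `is_type_nx` can serve as a row filter when only the samples used in the main plot are wanted.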

Excerpt

Here are the first lines of the dataset (the header plus 12 records).

IMS_ID,DATE_DRAW,SEQ_REASON,PROCESSING_DATE,SENDING_LAB_PC,SEQUENCING_LAB_PC,lineage,scorpio_call
IMS-10294-CVDP-00001,2021-01-14,X,2021-01-25,40225,40225,B.1.1.297,
IMS-10025-CVDP-00001,2021-01-17,N,2021-01-26,10409,10409,B.1.389,
IMS-10025-CVDP-00002,2021-01-17,N,2021-01-26,10409,10409,B.1.258,
IMS-10025-CVDP-00003,2021-01-17,N,2021-01-26,10409,10409,B.1.177.86,
IMS-10025-CVDP-00004,2021-01-17,N,2021-01-26,10409,10409,B.1.389,
IMS-10025-CVDP-00005,2021-01-18,N,2021-01-26,10409,10409,B.1.160,
IMS-10025-CVDP-00006,2021-01-17,N,2021-01-26,10409,10409,B.1.1.297,
IMS-10025-CVDP-00007,2021-01-18,N,2021-01-26,10409,10409,B.1.177.81,
IMS-10025-CVDP-00008,2021-01-18,N,2021-01-26,10409,10409,B.1.177,
IMS-10025-CVDP-00009,2021-01-18,N,2021-01-26,10409,10409,B.1.1.7,Alpha (B.1.1.7-like)
IMS-10025-CVDP-00010,2021-01-17,N,2021-01-26,10409,10409,B.1.1.7,Alpha (B.1.1.7-like)
IMS-10025-CVDP-00011,2021-01-17,N,2021-01-26,10409,10409,B.1.389,

Suggested import into pandas

You can import the data into pandas as follows:

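#%%
import pandas as pd

#%%
# Read the joined table: IMS_ID becomes the index, both date columns are parsed.
# infer_datetime_format is left out because it is deprecated since pandas 2.0.
df = pd.read_csv(
    'https://raw.githubusercontent.com/corneliusroemer/desh-data/main/data/meta_lineages.csv',
    index_col=0,
    parse_dates=['DATE_DRAW', 'PROCESSING_DATE'],
    cache_dates=True,
    # category dtype saves memory; categories are read as strings,
    # so postcodes keep their leading zeros
    dtype={'SEQ_REASON': 'category',
           'SENDING_LAB_PC': 'category',
           'SEQUENCING_LAB_PC': 'category',
           'lineage': 'category',
           'scorpio_call': 'category'
           }
)
#%%
# Shorter, lowercase column names ('lineage' already has the desired name)
df.rename(columns={
    'DATE_DRAW': 'date',
    'PROCESSING_DATE': 'processing_date',
    'SEQ_REASON': 'reason',
    'SENDING_LAB_PC': 'sending_pc',
    'SEQUENCING_LAB_PC': 'sequencing_pc',
    'scorpio_call': 'scorpio'
    },
    inplace=True
)
df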

License

The underlying files that I use as input are licensed by RKI under CC-BY 4.0, see more details here: https://github.com/robert-koch-institut/SARS-CoV-2-Sequenzdaten_aus_Deutschland#lizenz.

The software here is licensed under the "Unlicense". You can do with it whatever you want.

For the data, just cite the original source, no need to cite this repo since it's just a trivial join.

desh-data's People

Contributors

corneliusroemer, lenaschimmel

desh-data's Issues

Write better tick label formatter for logit scale that produces 50% and 99.99% instead of 50.00 or 100%

The current tick label formatter is a bad hack. We need something more robust that produces the following behaviour:
1%,10%,50%,99.9%,99.999% etc.

This is the current hack (from SO I think):

ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda y, _: f'{float(f"{100*y:.1g}"):g}%'))

The challenge is to display decimals only for trailing 9s but not for trailing zeros.

This might do the job, but not 100% sure: https://numpy.org/doc/stable/reference/generated/numpy.format_float_positional.html

@lenaschimmel interested?

Edit: np.format_float_positional does the job:

import numpy as np
np.format_float_positional(1.000, trim='-')  # '1'
np.format_float_positional(99.99, trim='-')  # '99.99'
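
The behaviour asked for above (decimals kept for trailing 9s, dropped for trailing zeros) can also be sketched with plain string formatting. This is a swapped-in standard-library approach rather than `np.format_float_positional`, chosen so that floating-point noise from `100 * y` is rounded away before trimming; the names `percent_label` and `max_decimals` are hypothetical.

```python
def percent_label(y, _pos=None, max_decimals=6):
    """Format a fraction such as 0.9999 as a percent label like '99.99%'.

    The second argument exists only so the function matches the (value, position)
    signature that matplotlib's FuncFormatter passes; it is ignored here.
    """
    # Fixed-point text with a capped number of decimals, e.g. '99.990000'.
    text = f"{100 * y:.{max_decimals}f}"
    # Drop trailing zeros first, then a dangling decimal point if one is left.
    text = text.rstrip("0").rstrip(".")
    return text + "%"
```

Assuming matplotlib is in use as in the hack above, this could be plugged in with `ax.yaxis.set_major_formatter(ticker.FuncFormatter(percent_label))`. Values that need more than `max_decimals` decimals are rounded, so the cap may need raising for very extreme ticks.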

What about non-BA.1-variants of Omicron?

Currently only cases with lineage == 'BA.1' are counted as Omicron (see source).

There are some cases with lineage BA.2, BA.3 or just B.1.1.529. Shouldn't they be counted as well? Otherwise, I think the wording on the graph should be updated from "Omikron" to "BA.1".

To date, these are all 32 of them (sorted by DATE_DRAW). They make up 1.37% of total Omicron cases when they are included:

IMS_ID                                               DATE_DRAW   SEQ_REASON    PROCESSING_DATE  SENDING_LAB_PC  SEQUENCING_LAB_PC  lineage    scorpio_call
IMS-10183-CVDP-81E05ED2-68B2-45C9-AE92-FE0747BD7C1A  2021-11-30  Y             2021-12-10       22081           22081              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10261-CVDP-0EC19B38-8711-4617-8D20-B19F3C75E2F8  2021-12-01  A[B.1.1.529]  2021-12-13       32105           32105              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10004-CVDP-E64B5426-4FB5-4D41-AFEC-77D84720E886  2021-12-02  A[B.1.1.529]  2021-12-20       21502           21502              BA.3       Omicron (BA.3-like)
IMS-10338-CVDP-DEB4E3F4-4E65-4E95-9E9B-77EB04A50226  2021-12-03  X             2021-12-17       64283                              B.1.1.529  Omicron (B.1.1.529-like)
IMS-10641-CVDP-677D2DB5-8A78-4238-BF38-CC4BC8247275  2021-12-03  N             2021-12-27       06120           06120              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10013-CVDP-2857098B-37D6-49EA-B92A-748F97328D42  2021-12-06  N             2021-12-18       01665           04779              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10004-CVDP-17A54357-705F-43BD-81F4-1A87C79F9FA4  2021-12-06  N             2021-12-20       21502           21502              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10209-CVDP-93C23280-BFE2-4DD7-A9DE-460B5420EE08  2021-12-06  X             2021-12-28       78467           78467              BA.2       Omicron (BA.2-like)
IMS-10036-CVDP-B81B32E6-AD2D-4E05-9109-7B35544A6407  2021-12-07  A             2021-12-21       12247           16321              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10001-CVDP-FD7B08A6-39E9-462A-BB81-34D2D72DE174  2021-12-07  A[Y]          2021-12-25       87435           87435              B.1.1.529  Omicron (B.1.1.529-like)
IMS-10183-CVDP-DB2FDBCC-5F6A-445D-9F75-20D87840C180  2021-12-09  N             2021-12-17       22081           22081              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10183-CVDP-75514806-B96C-4825-B5FD-EF389CC8D1EA  2021-12-10  Y             2021-12-17       22081           22081              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10183-CVDP-DCCF53C4-C30E-4D1C-A2B7-ECD99B7551EE  2021-12-10  N             2021-12-17       22081           22081              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10261-CVDP-BB754EC4-4185-4B28-A872-DA062436D447  2021-12-13  A[B.1.1.529]  2021-12-22       32105           32105              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10004-CVDP-40148A1B-A7BC-4302-B4EB-9993F89C48F8  2021-12-13  A[B.1.1.529]  2021-12-28       21502           21502              BA.2       Omicron (BA.2-like)
IMS-10001-CVDP-0EA49D87-CBD9-48B0-8536-7F5AFFAC321F  2021-12-14  A[Y]          2021-12-25       87435           87435              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10001-CVDP-36C59D9E-72E8-4B2C-A635-0D69C4B9C9FB  2021-12-14  A[Y]          2021-12-25       87435           87435              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10150-CVDP-1D7B1F19-0AA1-486C-BFE2-2DE49596B981  2021-12-16  X             2021-12-22       51375           92637              BA.2       Omicron (BA.2-like)
IMS-10183-CVDP-FF1E061C-F0E6-41BE-9DA0-35154066D3C0  2021-12-17  N             2021-12-24       22081           22081              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10261-CVDP-DFACA834-5290-4855-BC54-AC7AB9B0B49B  2021-12-17  A[B.1.1.529]  2021-12-27       32105           32105              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10001-CVDP-F576FBA5-8F15-4E9E-8E70-F3287A33FDDB  2021-12-19  A[Y]          2021-12-25       87435           87435              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10001-CVDP-909F8C1F-9DF7-47B0-AA3C-C981406B56C0  2021-12-19  A[Y]          2021-12-25       87435           87435              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10001-CVDP-4BBE02BC-9479-4E6F-8B28-9F575E60A615  2021-12-19  A[Y]          2021-12-25       87435           87435              B.1.1.529  Omicron (B.1.1.529-like)
IMS-10001-CVDP-295624E6-E260-4456-9B36-E67512ACEA20  2021-12-20  A[Y]          2021-12-25       87435           87435              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10001-CVDP-047F5A00-3CE6-4038-8308-6F85FA8E40E5  2021-12-20  A[Y]          2021-12-25       87435           87435              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10001-CVDP-FEEEA8B2-0F57-40BE-A50D-4D7A6B0031E6  2021-12-20  A[Y]          2021-12-25       87435           87435              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10001-CVDP-AB0193AA-F6E3-4569-8C70-4E507F1037D0  2021-12-20  A[Y]          2021-12-25       87435           87435              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10001-CVDP-76508575-1AC0-4E0F-94C6-8FCDE164BE02  2021-12-20  A[Y]          2021-12-25       87435           87435              B.1.1.529  Omicron (B.1.1.529-like)
IMS-10001-CVDP-1E673F95-62A2-4576-A94F-8A46797FEF14  2021-12-20  A[Y]          2021-12-25       87435           87435              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10337-CVDP-6549EC0D-4E8F-427A-96ED-6E2F47E00941  2021-12-20  X             2021-12-28       23538           23538              BA.2       Omicron (BA.2-like)
IMS-10001-CVDP-3F43636E-F55C-4C1C-BD9A-EF792ED6E550  2021-12-21  A[B.1.617.2]  2021-12-25       87435           87435              B.1.1.529  Probable Omicron (B.1.1.529-like)
IMS-10004-CVDP-E9819B99-144D-4AE6-A47C-46042F231AEF  2021-12-22  N             2021-12-28       21502           21502              B.1.1.529  Probable Omicron (B.1.1.529-like)
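
One way to count these cases as well is to match on a set of lineage prefixes instead of testing for equality with 'BA.1'. The sketch below is only an illustration of that idea: the tuple `OMICRON_ROOTS` and the functions `is_omicron` and `omicron_share` are hypothetical names, and which lineages belong in the tuple is an assumption based on the lineages named in this issue.

```python
# Lineages treated as Omicron in this sketch (an assumption, not the repository's rule).
OMICRON_ROOTS = ("B.1.1.529", "BA.1", "BA.2", "BA.3")


def is_omicron(lineage):
    """True if lineage is one of the roots or a sublineage of one, e.g. 'BA.1.1'."""
    if not isinstance(lineage, str):
        return False  # missing values (None, NaN) are not counted
    return any(
        lineage == root or lineage.startswith(root + ".")
        for root in OMICRON_ROOTS
    )


def omicron_share(lineages):
    """Fraction of the given lineages classified as Omicron (NaN for empty input)."""
    lineages = list(lineages)
    if not lineages:
        return float("nan")
    return sum(is_omicron(lineage) for lineage in lineages) / len(lineages)
```

With the pandas frame from the import example, something like `df['lineage'].astype(str).map(is_omicron)` would give a boolean column to plot from. Requiring the trailing dot in the prefix check keeps a root from matching an unrelated lineage that merely starts with the same characters.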
