Curate.outbreak.info Data and results

This repository contains de-identified data from curate.outbreak.info, a project in which citizen scientists thematically classified COVID-19 datasets collected primarily from Zenodo and Figshare during the first few months of the COVID-19 outbreak.

About the data

Each user was instructed to classify each dataset with up to 5 thematic categories and to prefer specific categories over broad ones whenever possible.

Each dataset was classified by at least 3 citizen scientists. A category was considered valid for a dataset only if at least 65% of the users who classified that dataset selected it. Users were also asked to mark datasets that could not be classified (i.e., those that lacked sufficient information or were not in the user's language).
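The threshold rule above can be sketched in Python. This is a minimal illustration, not the project's actual code: the function name `consensus_categories` and the input layout (one set of chosen categories per curator) are assumptions, and it does not handle datasets marked as unclassifiable.

```python
from collections import Counter

MIN_CURATORS = 3             # from the text: at least 3 citizen scientists per dataset
AGREEMENT_THRESHOLD = 0.65   # from the text: at least 65% of users must agree


def consensus_categories(selections, min_curators=MIN_CURATORS,
                         threshold=AGREEMENT_THRESHOLD):
    """Return the categories that reach the agreement threshold for one dataset.

    `selections` holds one entry per curator: the set of categories that
    curator picked for the dataset (hypothetical layout).
    """
    n = len(selections)
    if n < min_curators:
        return set()  # too few curators to evaluate agreement
    # Count each category at most once per curator.
    counts = Counter(cat for picked in selections for cat in set(picked))
    return {cat for cat, c in counts.items() if c / n >= threshold}


example = [{"Vaccines", "Treatment"}, {"Vaccines"}, {"Forecasting"}]
print(consensus_categories(example))
```

In the example, "Vaccines" was chosen by 2 of 3 curators (about 67%) and passes, while the categories chosen by a single curator (about 33%) do not.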

Analyzing the results

To analyze the performance of the citizen scientists, specific categories were mapped to their broader categories for comparison with predictions from an out-of-the-box algorithm trained on LitCovid-classified abstracts. Of the 530 dataset classifications that reached the required threshold (i.e., 65% of users agreed on a category for the dataset), 344 matched the algorithm's predictions and are considered true positives. The remaining 186 did not match the classification predicted by the algorithm and were manually inspected.
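The comparison step can be sketched as follows. The text does not define exactly what counts as a match, so this sketch assumes that a curated (dataset, broad category) pair matches when that category is among the algorithm's predictions for the dataset. The function name, the data layout, and the dataset IDs are all hypothetical.

```python
def split_by_agreement(curated, predicted):
    """Split curated (dataset_id, broad_category) pairs into matches and non-matches.

    Assumption: a pair is a match when its broad category appears in the set
    of categories the algorithm predicted for that dataset.
    """
    matches, non_matches = [], []
    for dataset_id, category in curated:
        if category in predicted.get(dataset_id, set()):
            matches.append((dataset_id, category))
        else:
            non_matches.append((dataset_id, category))  # candidates for manual inspection
    return matches, non_matches


# Hypothetical inputs for illustration only.
curated = [("d1", "Treatment"), ("d2", "Diagnosis")]
predicted = {"d1": {"Treatment", "Mechanism"}, "d2": {"Forecasting"}}
matches, non_matches = split_by_agreement(curated, predicted)
print(len(matches), len(non_matches))
```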

Manual evaluation of non-matches

Of the 186 dataset classifications that were manually inspected due to disagreement between predicted category and citizen science curated category:

  • 46 were found to match both the curated and predicted categories (the categories are not necessarily mutually exclusive)
  • 62 were found to better match the curated category due to limitations of LitCovid categories
  • 54 were found to better match the algorithm than the curators
  • 16 were found to match neither curator nor algorithm well
  • 8 were ignored (pdb datasets) because of the limited availability of metadata

The table of manually inspected classifications can be found in this repository at: /results/Evalation%20of%20not-matches.xlsx

Given the above findings, an estimated 452 classifications are correct and approximately 78 are incorrect.
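The estimates can be reproduced from the counts reported in this README. The grouping below is an inference from the arithmetic, not something the text states: the correct total is taken as the algorithm matches plus the two manual-review groups in which the curated category held up, and everything else (including the 8 ignored pdb datasets) falls into the approximate incorrect count.

```python
matched_algorithm = 344  # classifications that matched the algorithm's prediction

manual_review = {
    "matches both": 46,
    "better matches curated": 62,
    "better matches algorithm": 54,
    "matches neither": 16,
    "ignored (pdb datasets)": 8,
}

total = matched_algorithm + sum(manual_review.values())
# Assumed grouping: curated category judged acceptable in these two groups.
correct = (matched_algorithm
           + manual_review["matches both"]
           + manual_review["better matches curated"])
incorrect = total - correct

print(total, correct, incorrect)  # 530 452 78
```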

The Categories

The specific subcategories (left) are mapped to the broader categories (right):

{"Clinical":"Clinical", "Case Descriptions":"Clinical", "Risk Factors":"Clinical",
 "Diagnosis":"Diagnosis", "Symptoms":"Diagnosis", "Rapid Diagnostics":"Diagnosis", "Antibody Detection":"Diagnosis", "Virus Detection":"Diagnosis", "Testing Prevalence":"Diagnosis", "Pathology/Radiology":"Diagnosis",
 "Forecasting":"Forecasting",
 "Mechanism":"Mechanism", "Virus Factors":"Mechanism", "Host Factors":"Mechanism", "Immunological Response":"Mechanism", "Mechanism of Infection":"Mechanism", "Mechanism of Transmission":"Mechanism",
 "Prevention":"Prevention", "Public Health Interventions":"Prevention", "Individual Prevention":"Prevention",
 "Transmission":"Transmission", "Host/Intermediate Reservoirs":"Transmission", "Viral Shedding / Persistence":"Transmission",
 "Treatment":"Treatment", "Vaccines":"Treatment", "Pharmaceutical Treatments":"Treatment", "Repurposing":"Treatment", "Biologics":"Treatment", "Medical Care":"Treatment",
 "Epidemiology":"Epidemiology", "Molecular epidemiology":"Epidemiology", "Classical epidemiology":"Epidemiology",
 "Behavioral Research":"Behavioral Research"}
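As a usage sketch, the mapping can be loaded as a Python dictionary and applied to a curator's selections. The dictionary contents are copied from this README; the names `CATEGORY_MAP` and `to_broad` are hypothetical, and raising `KeyError` on an unknown label is a design choice made here so that misspelled categories surface early.

```python
CATEGORY_MAP = {
    "Clinical": "Clinical", "Case Descriptions": "Clinical", "Risk Factors": "Clinical",
    "Diagnosis": "Diagnosis", "Symptoms": "Diagnosis", "Rapid Diagnostics": "Diagnosis",
    "Antibody Detection": "Diagnosis", "Virus Detection": "Diagnosis",
    "Testing Prevalence": "Diagnosis", "Pathology/Radiology": "Diagnosis",
    "Forecasting": "Forecasting",
    "Mechanism": "Mechanism", "Virus Factors": "Mechanism", "Host Factors": "Mechanism",
    "Immunological Response": "Mechanism", "Mechanism of Infection": "Mechanism",
    "Mechanism of Transmission": "Mechanism",
    "Prevention": "Prevention", "Public Health Interventions": "Prevention",
    "Individual Prevention": "Prevention",
    "Transmission": "Transmission", "Host/Intermediate Reservoirs": "Transmission",
    "Viral Shedding / Persistence": "Transmission",
    "Treatment": "Treatment", "Vaccines": "Treatment", "Pharmaceutical Treatments": "Treatment",
    "Repurposing": "Treatment", "Biologics": "Treatment", "Medical Care": "Treatment",
    "Epidemiology": "Epidemiology", "Molecular epidemiology": "Epidemiology",
    "Classical epidemiology": "Epidemiology",
    "Behavioral Research": "Behavioral Research",
}


def to_broad(categories):
    """Map specific categories to the set of their broad categories.

    Raises KeyError if a label is not in CATEGORY_MAP.
    """
    return {CATEGORY_MAP[cat] for cat in categories}


print(to_broad(["Vaccines", "Repurposing"]))  # {'Treatment'}
```

Because several specific categories share one broad category, the result can contain fewer labels than the input.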

Contributors

gtsueng
