NCBI-get-all-children-organism-under-ancestor

Use the Python scripts as follows to retrieve all child organisms under an ancestor in the NCBI taxonomy.

  1. Download taxdmp.zip from https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/.
  2. Unzip taxdmp.zip and place nodes.dmp and names.dmp in this folder.
  3. Run nodes_to_csv.py and names_to_csv.py to get nodes.csv and names.csv respectively.
  4. Run concat_names_to_nodes.py to get taxonomy.csv.
  5. Compute the direct children of each organism (node) using get_direct_children_from_tax.py to get taxonomy_with_direct_children.csv.
  6. Compute all children (may take several hours) using get_all_children_from_tax.py to get taxonomy_with_all_children.csv.
  7. Run query.py --ancestor 8782 to retrieve all child organisms under the ancestor Aves. Replace 8782 with the tax_id of the ancestor you want.
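
The conversion in step 3 can be sketched as follows. This is a minimal illustration, not the actual contents of nodes_to_csv.py: it assumes the standard taxdump layout, in which fields are separated by a tab, pipe, tab sequence and each row ends with a tab and a pipe. The function names, the column names, and the sample rows are hypothetical and do not reflect real taxonomy entries.

```python
import csv
import io


def parse_dmp_line(line):
    """Split one taxdump row into its fields."""
    line = line.rstrip("\n")
    # Each row ends with a trailing "\t|" terminator; drop it first.
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def dmp_to_csv(dmp_lines, out_file, columns):
    """Write the first len(columns) fields of every row as CSV."""
    writer = csv.writer(out_file)
    writer.writerow(columns)
    for line in dmp_lines:
        fields = parse_dmp_line(line)
        writer.writerow(fields[: len(columns)])


# Invented sample rows in nodes.dmp layout: tax_id, parent tax_id, rank.
sample_rows = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tclass\t|\n",
]

buffer = io.StringIO()
dmp_to_csv(sample_rows, buffer, ["tax_id", "parent_tax_id", "rank"])
print(buffer.getvalue())
```

With real data, the list of sample rows would be replaced by an open file handle on nodes.dmp, and the StringIO buffer by a file opened with newline="" as the csv module recommends.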

taxonomy_with_all_children.csv is the final CSV file, which you can use to analyze the NCBI taxonomy tree.
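
To make the difference between "direct children" and "all children" concrete, here is a small sketch on a toy tree. It is not the code in get_direct_children_from_tax.py or get_all_children_from_tax.py; the function names and the tax_ids are made up for illustration. It relies on one fact about nodes.dmp: the root node lists itself as its own parent.

```python
from collections import defaultdict


def build_children(parent_of):
    """Invert a tax_id -> parent tax_id mapping into parent -> direct children."""
    children = defaultdict(list)
    for tax_id, parent in parent_of.items():
        # The root is its own parent; skip it so it is not its own child.
        if tax_id != parent:
            children[parent].append(tax_id)
    return children


def all_descendants(children, ancestor):
    """Collect every node below `ancestor` with an iterative depth-first walk."""
    found = []
    stack = [ancestor]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            found.append(child)
            stack.append(child)
    return found


# Toy tree with invented tax_ids: 1 is the root, 10 and 11 are its children.
parent_of = {1: 1, 10: 1, 11: 1, 20: 10, 21: 10, 30: 20}
children = build_children(parent_of)

print(sorted(children[10]))                  # direct children of 10
print(sorted(all_descendants(children, 10)))  # all children of 10
```

An explicit stack is used instead of recursion so that deep lineages cannot hit Python's recursion limit.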

query.py:

  • get all children of any organism
  • after getting the scientific_name of every child of an organism (ancestor), you can retrieve the SRA data for all organisms under that ancestor by running the generated SQL in BigQuery
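
One way such a query could be assembled from a list of scientific names is sketched below. The exact SQL that query.py generates is not shown here, so the build_sra_query function and the use of an IN clause are assumptions; only the table name and the organism column come from this README.

```python
def build_sra_query(scientific_names):
    """Build a BigQuery SQL string that matches any of the given organisms."""
    names = sorted(set(scientific_names))
    if not names:
        raise ValueError("at least one scientific name is required")

    def quote(name):
        # Escape backslashes and double quotes for a BigQuery string literal.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'

    name_list = ", ".join(quote(name) for name in names)
    return (
        "SELECT *\n"
        "FROM `nih-sra-datastore.sra.metadata`\n"
        "WHERE organism IN (" + name_list + ");"
    )


print(build_sra_query(["Gallus gallus", "Anas platyrhynchos"]))
```

For an ancestor with many thousands of children, the resulting statement can become very long, so it may be more practical to split the names into several queries.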

Note: NCBI hosts SRA data in BigQuery, which is convenient for retrieving large amounts of data.

Example of retrieving SRA data from BigQuery:

SELECT *
FROM `nih-sra-datastore.sra.metadata`
WHERE organism = "Homo sapiens";

Contributors

tracywong117