GDC Spreadsheet Download Tool


Quick Start:

  1. Download a manifest from https://portal.gdc.cancer.gov/
  2. Run: python gdc-tsv-tool.py <manifest_file>

The GDC Spreadsheet Download Tool downloads clinical and/or biospecimen metadata for a given set of files in a tab-delimited format. The file set can be passed to the tool as a manifest downloaded from the GDC Portal (https://portal.gdc.cancer.gov/) or as a plain-text list of file UUIDs. The tab-delimited output can be opened in Microsoft Excel or any other spreadsheet program.
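
The two input styles described above can be illustrated with a minimal Python sketch. It is not taken from the tool's source: the function name read_file_ids is hypothetical, the manifest is assumed to be tab-delimited with a header row whose UUID column is named "id", and the UUID in the sample data is made up.

```python
import csv
import io


def read_file_ids(text, uuid_list=False):
    """Return the file UUIDs found in manifest text or in a plain UUID list."""
    if uuid_list:
        # Plain list: one UUID per line; blank lines are ignored.
        return [line.strip() for line in text.splitlines() if line.strip()]
    # Manifest: tab-delimited with a header row; the UUID column is
    # assumed (not confirmed by this README) to be named "id".
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row["id"] for row in reader if row.get("id")]


# Made-up manifest content, for illustration only.
manifest_text = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "11111111-2222-3333-4444-555555555555\texample.vcf.gz\tabc\t10\treleased\n"
)
print(read_file_ids(manifest_text))
```

Passing uuid_list=True mirrors the idea behind the -u option: the same function then treats every non-empty line as a UUID.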

The GDC Spreadsheet Download Tool produces TSVs in which each row represents one file and each column represents a clinical or biospecimen field. Because of the structure of the GDC Data Model, a file can be associated with more than one instance of a field (e.g. a VCF associated with both a tumor sample and a normal sample), in which case that field occupies more than one column. Rows can therefore differ in width, so the tool divides the output into smaller TSVs, each containing rows with the same number of columns.
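
One way to picture the splitting step is the sketch below, which groups rows by their column count and renders each group as its own TSV text. This is an assumed illustration of the idea, not the tool's actual algorithm; split_by_width, render_tsv, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def split_by_width(rows):
    """Group rows (lists of cell values) by how many columns each one has."""
    groups = defaultdict(list)
    for row in rows:
        groups[len(row)].append(row)
    return dict(groups)


def render_tsv(rows):
    """Join cells with tabs and rows with newlines."""
    return "".join("\t".join(row) + "\n" for row in rows)


# Hypothetical rows: the VCF is linked to two samples, so it is one column wider.
rows = [
    ["file_a.bam", "CASE-1", "Primary Tumor"],
    ["file_b.vcf", "CASE-1", "Primary Tumor", "Blood Derived Normal"],
    ["file_c.bam", "CASE-2", "Primary Tumor"],
]

for width, group in sorted(split_by_width(rows).items()):
    print(f"--- {width} columns ---")
    print(render_tsv(group), end="")
```

Each group is rectangular, so it opens cleanly in a spreadsheet program without ragged rows.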

Usage: python gdc-tsv-tool.py [options] <manifest_file>

Options:

  • -h, --help : Displays documentation
  • -o, --output : Designate prefix for output files (Default: metadata)
  • -c, --clinical : Outputs clinical metadata
  • -b, --biospecimen : Outputs biospecimen metadata
  • -u, --uuid-list : Input file is a list of file UUIDs instead of a manifest
  • -l, --legacy : Manifest is from the GDC Legacy Archive
  • -s, --simple : Output a simple set of fields (file name, file id, project id, case barcode, sample type)
  • -x, --mafout : Output a separate metadata file for MAF or XLSX files (warning: messy)
  • -a, --allop : Keep empty and datetime columns in the output instead of removing them
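
For readers who want to see how an option set like this maps onto code, here is a minimal argparse sketch. It only mirrors the flags listed above and is not the tool's actual source; build_parser and the help strings are assumptions. The -h/--help option is supplied by argparse automatically.

```python
import argparse


def build_parser():
    """Declare the documented options; a sketch, not the tool's real parser."""
    parser = argparse.ArgumentParser(
        prog="gdc-tsv-tool.py",
        description="Download clinical/biospecimen metadata as TSV.",
    )
    parser.add_argument("manifest_file", help="manifest or UUID list")
    parser.add_argument("-o", "--output", default="metadata",
                        help="prefix for output files")
    parser.add_argument("-c", "--clinical", action="store_true")
    parser.add_argument("-b", "--biospecimen", action="store_true")
    parser.add_argument("-u", "--uuid-list", action="store_true")
    parser.add_argument("-l", "--legacy", action="store_true")
    parser.add_argument("-s", "--simple", action="store_true")
    parser.add_argument("-x", "--mafout", action="store_true")
    parser.add_argument("-a", "--allop", action="store_true")
    return parser


# Equivalent to: python gdc-tsv-tool.py -c -o my_study manifest.txt
args = build_parser().parse_args(["-c", "-o", "my_study", "manifest.txt"])
print(args.output, args.clinical, args.biospecimen)
```

Note that argparse exposes --uuid-list as args.uuid_list, since hyphens in long option names become underscores.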

Notes:

  • A test manifest is provided for troubleshooting: python gdc-tsv-tool.py Test_Manifest.txt
  • The default parameters produce both clinical and biospecimen data, which is the same as passing both -c and -b.
  • Passing the simple (-s) argument overrides both the clinical (-c) and biospecimen (-b) arguments.
  • Not familiar with using the command line interface? See the CLI_Instructions.txt file for step-by-step directions on using this tool.
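
The default and override rules in the notes above can be summarized in a few lines of Python. This is a sketch of the documented behavior only; resolve_outputs is a hypothetical helper, and the tool itself may implement the logic differently.

```python
def resolve_outputs(clinical=False, biospecimen=False, simple=False):
    """Return the metadata sets to produce for a given combination of flags."""
    if simple:
        # -s overrides both -c and -b.
        return ["simple"]
    if not clinical and not biospecimen:
        # No flags: same as passing both -c and -b.
        return ["clinical", "biospecimen"]
    selected = []
    if clinical:
        selected.append("clinical")
    if biospecimen:
        selected.append("biospecimen")
    return selected


print(resolve_outputs())                             # no flags
print(resolve_outputs(clinical=True))                # -c only
print(resolve_outputs(clinical=True, simple=True))   # -c -s
```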

Release Notes:

Version 2.0: September 15, 2017

  • Increased compatibility with Python 3; the tool should work the same with Python 2.
  • Added command-line interface instructions for users (CLI_Instructions.txt)
  • Added this "Release Notes" section
  • Updated Test_Manifest.txt to contain files that still appear in the GDC Data Portal

Known Issues:

  • Using a list of UUIDs (-u option) will not separate file metadata by type.
  • Including Biotab files from the Legacy Archive in the manifest will cause the program to stall.

Version 1.0: May 10, 2017

  • Initial release!

Known Issues:

  • Using a list of UUIDs (-u option) will not separate file metadata by type.
  • Including Biotab files from the Legacy Archive in the manifest will cause the program to stall.
