Varfish Anno

This project converts the databases required by varfish into a format that can be imported easily: TSV files whose header row contains the column names of the corresponding varfish database table.

Requirements

  • bcftools

Installation

Requirements

We recommend the installation of the requirements via conda:

conda install bcftools bedops samtools
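After the installation it can help to confirm that the tools are actually reachable. The following is a minimal sketch; the helper name `require_tools` is hypothetical and not part of this project, and it only checks that the executables named in the conda command above are on the PATH.

```shell
# require_tools <tool>...: report every tool that is not found on PATH.
# Returns 0 if all tools are available, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the three tools from the conda command:
# require_tools bcftools bedops samtools
```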

Clone project

git clone git@github.com:bihealth/varfish-anno.git
cd varfish-anno

Usage

Initialize folder structure

make init

This step creates the folder structure in databases/.

Download databases

make download

The downloads will be stored in databases/<database_name>/download/. The reference is placed in downloads/.

Please note that the download routine is not sophisticated. Double-check the result, especially if something breaks. The routine is meant primarily as a detailed record of how to obtain the required databases. If you already have the files, you can place them in the corresponding download folder and skip this step. In that case the conversion scripts might need some adaptation to match the file names (see the section on converting databases).
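Placing files that are already available could look like the sketch below. It assumes the databases/<database_name>/download/ layout described above; the helper name `stage_file`, the `DB_ROOT` variable and the example file name are hypothetical.

```shell
# Root of the folder structure created by `make init` (assumed default).
DB_ROOT="${DB_ROOT:-databases}"

# stage_file <source file> <database name>
# Copies an existing file into databases/<database name>/download/.
stage_file() {
    src="$1"
    db="$2"
    if [ ! -f "$src" ]; then
        echo "missing source file: $src" >&2
        return 1
    fi
    mkdir -p "$DB_ROOT/$db/download"
    cp "$src" "$DB_ROOT/$db/download/"
}

# Example with a hypothetical file name:
# stage_file /path/to/family.ped case
```

The copy keeps the original file name, so a conversion script that expects a different name still needs its input adjusted.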

The download links are defined in downloads/Makefile, in variables whose names are prefixed with URL_. These variables are safe to change, provided the downloaded file has the expected format.
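To see which links can be changed, the URL_ variables can be listed straight from the Makefile. This is a minimal sketch; the helper name `list_url_vars` is hypothetical, and the pattern assumes plain `=`, `:=`, `?=` or `+=` assignments at the start of a line.

```shell
# list_url_vars <makefile>
# Prints the assignment lines of all variables whose name starts with URL_.
list_url_vars() {
    grep -E '^URL_[A-Za-z0-9_]*[[:space:]]*[:?+]?=' "$1"
}

# Example:
# list_url_vars downloads/Makefile
```

GNU make gives variables set on the command line precedence over assignments inside a makefile, so a one-off override of the form `make download URL_EXAMPLE=...` (hypothetical variable name) may be enough, provided the value reaches downloads/Makefile.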

  • The KEGG database cannot be downloaded automatically. Instructions for obtaining the required files are printed (see also the KEGG download section below). Place the files in databases/kegg/downloads.
  • The case files are in .ped format and are specific to your project. Place them in databases/case/download.
  • Copy the resulting -vars file from Varhab to databases/annotation.
  • Copy the resulting -gts file from Varhab to databases/smallvariant.

Note that the ExAC, gnomAD and dbSNP databases are rather large and will take time to download.

KEGG download

GeneToKegg
  • https://genome.ucsc.edu
  • Tools -> Table Browser
    • group: All tables
    • assembly: GRCh37
    • table: keggPathway
    • output format: selected fields from primary and related tables
    • output file: genetokegg.tsv
    • get output
      • Linked Tables -> knownGene -> allow selection from checked tables
      • Linked Tables -> ensGtp -> allow selection from checked tables
      • Select Fields from keggPathway -> mapID
      • ensGtp fields -> gene
      • get output
KeggInfo
  • https://genome.ucsc.edu
  • Tools -> Table Browser
    • group: All tables
    • assembly: GRCh37
    • table: keggMapDesc
    • output format: selected fields from primary and related tables
    • output file: kegginfo.tsv
    • get output
      • Select Fields from keggMapDesc -> mapID
      • Select Fields from keggMapDesc -> description
      • get output
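Since both KEGG exports select exactly two fields, a quick sanity check before conversion can catch a wrong field selection. The sketch below assumes that every line of the export, including any header line, has exactly two tab-separated fields; the helper name `check_two_columns` and the `KEGG_DIR` variable are hypothetical.

```shell
# Folder named in the download notes above (assumed default).
KEGG_DIR="${KEGG_DIR:-databases/kegg/downloads}"

# check_two_columns <file>
# Succeeds if the file is non-empty and every line has exactly
# two tab-separated fields; offending lines are reported on stderr.
check_two_columns() {
    file="$1"
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    awk -F '\t' '
        NF != 2 { printf "%s: line %d has %d fields\n", FILENAME, NR, NF; bad = 1 }
        END { exit bad }
    ' "$file" >&2
}

# Example:
# check_two_columns "$KEGG_DIR/genetokegg.tsv"
# check_two_columns "$KEGG_DIR/kegginfo.tsv"
```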

Convert databases

make convert

Every conversion script defines a HEADER, an INPUT and an OUTPUT variable and, if needed, a REF variable. The names should be self-explanatory. The variables are preset to match the downloaded files. You can change INPUT and REF if needed.
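The role of the three variables can be illustrated with a small sketch. It is not one of the project's scripts: the column names, paths and the helper `write_tsv` are hypothetical, and the bcftools call is only shown as a comment.

```shell
# Hypothetical values; the real scripts preset these to the downloaded files.
HEADER=$(printf 'chromosome\tposition\treference\talternative')
INPUT="${INPUT:-databases/example/download/example.vcf.gz}"
OUTPUT="${OUTPUT:-databases/example/example.tsv}"

# write_tsv <header> <output file>
# Writes the header line, then appends the rows read from stdin.
write_tsv() {
    printf '%s\n' "$1" > "$2"
    cat >> "$2"
}

# Possible use with bcftools (format string chosen to match HEADER):
# bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' "$INPUT" | write_tsv "$HEADER" "$OUTPUT"
```

Keeping the header separate from the extraction step means that pointing INPUT at a different file does not affect the column names expected by the import.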

Note that the ExAC, gnomAD and dbSNP databases are rather large; dbSNP in particular will take time to convert.
