GaPTools

GaPTools is the dbGaP data validation tool. It is distributed as a Docker image on Docker Hub. See GaPTools.md for more information about the tool.

Pre-requisites:

Docker Installation:

You must have Docker installed and working to run GaPTools. Docker is available on many operating systems, including most modern Linux distributions such as CentOS, Debian, and Ubuntu. Refer to the official Docker installation documentation for instructions for your particular operating system.

To confirm that you can run Docker under your user account, run the command below and check for a response similar to the one shown (your version and build numbers may differ). The minimum supported Docker version for GaPTools is 17.04.0.

docker -v

Docker version 19.03.6, build 369ce74a3c
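
The version requirement above can also be checked by a script. The following is a minimal sketch, not part of GaPTools: the helper name version_ge is hypothetical, and the comparison assumes GNU sort with the -V (version sort) option is available on the docker host.

```shell
# Minimum Docker version stated in this README.
MIN_DOCKER_VERSION="17.04.0"

# Hypothetical helper: succeeds if version $1 >= version $2.
# Assumes GNU sort, whose -V option orders version strings.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v docker >/dev/null 2>&1; then
  # Extract the numeric version from output such as
  # "Docker version 19.03.6, build 369ce74a3c".
  installed="$(docker -v | sed -E 's/^Docker version ([0-9.]+).*/\1/')"
  if version_ge "$installed" "$MIN_DOCKER_VERSION"; then
    echo "Docker $installed meets the minimum ($MIN_DOCKER_VERSION)"
  else
    echo "Docker $installed is older than $MIN_DOCKER_VERSION" >&2
  fi
else
  echo "docker was not found on PATH" >&2
fi
```

If the script reports that docker was not found, or the command fails with a permission error, Docker is either not installed or not usable from your user account.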

Docker Compose Installation:

GaPTools uses docker-compose to run multiple containers under a single service. Refer to the official Docker Compose installation documentation for details on how to install docker-compose.

Access to data files

The docker host running GaPTools requires access to the data files that need to be validated. The files can be on a local file system, on a network file share (NFS), or in a cloud storage bucket. Files on a network file share or in a cloud storage bucket must be mounted as a file system on the docker host. Below are some tools commonly used to mount cloud storage buckets as file systems on Linux servers:

  1. s3fs-fuse for Amazon Web Services (AWS)
  2. gcsfuse for Google Cloud Platform

Unused port 8080 on your docker host

GaPTools requires port 8080 to be available on the host system running Docker. Run the command below to check. If it produces no output, port 8080 is available on the docker host.

netstat -an | grep "8080"
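
Note that the grep above also matches other ports that contain the digits 8080 (for example 18080) and connections that are not listening sockets. As a stricter alternative, the sketch below filters the same netstat output for a socket in the LISTEN state on exactly the given port. The function name port_is_listening is hypothetical, and the filter assumes the usual netstat column layout in which the local address is the fourth field and the state is the last one.

```shell
# Hypothetical helper: reads `netstat -an` style output on stdin and
# succeeds only if a socket is in the LISTEN state on exactly port $1.
port_is_listening() {
  awk -v port="$1" '
    $NF ~ /^LISTEN/ && $4 ~ ("[:.]" port "$") { found = 1 }
    END { exit !found }
  '
}

if command -v netstat >/dev/null 2>&1; then
  if netstat -an | port_is_listening 8080; then
    echo "Port 8080 is already in use" >&2
  else
    echo "Port 8080 appears to be available"
  fi
else
  echo "netstat was not found on PATH" >&2
fi
```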

Setup

Once all pre-requisites are met, follow the instructions below to set up GaPTools. The setup can be validated using a sample study that is included as part of the GaPTools installation. The input files for the sample study are inside the input_files/1000_Genomes_Study/ directory of the cloned GaPTools GitHub repository.

For the sample study, we will have GaPTools generate all output files inside the output_files/1000_Genomes_Study/ directory.

git clone https://github.com/ncbi/gaptools

cd gaptools
mkdir -p output_files/1000_Genomes_Study

# Change file permissions to allow GaPTools to write output files on docker host
chmod -R o+w output_files

Execution

Once GaPTools is set up, run the script below from the root of the cloned GaPTools repository to execute it on the included sample study.

./dbgap-docker.bash -i ./input_files/1000_Genomes_Study/ -o ./output_files/1000_Genomes_Study -m ./input_files/1000_Genomes_Study/metadata.json up

GaPTools uses Apache Airflow behind the scenes as the workflow orchestrator to perform all the validation tasks. To view the validation results, browse to the following URL:

http://<your_docker_host_ip>:8080

At the end of the workflow, the output files will be created under the specified output directory.

Usage

To use GaPTools for your own study, modify the command above and pass the following input parameters:

-i -- path to the input files for your study

-o -- path where output files should be generated

-m -- path to the manifest file for your study
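
Before starting the service for your own study, it can help to confirm that all three paths exist. The sketch below is one way to do that and is not part of GaPTools: the function name gaptools_up_cmd is hypothetical, and it only prints the command built from the -i, -o and -m parameters described above, without running it.

```shell
# Hypothetical helper: checks the three paths and prints the
# corresponding GaPTools "up" command without executing it.
# Arguments: <input_dir> <output_dir> <manifest_file>
gaptools_up_cmd() {
  input_dir="$1"
  output_dir="$2"
  manifest="$3"

  [ -d "$input_dir" ] || { echo "input directory not found: $input_dir" >&2; return 1; }
  [ -d "$output_dir" ] || { echo "output directory not found: $output_dir" >&2; return 1; }
  [ -f "$manifest" ] || { echo "manifest file not found: $manifest" >&2; return 1; }

  # Single quotes keep paths containing spaces intact when the line is pasted.
  printf "./dbgap-docker.bash -i '%s' -o '%s' -m '%s' up\n" \
    "$input_dir" "$output_dir" "$manifest"
}
```

For the sample study, calling gaptools_up_cmd ./input_files/1000_Genomes_Study/ ./output_files/1000_Genomes_Study ./input_files/1000_Genomes_Study/metadata.json from the repository root prints a command equivalent to the one shown in the Execution section, provided the output directory has already been created.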

Stop Docker Containers

Once your study has been processed, run the command below to stop the GaPTools service.

./dbgap-docker.bash down

Contact

If you have any questions or want to report an issue, please contact us at: [email protected]
