enaget

A simple script for downloading ENA fastq data by accession number.

Usage

-h, --help          show this help message and exit
-i INPUT, --input INPUT
                    Path to file containing list of accession numbers.
-o OUTPUT, --output OUTPUT
                    Path to output fastq files.
-l, --list          Generates a list of URLs for retrieval at a later
                    time (i.e. does not invoke wget).
-r, --remove        Removes temporary files generated during URL resolution.
-p, --parallel      Executes wget in parallel for faster downloads.
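The help text above has the layout that Python's argparse produces. Assuming the script is a Python/argparse program (this README doesn't confirm it), the same options could be declared as in the sketch below. The README doesn't say whether any option is mandatory, so none are marked required here; build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options listed in the usage text."""
    parser = argparse.ArgumentParser(
        description="Download ENA fastq data by accession number."
    )
    # Value-taking options; argparse derives the INPUT/OUTPUT metavars from the long names.
    parser.add_argument("-i", "--input",
                        help="Path to file containing list of accession numbers.")
    parser.add_argument("-o", "--output",
                        help="Path to output fastq files.")
    # Boolean switches: present -> True, absent -> False.
    parser.add_argument("-l", "--list", action="store_true",
                        help="Generates a list of URLs for retrieval at a later "
                             "time (i.e. does not invoke wget).")
    parser.add_argument("-r", "--remove", action="store_true",
                        help="Removes temporary files generated during URL resolution.")
    parser.add_argument("-p", "--parallel", action="store_true",
                        help="Executes wget in parallel for faster downloads.")
    return parser
```

For example, build_parser().parse_args(["-i", "accessions.txt", "-o", "reads", "-p"]) yields input="accessions.txt", output="reads", parallel=True and list=False.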

Input File

The script expects a text file containing a list of accession numbers, one per line, e.g.:

ACC000001
ACC000002
ACC000003
SRR589123

It retrieves only the fastq files stored under each listed accession number. Any accession number attributed to a single sample will work.
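A minimal sketch of reading that input file, assuming one accession per line and ignoring blank lines and surrounding whitespace (the README doesn't say how the real script treats either). parse_accessions and read_accessions are hypothetical names; the parsing is split from the file read so it can be checked on a plain string.

```python
from pathlib import Path


def parse_accessions(text: str) -> list[str]:
    """Return one accession per non-blank line, with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_accessions(path: str) -> list[str]:
    """Read an accession list file like the example above."""
    return parse_accessions(Path(path).read_text())
```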

It won't work with study accession numbers. If you want to retrieve all data in a particular study, or a single accession number, I'd recommend using ENA's own scripts: https://github.com/enasequence/enaBrowserTools. They are also useful if you want something other than fastqs.

How does it work?

The script queries ENA with each provided accession number and locates the corresponding file report. The fastq URLs are extracted from this report and handed to wget.
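The lookup step can be sketched against ENA's Portal API file-report endpoint. This is an assumption: the README doesn't say which ENA endpoint enaget calls, and the URL, the result=read_run parameter and the fastq_ftp field below come from the Portal API, not from the script. In that report, paired-end files share one fastq_ftp cell separated by ";" and are listed without a URL scheme, so the sketch prepends "ftp://". All function names are hypothetical.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumed endpoint (ENA Portal API); not confirmed to be what enaget uses.
FILEREPORT_URL = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a file-report query asking for the fastq_ftp field of the runs under an accession."""
    params = {"accession": accession, "result": "read_run", "fields": "fastq_ftp"}
    return FILEREPORT_URL + "?" + urllib.parse.urlencode(params)


def fastq_urls_from_report(report_tsv: str) -> list[str]:
    """Extract fastq URLs from a tab-separated file report (one row per run)."""
    urls: list[str] = []
    for row in csv.DictReader(io.StringIO(report_tsv), delimiter="\t"):
        cell = (row.get("fastq_ftp") or "").strip()
        # Paired files are ';'-separated; skip empty pieces.
        for path in filter(None, cell.split(";")):
            urls.append(path if "://" in path else "ftp://" + path)
    return urls


def fetch_fastq_urls(accession: str) -> list[str]:
    """Query ENA over the network and return the fastq URLs for one accession."""
    with urllib.request.urlopen(filereport_url(accession), timeout=60) as resp:
        return fastq_urls_from_report(resp.read().decode("utf-8"))
```

fetch_fastq_urls needs network access. Keeping the parsing in its own function means it can be checked against a saved report.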

This means you should be able to use any of the sample accession numbers ENA uses, including cross-archived SRA accessions, secondary accessions, etc.

It has worked on everything I've passed to it so far; let me know if you run into problems.

I wanna go fast

Use the -p flag to run wget under GNU parallel. This usually spawns as many wget processes as you have available threads, allowing multiple concurrent downloads instead of fetching files one by one. This is useful on high-speed connections or when downloading many files. In my experience the script is generally a lot faster than fastq-dump with default settings, and --parallel/-p is in a different league entirely.
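A sketch of handing a URL list to GNU parallel from Python, assuming the URLs have already been resolved. When parallel is given a command but no ":::" arguments, it reads one argument per line from stdin and appends each to the command, so it runs one wget per URL. wget's -P (--directory-prefix) sets the output directory. The optional jobs value maps to parallel's -j; without it, parallel uses its own default number of job slots. The function names and the injectable run parameter are hypothetical, and the real script's invocation may differ.

```python
import subprocess
from typing import Callable, Optional, Sequence


def parallel_wget_command(outdir: str, jobs: Optional[int] = None) -> list[str]:
    """Build a GNU parallel command that runs one wget per URL read from stdin."""
    cmd = ["parallel"]
    if jobs is not None:
        cmd += ["-j", str(jobs)]  # cap the number of concurrent downloads
    cmd += ["wget", "-P", outdir]  # each stdin line is appended as wget's URL
    return cmd


def download_parallel(urls: Sequence[str], outdir: str, jobs: Optional[int] = None,
                      run: Callable[..., object] = subprocess.run):
    """Feed URLs to GNU parallel on stdin; with subprocess.run, raises CalledProcessError on failure."""
    return run(parallel_wget_command(outdir, jobs),
               input="\n".join(urls) + "\n", text=True, check=True)
```

Passing URLs on stdin rather than after ":::" avoids hitting command-line length limits when downloading many files.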

How do I add options to wget?

Edit the wget call at line 117 of the script. Additional flags must be enclosed in quotation marks, with each space-separated part given as its own quoted item, separated by commas. Alternatively, use the -l flag to generate a list of URLs that you can retrieve yourself.
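The quoting rule suggests the wget call is a Python subprocess argument list. This is an assumption, since line 117 isn't reproduced here. If so, `wget -c --tries 5 URL` becomes "wget", "-c", "--tries", "5", URL. The hypothetical helper below does that split with shlex, so you don't have to quote and comma-separate each token by hand. -c (--continue, resume partial downloads) and --tries (retry count) are standard wget options used purely as examples.

```python
import shlex


def wget_command(url: str, extra_flags: str = "") -> list[str]:
    """Build a wget argument list, splitting extra flags the way a shell would."""
    # shlex.split turns "-c --tries 5" into ["-c", "--tries", "5"].
    return ["wget", *shlex.split(extra_flags), url]
```

The result can be passed straight to subprocess.run(wget_command(url, "-c --tries 5"), check=True).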

Contributors

stevenjdunn
