ENA Webin-CLI Bulk Submission Tool

Introduction

This tool is a wrapper to bulk submit read, un-annotated genome, targeted sequence or taxonomic reference data to the ENA using Webin-CLI.

The tool requires an appropriate metadata spreadsheet, which it uses to generate manifest files and then validate or submit the data. The tool does not handle study and sample registration; see the ENA Submissions Documentation for more information on this. The documentation also describes the manifest file fields for your type of submission (these correspond to the headers in the spreadsheet file).
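To illustrate how spreadsheet headers map onto manifest fields, here is a minimal sketch that turns one spreadsheet row into manifest text, using the Webin-CLI manifest layout of one field name and value per line. The function name `row_to_manifest` and the example row are assumptions for illustration only; this is not necessarily how the tool generates its manifests.

```python
def row_to_manifest(row):
    """Render one spreadsheet row (a dict of header -> value) as manifest text.

    Assumption: each header is already a valid manifest field name for the
    chosen submission type; this sketch does no validation of field names.
    """
    lines = []
    for header, value in row.items():
        # Skip columns that were left blank in the spreadsheet
        if value is None or str(value).strip() == "":
            continue
        lines.append(f"{header.strip().upper()} {str(value).strip()}")
    return "\n".join(lines) + "\n"


# Hypothetical row; the accessions and file name are placeholders
example_row = {
    "study": "PRJEBXXXXX",
    "sample": "ERSXXXXXXX",
    "name": "test_run_1",
    "instrument": "",
    "fastq": "data/Test_1.fq",
}
print(row_to_manifest(example_row))
```

Blank cells are dropped here so that optional fields do not produce empty manifest lines.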

An example template spreadsheet has been provided (example_template_input.txt). This file is a tab-delimited text file, but the script also accepts spreadsheets in native MS Excel formats (e.g. .xlsx) or as comma-separated values (.csv).
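As a rough sketch of how the delimited formats could be told apart by file extension, the example below uses only the Python standard library. The tool itself lists Pandas as a dependency, so this is an illustration rather than its actual loading code; the names `DELIMITERS` and `read_rows` are hypothetical, and Excel files are deliberately left out because the standard library cannot read them.

```python
import csv
from pathlib import Path

# Assumed mapping from file extension to field delimiter
DELIMITERS = {".txt": "\t", ".tsv": "\t", ".csv": ","}


def read_rows(path):
    """Read a delimited spreadsheet into a list of dicts keyed by header."""
    suffix = Path(path).suffix.lower()
    if suffix not in DELIMITERS:
        # Excel formats would need a third-party reader such as Pandas
        raise ValueError(f"unsupported spreadsheet type for this sketch: {suffix}")
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=DELIMITERS[suffix]))
```

Each returned dict represents one submission row, with the spreadsheet headers as keys.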

Installation

Docker

For ease of use, the tool has been containerised using Docker. The only requirement is to have Docker installed. Once installed, run the following commands to set up:

  1. Clone the repository: git clone https://github.com/nadimm-rahman/ena-bulk-webincli.git && cd ena-bulk-webincli
  2. Build the docker image: docker build --tag ena-bulk-webincli .
  3. Ready to go! Run the tool with Docker using the following command: docker run --rm -v <LOCAL_DATA_DIRECTORY>:/data ena-bulk-webincli -h (for help)

We recommend using the ena-bulk-webincli directory on your local machine as <LOCAL_DATA_DIRECTORY>. In the example below, Docker is used to submit reads to the test environment. The /workdir directory in the container is the working directory, containing the script, input spreadsheet and data files.

docker run --rm -v pathto/ena-bulk-webincli:/workdir ena-bulk-webincli -u Webin-XXXX -p XXXX -g reads -s /workdir/example_template_read.txt -m submit -t

Note: For data files to be submitted, the file paths in the input spreadsheet must be given as they appear inside the container, i.e. relative to where <LOCAL_DATA_DIRECTORY> is mounted. In the above example, the spreadsheet describes a file /workdir/data/Test_1.fq, which corresponds to the local file <LOCAL_DATA_DIRECTORY>/data/Test_1.fq.
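The correspondence between container paths and local paths can be sketched as a small helper. This is only an illustration of the mapping described in the note; `container_to_host` is a hypothetical name and the host directory in the usage line is a placeholder.

```python
from pathlib import PurePosixPath


def container_to_host(container_path, mount_point, host_dir):
    """Translate a path inside the container to the matching local path.

    Raises ValueError if container_path is not under mount_point.
    """
    relative = PurePosixPath(container_path).relative_to(mount_point)
    return str(PurePosixPath(host_dir) / relative)


# Placeholder host directory, mounted at /workdir as in the example above
print(container_to_host("/workdir/data/Test_1.fq", "/workdir", "/home/user/ena-bulk-webincli"))
```

A path that does not sit under the mount point raises an error, which mirrors the fact that such a file would not be visible inside the container.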

Singularity

In addition to the Docker container, a Singularity container is provided to simplify setting up and running the tool. To install Singularity, see their Installation Guide. Once installed, build the Singularity image using the definition file (ena-bulk-webincli.def):

  1. Clone the repository: git clone https://github.com/nadimm-rahman/ena-bulk-webincli.git && cd ena-bulk-webincli
  2. Build the image: sudo singularity build ena-bulk-webincli.sif ena-bulk-webincli.def
  3. Ready to go! Run the tool using singularity with the following command: singularity run --bind <LOCAL_DATA_DIRECTORY>:/data ena-bulk-webincli.sif -h (for help)

Other

To use the tool without a container:

  1. Clone the repository: git clone https://github.com/nadimm-rahman/ena-bulk-webincli.git && cd ena-bulk-webincli
  2. Download the latest version of Webin-CLI.
  3. Install the tool dependencies listed below.
  4. Run the tool using python bulk_webincli.py --help (for help).
  5. Provide the path to your downloaded webin-cli.jar file using the -w (--webinCliPath) option.
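For orientation, a wrapper of this kind ultimately has to launch the downloaded jar with Java. The sketch below assembles such a command as an argument list, using Webin-CLI option names as described in its documentation (check them against your installed version). The function `build_webin_command` is hypothetical and is not taken from the tool's source.

```python
def build_webin_command(jar_path, username, password, context, manifest,
                        mode, test=False, output_dir=None):
    """Assemble a Webin-CLI invocation as a list of arguments (not run here)."""
    if mode not in ("validate", "submit"):
        raise ValueError(f"mode must be 'validate' or 'submit', got {mode!r}")
    cmd = [
        "java", "-jar", jar_path,
        "-context", context,
        "-userName", username,
        "-password", password,
        "-manifest", manifest,
        f"-{mode}",
    ]
    if output_dir:
        cmd += ["-outputDir", output_dir]
    if test:
        # Target the Webin test service rather than production
        cmd.append("-test")
    return cmd


# Placeholder credentials and manifest path
print(build_webin_command("webin-cli.jar", "Webin-XXXXX", "XXXXX",
                          "reads", "manifests/example.txt", "validate", test=True))
```

Building the command as a list and passing it to something like `subprocess.run` without a shell means that special characters in the password are not reinterpreted by the shell.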

Within the input spreadsheet, the script accepts full paths to the files to be submitted (e.g. fastq/fasta). To control the location of outputs, provide a specific directory using the --directory/-d parameter; the folders listed below will be generated there.

Usage

The mandatory arguments are the Webin submission account username (-u) and password (-p), the genetic context (-g) and the metadata spreadsheet (-s). The --test/-t flag can be specified to use the Webin test submission services.
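A command-line interface with these arguments could be declared roughly as follows. This is a sketch built from the flags that appear in this README, not the tool's actual parser: the `dest` names, the help strings and the defaults are assumptions, and only the option strings themselves come from the text.

```python
import argparse


def build_parser():
    """Declare the flags used in this README (destination names are assumed)."""
    parser = argparse.ArgumentParser(description="Bulk Webin-CLI submission sketch")
    parser.add_argument("-u", dest="username", required=True, help="Webin account username")
    parser.add_argument("-p", dest="password", required=True, help="Webin account password")
    parser.add_argument("-g", dest="context", required=True, help="genetic context, e.g. reads or genome")
    parser.add_argument("-s", dest="spreadsheet", required=True, help="metadata spreadsheet")
    parser.add_argument("-m", dest="mode", choices=["validate", "submit"], help="validate or submit")
    parser.add_argument("-d", "--directory", dest="directory", default=".", help="output directory")
    parser.add_argument("-w", "--webinCliPath", dest="webin_cli_path", help="path to webin-cli.jar")
    parser.add_argument("-pc", dest="cores", type=int, default=None, help="number of cores for parallel runs")
    parser.add_argument("-t", "--test", dest="test", action="store_true", help="use the test service")
    return parser


# Parse an explicit argument list instead of sys.argv (placeholder credentials)
args = build_parser().parse_args(
    ["-u", "Webin-XXXXX", "-p", "XXXXX", "-g", "reads", "-s", "INPUT_SPREADSHEET", "-m", "submit", "-t"]
)
print(args.mode, args.test)
```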

By default, the script utilises two additional directories:

  1. 'manifests' - houses all generated manifest files and report files.
  2. 'submissions' - houses all validation and submission related reports and files, including the analysis and receipt XMLs of submissions.
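The two output folders can be pictured with the following minimal sketch, which creates them under a chosen base directory. The helper name `prepare_output_dirs` is hypothetical, and the tool may create its folders differently.

```python
from pathlib import Path


def prepare_output_dirs(base_dir="."):
    """Create the 'manifests' and 'submissions' folders under base_dir."""
    base = Path(base_dir)
    paths = {}
    for name in ("manifests", "submissions"):
        paths[name] = base / name
        # exist_ok lets repeated runs reuse the same output directory
        paths[name].mkdir(parents=True, exist_ok=True)
    return paths
```

Calling `prepare_output_dirs("OUTPUT_DIRECTORY")` would correspond to passing `-d OUTPUT_DIRECTORY` on the command line.
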

Examples

Submitting reads to the test environment (sequential):

python bulk_webincli.py -u Webin-XXXXX -p XXXXX -g reads -s INPUT_SPREADSHEET -m submit -t

docker run --rm -v localpathto/ena-bulk-webincli:/workdir ena-bulk-webincli -u Webin-XXXXX -p XXXXX -g reads -s /workdir/INPUT_SPREADSHEET -m submit -t

singularity run --bind localpathto/ena-bulk-webincli:/workdir ena-bulk-webincli.sif -u Webin-XXXXX -p XXXXX -g reads -s /workdir/INPUT_SPREADSHEET -m submit -t

 

Submitting genomes to the production environment (in parallel with 5 cores):

python bulk_webincli.py -u Webin-XXXXX -p XXXXX -g genome -s INPUT_SPREADSHEET -m submit -pc 5

docker run --rm -v localpathto/ena-bulk-webincli:/workdir ena-bulk-webincli -u Webin-XXXXX -p XXXXX -g genome -s /workdir/INPUT_SPREADSHEET -m submit -pc 5

singularity run --bind localpathto/ena-bulk-webincli:/workdir ena-bulk-webincli.sif -u Webin-XXXXX -p XXXXX -g genome -s /workdir/INPUT_SPREADSHEET -m submit -pc 5
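The idea behind the parallel option can be sketched as running one submission per spreadsheet row across a pool of workers. The tool lists joblib as a dependency; the example below swaps in the standard library's `concurrent.futures` purely for illustration, and both `run_submissions` and the `submit_one` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_submissions(rows, submit_one, cores=1):
    """Apply submit_one to every row, sequentially or with a worker pool.

    Results are returned in the same order as the input rows.
    """
    if cores <= 1:
        return [submit_one(row) for row in rows]
    # Threads suit this sketch because each submission would mostly wait
    # on an external process and network transfer
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return list(pool.map(submit_one, rows))


# Stand-in worker that only reports which row it handled
print(run_submissions(["row1", "row2", "row3"], lambda row: f"done {row}", cores=5))
```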

 

Validating reads, specifying an output directory (sequential):

python bulk_webincli.py -u Webin-XXXXX -p XXXXX -g reads -s INPUT_SPREADSHEET -d OUTPUT_DIRECTORY -m validate

docker run --rm -v localpathto/ena-bulk-webincli:/workdir ena-bulk-webincli -u Webin-XXXXX -p XXXXX -g reads -s /workdir/INPUT_SPREADSHEET -d /workdir/OUTPUT_DIRECTORY -m validate

singularity run --bind localpathto/ena-bulk-webincli:/workdir ena-bulk-webincli.sif -u Webin-XXXXX -p XXXXX -g reads -s /workdir/INPUT_SPREADSHEET -d /workdir/OUTPUT_DIRECTORY -m validate

Dependencies

In addition to Webin-CLI, the tool requires Python 3.6+ and the Python packages Pandas and joblib. These can be installed in a virtual environment. If using Aspera instead of FTP to upload files with Webin-CLI, ensure that you have downloaded Aspera and included it in your $PATH.

ena-bulk-webincli's People

Contributors

cocathail, ismailm, nadimm-rahman

ena-bulk-webincli's Issues

Non-default argument follows default argument syntax error

When running the following command: "python3 read_validator_4.py -s read_validator_test_spreadsheet.xlsx -g genome -u Webin-55868 -p '#####' -m submit -t" the following error is thrown:

line 114
def webin_cli_validate_submit(WEBIN_USERNAME, WEBIN_PASSWORD, manifest_file, context, mode, upload_file_dir="", center_name="", test):
^
SyntaxError: non-default argument follows default argument
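Python requires parameters without defaults to come before those with defaults, which is why the definition above is rejected. One possible fix, shown here as a sketch, is to move `test` ahead of the defaulted parameters. The function body is a placeholder and any existing callers that pass arguments positionally would need to be updated to match.

```python
def webin_cli_validate_submit(WEBIN_USERNAME, WEBIN_PASSWORD, manifest_file,
                              context, mode, test,
                              upload_file_dir="", center_name=""):
    """Same parameters as in the report, reordered so the definition is valid."""
    # Placeholder body: echo the arguments back instead of calling Webin-CLI
    return {
        "username": WEBIN_USERNAME,
        "manifest": manifest_file,
        "context": context,
        "mode": mode,
        "test": test,
        "upload_file_dir": upload_file_dir,
        "center_name": center_name,
    }


# Placeholder values for illustration
print(webin_cli_validate_submit("Webin-XXXXX", "XXXXX", "manifest.txt", "genome", "submit", True))
```

An alternative would be to keep `test` last and give it a default such as `test=False`.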

no csh support

There are issues with how the tool passes characters depending on the shell interpreter. Using the C shell (csh) throws errors with certain character combinations. Recommend telling users to run the tool only in bash.

Create more efficient folders for performance

Scalability can start to break down as the number of submissions increases. More than 10,000 submissions creates a very large number of files and folders which the script needs to search through. A more performant method of searching may be needed for the program to scale.
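One possible approach, sketched below, is to spread submission folders across subdirectories keyed by a short hash prefix, so that no single directory holds every submission and a folder can be located directly from its name without searching. This is an assumption about a potential design, not something the tool currently does, and `shard_path` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def shard_path(base_dir, name, width=2):
    """Return base_dir/<hash prefix>/<name>, computed from the name alone."""
    prefix = hashlib.md5(name.encode("utf-8")).hexdigest()[:width]
    return Path(base_dir) / prefix / name


# The same name always maps to the same folder, so lookups need no search
print(shard_path("submissions", "sample_1"))
```

With a two-character hexadecimal prefix there are 256 possible subdirectories, so 10,000 submissions would average roughly 40 folders each.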
