pranjalpruthi / bhedi

βHΞDI (Biomarker-based Heuristic Engine for Dengue Identification) is a computational tool designed for the identification of Dengue virus serotypes in wastewater next-generation sequencing data.

Home Page: https://amr.igib.res.in/bhedi/

License: GNU Affero General Public License v3.0

Go 67.20% Python 32.80%
bioinformatics biomarker-discovery biomarkers dask-dataframes data-science go kmer metagenomics microsatellite palindrome

βHΞDI

Introduction

βHΞDI (Biomarker-based Heuristic Engine for Dengue Identification) is a computational tool designed for the identification of Dengue virus serotypes in wastewater next-generation sequencing data. It leverages specific genomic fragments, referred to as sankets, to detect sequences associated with the Dengue virus. This repository contains the command-line interface (CLI) and API for processing FASTQ files and identifying Dengue virus serotypes.
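
To make the idea of marker-based detection concrete, the sketch below counts reads that contain a serotype-specific fragment. It is not the βHΞDI algorithm: the marker sequences, the `MARKERS` table, and the exact-substring matching on the forward strand are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical marker fragments; the real sankets are not listed in this README.
MARKERS = {
    "DENV-1": "ATGCGTACGT",
    "DENV-2": "GGCATTCCAA",
}

def count_marker_hits(reads, markers=MARKERS):
    """Count reads containing each marker fragment (exact match, forward strand only)."""
    hits = Counter()
    for read in reads:
        seq = read.upper()
        for label, fragment in markers.items():
            if fragment in seq:
                hits[label] += 1
    return hits

reads = ["ttATGCGTACGTaa", "CCGGCATTCCAAGG", "ACGTACGTACGT", "ATGCGTACGTGGCATTCCAA"]
print(dict(count_marker_hits(reads)))
```

A real pipeline would also need to handle reverse complements, sequencing errors, and quality filtering, none of which this sketch attempts.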

Installation

Prerequisites

  • Go (1.15 or later)
  • SeqKit
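
If you want to verify the Go prerequisite from a script, a minimal sketch is to parse the text printed by `go version`. The helper names below are hypothetical; only the 1.15 minimum comes from the list above.

```python
import re

MIN_GO = (1, 15)  # minimum version stated in the prerequisites

def parse_go_version(text):
    """Extract (major, minor) from `go version` output, or None if not found."""
    match = re.search(r"\bgo(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None

def go_is_recent_enough(text, minimum=MIN_GO):
    """Return True if the parsed version is at least `minimum`."""
    version = parse_go_version(text)
    return version is not None and version >= minimum

print(go_is_recent_enough("go version go1.21.3 linux/amd64"))
```

In practice the input string would come from something like `subprocess.run(["go", "version"], capture_output=True, text=True).stdout`.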

Installing SeqKit

SeqKit is a required dependency. Install it by following the instructions in its GitHub repository: https://github.com/shenwei356/seqkit

Setting Up the BHEDI CLI Tool

  1. Clone the repository:
     git clone https://github.com/pranjalpruthi/bhedi.git
  2. Navigate to the cloned directory:
     cd bhedi
  3. Build the CLI tool:
     go build -o bhedi-cli

Usage

CLI Tool

To process the FASTQ files in a directory and write the analysis results in Parquet format, run:

./bhedi-cli -i <input_dir> -o <output_dir>

Replace <input_dir> with the directory containing your FASTQ files and <output_dir> with the directory where you want the results to be saved.
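
When the CLI is called from a larger pipeline, it can help to wrap the command above in a small function. The following is a sketch only: the function names are hypothetical, the binary is assumed to sit in the current directory as built earlier, and creating the output directory up front is a precaution, since this README does not say whether the tool creates it.

```python
import subprocess
from pathlib import Path

def build_bhedi_command(input_dir, output_dir, binary="./bhedi-cli"):
    """Return the argument list for the CLI invocation shown above."""
    return [binary, "-i", str(Path(input_dir)), "-o", str(Path(output_dir))]

def run_bhedi(input_dir, output_dir, binary="./bhedi-cli"):
    """Create the output directory, run the CLI, and raise on a non-zero exit."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_bhedi_command(input_dir, output_dir, binary), check=True)

# Show the command that would be executed, without running it.
print(build_bhedi_command("fastq", "results"))
```

Passing the arguments as a list avoids shell quoting problems with directory names that contain spaces.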

API

To start the API server, run:

go run api/main.go

The API will be available at http://localhost:3000.
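
This README does not list the API routes, so the sketch below only checks that something is accepting TCP connections on the stated host and port. The function name is hypothetical, and a successful connection does not prove that the listener is the βHΞDI API.

```python
import socket

def api_is_listening(host="localhost", port=3000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(api_is_listening())
```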

Dependencies

CLI Dependencies

  • Standard Library Packages: bufio, encoding/csv, flag, fmt, io, log, math, os, os/exec, path/filepath, strconv, strings, sync
  • Third-Party Packages: github.com/shenwei356/seqkit, github.com/cheggaaa/pb/v3, github.com/shenwei356/bio/seqio/fastx, github.com/xitongsys/parquet-go-source/local, github.com/xitongsys/parquet-go/writer

API Dependencies

  • Standard Library Packages: Same as CLI, minus flag
  • Third-Party Packages: github.com/gofiber/fiber/v2, github.com/gofiber/fiber/v2/middleware/cors, github.com/gofiber/fiber/v2/middleware/logger, plus all third-party packages listed under CLI Dependencies

Notes

  • Ensure seqkit is installed and accessible in your system's PATH.
  • Manage dependencies using Go modules (go.mod and go.sum) for reproducible builds.
  • The API component requires the Fiber web framework and its middleware for CORS and logging.
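
The first note can be checked programmatically before a run. This is a minimal sketch using Python's standard `shutil.which`; the helper name `missing_tools` is hypothetical.

```python
import shutil

def missing_tools(names):
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]

missing = missing_tools(["seqkit", "go"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools are on PATH.")
```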

Use SimP to Plot reports from βHΞDI-CLI

SimP Tool

Introduction

SimP (Simple Plotter) is a visualization tool designed to plot data processed by the βHΞDI CLI tool. It leverages Python libraries such as Pandas, Dask, HoloViews, and Plotly to generate insightful plots from Parquet files containing analysis results of Dengue virus serotypes in wastewater next-generation sequencing data. SimP supports various plot types including GC percentage box plots, serotype frequency heatmaps, and B score distributions.
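
For readers unfamiliar with the metric behind the GC percentage plots, the sketch below computes it under the usual definition: the share of bases that are G or C. How βHΞDI and SimP actually compute and store this value is not described here, so treat the function (and its choice to count every character, including ambiguous bases such as N, in the denominator) as an assumption.

```python
def gc_percent(sequence):
    """Return the percentage of G and C bases in a sequence (0.0 for an empty one)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)

for read in ["ATGC", "GGCC", "atat"]:
    print(read, gc_percent(read))
```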

Installation

Prerequisites

  • Python 3.10 or later
  • Conda or virtualenv (recommended for managing Python packages)

Dependencies

SimP requires the following Python packages:

  • pandas
  • dask
  • holoviews
  • plotly
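  • numpy

The script also uses argparse, which ships with the Python standard library and needs no separate installation.

You can install the third-party dependencies using pip:

pip install pandas dask holoviews plotly numpy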

Or, if you prefer Conda, create a new environment with the required packages and activate it:

conda create -n simp_env python=3.10 pandas dask holoviews plotly numpy
conda activate simp_env

With Mamba, the equivalent commands are:

mamba create -n simp_env python=3.10 pandas dask holoviews plotly numpy
mamba activate simp_env

Installing SimP

Currently, SimP is provided as a Python script (sim.py). Ensure you have the required dependencies installed in your environment before running the script.

Usage

To plot with SimP, specify the input directory containing the Parquet files produced by the βHΞDI CLI and the output directory where the plots will be saved:

python sim.py -i <input_dir> -o <output_dir>

Replace <input_dir> with the directory containing your Parquet files and <output_dir> with the directory where you want the plots to be saved.

Example

Assuming you have Parquet files in /path/to/parquet_files and you want to save the plots in /path/to/plots, run:

python sim.py -i /path/to/parquet_files -o /path/to/plots

This will generate various plots such as GC percentage box plots, serotype frequency heatmaps, and B score distributions, and save them as HTML files in the specified output directory.
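
The command-line shape used above (`-i` for input, `-o` for output) can be sketched with argparse as follows. This is not the source of sim.py: the function names, the `input_dir`/`output_dir` attribute names, and the `*.parquet` file pattern are assumptions for illustration.

```python
import argparse
from pathlib import Path

def parse_args(argv=None):
    """Parse -i/-o in the same shape as the sim.py commands above."""
    parser = argparse.ArgumentParser(description="List Parquet inputs for plotting.")
    parser.add_argument("-i", dest="input_dir", required=True, type=Path,
                        help="directory containing Parquet files")
    parser.add_argument("-o", dest="output_dir", required=True, type=Path,
                        help="directory where plots are saved")
    return parser.parse_args(argv)

def find_parquet_files(input_dir):
    """Return Parquet files directly inside input_dir, sorted by name."""
    directory = Path(input_dir)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.parquet"))

# Parse an explicit argument list instead of sys.argv for demonstration.
args = parse_args(["-i", "/path/to/parquet_files", "-o", "/path/to/plots"])
print(args.input_dir, args.output_dir, len(find_parquet_files(args.input_dir)))
```

Checking for an empty file list before plotting gives a clearer error than a failure deep inside the plotting code.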

Running on High-Performance Computing Clusters

SimP can also be run on HPC clusters using SLURM. Here's an example SLURM script:

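#!/bin/bash
#SBATCH --job-name=SimP
# The ./log directory must already exist; SLURM does not create it.
#SBATCH --output=./log/SimP%j.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=16GB
#SBATCH --partition=short

# Batch jobs run in a non-interactive shell, so load Conda's shell hook first
# (assumes conda is on PATH), then activate your Conda environment.
# For a Python virtual environment, source its activate script instead.
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate simp_env

# Run SimP
time python sim.py -i /path/to/parquet_files -o /path/to/plots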

Adjust the SLURM parameters according to your cluster's configuration and your job's requirements.

Contributing

Contributions to the βHΞDI project are welcome. Please refer to the CONTRIBUTING.md file for guidelines on how to contribute.

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPLv3); see the LICENSE file for details.
