Bioinformatic-resources

Trio & family

Quality control

  • adapterremoval: rapid adapter trimming, identification, and read merging

Tools with bam

  • Alfred: BAM alignment statistics, feature counting and feature annotation
  • bamkit: Tools for common BAM file manipulations
  • bam-readcount: count DNA sequence reads in BAM files
  • bamtools
  • biobambam2: Tools for early stage alignment file processing
  • mosdepth: fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
  • VariantBam: Filtering and profiling of next-generation sequencing data using region-specific rules

Alignment (Illumina)

  • bwa: Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)

fusion gene

  • arriba: Fast and accurate gene fusion detection from RNA-Seq data
  • FuSeq: A fast detection of fusion genes from paired-end RNA-seq data
  • GeneFuse: Gene fusion detection and visualization
  • fusioncatcher: Finder of Somatic Fusion Genes in RNA-seq data
  • STAR-Fusion: STAR-Fusion codebase
  • STAR-Fusion-Tutorial: Tutorial for STAR-Fusion, FusionInspector, and de novo reconstruction of fusion transcripts using Trinity

16S rRNA resources

data format

16S rRNA gene database

  • RDP: RDP provides quality-controlled, aligned and annotated Bacterial and Archaeal 16S rRNA sequences
  • SILVA: SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences
  • Greengenes: Greengenes is a quality controlled, comprehensive 16S reference database and taxonomy based on a de novo phylogeny that provides standard operational taxonomic unit sets
  • rrnDB: A searchable database documenting variation in ribosomal RNA operons (rrn) in Bacteria and Archaea
  • EzTaxon-e: It contains comprehensive 16S rRNA gene sequences of taxa with valid names as well as sequences of uncultured taxa

Tools

  • dada2: Accurate sample inference from amplicon data with single nucleotide resolution

Metagenome

Tools

pipeline

Single cell transcriptome

  • BISCUIT_SingleCell_IMM_ICML_2016: R Codebase for BISCUIT: Infinite Mixture Model to cluster and impute single cells.
  • cisTopic: Probabilistic modelling of cis-regulatory topics from single cell epigenomics data
  • CONICS: COpy-Number analysis In single-Cell RNA-Sequencing
  • DoubletDetection: Doublet detection in single-cell RNA-seq data.
  • dropSeqPipe: A SingleCell RNASeq pre-processing pipeline built on snakemake
  • HoneyBADGER: HMM-integrated Bayesian approach for detecting CNV and LOH events from single-cell RNA-seq data
  • ImmuneResistance: Single-cell RNA-seq of melanoma ecosystems reveals sources of T cell exclusion linked to immunotherapy clinical outcomes
  • inferCNV: Inferring CNV from Single-Cell RNA-Seq
  • SAVER: Single-cell RNA-seq Gene Expression Recovery
  • scanpy: Single-Cell Analysis in Python. Scales to >1M cells. http://scanpy.rtfd.io
  • scde: R package for analyzing single-cell RNA-seq data
  • scImpute: Accurate and robust imputation of scRNA-seq data
  • scell: Single-CELL rna-seq analysis software
  • scg_lib_structs: Collections of library structure and sequence of popular single cell genomic methods
  • single_cell_portal_core: Rails/Docker application for the Broad Institute single cell RNA-seq data portal
  • single-cell-pseudotime: An overview of algorithms for estimating pseudotime in single-cell RNA-seq data
  • single-cell-tutorial
  • SingleR: Single-cell RNA-seq cell types Recognition
  • scRNA-tools: Table of software for the analysis of single-cell RNA-seq data.
  • seurat: R toolkit for single cell genomics
  • snATAC: Ren Lab in-house dual-barcode single nucleus ATAC-seq (snATAC-seq) analysis pipeline
  • STREAM_atac: Single-cell Trajectories Reconstruction, Exploration And Mapping of single-cell data. Preprocessing steps for single cell atac-seq data
  • STREAM: Single-cell Trajectories Reconstruction, Exploration And Mapping of single-cell data
  • tenx: Pipelines for the analysis of 10x single-cell RNA-sequencing data
  • awesome-single-cell
  • scPipe: a pipeline for single cell RNA-seq data analysis
  • Linnarsson Lab Single-cell analysis of mouse cortex
  • Human MTG single nucleus RNA-seq data
  • scMerge: Statistical technique for removing unwanted variation from multiple scRNA-seq datasets
  • scRNA-seq-workshop-Fall-2018
  • SoupX: R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data

Nanopore

pipeline

  • NanoDJ: A Dockerized Jupyter Notebook for Interactive Oxford Nanopore MinION Sequence Manipulation and Genome Assembly
  • gnomad-sv-pipeline: Code and custom scripts relevant to gnomAD-SV (Collins*, Brand*, et al., 2019)
  • sv-benchmark: Public Benchmark of Long-Read Structural Variant Caller on PacBio CCS HG002 Data
  • Pomoxis: comprises a set of basic bioinformatic tools tailored to nanopore sequencing
  • Nanoflow: a NANOpore sequencing data bioinformatics workFLOW
  • Scrappie: a technology demonstrator for the Oxford Nanopore Research Algorithms group
  • wub: Tools and software library developed by the ONT Applications group
  • nanopore-scripts
  • nano-snakemake: A snakemake pipeline for SV analysis from nanopore genome sequencing
  • pipeline-pinfish-analysis: Pipeline for annotating genomes using long read transcriptomics data with pinfish
  • hpv_minION_analysis: Contains scripts used to analyze HPV samples sequenced on ONT MinIONs.
  • Nanopype: https://nanopype.readthedocs.io/en/stable/
  • tiptoft: Predict plasmids from uncorrected long read data
  • nanoflow: De novo assembly of nanopore reads using nextflow
  • monica: MinION Open Nucleotide Identifier for Continuous Analysis - an open source pathogen identifier for real-time analysis on MinION output
  • Step by step blasr installation example

quality control

  • albacore: Oxford Nanopore basecaller (since superseded by Guppy)
  • Basecalling-comparison: A comparison of different Oxford Nanopore basecallers
  • fast5_fetcher: A tool for fetching nanopore fast5 files after filtering via demultiplexing, alignment, or other, to improve downstream processing efficiency
  • SquiggleKit: A toolkit for manipulating nanopore signal data
  • fast5seek: Subset of fast5 files contained in a fastq, BAM, or SAM file
  • albacore: Dockerfile for the Albacore basecaller from Oxford Nanopore
  • npBarcode: Demultiplex barcoded Oxford Nanopore sequencing
  • npReader: Real-time extraction and analysis Oxford Nanopore sequencing data
  • nanopore adapters
  • NanoFilt: https://github.com/wdecoster/nanofilt
  • Deepbinner: a signal-level demultiplexer for Oxford Nanopore reads
  • Porechop: adapter trimmer for Oxford Nanopore reads
  • poretools: a toolkit for working with Oxford nanopore data
  • NanoPlot: Plotting scripts for long read sequencing data
  • longread_plots: A collection of plots for long read sequencing FastQ files from devices like Oxford Nanopore's MinION and PromethION.
  • Nanopolish
  • nanoQC: Quality control tools for nanopore sequencing data
  • NanoR: R package for user-friendly analysis and comparison of ONT data
  • pomoxis: Analysis components from Oxford Nanopore Research
  • poretools document
  • poretools github: a toolkit for working with Oxford nanopore data
  • qcat: qcat is a Python command-line tool for demultiplexing Oxford Nanopore reads from FASTQ files
  • pycoQC: pycoQC computes metrics and generates interactive QC plots from the sequencing summary report generated by Oxford Nanopore Technologies basecallers (Albacore/Guppy)
  • nanopack: Easily install all nanopack scripts together
  • nanocomp: Comparison of multiple long read datasets
  • nanolyse: Remove lambda phage reads from a fastq file
  • nanomath: A few simple math functions for other Oxford Nanopore processing scripts
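
Several of the QC tools above report read-length summaries such as N50. As a rough illustration only, here is a minimal sketch of that statistic in plain Python. The function name `n50` is hypothetical and is not taken from any of the listed packages; the definition assumed is the common one (the length L such that reads of length L or longer contain at least half of all bases).

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths (hypothetical helper).

    Reads are visited from longest to shortest; the N50 is the length of the
    read at which the running total first reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input: no reads, so report 0


read_lengths = [2, 3, 4, 5, 6]
print(n50(read_lengths))
```

In practice the lengths would come from a FASTQ file or a sequencing summary rather than a hard-coded list.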

Assembly

  • NovoGraph: building whole genome graphs from long-read-based de novo assemblies
  • wtdbg2: A fuzzy Bruijn graph approach to long noisy reads assembly
  • smartdenovo: Ultra-fast de novo assembler using long noisy reads
  • MECAT2
  • quickmerge: A simple and fast metassembler and assembly gap filler designed for long molecule based assemblies.
  • npGraph: Resolve assembly graph in real-time using nanopore data
  • Canu
  • shasta: De novo assembly from Oxford Nanopore reads

Variants

  • Longshot: diploid SNV caller for error-prone reads
  • NanoSatellite: Dynamic time warping of Oxford Nanopore squiggle data to characterize tandem repeats
  • Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing

Methylation

  • deepsignal: Detecting methylation using signal-level features from Nanopore sequencing reads
  • tombo: a suite of tools primarily for the identification of modified nucleotides from raw nanopore sequencing data
  • DeepMod: a deep-learning tool for genomic-scale, strand-sensitive and single-nucleotide based detection of DNA modifications
  • nanopore-methylation
  • mCaller: A Python program to call methylation (m6A in DNA) from nanopore signal data
  • EpiNano: Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads

Mapping tools

  • graphmap: A highly sensitive and accurate mapper for long, error-prone reads
  • rkmh: Classify sequencing reads using MinHash

simulator

  • NanoSim: Nanopore sequence read simulator
  • DeepSimulator: The first deep learning based Nanopore simulator which can simulate the process of Nanopore sequencing.

data

Transcriptome

Pacbio data

  • PacBioEDA: Python scripts for Exploratory Data Analysis of Pacific Biosciences sequence data
  • GenomicConsensus: PacBio® variant and consensus caller
  • pbalign: pbalign maps PacBio reads to reference sequences and saves alignments to a BAM file
  • pbmm2: A minimap2 frontend for PacBio native data formats

Assembly genome

Assembly with short reads

  • w2rap-contigger: An Illumina PE genome contig assembler, can handle large (17Gbp) complex (hexaploid) genomes.
  • w2rap: WGS (Wheat) Robust Assembly Pipeline
  • GFA-spec: Graphical Fragment Assembly (GFA) Format Specification
  • HapCUT2: software tools for haplotype assembly from sequence data
  • masurca: MaSuRCA Genome Assembler Quick Start Guide
  • minia: Minia is a short-read assembler based on a de Bruijn graph
  • npScarf: Scaffold and Complete assemblies in real-time fashion
  • redundans: Redundans is a pipeline that assists an assembly of heterozygous/polymorphic genomes.
  • Scaff10X: Pipeline for scaffolding and breaking a genome assembly using 10x genomics linked-reads
  • SDA: Segmental Duplication Assembler (SDA)
  • shovill: Faster SPAdes assembly of Illumina reads
  • SOAPdenovo2

Assembly with long reads

  • FALCON-Phase: FALCON-Phase integrates PacBio long-read assemblies with Phase Genomics Hi-C data to create phased, diploid, chromosome-scale scaffolds
  • wtdbg2: A fuzzy Bruijn graph approach to long noisy reads assembly
  • DBG2OLC: The genome assembler that reduces the computational time of human genome assembly from 400,000 CPU hours to 2,000 CPU hours, utilizing long erroneous 3GS sequencing reads and short accurate NGS sequencing reads.
  • Flye: Fast and accurate de novo assembler for single molecule sequencing reads
  • PBcR (http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR)
  • SALSA: A tool to scaffold long read assemblies with Hi-C data
  • smartdenovo: Ultra-fast de novo assembler using long noisy reads
  • NovoGraph: Genome Graph of Long-read De Novo Assemblies

Gap filling & polishing

  • quickmerge: A simple and fast metassembler and assembly gap filler designed for long molecule based assemblies.
  • PBJelly: Gap-closing-with-PBJelly
  • GapCloser

Assembly transcriptome

  • Corset: Software for clustering de novo assembled transcripts and counting overlapping reads

Variants

Somatic variants

  • ascatNgs: Somatic copy number analysis using paired-end whole-genome sequencing
  • needlestack: Multi-sample somatic variant caller
  • seurat: Tumor-Normal Variant Caller
  • facets: Algorithm to implement Fraction and Copy number Estimate from Tumor/normal Sequencing.
  • Shimmer: a software package for the characterization of genetic differences between two very similar samples, e.g., a tumor sample and its matched normal tissue sample
  • neusomatic: Deep convolutional neural networks for accurate somatic mutation detection
  • Pisces: Somatic and germline variant caller for amplicon data.
  • deTiN: DeTiN is designed to measure tumor-in-normal contamination and improve somatic variant detection sensitivity when using a contaminated matched control.
  • DeepSVR: a machine learning model approach to somatic variant refinement
  • somaticseq: An ensemble approach to accurately detect somatic mutations using SomaticSeq
  • MuSiC2: identifying mutational significance in cancer genomes

Germline variants

  • benchmarking germline small-variant calls: Repository for the GA4GH Benchmarking Team work developing standardized benchmarking methods for germline small variant calls
  • vt: A tool set for short variant discovery in genetic sequence data
  • dna-seq-gatk-variant-calling: This Snakemake pipeline implements the GATK best-practices workflow
  • deepvariant: an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
  • speedseq: A flexible framework for rapid genome analysis and interpretation
  • vg: tools for working with genome variation graphs
  • GEMINI: integrative exploration of genetic variation and genome annotations

Haplotype & phase

Imputation

LD

  • PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format (VCF) files
  • emeraLD: tools to efficiently retrieve and calculate LD
  • ngsLD: Calculation of pairwise Linkage Disequilibrium (LD) under a probabilistic framework
  • LD Hub
  • LDSC and associated files
  • abstar: VDJ assignment and antibody sequence annotation. Scalable from a single sequence to billions of sequences.
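
For readers new to LD, the r² statistic can be sketched in a few lines. This is a simplified illustration, not how any listed tool is implemented: it assumes phased haplotypes at two biallelic loci coded as 0/1, whereas tools such as ngsLD work from genotype likelihoods. The function name `ld_r2` is hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci (hypothetical helper).

    `haplotypes` is a list of (a, b) tuples, one per phased haplotype,
    where a and b are 0/1 allele codes at the first and second locus.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # allele frequency, locus 1
    p_b = sum(b for _, b in haplotypes) / n          # allele frequency, locus 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # undefined when either locus is monomorphic
    return d * d / denom


linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(linked), ld_r2(unlinked))
```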

Structural variants

SV Caller for third generation sequences

  • svim: Structural Variant Identification Method using Long Reads
  • SURVIVOR: Toolset for SV simulation, comparison and filtering
  • Sniffles: Structural variation caller using third generation sequencing
  • NanoSV: SV caller for nanopore data
  • smrtsv2: long read structural variant caller
  • pbsv: PacBio structural variant (SV) calling and analysis tools
  • Picky: Structural Variants Pipeline for Long Reads
  • NanoVar: Structural variant caller using low-depth Nanopore sequencing

SV with illumina data

  • svtyper: Bayesian genotyper for structural variants
  • lumpy-sv: a general probabilistic framework for structural variant discovery
  • parliament2: Runs a combination of tools to generate structural variant calls on whole-genome sequencing data
  • delly: Structural variant discovery by integrated paired-end and split-read analysis
  • manta: Structural variant and indel caller for mapped sequencing data
  • SV2: Support Vector Structural Variation Genotyper
  • pindel: detects breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants from paired-end short reads
  • MetaSV: An accurate and integrative structural-variant caller for next generation sequencing
  • svaba: Structural variation and indel detection by local assembly
  • wham: Structural variant detection and association testing
  • gridss: Genomic Rearrangement IDentification Software Suite
  • breakdancer: SV detection from paired end reads mapping
  • SVenX: Pipeline for SV detection using 10X genomics data
  • paragraph: Graph realignment tools for structural variants
  • svtools: Tools for processing and analyzing structural variants
  • SnowmanSV: Structural variation and indel detection using rolling local string graph assembly
  • truvari: Structural variant comparison tool for VCFs

CNV

  • Control-FREEC: a tool for assessing copy number and allelic content using next generation sequencing data
  • canvas: Canvas Copy Number Variant Caller
  • CNVnator: a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads
  • cnv_facets: Somatic copy number variant (CNV) caller for next generation sequencing
  • CNV-Visualizer: Visualizing Copy Number Variations
  • facets: Algorithm to implement Fraction and Copy number Estimate from Tumor/normal Sequencing.
  • cnvkit: Copy number variant detection from targeted DNA sequencing
  • ADTEx: detect somatic copy number variations (CNVs)
  • NGSEPcore: an integrated framework for analysis of high throughput sequencing (HTS) reads. The main functionality of NGSEP is the variant detector, which performs integrated discovery and genotyping of single nucleotide variants (SNVs), insertions, deletions, and genomic regions with copy number variation (CNVs)
  • aCNViewer: Comprehensive genome-wide visualization of absolute copy number and copy neutral variations
  • cancerTitanCNA: Analysis of subclonal copy number alterations (CNA) and loss of heterozygosity (LOH)

CNV workflow

General

  • svaba: Structural variation and indel detection by local assembly
  • truvari: Structural variant comparison tool for VCFs
  • smoove: structural variant calling and genotyping with existing tools, but, smoothly
  • sv-pipeline: Pipeline for structural variation detection in cohorts
  • svtools: Tools for processing and analyzing structural variants
  • samplot: Plot structural variant signals from many BAMs and CRAMs
  • svviz2: visual evaluation of read support for structural variation

SV annotation

  • AnnotSV: Annotation and Ranking of Human Structural Variations
  • Nirvana: Nirvana provides clinical-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, and SVs, including CNVs)
  • StructuralVariantAnnotation: R package designed to simplify structural variant analysis

SV (CNV) annotation database

GWAS

QTL

  • QTLseqr: QTLseqr is an R package for QTL mapping using NGS Bulk Segregant Analysis

ATAC

ChIP

Methylation

  • Methylation QTL data for brain and blood
  • methylpy: WGBS/NOMe-seq Data Processing & Differential Methylation Analysis
  • ViewBS: a powerful toolkit for visualization of high-throughput bisulfite sequencing data
  • mCaller: A Python program to call methylation (m6A in DNA) from nanopore signal data
  • DNA-methylation-analysis: notes on DNA methylation analysis (arrays and sequencing data)
  • bs3: BS-Seeker3: An Ultra-fast, Versatile Pipeline for Mapping Bisulfite-treated Reads
  • bsseq: Devel repository for bsseq

Hi-C

  • Hi-C data
  • tadtool: an interactive tool for the identification of meaningful parameters in TAD-calling algorithms for Hi-C data.
  • juicebox_scripts: A collection of scripts for working with Hi-C data, Juicebox, and other genomic file formats
  • ALLHiC: phasing and scaffolding polyploid genomes based on Hi-C data
  • genomedisco: Software for comparing contact maps from HiC, CaptureC and other 3D genome data
  • 3DChromatin_ReplicateQC: Software to compute reproducibility and quality scores for Hi-C data
  • hic_breakfinder

10X data

ngs

  • ngsDist: Estimation of pairwise distances under a probabilistic framework
  • NGS-pipe: next-generation sequencing pipelines for precision oncology
  • ngsPopGen: Population genetics analyses from NGS data
  • ngsTools: Programs to analyse NGS data for population genetics purposes
  • viral-ngs: Viral genomics analysis pipelines
  • NGSCheckMate: Software program for checking sample matching for NGS data
  • abtools: Analysis of antibody NGS data
  • alignment-and-variant-calling-tutorial: basic walk-throughs for alignment and variant calling from NGS sequencing data

Plotter

UMI

  • zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
  • umis: Tools for processing UMI RNA-tag data

TCR

  • tcR: Advanced Data Analysis of Immune Receptor Repertoires

Bioinformatics tutorial

database & Websites

Deal with vcf

  • cyvcf2: fast VCF and BCF processing
  • CyVCF document
  • CyVCF: A fast Python library for VCF files leveraging Cython for speed.
  • rtg-tools: Utilities for accurate VCF comparison and manipulation
  • spVCF: Sparse Project VCF: evolution of VCF to encode population genotype matrices efficiently
  • vcf2phylip: Convert SNPs in VCF format to PHYLIP, NEXUS, binary NEXUS, or FASTA alignments for phylogenetic analysis
  • vcflib: a simple C++ library for parsing and manipulating VCF files, + many command-line utilities
  • GTShark: Genotype compression in large projects
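
As a quick orientation to what these libraries parse, the sketch below splits one VCF data line into its eight fixed, tab-separated columns using only the standard library. `parse_vcf_line` is a hypothetical helper, not part of cyvcf2, vcflib, or any other tool listed here, and it ignores headers, genotype columns, and value typing; for real work one of the libraries above is the better choice.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_dict[key] = value if value else True  # bare flag -> True

    return {
        "chrom": chrom,
        "pos": int(pos),                      # 1-based position
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),                # multiple ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,                    # values are left as strings
    }


record = parse_vcf_line("chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=30;DB")
print(record["chrom"], record["pos"], record["alt"], record["info"])
```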

Websites for cancer data

Blogs

Labs

Tool resources

Python

Machine Learning
