MutScape: an analytical toolkit for probing the mutational landscape in cancer genomics

Introduction

MutScape is a user-friendly Python toolkit that provides a comprehensive pipeline for filtering, combining, transforming, analyzing, and visualizing somatic mutation data, so that researchers can easily explore cohort-level mutational characteristics in cancer genomics. MutScape not only preprocesses millions of mutation records in a few minutes but also runs various analyses simultaneously. It supports somatic variant data in both Variant Call Format (VCF) and Mutation Annotation Format (MAF), and leverages caller combination strategies to quickly eliminate false positives. With only two simple commands, robust results and publication-quality images are generated automatically.

Quick installation

Before running the quick installation, make sure that you have installed Miniconda3, created a new conda environment, and activated it. To let the installation run smoothly, also confirm that the Internet connection stays up throughout and that the server/computer has enough free storage.

git clone https://github.com/anitalu724/MutScape.git
bash MutScape/mutscape/installation/quickInstall_1.sh
bash vcf2maf-1.6.20/MutScape/mutscape/installation/quickInstall_2.sh
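
The prerequisites described above (conda installed, a dedicated environment active) can be checked before launching the scripts. The following is a minimal sketch, not part of MutScape: the function name `preflight` is hypothetical, and it relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means ready to install.

    `env` and `which` are injectable so the check can be exercised
    without a real conda installation.
    """
    env = os.environ if env is None else env
    problems = []
    if which("conda") is None:
        problems.append("conda is not on PATH")
    # `conda activate <name>` exports CONDA_DEFAULT_ENV=<name>.
    active = env.get("CONDA_DEFAULT_ENV", "")
    if active in ("", "base"):
        problems.append("no dedicated conda environment is active")
    return problems


# Usage: print any warnings before running the quick-install scripts.
for problem in preflight():
    print("WARNING:", problem)
```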

Prerequisite installation

Requirements

The latest tested versions are given in parentheses:

  1. Using Miniconda (py37_4.9.2) to install:

    samtools (v1.10), ucsc-liftover (v377), bcftools (v1.10.2), htslib (v1.10.2) and ensembl-vep (v102.0)

  2. Download vcf2maf (v1.6.20) and git clone MutScape (v1.0)

  3. Download VEP cache data of GRCh37 and the reference FASTA (v102.0)

Install Miniconda3

Many of the modules this toolkit depends on are installed with conda. If you have never installed conda, please refer to the Miniconda website. For best compatibility, we recommend installing Miniconda3-py37_4.9.2 (SHA256 hash 79510c6e7bd9e012856e25dcb21b3e093aa4ac8113d9aa7e82a86987eabe1c31).

There is a script for users to install Miniconda quickly.

wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh
sha256sum Miniconda3-py37_4.9.2-Linux-x86_64.sh
bash Miniconda3-py37_4.9.2-Linux-x86_64.sh
export PATH="$HOME/miniconda3/bin:$PATH"
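
The `sha256sum` step above only prints a digest; it still has to be compared with the published hash by eye. As a sketch of automating that comparison, the snippet below hashes the installer in chunks with the standard library and compares it with the hash quoted above. The helper names `sha256_of` and `matches` are illustrative only.

```python
import hashlib

# Hash quoted in this README for Miniconda3-py37_4.9.2-Linux-x86_64.sh.
EXPECTED = "79510c6e7bd9e012856e25dcb21b3e093aa4ac8113d9aa7e82a86987eabe1c31"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED):
    """True when the file's digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `matches("Miniconda3-py37_4.9.2-Linux-x86_64.sh")` should be true before the installer is run with `bash`.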

We recommend running MutScape in a brand-new conda environment.

conda create --name MutScape
conda activate MutScape

Install Ensembl's VEP

If you have already installed Ensembl's VEP, you may skip this part and go directly to the next part to install vcf2maf. (However, you must confirm that your VEP version is compatible with vcf2maf. We recommend installing ensembl-vep=102.0.)

conda install -c bioconda -c conda-forge samtools=1.10 ucsc-liftover=377 bcftools=1.10.2 htslib==1.10.2
conda install -c bioconda -c conda-forge -c defaults ensembl-vep=102.0 
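
After the two `conda install` commands, it can help to confirm that the tools are reachable from the active environment. This is a rough sketch rather than an official check; the executable names in `REQUIRED_TOOLS` are assumptions about what each conda package usually provides (for example `liftOver` from ucsc-liftover and `vep` from ensembl-vep).

```python
import shutil

# Assumed executable names; adjust if your packages install different ones.
REQUIRED_TOOLS = ["samtools", "bcftools", "bgzip", "tabix", "liftOver", "vep"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the given order."""
    return [tool for tool in tools if which(tool) is None]


# Usage: an empty list means every tool was found.
print("Missing tools:", missing_tools())
```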

Install vcf2maf

MutScape converts VCF to MAF with the vcf2maf utility, which handles variant annotation and transcript prioritization. You can refer to this script or just follow the commands below. (Before this step, make sure that you have installed Ensembl's VEP.)

wget https://github.com/mskcc/vcf2maf/archive/refs/tags/v1.6.20.tar.gz
tar -zxf v1.6.20.tar.gz
cd vcf2maf-1.6.20
perl vcf2maf.pl --man
perl maf2maf.pl --man

Before using vcf2maf, download the VEP cache data and the reference FASTA.
⚠️ These files are quite large, so the download may take a long time!
⚠️ Make sure that you have at least 30 GB of free storage!
ℹ️ We recommend downloading 102_GRCh37.
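
The 30 GB requirement above can be checked before starting the download. The sketch below uses `shutil.disk_usage` from the standard library; it assumes the requirement refers to free disk space on the filesystem that will hold `$HOME/.vep`, and the helper names are illustrative.

```python
import shutil

REQUIRED_GB = 30  # threshold taken from the warning above


def free_gb(path):
    """Free space on the filesystem holding `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path, required_gb=REQUIRED_GB):
    """True when at least `required_gb` gigabytes are free at `path`."""
    return free_gb(path) >= required_gb
```

For example, `has_room(os.path.expanduser("~"))` (after `import os`) reports whether the home directory's filesystem has enough room for the cache and the reference FASTA.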

mkdir -p $HOME/.vep/homo_sapiens/102_GRCh37/
wget ftp://ftp.ensembl.org/pub/grch37/release-102/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
mv Homo_sapiens.GRCh37.dna.toplevel.fa.gz  $HOME/.vep/homo_sapiens/102_GRCh37/
gzip -d $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
bgzip -i $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
samtools faidx $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
wget ftp://ftp.ensembl.org/pub/release-102/variation/indexed_vep_cache/homo_sapiens_vep_102_GRCh37.tar.gz
mv homo_sapiens_vep_102_GRCh37.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_102_GRCh37.tar.gz -C $HOME/.vep/
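
Once the commands above finish, the reference directory should contain the bgzip-compressed FASTA together with its indexes. The sketch below lists the file names that `bgzip -i` (`.gzi`) and `samtools faidx` (`.fai`) normally write next to the compressed file and reports any that are absent. The function names are hypothetical, and the check only tests for existence, not integrity.

```python
import os


def expected_reference_files(fasta_gz):
    """Compressed FASTA plus the index files usually written alongside it."""
    return [fasta_gz, fasta_gz + ".gzi", fasta_gz + ".fai"]


def missing_reference_files(fasta_gz, exists=os.path.exists):
    """Return the expected files that are not present on disk."""
    return [p for p in expected_reference_files(fasta_gz) if not exists(p)]


# Usage with the path from the commands above.
fasta = os.path.expanduser(
    "~/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz"
)
print("Missing:", missing_reference_files(fasta))
```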

Install MutScape

MutScape is hosted on GitHub. Clone the repository:

git clone https://github.com/anitalu724/MutScape.git

If pip is not installed yet, install it with conda.

conda install -c anaconda pip

To make sure all the code runs smoothly, install the modules that MutScape uses:

cd MutScape/mutscape
bash installation/install_module.sh

Implementation

MutScape is separated into two main modules: data preprocessing, and analysis and visualization. For the detailed structure, please refer to Fig. 1.

Data Preprocessing

MutScape accepts both VCF and MAF files as input data. To process multiple VCF/MAF files simultaneously, MutScape requires a TSV file in a specific format as input. For the detailed format, please refer to example files such as examples/tsv/testData_vcf.tsv and examples/tsv/testData_maf.tsv, or see the Wiki.
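
A quick sanity check on the input TSV can catch mistyped file paths before preprocessing starts. The sketch below deliberately makes no assumption about the column layout (which is documented in the Wiki and the example files): it scans every cell, treats any value ending in `.vcf`, `.vcf.gz`, or `.maf` as a path, and reports those that do not exist. The function name `missing_inputs` is hypothetical, and relative paths are resolved against the current working directory, which may differ from how MutScape resolves them.

```python
import csv
import os


def missing_inputs(tsv_path, suffixes=(".vcf", ".vcf.gz", ".maf"), exists=os.path.isfile):
    """Return path-like cells in the TSV that do not point to an existing file."""
    missing = []
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            for cell in row:
                cell = cell.strip()
                # Only cells that look like VCF/MAF paths are checked.
                if cell.lower().endswith(suffixes) and not exists(cell):
                    missing.append(cell)
    return missing
```

For example, `missing_inputs("examples/tsv/testData_vcf.tsv")` returns an empty list when every listed file can be found.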

Quick start from VCFs

For VCFs as input data, -f, -o and -m are required while -vf, -ra, -v2m and -mf are optional. Some simple test commands are displayed below.
See Wiki for detailed information.

S1

python3 dataPreprocess.py \
-f examples/tsv/testData_vcf.tsv \
-o examples/output \
-m examples/meta \
-vf CI "*,*,*,6,*,*,*,*"


python3 dataPreprocess.py \
-f examples/tsv/testData_vcf.tsv \
-o examples/output \
-m examples/meta \
-vf GI [1,3] \
-v2m 8 


python3 dataPreprocess.py \
-f examples/tsv/testData_vcf.tsv \
-o examples/output \
-m examples/meta \
-vf GI "{1: [*,*], 2 : [1, 300000]}" CI "15,15,0,6,0,0.05,8,8" PA 0 AV 0.9 \
-v2m 


python3 dataPreprocess.py \
-f examples/tsv/testData_vcf.tsv \
-o examples/output \
-m examples/meta \
-v2m 8 \
-mf GI [1,3]

  • Reject and accept list (-ra)

    A schematic diagram is shown in S2.

    python3 dataPreprocess.py \
    -f examples/tsv/testData_vcf.tsv \
    -ra examples/test_data/vcf/reject.vcf examples/test_data/vcf/accept.vcf \
    -o examples/output \
    -m examples/meta \
    -vf CI "*,*,*,6,*,*,*,*" \
    -v2m 8 \
    -mf GI [1,3]
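
Filter arguments such as `[1,3]` and `*,*,*,6,*,*,*,*` contain characters that a shell may expand as glob patterns when left unquoted. One way to sidestep this when scripting runs is to build the argument list in Python and quote it only for display. The sketch below uses only the flags shown in the commands above; the wrapper function `preprocess_command` is hypothetical and not part of MutScape.

```python
import shlex


def preprocess_command(tsv, out_dir, meta_dir, vcf_filters=(), v2m=None, maf_filters=()):
    """Build the argument list for dataPreprocess.py from the flags shown above."""
    cmd = ["python3", "dataPreprocess.py", "-f", tsv, "-o", out_dir, "-m", meta_dir]
    if vcf_filters:
        cmd += ["-vf"] + list(vcf_filters)
    if v2m is not None:
        cmd += ["-v2m", str(v2m)]
    if maf_filters:
        cmd += ["-mf"] + list(maf_filters)
    return cmd


cmd = preprocess_command(
    "examples/tsv/testData_vcf.tsv",
    "examples/output",
    "examples/meta",
    vcf_filters=["CI", "*,*,*,6,*,*,*,*"],
    v2m=8,
    maf_filters=["GI", "[1,3]"],
)
# shlex.quote wraps arguments containing shell metacharacters in single quotes.
shell_line = " ".join(shlex.quote(arg) for arg in cmd)
print(shell_line)
```

Passing `cmd` to `subprocess.run(cmd, check=True)` avoids the shell entirely, so no quoting is needed in that case.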

Quick start from MAFs

For MAFs as input data, -f, -o and -m are required while -mf is optional. Some simple test commands are displayed below.

python3 dataPreprocess.py \
-f examples/tsv/testData_maf.tsv \
-mf GI [1:3] \
-o examples/output \
-m examples/meta 


python3 dataPreprocess.py \
-f examples/tsv/testData_maf.tsv \
-mf GI [1:3] CI "15,15,0,0,0,0.05,8,8" TE [BLCA,5] PAC 1 HY 500 \
-o examples/output \
-m examples/meta

Analysis and Visualization

MutScape provides nine different analyses, some of which also generate plots.
See Wiki for detailed information.

Quick start

Some simple test commands are displayed below.

  1. Significantly mutated gene detection

    python3 mafAnalysis.py \
    -f examples/test_data/maf/TCGA_test.maf \
    -smg \
    -o examples/output \
    -p examples/pic/
    
  2. Known cancer gene annotation

    python3 mafAnalysis.py \
    -f examples/test_data/maf/TCGA_test.maf \
    -kcga \
    -o examples/output \
    -p examples/pic/
    
  3. Mutation burden statistics

    python3 mafAnalysis.py \
    -f examples/test_data/maf/TCGA_test.maf \
    -tmb 60456963 \
    -o examples/output \
    -p examples/pic/
    
  4. CoMut plot analysis

    The output figure resembles Fig. 2.
    See Wiki for detailed information.

    python3 mafAnalysis.py \
    -f examples/test_data/maf/TCGA_test.maf \
    -cm 60456963 \
    -o examples/output \
    -p examples/pic/
    
    
    python3 mafAnalysis.py \
    -cmp examples/tsv/comut.tsv examples/tsv/comut_info.tsv 0 comut.pdf \
    -o examples/output \
    -p examples/pic/
    
  5. Mutational signature

    Signature refitting: the output figure of -ms 0 is shown in the Wiki.
    De novo extraction: the output figures of -ms 1 and -ms 2 resemble Fig. 3.

    python3 mafAnalysis.py \
    -f examples/test_data/maf/ms.maf \
    -ms 0 "[SBS1, SBS5, SBS40, SBS87]" \
    -o examples/output \
    -p examples/pic/
    
    
    python3 mafAnalysis.py \
    -f examples/test_data/maf/ms.maf \
    -ms 1 "[2,9,10]" \
    -o examples/output \
    -p examples/pic/
    
    
    python3 mafAnalysis.py \
    -f examples/test_data/maf/ms.maf \
    -ms 2 "[3]" \
    -o examples/output \
    -p examples/pic/
    
  6. HRD Score

    The output figures resemble Fig. 4A, B.

    python3 mafAnalysis.py \
    -hrd examples/tsv/hrd.tsv grch37 \
    -o examples/output \
    -p examples/pic/
    
  7. Whole-genome doubling (WGD) and Chromosome instability (CIN)

    The output figures resemble Fig. 4C, D.

    python3 mafAnalysis.py \
    -wgdcin examples/tsv/hrd.tsv \
    -o examples/output \
    -p examples/pic/
    
  8. HRD, CIN and WGD Comparison

    The output figure resembles Fig. 5.

    python3 mafAnalysis.py \
    -hcwc examples/tsv/hcw_comparison.tsv grch37 \
    -o examples/output \
    -p examples/pic/
    
  9. Actionable mutation (drug) annotation

    oncokb-annotator was free under the GPL 3.0 license.
    Obtain [your_oncokb_token] from the OncoKB website. You must create your own account to get a personal API token.
    The output figure resembles Fig. 6.

    python3 mafAnalysis.py \
    -f examples/test_data/maf/TCGA_test.maf \
    -oncokb ../oncokb-annotator/ [your_oncokb_token] 4 examples/test_data/oncokb/clinical_input.txt \
    -o examples/output \
    -p examples/pic/
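
As a worked illustration of the mutation burden statistics in item 3, tumor mutation burden is commonly reported as mutations per megabase. The sketch below assumes that the number passed to `-tmb` (60456963 in the example) is the length of the sequenced region in base pairs; that reading is an assumption here, so check the Wiki for the exact definition MutScape uses. The function name `tmb_per_mb` is illustrative.

```python
def tmb_per_mb(mutation_count, region_length_bp):
    """Mutations per megabase: count divided by region length in Mb."""
    if region_length_bp <= 0:
        raise ValueError("region_length_bp must be positive")
    return mutation_count / (region_length_bp / 1_000_000)


# Example with a hypothetical sample carrying 605 mutations.
print(round(tmb_per_mb(605, 60456963), 2))
```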
    

Reference

If you use MutScape in your work, please cite

Lu, C. H., Wu, C. H., Tsai, M. H., Lai, L. C., & Chuang, E. Y. (2021). MutScape: an analytical toolkit for probing the mutational landscape in cancer genomics. NAR Genomics and Bioinformatics, 3(4), lqab099.

mutscape's Issues

Impossible to install

Hi

I am currently attempting to install MutScape using Conda, but I am encountering difficulties as the installation process seems to be unsuccessful.

I have followed the specified steps, but it appears that there might be an issue with the installation. I would greatly appreciate any guidance or assistance you could provide to troubleshoot and resolve this matter.

Best

Victor

Error installing MutScape

Hi,
Thank you for developing this useful tool.
I encountered an error when I tried to install MutScape using quick installation.
The error message is as follows:

===============
Install PyVCF...

Collecting PyVCF
Using cached PyVCF-0.6.8.tar.gz (34 kB)
ERROR: Command errored out with exit status 1:
command: /home/lordaaa/miniconda3/envs/MutScape/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-yhsoj5_v/pyvcf/setup.py'"'"'; file='"'"'/tmp/pip-install-yhsoj5_v/pyvcf/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-g_fi1ew8
cwd: /tmp/pip-install-yhsoj5_v/pyvcf/
Complete output (1 lines):
error in PyVCF setup command: use_2to3 is invalid.
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

The environment I use is a clean, newly created Google Cloud VM, and the OS is Ubuntu 18.04.6 LTS.
Any suggestions?

Thank you,
Yi-Chen

GRCh37 vs GRCh38

Hi, the readme indicates a requirement for GRCh37. Does MutScape support vcf files generated with alignment to GRCh38? I presume I would need to track down all the equivalent GRCh38 files and substitute them in the set up.

MutScape for CCLE maf file

Hello, I would like to use your toolkit to analyze CCLE MAF files. However, the column names are not in exactly the same format as in the TCGA MAF. Is there a way to work around this? For example, the CCLE MAF file does not have a protein position column.

Thank you for this tool.

Best,
Shwetha

Add Support for Freebayes somatic variant caller

A commonly used NGS pipeline for variant calling is nf-core/sarek. The variant callers implemented in this pipeline are Mutect2, Strelka and Freebayes. It would be nice to add support for Freebayes to this package to allow a seamless transition from variant calling to interpretation using MutScape.

Killed in the step of Start VCF combination

Hi, I used Mutect2 to generate a VCF file and passed it to MutScape, and then I got the error message in the title.
The steps "Start VCF combination", "Reading TSV file" and "Formalizing VCF files" passed, and the run was killed at the step "Start VCF combination" (it only prints "Killed"), so I do not know what happened.
Thanks for your answer.
