ssb_selection's Introduction

SSB-dN/dS Method to detect positive and negative selection in cancer

Luis Zapata

SSB-dN/dS is a tool that calculates the ratio of nonsynonymous to synonymous mutations (dN/dS) for genes using annotated variant data. Because some mutational processes are more common than others, SSB-dN/dS applies a context correction that normalizes for this bias (Somatic Substitution Bias, SSB). The tool and the results are described in the manuscript:

"Negative Selection on tumour evolution acts on essential cellular functions and the immunopeptidome".

Installation

Dependencies

bedtools 2.26.0 https://github.com/arq5x/bedtools2

R-3.3.3 or higher

Python (to install the synapse client)

R library tidyr

perl 5

variant effect predictor v89 or higher

GNU command line tools

Important Notes

  • versions of bedtools earlier than 2.26.0 will not work
  • the tab character must be encoded as \t (this might be a problem on Windows/OSX)
  • the genome file is a two-column file specifying the fasta id and the length of each sequence (see how to obtain it at the bottom)
  • restrict your input dataset to chromosomes 1-22, X and Y; remove the rest
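The genome-file and tab requirements above can be checked before running anything. The following is a minimal sketch and not part of the tool: the function name `check_genome_file` is hypothetical, and it only verifies that every line has exactly two tab-separated fields with an integer in the second field.

```shell
# Hypothetical helper (not part of SSB_selection): verify that a genome file
# has exactly two tab-separated columns and that column 2 is an integer length.
check_genome_file() {
    awk -F'\t' '
        NF != 2 || $2 !~ /^[0-9]+$/ {
            printf "line %d is not <id><TAB><length>: %s\n", NR, $0 > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Demo on a throwaway file; printf is used so that \t becomes a real tab.
demo=$(mktemp)
printf 'chr1\t249250621\nchrX\t155270560\n' > "$demo"
if check_genome_file "$demo"; then
    echo "genome file looks OK"
fi
rm -f "$demo"
```

A file that was saved with spaces instead of tabs makes the function return a non-zero status and lists the offending lines on standard error.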

To install, first clone the repository:

git clone https://github.com/luisgls/SSB_selection.git

Install the Synapse client (requires pip from Python):

pip install synapseclient

Download the zipped data files from Synapse (ID syn11681952; you need to create a user account at https://www.synapse.org/):

cd SSB_selection
synapse get -r syn11681952

Go to the tool directory and follow these steps:

1.1) Create data folder within the cloned folder

mkdir data

1.2) Unzip the data files into the data folder (Data.zip for hg19, and Data2.zip for GRCh38)

mkdir -p data/
mv Data.zip data/
mv Data2.zip data/
cd data/
unzip Data.zip
unzip Data2.zip
cd ..

1.3) Unzip the example files into the example directory

mkdir -p example
mv ExampleFile.zip example/
unzip example/ExampleFile.zip -d example

1.4) Get the genome fasta and chrom sizes files. Edit the run_negDriver script and specify the location of the genome file (e.g. hg19.genome) and the fasta file (e.g. hg19.fasta).

Genomes

To get hg19 fasta genome, you can download it from UCSC:

wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz

wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes

To get hg38 fasta genome, you can download it from UCSC:

wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
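The downloaded chrom.sizes files also list unplaced, alternate-haplotype and mitochondrial sequences. As a sketch of how a genome file restricted to chromosomes 1-22, X and Y could be derived from one, assuming the chrom.sizes file is used as the basis for the genome file: the function name `canonical_chrom_sizes` is hypothetical, and whether the tool expects names with or without the "chr" prefix is not stated here, so names are kept as they appear.

```shell
# Hypothetical helper (not part of SSB_selection): keep only chromosomes
# 1-22, X and Y from a two-column chrom.sizes file. An optional "chr" prefix
# is accepted and left unchanged.
canonical_chrom_sizes() {
    awk -F'\t' -v OFS='\t' \
        '$1 ~ /^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y)$/ { print $1, $2 }' "$1"
}

# Demo on a small sample of hg19-style entries. With real data this would be,
# for example:  canonical_chrom_sizes hg19.chrom.sizes > hg19.genome
sample=$(mktemp)
printf 'chr1\t249250621\nchrM\t16571\nchr6_apd_hap1\t4622290\nchrY\t59373566\n' > "$sample"
canonical_chrom_sizes "$sample"
rm -f "$sample"
```

In the demo only the chr1 and chrY lines are printed; chrM and the haplotype sequence are dropped.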

1.5) You also need to modify the BASEDIR parameter so that it points to the directory where you cloned the repository.

1.6) Make the main script executable

chmod +x run_negDriver

To see the help message, run

./run_negDriver

Input file

The input file is the standard output of Variant Effect Predictor (VEP), run with the following command line on an input file in the Ensembl default format:

perl variant_effect_predictor.pl -i input -o input.annotated --cache --all_refseq --assembly GRCh37 --pick --symbol --no_stats --fasta genome.fasta

If you want to filter putative germline variants use the option --plugin ExAC when running VEP. Example input files can be found on synapse: ID syn11681983

Important points before running

a) No header needed for input VEP file

b) VEP annotated first column must be in the format (chr_pos_ref/alt)

c) VEP annotated file must only have chromosomes that are 1,2,3,4...22 or uppercase X,Y

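d) If you ran VEP with the option for ExAC frequencies, remove all variants present in more than 0.1 percent of the population. ExAC_AF is an allele frequency expressed as a fraction, so check that the threshold in the command matches the cutoff you intend (0.1 percent corresponds to 0.001). You could apply the filter using: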

filter_vep -i input.annotated -f "ExAC_AF < 0.1 or not ExAC_AF" --ontology --filter "Consequence is coding_sequence_variant" 

e) Make sure you are using the GNU command line tools if you are running on macOS (https://www.topbug.net/blog/2013/04/14/install-and-use-gnu-command-line-tools-in-mac-os-x/)

f) Add the dependencies to your PATH for easy running, or hardcode their locations in the scripts

g) The genome file used in SSB is a two column file that contains the info of the name of the fasta id (column 1) and the length of that sequence (column 2).

h) The UNIX system used should be able to recognize \t as a tab separator
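Points b) and c) can be checked on an annotated file before running the tool. The following is a rough sketch under stated assumptions: the function name `report_bad_vep_ids` is hypothetical, lines starting with "#" are treated as VEP header lines and skipped, and the first column is only checked loosely against the pattern chr_pos_ref/alt with chr in 1-22, X or Y.

```shell
# Hypothetical helper (not part of SSB_selection): print the line number and
# first column of every row whose identifier does not look like
# chr_pos_ref/alt with chr in 1-22, X or Y. Lines starting with "#" are
# assumed to be VEP header lines and are skipped.
report_bad_vep_ids() {
    awk -F'\t' '
        /^#/ { next }
        $1 !~ "^([1-9]|1[0-9]|2[0-2]|X|Y)_[0-9]+_[^_/]+/[^_/]+$" { print NR ": " $1 }
    ' "$1"
}

# Demo: rows 2 and 3 use a "chr" prefix and a mitochondrial id and are reported.
vep=$(mktemp)
printf '1_12345_A/G\tcol2\nchr2_500_C/T\tcol2\nMT_10_G/A\tcol2\nX_77_-/AT\tcol2\n' > "$vep"
report_bad_vep_ids "$vep"
rm -f "$vep"
```

An empty output means every row passed this loose check; it does not replace the filter_vep step in d).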

NOTES: This version is a simplified and easy-to-use version of the tool developed in the manuscript "Negative Selection on tumour evolution acts on essential cellular functions and the immunopeptidome" (https://doi.org/10.1186/s13059-018-1434-0). For the reproducibility of the results presented in the manuscript, please contact the authors directly.

The tool to run the analysis of the immunopeptidome is called SOPRANO

Run the tool for other reference assemblies

To run the tool with version 38 of the human genome, update the paths of your GENOME and FASTA files in the run_negDriver script. Also, use the data file provided in Synapse labeled Data2.zip, which contains updated transcript information.

Remember to modify the fasta file to contain only uppercase letters.
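The UCSC fasta files are soft-masked, so repeats appear in lowercase. One possible way to convert the sequence to uppercase while leaving the ">" header lines untouched is sketched below; the function name `uppercase_fasta` and the output file name in the comment are hypothetical.

```shell
# Hypothetical helper (not part of SSB_selection): uppercase all sequence
# lines of a fasta file and leave ">" header lines exactly as they are.
uppercase_fasta() {
    awk '/^>/ { print; next } { print toupper($0) }' "$1"
}

# Demo on a tiny fasta. With real data (after gunzip) this would be,
# for example:  uppercase_fasta hg38.fa > hg38.upper.fa
fa=$(mktemp)
printf '>chr1 test\nacgtNNnn\nACgt\n' > "$fa"
uppercase_fasta "$fa"
rm -f "$fa"
```

Header lines are skipped on purpose so that fasta ids keep matching the ids in the genome file.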

ssb_selection's People

Contributors

luisgls

ssb_selection's Issues

A small bug in run_negDriver: the $OUTDIR directory is never created

Hi Luis,

When running your code, I noticed that the $OUTDIR directory is not created before it is used, around line 210 of run_negDriver.

##Add hugo name to file; line 210
if [ ! -d "$OUTDIR" ]
then
	mkdir -p "$OUTDIR"
fi

Adding a short check like the one above creates the $OUTDIR directory when it is missing.

Yours sincerely,

Qingjian
