A robust method for detecting positive selection on regulatory sequences

We developed a method to detect positive selection on transcription factor binding sites (TFBSs) based on changes in binding affinity during evolution. The method compares the observed binding affinity change to a null distribution. The effects of substitutions on binding affinity can be accurately predicted by deltaSVM (Lee et al. 2015), a machine learning based method that predicts the effects of regulatory variation de novo from sequence.

  1. Procedure for detecting positive selection

1). Training of the gapped k-mer support vector machine (gkm-SVM)

First, we defined a positive training set and a corresponding negative training set. The positive training set consists of ChIP-seq narrow peaks of transcription factors. The negative training set is an equal number of sequences randomly sampled from the genome, matched to the positive training set in length, GC content and repeat fraction. It was generated with “genNullSeqs”, a function of the gkm-SVM R package (Ghandi et al. 2016). We then trained a gkm-SVM with default parameters except -l=10 (meaning that 10-mers are used as features to distinguish the positive and negative training sets). Classification performance of the trained gkm-SVM was measured with receiver operating characteristic (ROC) curves under fivefold cross-validation. Training and cross-validation were performed with the “gkmtrain” function of “LS-GKM: a new gkm-SVM software for large-scale datasets” (Lee 2016). For details, please check https://github.com/Dongwon-Lee/lsgkm.
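To make the evaluation step concrete, here is a minimal sketch of how the area under the ROC curve can be computed from cross-validation predictions. It is not part of this repository: the function name `roc_auc` and the convention that a label of 1 marks a positive sequence are assumptions made for illustration, and the scores would come from whatever cross-validation output your gkm-SVM run produces.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    The AUC equals the probability that a randomly chosen positive
    sequence receives a higher SVM score than a randomly chosen
    negative one (ties count as one half).
    Assumption: label == 1 marks a positive, anything else a negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores: both positives outscore both negatives.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, -1, -1]))
```

The pairwise loop is quadratic in the number of sequences. That is acceptable for a sketch, but a rank-based implementation would be preferable for large peak sets.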

2). Generate SVM weights of all possible 10-mers based on the trained gkm-SVM

The SVM weights of all possible 10-mers were generated by using the “gkmpredict” function of “LS-GKM”.
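Scoring every 10-mer requires a sequence file that lists them. The sketch below shows one way such a file could be prepared in FASTA format before passing it to “gkmpredict”. The function names and the choice to enumerate all 4^10 sequences on one strand are assumptions; the original workflow may instead have used a non-redundant list that collapses reverse complements.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Yield every possible k-mer over the alphabet in lexicographic order."""
    for letters in product(alphabet, repeat=k):
        yield "".join(letters)


def write_kmer_fasta(path, k=10):
    """Write all k-mers to a FASTA file, using the k-mer itself as the record name.

    Returns the number of records written (4**k for the DNA alphabet).
    """
    count = 0
    with open(path, "w") as out:
        for kmer in all_kmers(k):
            out.write(f">{kmer}\n{kmer}\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical output file name; 4**10 = 1,048,576 records.
    print(write_kmer_fasta("all_10mers.fa", k=10))
```

Using the k-mer as its own record name means the prediction output can be read back directly as a table of k-mer weights.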

3). Infer ancestor sequence

The ancestor sequence was inferred from sequence alignment with a sister species and an outgroup.
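The text does not state which inference rule was used. As an illustration only, the sketch below applies a simple parsimony rule to three aligned sequences of equal length: the ancestral base is the one shared by at least two of the focal, sister and outgroup sequences, and a column with a gap or no agreement is marked `N`. The function name and the handling of gaps are assumptions.

```python
def infer_ancestor(focal, sister, outgroup):
    """Infer the focal/sister ancestor column by column with a parsimony rule.

    Assumptions: the three sequences are already aligned (same length),
    gaps are written as '-', and unresolved columns are reported as 'N'.
    """
    if not (len(focal) == len(sister) == len(outgroup)):
        raise ValueError("aligned sequences must have the same length")

    ancestor = []
    for f, s, o in zip(focal.upper(), sister.upper(), outgroup.upper()):
        if f == s or f == o:
            base = f      # focal state is supported by another lineage
        elif s == o:
            base = s      # the substitution happened on the focal lineage
        else:
            base = "N"    # three different states: not resolvable
        if base not in "ACGT":
            base = "N"    # gaps and ambiguity codes are left unresolved
        ancestor.append(base)
    return "".join(ancestor)


if __name__ == "__main__":
    # The last column differs only in the focal species, so the ancestor keeps 'A'.
    print(infer_ancestor("ACGT", "ACGA", "ACGA"))
```

Positions where the inferred ancestor differs from the focal sequence are the substitutions assigned to the focal lineage.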

4). Infer positive selection

Once we had the SVM weights of all possible 10-mers and both the ancestor and focal sequences, we inferred signals of positive selection with "testPosSelec.pl". This script is in the "scripts" folder. It was modified from "deltasvm.pl", a script contributed by Lee et al. (2015) that calculates deltaSVM scores.
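The idea behind this step can be sketched in a few lines. The code below is not a port of "testPosSelec.pl". It assumes that the deltaSVM of a sequence pair is the summed weight of all 10-mers in the focal sequence minus the same sum for the ancestor, and that the null distribution is built by placing the same number of random substitutions on the ancestor. The function names, the reverse-complement fallback, the default of 1000 permutations and the pseudocount in the p-value are all illustrative choices.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def seq_score(seq, weights, k=10):
    """Sum the SVM weights of every k-mer in seq."""
    total = 0.0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        w = weights.get(kmer)
        if w is None:
            # Assumption: the weight table may list only one strand;
            # k-mers missing on both strands contribute 0.
            w = weights.get(revcomp(kmer), 0.0)
        total += w
    return total


def delta_svm(ancestor, focal, weights, k=10):
    """Predicted affinity change from ancestor to focal sequence."""
    return seq_score(focal, weights, k) - seq_score(ancestor, weights, k)


def random_mutant(ancestor, n_subs, rng):
    """Copy of the ancestor with n_subs substitutions at random positions."""
    seq = list(ancestor)
    for pos in rng.sample(range(len(seq)), n_subs):
        seq[pos] = rng.choice([b for b in "ACGT" if b != seq[pos]])
    return "".join(seq)


def positive_selection_pvalue(ancestor, focal, weights, k=10, n_null=1000, seed=0):
    """Return (observed deltaSVM, one-sided empirical p-value).

    Assumes ungapped ancestor and focal sequences of equal length.
    """
    rng = random.Random(seed)
    n_subs = sum(a != f for a, f in zip(ancestor, focal))
    observed = delta_svm(ancestor, focal, weights, k)
    null = [
        delta_svm(ancestor, random_mutant(ancestor, n_subs, rng), weights, k)
        for _ in range(n_null)
    ]
    exceed = sum(d >= observed for d in null)
    # Pseudocount keeps the empirical p-value strictly above zero.
    return observed, (exceed + 1) / (n_null + 1)
```

A small p-value means the observed substitutions increased predicted binding affinity more than random substitutions usually do, which is the signal interpreted as positive selection. In practice `weights` would be loaded from the 10-mer weight table generated in step 2.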

  2. Scripts used to generate all figures in the paper

Please check "selection_analysis.R" in the "scripts" folder

  3. Data used to generate all figures in the paper

Please check the "data" folder

  4. References

Ghandi M, Mohammad-Noori M, Ghareghani N, Lee D, Garraway L, Beer MA. 2016. gkmSVM: an R package for gapped-kmer SVM. Bioinformatics 32:2205–2207.

Lee D. 2016. LS-GKM: a new gkm-SVM for large-scale datasets. Bioinformatics 32:2196–2198.

Lee D, Gorkin DU, Baker M, Strober BJ, Asoni AL, McCallion AS, Beer MA. 2015. A method to predict the impact of regulatory variants from DNA sequence. Nat. Genet. 47:955–961.
