dreamtf's Introduction

ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge

Instructions

Competition source: https://www.synapse.org/#!Synapse:syn6131484/wiki/402033

Install the MEME Suite; its ama tool is used to calculate TF scores along the DNA.
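Before going further, it can help to confirm that the command-line tools named in this README are on the PATH. The following is a minimal sketch, not part of the original repository; the helper name `missing_tools` is hypothetical.

```shell
# Report which of the given commands are not on PATH (hypothetical helper).
missing_tools() {
  mt_missing=""
  for mt_tool in "$@"; do
    command -v "$mt_tool" >/dev/null 2>&1 || mt_missing="$mt_missing $mt_tool"
  done
  # Print the missing names on one line; empty output means all were found.
  echo "${mt_missing# }"
}

# Tools used in the steps of this README: ama and fasta-get-markov (MEME Suite), bedtools.
not_found="$(missing_tools ama fasta-get-markov bedtools)"
if [ -n "$not_found" ]; then
  echo "Not found on PATH: $not_found" >&2
else
  echo "All required tools found."
fi
```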

Define a base directory and, inside it, a writeup directory.

Copy all the files from this GitHub repository into the base/writeup/ directory.
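The directory layout can be sketched as follows. This is an illustration under assumptions: `BASE_DIR`, `REPO_DIR`, the default path `dreamtf_base`, and the helper `setup_dirs` are hypothetical names, and `REPO_DIR` stands for wherever you cloned the repository.

```shell
# Create base/ and base/writeup/, then optionally copy the repository files in.
# Arguments: $1 = base directory, $2 = local clone of the repository (optional).
setup_dirs() {
  sd_base="$1"
  sd_repo="${2:-}"
  mkdir -p "$sd_base/writeup" || return 1
  if [ -n "$sd_repo" ] && [ -d "$sd_repo" ]; then
    # Copy the contents of the clone (not the directory itself) into writeup/.
    cp -R "$sd_repo"/. "$sd_base/writeup"/ || return 1
  fi
  echo "base:    $sd_base"
  echo "writeup: $sd_base/writeup"
}

# Hypothetical defaults; export BASE_DIR and REPO_DIR to override them.
setup_dirs "${BASE_DIR:-$PWD/dreamtf_base}" "${REPO_DIR:-}"
```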

Copy these data archives into the base directory and extract all of them:

training_data.ChIPseq.tar
training_data.DNASE_wo_bams.tar
training_data.RNAseq.tar
training_data.annotations.tar
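Extracting the four archives can be done in one loop. The sketch below assumes the archives were already copied into the base directory under the names listed above; `extract_training_archives` and `BASE_DIR` are hypothetical names.

```shell
# Extract every training_data.*.tar listed in this README into the given directory.
extract_training_archives() {
  eta_dir="$1"
  for eta_part in ChIPseq DNASE_wo_bams RNAseq annotations; do
    eta_tarball="$eta_dir/training_data.$eta_part.tar"
    if [ -f "$eta_tarball" ]; then
      tar -xf "$eta_tarball" -C "$eta_dir"
    else
      # Report missing archives without aborting, so the rest still extract.
      echo "missing: $eta_tarball" >&2
    fi
  done
}

# BASE_DIR defaults to the current directory here; set it to your base directory.
extract_training_archives "${BASE_DIR:-$PWD}"
```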

In the writeup directory, at the beginning of 'functions_for_main_program.R', set the variable tf (the transcription factor to be assessed) and the base directory.

In the annotations directory, use bedtools to extract FASTA sequences from the BED coordinates:

 bedtools getfasta -fi hg19.genome.fa -bed test_regions.blacklistfiltered.bed -fo test_regions.blacklistfiltered.fa
 bedtools getfasta -fi hg19.genome.fa -bed ladder_regions.blacklistfiltered.bed -fo ladder_regions.blacklistfiltered.fa
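A quick sanity check after this step is to compare the number of FASTA records with the number of BED regions, since bedtools getfasta normally writes one record per region. This is a sketch, not part of the original pipeline: the helper names are hypothetical, and it assumes the BED files contain no track or browser header lines.

```shell
# Count FASTA records (header lines start with '>').
count_fasta_records() {
  grep -c '^>' "$1" || true
}

# Count non-blank lines in a BED file (assumes no track/browser header lines).
count_bed_regions() {
  grep -cv '^[[:space:]]*$' "$1" || true
}

# Succeed only if the FASTA file has exactly one record per BED region.
check_fasta_matches_bed() {
  cfb_n_bed="$(count_bed_regions "$1")"
  cfb_n_fa="$(count_fasta_records "$2")"
  if [ "$cfb_n_bed" -eq "$cfb_n_fa" ]; then
    echo "OK: $2 has $cfb_n_fa records"
  else
    echo "MISMATCH: $1 has $cfb_n_bed regions, $2 has $cfb_n_fa records" >&2
    return 1
  fi
}

# Run from the annotations directory; files that do not exist yet are skipped.
for name in test_regions ladder_regions; do
  bed="$name.blacklistfiltered.bed"
  fa="$name.blacklistfiltered.fa"
  if [ -f "$bed" ] && [ -f "$fa" ]; then
    check_fasta_matches_bed "$bed" "$fa" || true
  fi
done
```

A mismatch does not necessarily mean the run failed; for example, regions that extend past the end of a chromosome are skipped by getfasta with a warning.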

Generate the background file for the MEME Suite (ama) in the writeup directory:

 fasta-get-markov ../annotations/hg19.genome.fa hg19markov.bkg

Then, in R, source the file 'functions_for_main_program.R':

source('functions_for_main_program.R')

As stated in main.R, execute the function run_execbash_sh_on_tf_folder_after_this(tf). Then, in a terminal (I used Ubuntu Xenial), go to the folder 'base/writeup/results/tf', where tf is the transcription factor defined at the beginning, and execute execbash.sh.


Important

Execute execbash.sh in the tf directory to get the file with the TF scores computed by the MEME Suite.
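The terminal step can be sketched as a small wrapper that changes into the tf folder and runs the script there. This is only an illustration: `run_execbash`, `BASE_DIR`, and `TF` are hypothetical names, and running the script with bash is an assumption based on its file name.

```shell
# Run execbash.sh inside base/writeup/results/<tf>.
# Arguments: $1 = base directory, $2 = transcription factor name.
run_execbash() {
  re_tf_dir="$1/writeup/results/$2"
  if [ ! -f "$re_tf_dir/execbash.sh" ]; then
    echo "execbash.sh not found in $re_tf_dir" >&2
    return 1
  fi
  # A subshell keeps the caller's working directory unchanged.
  (cd "$re_tf_dir" && bash ./execbash.sh)
}

# Only runs when TF is set, for example: TF=<your tf> BASE_DIR=/path/to/base
if [ -n "${TF:-}" ]; then
  run_execbash "${BASE_DIR:-$PWD}" "$TF"
fi
```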


Running Machine Learning and Generating Files to Submit (Leaderboard and Test, if any)

At this point, everything should work fine, generating the files to submit in the respective tf directory:

preprocess_writeup(tf)
dd<-load_features(tf)
# machine learning section
for(e in c(leaderboard,test)){
  xgscore<-xgbtrain(e)
  rfscore<-rftrain(e)
  fcs(xgscore,rfscore,e)
}

Name: maximus

Ricardo Paixao dos Santos

Prof. Katlin Brauer Massirer, Ph.D.

Lab of RNA and microRNA Regulation in Disease,

Center for Molecular Biology and Genetic Engineering

University of Campinas, UNICAMP

Av Candido Rondon, 400, Campinas, 13083-875, Brazil

PS: For now, this code is enough to generate the files to submit. Later I can add some graphics and insights.
