Competition: **source: https://www.synapse.org/#!Synapse:syn6131484/wiki/402033**
Install the MEME Suite; its ama tool is used to calculate TF scores along the DNA.
Define a base directory and, inside it, a writeup directory.
Copy all the GitHub files into the base/writeup/ directory.
Copy the following data files into the base directory:
training_data.ChIPseq.tar
training_data.DNASE_wo_bams.tar
training_data.RNAseq.tar
training_data.annotations.tar
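The layout described above can be sketched in shell as follows. This is a minimal sketch, assuming the base directory is held in a variable named BASE (a hypothetical name, defaulting here to ./base). The text only says to copy the archives, so unpacking them inside the base directory is also an assumption.

```shell
# Sketch: create the base/writeup layout and unpack the training archives.
# BASE is a hypothetical variable name; point it at your own base directory.
BASE="${BASE:-./base}"

mkdir -p "$BASE/writeup"

# Archive names are taken from the list above.
for archive in \
    training_data.ChIPseq.tar \
    training_data.DNASE_wo_bams.tar \
    training_data.RNAseq.tar \
    training_data.annotations.tar
do
    if [ -f "$BASE/$archive" ]; then
        # Assumption: each archive is unpacked inside the base directory.
        tar -xf "$BASE/$archive" -C "$BASE"
    else
        echo "missing: $BASE/$archive (copy it into $BASE first)" >&2
    fi
done
```

Archives that have not been copied yet are reported and skipped, so the snippet can be re-run after adding the missing files.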
Under the writeup directory, at the beginning of 'functions_for_main_program.R', set the variable tf to the transcription factor to be assessed, and set the base directory.
In the annotations directory, use bedtools to extract FASTA sequences from the BED coordinates:
bedtools getfasta -fi hg19.genome.fa -bed test_regions.blacklistfiltered.bed -fo test_regions.blacklistfiltered.fa
bedtools getfasta -fi hg19.genome.fa -bed ladder_regions.blacklistfiltered.bed -fo ladder_regions.blacklistfiltered.fa
Then build the background Markov model with fasta-get-markov from the MEME Suite (the ../annotations path implies the command is run from a directory alongside annotations):
fasta-get-markov ../annotations/hg19.genome.fa hg19markov.bkg
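The two bedtools getfasta calls above differ only in the region-set name, so they can be generated from one loop. The sketch below assumes it is used inside the annotations directory. It defines a hypothetical helper, getfasta_cmd, that only prints each command, so the commands can be checked before they are run.

```shell
# Sketch: build the bedtools getfasta command for each region set.
# getfasta_cmd is a hypothetical helper, not part of bedtools or this repository.
getfasta_cmd() {
    # $1 is the region-set prefix, e.g. test_regions or ladder_regions
    printf 'bedtools getfasta -fi hg19.genome.fa -bed %s.blacklistfiltered.bed -fo %s.blacklistfiltered.fa\n' "$1" "$1"
}

# Print one command per region set.
for regions in test_regions ladder_regions; do
    getfasta_cmd "$regions"
done
```

Once the printed commands look right, they can be executed by piping the loop output to sh.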
Then, in R, source the file 'functions_for_main_program.R':
source('functions_for_main_program.R')
As stated in main.R, execute the function run_execbash_sh_on_tf_folder_after_this(tf). Then, in a terminal (I used Ubuntu Xenial), go to the folder 'base/writeup/results/tf', where tf is the transcription factor defined at the beginning, and execute execbash.sh there to get the file with the TF scores computed by the MEME Suite.
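The terminal step above can be sketched as follows. BASE and TF are hypothetical variable names standing in for the base directory and the transcription factor set in 'functions_for_main_program.R'. Running the script with bash is an assumption based on its name.

```shell
# Sketch: run execbash.sh inside the results folder of the chosen TF.
# BASE and TF are hypothetical names; set them to match your own setup.
BASE="${BASE:-./base}"
TF="${TF:-tf}"
run_dir="$BASE/writeup/results/$TF"

if [ -f "$run_dir/execbash.sh" ]; then
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$run_dir" && bash execbash.sh)
else
    echo "execbash.sh not found in $run_dir; run run_execbash_sh_on_tf_folder_after_this(tf) in R first" >&2
fi
```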
At this point, everything should work fine, generating the files to submit in the respective tf directory:
preprocess_writeup(tf)
dd<-load_features(tf)
#machine learning section
for(e in c(leaderboard,test)){
  xgscore<-xgbtrain(e)
  rfscore<-rftrain(e)
  fcs(xgscore,rfscore,e)
}
Name: maximus
Ricardo Paixao dos Santos
Prof. Katlin Brauer Massirer, Ph.D.
Lab of RNA and microRNA Regulation in Disease,
Center for Molecular Biology and Genetic Engineering
University of Campinas, UNICAMP
Av Candido Rondon, 400 Campinas, 13083-875, Brazil
PS: For now, this code is enough to generate the files to submit. Later I can add some graphics and insights.