reptile's Issues

Variables in the training data missing in newdata

Hi,
I am trying to run REPTILE with the pre-trained model mm_model_coreMarks.reptile on methylation data. Could there be an issue with how I generate the bigWig (bw) files? I have methylation base-call BED files containing chromosome, start, end, and methylation rate. I converted them to bw files using the following commands:
awk '{printf "%s\t%d\t%d\t%2.3f\n" , $1,$2,$3,$4}' myBed.bed > myFile.bedgraph
sort -k1,1 -k2,2n myFile.bedgraph > myFile_sorted.bedgraph
bedGraphToBigWig myFile_sorted.bedgraph myChrom.sizes myBigWig.bw
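
Before the bedGraphToBigWig step, it can help to confirm that the sorted bedGraph has no overlapping intervals and that every interval fits inside the chromosome lengths listed in myChrom.sizes, since bedGraphToBigWig rejects input that violates either condition. The following is a minimal Python sketch of such a check. The function name check_bedgraph and the in-memory row format are assumptions made for illustration; they are not part of REPTILE or the UCSC tools.

```python
from typing import Dict, Iterable, List, Tuple

# One bedGraph row: (chromosome, start, end, value). Hypothetical in-memory format.
Interval = Tuple[str, int, int, float]


def check_bedgraph(rows: Iterable[Interval], chrom_sizes: Dict[str, int]) -> List[str]:
    """Return human-readable problems found in bedGraph rows that should already be sorted."""
    problems: List[str] = []
    prev_chrom = None
    prev_end = 0
    seen = set()
    for chrom, start, end, _value in rows:
        if chrom not in chrom_sizes:
            # Often a naming mismatch such as "1" versus "chr1".
            problems.append(f"{chrom}: not in chrom sizes")
        elif end > chrom_sizes[chrom]:
            problems.append(f"{chrom}:{start}-{end}: past chromosome end")
        if start < 0 or start >= end:
            problems.append(f"{chrom}:{start}-{end}: empty or negative interval")
        if chrom == prev_chrom:
            if start < prev_end:
                problems.append(
                    f"{chrom}:{start}-{end}: overlaps the previous interval or is out of order"
                )
        elif chrom in seen:
            problems.append(f"{chrom}: rows for this chromosome are not contiguous")
        seen.add(chrom)
        prev_chrom, prev_end = chrom, end
    return problems


if __name__ == "__main__":
    sizes = {"chr1": 1000}  # illustrative length, not a real genome size
    rows = [("chr1", 0, 10, 0.5), ("chr1", 5, 20, 0.7)]
    for problem in check_bedgraph(rows, sizes):
        print(problem)
```

An empty result means the rows pass these particular checks; it does not guarantee that the values themselves are what the model expects.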

I tried the Meth epimark alone as well as all four marks (Meth, H3K4me1, H3K4me3, H3K27ac) given for the mm_model_coreMarks.reptile model. The output of REPTILE_preprocess.py is the file preprocessed.region_with_epimark.tsv, which looks like this:
chr start end id Meth_E4 H3K4me1_E4 H3K4me3_E4 H3K27ac_E4
chr1 0 2000 bin_0 0.0 0.0 0.0 0.0
chr1 100 2100 bin_1 0.0 0.0 0.0 0.0
chr1 200 2200 bin_2 0.0 0.0 0.0 0.0
chr1 300 2300 bin_3 0.0 0.0 0.0 0.0
chr1 400 2400 bin_4 0.0 0.0 0.0 0.0
chr1 500 2500 bin_5 0.0 0.0 0.0 0.0
chr1 600 2600 bin_6 0.0 0.0 0.0 0.0
chr1 700 2700 bin_7 0.0 0.0 0.0 0.0
chr1 800 2800 bin_8 0.0 0.0 0.0 0.0
chr1 900 2900 bin_9 0.0 0.0 0.0 0.0
chr1 1000 3000 bin_10 0.0 0.0 0.0 0.0
.
.
chr1 3211200 3213200 bin_32112 5.0 5.0 5.0 5.0
chr1 3211300 3213300 bin_32113 5.0 5.0 5.0 5.0
chr1 3211400 3213400 bin_32114 5.0 5.0 5.0 5.0
chr1 3211500 3213500 bin_32115 4.0 4.0 4.0 4.0
chr1 3211600 3213600 bin_32116 3.3 3.3 3.3 3.3
chr1 3211700 3213700 bin_32117 2.54545 2.54545 2.54545 2.54545
chr1 3211800 3213800 bin_32118 2.69231 2.69231 2.69231 2.69231
chr1 3211900 3213900 bin_32119 3.0 3.0 3.0 3.0
chr1 3212000 3214000 bin_32120 2.85714 2.85714 2.85714 2.85714
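
In the excerpt above, every row carries the same value in all four mark columns. That pattern may indicate that each mark in the data info file points to the same bw file, which is worth ruling out. Below is a small Python sketch that tests a whole preprocessed file for this pattern. The function name marks_all_identical is hypothetical, and the sketch assumes whitespace-separated columns whose first four fields are chr, start, end, and id, as shown in the excerpt.

```python
from typing import Iterable


def marks_all_identical(lines: Iterable[str]) -> bool:
    """Return True if every data row has one and the same value in all mark columns.

    The first row with at least five fields is treated as the header.
    Blank or elided lines (fewer than five fields) are skipped.
    """
    header_seen = False
    n_rows = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        if not header_seen:
            header_seen = True
            continue
        n_rows += 1
        # Columns after chr, start, end, id hold the per-mark values.
        if len(set(fields[4:])) > 1:
            return False
    # With no data rows there is nothing to conclude.
    return n_rows > 0


if __name__ == "__main__":
    sample = [
        "chr start end id Meth_E4 H3K4me1_E4 H3K4me3_E4 H3K27ac_E4",
        "chr1 3211200 3213200 bin_32112 5.0 5.0 5.0 5.0",
        "chr1 3211600 3213600 bin_32116 3.3 3.3 3.3 3.3",
    ]
    print(marks_all_identical(sample))
```

For a real file, the same function can be called on an open file handle, for example marks_all_identical(open(path)).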

Now when I run the compute score command:
REPTILE_compute_score.R -i data_info_file2 -m mm_model_coreMarks.reptile -a tmp/mm39_w2kb_s100bp_preprocessed.region_with_epimark.tsv -s E4 -o tmp/E4__compute_pred

I get the following error:
Error in predict.randomForest(reptile_classifier, epimark, type = "prob") :
variables in the training data missing in newdata
Calls: reptile_predict_genome_wide ... reptile_predict_one_mode -> predict -> predict.randomForest
Execution halted
Is there a trained model available that predicts enhancers from DNA methylation data alone?
Note: I tried both genome-wide and region-specific prediction.
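
The error reported above comes from predict.randomForest, which stops when newdata lacks a column for one or more of the predictor names the forest was trained on. One way to narrow this down is to compare the variable names stored in the model with the columns actually passed to the prediction step. The Python sketch below only illustrates that comparison. The function name compare_variables and the example names are hypothetical, since the variable names stored in mm_model_coreMarks.reptile are not shown in this report.

```python
from typing import List, Sequence, Tuple


def compare_variables(
    training_vars: Sequence[str], newdata_columns: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, extra): training variables absent from newdata, and unused columns."""
    available = set(newdata_columns)
    expected = set(training_vars)
    missing = [v for v in training_vars if v not in available]
    extra = [c for c in newdata_columns if c not in expected]
    return missing, extra


if __name__ == "__main__":
    # Placeholder names for illustration only; not taken from the real model.
    training_vars = ["mark_A", "mark_B", "mark_C"]
    newdata_columns = ["mark_A", "mark_C", "mark_D"]
    missing, extra = compare_variables(training_vars, newdata_columns)
    print("missing from newdata:", missing)
    print("not used by the model:", extra)
```

Any name listed as missing would be enough to trigger the error, including names that differ only in spelling or capitalization.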

The full training data is missing

Hi, I would like to train a model for genome-wide predictions, but I found that the example dataset contains only a subset of the training data (Chr19). Could you please share the full dataset used for training?
