Giter Club home page Giter Club logo

deeppn-methyl's Introduction

deepPN-methyl

We have developed a tool that can predict the methylation of m6A from the sequencing data of pacbio and nanopore, and achieved excellent performance in both training data and test data. For pacbio data, the methylation sequence detected by IPDsummary software was used for training; for nanopore data, the current information detected by Tombo software was used for training.

Depetency

Python3.7 pysam>=0.15 biopython>=1.78 pandas>=0.24 numpy>=1.18 sklearn>=0.22 pytorch>=1.5

Note:the current prediction and training scripts must be CUDA

The pre training model is in the network disk, please download it by yourself 链接:https://pan.baidu.com/s/1hRof76lXcGYzX7JPU_UEAA 提取码:1234

Usage

1. Pretreatment

1.1 Pacbio-Pretreatment

bam files from the lower machine and the reference genome were first aligned using pbmm2 alignment software.

pbmm2 index [options] <ref.fa|xml> <out.mmi>

pbmm2 align [options] <ref.fa|xml|mmi> <in.bam|xml|fa|fq> [out.aligned.bam|xml]

Please refer to https://github.com/PacificBiosciences/pbmm2 , The sequences of interest are then called from the alignment file using the kinetics tool

ipdSummary aligned.bam --reference ref.fasta --identify m6A,m4C --gff basemods.gff

(please refer to https://github.com/PacificBiosciences/kineticsTools) and the result is written to a csv file from which the desired sequences are extracted written to a fasta file. If a site is known and needs to be verified as a methylated site, extract a sequence with a total length of 41 BP each 20bp up - and downstream of the site, write to a fasta file.

1.2 Nanopore-Pretreatment

First use tombo resquiggle path/to/fast5s/ genome.fasta --processes 4 --num-most-common-errors 5

Please refer to https://github.com/nanoporetech/tombo , this step processes the fast5 files down the machine, and the processing results are re written back to the fast5 files. Then use deepsignal extract --fast5_dir fast5s.al/ --reference_path GCF_000146045.2_R64_genomic.fna --write_path fast5s.al.CpG.signal_features.17bases.rawsignals_360.tsv --corrected_group RawGenomeCorrected_001 --nproc 10

Please refer to https://github.com/bioinfomaticsCSU/deepsignal, this step will extract the sequence information and current information from the fast5 file.

2. Predict

Use our trained model to predict directly and write the score value of the result into the file which is named by yourself.

python predict.py predict_fa (.fa) -model_path (checkpoint) -outfile (file_name)

3. Train by yourself

Use our pretreatment process to train a new model with your known A site.We also provide a negative set for your training.More hyperparametric adjustment instructions can be obtained using the python train.py -h

python train.py -pos_fa (pos.fa) -neg_fa (neg.fa) -outdir (dir_name)

Additional

If you want to get not only the predicted score, but also the various evaluation indicators of the prediction, you can use python test.py.Please refer to python test.py -h

deeppn-methyl's People

Contributors

laguda-1996 avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.