Giter Club home page Giter Club logo

batmeth2's Introduction

BatMeth2: An Integrated Package for Bisulfite DNA Methylation Data Analysis with Indel-sensitive Mapping.

This is a README file for the usage of Batmeth2.

BatMeth2 tutotial: https://www.dna-asmdb.com/tools/batmeth2-tutorial/batmeth2.html

REQUIREMENTS

  1. gcc (v4.8) , gsl library, zlib

  2. R (ggplot2, pheatmap, xtable)

  3. samtools (suggest: v1.3.1)

  4. fastp, raw reads as input need

INSTALL

a) Download 1)

b) unzip 1)

c) Change directory into the top directory of b) "BatMeth2/"

d) Type

  • ./configure
  • make
  • make copy

If your do not need process file of gzip format, you can install the tools:

  • ./configure
  • make nogzip
  • make copy-nogzip

e) The binary of BatMeth2 will be created in bin/

BUILDING INDEX

a) Have a fasta-formatted reference file ready

b) Type "BatMeth2 build_index GENOME.fa" for WGBS or BatMeth2 build_index rrbs GENOME.fa for RRBS to make the neccessary pairing data-structure based on FM-index.

c) Run "BatMeth2" to see information on usage.

USAGE of BatMeth2

Example Data

You can download the test data on https://drive.google.com/open?id=1SEpvJbkjwndYcpkd39T11lrBytEq_MaC .

Example data contain files:

  • input fastq.gz (paired end)
  • genome file
  • usage code and details
  • gene annotation file

Citation:

Zhou Q, Lim J-Q, Sung W-K, Li G: An integrated package for bisulfite DNA methylation data analysis with Indel-sensitive mapping. BMC Bioinformatics 2019, 20:47.

BatMeth2 pipeline

An easy-to-use, auto-run package for DNA methylation analyses

​ In order to complete the DNA methylation data analysis more conveniently, we packaged all the functions to complete an easy-to-use, auto-run package for DNA methylation analysis. During the execution of BatMeth2 Tool, an html report is generated about statistics of the sample.

The usage is here:

Raw reads:

BatMeth2 pipel --fastp ~/location/to/fastp -1 Raw_reads_1.fq.gz -2 Raw_read_2.fq.gz -g ./batmeth2index/genome.fa -o meth -p 6 --gff ./gene.gff

Or clean reads:

BatMeth2 pipel -1 Clean_reads_1.fq.gz -2 Clean_read_2.fq.gz -g ./batmeth2index/genome.fa -o meth -p 6 --gff ./gene.gff

BatMeth2 pipeline parameters

BatMeth2 [mode][paramaters]

mode: build_index, pipel, align, calmeth, annoation, methyPlot, batDMR, visul2sample

[build_index]
Usage: (must run this step first)

  1. BatMeth2 build_index genomefile.

  2. BatMeth2 build_index rrbs genomefile.

[pipel (Contains: align, calmeth, annoation, methyPlot, mkreport)]
[fastp location]
   --fastp fastp program location.
   If --fastp is not defined, the input file should be clean data.
[select aligner]
--aligner    BatMeth2(default), bwa-meth, bsmap, bismark2, no (exit output_prefix.sam file, no need align again)
[other aligners paramaters]
--go     Name of the genome, contaion index build by aligner. (bwa-meth/bismark2)
[main paramaters]
--config   [config file].    When we run pipel function in batches datasets,
     please fill in the specified configuration file.
     And there is a sample file (multirun.onf) in the BatMeth2 directory.
--mp [4]   When batch processing data, we set the number of samples to run at a time (-mp, default is 4), and each sample needs six threads (- P parameter) by default.
-o   Name of output file prefix
-O   Output of result file to specified folder, default output to current folder (./)
[alignment paramaters]
-i    Name of input file, if paired-end. please use -1, -2, input files can be separated by commas
-1    Name of input file left end, if single-end. please use -i
-2    Name of input file left end
-g   Name of the genome mapped against
-n   maximum mismatches allowed due to seq. errors
-p   Launch threads
[calmeth paramaters]
--Qual     calculate the methratio while read QulityScore >= Q. default:10
--redup    REMOVE_DUP, 0 or 1, default 0
--region   Bins for DMR calculate , default 1000bp .
-f    for sam format outfile contain methState. [0 or 1], default: 0 (dont output this file).
[calmeth and annoation paramaters]
--coverage    >= coverage. default:5
--binCover    >= nCs per region. default:3
--chromstep   Chromosome using an overlapping sliding window of 100000bp at a step of 50000bp. default step: 50000(bp)
[annoation paramaters]
--gtf/--gff/--bed   Gtf or gff file / bed file
--distance    DNA methylation level distributions in body and -bp flanking sequences. The distance of upstream and downstream. default:2000
--step    Gene body and their flanking sequences using an overlapping sliding window of 5% of the sequence length at a step of 2.5% of the sequence length. So default step: 0.025 (2.5%)
-C    <= coverage. default:1000
[mkreport paramaters]
Make a batmeth2 html report, can see the detail in BatMeth2_Report/ directory.
-o [outprefix]
[align paramaters:]
see the details in 'BatMeth2 align'
[calmeth paramaters:]
see the details in 'BatMeth2 calmeth'
[annotion paramaters:]
see the details in 'BatMeth2 annoation'
[methyPlot paramaters:]
see the details in 'BatMeth2 methyPlot'
[batDMR paramaters:]
see the details in 'BatMeth2 batDMR'
[visul2sample paramaters:]
see the details in 'BatMeth2 visul2sample'
-h|--help usage

OUTPUT FILE

Output file format and details see "https://github.com/GuoliangLi-HZAU/BatMeth2/blob/master/output_details.pdf".

Output report details see "https://www.dna-asmdb.com/download/batmeth2.html" .

Functions in BatMeth2

BatMeth2 has the following main features:

  1. Batmeth2 has efficient and accurate alignment performance.
  2. Batmeth2 can calculate DNA methylation level of base site
  3. BatMeth2 also can caculate and annotation DNA methylation level on chromosome region or gene/TE etc. functional region.
  4. By integrating BS-Seq data visualization (DNA methylation distribution on chromosome and gene etc) and BatMeth2 can show the results of the DNA methylation data more clearly.
  5. BatMeth2 can perform effective DNA methylation differential regions analysis based on the number of input samples and user requirements. And BatMeth2 provide differential methylation annotation ability.

1. BatMeth2 alignment

Program: BatMeth2 align

Version: v1.0

[ Single-end-reads ]

Usage: batmeth2 -g INDEX -i INPUT -o OUTPUT

Example: batmeth2 -g /data/index/hg19/hg19.fa -i Read.fq -o outPrefix -p 10

[ Paired-end-reads ]

Usage: batmeth2 -g INDEX -i INPUT_left -i INPUT_right -o OUTPUT

Example: batmeth2 -g /data/index/hg19/hg19.fa -i Read_R1_left.fq -i Read_R2_right.fq -o outPrefix -p 10

Parameters :

​ --inputfile | -i Name of input file

​ --genome | -g Name of the genome mapped against

​ --outputfile | -o Name of output file prefix

​ --indelsize indel size

​ --non_directional Alignments to all four bisulfite strands will be reported. Default: OFF.

​ --insertsize | -s inital insert size, default 800

​ --std | -d standard deviatiion of reads distribution

​ --flanksize | -f size of flanking region for Smith-Waterman

​ --swlimit | try at most sw extensions

​ --threads | -p Launch threads

​ --NoInDels | -I not to find the indels result

​ --help | -h Print help

Note: To use BatMeth2, you need to first index the genome with `build_all genome'.

2. BatMeth2 calmeth

Command Format : calmeth [options] -g GENOME -i/-b <Samfile/Bamfile> -m -p 6

Usage:

​ -g|--genome Genome

​ -i|--input Sam format file

​ -b|--binput Bam format file

​ -p|--threads the number of threads.

​ -n|--Nmismatch Number of mismatches

​ -m|--methratio [MethFileNamePrefix] Predix of methratio output file

​ -Q [int] caculate the methratio while read QulityScore >= Q. default:10

​ -c|--coverage >= coverage. default:5

​ -nC >= nCs per region. default:5

​ -R |--Regions Bins for DMR caculate , default 1kb .

​ --binsfile DNA methylation level distributions in chrosome, default output file: {methratioPrefix}.methBins.txt

​ -s|--step Chrosome using an overlapping sliding window of 100000bp at a step of 50000bp. default step: 50000(bp)

​ -r|--remove_dup REMOVE_DUP, default:false

​ -f|--sam [outfile] f for sam format outfile contain methState. default: sam format.

​ --sam-seq-beforeBS Converting BS read to the genome sequences.

​ -h|--help

3. BatMeth2 annotation

BatMeth2: MethyGff v1.0

Command Format : methyGff [options] -o <OUT_PREFIX> -G GENOME -gff /-gtf /-b -m [-B][-P]

Usage:

​ -o|--out Output file prefix

​ -G|--genome Genome

​ -m|--methratio Methratio output file.

​ -c|--coverage >= coverage. default:5

​ -C <= coverage. default 600.

​ -nC >= Cs per bins or genes. default:5

​ -gtf|-gff Gtf/gff file

​ -b|--BED Bed file, chrom start end (strand)

​ -d|--distance DNA methylation level distributions in body and -bp flanking sequences. The distance of upstream and downstream. default:2000

​ -B|--body For different analysis input format, gene/TEs body methylation level. [Different Methylation Gene(DMG/DMT...)]

​ -P|--promoter For different analysis input format.[Different Methylation Promoter(DMP)]

​ --TSS Caculate heatmap for TSS. [Outfile: outPrefix.TSS.cg.n.txt]

​ --TTS Caculate heatmap for TTS. [Outfile: outPrefix.TTS.cg.n.txt]

​ --GENE Caculate heatmap for GENE and flank 2k. [Outfile: outPrefix.GENE.cg.n.txt]

​ -s|--step Gene body and their flanking sequences using an overlapping sliding window of 5% of the sequence length at a step of 2.5% of the sequence length. So default step: 0.025 (2.5%)

​ -S|--chromStep Caculate the density of genes/TEs in chromsome using an overlapping sliding window of 100000bp at a step of 50000bp, must equal "-s" in Split.. default step: 50000(bp)

​ -h|--help

4. BatMeth2 methyPlot

Usage1:

  • methyPlot chromsome.bins.txt chrosome.methy.distri.pdf stepdefault:0.025 Infile1.from.batmeth2:methyGff out1.pdf starLabel endLabel Infile2 out2.pdf

eg: methyPlot chromsome.bins.txt chrosome.methy.distri.pdf 0.025 gene.meth.Methylevel.1.txt methlevel.pdf TSS TTS gene.meth.AverMethylevel.1.txt elements.pdf

**Usage2: **

  • methyPlot chromsome.bins.txt chrosome.methy.distri.pdf 0.025 gene.meth.Methylevel.1.txt methlevel.pdf TSS TTS gene.meth.AverMethylevel.1.txt elements.pdf test.annoDensity.1.txt test.density.pdf sampleElmentName test.mCdensity.txt test.mCdensity.pdf test.mCcatero.txt test.mCcatero.pdf 0.8 0.1 0.1

Visulization case 2: two more samples

Contains:

  • The density of gene, transposon elements (TE) and the level of DNA methylation in the different samples of the whole genome.
    $ Rscript density_plot_with_methyl.r inputFile1 input2 genedensityFile TEdensity output.pdf label1 label2

    *example: Rscript ~/software/batmeth2/src/density_plot_with_methyl.r WT.methChrom.Rd.txt Mutant.methChrom.Rd.txt WT.noRd.Gff.gffDensity.1.txt WT.noRd.TE.gffDensity.1.txt density.Out.pdf WT mutant *

  • DNA methylation level distribution across genes/TEs in different samples.

    $ Rscript methylevel.elements.r step(default:0.025) Input.Sample1.from.Batmeth2:methyGff Input.Sample2 outfilePrefix xLab1 xLab2 Sample1Prefix Sample2Prefix

    *example: Rscript methylevel.elements.compare.r 0.025 sample1.gene.meth.Methylevel.1.txt sample2.gene.meth.Methylevel.1.txt methlevel TSS TTS mutant WT *

5. BatMeth2 DMC or DMR/DMG

dmr-batmeth2.png

BatMeth2: DMR

Command Format : DMR [options] -g genome.fa -o_dm <DM_result> -1 [Sample1-methy ..] -2 [sample2-methy ..]

Usage:

​ -o_dm output file

​ -o_dmr when use auto detect by dmc

​ -g|--genome Genome

​ -1 sample1 methy files, sperate by space.

​ -2 sample2 methy files, sperate by space.

​ -FDR adjust pvalue cutoff default : 0.05

​ -methdiff the cutoff of methylation differention. default: 0.25 [CpG]

​ -element caculate gene or TE etc function elements.

​ -L predefinded regions

​ -h|--help

  1. Pre-definded regions (Gene/TE/UTR/CDS...,but must run 'combined.element sample1 sample2 sample1out sample2out' before batDMR)

$ BatMeth2 batDMR -g genome -L -o_dm dm.output.txt -1 [sample1.methC.txt replicates ..] -2 [sample2.methC.txt replicates ..]

  1. Auto define DMR region according the dmc

$ BatMeth2 batDMR -g genome -o_dm dm.output.txt -o_dmr dmr.output.txt -1 [sample1.methC.txt replicates ..] -2 [sample2.methC.txt replicates ..]

The output format
    1. dmc <br> Chrom position starnd context pvalue adjust_pvalue combine_pvalue corrected_pvalue cover_sample1 meth_sample1 cover_sample2 cover_sample2 meth.diff 
    
    2. dmr <br> Chrom start end dmr score meth.diff aver_corrected_pvalue 
Filter the result
awk '$6<0.05 && sqrt($11*$11)>0.6 ' H9vsIMR90.gene.dmr.txt > H9vsIMR90.gene.dmr.filter.txt
DM annotation

BatMeth2: DMCplot

Command Format : DMCannotationPlot [options] -o <Out_File> -G GENOME -g -d -c

Usage:

​ -o|--out Output file name.

​ -G|--genome Genome file

​ -d|--dmcFile dmc file. Format: Chrome Location strand

​ Format: chr pos strand

​ -g|--gff Gff files, 1 or more

​ -c|--context mC context, CG[default]/CHG/CHH/C.

​ -h|--help

DMCannotationPlot [options] -o <Out_File> -G GENOME -g <GFF files.. eg: TE.gff gene.gff CDS.gff intron.gff lncRNA.gff ...> -d <dmc file> -c <mC context default: CG>

Attention:
1.DMC file format : chr pos strand
2.GFF files are separated by spaces



Make sure all index files reside in the same directory.

Built with BatMeth2 build_index Genome.fa

=-=-=-=-=-=-=-=-=-=

GNU automake v1.11.1, GNU autoconf v2.63, gcc v4.4.7.

Tested on Red Hat 4.4.7-11 Linux

Thank you for your patience.

batmeth2's People

Contributors

zhouqiangwei avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.