BatMeth2: An Integrated Package for Bisulfite DNA Methylation Data Analysis with Indel-sensitive Mapping.
BatMeth2 tutorial: https://www.dna-asmdb.com/tools/batmeth2-tutorial/batmeth2.html
- gcc (v4.8), gsl library, zlib
- R (ggplot2, pheatmap, xtable)
- samtools (suggested: v1.3.1)
- fastp (needed only when raw reads are used as input)
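Before building, it can help to confirm that the command-line tools above are on the PATH. The following is a minimal sketch, not part of BatMeth2: `check_tools` is a hypothetical helper, and it checks only executables (it does not verify the gsl/zlib libraries, the R packages, or tool versions).

```shell
#!/bin/sh
# Report which of the given command-line tools are missing from PATH.
# Prints one "missing: NAME" line per absent tool; returns 1 if any is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Executables behind the requirements above (Rscript ships with R);
# fastp only matters when raw reads are the input.
check_tools gcc samtools Rscript fastp || echo "install the missing tools before building"
```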
a) Download 1)
b) unzip 1)
c) Change directory into the top directory of b) "BatMeth2/"
d) Type
- ./configure
- make
- make copy
If you do not need to process gzip-format files, you can install the tools with:
- ./configure
- make nogzip
- make copy-nogzip
e) The binary of BatMeth2 will be created in bin/
a) Have a fasta-formatted reference file ready
b) Type "BatMeth2 build_index GENOME.fa" for WGBS or "BatMeth2 build_index rrbs GENOME.fa" for RRBS to make the necessary pairing data-structure based on FM-index.
c) Run "BatMeth2" to see information on usage.
Example Data
You can download the test data from https://drive.google.com/open?id=1SEpvJbkjwndYcpkd39T11lrBytEq_MaC
The example data contain the following files:
- input fastq.gz (paired end)
- genome file
- usage code and details
- gene annotation file
An easy-to-use, auto-run package for DNA methylation analyses
To make DNA methylation data analysis more convenient, we packaged all the functions into an easy-to-use, auto-run pipeline. While it runs, BatMeth2 generates an HTML report with statistics for the sample.
Usage:
Raw reads:
BatMeth2 pipel --fastp ~/location/to/fastp -1 Raw_reads_1.fq.gz -2 Raw_read_2.fq.gz -g ./batmeth2index/genome.fa -o meth -p 6 --gff ./gene.gff
Or clean reads:
BatMeth2 pipel -1 Clean_reads_1.fq.gz -2 Clean_read_2.fq.gz -g ./batmeth2index/genome.fa -o meth -p 6 --gff ./gene.gff
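When several samples share the same genome and annotation, the clean-reads command above can be generated per sample. This is a minimal dry-run sketch: `pipel_cmds` is a hypothetical helper, and the `<sample>_1.fq.gz` / `<sample>_2.fq.gz` naming scheme is an assumption, not a BatMeth2 convention. It only prints the commands; it does not run them.

```shell
#!/bin/sh
# Print one "BatMeth2 pipel" command per sample prefix (dry run).
# Assumes paired clean reads named <sample>_1.fq.gz and <sample>_2.fq.gz.
pipel_cmds() {
    genome=$1
    gff=$2
    shift 2
    for sample in "$@"; do
        echo "BatMeth2 pipel -1 ${sample}_1.fq.gz -2 ${sample}_2.fq.gz -g $genome -o $sample -p 6 --gff $gff"
    done
}

pipel_cmds ./batmeth2index/genome.fa ./gene.gff WT mutant
```

After checking the printed commands, they can be piped to `sh` to execute them one after another. The option list below also describes a `--config` batch mode; its file format is not covered by this sketch.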
BatMeth2 [mode] [parameters]
mode: build_index, pipel, align, calmeth, annoation, methyPlot, batDMR, visul2sample
[build_index]
Usage: (must run this step first)
- BatMeth2 build_index genomefile
- BatMeth2 build_index rrbs genomefile
[pipel (Contains: align, calmeth, annoation, methyPlot, mkreport)]
[fastp location]
--fastp fastp program location.
If --fastp is not defined, the input file should be clean data.
[select aligner]
--aligner BatMeth2(default), bwa-meth, bsmap, bismark2, no (the output_prefix.sam file already exists, so there is no need to align again)
[other aligners parameters]
--go Name of the genome, containing the index built by the aligner. (bwa-meth/bismark2)
[main parameters]
--config [config file]. When running the pipel function on batches of datasets,
please fill in the specified configuration file.
A sample file (multirun.onf) is provided in the BatMeth2 directory.
--mp [4] When batch processing data, this sets the number of samples to run at a time (--mp, default is 4); each sample needs six threads (-p parameter) by default.
-o Name of output file prefix
-O Output of result file to specified folder, default output to current folder (./)
[alignment parameters]
-i Name of input file for single-end reads. For paired-end reads, please use -1 and -2. Multiple input files can be separated by commas
-1 Name of input file, left end (paired-end). For single-end reads, please use -i
-2 Name of input file, right end (paired-end)
-g Name of the genome mapped against
-n maximum mismatches allowed due to seq. errors
-p Launch threads
[calmeth parameters]
--Qual calculate the methratio while read QualityScore >= Q. default:10
--redup REMOVE_DUP, 0 or 1, default 0
--region Bin size for DMR calculation, default 1000bp.
-f Output a sam format file containing methState. [0 or 1], default: 0 (do not output this file).
[calmeth and annoation parameters]
--coverage >= coverage. default:5
--binCover >= nCs per region. default:3
--chromstep Chromosome methylation is calculated using an overlapping sliding window of 100000bp at a step of 50000bp. default step: 50000(bp)
[annoation parameters]
--gtf/--gff/--bed Gtf or gff file / bed file
--distance Distance (bp) of the upstream and downstream flanking sequences used for the DNA methylation level distributions in body and flanking sequences. default:2000
--step Gene body and their flanking sequences are analysed using an overlapping sliding window of 5% of the sequence length at a step of 2.5% of the sequence length. So default step: 0.025 (2.5%)
-C <= coverage. default:1000
[mkreport parameters]
Make a BatMeth2 HTML report; see the details in the BatMeth2_Report/ directory.
-o [outprefix]
[align parameters:]
see the details in 'BatMeth2 align'
[calmeth parameters:]
see the details in 'BatMeth2 calmeth'
[annoation parameters:]
see the details in 'BatMeth2 annoation'
[methyPlot parameters:]
see the details in 'BatMeth2 methyPlot'
[batDMR parameters:]
see the details in 'BatMeth2 batDMR'
[visul2sample parameters:]
see the details in 'BatMeth2 visul2sample'
-h|--help usage
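The --mp and -p options above together determine how many threads a batch run can occupy at once. A minimal sketch of that arithmetic, assuming each concurrently running sample uses exactly its -p threads (`total_threads` is a hypothetical helper, not a BatMeth2 command):

```shell
#!/bin/sh
# Peak thread usage of a batch run:
# samples run at a time (--mp) multiplied by threads per sample (-p).
total_threads() {
    echo $(( $1 * $2 ))
}

total_threads 4 6   # defaults from the option list (--mp 4, -p 6); prints 24
```

Choosing --mp so that this product stays at or below the number of available cores avoids oversubscribing the machine.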
For the output file format and details, see "https://github.com/GuoliangLi-HZAU/BatMeth2/blob/master/output_details.pdf".
For details of the output report, see "https://www.dna-asmdb.com/download/batmeth2.html".
BatMeth2 has the following main features:
- BatMeth2 has efficient and accurate alignment performance.
- BatMeth2 can calculate the DNA methylation level of individual base sites.
- BatMeth2 can also calculate and annotate DNA methylation levels on chromosome regions or on functional regions such as genes and TEs.
- By integrating BS-Seq data visualization (DNA methylation distribution on chromosomes, genes, etc.), BatMeth2 can show the results of the DNA methylation data more clearly.
- BatMeth2 can perform effective differentially methylated region analysis based on the number of input samples and user requirements. BatMeth2 also provides annotation of differential methylation.
Program: BatMeth2 align
Version: v1.0
[ Single-end-reads ]
Usage: batmeth2 -g INDEX -i INPUT -o OUTPUT
Example: batmeth2 -g /data/index/hg19/hg19.fa -i Read.fq -o outPrefix -p 10
[ Paired-end-reads ]
Usage: batmeth2 -g INDEX -i INPUT_left -i INPUT_right -o OUTPUT
Example: batmeth2 -g /data/index/hg19/hg19.fa -i Read_R1_left.fq -i Read_R2_right.fq -o outPrefix -p 10
Parameters :
--inputfile | -i Name of input file
--genome | -g Name of the genome mapped against
--outputfile | -o Name of output file prefix
--indelsize indel size
--non_directional Alignments to all four bisulfite strands will be reported. Default: OFF.
--insertsize | -s initial insert size, default 800
--std | -d standard deviation of reads distribution
--flanksize | -f size of flanking region for Smith-Waterman
--swlimit | try at most sw extensions
--threads | -p Launch threads
--NoInDels | -I do not search for indels
--help | -h Print help
Note: To use BatMeth2, you need to first index the genome with `build_all genome'.
Command Format : calmeth [options] -g GENOME -i/-b <Samfile/Bamfile> -m -p 6
Usage:
-g|--genome Genome
-i|--input Sam format file
-b|--binput Bam format file
-p|--threads the number of threads.
-n|--Nmismatch Number of mismatches
-m|--methratio [MethFileNamePrefix] Prefix of methratio output file
-Q [int] calculate the methratio while read QualityScore >= Q. default:10
-c|--coverage >= coverage. default:5
-nC >= nCs per region. default:5
-R |--Regions Bin size for DMR calculation, default 1kb.
--binsfile DNA methylation level distributions in chromosome, default output file: {methratioPrefix}.methBins.txt
-s|--step Chromosome methylation is calculated using an overlapping sliding window of 100000bp at a step of 50000bp. default step: 50000(bp)
-r|--remove_dup REMOVE_DUP, default:false
-f|--sam [outfile] Output a sam format file containing methState. default: sam format.
--sam-seq-beforeBS Converting BS read to the genome sequences.
-h|--help
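The -s|--step option above describes overlapping sliding windows of 100000bp taken every 50000bp. The sketch below only illustrates that windowing scheme; it is not calmeth's code. `sliding_windows` is a hypothetical helper, the 0-based half-open coordinates are a choice made here, and clipping the last windows at the chromosome end is an assumption, since the text does not say how calmeth treats them.

```shell
#!/bin/sh
# List overlapping windows over a chromosome of length LEN (bp).
# Usage: sliding_windows LEN [WINDOW] [STEP]
# Prints "start<TAB>end" per window, 0-based and half-open.
sliding_windows() {
    awk -v len="$1" -v win="${2:-100000}" -v step="${3:-50000}" 'BEGIN {
        for (start = 0; start < len; start += step) {
            end = start + win
            if (end > len) end = len   # assumption: clip at the chromosome end
            printf "%d\t%d\n", start, end
        }
    }'
}

sliding_windows 220000
```

For a 220000bp chromosome this prints five windows; every position except those in the first and last 50000bp falls into two of them, which is what makes the resulting methylation profile smoother than one built from non-overlapping bins.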
BatMeth2: MethyGff v1.0
Command Format : methyGff [options] -o <OUT_PREFIX> -G GENOME -gff <GFF file>/-gtf <GTF file>/-b <bed file> -m <methratio file> [-B][-P]
Usage:
-o|--out Output file prefix
-G|--genome Genome
-m|--methratio Methratio output file.
-c|--coverage >= coverage. default:5
-C <= coverage. default 600.
-nC >= Cs per bins or genes. default:5
-gtf|-gff Gtf/gff file
-b|--BED Bed file, chrom start end (strand)
-d|--distance Distance (bp) of the upstream and downstream flanking sequences used for the DNA methylation level distributions in body and flanking sequences. default:2000
-B|--body Output gene/TE body methylation levels in the input format for differential analysis. [Differentially Methylated Gene (DMG/DMT...)]
-P|--promoter Output promoter methylation levels in the input format for differential analysis. [Differentially Methylated Promoter (DMP)]
--TSS Calculate heatmap for TSS. [Outfile: outPrefix.TSS.cg.n.txt]
--TTS Calculate heatmap for TTS. [Outfile: outPrefix.TTS.cg.n.txt]
--GENE Calculate heatmap for GENE and flank 2k. [Outfile: outPrefix.GENE.cg.n.txt]
-s|--step Gene body and their flanking sequences using an overlapping sliding window of 5% of the sequence length at a step of 2.5% of the sequence length. So default step: 0.025 (2.5%)
-S|--chromStep Calculate the density of genes/TEs in chromosome using an overlapping sliding window of 100000bp at a step of 50000bp, must equal "-s" in Split. default step: 50000(bp)
-h|--help
Usage1:
- methyPlot chromsome.bins.txt chrosome.methy.distri.pdf step(default:0.025) Infile1.from.batmeth2:methyGff out1.pdf startLabel endLabel Infile2 out2.pdf
eg: methyPlot chromsome.bins.txt chrosome.methy.distri.pdf 0.025 gene.meth.Methylevel.1.txt methlevel.pdf TSS TTS gene.meth.AverMethylevel.1.txt elements.pdf
Usage2:
- methyPlot chromsome.bins.txt chrosome.methy.distri.pdf 0.025 gene.meth.Methylevel.1.txt methlevel.pdf TSS TTS gene.meth.AverMethylevel.1.txt elements.pdf test.annoDensity.1.txt test.density.pdf sampleElmentName test.mCdensity.txt test.mCdensity.pdf test.mCcatero.txt test.mCcatero.pdf 0.8 0.1 0.1
Visualization case 2: two or more samples
Contains:
- The density of genes and transposable elements (TEs) and the level of DNA methylation in the different samples across the whole genome.
$ Rscript density_plot_with_methyl.r inputFile1 input2 genedensityFile TEdensity output.pdf label1 label2
*example: Rscript ~/software/batmeth2/src/density_plot_with_methyl.r WT.methChrom.Rd.txt Mutant.methChrom.Rd.txt WT.noRd.Gff.gffDensity.1.txt WT.noRd.TE.gffDensity.1.txt density.Out.pdf WT mutant*
- DNA methylation level distribution across genes/TEs in different samples.
$ Rscript methylevel.elements.r step(default:0.025) Input.Sample1.from.Batmeth2:methyGff Input.Sample2 outfilePrefix xLab1 xLab2 Sample1Prefix Sample2Prefix
*example: Rscript methylevel.elements.compare.r 0.025 sample1.gene.meth.Methylevel.1.txt sample2.gene.meth.Methylevel.1.txt methlevel TSS TTS mutant WT*
Command Format : DMR [options] -g genome.fa -o_dm <DM_result> -1 [Sample1-methy ..] -2 [sample2-methy ..]
Usage:
-o_dm output file
-o_dmr DMR output file, used when DMRs are auto-detected from the dmc results
-g|--genome Genome
-1 sample1 methy files, separated by spaces.
-2 sample2 methy files, separated by spaces.
-FDR adjusted p-value cutoff. default: 0.05
-methdiff the cutoff of methylation difference. default: 0.25 [CpG]
-element calculate over genes, TEs and other functional elements.
-L predefined regions
-h|--help
- Pre-defined regions (Gene/TE/UTR/CDS..., but you must run 'combined.element sample1 sample2 sample1out sample2out' before batDMR)
$ BatMeth2 batDMR -g genome -L -o_dm dm.output.txt -1 [sample1.methC.txt replicates ..] -2 [sample2.methC.txt replicates ..]
- Automatically define DMR regions according to the dmc results
$ BatMeth2 batDMR -g genome -o_dm dm.output.txt -o_dmr dmr.output.txt -1 [sample1.methC.txt replicates ..] -2 [sample2.methC.txt replicates ..]
1. dmc
Chrom position strand context pvalue adjust_pvalue combine_pvalue corrected_pvalue cover_sample1 meth_sample1 cover_sample2 meth_sample2 meth.diff
2. dmr
Chrom start end dmr score meth.diff aver_corrected_pvalue
awk '$6<0.05 && sqrt($11*$11)>0.6 ' H9vsIMR90.gene.dmr.txt > H9vsIMR90.gene.dmr.filter.txt
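The same kind of awk filter can be applied to a dmc file. The following is a minimal sketch under stated assumptions: `filter_dmc` is a hypothetical helper, the input is assumed to be tab-separated with no header line, and the column positions (6 = adjust_pvalue, 13 = meth.diff) are taken from the dmc column order listed above. Check them against your own output before relying on it. The two data rows are invented purely for illustration.

```shell
#!/bin/sh
# Keep DMC rows with adjusted p-value below P_CUT and |meth.diff| above DIFF_CUT.
# Usage: filter_dmc P_CUT DIFF_CUT FILE
# Assumes column 6 = adjust_pvalue and column 13 = meth.diff (tab-separated).
filter_dmc() {
    awk -F '\t' -v p="$1" -v d="$2" '
        function abs(x) { return x < 0 ? -x : x }
        $6 < p && abs($13) > d
    ' "$3"
}

# Hypothetical two-row dmc file, used only to show the effect of the filter.
tmp=$(mktemp)
printf 'chr1\t100\t+\tCG\t0.001\t0.01\t0.001\t0.01\t20\t0.9\t18\t0.1\t0.8\n' >  "$tmp"
printf 'chr1\t200\t-\tCG\t0.2\t0.4\t0.2\t0.4\t15\t0.5\t16\t0.45\t0.05\n'     >> "$tmp"
filter_dmc 0.05 0.25 "$tmp"   # only the chr1:100 row passes both cutoffs
rm -f "$tmp"
```

The explicit `abs` function does the same job as the `sqrt($11*$11)` idiom in the command above, and passing the cutoffs as arguments avoids editing the awk program for each comparison.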
BatMeth2: DMCplot
Command Format : DMCannotationPlot [options] -o <Out_File> -G GENOME -g -d -c
Usage:
-o|--out Output file name.
-G|--genome Genome file
-d|--dmcFile dmc file. Format: chr pos strand
-g|--gff Gff files, 1 or more
-c|--context mC context, CG[default]/CHG/CHH/C.
-h|--help
DMCannotationPlot [options] -o <Out_File> -G GENOME -g <GFF files.. eg: TE.gff gene.gff CDS.gff intron.gff lncRNA.gff ...> -d <dmc file> -c <mC context default: CG>
Attention:
1. DMC file format: chr pos strand
2. GFF files are separated by spaces
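A dmc result file carries more columns than the three that DMCannotationPlot reads. A minimal sketch of trimming it, assuming a tab-separated dmc file whose first three columns are chrom, position and strand (`dmc_to_plot_input` and the file names are hypothetical):

```shell
#!/bin/sh
# Reduce a dmc result file to the "chr pos strand" columns.
# Assumes tab-separated input with chrom, position, strand in columns 1-3.
dmc_to_plot_input() {
    cut -f1-3 "$1"
}

# Hypothetical one-row dmc file, used only to show the effect.
tmp=$(mktemp)
printf 'chr1\t100\t+\tCG\t0.001\t0.01\t0.001\t0.01\t20\t0.9\t18\t0.1\t0.8\n' > "$tmp"
dmc_to_plot_input "$tmp"
rm -f "$tmp"

# The trimmed file can then be passed with -d, for example:
#   DMCannotationPlot -o dmc.anno -G genome.fa -g gene.gff TE.gff -d dmc.pos.txt -c CG
```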
Make sure all index files reside in the same directory.
Built with BatMeth2 build_index Genome.fa
=-=-=-=-=-=-=-=-=-=
GNU automake v1.11.1, GNU autoconf v2.63, gcc v4.4.7.
Tested on Red Hat 4.4.7-11 Linux
Thank you for your patience.