Comments (10)
Hi Tim,
Yes, I've looked a bit at the VCF output from Bis-SNP. While I can certainly just implement that as a multi-sample output option, it's never really been clear to me whether that's the best way to go (e.g., I'm not even sure that there are any tools for analysing BSseq data in R that would natively accept that format). What I don't want to do is create yet another output format that nothing supports. Ideally, I or someone else would get a bunch of people together to agree on some sort of (likely VCF-related) format, but I just haven't had time to do that. At least with a larger group discussing a common format, we could agree on what to include in addition to the raw counts (larger sequence context, distance to the nearest indel or SNP, etc.).
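As a rough illustration of one of the extra fields mentioned above (distance to the nearest indel or SNP), here is a minimal Python sketch. It assumes variant positions have already been loaded into sorted per-chromosome lists that use the same coordinate convention as the methylation sites; the function names are hypothetical and not part of MethylDackel or Bis-SNP.

```python
from bisect import bisect_left


def distance_to_nearest_variant(site_pos, variant_positions):
    """Distance in bp from site_pos to the closest entry in variant_positions.

    variant_positions must be sorted and use the same chromosome and
    coordinate convention as site_pos. Returns None if there are no variants.
    """
    if not variant_positions:
        return None
    i = bisect_left(variant_positions, site_pos)
    candidates = []
    if i < len(variant_positions):      # nearest variant at or after the site
        candidates.append(variant_positions[i] - site_pos)
    if i > 0:                           # nearest variant before the site
        candidates.append(site_pos - variant_positions[i - 1])
    return min(candidates)


def annotate_sites(sites, variants_by_chrom):
    """Attach a nearest-variant distance to each (chrom, pos) site."""
    return [
        (chrom, pos, distance_to_nearest_variant(pos, variants_by_chrom.get(chrom, [])))
        for chrom, pos in sites
    ]
```

For example, annotate_sites([("chr1", 120)], {"chr1": [100, 250, 900]}) gives [("chr1", 120, 20)]. Binary search keeps each lookup logarithmic in the number of variants, which matters once there are millions of CpGs to annotate.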
Anyway, if you'd find a Bis-SNP like VCF output useful I'd be happy to get it added.
Devon
from methyldackel.
It's not so much that it's useful as that it's already there :-)
The situation with DNA methylation output formats is a bit ridiculous. It would appear that the number of formats is equal to or greater than the number of people working on it.
The one thing that would be nice in PoM is an option to output a DNAm fractional bedGraph (0-1) and one for counts (C+T total) instead of the Grand Unified BedGraph it does right now. When I have a moment I will implement that properly (instead of as the hideous little script it is now).
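A minimal sketch of that split in Python, assuming the combined bedGraph has six tab-separated columns (chrom, start, end, percent methylated, methylated count, unmethylated count). As far as I know that is the layout MethylDackel's extract bedGraph output uses, but check it against the version you run. The function name split_bedgraph is hypothetical.

```python
def split_bedgraph(lines):
    """Split a combined methylation bedGraph into fraction and coverage tracks.

    Assumes tab-separated records: chrom, start, end, percent, n_meth, n_unmeth.
    Header ('track'/'#') and blank lines are skipped. Returns two lists of
    bedGraph lines: (fraction 0-1, total C+T count).
    """
    fraction_lines, count_lines = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("track", "#")):
            continue
        chrom, start, end, _percent, n_meth, n_unmeth = line.split("\t")[:6]
        n_meth, n_unmeth = int(n_meth), int(n_unmeth)
        total = n_meth + n_unmeth  # C + T reads covering this position
        if total == 0:
            continue  # the fraction is undefined without coverage
        fraction_lines.append(f"{chrom}\t{start}\t{end}\t{n_meth / total:.4g}")
        count_lines.append(f"{chrom}\t{start}\t{end}\t{total}")
    return fraction_lines, count_lines
```

The two lists can then be written out as separate bedGraph files and, if needed, converted with UCSC's bedGraphToBigWig (which expects sorted input and a chrom.sizes file).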
VCF for SNPs and bedGraph/bigWig for fraction/counts seems to be as close as there is to a standard now.
--t
(In reply to Devon Ryan, Apr 21, 2015.)
A colleague just emailed me the updated VCFv4.X specification with methylation-specific changes that IHEC is using. I'll use that, since it looks like the previous Bis-SNP-specific stuff is being replaced.
see also https://github.com/zwdzwd/biscuit
I think wig, bigWig or bedGraph are still the most popular formats for methylation analysis and storage.
@Shicheng-Guo True, but VCF has a number of benefits, such as letting you easily filter out positions that contain apparent variants, or natively storing multiple samples. Whenever I have a bit of free time I'll get this implemented (though obviously one could also just use biscuit).
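The variant-filtering benefit comes down to an interval check between variant calls and methylation calls. A minimal Python sketch, assuming an uncompressed VCF and methylation records already parsed into (chrom, start, end, value) tuples with 0-based, half-open bedGraph coordinates; the function names are hypothetical. The main pitfall it shows is that VCF POS is 1-based.

```python
def load_variant_positions(vcf_lines):
    """Return a set of (chrom, 0-based position) for bases covered by VCF REF alleles."""
    positions = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        start0 = pos - 1  # VCF POS is 1-based; bedGraph starts are 0-based
        for offset in range(len(ref)):
            positions.add((chrom, start0 + offset))
    return positions


def drop_variant_sites(records, variant_positions):
    """Keep only (chrom, start, end, ...) records whose [start, end) has no variant."""
    kept = []
    for record in records:
        chrom, start, end = record[0], record[1], record[2]
        if any((chrom, p) in variant_positions for p in range(start, end)):
            continue
        kept.append(record)
    return kept
```

Marking every REF base is deliberately conservative: a deletion overlapping a CpG removes that CpG too. A fuller version would also look at genotypes or the FILTER column instead of treating every VCF record as a real variant.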
Yes, I agree with you. VCF is supported by most genetic analysis software and algorithms.
Opening a VCF is one thing, but you need to be able to do something useful with it in the context of methylation for it to become the de facto methylation format.
Personally, I'm not sure there is a perfect format for the data. Some people want methylation as a %, some as a ratio, some with total counts, some with counts broken out into individual bases... Right now PoM supports a lot of formats, which is nice. That's about the best possible scenario. If I were to suggest anything as an enhancement, it would simply be letting you define your own format with keywords, like --format chr start_pos_0 end_pos_1 count_A count_T percentage
or whatever. Or, you know, use SQLite, since we're writing all this to disk as an ASCII table anyway ;)
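The keyword-driven --format idea could look something like the Python sketch below. Everything in it is hypothetical: neither a --format option nor these column keywords exist in PoM/MethylDackel as far as this thread shows, and the sketch uses count_C and count_T (reads supporting methylated vs unmethylated C) in place of the comment's example keywords.

```python
def _percentage(r):
    total = r["n_meth"] + r["n_unmeth"]
    return "NA" if total == 0 else f"{100 * r['n_meth'] / total:.1f}"


# Hypothetical column keywords mapped to functions of one methylation record.
COLUMNS = {
    "chr": lambda r: r["chrom"],
    "start_pos_0": lambda r: str(r["start"]),  # 0-based start
    "end_pos_1": lambda r: str(r["end"]),      # 0-based half-open end == 1-based inclusive end
    "count_C": lambda r: str(r["n_meth"]),     # reads reporting C (methylated)
    "count_T": lambda r: str(r["n_unmeth"]),   # reads reporting T (unmethylated)
    "percentage": _percentage,
}


def format_record(record, keywords):
    """Render one record as a tab-separated line in the user-chosen column order."""
    unknown = [k for k in keywords if k not in COLUMNS]
    if unknown:
        raise ValueError("unknown column keyword(s): " + ", ".join(unknown))
    return "\t".join(COLUMNS[k](record) for k in keywords)
```

With 3 methylated and 1 unmethylated read at chr1:10-11, format_record(rec, ["chr", "start_pos_0", "percentage"]) returns "chr1\t10\t75.0". Rejecting unknown keywords up front gives a clear error instead of silently emitting an empty column.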
IHEC is trying to put together a standard VCF representation for methylation data. Ideally that'd catch on and then people could just use a single format, rather than every single tool reinventing the wheel.
Well, if that could catch on, I think that would be the best outcome; perhaps IHEC is one of the few consortia big enough to make it work :)