Comments (10)

dpryan79 commented on July 22, 2024

Hi Tim,

Yes, I've looked a bit at the VCF output from Bis-SNP. While I can certainly just implement that as a multi-sample output option, it's never really been clear to me if that's the best way to go (e.g., I'm not even sure that there are any tools for analysing BSseq data in R that would natively accept that format). What I don't want to do is create yet another output format that nothing supports. Ideally, I or someone else would try to get a bunch of people together to agree on some sort of (likely VCF related) format...but I've just not had time to do that. At least with a larger group discussing a common format we could agree on what sorts of things in addition to the raw counts to include (larger sequence context, distance to nearest indel or SNP, etc.).

Anyway, if you'd find a Bis-SNP like VCF output useful I'd be happy to get it added.

Devon

from methyldackel.

ttriche commented on July 22, 2024

It's not so much that it's useful, rather that it's already there :-)

The situation with DNA methylation output formats is a bit ridiculous. It would appear that the number of formats is equal to or greater than the number of people working on it.

The one thing that would be nice in PoM is an option to output a DNAm fractional bedGraph (0-1) and one for counts (C+T total) instead of the Grand Unified BedGraph it does right now. When I have a moment I will implement that properly (instead of as the hideous little script it is now).
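A minimal sketch of the split described above, assuming the combined bedGraph has six whitespace-separated columns (chromosome, start, end, methylation percentage, methylated count, unmethylated count). That column layout and the function name `split_bedgraph` are assumptions for illustration, not PoM's actual implementation.

```python
def split_bedgraph(lines):
    """Split combined methylation bedGraph lines into two 4-column tracks.

    Assumed input columns: chrom, start, end, percent, n_meth, n_unmeth.
    Returns (fraction_lines, coverage_lines).
    """
    fraction, coverage = [], []
    for line in lines:
        # Skip blank lines, track definition lines, and comments.
        if not line.strip() or line.startswith(("track", "#")):
            continue
        chrom, start, end, _pct, n_meth, n_unmeth = line.split()[:6]
        total = int(n_meth) + int(n_unmeth)
        if total == 0:
            continue  # no coverage, so the fraction is undefined
        # Recompute the fraction from the counts rather than the rounded percentage.
        fraction.append(f"{chrom}\t{start}\t{end}\t{int(n_meth) / total:.4f}")
        coverage.append(f"{chrom}\t{start}\t{end}\t{total}")
    return fraction, coverage


if __name__ == "__main__":
    frac, cov = split_bedgraph(["track type=bedGraph", "chr1\t100\t101\t75\t3\t1"])
    print(frac[0])
    print(cov[0])
```

Recomputing the fraction from the two counts avoids carrying over any rounding already applied to the percentage column.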

VCF for SNPs and bedGraph/bigWig for fraction/counts seems to be as close as there is to a standard now.

--t

On Apr 21, 2015, at 12:00 AM, Devon Ryan wrote:

[quoted text identical to the first comment above]

dpryan79 commented on July 22, 2024

A colleague just emailed me the updated VCFv4.X specification with methylation-specific changes that IHEC is using. I'll use that since it looks like the previous BisSNP-specific stuff is being replaced.

ttriche commented on July 22, 2024

see also https://github.com/zwdzwd/biscuit

Shicheng-Guo commented on July 22, 2024

I think wig, bigWig, and bedGraph are still the most popular formats for methylation analysis and storage.

dpryan79 commented on July 22, 2024

@Shicheng-Guo True, but VCF has a number of benefits to it, such as allowing you to easily filter out positions that contain apparent variants, or natively supporting storing multiple samples. Whenever I have a bit of free time I'll get this implemented (though obviously one could also just use biscuit).
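As a rough illustration of the filtering benefit, here is a sketch that drops records carrying an alternate allele from tab-separated VCF text. Treating any non-missing ALT value as an "apparent variant" is an assumption; a methylation-specific VCF may flag variants differently (for example via FILTER or INFO fields), and `drop_variant_sites` is a hypothetical helper name.

```python
def drop_variant_sites(vcf_lines):
    """Keep header lines and records whose ALT column is missing ('.').

    Assumes standard tab-separated VCF columns:
    CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per sample.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # malformed record, skip it
        if fields[4] == ".":  # ALT is missing: no alternate allele reported
            kept.append(line)
    return kept


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t101\t.\tC\t.\t.\tPASS\t.",
        "chr1\t205\t.\tC\tT\t50\tPASS\t.",
    ]
    for record in drop_variant_sites(example):
        print(record)
```

Because each sample occupies its own column after FORMAT, the same record-level filter applies unchanged to a multi-sample file.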

Shicheng-Guo commented on July 22, 2024

Yes, I agree with you. The VCF format can be integrated into most genetics analysis software and algorithms.

JohnLonginotto commented on July 22, 2024

Opening a VCF is one thing, but you need to do something useful with it in the context of methylation for it to be the de facto methylation format.

Personally, I'm not sure there is a perfect format for the data. Some people want methylation as a %, some as a ratio, some with total counts, some with counts broken out into individual bases... right now PoM supports a lot of formats, which is nice. That's like the best possible scenario. If I were to suggest anything as an enhancement, it would simply be that you could make your own format with some keywords like --format chr start_pos_0 end_pos_1 count_A count_T percentage or whatever. Or, you know, use SQLite since we're writing all this to disk as a table in ASCII anyway ;)
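The keyword-driven format idea could look something like the sketch below. The keyword names, the record layout (`chrom`, `pos`, `meth`, `unmeth`), and `format_record` are all hypothetical; no such `--format` option is confirmed to exist in PoM.

```python
def _percentage(record):
    """Percent methylation, or 'NA' when there is no coverage."""
    total = record["meth"] + record["unmeth"]
    return "NA" if total == 0 else round(100 * record["meth"] / total, 1)


# Hypothetical keyword -> value extractor table; one entry per output column.
COLUMNS = {
    "chr": lambda r: r["chrom"],
    "start_pos_0": lambda r: r["pos"],        # 0-based start
    "end_pos_1": lambda r: r["pos"] + 1,      # 1-based (exclusive) end
    "count_meth": lambda r: r["meth"],
    "count_unmeth": lambda r: r["unmeth"],
    "percentage": _percentage,
}


def format_record(record, keywords):
    """Render one record as a tab-separated line in the requested column order."""
    unknown = [k for k in keywords if k not in COLUMNS]
    if unknown:
        raise ValueError(f"unknown format keyword(s): {', '.join(unknown)}")
    return "\t".join(str(COLUMNS[k](record)) for k in keywords)


if __name__ == "__main__":
    site = {"chrom": "chr1", "pos": 100, "meth": 3, "unmeth": 1}
    print(format_record(site, ["chr", "start_pos_0", "end_pos_1", "percentage"]))
```

A lookup table like this keeps each column definition in one place, so supporting another keyword means adding a single entry.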

dpryan79 commented on July 22, 2024

IHEC is trying to put together a standard VCF representation for methylation data. Ideally that'd catch on and then people could just use a single format, rather than every single tool reinventing the wheel.

JohnLonginotto commented on July 22, 2024

Well, if that could catch on, I think that would be the best outcome. Perhaps IHEC is one of the few consortia big enough to get it to work, too :)
