Giter Club home page Giter Club logo

Comments (24)

dpryan79 avatar dpryan79 commented on July 22, 2024

I suppose something equivalent to how MethylDackel mbias works internally would work here as well. There, instead of the status for each C being written to the output, it's stored in a vector according to whether it's C or T. What sort of output format are you looking for with this?

Out of curiosity, what sort of project would benefit from something like this?

from methyldackel.

nchernia avatar nchernia commented on July 22, 2024

from methyldackel.

nchernia avatar nchernia commented on July 22, 2024

Would it be better for me to modify the mbias code instead to extract this information?

from methyldackel.

dpryan79 avatar dpryan79 commented on July 22, 2024

@nchernia Actually, I just double checked and I use a pileup there as well. I can hack something together easily enough to do this, but do you really need it to be multithreaded? It's easy enough to write a single-threaded application to do this (one could even use python with pysam), but it'll take a lot more time to write it to handle threads. Could you privately send me the basic reason for wanting to do this (I've modified versions of this for in-the-pipeline commercial products before, so I'm used to not blabbing about other's projects)? There almost has to be a more efficient way than going over a gigantic text file with read names and either a bunch of ....h..H... or percentages.

from methyldackel.

dpryan79 avatar dpryan79 commented on July 22, 2024

FYI, the perRead stuff is now going into the perRead branch. One can now MethylDackel perRead ... with that and it seems not to crash. But I haven't really tested it yet, so hold off on trying it unless you're a glutton for punishment. It's written to be multithreaded, though I don't think the actual options for that are there at the moment.

from methyldackel.

dpryan79 avatar dpryan79 commented on July 22, 2024

The perRead branch seems to produce correct results now. Multithreading works properly and I added some other rudimentary options. It should be easy enough to add other options in. Let me know if you need anything else from that or need the output in a different format.

from methyldackel.

nchernia avatar nchernia commented on July 22, 2024

Thanks! I will give it a try very soon.

from methyldackel.

nchernia avatar nchernia commented on July 22, 2024

I've been working with this function, thanks. One question - I should expect the reads to appear more than once in the output if they align in multiple places, right?

from methyldackel.

dpryan79 avatar dpryan79 commented on July 22, 2024

Good question. You should see each instance, since it doesn't use filter_func(). I'll try to make a short test of that, though.

from methyldackel.

dpryan79 avatar dpryan79 commented on July 22, 2024

I can confirm that you should see each instance. If you run MethylDackel perRead tests/cg100.fa tests/cg_aln.bam in the source tree you'll see each instance of read1 in the output. I should add a method to filter reads, though. If you need --ignoreFlags or --requireFlags now then let me know.

from methyldackel.

nchernia avatar nchernia commented on July 22, 2024

Yes, I should have clarified that I do see multiple instances, just double checking that I understood how it worked. Yes, I need --ignoreFlags and --requireFlags though this isn't urgent.

from methyldackel.

dpryan79 avatar dpryan79 commented on July 22, 2024

Both those options should now be supported (they both work properly when I test them locally).

from methyldackel.

nchernia avatar nchernia commented on July 22, 2024

Hi - perRead seems to include MAPQ 0 reads. These are filtered out by default in "extract", right?

from methyldackel.

dpryan79 avatar dpryan79 commented on July 22, 2024

extract filters base score of 0 to avoid double calling overlapping ends of mates. The default -q will also skip MAPQ 0, but that can be changed. If I never added a MAPQ thresholding option to perRead I can do that, just let me know if you need it.

from methyldackel.

nchernia avatar nchernia commented on July 22, 2024

I'm a little confused. In the usage for extract it says:
Options:
-q INT Minimum MAPQ threshold to include an alignment (default 10)

So I thought reads with MAPQ < 10 were not included in extract. Is that correct?

I had assumed perRead would have these same defaults. For me it's not strictly necessary but may be useful for others.

from methyldackel.

dpryan79 avatar dpryan79 commented on July 22, 2024

You're correct regarding extract. I wrote perRead to be pretty bare-bones, so it includes most things by default. I can add the -q and -p options, though, with the defaults from extract.

from methyldackel.

nchernia avatar nchernia commented on July 22, 2024

Thanks; whatever you think is best.

Just one more clarification question, perRead looks just at CpG methylation, right? Not CHH or CHG (unless these are passed in)?

from methyldackel.

dpryan79 avatar dpryan79 commented on July 22, 2024

Correct, it's just CpG at this point. I'll try to get MAPQ and base quality filtering added today.

from methyldackel.

dpryan79 avatar dpryan79 commented on July 22, 2024

The -q and -p options should now be present with the same default values as extract. I've not thoroughly tested these, so let me know if they're buggy.

from methyldackel.

nchernia avatar nchernia commented on July 22, 2024

Question - sometimes the output is

readname chr position 0 0

Why does it output "no evidence"? Is it just seeing that the read is in the appropriate region but doesn't cover a CpG?

from methyldackel.

dpryan79 avatar dpryan79 commented on July 22, 2024

My guess is that this happens when all of the bases are ignored due to quality filtering, but I'm not sure off-hand.

from methyldackel.

SamERoss avatar SamERoss commented on July 22, 2024

Hello,

Would it also be possible to add CHG and CHH options for perRead?

from methyldackel.

amatthopkins avatar amatthopkins commented on July 22, 2024

Hi there!

Thanks for making this! Just wanted to ask about column 5 of the output - number of informative bases. Is the number of informative bases the sum of the number of methylated Cs and unmethylated Cs in CpGs in a given read?

from methyldackel.

dpryan79 avatar dpryan79 commented on July 22, 2024

@amatthopkins Yes, exactly.

from methyldackel.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.