Comments (24)
I suppose something equivalent to how MethylDackel mbias
works internally would work here as well. There, instead of the status for each C being written to the output, it's stored in a vector according to whether it's C or T. What sort of output format are you looking for with this?
Out of curiosity, what sort of project would benefit from something like this?
from methyldackel.
from methyldackel.
Would it be better for me to modify the mbias code instead to extract this information?
from methyldackel.
@nchernia Actually, I just double checked and I use a pileup there as well. I can hack something together easily enough to do this, but do you really need it to be multithreaded? It's easy enough to write a single-threaded application to do this (one could even use python with pysam), but it'll take a lot more time to write it to handle threads. Could you privately send me the basic reason for wanting to do this (I've modified versions of this for in-the-pipeline commercial products before, so I'm used to not blabbing about other's projects)? There almost has to be a more efficient way than going over a gigantic text file with read names and either a bunch of ....h..H...
or percentages.
from methyldackel.
FYI, the perRead stuff is now going into the perRead
branch. One can now MethylDackel perRead ...
with that and it seems not to crash. But I haven't really tested it yet, so hold off on trying it unless you're a glutton for punishment. It's written to be multithreaded, though I don't think the actual options for that are there at the moment.
from methyldackel.
The perRead
branch seems to produce correct results now. Multithreading works properly and I added some other rudimentary options. It should be easy enough to add other options in. Let me know if you need anything else from that or need the output in a different format.
from methyldackel.
Thanks! I will give it a try very soon.
from methyldackel.
I've been working with this function, thanks. One question - I should expect the reads to appear more than once in the output if they align in multiple places, right?
from methyldackel.
Good question. You should see each instance, since it doesn't use filter_func()
. I'll try to make a short test of that, though.
from methyldackel.
I can confirm that you should see each instance. If you run MethylDackel perRead tests/cg100.fa tests/cg_aln.bam
in the source tree you'll see each instance of read1
in the output. I should add a method to filter reads, though. If you need --ignoreFlags
or --requireFlags
now then let me know.
from methyldackel.
Yes, I should have clarified that I do see multiple instances, just double checking that I understood how it worked. Yes, I need --ignoreFlags and --requireFlags though this isn't urgent.
from methyldackel.
Both those options should now be supported (they both work properly when I test them locally).
from methyldackel.
Hi - perRead seems to include MAPQ 0 reads. These are filtered out by default in "extract", right?
from methyldackel.
extract
filters base score of 0 to avoid double calling overlapping ends of mates. The default -q
will also skip MAPQ 0, but that can be changed. If I never added a MAPQ thresholding option to perRead
I can do that, just let me know if you need it.
from methyldackel.
I'm a little confused. In the usage for extract it says:
Options:
-q INT Minimum MAPQ threshold to include an alignment (default 10)
So I thought reads with MAPQ < 10 were not included in extract. Is that correct?
I had assumed perRead would have these same defaults. For me it's not strictly necessary but may be useful for others.
from methyldackel.
You're correct regarding extract
. I wrote perRead to be pretty bare-bones, so it includes most things by default. I can add the -q
and -p
options, though, with the defaults from extract
.
from methyldackel.
Thanks; whatever you think is best.
Just one more clarification question, perRead looks just at CpG methylation, right? Not CHH or CHG (unless these are passed in)?
from methyldackel.
Correct, it's just CpG at this point. I'll try to get MAPQ and base quality filtering added today.
from methyldackel.
The -q
and -p
options should now be present with the same default values as extract
. I've not thoroughly tested these, so let me know if they're buggy.
from methyldackel.
Question - sometimes the output is
readname chr position 0 0
Why does it output "no evidence"? Is it just seeing that the read is in the appropriate region but doesn't cover a CpG?
from methyldackel.
My guess is that this happens when all of the bases are ignored due to quality filtering, but I'm not sure off-hand.
from methyldackel.
Hello,
Would it also be possible to add CHG and CHH options for perRead?
from methyldackel.
Hi there!
Thanks for making this! Just wanted to ask about column 5 of the output - number of informative bases. Is the number of informative bases the sum of the number of methylated Cs and unmethylated Cs in CpGs in a given read?
from methyldackel.
@amatthopkins Yes, exactly.
from methyldackel.
Related Issues (20)
- Understanding --minConversionEfficiency argument HOT 1
- Genome browsing from MethylDackel bedGraphCpG file
- per-fragment methylation HOT 2
- mbias result is different between bismark and bwameth output HOT 1
- CURL_OPENSSL conflict with samtools HOT 3
- Installation failure: "bigWig.h: No such file or directory" HOT 1
- Clarification on definition of "unmethylated C" HOT 1
- Coverage of C sites HOT 1
- mbias HOT 1
- Does indel effect the methylation calling or C context determination HOT 1
- Positions in cytosine_report did not match the regions in providing bed file
- about M-bias HOT 1
- Could not repeat a CpG extraction with the same reference file and its index
- Mixed up reads within bam file
- How to index genome file for MethylDackel? HOT 1
- Confused regarding CTOT, CTOB. Are there suggested values? HOT 3
- Issue running MethylDackel extract in parallel mode using minConversionEfficiency
- Question about minimum coverage
- Different CpG calls when using different regions of inclusion for Methyldackel extract
- Alignment trimming for Soft Clip reads?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from methyldackel.