Comments (7)
The MANUAL file is outdated, sorry. There are no --end-to-end or --local options/modes in HISAT2. I'd like to have soft-clipping as the default behavior. When reads span introns with very small anchors (a couple of bases), soft-clipping is often done to represent such small anchors.
from hisat2.
Thanks for the information, we were only a bit surprised to find soft-clipped reads because we assumed it was not happening.
Just to understand the issue a bit better: instead of reporting, say, a 2bp match across a splice junction, would HISAT2 rather soft-clip it than call it a splice junction? If I went ahead and used --sp 1000,1000 to prevent soft-clipping, would it then do the right thing and give a CIGAR string such as 2M5000N100M?
We recently looked a little into the effects of soft-clipping and found that it may add a lot of repetitive (and probably wrong) alignments to a data set (https://sequencing.qcfail.com/articles/soft-clipping-of-reads-may-add-potentially-unwanted-alignments-to-repetitive-regions/). This might not be so relevant for RNA-Seq, but for regular DNA alignments it might well add quite some noise.
Thanks a lot, Kind regards, Felix
Thanks for your comments! Using --sp 1000,1000 reports 2M5000N100M only when the splice junction is supported by some other reads with long anchors (>= 15bp). It won't report 2S100M. (I edited this a bit as I misunderstood your question at first.)
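To make the two CIGAR strings above concrete, here is a minimal sketch of how one might tell a soft-clipped alignment (2S100M) from a spliced one with a short anchor (2M5000N100M). The helper names are hypothetical and not part of HISAT2; only the CIGAR operation letters come from the SAM format.

```python
import re

# One CIGAR operation: a length followed by a SAM operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a SAM CIGAR string into (length, op) pairs ('*' gives [])."""
    if cigar == "*":
        return []
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings with characters the regex did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def soft_clipped_bases(cigar):
    """Total number of read bases marked as soft-clipped (S)."""
    return sum(n for n, op in parse_cigar(cigar) if op == "S")


def intron_lengths(cigar):
    """Lengths of all skipped reference regions (N), i.e. candidate introns."""
    return [n for n, op in parse_cigar(cigar) if op == "N"]


def shortest_anchor(cigar):
    """Smallest aligned block (M/=/X bases) on either side of an intron.

    Returns None when the alignment contains no N operation.
    """
    segments, current, has_intron = [], 0, False
    for n, op in parse_cigar(cigar):
        if op == "N":
            segments.append(current)
            current, has_intron = 0, True
        elif op in "M=X":
            current += n
    segments.append(current)
    return min(segments) if has_intron else None


if __name__ == "__main__":
    for cigar in ("2S100M", "2M5000N100M"):
        print(cigar, soft_clipped_bases(cigar), intron_lengths(cigar), shortest_anchor(cigar))
```

For 2M5000N100M the shortest anchor is 2, which is the kind of junction that, per the comment above, is only reported when other reads support it with anchors of at least 15bp.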
Bowtie2 and HISAT2 are quite different when it comes to the use of soft-clipping. I'd like to see how HISAT2 works with/without soft-clipping for your DNA analysis part in which you used Bowtie2 (--end-to-end and --local modes). HISAT2 with soft-clipping should be more conservative than Bowtie2's local mode, I think (less sensitive alignment and more unique alignments than Bowtie2). BTW, that's a really nice description!
I can certainly re-run the DNA samples with HISAT2, will let you know the outcome!
HISAT2 on genomic sequences comparison.pdf
Hi Daehwan,
I have now run the same 100bp Input sample with HISAT2 with and w/o soft-clipping, and with and w/o splice junction models. For this data the splice junctions didn't make much of a difference (which is good). I didn't dig very deep into any kind of analysis, but I can report on a few things I found (see also the slides attached):
- The most striking difference between Bowtie2 and HISAT2 mapping was the overall mapping efficiency (slide 1): the rate of unaligned reads with HISAT2 was nearly 3 times higher than with Bowtie2, while the rate of multiple alignments dropped dramatically.
- Soft-clipping in HISAT2 does not lead to a lot of extra 'peaks' in the data; if anything, it almost looks like there are more regions with more reads in end-to-end (--sp 1000,1000) mode (slide 2).
- When you look at some regions it appears that HISAT2 is behaving nicely, e.g. in the region of satellite repeats on chrX in slide 3 that gains lots of extra reads in Bowtie2 local mode.
- There are quite a lot of regions that lose coverage compared to Bowtie2 (in either mode), such as in slide 4. This can often be seen in regions with many predicted genes (Gm or RIKEN...), or close to regions that generally look dodgy, e.g. close to gaps etc.
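Rates like the ones in the list above (unaligned reads, multiple alignments, soft-clipped reads) can be tallied directly from SAM output. The following is only a sketch under stated assumptions, not the analysis behind the slides: it counts primary records, treats FLAG bit 0x4 as unaligned, and relies on the NH:i tag to detect multiple alignments. HISAT2 writes that tag; other aligners may not, in which case the "multi" count stays at zero. The function name summarize_sam is hypothetical.

```python
def summarize_sam(lines):
    """Count primary SAM records that are unaligned, soft-clipped, or multi-mapped."""
    counts = {"total": 0, "unaligned": 0, "soft_clipped": 0, "multi": 0}
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip secondary (0x100) and supplementary (0x800) records
        # so that each read (or mate) is counted once.
        if flag & 0x900:
            continue
        counts["total"] += 1
        if flag & 0x4:
            counts["unaligned"] += 1
            continue
        # Field 6 is the CIGAR string; an S operation marks soft-clipping.
        if "S" in fields[5]:
            counts["soft_clipped"] += 1
        # NH:i:<n> gives the number of reported alignments (assumed absent = 1).
        nh = next((int(t[5:]) for t in fields[11:] if t.startswith("NH:i:")), 1)
        if nh > 1:
            counts["multi"] += 1
    return counts


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0",
        "r1\t0\tchr1\t100\t60\t100M\t*\t0\t0\t*\t*\tNH:i:1",
        "r2\t0\tchr1\t200\t60\t2S98M\t*\t0\t0\t*\t*\tNH:i:1",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    print(summarize_sam(example))
```

To use it on a real file, pass an open file handle, e.g. `summarize_sam(open("aligned.sam"))`, where the file name is a placeholder.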
Altogether I was quite surprised to see the rather big overall differences between Bowtie2 and HISAT2, but when you look in more detail it actually looks like the two agree very well in the vast majority of the genome, and only differ in some regions that look sort of dodgy to me (even though I haven't investigated this any further), whereby my gut feeling is that HISAT2 is more trustworthy in these regions. Again, I can't base this on facts though. If you would like to follow any of this up I could share the data on an FTP site with you if need be.
Do you think it would be possible to give users the option to choose between soft-clipping (which you would like to use as the default) and no soft-clipping, such as an option --end-to-end or --no-softclipping (even if this would be setting a very high soft-clipping penalty behind the scenes), because it really isn't very obvious that you would need to do --sp 1000,1000 or the like? Just a thought.
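As an illustration of the workaround discussed here, the sketch below builds (but does not run) HISAT2 command lines for the four combinations of soft-clipping and spliced alignment. It uses --sp 1000,1000 to suppress soft-clipping, as in this thread. Mapping "w/o splice junction models" to --no-spliced-alignment is an assumption, and the index and file names are placeholders. Later HISAT2 releases appear to document a dedicated --no-softclip flag; check `hisat2 --help` for your version.

```python
def hisat2_command(index, reads, out_sam, soft_clip=True, spliced=True):
    """Return a HISAT2 argument list for single-end reads.

    soft_clip=False adds the high soft-clipping penalty from this thread;
    spliced=False disables spliced alignment (assumed equivalent to
    running without splice junction models).
    """
    cmd = ["hisat2", "-x", index, "-U", reads, "-S", out_sam]
    if not soft_clip:
        # Very high max,min soft-clipping penalties effectively forbid clipping.
        cmd += ["--sp", "1000,1000"]
    if not spliced:
        cmd.append("--no-spliced-alignment")
    return cmd


if __name__ == "__main__":
    # Placeholder index and read file names, for illustration only.
    for soft_clip in (True, False):
        for spliced in (True, False):
            out = f"input_sc{int(soft_clip)}_sp{int(spliced)}.sam"
            cmd = hisat2_command("genome_index", "input_100bp.fastq", out, soft_clip, spliced)
            print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.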
Cheers, Felix
Hi Felix,
Thank you again for this detailed information. I'll think more about this analysis to see how I can make HISAT2 better.
As you suggested, I will provide a new option, --no-softclipping, in the next release.
Thanks,
Daehwan
Great, thanks for your quick feedback! Felix