Comments (11)
Hi! I'm having the same issue, and I would like to know how to add the option u=True or v=True to my command line.
from intervene.
Hi!!
I find this tool fairly helpful. I tested Intervene on my files and it works fine for me. I manually calculated the overlapping fraction of regions between two files, and Intervene reported the same percentage. When finding intersecting regions, what matters is knowing which file acts as your reference. Above the diagonal of the matrix file that Intervene produces are the scenarios where the files named in the column headers are taken as the reference (i.e. -a $column_header_files).
Below the diagonal are the scenarios where the files named in the row names are taken as the reference (-a). It is fair enough to calculate what fraction of the regions in the reference file overlaps with the other BED file. Using the -u option prevents overcounting of features in the reference file that overlap with the other BED files. For two files at a time, Intervene performs two intersections. As indicated in daler/pybedtools#45 (comment):
Not using u=True results in another issue -- the total number of features for overlapping with b is greater than the number of features in a in the first place:
If the amount of overlap is your concern, you can use 50% overlap as a threshold together with the -r option of bedtools intersect, which does not count the overlaps discussed in daler/pybedtools#45 (comment), by passing --bedtools-options f=0.50,r
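To make the overcounting point concrete, here is a minimal pure-Python sketch. It does not call bedtools or pybedtools; the intervals and the helper names `count_pairs` and `count_unique` are made up for illustration. It assumes half-open (start, end) intervals on a single chromosome and a 1 bp minimum overlap.

```python
# Hypothetical toy intervals: (start, end), half-open, same chromosome.
a = [(0, 100), (200, 300)]
b = [(10, 20), (30, 40), (50, 60)]


def overlaps(x, y):
    """True if the two half-open intervals share at least 1 bp."""
    return x[0] < y[1] and y[0] < x[1]


def count_pairs(a, b):
    """One hit per overlapping (a, b) pair, similar to reporting without -u."""
    return sum(1 for x in a for y in b if overlaps(x, y))


def count_unique(a, b):
    """Each feature of a counted at most once, similar to reporting with -u."""
    return sum(1 for x in a if any(overlaps(x, y) for y in b))


pairs = count_pairs(a, b)    # the first feature of a is hit three times
unique = count_unique(a, b)  # only one feature of a has any overlap

print(len(a), pairs, unique)
```

Here the pairwise count (3) exceeds the number of features in a (2), which is the situation described in the quoted pybedtools comment, while the per-feature count can never exceed it.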
Thanks @Rohit-Satyam for your interest and input.
@amizeranschi Intervene is not designed to give you a list of overlapping regions; rather, it reports how many of the individual regions in a.bed overlap with b.bed and vice versa. To get a list of overlapping regions you need to use bedtools, bedops, or bedr.
For the u=True; v=True question, I just posted here: #35 (comment)
You can also provide additional bedtools arguments using --bedtools-options
I want to get the intersection region of A, B, C. Orange track from your tools. Can you explain what happened?
When you intersect with intervene the order of your files does matter and you will end up having slightly different results with a different order. Also, keep in mind that for Intervene we are using pybedtools with u=True or v=True. I hope this helps,
Hi @asntech
Regarding your earlier comment that "When you intersect with intervene the order of your files does matter and you will end up having slightly different results with a different order."
Why is this so? Set intersection should be a commutative operation, even when evaluating the overlaps between sets of genomic regions.
Based on this older remark (daler/pybedtools#45 (comment)), it would make sense to run bedtools with u=False, because reporting the actual overlapping regions correctly is more informative than simply counting the number of overlaps between regions. Similarly, I don't see the logic for setting v=True.
The way I see it, genomic regions are also sets themselves, with the unit element being the base pair. Thus, intersecting (sets of) genomic regions should accurately output the base-pairs common to those regions, even if the resulting number of overlapping elements won't match with the numbers of regions from the original BED files.
Thanks for your thoughts. Keep in mind, I'm not interested in counting the overlapping regions as separate entities. Instead, I'm interested in getting the total length (in bp) of the overlapping regions (1 bp minimum overlap) from two or more BED files. From this point of view, the intersection operation should be commutative, i.e. the order of input BED files shouldn't matter (see the example below).
I've also compared Intervene's results with those of two other tools and the results are different (even though the other tools "agree" with each other). Have a look: #34.
I would like to somehow get Intervene to give me the same results as bedops and bedr, as shown in the previous link. According to daler/pybedtools#45 (comment), running bedtools within Intervene with u=False should give me the right results. Using the example from the previous link:
a.bed ------------- --------------------------
b.bed ----------- ----- ------
Overlaps: --------- ----- ------
However, I don't see a way of doing this with Intervene right now.
Out of curiosity, have you used bedops before? Its --intersect operation is much simpler (IMO) than that of bedtools. It also supposedly runs considerably faster when the input files are sorted. https://bedops.readthedocs.io/en/latest/content/reference/set-operations/bedops.html#intersect-i-intersect.
It seems that Intervene's idea of intersecting BED files is more akin to the --element-of operation from bedops: https://bedops.readthedocs.io/en/latest/content/reference/set-operations/bedops.html#element-of-e-element-of.
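The distinction drawn above between a base-pair-level intersection and an element-of style count can be sketched in plain Python. This is only an illustration under stated assumptions: it does not call bedops, bedtools, or Intervene, the function names `merged`, `bp_intersection`, and `element_of` are hypothetical, and the intervals are invented half-open (start, end) pairs on one chromosome.

```python
def merged(intervals):
    """Sort and merge overlapping or touching half-open intervals."""
    out = []
    for start, end in sorted(intervals):
        if out and start <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], end))
        else:
            out.append((start, end))
    return out


def bp_intersection(a, b):
    """Regions (in bp) covered by both sets; the argument order does not matter."""
    a, b = merged(a), merged(b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def element_of(a, b):
    """Elements of a that overlap any element of b by at least 1 bp."""
    return [x for x in a if any(x[0] < y[1] and y[0] < x[1] for y in b)]


a = [(0, 50), (100, 200)]
b = [(10, 40), (120, 130), (150, 160)]

shared = bp_intersection(a, b)
total_bp = sum(end - start for start, end in shared)

print(shared, total_bp)
print(len(element_of(a, b)), len(element_of(b, a)))
```

In this toy case the base-pair intersection is identical for both argument orders (50 bp in total), whereas the element counts are 2 and 3 depending on which set is treated as the reference.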
@asntech I was hoping to use Intervene for my use case, due to its user-friendly way of creating UpSet diagrams. While bedr does what I need when setting feature = "bp", it only outputs Venn diagrams for up to 5 BED files.
Hi @asntech
I have some more doubts regarding the fractions of overlap reported by Intervene. I will try to explain with an example:
I had two BED files, adipose-tissue.csv and blood.csv, with the following numbers of features (coordinates):
adipose-tissue.csv 978
blood.csv 3326
I tried doing what Intervene would do given two files:
bedtools intersect -a blood.csv -b adipose-tissue.csv -f 0.50 -u | wc -l
This gives me 354 overlapping features. If I had to calculate the fraction of features of adipose-tissue.csv overlapping with blood.csv, I would calculate 354/978, which gives about 0.362. However, Intervene is calculating 354/3326, which I think is quite misleading. Can you help me understand why Intervene does that?
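For reference, the two candidate fractions can be written out as a small arithmetic sketch. The variable names are made up; the counts are the ones quoted above. One thing worth keeping in mind when reading the numbers: with bedtools intersect, -u writes each original -a entry once if it has any overlap in -b, so the 354 lines produced by the command above are features from blood.csv.

```python
# Feature counts quoted in the comment above.
n_adipose = 978   # features in adipose-tissue.csv
n_blood = 3326    # features in blood.csv (the -a file in the command)
n_overlap = 354   # lines reported with -f 0.50 -u

# Fraction relative to each file's total number of features.
frac_of_adipose = n_overlap / n_adipose
frac_of_blood = n_overlap / n_blood

print(f"{frac_of_adipose:.4f}")  # 0.3620
print(f"{frac_of_blood:.4f}")    # 0.1064
```

Which denominator is appropriate depends on which file the counted features come from; this sketch only shows the arithmetic and does not describe how Intervene computes its fractions internally.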
Thanks so much for the great tool!
I have some doubts and would appreciate your guidance.
Judging from the Venn plot result, it seems that --bedtools-options r ("bedtools intersect -r") is the default, which may show less overlap in the Venn plot: while 90% of the bed1 file may overlap with 30% of bed2, in -r mode the Venn plot may display only 10% overlap.
So which is better to display in the Venn plot, with or without -r mode?
Thanks so much for your guidance, @asntech. Much appreciated!
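The effect of requiring a reciprocal overlap can be illustrated with a short pure-Python sketch. It mimics the documented meaning of the bedtools intersect options -f (minimum overlap as a fraction of the A feature) and -r (require that fraction for both A and B), but it does not call bedtools; the function name `passes` and the two intervals are hypothetical.

```python
def passes(a, b, f=0.5, reciprocal=False):
    """Check one pair of half-open intervals against an overlap threshold.

    f: minimum overlap as a fraction of a's length.
    reciprocal: if True, the same fraction must also hold for b.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return False
    ok_a = overlap / (a[1] - a[0]) >= f
    if not reciprocal:
        return ok_a
    ok_b = overlap / (b[1] - b[0]) >= f
    return ok_a and ok_b


a = (0, 100)    # 100 bp
b = (10, 310)   # 300 bp; the 90 bp overlap is 90% of a but only 30% of b

print(passes(a, b, f=0.5))                   # True
print(passes(a, b, f=0.5, reciprocal=True))  # False
```

With a 50% threshold this pair counts as overlapping when only a is considered, but not when the reciprocal condition is applied, which is one way a plot built with the reciprocal requirement can show fewer overlaps.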