hms-dbmi / upsetr
An R implementation of the UpSet set visualization technique published by Lex, Gehlenborg, et al.
Home Page: https://cran.rstudio.com/web/packages/UpSetR
License: Other
It appears that R linked both "intersection" and "intersect" with the intersect function help page. What would be a suitable name?
The package should be built and tested every time we push code to the repo. Integration with Travis seems to be the most popular way to get CI for R packages:
http://www.r-bloggers.com/continuous-integration-for-r-packages/
That would make it easier to see those data points highlighted in the color of the query.
The font should match the "Intersection Size" font.
Hi, thanks so much for the great package!
It seems that mainbar.y.max cannot accept values smaller than the size of the largest intersection bar. If I try, I get the following error: "Error: Aesthetics must be either length 1 or the same as the data (59): fill In addition: Warning message: Removed 4 rows containing missing values (position_stack)."
I was wondering whether you implemented this with coord_cartesian() in ggplot2. If so, why might I get this error?
BW
Philipp
Said custom query function should be included in the package.
Dear UpSetR team,
Thank you for a great piece of work.
My input file has 7 columns, each containing gene names, like this:
head input.txt
Sample1,Sample2,Sample3,Sample4,Sample5,Sample6,Sample7
uc008vaw.1Rnu11,uc008vaw.1Rnu11,uc012aua.1AB339930,uc012ath.2Rn45s,uc008gfm.1AK197973,uc008gfm.1AK197973,uc012bec.1Mir122a
uc008ztz.1AK212710,uc008ztz.1AK212710,uc008ztz.1AK212710,uc008vaw.1Rnu11,uc008gfl.1AK181808,uc008gfl.1AK181808,uc008yaz.2Alb
uc009phb.3Apoa1,uc008gfl.1AK181808,uc033fml.1Mir8114,uc008ztz.1AK212710,uc012bhe.1Neat1,uc012bhe.1Neat1,uc008vxy.1Errfi1
uc008yaz.2Alb,uc008gfm.1AK197973,uc011zoa.1Mir320,uc008gfl.1AK181808,uc008gfk.1AK148054,uc008gfk.1AK148054,uc008eet.2Ttr
uc007puo.1Hist1h4c,uc011zxb.1Rnu12,uc008vaw.1Rnu11,uc012bhe.1Neat1,uc012bec.1Mir122a,uc008odl.1Pck1,uc009phb.3Apoa1
uc012aua.1AB339930,uc012aua.1AB339930,uc012bty.1Mir2861,uc008gfk.1AK148054,uc008yaz.2Alb,uc012bec.1Mir122a,uc008vxw.1Errfi1
But when I read this CSV file with the read.csv function in R and run the upset command, I get the error "Error in start_col:end_col : argument of length 0".
Should I transform my input file?
Kindly advise.
Thanks
G
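upset() works on a data frame with one 0/1 indicator column per set, whereas this file holds lists of gene names. UpSetR's fromList() converts a named list of vectors into that indicator layout. A minimal sketch, with three columns of the excerpt above inlined in place of input.txt so it runs on its own:

```r
library(UpSetR)

# Three columns of the file above, inlined; with the real file use
# read.csv("input.txt", stringsAsFactors = FALSE) instead
input <- read.csv(text = "Sample1,Sample2,Sample3
uc008vaw.1Rnu11,uc008vaw.1Rnu11,uc012aua.1AB339930
uc008ztz.1AK212710,uc008ztz.1AK212710,uc008ztz.1AK212710
uc009phb.3Apoa1,uc008gfl.1AK181808,uc033fml.1Mir8114",
  stringsAsFactors = FALSE)

# One vector of unique, non-empty gene names per column (ragged columns leave blanks)
gene_sets <- lapply(input, function(col) unique(col[!is.na(col) & col != ""]))

# fromList() returns one row per distinct gene and one 0/1 column per sample
binary <- fromList(gene_sets)

upset(binary, nsets = ncol(binary))
```

With the full seven-column file, nsets = 7 would include every sample.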
specific_intersections <- function(data, first.col, last.col, intersections, order_mat,
aggregate, decrease, cut, mbar_color){
sets <- names(data[c(first.col:last.col)])
keep <- unique(unlist(intersections))
remove <- sets[which(!sets %in% keep)]
remove <- which(names(data) %in% remove)
data <- data[-remove]
When keep is equivalent to sets, remove is set to integer(0), and data gets wiped out:
Browse[2]> remove
integer(0)
Browse[2]> data[-remove]
data frame with 0 columns and 14632 rows
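The failure comes from negative indexing with an empty vector: -integer(0) is still integer(0), so data[-remove] selects no columns at all. A base-R sketch reproducing it on a toy data frame (the column names are hypothetical), next to two guarded alternatives; neither is claimed to be the fix adopted in UpSetR:

```r
# Toy stand-in for the data frame passed to specific_intersections()
data <- data.frame(Name = c("a", "b"), Drama = c(1L, 0L), Comedy = c(0L, 1L))
sets <- c("Drama", "Comedy")
keep <- c("Drama", "Comedy")  # every set is used by some requested intersection

remove <- sets[which(!sets %in% keep)]    # character(0)
remove <- which(names(data) %in% remove)  # integer(0)

# -integer(0) is still integer(0), so this selects zero columns
broken <- data[-remove]

# Guarded version: only drop columns when there is something to drop
fixed <- if (length(remove) > 0) data[-remove] else data

# Logical-index version: an all-TRUE index keeps every column
fixed_logical <- data[!(names(data) %in% setdiff(sets, keep))]
```

The logical form avoids the special case entirely, because an empty set of columns to drop yields an all-TRUE index rather than an empty one.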
The link http://www.nature.com/nmeth/journal/v11/n8/full/nmeth.3033.html (only accessible to subscribers of Nature Methods, and returning a 401 HTTP response to users who are not logged in) should be http://www.nature.com/nmeth/journal/v11/n8/abs/nmeth.3033.html (open to everyone).
Using a data frame df of values for upset:
require(UpSetR)
upset(df)
returns the following error:
Error in theme(panel.background = element_rect(fill = "white"), plot.margin = unit(c(0.5, :
could not find function "unit"
Fixed by manually loading grid:
require(grid)
upset(df)
** plot produced **
Session info:
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] reshape2_1.4.1 ggplot2_1.0.1 xlsx_0.5.7 xlsxjars_0.6.1 rJava_0.9-7
[6] UpSetR_1.2.0 stringr_1.0.0 magrittr_1.5 dplyr_0.4.1 plyr_1.8.3
loaded via a namespace (and not attached):
[1] assertthat_0.1 colorspace_1.2-6 DBI_0.3.1 digest_0.6.8
[5] gridExtra_2.0.0 gtable_0.1.2 knitr_1.11 labeling_0.3
[9] lazyeval_0.1.10 MASS_7.3-44 munsell_0.4.2 parallel_3.1.1
[13] proto_0.3-10 Rcpp_0.12.1 scales_0.3.0 stringi_0.5-5
[17] tools_3.1.1
Is there a smart way to set the number of intersections shown to all of them? The documentation for the nintersects parameter does not describe anything like that, e.g. nintersects = NA or something along those lines.
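A minimal sketch, assuming an UpSetR release whose ?upset documentation states that nintersects = NA plots all intersections (worth checking against the installed version), using the movies.csv example data that ships with the package:

```r
library(UpSetR)

# Example data bundled with UpSetR (semicolon-separated)
movies <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"),
                   header = TRUE, sep = ";")

# Assumption: NA lifts the default cap of 40 and plots every intersection
upset(movies, nsets = 5, nintersects = NA)
```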
Hi. Thanks for such a great Package!
The problem I'm facing is that the upset function does not return a value, as ggplot does, for example. I need to store the plot in a variable so I can print it later in a different context. Is it possible for the upset function to return a "printable" plot object?
Best wishes,
Juanje.
An option was left out when the way attribute plots are added was recreated.
Hi, loving UpSetR!!
I'm working on an 8-way comparison and would like to introduce a break in the y-axis. I can't find a way to do this in any of the UpSetR information I've dug through. Is there any way to do so? The intersection of all 8 sets contains 4,000 elements, and the next largest contains 800, which results in some rather small bars after that first big one!
Alternatively, if this isn't possible, can UpSetR log-transform the counts?
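For the log-transform alternative, a sketch assuming an UpSetR version that provides the scale.intersections argument (accepting "identity", "log10" or "log2"); in such versions scale.sets plays the same role for the set size bars. The bundled movies.csv data stands in for the 8-way comparison:

```r
library(UpSetR)

movies <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"),
                   header = TRUE, sep = ";")

# Log-scaled intersection bars, so one very large bar does not flatten the rest
# (assumes scale.intersections is available in the installed UpSetR)
upset(movies, nsets = 8, scale.intersections = "log10")
```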
I love your work with UpSet, and I had the JavaScript piece on my list to do as an htmlwidget of the week at BuildingWidgets. I had no idea that you had an R package until I just spotted it on the CRAN feed. I'd love to volunteer to make this into an htmlwidget if you would like to pair the interactivity of the JavaScript with the engine of R.
Before htmlwidgets existed, I had played a little with integrating UpSet with rCharts (http://timelyportfolio.github.io/upset), where I added a couple of R datasets to the list. The ugly code is in a fork: https://github.com/timelyportfolio/upset.
Hi,
I've been trying to apply a simple coloring scheme to the matrix background, but I cannot seem to get it to work. Using example 5 from the metadata vignette, I identified these two (possibly related) errors:
plots list (i.e. "matrix_row" before "hist") results in the following error: Error in rep_len(1, ncol) : invalid 'length.out' value
plots list results in the following error: Error in tmp[[i]] : subscript out of bounds
Happy to provide more details if needed.
Doron Betel
Dear Jake,
Congratulations on this very nice and useful package! I have one question, or a request if there is a solution. I would like to order the final intersections by name. If I'm comparing 2 groups with different members, I would like to see all the intersections within the first group first, then those within the second group, and finally those between members of different groups. UpSetR orders the intersections by set size, and I cannot find any option to choose the order of the intersections by column header, as they appear in the matrix, for example.
I don't know if I was clear; thank you in advance for your help.
The parameter name aggregate.by is a misnomer. It should be group.by. Example 3 in the basic usage vignette should be renamed, too.
The set labels sit between the set size bars and the set intersection matrix. Since they label both equally, they should be symmetric, i.e., centered, and the tick shown for the matrix should also appear for the bars (or not at all).
Key features:
The requirements for formatting the data should appear earlier in the R documentation; they are currently buried at the end of a hard-to-find web page.
(via https://twitter.com/fabiencampagne/status/721073665077616641 / via @fac2003)
To be controlled via a parameter empty.intersections that is FALSE by default.
The Nature Methods PoV should still be mentioned.
library(UpSetR)
My_data <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), header = TRUE, sep = ";")
upset_base(My_data, first.col = 3, last.col = 19, nsets = 6, att.x = "ReleaseDate", att.y = "AvgRating",
           point.size = 3, att.color = "black", main.bar.color = "black", show.numbers = "yes",
           queries = list(c("Drama", "Romance", "red"), c("Horror", "Drama", "Thriller", "green"),
                          c("Drama", "Comedy", "blue")))
The order of the query colors in the main bar chart is mixed up for the bar in commit 8b0d810:
This can be handled with a custom plot. We should include the scatter plot, histogram and boxplot example custom plots as methods in the package so that they can be used out of the box.
Users should not have to (and typically shouldn't at all) specify colors for anything; we should provide good defaults instead.
I would suggest using color only for selections. In this example here:
We could get rid of the blue shading of the background and the blue bars for the sets. The blue bars for the sets might make some sense to distinguish them from the intersections, but on the other hand, they are the same data type.
This is a good starting point for selection colors:
http://colorbrewer2.org/?type=qualitative&scheme=Paired&n=10
upset(binarymatrix, nsets = 6, intersections = list(list("Heart_Up", "Muscle_Up"),
                                                    list("Heart_Down", "Muscle_Up"),
                                                    list("Heart_Down", "Muscle_Unchanged"),
                                                    list("Heart_Down", "Muscle_Down"),
                                                    list("Heart_Unchanged", "Muscle_Up"),
                                                    list("Heart_Unchanged", "Muscle_Unchanged"),
                                                    list("Heart_Unchanged", "Muscle_Down")))
Error in `[.data.frame`(data, keep) : undefined columns selected
5 sets work fine, and 6 sets work if I don't specify intersections.
Quoting @JakeConway:
It could probably work with lower versions. I only put that because that's the version I was using when I began working on it. The only limitation on how early the R version can be is whether it can run ggplot2 version 1.0.1, gridExtra version 0.9.1, and plyr version 1.8.3.
Nice package!
The font size of mainbar.y.label and the y-axis tick marks is too small for my liking. Is it possible to increase their size? Could something like the name.size argument be added for these components? The demo figures do not seem to have this problem, but I cannot reproduce the large font size on my local machine.
http://docs.ggplot2.org/0.9.2.1/theme.html
Some discussion about which parts of the plot should not be affected by themes will be required.
Hello UpSetR Team,
Is there a way to extract the intersection values from the upset command into a text file?
Can upset feed its values to Chow-Ruskey?
Thanks
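One way to get these numbers without going through the plot is to compute them from the same binary table that upset() consumes. A base-R sketch (not an UpSetR function; the toy data and the file name intersections.txt are hypothetical) that labels each row with the exact combination of sets it belongs to and counts rows per combination, which should match the exclusive intersection sizes UpSetR draws as bars:

```r
# Toy binary membership table in the layout upset() consumes (one 0/1 column per set)
df <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5", "g6"),
  A = c(1, 1, 0, 1, 0, 1),
  B = c(1, 0, 1, 1, 0, 1),
  C = c(0, 0, 1, 1, 1, 0)
)
set_cols <- c("A", "B", "C")

# Label each row with the exact set combination it belongs to, e.g. "A&B"
membership <- apply(df[set_cols] == 1, 1,
                    function(in_set) paste(set_cols[in_set], collapse = "&"))

# Count rows per combination and sort from largest to smallest
counts <- as.data.frame(table(intersection = membership), stringsAsFactors = FALSE)
counts <- counts[order(-counts$Freq), ]

# Write a tab-separated text file
write.table(counts, file = "intersections.txt", sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Keeping the membership vector alongside df$gene also gives the element names in each intersection, not just the counts.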
Code to replicate:
library(UpSetR)
movies <- read.csv( system.file("extdata", "movies.csv", package = "UpSetR"), header=TRUE, sep=";" )
UpSetR::upset(movies, keep.order = TRUE, sets = c("Drama", "Thriller", "Action"))
UpSetR::upset(movies, keep.order = TRUE, sets = c("Thriller", "Drama", "Action"))
In the first plot, "Drama" is the largest set in the barplot on the left. In the second plot, "Thriller" is the largest set.
My matrix looks like this:
ensembl_gene_id Heart_Down Heart_Unchanged Heart_Up Muscle_Down Muscle_Unchanged Muscle_Up
1 ENSMUSG00000000001 0 1 0 1 0 0
2 ENSMUSG00000000028 0 1 0 0 1 0
3 ENSMUSG00000000031 0 1 0 0 1 0
4 ENSMUSG00000000056 0 1 0 0 1 0
5 ENSMUSG00000000058 0 1 0 0 1 0
6 ENSMUSG00000000078 0 1 0 0 1 0
...
There are 9 meaningful intersections but two of them are empty. Then there are a lot of meaningless intersections. I need a way to display all 9.
The "empty" circles should be a slightly darker gray.
Currently the nintersects parameter limits the number of intersections shown by default to 40.
Should we show all sets by default?
Add a feature to adjust the position of the numbers above the bars to prevent overlapping, which begins occurring when the plot contains many intersections.
That would better reflect what they are intended to be used for.
Hello Jake,
I am having an issue when trying a simple intersection with UpSetR. The data is as follows:
Name;Amelonado;Contamana;Criollo;Curaray;Guianna;Iquitos;Maranon;Nacional;Nanay;Purus
Thecc1EG000181;1;0;0;0;0;0;0;0;0;0
Thecc1EG001933;1;0;0;0;0;0;0;0;0;0
Thecc1EG003999;1;0;0;0;0;0;0;0;0;0
Thecc1EG005677;1;0;0;0;0;0;0;0;0;0
Thecc1EG006000;1;0;0;0;0;0;0;0;0;0
.
.
.
I load the data
genes <- read.csv("mygenes", header = TRUE, sep = ";")
and try upset with:
upset(genes2, sets = c("Amelonado", "Contamana", "Criollo", "Curaray", "Guianna", "Iquitos", "Maranon", "Nacional", "Nanay", "Purus"), sets.bar.color = "#56B4E9",order.by = "freq", empty.intersections = "on")
I obtain the error:
Error in start_col:end_col : argument of length 0
I compared my data with your example sets, but I cannot figure out what the problem could be.
Has anyone experienced this problem?
thanks
Omar
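One way to narrow down this error is to check that every requested set exists as a column and holds only 0/1 values, since upset() works on binary indicator columns. A minimal sketch using a hypothetical helper, check_set_columns(), which is not part of UpSetR, on two rows of the data above:

```r
# Hypothetical helper: report requested sets that are absent or not strictly 0/1
check_set_columns <- function(data, sets) {
  absent <- setdiff(sets, names(data))
  present <- intersect(sets, names(data))
  is_binary <- vapply(data[present],
                      function(col) is.numeric(col) && all(col %in% c(0, 1)),
                      logical(1))
  list(absent = absent, not_binary = present[!is_binary])
}

# Two rows of the semicolon-separated file, inlined
genes <- read.csv(text = "Name;Amelonado;Contamana
Thecc1EG000181;1;0
Thecc1EG001933;1;0", sep = ";")

str(check_set_columns(genes, c("Amelonado", "Contamana", "Criollo")))
```

Here "Criollo" is reported as absent because the inline excerpt omits that column; on the full file, both lists should come back empty if the input is in the shape upset() expects.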
Hello there,
Trying to use a dataframe in R I generated from an abundance matrix t, I keep getting this error when using UpSetR:
Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list?
Here's the beginning of the data frame:
OTU_IDS Amphibolite Basalt Coal Coal-Upper Dolomite Hematite_Granite High-calcite_clay mica_schist
1 KC442817.1.1478 1 0 0 0 0 0 0
2 JX222276.1.1475 0 1 0 0 0 0 0
3 HQ218444.1.1344 0 0 0 0 0 0 1
4 DQ517124.1.1380 0 0 0 0 0 0 0
5 KF827260.1.1400 0 1 0 0 0 0 0
6 EF125930.1.1497 0 0 0 0 0 0 0
Any ideas?
Thanks in advance.
André
http://r-pkgs.had.co.nz/vignettes.html
We need to decide whether to use Sweave to keep our compatibility with R 2.5 and later, or to switch to a minimum requirement of R 3.0 and use rmarkdown for the vignette.
0, 45 or 90 degrees should be supported. Look at how plot/axis are setting label orientation or how it is typically done in ggplot and follow that pattern.