hms-dbmi / upsetr

An R implementation of the UpSet set visualization technique published by Lex, Gehlenborg, et al.

Home Page: https://cran.rstudio.com/web/packages/UpSetR

License: Other

R 100.00%

upset upsetr visualization gehlenborglab rstats ggplot2

upsetr's Introduction

UpSetR

Technique

UpSetR generates static UpSet plots. The UpSet technique visualizes set intersections in a matrix layout and introduces aggregates based on groupings and queries. The matrix layout enables the effective representation of associated data, such as the number of elements in the aggregates and intersections, as well as additional summary statistics derived from subset or element attributes.

For further details about the original technique see the UpSet website. You can also check out the UpSetR shiny app. Here is the source code for the shiny wrapper.

A Python package called py-upset to create UpSet plots has been created by GitHub user ImSoErgodic.

Citation

If you use UpSetR in a paper, please cite:

Jake R Conway, Alexander Lex, Nils Gehlenborg UpSetR: An R Package for the Visualization of Intersecting Sets and their Properties doi: https://doi.org/10.1093/bioinformatics/btx364

The original technique and the interactive visualization tool implementing the approach are described here:

Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, Hanspeter Pfister,
UpSet: Visualization of Intersecting Sets,
IEEE Transactions on Visualization and Computer Graphics (InfoVis '14), vol. 20, no. 12, pp. 1983–1992, 2014.
doi: https://doi.org/10.1109/TVCG.2014.2346248

Sample Data

Sample data sets for UpSetR are included in the package and can be loaded like this:

movies <- read.csv( system.file("extdata", "movies.csv", package = "UpSetR"), header=T, sep=";" )
mutations <- read.csv( system.file("extdata", "mutations.csv", package = "UpSetR"), header=T, sep = ",")

The movies data set was created by the GroupLens Lab and curated by Bilal Alsallakh. The mutations data set was originally created by the TCGA Consortium and represents mutations for the 100 most mutated genes in a glioblastoma multiforme cohort.

Examples

In addition to the examples shown here, we have included a range of UpSetR plots in the paper about the R package, which can be found in a separate GitHub repository.

Vignettes

There are currently four vignettes that explain how to use the features included in the UpSetR package:

Demo

A view of the UpSet plot with additional plots based on elements in the intersections.

upset(movies,
      attribute.plots = list(gridrows = 60,
                             plots = list(list(plot = scatter_plot, x = "ReleaseDate", y = "AvgRating"),
                                          list(plot = scatter_plot, x = "ReleaseDate", y = "Watches"),
                                          list(plot = scatter_plot, x = "Watches", y = "AvgRating"),
                                          list(plot = histogram, x = "ReleaseDate")),
                             ncols = 2))

A view of UpSetR mimicking the plot published by Lex & Gehlenborg http://www.nature.com/nmeth/journal/v11/n8/abs/nmeth.3033.html

upset(mutations, sets = c("PTEN", "TP53", "EGFR", "PIK3R1", "RB1"), sets.bar.color = "#56B4E9",
order.by = "freq", empty.intersections = "on")

An example using two set queries (war movies and noir movies) along with attribute plots comparing the average rating (top) and average rating vs the number of times the movies have been watched (bottom).

upset(movies,
      attribute.plots = list(gridrows = 100, ncols = 1,
                             plots = list(list(plot = histogram, x = "AvgRating", queries = TRUE),
                                          list(plot = scatter_plot, y = "AvgRating", x = "Watches", queries = TRUE))),
      sets = c("Action", "Adventure", "Children", "War", "Noir"),
      queries = list(list(query = intersects, params = list("War"), active = TRUE),
                     list(query = intersects, params = list("Noir"))))

Download

Install the latest released version from CRAN

install.packages("UpSetR")

Download the latest development code of UpSetR from GitHub using devtools with

devtools::install_github("hms-dbmi/UpSetR")

upsetr's Issues

grid library not loaded automatically

Using df of values for upset:

require( UpSetR )
upset( df )

returns the following error:

Error in theme(panel.background = element_rect(fill = "white"), plot.margin = unit(c(0.5, :
could not find function "unit"

Fixed by manually loading grid:

require( grid )
upset( df )
** plot produced **

Session info:

R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] reshape2_1.4.1 ggplot2_1.0.1 xlsx_0.5.7 xlsxjars_0.6.1 rJava_0.9-7
[6] UpSetR_1.2.0 stringr_1.0.0 magrittr_1.5 dplyr_0.4.1 plyr_1.8.3

loaded via a namespace (and not attached):
[1] assertthat_0.1 colorspace_1.2-6 DBI_0.3.1 digest_0.6.8
[5] gridExtra_2.0.0 gtable_0.1.2 knitr_1.11 labeling_0.3
[9] lazyeval_0.1.10 MASS_7.3-44 munsell_0.4.2 parallel_3.1.1
[13] proto_0.3-10 Rcpp_0.12.1 scales_0.3.0 stringi_0.5-5
[17] tools_3.1.1

Control font size of y label and tick marks

Nice package!

The font sizes of mainbar.y.label and the y-axis tick marks are too small for my liking. Is it possible to increase them? Could something like the name.size argument be added for these components? The demo figures do not seem to have this problem, but I cannot reproduce the large font size on my local machine.

Lower required R version to minimum possible

Quoting @JakeConway:

It could probably work for lower versions. I only put that because that's the version I was using at the time I began working on it. The only limitation on how early the R version can be is whether it can use ggplot2 version 1.0.1, gridExtra version 0.9.1, and plyr version 1.8.3.

Export UpSet Values for Chow-Ruskey

Hello UpSetR Team,

Is there a way to extract the intersection values from the upset command into a text file?

Can upset feed its values to a Chow-Ruskey diagram?

Thanks
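
One way to get intersection sizes into a text file without going through the plot is to compute them from the binary membership matrix directly. The sketch below uses base R only; the `toy` data frame, the helper name `intersection_sizes`, and the temporary output file are illustrative assumptions, not part of UpSetR. It counts exclusive intersections, so each row is counted once under the exact combination of sets it belongs to.

```r
# Hypothetical helper (not part of UpSetR): count exclusive intersections
# from a data frame whose set columns contain only 0/1.
intersection_sizes <- function(data, sets) {
  membership <- data[, sets, drop = FALSE]
  # Label each row with the sets it belongs to, e.g. "A&B"
  label <- apply(membership, 1, function(row) paste(sets[row == 1], collapse = "&"))
  label <- label[label != ""]            # drop rows that belong to no set
  counts <- as.data.frame(table(label), stringsAsFactors = FALSE)
  names(counts) <- c("intersection", "size")
  # Largest intersections first, ties broken alphabetically
  counts[order(-counts$size, counts$intersection), ]
}

# Made-up example data: five elements, three sets
toy <- data.frame(A = c(1, 1, 0, 1, 0),
                  B = c(1, 0, 1, 1, 0),
                  C = c(0, 0, 1, 0, 0))
sizes <- intersection_sizes(toy, c("A", "B", "C"))

# Write a tab-separated text file that other tools can read
out_file <- tempfile(fileext = ".txt")
write.table(sizes, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

The resulting table has one row per non-empty intersection, which should be straightforward to reshape into whatever weight format a Chow-Ruskey plotting tool expects.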

set "Set Size" in regular font not bold

The font should match the "Intersection Size" font.

Restructure boxplot summary input

improve input format documentation in R docs

Requirements for formatting input data should appear earlier in the R documentation. They are currently buried at the end of a hard-to-find web page.

(via https://twitter.com/fabiencampagne/status/721073665077616641 / via @fac2003)

change reference in upset documentation to UpSet InfoVis 2014 paper

The Nature Methods PoV should still be mentioned.

intersection matrix background overplots labels on intersection bar chart

In some situations the (white) background of the matrix plot overplots the 0 on the y-axis of the intersection bar chart (see image). Can the background of the matrix be made transparent?

set up continuous integration with Travis

The package should be built and tested every time we push code to the repo. Integration with Travis seems to be the most popular way to get CI for R packages:

http://www.r-bloggers.com/continuous-integration-for-r-packages/

support label rotation on bars

0, 45 or 90 degrees should be supported. Look at how plot/axis are setting label orientation or how it is typically done in ggplot and follow that pattern.

Return value?

Hi. Thanks for such a great Package!

The problem I'm facing is due to the upset function not returning a value, as does ggplot, for example. I need to store the plot in a variable to print it later in a different context. Is it possible for the upset function to return a "printable" plot object?

Best wishes,

Juanje.

Choose a good default color palette.

Users should not have to (and typically shouldn't at all) specify colors for anything, we should instead provide good defaults.

I would suggest to use color only for selections. In this example here:

We could get rid of the blue shading of the background and the blue bars for the sets. The blue bars for the sets might make some sense to distinguish them from the intersections, but on the other hand, they are the same data type.

This is a good starting point for selection colors:

http://colorbrewer2.org/?type=qualitative&scheme=Paired&n=10

keep.order only moves labels on sets, the bars don't change

Code to replicate:

library(UpSetR)
movies <- read.csv( system.file("extdata", "movies.csv", package = "UpSetR"), header=TRUE, sep=";" )
UpSetR::upset(movies, keep.order = TRUE, sets = c("Drama", "Thriller", "Action"))
UpSetR::upset(movies, keep.order = TRUE, sets = c("Thriller", "Drama", "Action"))

In the first plot, "Drama" is the largest set in the barplot on the left. In the second plot, "Thriller" is the largest set.

Axis break?

Hi, loving UpSetR!!

I'm working on an 8 way comparison, and would like to induce an axis break in the y-axis. I can't seem to find a way to make this happen in all of the info I've dug through on UpSetR... is there any way to do so? The count of intersections between all 8 sets is 4,000 and next greatest is 800. This results in some rather small bars after that first big one!

Alternatively, if this isn't possible, can UpSet log transform the counts?

add htmlwidget functionality

I love your work with UpSet, and I had the JavaScript piece on my list to do as an htmlwidget of the week at BuildingWidgets. I had no idea that you had an R package until I just spotted it on the CRAN feed. I'd love to volunteer to make this into an htmlwidget if you would like to pair the interactivity of the JavaScript with the engine of R.

Before htmlwidgets existed, I had played a little integrating UpSet with rCharts http://timelyportfolio.github.io/upset where I added a couple R datasets to the list. The ugly code is in a fork https://github.com/timelyportfolio/upset.

Report intersection size data along with plot

support inclusion of empty intersections in matrix and bar chart

To be controlled via a parameter empty.intersections that is FALSE by default.

Query legend for plot when attribute plots selected

This option was left out when the way attribute plots are added was reworked.

allow rotation of bar labels on the intersection bar chart in 45 degree steps

In the plot below the labels at the top of bars are overlapping. There should be an option to rotate them in 45 degree increments.

Center set labels and add ticks to both sides

The set labels sit between the set size bars and the set intersection matrix. As they label both equally, they should be symmetric, i.e., centered, and the tick shown on the matrix side should also appear on the bar side (or not at all).

Input Data Format

Dear UpSetR team,

Thank you for a great piece of work.

My input file has 7 columns with each column having genenames like this

head input.txt

Sample1,Sample2,Sample3,Sample4,Sample5,Sample6,Sample7
uc008vaw.1Rnu11,uc008vaw.1Rnu11,uc012aua.1AB339930,uc012ath.2Rn45s,uc008gfm.1AK197973,uc008gfm.1AK197973,uc012bec.1Mir122a
uc008ztz.1AK212710,uc008ztz.1AK212710,uc008ztz.1AK212710,uc008vaw.1Rnu11,uc008gfl.1AK181808,uc008gfl.1AK181808,uc008yaz.2Alb
uc009phb.3Apoa1,uc008gfl.1AK181808,uc033fml.1Mir8114,uc008ztz.1AK212710,uc012bhe.1Neat1,uc012bhe.1Neat1,uc008vxy.1Errfi1
uc008yaz.2Alb,uc008gfm.1AK197973,uc011zoa.1Mir320,uc008gfl.1AK181808,uc008gfk.1AK148054,uc008gfk.1AK148054,uc008eet.2Ttr
uc007puo.1Hist1h4c,uc011zxb.1Rnu12,uc008vaw.1Rnu11,uc012bhe.1Neat1,uc012bec.1Mir122a,uc008odl.1Pck1,uc009phb.3Apoa1
uc012aua.1AB339930,uc012aua.1AB339930,uc012bty.1Mir2861,uc008gfk.1AK148054,uc008yaz.2Alb,uc012bec.1Mir122a,uc008vxw.1Errfi1

But when I read this CSV file with read.csv function in R and issue the upset command, I get an error saying "Error in start_col:end_col : argument of length 0"

Should I transform my input file?

Kindly advise.

Thanks
G
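
The file above lists gene names per sample, whereas the data sets used elsewhere on this page have one row per element and one 0/1 column per set. A minimal base-R sketch of that conversion follows; the helper name `to_membership` and the shortened toy gene names are assumptions for illustration. UpSetR also provides a `fromList()` helper that builds a similar table from a named list of vectors.

```r
# Hypothetical helper: turn columns of element names into a 0/1 membership table.
to_membership <- function(name_columns) {
  # One character vector per set, ignoring blanks and NAs from ragged columns
  sets <- lapply(name_columns, function(x) {
    x <- as.character(x)
    unique(x[!is.na(x) & nzchar(x)])
  })
  # Every distinct element becomes one row
  elements <- sort(unique(unlist(sets, use.names = FALSE)))
  # One integer column per set: 1 if the element is in the set, else 0
  membership <- vapply(sets, function(s) as.integer(elements %in% s),
                       integer(length(elements)))
  data.frame(Name = elements, membership, stringsAsFactors = FALSE)
}

# Toy stand-in for read.csv("input.txt", stringsAsFactors = FALSE)
input <- data.frame(Sample1 = c("Rnu11", "Alb", "Apoa1"),
                    Sample2 = c("Rnu11", "Neat1", "Alb"),
                    Sample3 = c("Mir320", "Rnu11", "Ttr"),
                    stringsAsFactors = FALSE)
binary <- to_membership(input)
# binary can then be passed on, e.g. upset(binary, sets = names(input))
```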

Allow for column set names with spaces

allow querying on set names with spaces for intersection query

need a way to display specific empty intersections

My matrix looks like this

     ensembl_gene_id Heart_Down Heart_Unchanged Heart_Up Muscle_Down Muscle_Unchanged Muscle_Up
1 ENSMUSG00000000001          0               1        0           1                0         0
2 ENSMUSG00000000028          0               1        0           0                1         0
3 ENSMUSG00000000031          0               1        0           0                1         0
4 ENSMUSG00000000056          0               1        0           0                1         0
5 ENSMUSG00000000058          0               1        0           0                1         0
6 ENSMUSG00000000078          0               1        0           0                1         0
...

There are 9 meaningful intersections but two of them are empty. Then there are a lot of meaningless intersections. I need a way to display all 9.
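
Independent of plotting, the nine Heart × Muscle combinations, including the empty ones, can be tabulated directly from the binary matrix. The sketch below is base R with a made-up four-row `toy` data frame in the same layout; `count_pairs` is a hypothetical helper, not an UpSetR function.

```r
# Hypothetical helper: count rows that belong to both sets of every
# (first, second) pair, keeping pairs whose count is zero.
count_pairs <- function(data, first_sets, second_sets) {
  pairs <- expand.grid(first = first_sets, second = second_sets,
                       stringsAsFactors = FALSE)
  pairs$size <- mapply(function(a, b) sum(data[[a]] == 1 & data[[b]] == 1),
                       pairs$first, pairs$second, USE.NAMES = FALSE)
  pairs
}

# Made-up data: each row is in exactly one Heart set and one Muscle set
toy <- data.frame(
  Heart_Down = c(0, 0, 1, 0), Heart_Unchanged = c(1, 1, 0, 0), Heart_Up = c(0, 0, 0, 1),
  Muscle_Down = c(1, 0, 0, 0), Muscle_Unchanged = c(0, 1, 1, 0), Muscle_Up = c(0, 0, 0, 1))
heart  <- c("Heart_Down", "Heart_Unchanged", "Heart_Up")
muscle <- c("Muscle_Down", "Muscle_Unchanged", "Muscle_Up")

# One row per combination; empty combinations are kept with size 0
sizes <- count_pairs(toy, heart, muscle)
```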

Make circles in matrix proportional to row height.

It looks like the circles in the set intersection matrix have a fixed pixel size. They should always be set to ~80% of the row height.

Here is an example where it looks awkward:

Adjust position of numbers

Add a feature to adjust the position of the numbers above the bars to prevent overlapping. The overlap begins to occur when the plot contains a lot of intersections.

rename custom.plots to attribute.plots

That would better reflect what they are intended to be used for.

error reading data

Hello Jake,
I am having an issue when trying a simple intersection with UpSetR. The data is as follows:

Name;Amelonado;Contamana;Criollo;Curaray;Guianna;Iquitos;Maranon;Nacional;Nanay;Purus
Thecc1EG000181;1;0;0;0;0;0;0;0;0;0
Thecc1EG001933;1;0;0;0;0;0;0;0;0;0
Thecc1EG003999;1;0;0;0;0;0;0;0;0;0
Thecc1EG005677;1;0;0;0;0;0;0;0;0;0
Thecc1EG006000;1;0;0;0;0;0;0;0;0;0
.
.
.

I load the data

genes <- read.csv("mygenes",header=T,sep=";")

and try upset with:

upset(genes2, sets = c("Amelonado", "Contamana", "Criollo", "Curaray", "Guianna", "Iquitos", "Maranon", "Nacional", "Nanay", "Purus"), sets.bar.color = "#56B4E9",order.by = "freq", empty.intersections = "on")

I obtain the error:
Error in start_col:end_col : argument of length 0

but I tried comparing with your example sets and I cannot figure out what the potential problem could be.
Has anyone experienced this problem?

thanks
Omar

support for ggplot themes

http://docs.ggplot2.org/0.9.2.1/theme.html

Some discussion about which parts of the plot should not be affected by themes will be required.

att.x and att.y functionality should be removed

This can be handled with a custom plot. We should include the scatter plot, histogram and boxplot example custom plots as methods in the package so that they can be used out of the box.

use lightgray for non-query data points in built-in attribute plots if queries are present

That would make it easier to see those data points highlighted in the color of the query.

Fix boxplot.summary manual description

Applying metadata to the Matrix

Hi,
I've been trying to apply a simple coloring scheme to the matrix background but I cannot seem to get it to work. Using example 5 from the metadata vignette I identified these two (possibly related) errors:

Reversing the order of the metadata plots list (i.e. "matrix_row" before "hist") results in the following error:
Error in rep_len(1, ncol) : invalid 'length.out' value
Listing only "matrix_row" and no other plot in the plots list results in the following error:
Error in tmp[[i]] : subscript out of bounds

Happy to provide more details if needed.
Doron Betel

change default color for set size bars to dark gray

replace journal URLs in documentation

http://www.nature.com/nmeth/journal/v11/n8/full/nmeth.3033.html (only accessible for subscribers of Nature Methods and causing a 401 HTTP response if users are not logged in) should be http://www.nature.com/nmeth/journal/v11/n8/abs/nmeth.3033.html (open to everyone).

order of query colors mixed in main bar chart

library(UpSetR)

My_data <- read.csv( system.file("extdata", "movies.csv", package = "UpSetR"), header=T, sep=";" )

upset_base(My_data, first.col = 3, last.col = 19, nsets = 6, att.x = "ReleaseDate",att.y = "AvgRating", point.size = 3, att.color = "black", main.bar.color = "black", show.numbers = "yes", queries = list(c("Drama", "Romance", "red"), c("Horror", "Drama", "Thriller", "green"), c("Drama", "Comedy", "blue")))

The order of the query colors in the main bar chart is mixed up for the bar in commit 8b0d810:

change alternating row background color to light gray

The "empty" circles should be a slightly darker gray.

setting size of mainbar.y.max

Hi, thanks so much for the great package!

it seems that mainbar.y.max can not accept values that are less than the size of the largest intersection size bar. If doing so I get the following error: "Error: Aesthetics must be either length 1 or the same as the data (59): fill In addition: Warning message: Removed 4 rows containing missing values (position_stack)."

I was wondering whether you implemented this with coord_cartesian() in ggplot2 or not. If yes I was wondering why I might get this error.

BW
Philipp

New intersection function name

It appears that R linked both "intersection" and "intersect" with the intersect function help page. What would be a suitable name?

bug in specific_intersections when keep == sets

specific_intersections <- function(data, first.col, last.col, intersections, order_mat,
                                   aggregate, decrease, cut, mbar_color){
  sets <- names(data[c(first.col:last.col)])
  keep <- unique(unlist(intersections))
  remove <- sets[which(!sets %in% keep)]
  remove <- which(names(data) %in% remove)
  data <- data[-remove]

When keep is equivalent to sets, remove is set to integer(0) and data gets wiped out:

Browse[2]> remove
integer(0)
Browse[2]> data[-remove]
data frame with 0 columns and 14632 rows
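
The behaviour follows from R's indexing rules: `-integer(0)` is still `integer(0)`, so `data[-remove]` selects zero columns instead of dropping none. One possible guard, shown here on a stand-alone toy data frame rather than inside the package function, is to subset only when there is something to remove. `drop_columns` is a hypothetical name, and this is a sketch rather than the package's actual fix.

```r
# Hypothetical helper illustrating the guard; not the package's actual fix.
drop_columns <- function(data, remove) {
  # Negative indexing is only safe when there is at least one index
  if (length(remove) > 0) {
    data <- data[-remove]
  }
  data
}

toy <- data.frame(A = c(1, 0), B = c(0, 1), C = c(1, 1))
sets <- names(toy)
keep <- c("A", "B", "C")                                  # keep is equivalent to sets
remove <- which(names(toy) %in% sets[!sets %in% keep])    # integer(0)

ncol(toy[-remove])               # 0: the unguarded subset wipes every column
ncol(drop_columns(toy, remove))  # 3: the guarded version keeps them
```

An alternative with the same effect would be to select the columns to keep by name instead of removing by negative index, which avoids the empty-index case altogether.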

images for README

Using existing dataframe with UpSetR

Hello there,

Trying to use a dataframe in R I generated from an abundance matrix t, I keep getting this error when using UpSetR:

Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list?

Here's the beginning of the said dataframe:

      OTU_IDS Amphibolite Basalt Coal Coal-Upper Dolomite Hematite_Granite High-calcite_clay mica_schist
  1 KC442817.1.1478           1      0    0          0        0                0                 0
  2 JX222276.1.1475           0      1    0          0        0                0                 0
  3 HQ218444.1.1344           0      0    0          0        0                0                 1
  4 DQ517124.1.1380           0      0    0          0        0                0                 0
  5 KF827260.1.1400           0      1    0          0        0                0                 0
  6 EF125930.1.1497           0      0    0          0        0                0                 0

Any ideas?

Thanks in advance.

André

Original matrix set order

Dear Jake,
Congrats on this very nice and useful package! I have one question, or a request if there is a solution. I would like to order the final intersections by name. I am comparing 2 groups with different members, and I would like to see all the intersections within the first group first, then those within the second group, and finally those between members of different groups. UpSetR orders the intersections by set size, and I cannot find any option to choose the order of the intersections by column header as it appears in the matrix, for example.
I don't know if I was clear. Thank you in advance for your help.

CSV data upload
selection of sets
creation of queries based on intersections
selection of attributes for box plots

intersections not working with more than 5 sets, even when nsets is set

> upset(binarymatrix,nsets=6,intersections=list(list("Heart_Up","Muscle_Up"),
+                                               list("Heart_Down", "Muscle_Up"), 
+                                               list("Heart_Down","Muscle_Unchanged"),
+                                               list("Heart_Down","Muscle_Down"),
+                                               list("Heart_Unchanged","Muscle_Up"),
+                                               list("Heart_Unchanged","Muscle_Unchanged"),
+                                               list("Heart_Unchanged","Muscle_Down"))
+ )
Error in `[.data.frame`(data, keep) : undefined columns selected

5 sets works fine, and 6 sets works if I don't specify intersections