UpSetR

Technique

UpSetR generates static UpSet plots. The UpSet technique visualizes set intersections in a matrix layout and introduces aggregates based on groupings and queries. The matrix layout enables the effective representation of associated data, such as the number of elements in the aggregates and intersections, as well as additional summary statistics derived from subset or element attributes.
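The following is a rough illustration of what the intersection bars count. It assumes the input format used by UpSetR's sample data, with one 0/1 column per set and one row per element. The helper `intersection_sizes()` is hypothetical and not part of UpSetR. It assigns each element to the exact (exclusive) combination of sets it belongs to, then sorts the counts from largest to smallest.

```r
# Hypothetical helper (not part of UpSetR): count the elements in each exclusive
# intersection of a 0/1 membership table (1 = element is in the set).
intersection_sizes <- function(membership) {
  m <- as.matrix(membership)
  # One signature per element, e.g. "A&B" for an element in A and B only
  signature <- apply(m, 1, function(row) paste(colnames(m)[row == 1], collapse = "&"))
  signature <- signature[signature != ""]    # drop elements that are in no set
  sort(table(signature), decreasing = TRUE)  # largest intersections first
}

# Toy membership table: three sets, seven elements (the last is in no set)
toy <- data.frame(
  A = c(1, 1, 1, 1, 0, 0, 0),
  B = c(1, 1, 1, 0, 1, 0, 0),
  C = c(0, 0, 0, 0, 1, 1, 0)
)
sizes <- intersection_sizes(toy)
print(sizes)
```

Each element lands in exactly one combination, so the counts add up to the number of elements that belong to at least one set.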

For further details about the original technique, see the UpSet website. You can also try the UpSetR Shiny app, and the source code for the Shiny wrapper is available as well.

GitHub user ImSoErgodic has created py-upset, a Python package for creating UpSet plots.

Citation

If you use UpSetR in a paper, please cite:

Jake R. Conway, Alexander Lex, Nils Gehlenborg,
UpSetR: An R Package for the Visualization of Intersecting Sets and their Properties,
Bioinformatics, 2017.
doi: https://doi.org/10.1093/bioinformatics/btx364

The original technique and the interactive visualization tool implementing the approach are described here:

Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, Hanspeter Pfister,
UpSet: Visualization of Intersecting Sets,
IEEE Transactions on Visualization and Computer Graphics (InfoVis '14), vol. 20, no. 12, pp. 1983–1992, 2014.
doi: https://doi.org/10.1109/TVCG.2014.2346248

Sample Data

Sample data sets for UpSetR are included in the package and can be loaded like this:

movies <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), header = TRUE, sep = ";")
mutations <- read.csv(system.file("extdata", "mutations.csv", package = "UpSetR"), header = TRUE, sep = ",")

The movies data set was created by the GroupLens Lab and curated by Bilal Alsallakh. The mutations data set was originally created by the TCGA Consortium and represents mutations in the 100 most mutated genes in a glioblastoma multiforme cohort.

Examples

In addition to the examples shown here, we have included a range of UpSetR plots in the paper about the R package, which can be found in a separate GitHub repository.

Vignettes

There are currently four vignettes that explain how to use the features included in the UpSetR package:

Demo

A view of the UpSet plot with additional plots based on elements in the intersections.

upset(movies, attribute.plots = list(gridrows = 60,
  plots = list(list(plot = scatter_plot, x = "ReleaseDate", y = "AvgRating"),
               list(plot = scatter_plot, x = "ReleaseDate", y = "Watches"),
               list(plot = scatter_plot, x = "Watches", y = "AvgRating"),
               list(plot = histogram, x = "ReleaseDate")),
  ncols = 2))

An UpSetR plot mimicking the figure published by Lex & Gehlenborg in Nature Methods: http://www.nature.com/nmeth/journal/v11/n8/abs/nmeth.3033.html

upset(mutations, sets = c("PTEN", "TP53", "EGFR", "PIK3R1", "RB1"), sets.bar.color = "#56B4E9",
order.by = "freq", empty.intersections = "on")

An example using two set queries (war movies and noir movies) along with attribute plots showing the distribution of average ratings (top) and average rating versus the number of times each movie has been watched (bottom).

upset(movies, attribute.plots = list(gridrows = 100, ncols = 1,
  plots = list(list(plot = histogram, x = "AvgRating", queries = TRUE),
               list(plot = scatter_plot, y = "AvgRating", x = "Watches", queries = TRUE))),
  sets = c("Action", "Adventure", "Children", "War", "Noir"),
  queries = list(list(query = intersects, params = list("War"), active = TRUE),
                 list(query = intersects, params = list("Noir"))))

Download

Install the latest released version from CRAN:

install.packages("UpSetR")

Install the latest development version of UpSetR from GitHub using devtools:

devtools::install_github("hms-dbmi/UpSetR")

Contributors

alanocallaghan, alexsb, colinandrus, davidbernick, jakeconway, jonocarroll, mworkentine, ngehlenborg, richardjacton, richardjamesacton, timelyportfolio

UpSetR Issues

grid library not loaded automatically

Using a data frame of values with upset:

library(UpSetR)
upset(df)

returns the following error:

Error in theme(panel.background = element_rect(fill = "white"), plot.margin = unit(c(0.5, :
could not find function "unit"

Fixed by manually loading grid:

library(grid)
upset(df)
# plot produced

Session info:

R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] reshape2_1.4.1 ggplot2_1.0.1 xlsx_0.5.7 xlsxjars_0.6.1 rJava_0.9-7
[6] UpSetR_1.2.0 stringr_1.0.0 magrittr_1.5 dplyr_0.4.1 plyr_1.8.3

loaded via a namespace (and not attached):
[1] assertthat_0.1 colorspace_1.2-6 DBI_0.3.1 digest_0.6.8
[5] gridExtra_2.0.0 gtable_0.1.2 knitr_1.11 labeling_0.3
[9] lazyeval_0.1.10 MASS_7.3-44 munsell_0.4.2 parallel_3.1.1
[13] proto_0.3-10 Rcpp_0.12.1 scales_0.3.0 stringi_0.5-5
[17] tools_3.1.1

Control font size of y label and tick marks

Nice package!

The font sizes of mainbar.y.label and the y-axis tick marks are too small for my liking. Is it possible to increase them? Could an argument like name.size be added for these components? The demo figures do not seem to have this problem, but I cannot reproduce their larger font size on my local machine.

Lower required R version to minimum possible

Quoting @JakeConway:

It could probably work for lower versions. I only put that because that's the version I was using at the time I began working on it. The only limitation on how early the R version can be is whether it can use ggplot2 version 1.0.1, gridExtra version 0.9.1, and plyr version 1.8.3.

Import Upset Values For Chowruskey

Hello UpSetR Team,

Is there a way to export the intersection values from the upset command to a text file?

Can upset feed its values to Chowruskey?

Thanks
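One way to get the intersection values into text files, without going through the plot, is to compute them directly from the membership data. This minimal base-R sketch assumes a data frame of 0/1 set columns plus a separate vector of element IDs. The helper `export_intersections()` and its output layout are illustrative choices, not an UpSetR feature. It writes one tab-separated file of per-element labels and one of per-intersection counts. Whether that layout matches what Chowruskey expects is an untested assumption.

```r
# Hypothetical helper (not part of UpSetR): label each element with the
# exclusive intersection it belongs to, then write two tab-separated files.
export_intersections <- function(membership, ids, counts_file, elements_file) {
  m <- as.matrix(membership)
  label <- apply(m, 1, function(row) paste(colnames(m)[row == 1], collapse = "&"))
  label[label == ""] <- "(none)"  # elements that belong to no set
  elements <- data.frame(id = ids, intersection = label, stringsAsFactors = FALSE)
  counts <- as.data.frame(table(intersection = label),
                          responseName = "size", stringsAsFactors = FALSE)
  write.table(elements, elements_file, sep = "\t", quote = FALSE, row.names = FALSE)
  write.table(counts, counts_file, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(counts)  # also return the counts for further use in R
}

# Usage on a toy table with two sets and four elements
toy <- data.frame(A = c(1, 1, 1, 0), B = c(1, 1, 0, 0))
counts_path <- tempfile(fileext = ".tsv")
elements_path <- tempfile(fileext = ".tsv")
counts <- export_intersections(toy, c("g1", "g2", "g3", "g4"), counts_path, elements_path)
print(read.delim(counts_path))
```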

support label rotation on bars

0, 45 or 90 degrees should be supported. Look at how plot/axis set label orientation, or how it is typically done in ggplot, and follow that pattern.

Return value?

Hi. Thanks for such a great Package!

The problem I'm facing is that the upset function does not return a value the way ggplot does, for example. I need to store the plot in a variable to print it later in a different context. Is it possible for the upset function to return a "printable" plot object?

Best wishes,

Juanje.

Choose a good default color palette.

Users should not have to (and typically shouldn't at all) specify colors for anything; we should instead provide good defaults.

I would suggest using color only for selections. In the example plot attached to this issue, we could get rid of the blue shading of the background and the blue bars for the sets. The blue bars for the sets might help distinguish them from the intersections, but on the other hand, they are the same data type.

This is a good starting point for selection colors:

http://colorbrewer2.org/?type=qualitative&scheme=Paired&n=10

keep.order only moves labels on sets, the bars don't change

Code to replicate:

library(UpSetR)
movies <- read.csv( system.file("extdata", "movies.csv", package = "UpSetR"), header=TRUE, sep=";" )
UpSetR::upset(movies, keep.order = TRUE, sets = c("Drama", "Thriller", "Action"))
UpSetR::upset(movies, keep.order = TRUE, sets = c("Thriller", "Drama", "Action"))

In the first plot, "Drama" is the largest set in the barplot on the left. In the second plot, "Thriller" is the largest set.

Axis break?

Hi, loving UpSetR!!

I'm working on an 8-way comparison and would like to introduce a break in the y-axis. I can't find a way to do this in any of the UpSetR information I've dug through... is there any way to do so? The intersection of all 8 sets contains 4,000 elements, and the next largest contains 800. This results in some rather small bars after that first big one!

Alternatively, if this isn't possible, can UpSetR log-transform the counts?

add htmlwidget functionality

I love your work with UpSet, and I had the JavaScript piece on my list to do as an htmlwidget of the week at BuildingWidgets. I had no idea that you had an R package until I just spotted it on the CRAN feed. I'd love to volunteer to make this into an htmlwidget if you would like to pair the interactivity of the JavaScript with the engine of R.

Before htmlwidgets existed, I experimented a little with integrating UpSet with rCharts (http://timelyportfolio.github.io/upset), where I added a couple of R datasets to the list. The ugly code is in a fork: https://github.com/timelyportfolio/upset.

Center set labels and add ticks to both sides

The set labels sit between the set size bars and the set intersection matrix. Since they label both equally, they should be symmetric, i.e., centered, and the tick shown for the matrix should also appear for the bars (or not at all).

Input Data Format

Dear UpSetR team,

Thank you for a great piece of work.

My input file has 7 columns, each containing gene names, like this:

head input.txt

Sample1,Sample2,Sample3,Sample4,Sample5,Sample6,Sample7
uc008vaw.1Rnu11,uc008vaw.1Rnu11,uc012aua.1AB339930,uc012ath.2Rn45s,uc008gfm.1AK197973,uc008gfm.1AK197973,uc012bec.1Mir122a
uc008ztz.1AK212710,uc008ztz.1AK212710,uc008ztz.1AK212710,uc008vaw.1Rnu11,uc008gfl.1AK181808,uc008gfl.1AK181808,uc008yaz.2Alb
uc009phb.3Apoa1,uc008gfl.1AK181808,uc033fml.1Mir8114,uc008ztz.1AK212710,uc012bhe.1Neat1,uc012bhe.1Neat1,uc008vxy.1Errfi1
uc008yaz.2Alb,uc008gfm.1AK197973,uc011zoa.1Mir320,uc008gfl.1AK181808,uc008gfk.1AK148054,uc008gfk.1AK148054,uc008eet.2Ttr
uc007puo.1Hist1h4c,uc011zxb.1Rnu12,uc008vaw.1Rnu11,uc012bhe.1Neat1,uc012bec.1Mir122a,uc008odl.1Pck1,uc009phb.3Apoa1
uc012aua.1AB339930,uc012aua.1AB339930,uc012bty.1Mir2861,uc008gfk.1AK148054,uc008yaz.2Alb,uc012bec.1Mir122a,uc008vxw.1Errfi1

But when I read this CSV file with the read.csv function in R and run the upset command, I get the error "Error in start_col:end_col : argument of length 0".

Should I transform my input file?

Kindly advise.

Thanks
G
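One likely reason for the error is that UpSetR finds no 0/1 set columns. The bundled examples use one column per set, with 1 marking membership, whereas input.txt lists gene names per sample. The minimal base-R sketch below shows one way to convert the file. It assumes each column is one set and that blank cells are padding from columns of unequal length. The helper `names_to_membership()` is hypothetical. UpSetR's own `fromList()` performs a similar conversion for a named list of vectors.

```r
# Hypothetical helper: convert a table whose columns hold element names
# (one column per sample) into one 0/1 membership column per sample.
names_to_membership <- function(df) {
  # Each column is one set; drop NAs and blank cells left by ragged columns
  sets <- lapply(df, function(col) {
    col <- as.character(col)
    unique(col[!is.na(col) & col != ""])
  })
  elements <- sort(unique(unlist(sets)))
  membership <- lapply(sets, function(s) as.integer(elements %in% s))
  data.frame(element = elements, membership,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Usage on a shortened, two-sample version of input.txt
samples <- read.csv(text = paste("Sample1,Sample2",
                                 "uc008vaw.1Rnu11,uc008vaw.1Rnu11",
                                 "uc008yaz.2Alb,uc008ztz.1AK212710",
                                 "uc009phb.3Apoa1,", sep = "\n"),
                    stringsAsFactors = FALSE)
mem <- names_to_membership(samples)
print(mem)
```

The result has an element column followed by one 0/1 column per sample. That layout resembles the bundled example data, but passing it to upset() is not verified here.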

need a way to display specific empty intersections

My matrix looks like this

     ensembl_gene_id Heart_Down Heart_Unchanged Heart_Up Muscle_Down Muscle_Unchanged Muscle_Up
1 ENSMUSG00000000001          0               1        0           1                0         0
2 ENSMUSG00000000028          0               1        0           0                1         0
3 ENSMUSG00000000031          0               1        0           0                1         0
4 ENSMUSG00000000056          0               1        0           0                1         0
5 ENSMUSG00000000058          0               1        0           0                1         0
6 ENSMUSG00000000078          0               1        0           0                1         0
...

There are 9 meaningful intersections, but two of them are empty. There are also many meaningless intersections. I need a way to display all 9.
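`upset()` accepts an `intersections` list and an `empty.intersections` option, and both appear elsewhere on this page. Even so, it can help to first check which of the requested intersections are empty. The sketch below is a hypothetical base-R helper, `specific_sizes()`, which is not part of UpSetR. It counts the elements that belong to exactly the listed sets and keeps zero-size intersections instead of dropping them. It assumes the ID column (ensembl_gene_id above) has already been removed, so that only 0/1 set columns remain.

```r
# Hypothetical helper: size of each requested exclusive intersection,
# keeping empty ones (size 0) instead of dropping them.
specific_sizes <- function(membership, intersections) {
  m <- as.matrix(membership) == 1  # logical matrix: TRUE = member
  sizes <- vapply(intersections, function(sets) {
    wanted <- colnames(m) %in% unlist(sets)
    # Count elements that are in exactly these sets and in no others
    sum(apply(m, 1, function(row) all(row == wanted)))
  }, integer(1))
  names(sizes) <- vapply(intersections,
                         function(s) paste(unlist(s), collapse = "&"),
                         character(1))
  sizes
}

# Toy data using some of the column names from the issue
genes <- data.frame(
  Heart_Up    = c(1, 0, 0),
  Heart_Down  = c(0, 1, 0),
  Muscle_Up   = c(1, 0, 0),
  Muscle_Down = c(0, 0, 1)
)
wanted <- list(list("Heart_Up", "Muscle_Up"), list("Heart_Down", "Muscle_Down"))
res <- specific_sizes(genes, wanted)
print(res)
```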

Adjust position of numbers

Add a feature to adjust the position of the numbers above the bars to prevent them from overlapping. Overlap begins to occur when the plot contains many intersections.

error reading data

Hello Jake,
I am having an issue when trying a simple intersection with UpSetR. The data is as follows:

Name;Amelonado;Contamana;Criollo;Curaray;Guianna;Iquitos;Maranon;Nacional;Nanay;Purus
Thecc1EG000181;1;0;0;0;0;0;0;0;0;0
Thecc1EG001933;1;0;0;0;0;0;0;0;0;0
Thecc1EG003999;1;0;0;0;0;0;0;0;0;0
Thecc1EG005677;1;0;0;0;0;0;0;0;0;0
Thecc1EG006000;1;0;0;0;0;0;0;0;0;0
.
.
.

I load the data

genes <- read.csv("mygenes",header=T,sep=";")

and try upset with:

upset(genes2, sets = c("Amelonado", "Contamana", "Criollo", "Curaray", "Guianna", "Iquitos", "Maranon", "Nacional", "Nanay", "Purus"), sets.bar.color = "#56B4E9",order.by = "freq", empty.intersections = "on")

I obtain the error:
Error in start_col:end_col : argument of length 0

I compared my data with your example sets, but I cannot figure out what the problem might be.
Has anyone else experienced this problem?

thanks
Omar
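When this error appears, a quick first check is whether R parsed any columns as numeric 0/1 data at all. The input-format issue above ends with the same error for a file of gene names. The helper below, `binary_columns()`, is hypothetical. It assumes, without confirmation from this page, that UpSetR needs such columns to locate its sets. Separately, note that the snippet above reads the file into `genes` but calls `upset(genes2, ...)`.

```r
# Hypothetical diagnostic: names of columns that are numeric and contain only
# 0/1 values (NAs ignored). Assumption: these are the columns usable as sets.
binary_columns <- function(df) {
  is_binary <- vapply(df, function(col) {
    is.numeric(col) && all(col[!is.na(col)] %in% c(0, 1))
  }, logical(1))
  names(df)[is_binary]
}

# Usage on the first rows of the reported file (semicolon-separated)
genes <- read.csv(text = "Name;Amelonado;Contamana\nThecc1EG000181;1;0\nThecc1EG001933;1;0",
                  sep = ";", stringsAsFactors = FALSE)
print(binary_columns(genes))
```

An empty result, as for a file of gene names, means there are no 0/1 columns for upset() to treat as sets.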

att.x and att.y functionality should be removed

This can be handled with a custom plot. We should include the scatter plot, histogram and boxplot example custom plots as methods in the package so that they can be used out of the box.

Applying metadata to the Matrix

Hi,
I've been trying to apply a simple coloring scheme to the matrix background, but I cannot seem to get it to work. Using example 5 from the metadata vignette, I identified these two (possibly related) errors:

  1. Reversing the order of the metadata plots list (i.e., "matrix_row" before "hist") results in the following error:
    Error in rep_len(1, ncol) : invalid 'length.out' value
  2. Listing only "matrix_row" and no other plot in the plots list results in the following error:
    Error in tmp[[i]] : subscript out of bounds

Happy to provide more details if needed.
Doron Betel

replace journal URLs in documentation

http://www.nature.com/nmeth/journal/v11/n8/full/nmeth.3033.html (only accessible for subscribers of Nature Methods and causing a 401 HTTP response if users are not logged in) should be http://www.nature.com/nmeth/journal/v11/n8/abs/nmeth.3033.html (open to everyone).

order of query colors mixed in main bar chart

library(UpSetR)

My_data <- read.csv( system.file("extdata", "movies.csv", package = "UpSetR"), header=T, sep=";" )

upset_base(My_data, first.col = 3, last.col = 19, nsets = 6, att.x = "ReleaseDate",att.y = "AvgRating", point.size = 3, att.color = "black", main.bar.color = "black", show.numbers = "yes", queries = list(c("Drama", "Romance", "red"), c("Horror", "Drama", "Thriller", "green"), c("Drama", "Comedy", "blue")))

The order of the query colors in the main bar chart is mixed up in commit 8b0d810:

[Screenshot, 2015-06-11]

setting size of mainbar.y.max

Hi, thanks so much for the great package!

It seems that mainbar.y.max cannot accept values smaller than the largest intersection size. If I try, I get the following error: "Error: Aesthetics must be either length 1 or the same as the data (59): fill In addition: Warning message: Removed 4 rows containing missing values (position_stack)."

Did you implement this with coord_cartesian() in ggplot2? If so, why might I be getting this error?

BW
Philipp

New intersection function name

It appears that R links both "intersection" and "intersect" to the help page of the intersect function. What would be a suitable name?

bug in specific_intersections when keep == sets

specific_intersections <- function(data, first.col, last.col, intersections, order_mat,
                                   aggregate, decrease, cut, mbar_color){
  sets <- names(data[c(first.col:last.col)])
  keep <- unique(unlist(intersections))
  remove <- sets[which(!sets %in% keep)]
  remove <- which(names(data) %in% remove)
  data <- data[-remove]

When keep contains every element of sets, remove is integer(0), and data[-remove] selects zero columns, wiping out data:

Browse[2]> remove
integer(0)
Browse[2]> data[-remove]
data frame with 0 columns and 14632 rows
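The root cause is the negative-index idiom: `-integer(0)` is still an empty index, so `data[-remove]` selects no columns at all. The minimal sketch below shows one way to avoid it. It uses logical indexing so that nothing is dropped when every set is kept. `drop_unused_sets()` is a hypothetical stand-in for the column-dropping lines of `specific_intersections`, not the package's actual fix. Guarding the subset with `if (length(remove) > 0)` would work as well.

```r
# Hypothetical stand-in for the first lines of specific_intersections():
# drop set columns that no requested intersection uses.
drop_unused_sets <- function(data, first.col, last.col, intersections) {
  sets <- names(data)[first.col:last.col]
  keep <- unique(unlist(intersections))
  unused <- setdiff(sets, keep)
  # Logical indexing: when `unused` is empty, every column is kept
  data[, !(names(data) %in% unused), drop = FALSE]
}

df <- data.frame(id = 1:3, A = c(1, 0, 1), B = c(0, 1, 1))
ncol(df[-integer(0)])  # 0: the empty negative index reproduces the bug
all_kept <- drop_unused_sets(df, 2, 3, list(list("A", "B")))
some_kept <- drop_unused_sets(df, 2, 3, list(list("A")))
print(names(all_kept))   # "id" "A" "B"
print(names(some_kept))  # "id" "A"
```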

Using existing dataframe with UpSetR

Hello there,

I am trying to use a data frame that I generated in R from an abundance matrix t, but I keep getting this error when using UpSetR:

Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list?

Here's the beginning of the data frame:

      OTU_IDS Amphibolite Basalt Coal Coal-Upper Dolomite Hematite_Granite High-calcite_clay mica_schist
  1 KC442817.1.1478           1      0    0          0        0                0                 0
  2 JX222276.1.1475           0      1    0          0        0                0                 0
  3 HQ218444.1.1344           0      0    0          0        0                0                 1
  4 DQ517124.1.1380           0      0    0          0        0                0                 0
  5 KF827260.1.1400           0      1    0          0        0                0                 0
  6 EF125930.1.1497           0      0    0          0        0                0                 0

Any ideas?

Thanks in advance.

André

Original matrix set order

Dear Jake,
Congratulations on this very nice and useful package! I have a question, or rather a request, if a solution exists. I would like to order the final intersections by name. I am comparing 2 groups with different members, and I would like to see all the intersections within the first group first, then those within the second group, and finally those between members of different groups. UpSetR orders the intersections by size, and I cannot find any option to order them by column header, as they appear in the matrix, for example.
I hope this is clear, and thank you in advance for your help.

aggregate.by should be group.by

The parameter name aggregate.by is a misnomer; it should be group.by. Example 3 in the basic usage vignette should be renamed, too.

intersections not working with more than 5 sets, even when nsets is set

> upset(binarymatrix,nsets=6,intersections=list(list("Heart_Up","Muscle_Up"),
+                                               list("Heart_Down", "Muscle_Up"), 
+                                               list("Heart_Down","Muscle_Unchanged"),
+                                               list("Heart_Down","Muscle_Down"),
+                                               list("Heart_Unchanged","Muscle_Up"),
+                                               list("Heart_Unchanged","Muscle_Unchanged"),
+                                               list("Heart_Unchanged","Muscle_Down"))
+ )
Error in `[.data.frame`(data, keep) : undefined columns selected

With 5 sets it works fine, and with 6 sets it works if I don't specify intersections.
