tidyseurat's Introduction

tidyseurat - part of tidytranscriptomics

Lifecycle:maturing R build status

Brings Seurat to the tidyverse!

website: stemangiola.github.io/tidyseurat/

Please also have a look at

  • tidyseurat for tidy single-cell RNA sequencing analysis
  • tidySummarizedExperiment for tidy bulk RNA sequencing analysis
  • tidybulk for tidy bulk RNA-seq analysis
  • nanny for tidy high-level data analysis and manipulation
  • tidygate for adding custom gate information to your tibble
  • tidyHeatmap for heatmaps produced with tidy principles

Introduction

tidyseurat provides a bridge between the Seurat single-cell package [@butler2018integrating; @stuart2019comprehensive] and the tidyverse [@wickham2019welcome]. It creates an invisible layer that enables viewing the Seurat object as a tidyverse tibble, and provides Seurat-compatible dplyr, tidyr, ggplot and plotly functions.

Functions/utilities available

Seurat-compatible Functions    Description
all

tidyverse Packages    Description
dplyr                 All dplyr APIs like for any tibble
tidyr                 All tidyr APIs like for any tibble
ggplot2               ggplot like for any tibble
plotly                plot_ly like for any tibble

Utilities          Description
tidy               Add tidyseurat invisible layer over a Seurat object
as_tibble          Convert cell-wise information to a tbl_df
join_features      Add feature-wise information, returns a tbl_df
aggregate_cells    Aggregate cell gene-transcription abundance as pseudobulk tissue

Installation

From CRAN

install.packages("tidyseurat")

From GitHub (development)

devtools::install_github("stemangiola/tidyseurat")

Load the packages used in the examples below.

library(dplyr)
library(tidyr)
library(purrr)
library(magrittr)
library(ggplot2)
library(Seurat)
library(tidyseurat)

Create tidyseurat, the best of both worlds!

This is a Seurat object, but it is evaluated as a tibble, so it is fully compatible with both the Seurat and tidyverse APIs.

pbmc_small = SeuratObject::pbmc_small

It looks like a tibble

pbmc_small
## # A Seurat-tibble abstraction: 80 × 15
## # Features=230 | Cells=80 | Active assay=RNA | Assays=RNA
##    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
##    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
##  1 ATGC… SeuratPro…         70           47 0               A             g2    
##  2 CATG… SeuratPro…         85           52 0               A             g1    
##  3 GAAC… SeuratPro…         87           50 1               B             g2    
##  4 TGAC… SeuratPro…        127           56 0               A             g2    
##  5 AGTC… SeuratPro…        173           53 0               A             g2    
##  6 TCTG… SeuratPro…         70           48 0               A             g1    
##  7 TGGT… SeuratPro…         64           36 0               A             g1    
##  8 GCAG… SeuratPro…         72           45 0               A             g1    
##  9 GATA… SeuratPro…         52           36 0               A             g1    
## 10 AATG… SeuratPro…        100           41 0               A             g1    
## # ℹ 70 more rows
## # ℹ 8 more variables: RNA_snn_res.1 <fct>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>,
## #   PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

But it is a Seurat object after all

pbmc_small@assays
## $RNA
## Assay data with 230 features for 80 cells
## Top 10 variable features:
##  PPBP, IGLL5, VDAC3, CD1C, AKR1C3, PF4, MYL9, GNLY, TREML1, CA2

Preliminary plots

Set colours and theme for plots.

# Use colourblind-friendly colours
friendly_cols <- c("#88CCEE", "#CC6677", "#DDCC77", "#117733", "#332288", "#AA4499", "#44AA99", "#999933", "#882255", "#661100", "#6699CC")

# Set theme
my_theme <-
  list(
    scale_fill_manual(values = friendly_cols),
    scale_color_manual(values = friendly_cols),
    theme_bw() +
      theme(
        panel.border = element_blank(),
        axis.line = element_line(),
        panel.grid.major = element_line(size = 0.2),
        panel.grid.minor = element_line(size = 0.1),
        text = element_text(size = 12),
        legend.position = "bottom",
        aspect.ratio = 1,
        strip.background = element_blank(),
        axis.title.x = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
        axis.title.y = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10))
      )
  )

We can treat pbmc_small effectively as a normal tibble for plotting.

Here we plot number of features per cell.

pbmc_small %>%
  ggplot(aes(nFeature_RNA, fill = groups)) +
  geom_histogram() +
  my_theme

Here we plot the total RNA count per cell (nCount_RNA) for each group.

pbmc_small %>%
  ggplot(aes(groups, nCount_RNA, fill = groups)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  my_theme

Here we plot abundance of two features for each group.

pbmc_small %>%
  join_features(features = c("HLA-DRA", "LYZ")) %>%
  ggplot(aes(groups, .abundance_RNA + 1, fill = groups)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(aes(size = nCount_RNA), alpha = 0.5, width = 0.2) +
  scale_y_log10() +
  my_theme

Preprocess the dataset

You can also treat the object as a Seurat object and proceed with data processing.

pbmc_small_pca <-
  pbmc_small %>%
  SCTransform(verbose = FALSE) %>%
  FindVariableFeatures(verbose = FALSE) %>%
  RunPCA(verbose = FALSE)

pbmc_small_pca
## # A Seurat-tibble abstraction: 80 × 17
## # Features=220 | Cells=80 | Active assay=SCT | Assays=RNA, SCT
##    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
##    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
##  1 ATGC… SeuratPro…         70           47 0               A             g2    
##  2 CATG… SeuratPro…         85           52 0               A             g1    
##  3 GAAC… SeuratPro…         87           50 1               B             g2    
##  4 TGAC… SeuratPro…        127           56 0               A             g2    
##  5 AGTC… SeuratPro…        173           53 0               A             g2    
##  6 TCTG… SeuratPro…         70           48 0               A             g1    
##  7 TGGT… SeuratPro…         64           36 0               A             g1    
##  8 GCAG… SeuratPro…         72           45 0               A             g1    
##  9 GATA… SeuratPro…         52           36 0               A             g1    
## 10 AATG… SeuratPro…        100           41 0               A             g1    
## # ℹ 70 more rows
## # ℹ 10 more variables: RNA_snn_res.1 <fct>, nCount_SCT <dbl>,
## #   nFeature_SCT <int>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>,
## #   PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

If a tool is not included in the tidyseurat collection, we can use as_tibble to convert the tidyseurat object into a plain tibble. The conversion is permanent: the result is no longer a Seurat object.

pbmc_small_pca %>%
  as_tibble() %>%
  select(contains("PC"), everything()) %>%
  GGally::ggpairs(columns = 1:5, ggplot2::aes(colour = groups)) +
  my_theme

Identify clusters

We proceed with cluster identification with Seurat.

pbmc_small_cluster <-
  pbmc_small_pca %>%
  FindNeighbors(verbose = FALSE) %>%
  FindClusters(method = "igraph", verbose = FALSE)

pbmc_small_cluster
## # A Seurat-tibble abstraction: 80 × 19
## # Features=220 | Cells=80 | Active assay=SCT | Assays=RNA, SCT
##    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
##    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
##  1 ATGC… SeuratPro…         70           47 0               A             g2    
##  2 CATG… SeuratPro…         85           52 0               A             g1    
##  3 GAAC… SeuratPro…         87           50 1               B             g2    
##  4 TGAC… SeuratPro…        127           56 0               A             g2    
##  5 AGTC… SeuratPro…        173           53 0               A             g2    
##  6 TCTG… SeuratPro…         70           48 0               A             g1    
##  7 TGGT… SeuratPro…         64           36 0               A             g1    
##  8 GCAG… SeuratPro…         72           45 0               A             g1    
##  9 GATA… SeuratPro…         52           36 0               A             g1    
## 10 AATG… SeuratPro…        100           41 0               A             g1    
## # ℹ 70 more rows
## # ℹ 12 more variables: RNA_snn_res.1 <fct>, nCount_SCT <dbl>,
## #   nFeature_SCT <int>, SCT_snn_res.0.8 <fct>, seurat_clusters <fct>,
## #   PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
## #   tSNE_2 <dbl>

Now we can interrogate the object as if it were a regular tibble data frame.

pbmc_small_cluster %>%
  count(groups, seurat_clusters)
## # A tibble: 6 × 3
##   groups seurat_clusters     n
##   <chr>  <fct>           <int>
## 1 g1     0                  23
## 2 g1     1                  17
## 3 g1     2                   4
## 4 g2     0                  17
## 5 g2     1                  13
## 6 g2     2                   6
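
Because the result of count() is an ordinary tibble, any further tidyverse step applies to it. As a small sketch, the counts printed above are re-entered by hand here (so the block runs without Seurat) and reshaped into a group-by-cluster table with tidyr; the names cluster_counts and cluster_table are illustrative only.

```r
library(tidyr)

# Counts copied by hand from the output above
cluster_counts <- tibble::tibble(
  groups = rep(c("g1", "g2"), each = 3),
  seurat_clusters = factor(rep(c("0", "1", "2"), times = 2)),
  n = c(23L, 17L, 4L, 17L, 13L, 6L)
)

# One row per group, one column per cluster
cluster_table <- pivot_wider(
  cluster_counts,
  names_from = seurat_clusters,
  values_from = n,
  names_prefix = "cluster_"
)

cluster_table
```

In a real session the first step would presumably be the count() call shown above rather than a hand-typed tibble.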

We can identify cluster markers using Seurat.

# Identify top 10 markers per cluster
markers <-
  pbmc_small_cluster %>%
  FindAllMarkers(only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25) %>%
  group_by(cluster) %>%
  top_n(10, avg_log2FC)

# Plot heatmap
pbmc_small_cluster %>%
  DoHeatmap(
    features = markers$gene,
    group.colors = friendly_cols
  )
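
The marker selection above relies on top_n(), which the dplyr documentation marks as superseded by slice_max(). The sketch below shows the equivalent grouped selection on a small made-up marker table; toy_markers and its values are hypothetical and only mimic the cluster, gene and avg_log2FC columns used above.

```r
library(dplyr)

# Hypothetical marker table with the columns used above
toy_markers <- tibble::tibble(
  cluster    = factor(rep(c("0", "1"), each = 4)),
  gene       = paste0("gene", 1:8),
  avg_log2FC = c(2.1, 0.4, 1.7, 0.9, 3.0, 0.2, 1.1, 2.5)
)

# Keep the 2 genes with the highest avg_log2FC within each cluster
top_markers <-
  toy_markers %>%
  group_by(cluster) %>%
  slice_max(avg_log2FC, n = 2) %>%
  ungroup()

top_markers
```

With the real FindAllMarkers() output, the same call with n = 10 should select the top 10 markers per cluster.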

Reduce dimensions

We can calculate the first 3 UMAP dimensions using the Seurat framework.

pbmc_small_UMAP <-
  pbmc_small_cluster %>%
  RunUMAP(reduction = "pca", dims = 1:15, n.components = 3L)

And we can plot them in 3D using plotly.

pbmc_small_UMAP %>%
  plot_ly(
    x = ~`UMAP_1`,
    y = ~`UMAP_2`,
    z = ~`UMAP_3`,
    color = ~seurat_clusters,
    colors = friendly_cols[1:4]
  )

screenshot plotly

Cell type prediction

We can infer cell type identities using SingleR [@aran2019reference] and manipulate the output using tidyverse.

# Get cell type reference data
blueprint <- celldex::BlueprintEncodeData()

# Infer cell identities
cell_type_df <-
  GetAssayData(pbmc_small_UMAP, slot = 'counts', assay = "SCT") %>%
  log1p() %>%
  Matrix::Matrix(sparse = TRUE) %>%
  SingleR::SingleR(
    ref = blueprint,
    labels = blueprint$label.main,
    method = "single"
  ) %>%
  as.data.frame() %>%
  as_tibble(rownames = ".cell") %>%
  select(.cell, first.labels)

# Join UMAP and cell type info
# (the cell identifier column is `.cell`, as in the printed output above)
pbmc_small_cell_type <-
  pbmc_small_UMAP %>%
  left_join(cell_type_df, by = ".cell")

# Reorder columns
pbmc_small_cell_type %>%
  select(.cell, first.labels, everything())

We can easily summarise the results. For example, we can see how cell type classification overlaps with cluster classification.

pbmc_small_cell_type %>%
  count(seurat_clusters, first.labels)

We can easily reshape the data for building information-rich faceted plots.

pbmc_small_cell_type %>%

  # Reshape and add classifier column
  pivot_longer(
    cols = c(seurat_clusters, first.labels),
    names_to = "classifier", values_to = "label"
  ) %>%

  # UMAP plots for cell type and cluster
  ggplot(aes(UMAP_1, UMAP_2, color = label)) +
  geom_point() +
  facet_wrap(~classifier) +
  my_theme

We can easily plot gene correlation per cell category, adding multi-layer annotations.

pbmc_small_cell_type %>%

  # Add some mitochondrial abundance values
  mutate(mitochondrial = rnorm(n())) %>%

  # Plot correlation
  join_features(features = c("CST3", "LYZ"), shape = "wide") %>%
  ggplot(aes(CST3 + 1, LYZ + 1, color = groups, size = mitochondrial)) +
  geom_point() +
  facet_wrap(~first.labels, scales = "free") +
  scale_x_log10() +
  scale_y_log10() +
  my_theme

Nested analyses

A powerful tool we can use with tidyseurat is nest. We can easily perform independent analyses on subsets of the dataset. First we classify cell types into lymphoid and myeloid; then we nest based on the new classification.

pbmc_small_nested <-
  pbmc_small_cell_type %>%
  filter(first.labels != "Erythrocytes") %>%
  mutate(cell_class = if_else(`first.labels` %in% c("Macrophages", "Monocytes"), "myeloid", "lymphoid")) %>%
  nest(data = -cell_class)

pbmc_small_nested

Now, independently for the lymphoid and myeloid subsets, we can (i) find variable features, (ii) reduce dimensions, and (iii) cluster, using both tidyverse and Seurat seamlessly.

pbmc_small_nested_reanalysed <-
  pbmc_small_nested %>%
  mutate(data = map(
    data, ~ .x %>%
      FindVariableFeatures(verbose = FALSE) %>%
      RunPCA(npcs = 10, verbose = FALSE) %>%
      FindNeighbors(verbose = FALSE) %>%
      FindClusters(method = "igraph", verbose = FALSE) %>%
      RunUMAP(reduction = "pca", dims = 1:10, n.components = 3L, verbose = FALSE)
  ))

pbmc_small_nested_reanalysed

Now we can unnest and plot the new classification.

pbmc_small_nested_reanalysed %>%

  # Convert to tibble otherwise Seurat drops reduced dimensions when unifying data sets.
  mutate(data = map(data, ~ .x %>% as_tibble())) %>%
  unnest(data) %>%

  # Define unique clusters
  unite("cluster", c(cell_class, seurat_clusters), remove = FALSE) %>%

  # Plotting
  ggplot(aes(UMAP_1, UMAP_2, color = cluster)) +
  geom_point() +
  facet_wrap(~cell_class) +
  my_theme

Aggregating cells

Sometimes it is necessary to aggregate the gene-transcript abundance from a group of cells into a single value, for example when comparing groups of cells across different samples with fixed-effect models.

In tidyseurat, cell aggregation can be achieved using the aggregate_cells function.

pbmc_small %>%
  aggregate_cells(groups, assays = "RNA")
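
To make the idea of pseudobulk aggregation concrete, here is a minimal base R sketch that sums counts over the cells of each group. It is not how aggregate_cells is implemented; the matrix, the cell names and the group labels are invented for illustration.

```r
# Toy count matrix: 3 genes (rows) x 6 cells (columns); values are made up
counts <- matrix(
  c(1, 0, 2, 3, 0, 1,
    0, 5, 1, 0, 2, 2,
    4, 1, 0, 1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)

# Hypothetical group label for each cell, in column order
cell_group <- c("g1", "g1", "g2", "g2", "g1", "g2")

# rowsum() sums rows by group, so transpose to cells x genes first
# and transpose back to get a genes x groups pseudobulk matrix
pseudobulk <- t(rowsum(t(counts), group = cell_group))

pseudobulk
```

Each column of pseudobulk then plays the role of one aggregated sample in a downstream bulk-style analysis.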

tidyseurat's People

Contributors

b0ydt, davisvaughan, mblue9, mojaveazure, noriakis, olivroy, stemangiola, william-hutchison, wvictor14

tidyseurat's Issues

tidyseurat::aggregate_cells doesn't pull all meta.data columns of Seurat object?

Hi there,

Thanks for a great tool.

tidyseurat::aggregate_cells pulls only some, not all, of the metadata columns of the original Seurat object.

Would you mind taking a look?

Thank you again.

seuratObj %>%
  tidyseurat::aggregate_cells(
    .sample = c(Leiden39PCRes1p2_clusterID_2024Jan09),
    assays = "SCT",
    slot = "data",
    aggregation_function = Matrix::rowSums
  ) |>
  filter(.feature %in% gene_vector) -> seuratObj_Data

> ncol(seuratObj_Data)
[1] 25
> ncol(seuratObj@meta.data)
[1] 92

reply to CRAN new submission

I verified tidyseurat.pdf is within the build directory

Below are the details of how we solved the old issues. I would like to note that, although we have identified and worked on them, we cannot directly test the server version where the issues originated. Our local and GitHub checks all give no errors, warnings or notes.

biocstyle issue: #10

color palette issue: #12

Slice can't be called on Seurat object directly

First of all, thank you for one of my favorite packages. This has saved me a great deal of effort in analysis.

I am currently trying to downsample a Seurat/tidyseurat object from >100k cells to ~50k. Ideally I would like to keep a significant representation of each cell state, so I intended to group by cell_state and downsample using slice_sample()

However, I've noticed two behaviors which may or may not be intended. If I perform grouping explicitly using group_by(), a data frame is returned:

set.seed(12345)
seurat_obj_subset = seurat_obj %>% 
   group_by(cell_state) %>% 
   slice_sample(n = 3e3) %>% 
   ungroup()

tidyseurat says: A data frame is returned for independent data analysis.

This is in theory fine, but I would ideally like to keep this as a Seurat object instead.

If I slice using the .by argument, however, the function does not run:

set.seed(12345)
seurat_obj_subset = seurat_obj %>% 
  slice_sample(n = 3e3, .by = cell_state)

Error in switch(type, call = "prefix", control = , delim = , subset = "special", : EXPR must be a length 1 vector

I'm wondering if this is intended behavior or not. Possible that I am simply approaching this the wrong way.

Thank you!
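
One possible workaround for the report above, sketched under assumptions rather than taken from the package: do the grouped sampling on the cell metadata (a plain data frame), then subset the Seurat object by cell name so that the result stays a Seurat object. The example uses the pbmc_small dataset and its groups column in place of seurat_obj and cell_state; cell_id, keep_cells and the sample size of 10 are illustrative.

```r
library(dplyr)
library(SeuratObject)

pbmc_small <- SeuratObject::pbmc_small

set.seed(12345)

# Sample 10 cells per group from the cell metadata
keep_cells <-
  pbmc_small[[]] %>%
  tibble::rownames_to_column("cell_id") %>%
  group_by(groups) %>%
  slice_sample(n = 10) %>%
  pull(cell_id)

# Subset by cell name; the result is still a Seurat object
pbmc_small_subset <- subset(pbmc_small, cells = keep_cells)

pbmc_small_subset
```

This sidesteps both the data-frame return of group_by() and the .by error, at the cost of one extra step.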

Bug in get_abundance_sc_long

In trying and failing to get join_transcripts to work on a CITE marker (it was returning an empty tibble), I found that the problem was in get_abundance_sc_long. I found that the Reduce call was returning empty results when there were multiple assays, and only one assay had results. The fix was running Reduce only on non-empty results:

original:

        values_drop_na = TRUE)) %>% Reduce(function(...) left_join(..., 
        by = c("transcript", "cell")), .)

fix:

        values_drop_na = TRUE)) -> assay_results   # save intermediate results
        has_data <- assay_results %>% map(nrow) %>% unlist %>% `>`(0)   # check which tibbles are not empty
        Reduce(function(...) left_join(..., by = c("transcript", "cell")), assay_results[has_data])   # only Reduce non-empty tibbles
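
The behaviour described in this report can be reproduced outside the package. The sketch below uses two made-up per-assay tibbles, the first of them empty, and compares a plain left-join reduction with one that drops empty tibbles first; the names assay_results and join_assays, and the abundance columns, are hypothetical.

```r
library(dplyr)
library(purrr)

# Hypothetical per-assay results: the first assay returned no rows
assay_results <- list(
  tibble::tibble(
    transcript = character(), cell = character(), abundance_RNA = numeric()
  ),
  tibble::tibble(
    transcript = c("CD3", "CD3"), cell = c("c1", "c2"), abundance_ADT = c(5, 0)
  )
)

# Successively left-join all tibbles on transcript and cell
join_assays <- function(results) {
  reduce(results, left_join, by = c("transcript", "cell"))
}

# An empty left-hand table yields an empty join result
naive <- join_assays(assay_results)

# Dropping empty tibbles first keeps the rows of the assay that has data
fixed <- join_assays(keep(assay_results, ~ nrow(.x) > 0))

nrow(naive)
nrow(fixed)
```

If every assay were empty, keep() would return an empty list and the reduction would need separate handling.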

github action error

Hello @mblue9 when you have time, no rush. Could you please give me a hand with a github action error. :) in the master branch

Invoke-History: D:\a\_temp\2992a0a4-6162-4b5d-8d0d-febac0ff55cb.ps1:2
Line |
   2 |  R CMD INSTALL .
     |  ~~~~~~~~~~~~~~~
     | A positional parameter cannot be found that accepts argument 'INSTALL'.

Error: Process completed with exit code 1.

Prepare for upcoming Seurat v5 Release

I am opening this issue as a notification because tidyseurat is listed here as a package that relies (depends/imports/suggests) on Seurat. As you may know, we recently released Seurat v5 as a beta in March of this year, with new updates for spatial, multimodal, and massively scalable analysis. For more information on updates and improvements, check out our website https://satijalab.org/seurat/.
We are now preparing to release Seurat v5 to CRAN, and plan to submit it on October 23rd. While we have tried our best to keep things backward-compatible, it is possible that updates to Seurat and SeuratObject might break your existing functionality or tests. We wanted to reach out before the new version is on CRAN, so that there's time to report issues/incompatibilities and prepare you for any changes in your code base that might be necessary.

We apologize for any disruption or inconvenience, but hope that the improvements to Seurat v5 will benefit your users going forward.
To test the upcoming release, you can install Seurat from the seurat5 branch using the instructions available on this page: https://satijalab.org/seurat/articles/install.

Thank you!
Seurat v5 team

Error: 'ggplot' is not an exported object from 'namespace:tidyseurat'

Hi there,

I was trying to follow this tutorial (https://stemangiola.github.io/tidyseurat/), but I encountered the error "Error: 'ggplot' is not an exported object from 'namespace:tidyseurat'".

Would you mind advising me how to proceed?

Thank you.

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.4.4      tidyr_1.3.0        dplyr_1.1.4        tidyseurat_0.7.8   ttservice_0.4.0    reticulate_1.34.0  knitr_1.45        
[8] SeuratObject_5.0.1 sp_2.1-1          

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3     rstudioapi_0.15.0      jsonlite_1.8.7         magrittr_2.0.3         spatstat.utils_3.0-4  
  [6] fs_1.6.3               vctrs_0.6.4            ROCR_1.0-11            spatstat.explore_3.2-5 memoise_2.0.1         
 [11] rstatix_0.7.2          htmltools_0.5.7        usethis_2.2.2          curl_5.1.0             broom_1.0.5           
 [16] sctransform_0.4.1      parallelly_1.36.0      KernSmooth_2.23-22     htmlwidgets_1.6.2      ica_1.0-3             
 [21] plyr_1.8.9             plotly_4.10.3          zoo_1.8-12             cachem_1.0.8           igraph_1.5.1          
 [26] mime_0.12              lifecycle_1.0.4        pkgconfig_2.0.3        Matrix_1.6-3           R6_2.5.1              
 [31] fastmap_1.1.1          fitdistrplus_1.1-11    future_1.33.0          shiny_1.8.0            digest_0.6.33         
 [36] colorspace_2.1-0       patchwork_1.1.3        ps_1.7.5               tensor_1.5             Seurat_5.0.1          
 [41] RSpectra_0.16-1        irlba_2.3.5.1          pkgload_1.3.3          ggpubr_0.6.0           progressr_0.14.0      
 [46] spatstat.sparse_3.0-3  fansi_1.0.5            polyclip_1.10-6        httr_1.4.7             abind_1.4-5           
 [51] compiler_4.3.1         remotes_2.4.2.1        withr_2.5.2            backports_1.4.1        carData_3.0-5         
 [56] fastDummies_1.7.3      pkgbuild_1.4.2         ggsignif_0.6.4         MASS_7.3-60            sessioninfo_1.2.2     
 [61] tools_4.3.1            lmtest_0.9-40          httpuv_1.6.12          future.apply_1.11.0    goftest_1.2-3         
 [66] glue_1.6.2             callr_3.7.3            nlme_3.1-163           promises_1.2.1         grid_4.3.1            
 [71] Rtsne_0.16             cluster_2.1.4          reshape2_1.4.4         generics_0.1.3         spatstat.data_3.0-3   
 [76] gtable_0.3.4           data.table_1.14.8      car_3.1-2              utf8_1.2.4             spatstat.geom_3.2-7   
 [81] RcppAnnoy_0.0.21       ggrepel_0.9.4          RANN_2.6.1             pillar_1.9.0           stringr_1.5.1         
 [86] spam_2.10-0            RcppHNSW_0.5.0         later_1.3.1            splines_4.3.1          lattice_0.22-5        
 [91] deldir_1.0-9           survival_3.5-7         tidyselect_1.2.0       miniUI_0.1.1.1         pbapply_1.7-2         
 [96] randomcoloR_1.1.0.1    gridExtra_2.3          V8_4.4.0               scattermore_1.2        xfun_0.41             
[101] devtools_2.4.5         matrixStats_1.1.0      stringi_1.8.1          lazyeval_0.2.2         codetools_0.2-19      
[106] tibble_3.2.1           BiocManager_1.30.22    cli_3.6.1              uwot_0.1.16            xtable_1.8-4          
[111] munsell_0.5.0          processx_3.8.2         Rcpp_1.0.11            spatstat.random_3.2-1  globals_0.16.2        
[116] png_0.1-8              parallel_4.3.1         ellipsis_0.3.2         prettyunits_1.2.0      dotCall64_1.1-0       
[121] profvis_0.3.8          urlchecker_1.0.1       listenv_0.9.0          viridisLite_0.4.2      scales_1.2.1          
[126] ggridges_0.5.4         leiden_0.4.3.1         purrr_1.0.2            crayon_1.5.2           rlang_1.1.2           
[131] cowplot_1.1.1         

sample_frac function changes the cell order

Hey,
when I try subsetting a large Seurat object to reduce the computing time, the sample_frac() function changes the cell order, so that the Seurat functions do not work anymore. To reproduce the error, try this code:

pbmc_small = SeuratObject::pbmc_small

pbmc_small_subset <- pbmc_small |> sample_frac(0.9)

pbmc_small_subset <- RunPCA(pbmc_small_subset, reduction.name = 'pca', assay = "RNA")

The error I'm getting is:

Error in validObject(object = x) :
  invalid class “Seurat” object: 1: all cells in assays must be in the same order as the Seurat object
invalid class “Seurat” object: 2: all cells in reductions must be in the same order as the Seurat object
invalid class “Seurat” object: 3: all cells in graphs must be in the same order as the Seurat object (offending: RNA_snn)
invalid class “Seurat” object: 4: 'active.idents' must be named with cell names

Thanks!

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices datasets utils methods base

other attached packages:
[1] cbmc.SeuratData_3.1.4 SeuratData_0.2.2 scclusteval_0.0.0.9000 SingleCellExperiment_1.18.1 SummarizedExperiment_1.26.1 Biobase_2.56.0 GenomicRanges_1.48.0
[8] GenomeInfoDb_1.32.4 IRanges_2.30.1 S4Vectors_0.34.0 BiocGenerics_0.42.0 MatrixGenerics_1.8.1 matrixStats_0.63.0 tidyseurat_0.5.9
[15] ttservice_0.2.2 RColorBrewer_1.1-3 patchwork_1.1.2 Seurat_4.9.9.9042 SeuratObject_4.9.9.9084 sp_1.6-0 lubridate_1.9.2
[22] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
[29] ggplot2_3.4.2 tidyverse_2.0.0

loaded via a namespace (and not attached):
[1] spam_2.9-1 systemfonts_1.0.4 plyr_1.8.8 igraph_1.4.2 lazyeval_0.2.2 splines_4.2.0 RcppHNSW_0.4.1 listenv_0.9.0
[9] scattermore_0.8 digest_0.6.31 htmltools_0.5.5 fansi_1.0.4 magrittr_2.0.3 tensor_1.5 cluster_2.1.4 ROCR_1.0-11
[17] limma_3.52.4 tzdb_0.3.0 globals_0.16.2 timechange_0.2.0 spatstat.sparse_3.0-1 colorspace_2.1-0 rappdirs_0.3.3 ggrepel_0.9.3
[25] textshaping_0.3.6 xfun_0.39 crayon_1.5.2 RCurl_1.98-1.12 jsonlite_1.8.4 progressr_0.13.0 spatstat.data_3.0-1 survival_3.3-1
[33] zoo_1.8-12 glue_1.6.2 polyclip_1.10-4 gtable_0.3.3 zlibbioc_1.42.0 XVector_0.36.0 leiden_0.4.3 DelayedArray_0.22.0
[41] future.apply_1.10.0 abind_1.4-5 scales_1.2.1 spatstat.random_3.1-4 miniUI_0.1.1.1 Rcpp_1.0.10 viridisLite_0.4.1 xtable_1.8-4
[49] reticulate_1.28 dotCall64_1.0-2 htmlwidgets_1.6.2 httr_1.4.5 ellipsis_0.3.2 ica_1.0-3 farver_2.1.1 pkgconfig_2.0.3
[57] sass_0.4.5 uwot_0.1.14 deldir_1.0-6 utf8_1.2.3 here_1.0.1 labeling_0.4.2 tidyselect_1.2.0 rlang_1.1.0
[65] reshape2_1.4.4 later_1.3.0 cachem_1.0.7 munsell_0.5.0 tools_4.2.0 cli_3.6.1 generics_0.1.3 ggridges_0.5.4
[73] evaluate_0.20 fastmap_1.1.1 yaml_2.3.7 ragg_1.2.5 goftest_1.2-3 knitr_1.42 fitdistrplus_1.1-11 RANN_2.6.1
[81] pbapply_1.7-0 future_1.32.0 nlme_3.1-160 mime_0.12 compiler_4.2.0 rstudioapi_0.14 plotly_4.10.1 png_0.1-8
[89] spatstat.utils_3.0-2 bslib_0.4.2 stringi_1.7.12 RSpectra_0.16-1 lattice_0.20-45 Matrix_1.5-1 vctrs_0.6.2 pillar_1.9.0
[97] lifecycle_1.0.3 BiocManager_1.30.20 jquerylib_0.1.4 spatstat.geom_3.1-0 lmtest_0.9-40 RcppAnnoy_0.0.20 data.table_1.14.8 cowplot_1.1.1
[105] bitops_1.0-7 irlba_2.3.5.1 httpuv_1.6.9 R6_2.5.1 promises_1.2.0.1 renv_0.17.3 KernSmooth_2.23-20 gridExtra_2.3
[113] parallelly_1.35.0 codetools_0.2-18 fastDummies_1.6.3 MASS_7.3-58.1 rprojroot_2.0.3 withr_2.5.0 sctransform_0.3.5 GenomeInfoDbData_1.2.8
[121] parallel_4.2.0 hms_1.1.3 grid_4.2.0 rmarkdown_2.21 Rtsne_0.16 spatstat.explore_3.1-0 shiny_1.7.4
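
For the sample_frac() report above, the validity error is about cell order, so one way to think about a workaround is to draw the random subset and then restore the original order before subsetting. The base R sketch below shows only that ordering step on made-up cell names; cells stands in for the cell names of a real object and is an assumption of this example.

```r
set.seed(1)

# Hypothetical cell names standing in for the cell names of a Seurat object
cells <- paste0("cell_", 1:10)

# Draw 90% of the positions at random, then sort the positions
n_keep <- floor(0.9 * length(cells))
keep_idx <- sort(sample.int(length(cells), size = n_keep))
keep_cells <- cells[keep_idx]

# keep_cells is a random subset that is still in the original order
stopifnot(!is.unsorted(match(keep_cells, cells)))

keep_cells
```

The resulting names could then be used to subset the object by cell name, which should leave the assays, reductions and graphs in a consistent order.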

Error when attempting to merge Seurat Objects after filtering

Hello,

I am encountering an error when trying to merge two Seurat Objects after using tidyseurat's filter(). The error can only be produced when the attached data's "integrated" assay is present.

The error is produced by calling merge() with the following code:

library("Seurat")
library("tidyseurat")

pbmc_complex <-
  readRDS("pbmc_complex.rds")

pbmc_complex_filtered <-
  pbmc_complex |>
  tidyseurat::filter(sample %in% c("SI-GA-G9", "SI-GA-G6"))

pbmc_complex_filtered_split <- 
  SplitObject(pbmc_complex_filtered, "sample")

pbmc_complex_filtered_merged <- 
  merge(pbmc_complex_filtered_split[[1]], pbmc_complex_filtered_split[[2]])

The error message is:

Error in names(model.list) <- all.levels : 
  attempt to set an attribute on NULL

The pbmc_complex_filtered_split object looks like:

$`SI-GA-G9`
# A Seurat-tibble abstraction: 14 × 111
# Features=59856 | Cells=14 | Active assay=RNA | Assays=RNA, integrated,
#  prediction.score.celltype.l1, prediction.score.celltype.l2, predicted_ADT,
#  prediction.score.curated_cell_type, prediction.score.curated_cell_type_pretty
   .cell       Barcode race  sex   chemi…¹ note  batch BCB   type  DOB   date.…² Sampl…³ Stage…⁴
   <chr>       <chr>   <chr> <chr> <lgl>   <chr> <chr> <chr> <chr> <chr> <chr>   <chr>   <chr>  
 1 8_AGACACTT… AGACAC… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 2 8_TTCCGGTT… TTCCGG… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 3 8_GGCACGTT… GGCACG… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 4 8_ATCACAGA… ATCACA… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 5 8_CGCATGGA… CGCATG… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 6 8_TACTTACT… TACTTA… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 7 8_CAAGCTAG… CAAGCT… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 8 8_CATTCCGC… CATTCC… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 9 8_CATTCTAG… CATTCT… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
10 8_ATCCCTGG… ATCCCT… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
11 8_CTGCGAGT… CTGCGA… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
12 8_AGATAGAC… AGATAG… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
13 8_TCTACCGG… TCTACC… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
14 8_TATATCCC… TATATC… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
# … with 98 more variables: intrinsic.subtype <chr>,
#   STAGE.WHEN.BIOPSY.TAKEN..EBC.vs..MBC. <chr>,
#   Treatment.response.at.time.sample.taken..progressing..responding..stable.disease. <chr>,
#   menopausal.status..premenopausal..perimenopausal..postmenopausal. <chr>,
#   neoadjuvant.therapy..specified. <chr>, radiotherapy..y.n. <chr>,
#   endocrine.therapy..y.n. <chr>, her2.targeted.therapy..y.n. <chr>,
#   adjuvant.chemotherapy..y.n. <chr>, …
# ℹ Use `colnames()` to see all variable names

$`SI-GA-G6`
# A Seurat-tibble abstraction: 6 × 111
# Features=59856 | Cells=6 | Active assay=RNA | Assays=RNA, integrated,
#  prediction.score.celltype.l1, prediction.score.celltype.l2, predicted_ADT,
#  prediction.score.curated_cell_type, prediction.score.curated_cell_type_pretty
  .cell        Barcode race  sex   chemi…¹ note  batch BCB   type  DOB   date.…² Sampl…³ Stage…⁴
  <chr>        <chr>   <chr> <chr> <lgl>   <chr> <chr> <chr> <chr> <chr> <chr>   <chr>   <chr>  
1 4_ATATCCTGT… ATATCC… white NA    NA      NA    2     BCB0… OMBC  13/0… 01/07/… 1/5/20… WLE + …
2 4_GGGAGTACA… GGGAGT… white NA    NA      NA    2     BCB0… OMBC  13/0… 01/07/… 1/5/20… WLE + …
3 4_CCCTGATAG… CCCTGA… white NA    NA      NA    2     BCB0… OMBC  13/0… 01/07/… 1/5/20… WLE + …
4 4_CGCATAATC… CGCATA… white NA    NA      NA    2     BCB0… OMBC  13/0… 01/07/… 1/5/20… WLE + …
5 4_GGATCTAAG… GGATCT… white NA    NA      NA    2     BCB0… OMBC  13/0… 01/07/… 1/5/20… WLE + …
6 4_CGTTCTGTC… CGTTCT… white NA    NA      NA    2     BCB0… OMBC  13/0… 01/07/… 1/5/20… WLE + …
# … with 98 more variables: intrinsic.subtype <chr>,
#   STAGE.WHEN.BIOPSY.TAKEN..EBC.vs..MBC. <chr>,
#   Treatment.response.at.time.sample.taken..progressing..responding..stable.disease. <chr>,
#   menopausal.status..premenopausal..perimenopausal..postmenopausal. <chr>,
#   neoadjuvant.therapy..specified. <chr>, radiotherapy..y.n. <chr>,
#   endocrine.therapy..y.n. <chr>, her2.targeted.therapy..y.n. <chr>,
#   adjuvant.chemotherapy..y.n. <chr>, …
# ℹ Use `colnames()` to see all variable names

Merging with the same data works fine without filtering:

pbmc_complex_split <-
  SplitObject(pbmc_complex, "sample")

pbmc_complex_merged <-
  merge(pbmc_complex_split[[1]], pbmc_complex_split[[2]])

And merging after filtering works fine when the "integrated" assay is removed:

pbmc_complex[["integrated"]] <- NULL

pbmc_complex_filtered <-
  pbmc_complex |>
  tidyseurat::filter(sample %in% c("SI-GA-G9", "SI-GA-G6"))

pbmc_complex_filtered_split <- 
  SplitObject(pbmc_complex_filtered, "sample")

pbmc_complex_filtered_merged <- 
  merge(pbmc_complex_filtered_split[[1]], pbmc_complex_filtered_split[[2]])

sessionInfo():

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libRblas.so
LAPACK: /stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tidyHeatmap_1.9.2  cowplot_1.1.1      ggrepel_0.9.3      sccomp_1.2.1      
 [5] tidyseurat_0.5.9   ttservice_0.2.2    SeuratObject_4.1.3 Seurat_4.3.0      
 [9] forcats_1.0.0      stringr_1.5.0      dplyr_1.1.1        purrr_1.0.1       
[13] readr_2.1.4        tidyr_1.3.0        tibble_3.2.1       ggplot2_3.4.2     
[17] tidyverse_1.3.2   

loaded via a namespace (and not attached):
  [1] utf8_1.2.3                  spatstat.explore_3.1-0      reticulate_1.28            
  [4] tidyselect_1.2.0            htmlwidgets_1.6.2           grid_4.2.1                 
  [7] Rtsne_0.16                  munsell_0.5.0               codetools_0.2-18           
 [10] ica_1.0-3                   future_1.32.0               miniUI_0.1.1.1             
 [13] withr_2.5.0                 spatstat.random_3.1-4       colorspace_2.1-0           
 [16] progressr_0.13.0            Biobase_2.58.0              rstudioapi_0.14            
 [19] stats4_4.2.1                SingleCellExperiment_1.20.1 ROCR_1.0-11                
 [22] tensor_1.5                  listenv_0.9.0               MatrixGenerics_1.10.0      
 [25] labeling_0.4.2              rstan_2.21.8                GenomeInfoDbData_1.2.9     
 [28] polyclip_1.10-4             farver_2.1.1                parallelly_1.35.0          
 [31] vctrs_0.6.1                 generics_0.1.3              timechange_0.2.0           
 [34] doParallel_1.0.17           R6_2.5.1                    GenomeInfoDb_1.34.9        
 [37] clue_0.3-64                 bitops_1.0-7                spatstat.utils_3.0-2       
 [40] DelayedArray_0.24.0         promises_1.2.0.1            scales_1.2.1               
 [43] googlesheets4_1.1.0         gtable_0.3.1                globals_0.16.2             
 [46] processx_3.8.1              goftest_1.2-3               rlang_1.1.0                
 [49] systemfonts_1.0.4           GlobalOptions_0.1.2         splines_4.2.1              
 [52] lazyeval_0.2.2              gargle_1.4.0                spatstat.geom_3.1-0        
 [55] broom_1.0.4                 inline_0.3.19               reshape2_1.4.4             
 [58] abind_1.4-5                 modelr_0.1.11               backports_1.4.1            
 [61] httpuv_1.6.9                tools_4.2.1                 ellipsis_0.3.2             
 [64] RColorBrewer_1.1-3          BiocGenerics_0.44.0         ggridges_0.5.4             
 [67] Rcpp_1.0.10                 plyr_1.8.8                  zlibbioc_1.44.0            
 [70] RCurl_1.98-1.12             ps_1.7.5                    prettyunits_1.1.1          
 [73] deldir_1.0-6                viridis_0.6.2               GetoptLong_1.0.5           
 [76] pbapply_1.7-0               S4Vectors_0.36.2            zoo_1.8-12                 
 [79] SummarizedExperiment_1.28.0 haven_2.5.2                 cluster_2.1.3              
 [82] fs_1.6.1                    magrittr_2.0.3              data.table_1.14.8          
 [85] scattermore_0.8             circlize_0.4.15             lmtest_0.9-40              
 [88] reprex_2.0.2                RANN_2.6.1                  googledrive_2.1.0          
 [91] fitdistrplus_1.1-8          matrixStats_0.63.0          hms_1.1.3                  
 [94] patchwork_1.1.2             mime_0.12                   xtable_1.8-4               
 [97] readxl_1.4.2                shape_1.4.6                 IRanges_2.32.0             
[100] gridExtra_2.3               rstantools_2.3.1            compiler_4.2.1             
[103] KernSmooth_2.23-20          crayon_1.5.1                StanHeaders_2.21.0-7       
[106] htmltools_0.5.5             later_1.3.0                 tzdb_0.3.0                 
[109] RcppParallel_5.1.7          lubridate_1.9.2             DBI_1.1.3                  
[112] ComplexHeatmap_2.14.0       dbplyr_2.3.2                MASS_7.3-57                
[115] boot_1.3-28                 Matrix_1.5-3                cli_3.6.1                  
[118] parallel_4.2.1              igraph_1.4.2                GenomicRanges_1.50.2       
[121] pkgconfig_2.0.3             sp_1.6-0                    plotly_4.10.1              
[124] spatstat.sparse_3.0-1       foreach_1.5.2               xml2_1.3.3                 
[127] XVector_0.38.0              rvest_1.0.3                 callr_3.7.3                
[130] digest_0.6.31               sctransform_0.3.5           RcppAnnoy_0.0.20           
[133] spatstat.data_3.0-1         cellranger_1.1.0            leiden_0.4.3               
[136] dendextend_1.17.1           uwot_0.1.14                 shiny_1.7.4                
[139] rjson_0.2.21                lifecycle_1.0.3             nlme_3.1-157               
[142] jsonlite_1.8.4              viridisLite_0.4.1           limma_3.54.2               
[145] fansi_1.0.4                 pillar_1.8.1                lattice_0.20-45            
[148] loo_2.6.0                   fastmap_1.1.1               httr_1.4.5                 
[151] pkgbuild_1.4.0              survival_3.3-1              glue_1.6.2                 
[154] iterators_1.0.14            png_0.1-8                   stringi_1.7.12             
[157] irlba_2.3.5.1               future.apply_1.10.0

I have attached the data used to produce this error.

Let me know if I can provide any more information. Thank you!

pbmc_complex.rds.zip

Renaming genes

Hello,

I was wondering whether tidyseurat is able to rename genes. I know that in base Seurat this is very difficult and often requires renaming the rows before constructing the Seurat object and then rerunning all of the downstream processes. If possible, I would like to map orthologous genes to compare single-cell data between species.

Thanks!

-Alex
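
For context, the pre-construction renaming described above can be sketched in base R. This is only an illustration of that workaround, not a tidyseurat feature: the count matrix, the `ortholog_map` vector and the gene names are made up for the example, and a real map would come from an ortholog database. Genes without an ortholog are dropped, and many-to-one mappings (which would create duplicate row names) are not handled.

```r
# Toy count matrix: rows are mouse gene symbols, columns are cells (made-up values).
counts <- matrix(
  c(5, 0, 2,
    1, 3, 0,
    0, 4, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Cd3e", "Ms4a1", "Gm1234"), c("cell_1", "cell_2", "cell_3"))
)

# Hypothetical ortholog map (mouse -> human), assumed for illustration only.
ortholog_map <- c(Cd3e = "CD3E", Ms4a1 = "MS4A1")

# Keep only the genes that have an ortholog, then rename the rows.
has_ortholog <- rownames(counts) %in% names(ortholog_map)
counts_renamed <- counts[has_ortholog, , drop = FALSE]
rownames(counts_renamed) <- unname(ortholog_map[rownames(counts_renamed)])

# The renamed matrix could then be passed to
# Seurat::CreateSeuratObject(counts = counts_renamed) and processed as usual.
```

The renaming happens on the raw matrix, so every downstream step (normalisation, PCA, clustering) would still have to be rerun, which is exactly the cost the question is trying to avoid.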

Documentation with real data, i.e. SRS/SRA number

Hi,
Thanks for developing this great package. It's single cell and it's tidyverse, OMG ;)

Although it was helpful to use pbmc_small sample data for documentation, I struggled to use the package with real data from PanglaoDB.

What is the suggested way to process data from PanglaoDB?
Should I be doing this:

rPanglaoDB::get_samples("SRS3296611") %>%
  tidyseurat::as_tibble()

Or this:

rPanglaoDB::get_samples("SRS3296611") %>%
  tidyseurat::join_features(all = TRUE)

If there is any sample code that uses SRS/SRA numbers, I'd be glad if you could point me to it.
