
humanpilot's Introduction

HumanPilot

DOI

Welcome to the spatialLIBD project! It is composed of the HumanPilot code described here as well as the spatialLIBD R/Bioconductor package and the shiny web application described below.

This web application allows you to browse the LIBD human dorsolateral prefrontal cortex (DLPFC) spatial transcriptomics data generated with the 10x Genomics Visium platform. Through the R/Bioconductor package you can also download the data as well as visualize your own datasets using this web application. Please check the bioRxiv preprint for more details about this project.

If you tweet about this website, the data or the R package please use the #spatialLIBD hashtag. You can find previous tweets that way as shown here. Thank you!

Study design

As a quick overview, the data presented here is from a portion of the DLPFC that spans six neuronal layers plus white matter (A), for a total of three subjects with two pairs of spatially adjacent replicates (B). Each dissection of DLPFC was designed to span all six layers plus white matter (C). Using this web application you can explore the expression of known genes such as SNAP25 (D, a neuronal gene), MOBP (E, an oligodendrocyte gene), and known layer markers from mouse studies such as PCP4 (F, a known layer 5 marker gene).

This web application was built such that we could annotate the spots to layers, as you can see under the spot-level data tab. Once we annotated each spot to a layer, we compressed the information into layer-level data using a pseudo-bulking approach. We then analyzed the expression through a set of models whose results you can also explore through this web application. Finally, you can upload your own gene sets of interest as well as layer enrichment statistics and compare them with our LIBD Human DLPFC Visium dataset, as sketched below.
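
For instance, here is a minimal sketch of that last comparison step. It is an illustration only, assuming the gene_set_enrichment() interface and the fetch_data() "modeling_results" type documented in the spatialLIBD package; the gene list is a hypothetical example.

## Minimal sketch: compare your own gene sets against the layer-level
## modeling results. Assumes the documented gene_set_enrichment()
## interface; the gene list below is a hypothetical example.
library("spatialLIBD")

## Layer-level modeling results
modeling_results <- fetch_data(type = "modeling_results")

## A named list of Ensembl gene IDs of interest
my_gene_sets <- list(
    example_set = c("ENSG00000132639", "ENSG00000168314") ## SNAP25, MOBP
)

## Enrichment of these gene sets across the DLPFC layers
enrichment <- gene_set_enrichment(
    gene_list = my_gene_sets,
    modeling_results = modeling_results,
    model_type = "enrichment"
)
enrichment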

If you are interested in running this web application locally, you can do so thanks to the spatialLIBD R/Bioconductor package that powers this web application as shown below.

## Run this web application locally
spatialLIBD::run_app()

## You will have more control over the length of the
## session and memory usage.

## You could also use this function to visualize your
## own data given some requirements described
## in detail in the package vignette documentation
## at http://research.libd.org/spatialLIBD/.

Shiny website mirrors

R/Bioconductor package

The spatialLIBD package contains functions for:

  • Accessing the spatial transcriptomics data from the LIBD Human Pilot project (code on GitHub) generated with the Visium platform from 10x Genomics. The data is retrieved from Bioconductor’s ExperimentHub.
  • Visualizing the spot-level spatial gene expression data and clusters.
  • Inspecting the data interactively either on your computer or through spatial.libd.org/spatialLIBD/.

For more details, please check the documentation website or the Bioconductor package landing page here.
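
For example, here is a hedged sketch of visualizing spot-level expression for a known marker gene. It assumes the vis_gene() interface documented in the package and the "symbol; Ensembl ID" search format stored in rowData(spe)$gene_search.

## Hedged example: plot spot-level SNAP25 expression for one sample.
## Assumes the documented vis_gene() interface and gene_search format.
library("spatialLIBD")
spe <- fetch_data(type = "spe")

vis_gene(
    spe = spe,
    sampleid = "151673",
    geneid = "SNAP25; ENSG00000132639"
)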

Installation instructions

Get the latest stable R release from CRAN. Then install spatialLIBD from Bioconductor using the following code:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("spatialLIBD")

Access the data

Through the spatialLIBD package you can access the processed data in its final R format. However, we also provide a table of links so you can download the raw data we received from 10x Genomics.

Processed data

Using spatialLIBD you can access the Human DLPFC spatial transcriptomics data from the 10x Genomics Visium platform. For example, this is the code you can use to access the spot-level data. For more details, check the help file for fetch_data().

## Load the package
library("spatialLIBD")

## Download the spot-level data
spe <- fetch_data(type = "spe")

## This is a SpatialExperiment object
spe
#> class: SpatialExperiment 
#> dim: 33538 47681 
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
#>   ENSG00000268674
#> rowData names(9): source type ... gene_search is_top_hvg
#> colnames(47681): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
#>   TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
#> colData names(66): sample_id Cluster ... spatialLIBD ManualAnnotation
#> reducedDimNames(6): PCA TSNE_perplexity50 ... TSNE_perplexity80
#>   UMAP_neighbors15
#> mainExpName: NULL
#> altExpNames(0):
#> spatialData names(3) : in_tissue array_row array_col
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor

## Note the memory size
lobstr::obj_size(spe) 
#> 2.04 GB

## Remake the logo image with histology information
vis_clus(
    spe = spe,
    clustervar = "spatialLIBD",
    sampleid = "151673",
    colors = libd_layer_colors,
    ... = " DLPFC Human Brain Layers\nMade with github.com/LieberInstitute/spatialLIBD"
)
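
fetch_data() can also retrieve the pseudo-bulked layer-level data and the layer modeling results. Below is a brief example assuming the "sce_layer" and "modeling_results" types listed in the fetch_data() help file.

## Download the layer-level (pseudo-bulked) data
sce_layer <- fetch_data(type = "sce_layer")

## Download the layer-level modeling results
modeling_results <- fetch_data(type = "modeling_results")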

Raw data

You can access all the raw data through Globus (jhpce#HumanPilot10x). Furthermore, below you can find the links to the raw data we received from 10x Genomics.

SampleID h5_filtered h5_raw image_full image_hi image_lo loupe HTML_report
151507 AWS AWS AWS AWS AWS AWS GitHub
151508 AWS AWS AWS AWS AWS AWS GitHub
151509 AWS AWS AWS AWS AWS AWS GitHub
151510 AWS AWS AWS AWS AWS AWS GitHub
151669 AWS AWS AWS AWS AWS AWS GitHub
151670 AWS AWS AWS AWS AWS AWS GitHub
151671 AWS AWS AWS AWS AWS AWS GitHub
151672 AWS AWS AWS AWS AWS AWS GitHub
151673 AWS AWS AWS AWS AWS AWS GitHub
151674 AWS AWS AWS AWS AWS AWS GitHub
151675 AWS AWS AWS AWS AWS AWS GitHub
151676 AWS AWS AWS AWS AWS AWS GitHub

Citation

Below is the citation output from using citation('spatialLIBD') in R. Please run this yourself to check for any updates on how to cite spatialLIBD.

citation('spatialLIBD')
#> 
#> To cite package 'spatialLIBD' in publications use:
#> 
#>   Pardo B, Spangler A, Weber LM, Hicks SC, Jaffe AE, Martinowich K,
#>   Maynard KR, Collado-Torres L (2022). "spatialLIBD: an R/Bioconductor
#>   package to visualize spatially-resolved transcriptomics data." _BMC
#>   Genomics_. doi:10.1186/s12864-022-08601-w
#>   <https://doi.org/10.1186/s12864-022-08601-w>,
#>   <https://doi.org/10.1186/s12864-022-08601-w>.
#> 
#>   Maynard KR, Collado-Torres L, Weber LM, Uytingco C, Barry BK,
#>   Williams SR, II JLC, Tran MN, Besich Z, Tippani M, Chew J, Yin Y,
#>   Kleinman JE, Hyde TM, Rao N, Hicks SC, Martinowich K, Jaffe AE
#>   (2021). "Transcriptome-scale spatial gene expression in the human
#>   dorsolateral prefrontal cortex." _Nature Neuroscience_.
#>   doi:10.1038/s41593-020-00787-0
#>   <https://doi.org/10.1038/s41593-020-00787-0>,
#>   <https://www.nature.com/articles/s41593-020-00787-0>.
#> 
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.

HumanPilot code

Re-shaping your data to our structure

As described in the spatialLIBD vignette, you can see the scripts in this repository for re-shaping your data to look like ours. That is:

  • reorganize_folder.R available here re-organizes the raw data we were sent by 10x Genomics.
  • Layer_Notebook.R available here reads in the Visium data and builds a list of RangedSummarizedExperiment objects (from the SummarizedExperiment package), one per sample (image), that is eventually saved as Human_DLPFC_Visium_processedData_rseList.rda.
  • convert_sce.R available here reads in Human_DLPFC_Visium_processedData_rseList.rda and builds an initial sce object with image data under metadata(sce)$image which is a single data.frame. Subsetting doesn’t automatically subset the image, so you have to do it yourself when plotting as is done by sce_image_clus_p() and sce_image_gene_p(). Having the data from all images in a single object allows you to use the spot-level data from all images to compute clusters and do other similar analyses to the ones you would do with sc/snRNA-seq data. The script creates the Human_DLPFC_Visium_processedData_sce.Rdata file.
  • sce_scran.R available here then uses scran to read in Human_DLPFC_Visium_processedData_sce.Rdata, compute the highly variable genes (stored in our final sce object at rowData(sce)$is_top_hvg), perform dimensionality reduction (PCA, TSNE, UMAP) and identify clusters using the data from all images. The resulting data is then stored as Human_DLPFC_Visium_processedData_sce_scran.Rdata and is the main object used throughout our analysis code (Maynard, Collado-Torres, Weber, Uytingco, et al., 2021); see the sketch after this list.
  • make-data_spatialLIBD.R available in the source version of spatialLIBD and online here is the script that reads in Human_DLPFC_Visium_processedData_sce_scran.Rdata as well as some other outputs from our analysis and combines them into the final sce and sce_layer objects provided by spatialLIBD (Pardo, Spangler, Weber, Hicks, et al., 2022). This script simplifies some operations in order to streamline the code behind the shiny application provided by spatialLIBD.
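
As a reference for the sce_scran.R step above, here is a minimal sketch of that type of scran/scater workflow. It is not the exact script; it assumes sce is the object stored in Human_DLPFC_Visium_processedData_sce.Rdata and uses standard scran/scater calls.

## Minimal sketch of a scran/scater workflow of the kind described above
## (not the exact sce_scran.R code). Assumes `sce` is the object from
## Human_DLPFC_Visium_processedData_sce.Rdata.
library("scran")
library("scater")

## Normalize the spot-level counts from all images
sce <- logNormCounts(sce)

## Identify highly variable genes and record them in rowData()
dec <- modelGeneVar(sce)
top_hvgs <- getTopHVGs(dec, prop = 0.1)
rowData(sce)$is_top_hvg <- rownames(sce) %in% top_hvgs

## Dimensionality reduction: PCA, then TSNE/UMAP on the PCs
sce <- runPCA(sce, subset_row = top_hvgs)
sce <- runTSNE(sce, dimred = "PCA")
sce <- runUMAP(sce, dimred = "PCA")

## Graph-based clustering using the spots from all images
g <- buildSNNGraph(sce, use.dimred = "PCA")
sce$graph_clusters <- factor(igraph::cluster_walktrap(g)$membership)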

10X directory

Contains some of the raw files provided by 10X. Given their size, we only included the small ones here.

Analysis directory

The README.md was the one we initially prepared for our collaborators at an early stage of the project. That README file described some of our initial explorations using packages such as scran, zinbwave and other approaches such as using k-means with X/Y spatial information. These analyses were not used for our manuscript beyond creating the sce object we previously described.

The 10x Genomics file structure is replicated inside Histology where we saved the image segmentation analysis output to estimate the number of cells per spot. This involved running some histology image segmentation software and the counting code that requires our file structure (sgeID input).

The main layer-level analysis code is located in Layer_Guesses; for example, layer_specificity.R is the R script for pseudo-bulking the spot-level data to create the layer-level data. The spatialLIBD layer annotation files are saved in the First_Round and Second_Round directories, which you can upload to the shiny web application.
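
The pseudo-bulking idea itself can be sketched as follows. This is a simplified illustration rather than the exact layer_specificity.R code, and it assumes the manual layer annotations are stored in a spot-level column such as layer_guess.

## Simplified pseudo-bulking sketch (not the exact layer_specificity.R
## code): sum spot-level counts within each sample x layer combination.
## Assumes the layer annotations live in a column like `layer_guess`.
library("SingleCellExperiment")
library("scuttle")

sce_layer <- aggregateAcrossCells(
    sce,
    ids = colData(sce)[, c("sample_id", "layer_guess")]
)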

We also include directories with code for processing external datasets such as he_layers, allen_data, hafner_vglut.

We would like to highlight that a lot of the plotting code and functionality from these scripts has since been implemented in spatialLIBD, which makes much of this analysis simpler. Finally, for reproducibility purposes we included the R session information in many of our R scripts; in general we used R 3.6.1 and 3.6.2 with Bioconductor release 3.10.

outputs directory

Contains outputs from the different unsupervised, semi-supervised, known-gene-marker-based and other clustering results. The analysis code that generates these CSV files is located in R Markdown files inside the Analysis directory, such as SpatialDE_clustering.Rmd.
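
Below is a hedged sketch of how one of these CSV outputs could be brought back into the spot-level object for visualization; the file path, the "key"/"cluster" column names, and the spe$key spot identifier used for matching are assumptions about the layout.

## Hedged sketch: attach an external clustering result to the spot-level
## object and plot it. The file path, the "key"/"cluster" columns, and
## the spe$key spot identifier are assumptions about the layout.
clusters <- read.csv("outputs/SpatialDE_clustering/clusters.csv")
spe$external_cluster <- factor(
    clusters$cluster[match(spe$key, clusters$key)]
)

vis_clus(
    spe = spe,
    sampleid = "151673",
    clustervar = "external_cluster"
)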

Bibliography

[1] K. R. Maynard, L. Collado-Torres, L. M. Weber, C. Uytingco, et al. “Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex”. In: Nature Neuroscience (2021). DOI: 10.1038/s41593-020-00787-0. https://www.nature.com/articles/s41593-020-00787-0.

[2] B. Pardo, A. Spangler, L. M. Weber, S. C. Hicks, et al. “spatialLIBD: an R/Bioconductor package to visualize spatially-resolved transcriptomics data”. In: BMC Genomics (2022). DOI: 10.1186/s12864-022-08601-w. https://doi.org/10.1186/s12864-022-08601-w.

Internal

  • JHPCE location: /dcs04/lieber/lcolladotor/with10x_LIBD001/HumanPilot
  • Main sce R object file: /dcs04/lieber/lcolladotor/with10x_LIBD001/HumanPilot/Analysis/Human_DLPFC_Visium_processedData_sce_scran.Rdata.


humanpilot's Issues

homology analyses of previous marker genes

Can you take the genes in the previous layer marker table and make a boxplot of mean expression in BrainSeq Phase II DLPFC, within fetal and then adult? That is, 4 boxes: fetal:layerMarker, fetal:notLayerMarker, adult:layerMarker, adult:nonLayerMarker (where there are way more points in the nonLayerMarker boxes).

Multiple samples in a layer guess file

The layer guess files for samples 151507 and 151669

151507,"TTGTTTCCATACAACT-1","Layer 6"
151507,"TTGTTTGTGTAAATTC-1","Layer 1"
151669,"AAACCTAAGCAGCCGG-1","WM"
151669,"AAACGGTTGCGAACTG-1","WM"

151507,"TTGTTTCCATACAACT-1","Layer 6"
151507,"TTGTTTGTGTAAATTC-1","Layer 1"
151669,"AAACAAGTATCTCCCA-1","Layer 4"
151669,"AAACACCAATAACTGC-1","Layer 6"

contain multiple samples.

How to download fastq/bam data from Globus

Dear spatialLIBD team,

Thanks for sharing the awesome resources.

I'd like to download the fastq/bam files from http://research.libd.org/globus/

I log in to the platform but the download button is gray.


Then I click on Transfer or Sync to ... and the Start button, but it still doesn't work. Am I missing anything? Any comments are highly appreciated. :)


Looking for "Ground Truth"

Hello, I am currently exploring the dataset for a project that I am doing, but I can't seem to find comprehensive documentation on the available data. I am specifically looking for the ground truth, but I am not quite sure if I can use layer_guess.

Any help would be appreciated.

Thanks!

Spatially-registered DLPFC snRNA-seq data from paper?

Hello! Thank you for your great work on this package. I found it very easy to use and your documentation is excellent. I have a question regarding the data availability from your paper.
In your paper (Maynard et al 2021 Nat. Neurosci), you spatially registered three independent snRNA-seq datasets, and one of these datasets was generated by your research team. I am wondering whether this annotated snRNA-seq dataset is publicly available. (From my understanding, the raw data provided at the Globus link is Visium data only, and the data which is available for download through the spatialLIBD package is also Visium-based.) Please let me know if there is any way I can access this data. Thank you!

Error in `collect()`: ! Failed to collect lazy table.

Dear spatialLIBD team,
I have completed the installation of spatialLIBD, but when executing spe <- fetch_data(type = "spe") I get the error shown in the issue title.

I would like to understand the cause of this error.

Provide more outputs from the cellranger output

Hello,

I was wondering if it would be possible to also upload aligned_fiducials.jpg, scalefactors_json.json, and tissue_positions_list.csv from the spatial folder of the cellranger output?

Thanks,
Chang

naming of columns 5 and 6

bcs[[i]] <- read.csv(
    tissue_paths[i],
    col.names = c("barcode", "tissue", "row", "col", "imagerow", "imagecol"),
    header = FALSE
)

In Lukas' code, column 5 of that table is treated as the column coordinate and column 6 as the row coordinate, whereas your code does the reverse.

We need to decide which is correct.

Image quality

There are quality problems with the pathology images: noise that should not be present is visible when the original image is enlarged. Are there higher-quality pathology images available for download? Or ISH images?
