Recovering significant 10xGenomics single-cell RNA-seq signal through improved annotation with Oxford Nanopore (ONT) bulk long reads. Applied to chick embryo neuro-epithelial progenitors at 66h of development.
GitHub repository started on 2021-04-02.
- data - Data used for the analyses (e.g. BAM, RDS, tsv files, ...)
- docs - Rendered analysis reports (as html files) and figures generated by the notebook
- env - Code used to create the conda environment for this study (e.g. bash script) and the corresponding YAML file
- figs - Figures created before the notebook creation and used as input
- rmd - R Markdown analysis files
- src - Reusable code (e.g. functions)
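Before running anything, it can help to confirm that a checkout has the folders listed above. The following is a minimal sketch, not part of the repository: the function name check_layout is hypothetical, and it assumes the six folders sit directly under the repository root.

```shell
#!/usr/bin/env bash
# check_layout is a hypothetical helper, not shipped with this repository.
# It verifies that the top-level folders named in this README exist under
# the given root (default: current directory).
check_layout() {
  local root="${1:-.}"
  local rc=0
  local d
  for d in data docs env figs rmd src; do
    if [ ! -d "$root/$d" ]; then
      # Report each missing folder on stderr and remember the failure
      echo "missing directory: $d" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Calling check_layout from the repository root returns 0 when all six folders are present and 1 otherwise, printing the name of each missing folder.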
Files found at the root of the repository are of general purpose:
- _bookdown.yml - Configuration YAML file for the notebook
- _build.log - Log file (stdout and stderr) generated when running the analysis
- _build.sh - Bash script to run the analysis
- _deploy.sh - Bash script to deploy the analysis on GitHub
- index.Rmd - R Markdown file used for the setup of the analysis (load libraries, define variables, paths...)
- README.md - This file
- _workflowr.yml - Workflowr configuration file
Current input data files include:
- notebook # javascript code for bookdown output (do not touch)
- processed # output generated when running the notebook
- raw # raw data (input)
- references
  - annotations
    - ensembl
    - ncbi
    - ucsc
- rna-seq
  - single-cell
  - long-read
Current notebook files include:
01-Impact-ref-annotation-scRNA.Rmd
Here we explore the discrepancies between the reference annotations (Ensembl and RefSeq) and their impact on common scRNA-seq analyses.
02-Incomplete-annotations-induce-signal-loss.Rmd
We study the loss of scRNA-seq signal (e.g. genes) caused by significant deficiencies in the reference annotation, especially in the 3' UTR annotations.
03-Approaches-to-improve-transcriptome-with-Long-Reads.Rmd
We compare several approaches and apply them to scRNA-seq data and ONT bulk long reads: tools dedicated to transcriptome reconstruction from bulk RNA-seq (StringTie2, Scallop), a dedicated signal detection approach, and broad 3' UTR extension.
04-Impact-reannotation-scRNA.Rmd
We assess the impact of our various reannotations on common scRNA-seq analyses.
05-Validation-of-novel-genes-with-scRNA.Rmd
We evaluate the ability of our approach to identify novel genes and use scRNA-seq analyses as a filter to highlight genes of biological interest in chick embryo neuro-epithelial progenitors.
06-A-tool-and-pipeline-to-improve-annotation-for-scRNA.Rmd
Description of our pipeline and recommendations for using it on other scRNA-seq data.
07-Session-info.Rmd
Session info output.
Current code files include:
- analysis
- pipeline
- preprocessing
- utilities
To run the notebook and create the corresponding html files, you have two options:
- In RStudio, click the knit button (you may need to change the knit directory)
- In a Linux terminal, run the script _build.sh with the command:
bash _build.sh
The output will be stored in the docs folder.
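After a build, a quick way to confirm that reports were produced is to count the html files in the docs folder. This is a minimal sketch: the function name count_reports is hypothetical, and it assumes the html files are written directly under docs rather than in subfolders.

```shell
#!/usr/bin/env bash
# count_reports is a hypothetical helper, not shipped with this repository.
# It prints the number of .html files found directly inside the given
# folder (default: docs).
count_reports() {
  local docs_dir="${1:-docs}"
  local n=0
  local f
  for f in "$docs_dir"/*.html; do
    # When nothing matches, the unexpanded pattern fails the -f test
    if [ -f "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

From the repository root, running count_reports docs after bash _build.sh prints the number of rendered reports; a result of 0 suggests the build did not write any html output to that folder.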