bodegalab / irescue
Uncertainty-aware quantification of Transposable Elements expression in scRNA-seq
License: MIT License
The program should raise informative errors when something in the input files is wrong, instead of crashing with uninformative messages such as those dealt with in #1.
Keeping here a list of errors/exceptions to implement:
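As an illustration of the kind of check meant here, the following is a minimal sketch, not IRescue's actual code. The names `InputError`, `read_whitelist` and `check_bed_line` are hypothetical, and the rules (a non-empty whitelist; BED records with at least three tab-separated columns and an integer start smaller than end) are assumptions about what "wrong input" could mean.

```python
import gzip
from pathlib import Path


class InputError(ValueError):
    """Hypothetical exception for a missing, empty or malformed input file."""


def read_whitelist(path):
    """Return the barcodes listed one per line in a plain or gzipped text file."""
    p = Path(path)
    if not p.is_file():
        raise InputError(f"Whitelist file not found: {p}")
    # Pick the opener from the file extension (assumption: .gz means gzip).
    opener = gzip.open if p.suffix == ".gz" else open
    with opener(p, "rt") as handle:
        barcodes = [line.strip() for line in handle if line.strip()]
    if not barcodes:
        raise InputError(f"Whitelist file is empty: {p}")
    return barcodes


def check_bed_line(line, lineno):
    """Validate one BED record and return (chrom, start, end)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise InputError(
            f"BED line {lineno}: expected at least 3 columns, got {len(fields)}"
        )
    try:
        start, end = int(fields[1]), int(fields[2])
    except ValueError:
        raise InputError(f"BED line {lineno}: start/end are not integers") from None
    if start < 0 or end <= start:
        raise InputError(f"BED line {lineno}: invalid interval {start}-{end}")
    return fields[0], start, end
```

Failing early with the file name and line number in the message lets the user see which input to fix, rather than a traceback from deep inside the counting step.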
Hi,
Thanks for the nice work on bioRxiv; I hope it will be accepted in a good journal.
Now I want to use it to quantify my single-cell RNA-seq data.
An error occurred when I ran the command
nohup ~/anaconda3/envs/irescue/bin/irescue -b possorted_genome_bam.bam -p 8 -r /public1/home/sc60481/Axolotl/sc-RNA/03.deal.TE/All.TE.deal.bed -w ./filtered_feature_bc_matrix/barcodes.tsv.gz &
I am not sure what caused the error.
I hope you can reply and help.
Thanks for your time and work.
I am trying to run IRescue on 10X samples that were aligned using STARSolo and I am getting an error I do not understand. I was wondering if you could help me.
My submission script is:
#!/bin/bash -l
#SBATCH --job-name=IRescue
#SBATCH --account=tcmartinez
#SBATCH --partition=tier2q
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --time=48:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=64gb
#SBATCH --output=/gpfs/data/mcnerney-lab/Tanner/TCM230/ir.out
#SBATCH --error=/gpfs/data/mcnerney-lab/Tanner/TCM230/ir.err
module load gcc/12.1.0
module load python/3.10.5
module load samtools/1.18
module load bedtools/2.30.0
irescue -b /gpfs/data/mcnerney-lab/Tanner/TCM230/STARSolo/Aligned.sortedByCoord.out.bam \
-g mm10 \
-p 8 \
-w /gpfs/data/mcnerney-lab/Tanner/TCM230/STARSolo/whitelist/Anames.tsv
And the error message I am getting is:
[01/15/2024 - 13:11:21] IRescue job starts
[01/15/2024 - 13:11:21] Found CB and UR tags occurrence in bam's line 1.
[01/15/2024 - 13:11:21] Downloading and parsing RepeatMasker annotation for assembly mm10 from https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/initial/mm10.fa.out.gz ...
[01/15/2024 - 13:12:13] WARNING: The following references contain read alignments but are not found in the TE annotation and will be skipped: chr4_JH584295_random, chrM
[01/15/2024 - 13:14:47] Writing mapped barcodes to ./IRescue_out//barcodes.tsv.gz
[01/15/2024 - 13:14:47] Writing mapped features to ./IRescue_out//features.tsv.gz
Traceback (most recent call last):
  File "/apps/software/gcc-12.1.0/python/3.10.5/bin/irescue", line 8, in <module>
    sys.exit(main())
  File "/apps/software/gcc-12.1.0/python/3.10.5/lib/python3.10/site-packages/irescue/main.py", line 101, in main
    bc_per_thread = list(split_bc(barcodes_file, args.threads))
  File "/apps/software/gcc-12.1.0/python/3.10.5/lib/python3.10/site-packages/irescue/count.py", line 133, in split_bc
    for chunk in split_int(bclen, n):
  File "/apps/software/gcc-12.1.0/python/3.10.5/lib/python3.10/site-packages/irescue/count.py", line 119, in split_int
    for i in range(0, num, split):
ValueError: range() arg 3 must not be zero
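The last frame shows `range(0, num, split)` being called with a step of zero. The traceback does not show how `split` is computed. If it comes from integer division of the barcode count by the thread count (an assumption), it becomes zero whenever fewer barcodes than threads are retained, for example when none of the whitelist entries match the CB tags in the BAM. The sketch below is a hypothetical rewrite of such a helper, not IRescue's code, showing one way to guard against that case.

```python
def split_int(num, n):
    """Yield (start, end) index pairs that split range(num) into at most n chunks."""
    if num <= 0:
        # Fail with an explicit message instead of a zero step in range().
        raise ValueError("nothing to split: zero barcodes were retained")
    # Ceiling division keeps the step at 1 or more, even when num < n.
    step = -(-num // max(n, 1))
    for start in range(0, num, step):
        yield start, min(start + step, num)
```

With 3 barcodes and 8 threads this yields three chunks of one barcode each, so the surplus threads simply receive no work.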
Hi beboli,
It is me again. After I successfully ran irescue and got the three files (matrix.mtx.gz, features.tsv.gz and barcodes.tsv.gz) for each time point, I ran the following commands to add the TE assay alongside the RNA assay.
dpa0.data <- Read10X(data.dir = "/public1/home/sc60481/Axolotl/sc-RNA/dpa0/outs/filtered_feature_bc_matrix")
dpa0 <- CreateSeuratObject(counts = dpa0.data, project = "dpa0", min.cells = 3, min.features = 100)
dpa0.te.data <- Seurat::Read10X('./dpa0/outs/IRescue_out/', gene.column = 1, cell.column = 1)
te.assay <- Seurat::CreateAssayObject(dpa0.te.data)
te.assay <- subset(te.assay, colnames(te.assay)[which(colnames(te.assay) %in% colnames(dpa0))])
dpa0[['TE']] <- te.assay
Because the scRNA-seq data had already been analyzed and integrated with cell type annotations before I ran irescue, I found that the TE assay of each stage could not be added to the existing Seurat object.
I then re-ran each stage following the commands above, merged all seven stages with Harmony, and ran normalization, scaling and FindClusters on the merged object.
Since the species I work with has 48 TE subfamilies, the TE matrix is 48 subfamilies × N cells.
Is that right? I do not understand why the matrix is not individual TEs × N cells.
My second point of confusion is that when I run FindClusters with resolution <1.0 I get only 3 clusters, while with resolution >1.0 (I have tried 1.0001) the number of clusters increases to ~9000.
I think I must have made a mistake somewhere. I hope you can help me.
Thank you very much.
Xiangyu