bodegalab / irescue

Uncertainty-aware quantification of Transposable Elements expression in scRNA-seq

License: MIT License

Python 100.00%

bioinformatics scrna-seq scrnaseq single-cell single-cell-rna-seq transposable-elements

irescue's People

bepoli

irescue's Issues

Better error handling

The program should raise informative errors when something in the input files is wrong, instead of crashing with uninformative messages such as those dealt with in #1.
Keeping here a list of errors/exceptions to implement:

- requirement not met (error if not present, warning if the version is not supported)
- cell barcode and UMI tags not found in the BAM
  - might check the header if STARsolo attributes are not included (https://github.com//issues/1#issuecomment-1431680110)
- BAM reference names not matching the BED
- couldn't download the annotation data for a genome assembly (for any reason)
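
Two of the checks listed above could be sketched as follows. This is only an illustration under stated assumptions, not IRescue's code: `InputError`, `check_requirement` and `check_references` are hypothetical names, and the reference names are assumed to have already been read from the BAM header and the BED file. Version checks and tag detection are not covered.

```python
import shutil


class InputError(Exception):
    """Hypothetical exception type for problems with user-supplied inputs."""


def check_requirement(tool):
    """Return the path of an external tool, or raise if it is not on PATH."""
    path = shutil.which(tool)
    if path is None:
        raise InputError(f"Required tool '{tool}' was not found on PATH.")
    return path


def check_references(bam_refs, bed_refs):
    """Compare BAM reference names with the chromosome names of a BED file.

    Raises InputError if the two share no names (e.g. 'chr1' vs '1');
    otherwise returns the sorted BAM references absent from the BED, so the
    caller can emit a warning instead of failing.
    """
    bam_refs, bed_refs = set(bam_refs), set(bed_refs)
    if not bam_refs & bed_refs:
        raise InputError(
            "BAM reference names do not match the BED annotation "
            f"(BAM e.g. {sorted(bam_refs)[:3]}, BED e.g. {sorted(bed_refs)[:3]})."
        )
    return sorted(bam_refs - bed_refs)
```

Raising a dedicated exception type lets the command-line entry point catch it and print a single readable message, while unexpected errors still surface as full tracebacks.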

ValueError: range() arg 3 must not be zero

Hi,
Thanks for the nice work on bioRxiv; I hope it will be accepted in a good journal.
Now I want to use it to quantify my single-cell RNA-seq data.
An error occurred when I ran the command
nohup ~/anaconda3/envs/irescue/bin/irescue -b possorted_genome_bam.bam -p 8 -r /public1/home/sc60481/Axolotl/sc-RNA/03.deal.TE/All.TE.deal.bed -w ./filtered_feature_bc_matrix/barcodes.tsv.gz &
and I am not sure what caused it.
I hope for your reply and help.

Thanks for your time and work.

IRescue error: Traceback (most recent call last): File "/apps/software/gcc-12.1.0/python/3.10.5/bin/irescue", line 8, in <module> sys.exit(main())

I am trying to run IRescue on 10X samples that were aligned using STARsolo, and I am getting an error I do not understand. Could you help me?

My submission script is:
#!/bin/bash -l
#SBATCH --job-name=IRescue
#SBATCH --account=tcmartinez
#SBATCH --partition=tier2q
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --time=48:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=64gb
#SBATCH --output=/gpfs/data/mcnerney-lab/Tanner/TCM230/ir.out
#SBATCH --error=/gpfs/data/mcnerney-lab/Tanner/TCM230/ir.err

module load gcc/12.1.0
module load python/3.10.5
module load samtools/1.18
module load bedtools/2.30.0

irescue -b /gpfs/data/mcnerney-lab/Tanner/TCM230/STARSolo/Aligned.sortedByCoord.out.bam \
-g mm10 \
-p 8 \
-w /gpfs/data/mcnerney-lab/Tanner/TCM230/STARSolo/whitelist/Anames.tsv

And the error message I am getting is:

[01/15/2024 - 13:11:21] IRescue job starts
[01/15/2024 - 13:11:21] Found CB and UR tags occurrence in bam's line 1.
[01/15/2024 - 13:11:21] Downloading and parsing RepeatMasker annotation for assembly mm10 from https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/initial/mm10.fa.out.gz ...
[01/15/2024 - 13:12:13] WARNING: The following references contain read alignments but are not found in the TE annotation and will be skipped: chr4_JH584295_random, chrM
[01/15/2024 - 13:14:47] Writing mapped barcodes to ./IRescue_out//barcodes.tsv.gz
[01/15/2024 - 13:14:47] Writing mapped features to ./IRescue_out//features.tsv.gz
Traceback (most recent call last):
  File "/apps/software/gcc-12.1.0/python/3.10.5/bin/irescue", line 8, in <module>
    sys.exit(main())
  File "/apps/software/gcc-12.1.0/python/3.10.5/lib/python3.10/site-packages/irescue/main.py", line 101, in main
    bc_per_thread = list(split_bc(barcodes_file, args.threads))
  File "/apps/software/gcc-12.1.0/python/3.10.5/lib/python3.10/site-packages/irescue/count.py", line 133, in split_bc
    for chunk in split_int(bclen, n):
  File "/apps/software/gcc-12.1.0/python/3.10.5/lib/python3.10/site-packages/irescue/count.py", line 119, in split_int
    for i in range(0, num, split):
ValueError: range() arg 3 must not be zero
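
The traceback ends in `range(0, num, split)` with a step of zero. One way this can happen is when there are fewer mapped barcodes than threads (possibly none at all). That reading assumes the chunk size comes from an integer division of the barcode count by the thread count, which is an assumption, since the body of `split_int` is not shown here. A minimal sketch of a chunking helper that guards against that case, using the hypothetical name `split_evenly`:

```python
def split_evenly(num, n):
    """Yield (start, end) index pairs splitting range(num) into at most n chunks.

    Hypothetical helper, not IRescue's split_int. It never asks for more
    chunks than there are items, so no chunk is empty, and it fails with a
    readable message when there is nothing to split.
    """
    if num <= 0:
        raise ValueError(
            "No barcodes to process: check that the whitelist matches the "
            "barcodes found in the BAM file."
        )
    if n <= 0:
        raise ValueError("The number of threads must be positive.")
    n = min(n, num)  # avoid zero-sized chunks when num < n
    size, remainder = divmod(num, n)
    start = 0
    for i in range(n):
        # The first `remainder` chunks take one extra item each.
        end = start + size + (1 if i < remainder else 0)
        yield start, end
        start = end
```

With 3 barcodes and 8 threads this yields three single-item chunks instead of raising, and with 0 barcodes it reports the likely cause directly.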

Confusing number of clusters from the TE matrix

Hi bepoli,
It's me again. I successfully ran IRescue and got the three files (matrix.mtx.gz, features.tsv.gz and barcodes.tsv.gz) for each time point. I then ran the following commands to add the TE assay to the RNA assay.
dpa0.data <- Read10X(data.dir = "/public1/home/sc60481/Axolotl/sc-RNA/dpa0/outs/filtered_feature_bc_matrix")
dpa0 <- CreateSeuratObject(counts = dpa0.data, project = "dpa0", min.cells = 3, min.features = 100)
dpa0.te.data <- Seurat::Read10X('./dpa0/outs/IRescue_out/', gene.column = 1, cell.column = 1)
te.assay <- Seurat::CreateAssayObject(dpa0.te.data)
te.assay <- subset(te.assay, colnames(te.assay)[which(colnames(te.assay) %in% colnames(dpa0))])
dpa0[['TE']] <- te.assay

As the scRNA-seq data had already been analyzed and integrated with cell type annotations before I ran IRescue, I found that the TE assay of each stage could not be added to the previous Seurat object.
So I re-ran each stage following the commands above, merged all seven stages with Harmony, and ran normalization, scaling and FindClusters on the merged object.

As the species I used has 48 TE subfamilies, the TE matrix is 48 subfamilies × N cells.

Is that right? I do not understand why the TE matrix is subfamilies × N cells rather than individual TEs × N cells.
My second point of confusion: when I run FindClusters with resolution < 1.0, I get only 3 clusters, while with resolution > 1.0 (I tried 1.0001) the number of clusters increases to ~9000.
I think I must have made a mistake somewhere. I hope you can help me.
Thank you very much.
Xiangyu
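
The matrix dimensions described above can be checked independently of Seurat by reading the size line of the output file. The sketch below assumes the output follows the usual 10x-style Matrix Market layout (features as rows, cells as columns), which is consistent with loading it through Read10X; `read_mtx_shape` is a hypothetical helper name, and the example path is taken from the commands in the post.

```python
import gzip


def read_mtx_shape(path):
    """Return (n_rows, n_cols, n_entries) from a gzipped Matrix Market file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip the '%%MatrixMarket' banner, '%' comments and blank lines;
            # the first remaining line holds the matrix dimensions.
            if not line.strip() or line.startswith("%"):
                continue
            rows, cols, entries = (int(x) for x in line.split()[:3])
            return rows, cols, entries
    raise ValueError(f"No size line found in {path}")


# Example use (path as in the post above):
# n_features, n_cells, n_nonzero = read_mtx_shape("./dpa0/outs/IRescue_out/matrix.mtx.gz")
```

If the first number equals the number of lines in features.tsv.gz (48 in the case described), the matrix holds one row per feature listed there.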
