bioinfo-biols / ciriquant

circular RNA quantification tools

Home Page: https://sourceforge.net/projects/ciri/files/CIRIquant

License: MIT License

Python 46.49% Perl 51.31% R 1.88% Shell 0.11% Makefile 0.21%

ciriquant's Issues

chr1.fa.bwt, not found

I downloaded test_data.tar.gz, ran your example command, and got this error:

File: ./chr1.fa.bwt, not found

About CircAtlas sequence length

When using CIRIquant, we are advised to use CircAtlas to obtain files such as GTF annotations and reference genomes.
Are the circRNA sequences in CircAtlas complete or partial?
How is it possible that some sequences have no more than 40 nucleotides? Are these sequences reliable? Do they really come from real circRNAs? And why are there no circRNA sequences longer than about 2k (2000) nucleotides?
Have a nice week

More Precise Definition of Options

I was reading the user guide and I saw that --tool is described as "User provided tool name for circRNA prediction" and I was wondering if this was any name the user would like to use to refer to the tool or if there are a limited set of valid values. I found the answer to my uncertainty by reading the source code and noticing:

TOOLS = ['CIRI2']
if tool not in TOOLS:

This information should be in the user guide, and any other parameter with a restricted set of allowed values should have all of them listed there as well. I also wonder whether there could be a vignette showing the use of CIRI2, CIRIquant and CIRI-vis, to give users a full end-to-end workflow. For example, maftools has a nice vignette showing all of its major features using a built-in AML data set.
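
As an illustration of the request above, here is a minimal sketch of how a command-line option with a restricted value set can document itself. It uses Python's standard argparse module; the parser, the function name `build_parser`, and the list `ALLOWED_TOOLS` are hypothetical and are not taken from the CIRIquant source, which the issue quotes only in part.

```python
import argparse

# Hypothetical list for illustration only; the issue text shows just 'CIRI2'.
ALLOWED_TOOLS = ["CIRI2"]


def build_parser():
    """Build a parser whose --tool option lists its allowed values in --help."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "--tool",
        choices=ALLOWED_TOOLS,
        default="CIRI2",
        # %(choices)s is expanded by argparse into the allowed values.
        help="circRNA prediction tool, one of: %(choices)s",
    )
    return parser


parser = build_parser()
args = parser.parse_args(["--tool", "CIRI2"])
print(args.tool)
print(parser.format_help())
```

With `choices`, an unsupported value is rejected by the parser itself and the allowed values appear in the help text, so the documentation and the validation cannot drift apart.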

Alignment does not end

Hi, thank you for developing this cool tool!
I am running ~50 samples at the same time. Most of them finished in a few hours, but some were still aligning reads after three days.
Some of these slow samples did finish, but only after three days.

I am using paired-end FASTQ files of about 3~5 Gb each.
I would be happy to hear any suggestions for resolving this.

Why are so many circRNAs missing from the output?

Dear author,
I provided a BED file containing 135 circRNAs from DCC, but got only 8 in the final GTF, all of them on Chr1.
In addition, there were 127 circRNAs in {sample_nm}_index.fa but only 47 in {sample_nm}_denovo.sorted.bam.

So why are so many circRNAs lost at each step?

Thanks a lot.

duplicate rows R error in CIRI_DE_replicate

Hi Jyniang,
The program is now working smoothly under python 2.7. Thanks again!
I'm just encountering an R error (duplicated rows) in the very final step (CIRI_DE_replicate):
###################################################
[Fri 2021-03-12 17:26:31] [INFO ] Library information: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/03-CIRI_DE_replicate/NvsC/NvsC-library_info.csv
[Fri 2021-03-12 17:26:31] [INFO ] circRNA expression matrix: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/03-CIRI_DE_replicate/NvsC/NvsC-circRNA_bsj.csv
[Fri 2021-03-12 17:26:31] [INFO ] gene expression matrix: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/03-CIRI_DE_replicate/NvsC/gene_count_matrix.csv
[Fri 2021-03-12 17:26:31] [INFO ] Output DE results: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/03-CIRI_DE_replicate/NvsC/NvsC-circRNA_de.tsv
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
Calls: read.csv -> read.table
In addition: Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
Execution halted
[Fri 2021-03-12 17:26:31] [INFO ] Finished!
###################################################

The weird thing is that I couldn't find any duplicated rows in either the circRNA_bsj.csv or the gene_count_matrix.csv file.
Any clue on how to solve this?

Thanks very much,
Best,
Elton
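
The warning "EOF within quoted string" in the log above hints that one of the CSV files may contain an unmatched double quote. When that happens, R's reader can merge several lines into one field, which is one possible source of a duplicate 'row.names' error even when no row is visibly duplicated. The sketch below checks both conditions on raw text. It is only a diagnostic idea, not part of CIRIquant; the function names and the sample data are made up.

```python
from collections import Counter


def duplicated_row_names(text):
    """Return first-column values that appear more than once (header skipped)."""
    lines = [line for line in text.splitlines()[1:] if line.strip()]
    counts = Counter(line.split(",", 1)[0] for line in lines)
    return sorted(name for name, n in counts.items() if n > 1)


def lines_with_unbalanced_quotes(text):
    """Return 1-based line numbers holding an odd number of double quotes."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.count('"') % 2 == 1
    ]


# Made-up example data: one repeated ID and one stray quote.
sample = 'gene_id,S1,S2\ngeneA,1,2\ngeneB,3,4\ngeneA,5,6\n"geneC,7,8\n'
print(duplicated_row_names(sample))
print(lines_with_unbalanced_quotes(sample))
```

To use it on a real file, read the file content with `open(path).read()` and pass the text to both functions.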

Small gtf file issue

I reran a dataset I had previously processed with CIRIquant and am getting a smaller output GTF file than in my previous run. All parameters and input files are the same. I'm wondering what the issue could be...

Automatically Delete Intermediate Files

total 21G
-rw-r----- 1 dario biostat 5.5M Apr 11 17:32 184136.sorted.bam.bai
-rw-r----- 1 dario biostat 7.8G Apr 11 17:32 184136.sorted.bam
-rw-r----- 1 dario biostat  14G Apr 11 17:26 184136.bam

The unsorted BAM file is quite big. Could the pipeline automatically delete such intermediate files to save disk space?
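
Until such an option exists, a small cleanup step can be run after the pipeline. The following is a sketch only, assuming the file naming visible in the listing above (`<prefix>.bam`, `<prefix>.sorted.bam`, `<prefix>.sorted.bam.bai`). The helper `remove_unsorted_bam` is hypothetical and not part of CIRIquant.

```python
import os


def remove_unsorted_bam(prefix, dry_run=True):
    """Delete <prefix>.bam only when the sorted BAM and its index both exist.

    Returns True when the unsorted file is (or, in a dry run, would be) removed.
    """
    unsorted_bam = prefix + ".bam"
    sorted_bam = prefix + ".sorted.bam"
    index = sorted_bam + ".bai"
    # Require non-empty sorted output before touching anything.
    ready = all(
        os.path.isfile(path) and os.path.getsize(path) > 0
        for path in (sorted_bam, index)
    )
    if not (ready and os.path.isfile(unsorted_bam)):
        return False
    if not dry_run:
        os.remove(unsorted_bam)
    return True
```

Calling `remove_unsorted_bam("184136")` reports whether the file could be deleted without deleting it; pass `dry_run=False` to actually remove it.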

Issue with using find_circ prediction results for CIRIquant

I have predicted circRNAs in my samples using find_circ and am using the CIRIquant command to requantify the predicted circRNAs. This has been mostly successful so far; however, lately I have been running into a particular error for certain sample files that looks like this:

[Tue 2021-06-15 06:03:18] [INFO ] De novo alignment for circular RNAs ..
Error reading _rstarts[] array: 592600, 596640
Error: Encountered internal HISAT2 exception (#1)
Command: /gpfs/ycga/project/ycga/ygc/pio2/conda_envs/ciriquant/bin/hisat2-align-l --wrapper basic-0 -p 20 --dta -q -x /home/pio2/scratch60/2021_sharma/data/tmp/GC_tmp/CIRIquant_findcirc_quant/circ/103658-001-165_find_circ_index --read-lengths 151,150,136,141,138,133,143,135,139,137,140,131,127,122,134,125,142,132,128,120,116,126,130,119,123,118,117,114,113,111,129,112,108,124,121,106,109,110,149,115,105,107,104,102,100,101,103,95,81,97,90,71,92,99,98,96,94,88,89,87,61,63,31,62,58,56,47,46,41,32,86,80,68,35,147,144,93,84,82,69,67,66,51,45,39,38,34,33,148,91,79,77,54,53,30,145,85,75,57,37,36,83,76,74,59,52,43,40,70,65,64,60,49,42,146,78,73,72,50,48,44,55 -1 /tmp/6579.inpipe1 -2 /tmp/6579.inpipe2
(ERR): hisat2-align exited with value 1

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe
[Tue 2021-06-15 06:03:24] [INFO ] Detecting reads containing Back-splicing signals

This is the command I am running:

CIRIquant -t 20 -1 103658-001-165_merged_R1_trimmed_paired.fastq.gz -2 103658-001-165_merged_R2_trimmed_paired.fastq.gz --config sharma_ciriquant_config.yml -o CIRIquant_findcirc_quant -p 103658-001-165_find_circ --circ findcirc_final_output/103658-001-165_CIRCULAR_splice_sites.bed -e CIRIquant_findcirc_quant/logs/103658-001-165_find_circ.log -l 2 --tool find_circ --bam hisat_bam_files/103658-001-165_hisat_align.bam

Do you have any insights into why this error is occurring and what I may do to address it? I can confirm that I was able to run find_circ on this data set (to generate the CIRCULAR_splice_sites.bed file) without error.

Thank you!

quantification and differential expression analysis for RNase R treated samples

Hi kevin,
I'm not very clear about the steps of "Generate RNase R effect corrected BSJ information". Should I run CIRIquant on the RNase R treated samples twice, adding the --RNaseR option with the output GTF of the first run on the second run? Moreover, how can I do the differential expression analysis for RNase R treated samples? I'd appreciate it if you could give me some advice. Thanks!

Very low numbers for circular RNAs

Hey,

I am working on a dataset from an experiment that is specifically designed to induce circRNAs. But when I look in 'library_info.csv', the number of circular RNA reads is very small.

Sample,Total,Mapped,Circular,Group,Subject
S1,55730760,51498102,5530,T,1
S2,66246882,61594648,6730,T,2
S3,40042314,37079726,3940,T,3
Cl1,57359160,53514152,8206,T,1
Cl2,58229712,53897216,7804,T,2
Cl3,45110986,41606718,5644,T,3

If I look in the 'gene_count_matrix', I get numbers like:
head gene_count_matrix.csv
gene_id,S1,S2,S3,Cl1,Cl2,Cl3
ENSG00000132680|KHDC4,7883,9307,5896,2073,2154,1356
ENSG00000145041|DCAF1,3921,4567,2998,11977,12348,8927

I am not sure what I am misunderstanding here. It seems that I am losing the circular RNAs at some stage, but I am not sure where.
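
To put the numbers above in proportion, the sketch below computes the share of mapped reads that are counted as circular for each sample, using the table exactly as posted. It assumes that the Circular column counts back-splice junction reads, which the thread does not state explicitly. With these values the share is on the order of 0.01% per sample.

```python
import csv
import io

# Table copied from the issue text above.
LIBRARY_INFO = """Sample,Total,Mapped,Circular,Group,Subject
S1,55730760,51498102,5530,T,1
S2,66246882,61594648,6730,T,2
S3,40042314,37079726,3940,T,3
Cl1,57359160,53514152,8206,T,1
Cl2,58229712,53897216,7804,T,2
Cl3,45110986,41606718,5644,T,3
"""


def circular_fraction(text):
    """Map each sample name to Circular / Mapped."""
    rows = csv.DictReader(io.StringIO(text))
    return {row["Sample"]: int(row["Circular"]) / int(row["Mapped"]) for row in rows}


fractions = circular_fraction(LIBRARY_INFO)
for sample, fraction in fractions.items():
    print(f"{sample}: {fraction:.4%} of mapped reads counted as circular")
```

Note that the rows of gene_count_matrix.csv are per-gene counts of linear reads, so they are not directly comparable with the per-sample Circular totals.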

Cannot split into 12 (16) pieces with size of 6246233965 and named them as /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-noD-R_unmapped.sam. Fatal error. Aborted.

Hi
I encounter this error when running
CIRIquant -t 4 -1 output/fastqs/APO-1-R_1.Tr.fq.gz -2 output/fastqs/APO-1-R_2.Tr.fq.gz --config ciriconfig_full.yaml --library-type 2 -o output/ -p APO-1-R -t 12

log:

[Mon May 30 18:19:05 2022] Loading reference
[Mon May 30 18:19:39 2022] Requesting system to split SAM into 12 pieces
[Mon May 30 18:19:05 2022] CIRI begins running
[Mon May 30 18:19:05 2022] Loading reference
[Mon May 30 18:19:39 2022] Requesting system to split SAM into 12 pieces
Divided SAM sizes:
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samaa	     6136420186
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samab	     6136420186
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samac	     6136420186
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samad	     6136420186
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samae	     6136420186
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samaf	     6136420186
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samag	     6136420186
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samah	     6136420186
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samai	     6136420186
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samaj	     6136420186
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samak	     6136420186
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samal	     6136420184
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samam	     4602315006
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.saman	     4602315423
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samao	     4602315554
/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.samap	     4602314573
Cannot split /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.sam into 12 (16) pieces with size of 6136420186 and named them as /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/HuR-1-R_unmapped.sam.
Fatal error. Aborted.

Please help, thanks. I am pretty sure I have write access to the output dir.
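
The log lists 16 pieces (suffixes aa to ap) although 12 were requested, and the command passes -t twice. One possible explanation, which this thread does not confirm, is that pieces from an earlier run with a different thread count were still present in the output directory. The sketch below simply counts files that look like split pieces next to a SAM file; the function names are hypothetical.

```python
import glob


def split_pieces(sam_path):
    """List files named <sam_path> plus a two-letter suffix (aa, ab, ...)."""
    # glob.escape protects special characters that may occur in the path.
    return sorted(glob.glob(glob.escape(sam_path) + "[a-z][a-z]"))


def check_pieces(sam_path, threads):
    """Return (number of pieces found, whether it equals the thread count)."""
    pieces = split_pieces(sam_path)
    return len(pieces), len(pieces) == threads
```

For example, `check_pieces("output/circ/APO-1-R_unmapped.sam", 12)` returning `(16, False)` before a run starts would indicate stale pieces that may be worth removing first.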

DE with Biological Replicates Step 3 Error

Hi,

I've been working on a set of samples using the differential expression analysis for biological replicates. All steps work fine until I get to using CIRI_DE_replicate, where I get the following error:

[Wed 2020-05-20 17:01:42] [INFO ] Library information: /pylon5/mc5frap/mvanhorn/CD11b_library_info.csv
[Wed 2020-05-20 17:01:42] [INFO ] circRNA expression matrix: /pylon5/mc5frap/mvanhorn/CD11b_circRNA_bsj.csv
[Wed 2020-05-20 17:01:42] [INFO ] gene expression matrix: /pylon5/mc5frap/mvanhorn/gene_count_matrix.csv
[Wed 2020-05-20 17:01:42] [INFO ] Output DE results: /pylon5/mc5frap/mvanhorn/CD11b_circRNA_de.tsv
Error in `[.data.frame`(gene_mtx, , rownames(lib_mtx)) : 
  undefined columns selected
Calls: [ -> [.data.frame
Execution halted
[Wed 2020-05-20 17:01:44] [INFO ] Finished!

There is no .tsv file produced after this runs. I have checked the .csv input files and there don't seem to be any issues with them. I am running the pipeline on a supercomputer, not my local machine, but have made sure that the R environment module is loaded and includes edgeR. Any suggestions that you have on what might be causing this error and how to fix it would be greatly appreciated!
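
The R error above comes from selecting columns of the gene matrix by the row names of the library information, so at least one sample name in the library file has no identically named column in gene_count_matrix.csv. The sketch below lists such names. It is a diagnostic idea only; the function name and the example data are made up, and it assumes the sample name is the first column of the library file.

```python
import csv
import io


def missing_samples(library_info_text, gene_matrix_text):
    """Return sample names from the library file absent from the matrix header."""
    library_rows = csv.reader(io.StringIO(library_info_text))
    next(library_rows, None)  # skip the header line
    samples = [row[0] for row in library_rows if row]
    header = next(csv.reader(io.StringIO(gene_matrix_text)))
    columns = set(header[1:])  # first column holds gene IDs
    return [name for name in samples if name not in columns]


# Made-up example: the second sample is spelled differently in the two files.
library = "Sample,Total,Mapped,Circular,Group,Subject\nS1,10,9,1,C,1\nS-2,10,9,1,T,2\n"
matrix = "gene_id,S1,S.2\ngene1,5,7\n"
print(missing_samples(library, matrix))
```

Differences in spelling, case, or characters such as "-" versus "." are common causes of this kind of mismatch.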

Installation error

Hi kevin,
Thanks for developing such a convenient package. I want to use it but I don't know how to install python. I'd appreciate it if you could give me some advice. Thanks!

.fa is empty, index does not exist

I ran into an error with CIRIquant. It starts with Warning: Empty fasta file: '/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_index.fa', followed by (ERR): "/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_index" does not exist. I ran with 16 threads. CIRIerror.log is empty. Please help. Thanks

Full log:

[Fri 2022-05-27 17:20:13] [INFO ] Running CIRI2 for circRNA detection ..
[Fri May 27 17:20:14 2022] CIRI begins running
[Fri May 27 17:20:14 2022] Loading reference
[Fri May 27 17:20:47 2022] Requesting system to split SAM into 16 pieces
[Fri May 27 17:20:14 2022] CIRI begins running
[Fri May 27 17:20:14 2022] Loading reference
[Fri May 27 17:20:47 2022] Requesting system to split SAM into 16 pieces
 Divided SAM sizes:
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samaa	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samab	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samac	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samad	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samae	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samaf	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samag	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samah	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samai	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samaj	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samak	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samal	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samam	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.saman	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samao	     4395350130
 /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_unmapped.samap	     4395350121
 SAM was divided successfully.
 First read of divided SAM files: 
 APO-1-R_unmapped.samab: A00475:448:HFLNKDRX2:1:2121:1081:20055
 APO-1-R_unmapped.samac: A00475:448:HFLNKDRX2:1:2141:15637:24533
 APO-1-R_unmapped.samad: A00475:448:HFLNKDRX2:1:2161:14235:34507
 APO-1-R_unmapped.samae: A00475:448:HFLNKDRX2:1:2204:3106:5431
 APO-1-R_unmapped.samaf: A00475:448:HFLNKDRX2:1:2223:16224:31125
 APO-1-R_unmapped.samag: A00475:448:HFLNKDRX2:1:2243:20509:16626
 APO-1-R_unmapped.samah: A00475:448:HFLNKDRX2:1:2262:8730:32424
 APO-1-R_unmapped.samai: A00475:448:HFLNKDRX2:2:2104:7012:25848
 APO-1-R_unmapped.samaj: A00475:448:HFLNKDRX2:2:2125:16559:12696
 APO-1-R_unmapped.samak: A00475:448:HFLNKDRX2:2:2145:27679:10316
 APO-1-R_unmapped.samal: A00475:448:HFLNKDRX2:2:2163:20998:27743
 APO-1-R_unmapped.samam: A00475:448:HFLNKDRX2:2:2204:4562:18067
 APO-1-R_unmapped.saman: A00475:448:HFLNKDRX2:2:2223:21305:14779
 APO-1-R_unmapped.samao: A00475:448:HFLNKDRX2:2:2242:14931:8782
 APO-1-R_unmapped.samap: A00475:448:HFLNKDRX2:2:2260:7943:23265
 APO-1-R_unmapped.samaa: A00475:448:HFLNKDRX2:1:2101:1018:1000
 First reads were recorded successfully.
[Fri May 27 17:39:25 2022] First scanning
[Fri May 27 17:39:25 2022] First scanning
 Worker 1 begins to scan APO-1-R_unmapped.samal.
 Worker 2 begins to scan APO-1-R_unmapped.samaj.
 Worker 3 begins to scan APO-1-R_unmapped.samam.
 Worker 4 begins to scan APO-1-R_unmapped.samad.
 Worker 5 begins to scan APO-1-R_unmapped.samag.
 Worker 6 begins to scan APO-1-R_unmapped.samaf.
 Worker 7 begins to scan APO-1-R_unmapped.samai.
 Worker 8 begins to scan APO-1-R_unmapped.samac.
 Worker 9 begins to scan APO-1-R_unmapped.samah.
 Worker 10 begins to scan APO-1-R_unmapped.samab.
 Worker 11 begins to scan APO-1-R_unmapped.samak.
 Worker 12 begins to scan APO-1-R_unmapped.samae.
[Fri 2022-05-27 17:41:16] [INFO ] Extract circular sequence
[Fri 2022-05-27 17:41:16] [INFO ] Building circular index ..
Settings:
  Output files: "/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_index.*.ht2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_index.fa
Warning: Empty fasta file: '/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_index.fa'
Warning: All fasta inputs were empty
Total time for call to driver() for forward index: 00:00:00
Error: Encountered internal HISAT2 exception (#1)
Command: hisat2-build --wrapper basic-0 -p 16 -f /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_index.fa /oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_index 
[Fri 2022-05-27 17:41:20] [INFO ] De novo alignment for circular RNAs ..
(ERR): "/oasis/tscc/scratch/hsher/circSTAMP_pipe/output/circ/APO-1-R_index" does not exist
Exiting now ...
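
As the log shows, the index build fails because the FASTA file of circular sequences is empty, and the later alignment error is only a consequence of that. A simple guard before index building would surface the real problem earlier. The sketch below counts FASTA records; the function name is hypothetical and the check is not part of CIRIquant.

```python
def count_fasta_records(path):
    """Count sequence records in a FASTA file by counting header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

If `count_fasta_records` returns 0 for the `_index.fa` file, the circRNA detection step produced no usable candidates, so that step's output is the place to look.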

change the permissions on 'CIRI2.pl' while running CIRIquant on cluster

Hello,

I'm trying to run CIRIquant on a cluster, but I get a permission error for CIRI2.pl. CIRIquant was installed using conda on the cluster.
Here is the error message.
[ml98b@ghpcc06 2_S2]$ source /share/pkg/condas/2018-05-11/bin/activate && conda activate CIRIquant_1.1.2
(CIRIquant_1.1.2) [ml98b@ghpcc06 2_S2]$ CIRIquant -t 4 -1 ./data/circRNA_fastq_files/2_S2_L001_R1_001.fastq.gz -2 ./data/circRNA_fastq_files/2_S2_L001_R2_001.fastq.gz --config ./dmel_ENS_BDGP6.yml -p 2_S2
Traceback (most recent call last):
File "/share/pkg/condas/2018-05-11/envs/CIRIquant_1.1.2/bin/CIRIquant", line 8, in
sys.exit(main())
File "/share/pkg/condas/2018-05-11/envs/CIRIquant_1.1.2/lib/python2.7/site-packages/CIRIquant/main.py", line 126, in main
os.chmod(lib_path + '/CIRI2.pl', 0o755)
OSError: [Errno 13] Permission denied: '/share/pkg/conda/2018-05-11/envs/CIRIquant_1.1.2/lib/python2.7/site-packages/libs/CIRI2.pl'
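
The traceback shows an unconditional os.chmod call on a file inside a shared, read-only environment. A more tolerant pattern is to change permissions only when the file is not already executable and to handle a refusal gracefully. The sketch below illustrates that pattern; it is not the maintainers' fix, and `ensure_executable` is a hypothetical name.

```python
import os
import stat


def ensure_executable(path):
    """Return True if path is executable, calling chmod only when needed."""
    if os.access(path, os.X_OK):
        return True  # nothing to do, so no write permission is required
    try:
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    except OSError:
        return False  # e.g. a shared installation owned by another user
    return os.access(path, os.X_OK)
```

On a shared installation, asking the administrator to make CIRI2.pl executable once is also an option.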

Not able to generate the final circ.gtf output

Dear CIRIquant developers,

After over 1 hour running CIRIquant, I'm getting stuck with following error(s):
#########################################################
[Tue 2021-03-09 03:24:07] [INFO ] Input reads: KH1W_S1318_L004_R1-trimmed_paired.fq.gz,KH1W_S1318_L004_R2-trimmed_paired.fq.gz
[Tue 2021-03-09 03:24:07] [INFO ] Library type: TAKARA SMARTer
[Tue 2021-03-09 03:24:07] [INFO ] Output directory: /nobackup/fbsev/LeedsOmics/Kathy-circRNAs/CIRIquant-run/02-CIRIquant/R_normal_1, Output prefix: R_normal_1
[Tue 2021-03-09 03:24:07] [INFO ] Config: config.yml Loaded
[Tue 2021-03-09 03:24:07] [INFO ] 40 CPU cores availble, using 40
[Tue 2021-03-09 03:24:07] [INFO ] Align RNA-seq reads to reference genome ..
[Tue 2021-03-09 04:02:06] [INFO ] Estimate gene abundance ..
[Tue 2021-03-09 04:08:02] [INFO ] No circRNA information provided, run CIRI2 for junction site prediction ..
[Tue 2021-03-09 04:08:02] [INFO ] Running BWA-mem mapping candidate reads ..
[Tue 2021-03-09 04:16:52] [INFO ] Running CIRI2 for circRNA detection ..
[Tue 2021-03-09 04:30:41] [INFO ] Extract circular sequence
^M[Tue 2021-03-09 04:30:41] [0% ] [..................................................]Traceback (most recent call last):
File "/nobackup/fbsev/bioinformatics-tools/CIRIquant/venv/bin/CIRIquant", line 33, in
sys.exit(load_entry_point('CIRIquant==1.1.1', 'console_scripts', 'CIRIquant')())
File "/nobackup/fbsev/bioinformatics-tools/CIRIquant/venv/lib/python3.7/site-packages/CIRIquant-1.1.1-py3.7.egg/CIRIquant/main.py", line 183, in main
out_file = circ.proc(log_file, thread, bed_file, hisat_bam, rnaser_file, reads, outdir, prefix, anchor, lib_type)
File "/nobackup/fbsev/bioinformatics-tools/CIRIquant/venv/lib/python3.7/site-packages/circ/__init__.py", line 644, in proc
generate_index(log_file, circ_info, circ_fasta)
File "/nobackup/fbsev/bioinformatics-tools/CIRIquant/venv/lib/python3.7/site-packages/circ/__init__.py", line 189, in generate_index
chrom_seq = extract_seq(utils.FASTA, chrom_start, chrom_length)
File "/nobackup/fbsev/bioinformatics-tools/CIRIquant/venv/lib/python3.7/site-packages/circ/__init__.py", line 141, in extract_seq
seq = f.read(length)
TypeError: argument should be integer or None, not 'float'
#########################################################

I'm afraid there are uninitialized variables in the code (e.g. length).
Could anyone shed some light on how best to solve this?

Thanks very much,
Best,
Elton
PS: I installed the program using the recommended "virtualenv" method (https://ciriquant-cookbook.readthedocs.io/en/latest/installation.html#install-ciriquant-from-source-code). I had to fix a couple of issues first, such as filling the empty __init__.py files in both "CIRIquant/venv/lib/python3.7/site-packages/circ" and "CIRIquant/venv/lib/python3.7/site-packages/pipeline" with the content of circ.py and pipeline.py, respectively (taken from the "CIRIquant/venv/lib/python3.7/site-packages/CIRIquant-1.1.1-py3.7.egg/CIRIquant/" directory). I also had to replace every occurrence of "izip_longest" with "zip_longest" in circ.py. I'm now a bit concerned that repeatedly patching your code will eventually break something. I hope you can help. Thanks again!
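
Both symptoms described here, the float passed to f.read() and the missing izip_longest, are typical of Python 2 code being run under Python 3, which fits the later report on this page that the program runs smoothly under Python 2.7. In Python 3 the `/` operator always returns a float, while file read methods require an integer. The sketch below reproduces the effect in isolation. The arithmetic inside extract_seq is not shown in the thread, so the function and variable names here are hypothetical.

```python
import io
from itertools import zip_longest  # this function is called izip_longest in Python 2


def read_bases(handle, total_length, parts):
    """Read total_length // parts characters; floor division keeps an int."""
    length = total_length // parts
    return handle.read(length)


handle = io.StringIO("ACGTACGTAC")
try:
    handle.read(10 / 2)  # true division yields 5.0 under Python 3
except TypeError as error:
    print("TypeError:", error)

print(read_bases(io.StringIO("ACGTACGTAC"), 10, 2))
```

So the variable is not uninitialized; it holds a float where Python 3 expects an integer.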

Question regarding RNaseR correction

Hi Jinyang,

I am trying to use CIRIquant on our RNaseR samples and I have a couple of questions I would like your advice on.
Looks like after I run RNaseR correction, the corrected counts will be typically much smaller than the original raw counts. Taking your test data as example: in the "case.gtf" file (I assume this is un-corrected RNaseR count), you have circRNA chr1:1192372|1203372 detected and its output is:
circ_id "chr1:1192372|1203372"; circ_type "exon"; bsj 10.0; fsj 47.0; junc_ratio 0.299; gene_id "ENSG00000160087.16"; gene_name "UBE2J2"; gene_type "protein_coding".
This means chr1:1192372|1203372 has 10 BSJ reads and 47 fsj reads in RNaseR sample, right?

Then, if we go to the "case_corrected.gtf", the output for this circRNA is:
circ_id "chr1:1192372|1203372"; circ_type "exon"; bsj 2.287; fsj 183.000; junc_ratio 0.024; rnaser_bsj 10.000; rnaser_fsj 47.000; gene_id "ENSG00000160087.16"; gene_name "UBE2J2"; gene_type "protein_coding";
From here we can tell this circRNA originally had 10 BSJ and 47 FSJ before correction, and after correction, the BSJ count becomes 2.287 (which is much smaller than the uncorrected BSJ count 10), and FSJ becomes 183.

Finally, if we go to the "CIRI_DE_corrected.csv" file, the stats for this circRNA are:

circRNA_ID | Case_BSJ | Case_FSJ | Case_Ratio | Ctrl_BSJ | Ctrl_FSJ | Ctrl_Ratio | DE_score | DS_score
chr1:1192372|1203372 | 2 | 183 | 0.024 | 1 | 177 | 0.011 | 0 | None
We can see that the DE module rounded the 2.287 corrected BSJ counts to 2.

Above is one example, but in general this is what I observe from ciriquant:
(1) by applying the RNaseR correction, the corrected BSJ counts are no longer integers and they can be much smaller than the initial uncorrected BSJ counts.
(2) To run downstream differential analysis (with tools such as DESeq2 that do not allow non-integer counts), I will need to round them to integers first.
Is my understanding right? Thank you very much for your input in advance.

best,

Jinghua
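
The values quoted in this question can be checked with a few lines of arithmetic. All three reported ratios are consistent with junc_ratio = 2 * bsj / (2 * bsj + fsj). This formula is inferred here from the quoted numbers and is not stated anywhere in the thread, so treat it as an assumption.

```python
def junction_ratio(bsj, fsj):
    """Junction ratio under the assumed formula 2*bsj / (2*bsj + fsj)."""
    return 2.0 * bsj / (2.0 * bsj + fsj)


# Values copied from the GTF and CSV lines quoted above.
uncorrected = round(junction_ratio(10.0, 47.0), 3)     # case.gtf
corrected = round(junction_ratio(2.287, 183.0), 3)     # case_corrected.gtf
control = round(junction_ratio(1.0, 177.0), 3)         # Ctrl columns in the CSV
rounded_bsj = round(2.287)                              # integer count for DE tools

print(uncorrected, corrected, control, rounded_bsj)
```

Under that assumption, the Case_Ratio of 0.024 in the CSV matches the unrounded BSJ value of 2.287 rather than the rounded count of 2, which would give about 0.021.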

Error in 'Detecting FSJ reads from genome alignment file'

Hello,

The first sample finished fine, but my second sample is stuck. My shell output is the following:

-1 SID16126_S10_L001_R1_001.fastq.gz -2 SID16126_S10_L001_R2_001.fastq.gz
[Wed 2022-07-20 15:44:37] [INFO ] Input reads: SID16126_S10_L001_R1_001.fastq.gz,SID16126_S10_L001_R2_001.fastq.gz
[Wed 2022-07-20 15:44:37] [INFO ] Library type: unstranded
[Wed 2022-07-20 15:44:37] [INFO ] Output directory: /home/serbe204/Documents/ciri2, Output prefix: SID16126_S10_L001_R1_001.fastq.gz
[Wed 2022-07-20 15:44:37] [INFO ] Config: hg38 Loaded
[Wed 2022-07-20 15:44:37] [INFO ] 32 CPU cores availble, using 4
[Wed 2022-07-20 15:44:37] [INFO ] Align RNA-seq reads to reference genome ..
[Wed 2022-07-20 16:19:24] [INFO ] Estimate gene abundance ..
[Wed 2022-07-20 16:23:31] [INFO ] No circRNA information provided, run CIRI2 for junction site prediction ..
[Wed 2022-07-20 16:23:31] [INFO ] Running BWA-mem mapping candidate reads ..
[Wed 2022-07-20 18:09:05] [INFO ] Running CIRI2 for circRNA detection ..
[Wed 2022-07-20 18:41:14] [INFO ] Extract circular sequence
[Wed 2022-07-20 18:41:20] [100% ] [##################################################]
[Wed 2022-07-20 18:41:20] [INFO ] Building circular index ..
[Wed 2022-07-20 18:41:29] [INFO ] De novo alignment for circular RNAs ..
[Wed 2022-07-20 19:16:26] [INFO ] Detecting reads containing Back-splicing signals
[Wed 2022-07-20 19:29:59] [INFO ] Detecting FSJ reads from genome alignment file
[Wed 2022-07-20 20:24:23] [INFO ] Merge bsj and fsj results
[Wed 2022-07-20 20:24:23] [INFO ] Loading annotation gtf ..
[Wed 2022-07-20 20:24:31] [INFO ] Output circRNA expression values
[Wed 2022-07-20 20:24:32] [WARNING] chrom of contig "KI270711.1" not in annotation gtf, please check
[Wed 2022-07-20 20:24:41] [INFO ] circRNA Expression profile: SID16126_S10_L001_R1_001.fastq.gz.gtf
[Wed 2022-07-20 20:24:41] [INFO ] Finished!
-1 SID16127_S11_L001_R1_001.fastq.gz -2 SID16127_S11_L001_R2_001.fastq.gz
[Wed 2022-07-20 20:24:42] [INFO ] Input reads: SID16127_S11_L001_R1_001.fastq.gz,SID16127_S11_L001_R2_001.fastq.gz
[Wed 2022-07-20 20:24:42] [INFO ] Library type: unstranded
[Wed 2022-07-20 20:24:42] [INFO ] Output directory: /home/serbe204/Documents/ciri2, Output prefix: SID16127_S11_L001_R1_001.fastq.gz
[Wed 2022-07-20 20:24:42] [INFO ] Config: hg38 Loaded
[Wed 2022-07-20 20:24:42] [INFO ] 32 CPU cores availble, using 4
[Wed 2022-07-20 20:24:42] [INFO ] Align RNA-seq reads to reference genome ..
[Wed 2022-07-20 20:59:58] [INFO ] Estimate gene abundance ..
[Wed 2022-07-20 21:03:35] [INFO ] No circRNA information provided, run CIRI2 for junction site prediction ..
[Wed 2022-07-20 21:03:35] [INFO ] Running BWA-mem mapping candidate reads ..
[Wed 2022-07-20 22:27:34] [INFO ] Running CIRI2 for circRNA detection ..
[Wed 2022-07-20 22:55:27] [INFO ] Extract circular sequence
[Wed 2022-07-20 22:55:33] [100% ] [##################################################]
[Wed 2022-07-20 22:55:33] [INFO ] Building circular index ..
[Wed 2022-07-20 22:55:59] [INFO ] De novo alignment for circular RNAs ..
[Wed 2022-07-20 23:25:22] [INFO ] Detecting reads containing Back-splicing signals
[Wed 2022-07-20 23:36:15] [INFO ] Detecting FSJ reads from genome alignment file

Error.log below

[Wed 2022-07-20 20:24:42] [INFO ] Input reads: SID16127_S11_L001_R1_001.fastq.gz,SID16127_S11_L001_R2_001.fastq.gz
[Wed 2022-07-20 20:24:42] [INFO ] Library type: unstranded
[Wed 2022-07-20 20:24:42] [INFO ] Output directory: /home/serbe204/Documents/ciri2, Output prefix: SID16127_S11_L001_R1_001.fastq.gz
[Wed 2022-07-20 20:24:42] [INFO ] Config: hg38 Loaded
[Wed 2022-07-20 20:24:42] [INFO ] 32 CPU cores availble, using 4
[Wed 2022-07-20 20:24:42] [INFO ] Align RNA-seq reads to reference genome ..
Time loading forward index: 00:00:01
Time loading reference: 00:00:00
Multiseed full-index search: 00:31:20
112163215 reads; of these:
112163215 (100.00%) were paired; of these:
58311206 (51.99%) aligned concordantly 0 times
51156655 (45.61%) aligned concordantly exactly 1 time
2695354 (2.40%) aligned concordantly >1 times
----
58311206 pairs aligned concordantly 0 times; of these:
146839 (0.25%) aligned discordantly 1 time
----
58164367 pairs aligned 0 times concordantly or discordantly; of these:
116328734 mates make up the pairs; of these:
110775545 (95.23%) aligned 0 times
5267282 (4.53%) aligned exactly 1 time
285907 (0.25%) aligned >1 times
50.62% overall alignment rate
Time searching: 00:31:20
Overall time: 00:31:21
[bam_sort_core] merging from 104 files and 4 in-memory blocks...
[Wed 2022-07-20 20:59:58] [INFO ] Estimate gene abundance ..
[Wed 2022-07-20 21:03:35] [INFO ] No circRNA information provided, run CIRI2 for junction site prediction ..
[Wed 2022-07-20 21:03:35] [INFO ] Running BWA-mem mapping candidate reads ..
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 298616 sequences (40000253 bp)...
[M::process] read 298588 sequences (40000147 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (3, 60825, 27, 2)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...

...

[M::mem_process_seqs] Processed 23008 reads in 2.423 CPU sec, 0.569 real sec
[main] Version: 0.7.17-r1188
[main] CMD: /home/serbe204/anaconda3/envs/CIRI/bin/bwa mem -t 4 -T 19 /home/serbe204/Documents/GRCh38_Ensemble/Homo_sapiens.GRCh38.dna_rm.primary_assembly.fa /home/serbe204/Documents/RNASeq-iPS-ADAR2_KD/SID16127_S11_L001_R1_001.fastq.gz /home/serbe204/Documents/RNASeq-iPS-ADAR2_KD/SID16127_S11_L001_R2_001.fastq.gz
[main] Real time: 5038.921 sec; CPU: 20444.558 sec
[Wed 2022-07-20 22:27:34] [INFO ] Running CIRI2 for circRNA detection ..
[Wed Jul 20 22:27:34 2022] CIRI begins running
[Wed Jul 20 22:27:34 2022] Loading reference
[Wed Jul 20 22:27:45 2022] Requesting system to split SAM into 4 pieces
[Wed Jul 20 22:27:34 2022] CIRI begins running
[Wed Jul 20 22:27:34 2022] Loading reference
[Wed Jul 20 22:27:45 2022] Requesting system to split SAM into 4 pieces
Divided SAM sizes:
/home/serbe204/Documents/ciri2/circ/SID16127_S11_L001_R1_001.fastq.gz_unmapped.samaa 30448813406
/home/serbe204/Documents/ciri2/circ/SID16127_S11_L001_R1_001.fastq.gz_unmapped.samab 30448813406
/home/serbe204/Documents/ciri2/circ/SID16127_S11_L001_R1_001.fastq.gz_unmapped.samac 30448813406
/home/serbe204/Documents/ciri2/circ/SID16127_S11_L001_R1_001.fastq.gz_unmapped.samad 30448813403
SAM was divided successfully.
First read of divided SAM files:
SID16127_S11_L001_R1_001.fastq.gz_unmapped.samab: A00722:78:HLC7TDSX2:1:1377:6632:13041
SID16127_S11_L001_R1_001.fastq.gz_unmapped.samac: A00722:78:HLC7TDSX2:1:1675:31403:35117
SID16127_S11_L001_R1_001.fastq.gz_unmapped.samad: A00722:78:HLC7TDSX2:1:2377:32235:16579
SID16127_S11_L001_R1_001.fastq.gz_unmapped.samaa: A00722:78:HLC7TDSX2:1:1101:1181:1000
First reads were recorded successfully.
[Wed Jul 20 22:30:13 2022] First scanning
[Wed Jul 20 22:30:13 2022] First scanning
Worker 1 begins to scan SID16127_S11_L001_R1_001.fastq.gz_unmapped.samac.
Worker 2 begins to scan SID16127_S11_L001_R1_001.fastq.gz_unmapped.samab.
Worker 3 begins to scan SID16127_S11_L001_R1_001.fastq.gz_unmapped.samaa.
Worker 4 begins to scan SID16127_S11_L001_R1_001.fastq.gz_unmapped.samad.
Worker 1 finished reporting.
Worker 2 finished reporting.
Worker 3 finished reporting.
Worker 4 finished reporting.
Candidate reads with splicing signals: 12959
Candidate reads with PEM signals: 12103
Candidate circRNAs found: 3036
[Wed Jul 20 22:41:25 2022] Second scanning
Candidate reads with splicing signals: 12959
Candidate reads with PEM signals: 12103
Candidate circRNAs found: 3036
[Wed Jul 20 22:41:25 2022] Second scanning
Worker 5 begins to scan SID16127_S11_L001_R1_001.fastq.gz_unmapped.samac.
Worker 6 begins to scan SID16127_S11_L001_R1_001.fastq.gz_unmapped.samab.
Worker 7 begins to scan SID16127_S11_L001_R1_001.fastq.gz_unmapped.samaa.
Worker 8 begins to scan SID16127_S11_L001_R1_001.fastq.gz_unmapped.samad.
Worker 5 finished reporting.
Worker 6 finished reporting.
Worker 7 finished reporting.
Worker 8 finished reporting.
[Wed Jul 20 22:53:54 2022] Extracting info from temporary files
Additional candidate reads found: 2846
Additional candidate reads with PEM signals: 2165
[Wed Jul 20 22:55:20 2022] Summarizing
Number of circular RNAs found: 2821
[Wed Jul 20 22:53:54 2022] Extracting info from temporary files
Additional candidate reads found: 2846
Additional candidate reads with PEM signals: 2165
[Wed Jul 20 22:55:20 2022] Summarizing
Number of circular RNAs found: 2821
[Wed Jul 20 22:55:25 2022] CIRI finished its work. Please see output file /home/serbe204/Documents/ciri2/circ/SID16127_S11_L001_R1_001.fastq.gz.ciri for detail.
[Wed Jul 20 22:55:25 2022] CIRI finished its work. Please see output file /home/serbe204/Documents/ciri2/circ/SID16127_S11_L001_R1_001.fastq.gz.ciri for detail.

[Wed 2022-07-20 22:55:27] [INFO ] Extract circular sequence
[Wed 2022-07-20 22:55:33] [INFO ] Building circular index ..
Settings:
Output files: "/home/serbe204/Documents/ciri2/circ/SID16127_S11_L001_R1_001.fastq.gz_index..ht2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Local sequence length: 57344
Local sequence overlap between two consecutive indexes: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
/home/serbe204/Documents/ciri2/circ/SID16127_S11_L001_R1_001.fastq.gz_index.fa
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
Time to read SNPs and splice sites: 00:00:00
Using parameters --bmax 1914635 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 1914635 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:01
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:00
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:00
Sanity-checking and returning
Building samples
Reserving space for 44 sample suffixes
Generating random suffixes
QSorting 44 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 44 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Split 6, merged 21; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Split 4, merged 2; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Split 2, merged 3; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Split 1, merged 2; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Split 1, merged 1; iterating...
Avg bucket size: 1.36152e+06 (target: 1914634)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering GFM loop
Getting block 1 of 30
Reserving size (1914635) for bucket 1
Getting block 2 of 30
Getting block 3 of 30
Reserving size (1914635) for bucket 3
Reserving size (1914635) for bucket 2
Calculating Z arrays for bucket 1
Getting block 4 of 30
Calculating Z arrays for bucket 2
Entering block accumulator loop for bucket 1:
Reserving size (1914635) for bucket 4
Calculating Z arrays for bucket 3
Entering block accumulator loop for bucket 2:
Calculating Z arrays for bucket 4
Entering block accumulator loop for bucket 3:
Entering block accumulator loop for bucket 4:
bucket 1: 10%
bucket 2: 10%
bucket 4: 10%
bucket 3: 10%
bucket 1: 20%
bucket 2: 20%
bucket 4: 20%
bucket 3: 20%
bucket 1: 30%
bucket 2: 30%
bucket 4: 30%
bucket 3: 30%
bucket 1: 40%
bucket 2: 40%
bucket 4: 40%
bucket 1: 50%
bucket 3: 40%
bucket 2: 50%
bucket 4: 50%
bucket 1: 60%
bucket 2: 60%
bucket 3: 50%
bucket 4: 60%
bucket 1: 70%
bucket 2: 70%
bucket 4: 70%
bucket 3: 60%
bucket 1: 80%
bucket 2: 80%
bucket 4: 80%
bucket 1: 90%
bucket 3: 70%
bucket 2: 90%
bucket 4: 90%
bucket 1: 100%
Sorting block of length 1461040 for bucket 1
(Using difference cover)
bucket 3: 80%
bucket 2: 100%
Sorting block of length 830206 for bucket 2
(Using difference cover)
bucket 4: 100%
Sorting block of length 1198780 for bucket 4
(Using difference cover)
bucket 3: 90%
bucket 3: 100%
Sorting block of length 1508491 for bucket 3
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 830207 for bucket 2
Getting block 5 of 30
Reserving size (1914635) for bucket 5
Calculating Z arrays for bucket 5
Entering block accumulator loop for bucket 5:
bucket 5: 10%
bucket 5: 20%
bucket 5: 30%
bucket 5: 40%
bucket 5: 50%
bucket 5: 60%
bucket 5: 70%
bucket 5: 80%
bucket 5: 90%
Sorting block time: 00:00:01
Returning block of 1198781 for bucket 4
Getting block 6 of 30
Reserving size (1914635) for bucket 6
Calculating Z arrays for bucket 6
Entering block accumulator loop for bucket 6:
bucket 5: 100%
Sorting block of length 1730753 for bucket 5
(Using difference cover)
bucket 6: 10%
bucket 6: 20%
bucket 6: 30%
bucket 6: 40%
Sorting block time: 00:00:01
Returning block of 1461041 for bucket 1
Getting block 7 of 30
Reserving size (1914635) for bucket 7
Calculating Z arrays for bucket 7
Entering block accumulator loop for bucket 7:
bucket 6: 50%
bucket 7: 10%
bucket 6: 60%
bucket 7: 20%
bucket 6: 70%
Sorting block time: 00:00:01
Returning block of 1508492 for bucket 3
bucket 7: 30%
Getting block 8 of 30
Reserving size (1914635) for bucket 8
Calculating Z arrays for bucket 8
Entering block accumulator loop for bucket 8:
bucket 6: 80%
bucket 7: 40%
bucket 6: 90%
bucket 8: 10%
bucket 7: 50%
bucket 6: 100%

...

bucket 30: 100%
Sorting block of length 788849 for bucket 30
(Using difference cover)
bucket 29: 90%
bucket 29: 100%
Sorting block of length 1408830 for bucket 29
(Using difference cover)
Sorting block time: 00:00:01
Returning block of 1288403 for bucket 28
Sorting block time: 00:00:01
Returning block of 788850 for bucket 30
Sorting block time: 00:00:01
Returning block of 1408831 for bucket 29
Exited GFM loop
fchr[A]: 0
fchr[C]: 12577814
fchr[G]: 20416140
fchr[T]: 28255216
fchr[$]: 40845540
Exiting GFM::buildToDisk()
Returning from initFromVector
Wrote 19213931 bytes to primary GFM file: /home/serbe204/Documents/ciri2/circ/SID16127_S11_L001_R1_001.fastq.gz_index.1.ht2
Wrote 10211392 bytes to secondary GFM file: /home/serbe204/Documents/ciri2/circ/SID16127_S11_L001_R1_001.fastq.gz_index.2.ht2
Re-opening _in1 and _in2 as input streams
Returning from GFM constructor
Returning from initFromVector
Wrote 34288037 bytes to primary GFM file: /home/serbe204/Documents/ciri2/circ/SID16127_S11_L001_R1_001.fastq.gz_index.5.ht2
Wrote 10304266 bytes to secondary GFM file: /home/serbe204/Documents/ciri2/circ/SID16127_S11_L001_R1_001.fastq.gz_index.6.ht2
Re-opening _in5 and _in5 as input streams
Returning from HGFM constructor
Headers:
len: 40845540
gbwtLen: 40845541
nodes: 40845541
sz: 10211385
gbwtSz: 10211386
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 0
eftabSz: 0
ftabLen: 1048577
ftabSz: 4194308
offsLen: 2552847
offsSz: 10211388
lineSz: 64
sideSz: 64
sideGbwtSz: 48
sideGbwtLen: 192
numSides: 212738
numLines: 212738
gbwtTotLen: 13615232
gbwtTotSz: 13615232
reverse: 0
linearFM: Yes
Total time for call to driver() for forward index: 00:00:26
[Wed 2022-07-20 22:55:59] [INFO ] De novo alignment for circular RNAs ..
112163215 reads; of these:
112163215 (100.00%) were paired; of these:
78357984 (69.86%) aligned concordantly 0 times
7892485 (7.04%) aligned concordantly exactly 1 time
25912746 (23.10%) aligned concordantly >1 times
----
78357984 pairs aligned concordantly 0 times; of these:
2036 (0.00%) aligned discordantly 1 time
----
78355948 pairs aligned 0 times concordantly or discordantly; of these:
156711896 mates make up the pairs; of these:
153229929 (97.78%) aligned 0 times
346892 (0.22%) aligned exactly 1 time
3135075 (2.00%) aligned >1 times
31.69% overall alignment rate
[bam_sort_core] merging from 164 files and 4 in-memory blocks...
[Wed 2022-07-20 23:25:22] [INFO ] Detecting reads containing Back-splicing signals
[Wed 2022-07-20 23:36:15] [INFO ] Detecting FSJ reads from genome alignment file

Add support for strand specific RNA-seq data

index not built

Running CIRIquant for circRNA quantification I obtain the error shown at the end of the attached log file.

It seems that the index is not built for some reason.
CAN_FCX.log

Problem with installation of CIRIquant

Hello there,

I had a problem installing CIRIquant.

It shows the errors below:

root@DESKTOP-KB6VI11:/home/choi# pip install CIRIquant                                                     
Requirement already satisfied: CIRIquant in /root/miniconda3/lib/python3.9/site-packages/CIRIquant-1.1.2-py3.9.egg (1.1.2)
Collecting argparse==1.2.1
Using cached argparse-1.2.1.tar.gz (69 kB)                                                                        
Preparing metadata (setup.py) ... done                                                                          
Collecting PyYAML==5.1.1                                                                                            
Using cached PyYAML-5.1.1.tar.gz (274 kB)                                                                         
Preparing metadata (setup.py) ... done                                                                          
Collecting pysam==0.15.2                                                                                            
Using cached pysam-0.15.2.tar.gz (3.2 MB)                                                                         
Preparing metadata (setup.py) ... /
**error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [133 lines of output]**

How can I fix the above error?

Thank you.

Read count cut-off for circRNA differential expression

Hi @Kevinzjy,

Thanks for creating this awesome tool. I have been able to run the CIRIquant command successfully using circRNA prediction results from other tools. I am now interested in differential expression. Would you advise filtering the circRNAs first (i.e. between the quantification and differential expression steps) to remove those with low read counts as quantified by CIRIquant? If not, can you explain why? Thanks!

Prisca
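For readers wondering what such a pre-filter would look like mechanically, here is a minimal sketch. It does not answer whether filtering is advisable; `filter_low_counts` is a hypothetical helper, and the thresholds `min_count` and `min_samples` are placeholders, not recommended values.

```python
def filter_low_counts(bsj_counts, min_count=2, min_samples=2):
    """Keep circRNAs with at least `min_count` BSJ reads in at least `min_samples` samples.

    bsj_counts maps a circRNA ID to a list of per-sample BSJ read counts.
    Both thresholds are illustrative placeholders.
    """
    return {
        circ_id: counts
        for circ_id, counts in bsj_counts.items()
        if sum(1 for c in counts if c >= min_count) >= min_samples
    }
```

The filtered dictionary could then be written back out in the same layout as the BSJ matrix passed to the differential expression step.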

--RNaseR: command not found

When trying to run Step 2 on total RNA to calculate RNase R efficiency, the mapping and circRNA detection complete, but the error file reports that the RNase R command is not found (--RNaseR: command not found). No header information pertaining to RNase R treatment appears in any output file. I used exactly the command format given in the documentation, supplying the .gtf from my RNase R treated sample.
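A shell usually prints a message of the form `--RNaseR: command not found` when it parses the option as a separate command, which can happen if the line-continuation backslash on the preceding line is missing or is followed by trailing whitespace. The sketch below is a diagnostic under that assumption only (the thread does not confirm the cause); `find_broken_continuations` is a hypothetical helper, not part of CIRIquant.

```python
def find_broken_continuations(script_text):
    """Return 1-based line numbers whose line continuation looks broken.

    Flags (a) a backslash followed by trailing whitespace and
    (b) a line preceding an option line ("-...") that does not end
    with a backslash.
    """
    problems = set()
    lines = script_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.rstrip()
        # (a) backslash is not the last character on the line
        if stripped.endswith("\\") and line != stripped:
            problems.add(i + 1)
        # (b) option line whose previous line is not continued
        if i > 0 and line.lstrip().startswith("-"):
            if not lines[i - 1].endswith("\\"):
                problems.add(i)  # report the previous line
    return sorted(problems)
```

Running this over the text of a batch script points at the line where the command was split.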

Biologist Suggestion: Text Summary Tables

My biologist collaborator made a suggestion that might be useful as a utility in CIRIquant. Currently, the final results are available in GTF format, one file per sample, which requires programming skills to parse. Could a utility be added that takes a folder as input, imports all files with the GTF suffix, and outputs one table of BSJ counts and another table of junction ratios? That would make it quick and easy for a biologist to open the TSV or CSV file in a spreadsheet and sort and filter the results with software they are familiar with.
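The requested utility could be sketched roughly as follows. This is not an existing CIRIquant feature: the function names are hypothetical, and the code assumes each record's ninth GTF column holds semicolon-terminated attributes named `circ_id`, `bsj` and `junc_ratio`. Check those names against your own output files before relying on it.

```python
import csv
import glob
import os
import re

# Matches `key "value";` and `key value;` pairs (assumed attribute layout).
ATTR_RE = re.compile(r'(\S+)\s+"?([^";]+)"?\s*;')

def parse_circ_gtf(path):
    """Yield (circ_id, bsj, junc_ratio) for each record in one GTF file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue
            attrs = dict(ATTR_RE.findall(fields[8]))
            if "circ_id" in attrs:
                yield (attrs["circ_id"],
                       attrs.get("bsj", "NA"),
                       attrs.get("junc_ratio", "NA"))

def summarize(folder, bsj_out, ratio_out):
    """Write two TSV tables (circRNA x sample) from all *.gtf files in folder."""
    samples, bsj, ratio = [], {}, {}
    for path in sorted(glob.glob(os.path.join(folder, "*.gtf"))):
        sample = os.path.splitext(os.path.basename(path))[0]
        samples.append(sample)
        for circ_id, b, r in parse_circ_gtf(path):
            bsj.setdefault(circ_id, {})[sample] = b
            ratio.setdefault(circ_id, {})[sample] = r
    for out_path, table in ((bsj_out, bsj), (ratio_out, ratio)):
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out, delimiter="\t", lineterminator="\n")
            writer.writerow(["circ_id"] + samples)
            for circ_id in sorted(table):
                # "NA" marks circRNAs absent from a sample's file
                writer.writerow([circ_id] + [table[circ_id].get(s, "NA") for s in samples])
    return samples
```

Sample names are taken from the file names, and the resulting TSV files open directly in a spreadsheet.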

Encountered internal HISAT2 exception (#1)

Hi Kevin,

I had exactly the same problem as #15, but I don't really understand what "reducing the number of threads" means (I only used 4 cores). Can you elaborate on that? Or is there another way to solve this problem?

There is also something odd. When I ran the test data as a batch file, the run was terminated by this error. I then immediately ran the test data on the command line and got the desired output in only 2-3 minutes. However, when I ran the experimental data today, both the batch file and the command line gave the same error. Do you know why this happens?

A question about the results of differential expression analysis

Hello everyone,

I downloaded 5 TCGA-BRCA samples (selected using specific criteria, giving five tumor samples and their five normal samples) and ran CIRIquant on them. I performed differential expression analysis using CIRI_DE_replicate and obtained the final table. My question is: do you have any idea why I would get good log2 fold change values that are not statistically significant (all FDR values are greater than 0.05)?

I appreciate your advice and thanks in advance.

Rowname error with CIRI_DE_replicate command

Hi @Kevinzjy,

I am trying to execute the CIRI_DE_replicate command but I keep on getting the following error:

Error in [.data.frame(gene_mtx, , rownames(lib_mtx)) :
undefined columns selected
Calls: [ -> [.data.frame
Execution halted
[Sat 2021-03-06 12:08:08] [INFO ] Finished!

I think the error means that it cannot match the sample names listed in library_info.csv to the sample names in gene_count_matrix.csv. I have checked to verify that the sample naming in both files is the same, and the individual line responsible for subsetting in your CIRI_DE.R script (gene_mtx <- gene_mtx[,rownames(lib_mtx)]) works just as expected when run in isolation on the same data sets. Would you have any insight into why the line is throwing an error in the context of your script? I suspect something may be off with my environment set-up, but I really have no clue.

Thank you! Certainly let me know if you would need any more information to answer my question.

Prisca
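One possible cause, not confirmed in this thread, is that R's `read.csv` rewrites column names by default (`check.names = TRUE`): names that start with a digit get an `X` prefix and characters such as `-` become `.`. Whether CIRI_DE.R reads the matrix that way is an assumption. The sketch below approximates that renaming so sample names can be checked outside R; `r_make_name` and `check_sample_names` are hypothetical helpers, and the approximation ignores R reserved words and non-ASCII letters.

```python
import re

def r_make_name(name):
    """Approximate R's make.names() for one column name (ASCII only)."""
    # A valid name starts with a letter, or a dot not followed by a digit.
    if not re.match(r"[A-Za-z]|\.(?![0-9])", name):
        name = "X" + name
    # Every remaining invalid character becomes a dot.
    return re.sub(r"[^A-Za-z0-9._]", ".", name)

def check_sample_names(lib_names, matrix_columns):
    """Return (missing, renamed).

    missing: library sample names absent from the matrix header.
    renamed: names that R would alter, mapped to their altered form.
    """
    cols = set(matrix_columns)
    missing = [n for n in lib_names if n not in cols]
    renamed = {n: r_make_name(n) for n in lib_names if r_make_name(n) != n}
    return missing, renamed
```

If `renamed` is non-empty, the names may match in the CSV files yet differ once loaded, which would be consistent with an "undefined columns selected" error.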

Options for single-end data entry

Hi,
When I tried to analyze single-end data with CIRIquant, I found no corresponding input option in the -h output.
So is pointing -1 and -2 to the same single-end FASTQ file a solution, or does CIRIquant only work with paired-end data?

KeyError: 'chr1_89476615_89476806'

Dear author,
As the title says, I cannot work out the cause. More information follows:

#####]
[Fri 2020-11-13 08:13:25] [INFO ] Building circular index ..
[Fri 2020-11-13 08:13:26] [INFO ] De novo alignment for circular RNAs ..
[Fri 2020-11-13 08:25:47] [INFO ] Detecting reads containing Back-splicing signals
Traceback (most recent call last):
  File "/root/miniconda2/bin/CIRIquant", line 11, in <module>
    load_entry_point('CIRIquant==1.1.1', 'console_scripts', 'CIRIquant')()
  File "/root/miniconda2/lib/python2.7/site-packages/CIRIquant-1.1.1-py2.7.egg/CIRIquant/main.py", line 183, in main
    out_file = circ.proc(log_file, thread, bed_file, hisat_bam, rnaser_file, reads, outdir, prefix, anchor, lib_type)
  File "/root/miniconda2/lib/python2.7/site-packages/CIRIquant-1.1.1-py2.7.egg/CIRIquant/circ.py", line 655, in proc
    cand_bsj = proc_denovo_bam(denovo_bam, thread, circ_info, anchor, lib_type)
  File "/root/miniconda2/lib/python2.7/site-packages/CIRIquant-1.1.1-py2.7.egg/CIRIquant/circ.py", line 320, in proc_denovo_bam
    tmp_cand = job.get()
  File "/root/miniconda2/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
KeyError: 'chr1_89476615_89476806'

My bed format is:

chr1    541981  542012  chr1_541981_542012      0       +
chr1    6880241 6885270 chr1_6880241_6885270    0       +
chr1    8716032 8716500 chr1_8716032_8716500    0       -

I look forward to your reply.
Thanks.

IOError: [Errno 2] No such file or directory: '/mnt/f/liu-project/lncRNA-mRNA-circRNA/5.circRNA-data/quant-data/circ/test.ciri'

Hi, I ran CIRIquant and got this error, and I don't know how to fix it. This is my command:

CIRIquant -t 10 \
          -l 2 \
          -1 ../1.clean-data/0a_R1.fq.gz \
          -2 ../1.clean-data/0a_R2.fq.gz \
          --config ./run.yml \
          -o ./quant-data \
          -p test

Thanks!

Different circRNA prediction tools for differential expression

Hi @Kevinzjy ,

In your paper, you report that CIRIquant works excellently for quantification of circRNAs using prediction results from a variety of other tools. Could you also comment on the compatibility of CIRIquant's differential expression workflow with different tools? For example, what is the overlap of differentially expressed circRNAs reported by CIRIquant when prediction results from different tools are used for quantification? This is an analysis I plan to perform on my own dataset, and I would appreciate your insights beforehand!

Prisca

Large DS Scores for Unexpressed Genes

When I sort the results of CIRI_DE from highest to lowest by DS_score column in R, I see

> head(C10)
                   circRNA_ID Case_BSJ Case_FSJ Case_Ratio Ctrl_BSJ Ctrl_FSJ Ctrl_Ratio DE_score DS_score
609    chr4:70000863|70053127        0        0          0        2   446681          0        0 13.46153
604    chr4:70036267|70050495        0        0          0        4   717701          0        0 13.12248
14991 chr12:11267411|11308807        0        0          0        2   172556          0        0 12.08316
2091  chr12:11353223|11353684        0        0          0        1    72125          0        0 11.83855
4419  chr12:11267348|11267632        0        0          0        1    66157          0        0 11.65618
6398  chr12:11308221|11308882        0        0          0        1    59624          0        0 11.55784

These circular RNAs are derived from genes which are not expressed in disease and have only 1 or 2 back-splice reads in the healthy condition. Could it be a bug? It doesn't look like a real splicing change.

Similarly, for increasing order of splicing score

> head(C10)
                   circRNA_ID Case_BSJ Case_FSJ Case_Ratio Ctrl_BSJ Ctrl_FSJ Ctrl_Ratio DE_score   DS_score
4431  chr11:65500651|65503964        1    77803          0        0        0          0        0 -11.919432
3470  chr11:65500476|65500650        1    67608          0        0        0          0        0 -11.749692
10008 chr11:65500057|65500862        1    66250          0        0        0          0        0 -11.690436
2466  chr11:65500228|65506018        1    31021          0        0        0          0        0 -10.567198
8923  chr12:52520035|52520332        1    22792          0        0        0          0        0 -10.149253
5579   chr7:23254169|23270175        1    20406          0        0        0          0        0  -9.999413

If the counts in case are larger than in control, shouldn't the score be positive? It is negative.

CIRI2 Missing File Error

I am running CIRIquant without CIRI2 results files, so it is doing the analysis within the pipeline. The first couple of steps seem to work fine, but CIRI2 analysis fails with an error about a missing file.

[Mon 2020-04-13 17:28:53] [INFO ] Input reads: CSCC_0002-M1_R1.fastq.gz,CSCC_0002-M1_R2.fastq.gz
[Mon 2020-04-13 17:28:53] [INFO ] Output directory: /dskh/nobackup/biostat/datasets/HeadNeckCancer/RNAseq/CSCC/circular, Output prefix: CSCC_0002-M1
[Mon 2020-04-13 17:28:53] [INFO ] Config: hg38 Loaded
[Mon 2020-04-13 17:28:53] [INFO ] 32 CPU cores availble, using 12
[Mon 2020-04-13 17:28:53] [INFO ] Align RNA-seq reads to reference genome ..
[Mon 2020-04-13 18:04:24] [INFO ] Estimate gene abundance ..
[Mon 2020-04-13 18:16:52] [INFO ] No circRNA information provided, run CIRI2 for junction site prediction ..
[Mon 2020-04-13 18:16:52] [INFO ] Running BWA-mem mapping candidate reads ..
[Mon 2020-04-13 19:48:59] [INFO ] Running CIRI2 for circRNA detection ..
Traceback (most recent call last):
  File "/dskh/nobackup/biostat/software/PythonLegacy/bin/CIRIquant", line 11, in <module>
    load_entry_point('CIRIquant==1.0.2', 'console_scripts', 'CIRIquant')()
  File "/dskh/nobackup/biostat/software/PythonLegacy/local/lib/python2.7/site-packages/CIRIquant-1.0.2-py2.7.egg/CIRIquant/main.py", line 156, in main
    circ_parser.convert(bed_file)
  File "/dskh/nobackup/biostat/software/PythonLegacy/local/lib/python2.7/site-packages/CIRIquant-1.0.2-py2.7.egg/CIRIquant/utils.py", line 155, in convert
    circ_data = getattr(self, '_' + self.tool.lower())()
  File "/dskh/nobackup/biostat/software/PythonLegacy/local/lib/python2.7/site-packages/CIRIquant-1.0.2-py2.7.egg/CIRIquant/utils.py", line 146, in _ciri2
    with open(self.circ, 'r') as f:
IOError: [Errno 2] No such file or directory: '/dskh/nobackup/biostat/datasets/HeadNeckCancer/RNAseq/CSCC/circular/circ/CSCC_0002-M1.ciri'

What is wrong and can the error be caught (try-catch) and an informative message printed in future?
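The kind of check being requested could look roughly like the sketch below. It is not CIRIquant's actual code; `require_file` is a hypothetical helper that verifies an expected intermediate file exists before it is opened and exits with a readable message otherwise.

```python
import os
import sys

def require_file(path, description):
    """Return path if the file exists; otherwise exit with a clear message."""
    if not os.path.isfile(path):
        sys.exit(
            "Error: {} not found at '{}'. The preceding step may have failed; "
            "check its log output.".format(description, path)
        )
    return path
```

A caller would wrap the path before opening it, for example `open(require_file(circ_path, "CIRI2 output"))`, where `circ_path` stands for whatever path the pipeline expects.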

Error in circ_type when using annotation from UCSC table browser

All circRNAs were labeled as "intergenic" when using GTF downloaded from UCSC table browser

multiple fastq files

I was wondering how CIRIquant deals with multiple input FASTQ files (e.g. multi-lanes)? I tried using "," (comma) as a separator, but no luck.
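Whether CIRIquant accepts several files per mate is not answered here. A common workaround is to merge the lanes before running it: gzip files concatenated byte-for-byte form a valid multi-member gzip file. The sketch below does that; `concat_gzip` is a hypothetical helper, and for paired-end data the R1 and R2 lanes must be merged in the same lane order.

```python
import shutil

def concat_gzip(lane_files, merged_path):
    """Concatenate gzip files byte-for-byte into one multi-member gzip file."""
    with open(merged_path, "wb") as out:
        for lane in lane_files:
            with open(lane, "rb") as src:
                shutil.copyfileobj(src, out)
    return merged_path
```

The merged files can then be passed to -1 and -2 as single inputs.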

Generate empty gtf file

Hi,
I ran CIRIquant and everything went well, but it generated an empty GTF file that contains only the following header:
##Sample: test
##Total_Reads: 8767472
##Mapped_Reads: 8257908
##Circular_Reads: 0
##version: 1.1.2

My Code:
CIRIquant -1 trim-ERR2076987_1_val_1.fq.gz -2 trim-ERR2076987_2_val_2.fq.gz --config CIRIquant.yml -o ./test -p test --circ S1.ciri --tool CIRI2 -t 70

CIRIquant.yml
name: Genome
tools:
  bwa: /usr/bin/bwa
  hisat2: /usr/bin/hisat2
  stringtie: /usr/bin/stringtie
  samtools: /usr/bin/samtools

reference:
  fasta: /home/mrb/MRB/Genome/Sheep/Ensembl/Index/Ovis_aries.Oar_v3.1.77.fa
  gtf: /home/mrb/MRB/Genome/Sheep/Ensembl/GTF/Ovis_aries.Oar_v3.1.103.gtf
  bwa_index: /home/mrb/MRB/Genome/Sheep/Ensembl/Index/Ovis_aries.Oar_v3.1.77.fa
  hisat_index: /home/mrb/MRB/Genome/Sheep/Ensembl/Index/Ovis_aries.Oar_v3.1.77.fa

S1.ciri (renamed to .txt so it could be attached here): S1.ciri.txt

Log file:
[Wed 2021-03-31 12:52:58] [INFO ] Input reads: trim-ERR2076987_1_val_1.fq.gz,trim-ERR2076987_2_val_2.fq.gz
[Wed 2021-03-31 12:52:58] [INFO ] Library type: unstranded
[Wed 2021-03-31 12:52:58] [INFO ] Output directory: /home/mrb/MRB/RSeq_All_v6/Data_Example/circRNA/test, Output prefix: test
[Wed 2021-03-31 12:52:58] [INFO ] Config: Genome Loaded
[Wed 2021-03-31 12:52:58] [INFO ] 80 CPU cores availble, using 70
[Wed 2021-03-31 12:52:58] [INFO ] Align RNA-seq reads to reference genome ..
Time loading forward index: 00:00:14
Time loading reference: 00:00:00
Multiseed full-index search: 00:06:00
4383736 reads; of these:
4383736 (100.00%) were paired; of these:
299106 (6.82%) aligned concordantly 0 times
3652446 (83.32%) aligned concordantly exactly 1 time
432184 (9.86%) aligned concordantly >1 times
----
299106 pairs aligned concordantly 0 times; of these:
23120 (7.73%) aligned discordantly 1 time
----
275986 pairs aligned 0 times concordantly or discordantly; of these:
551972 mates make up the pairs; of these:
400491 (72.56%) aligned 0 times
133063 (24.11%) aligned exactly 1 time
18418 (3.34%) aligned >1 times
95.43% overall alignment rate
Time searching: 00:06:01
Overall time: 00:06:15
[bam_sort_core] merging from 0 files and 70 in-memory blocks...
[Wed 2021-03-31 12:59:53] [INFO ] Estimate gene abundance ..
[Wed 2021-03-31 13:00:33] [INFO ] Using predicted circRNA results from CIRI2: /home/mrb/MRB/RSeq_All_v6/Data_Example/circRNA/circRNA_Results/3_CIRI2/Sample_1/CIRI_output/S1.ciri
[Wed 2021-03-31 13:00:33] [INFO ] Extract circular sequence
[Wed 2021-03-31 13:01:03] [INFO ] Building circular index ..
Settings:
Output files: "/home/mrb/MRB/RSeq_All_v6/Data_Example/circRNA/test/circ/test_index..ht2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Local sequence length: 57344
Local sequence overlap between two consecutive indexes: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
/home/mrb/MRB/RSeq_All_v6/Data_Example/circRNA/test/circ/test_index.fa
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
Time to read SNPs and splice sites: 00:00:00
Using parameters --bmax 568893 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 568893 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:01:05
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:00
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:00
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 379261 (target: 568892)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering GFM loop
Getting block 1 of 8
Reserving size (568893) for bucket 1
Getting block 2 of 8
Reserving size (568893) for bucket 2
Getting block 3 of 8
Reserving size (568893) for bucket 3
Calculating Z arrays for bucket 1
Getting block 4 of 8
Reserving size (568893) for bucket 4
Calculating Z arrays for bucket 2
Getting block 5 of 8
Entering block accumulator loop for bucket 1:
Calculating Z arrays for bucket 4
Reserving size (568893) for bucket 5
Getting block 6 of 8
Reserving size (568893) for bucket 6
Entering block accumulator loop for bucket 2:
Getting block 7 of 8
Entering block accumulator loop for bucket 4:
Reserving size (568893) for bucket 7
Calculating Z arrays for bucket 5
Calculating Z arrays for bucket 3
Calculating Z arrays for bucket 6
Getting block 8 of 8
Reserving size (568893) for bucket 8
Calculating Z arrays for bucket 7
Entering block accumulator loop for bucket 5:
Entering block accumulator loop for bucket 3:
Entering block accumulator loop for bucket 6:
Calculating Z arrays for bucket 8
Entering block accumulator loop for bucket 7:
Entering block accumulator loop for bucket 8:
bucket 8: 10%
bucket 1: 10%
bucket 2: 10%
bucket 4: 10%
bucket 3: 10%
bucket 7: 10%
bucket 6: 10%
bucket 5: 10%
bucket 8: 20%
bucket 1: 20%
bucket 2: 20%
bucket 4: 20%
bucket 3: 20%
bucket 7: 20%
bucket 8: 30%
bucket 1: 30%
bucket 6: 20%
bucket 5: 20%
bucket 2: 30%
bucket 8: 40%
bucket 1: 40%
bucket 4: 30%
bucket 3: 30%
bucket 7: 30%
bucket 6: 30%
bucket 2: 40%
bucket 8: 50%
bucket 5: 30%
bucket 1: 50%
bucket 4: 40%
bucket 3: 40%
bucket 7: 40%
bucket 8: 60%
bucket 2: 50%
bucket 1: 60%
bucket 6: 40%
bucket 5: 40%
bucket 8: 70%
bucket 4: 50%
bucket 3: 50%
bucket 1: 70%
bucket 7: 50%
bucket 2: 60%
bucket 8: 80%
bucket 6: 50%
bucket 1: 80%
bucket 4: 60%
bucket 5: 50%
bucket 3: 60%
bucket 2: 70%
bucket 7: 60%
bucket 8: 90%
bucket 1: 90%
bucket 4: 70%
bucket 8: 100%
Sorting block of length 565092 for bucket 8
(Using difference cover)
bucket 2: 80%
bucket 6: 60%
bucket 3: 70%
bucket 5: 60%
bucket 7: 70%
bucket 1: 100%
Sorting block of length 558897 for bucket 1
(Using difference cover)
bucket 2: 90%
bucket 4: 80%
bucket 3: 80%
bucket 6: 70%
bucket 7: 80%
bucket 5: 70%
bucket 2: 100%
Sorting block of length 250942 for bucket 2
(Using difference cover)
bucket 4: 90%
bucket 3: 90%
bucket 7: 90%
bucket 6: 80%
bucket 5: 80%
bucket 3: 100%
Sorting block of length 341167 for bucket 3
(Using difference cover)
bucket 4: 100%
Sorting block of length 320619 for bucket 4
(Using difference cover)
bucket 7: 100%
Sorting block of length 266301 for bucket 7
(Using difference cover)
bucket 6: 90%
bucket 5: 90%
bucket 6: 100%
Sorting block of length 458788 for bucket 6
(Using difference cover)
bucket 5: 100%
Sorting block of length 272283 for bucket 5
(Using difference cover)
Sorting block time: 00:00:01
Returning block of 250943 for bucket 2
Sorting block time: 00:00:01
Returning block of 266302 for bucket 7
Sorting block time: 00:00:01
Returning block of 272284 for bucket 5
Sorting block time: 00:00:02
Returning block of 320620 for bucket 4
Sorting block time: 00:00:02
Returning block of 341168 for bucket 3
Sorting block time: 00:00:02
Returning block of 458789 for bucket 6
Sorting block time: 00:00:03
Returning block of 558898 for bucket 1
Sorting block time: 00:00:03
Returning block of 565093 for bucket 8
Exited GFM loop
fchr[A]: 0
fchr[C]: 846358
fchr[G]: 1500536
fchr[T]: 2169082
fchr[$]: 3034096
Exiting GFM::buildToDisk()
Returning from initFromVector
Wrote 5210376 bytes to primary GFM file: /home/mrb/MRB/RSeq_All_v6/Data_Example/circRNA/test/circ/test_index.1.ht2
Wrote 758532 bytes to secondary GFM file: /home/mrb/MRB/RSeq_All_v6/Data_Example/circRNA/test/circ/test_index.2.ht2
Re-opening _in1 and _in2 as input streams
Returning from GFM constructor
Returning from initFromVector
Wrote 1829845 bytes to primary GFM file: /home/mrb/MRB/RSeq_All_v6/Data_Example/circRNA/test/circ/test_index.5.ht2
Wrote 765828 bytes to secondary GFM file: /home/mrb/MRB/RSeq_All_v6/Data_Example/circRNA/test/circ/test_index.6.ht2
Re-opening _in5 and _in5 as input streams
Returning from HierEbwt constructor
Headers:
len: 3034096
gbwtLen: 3034097
nodes: 3034097
sz: 758524
gbwtSz: 758525
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 0
eftabSz: 0
ftabLen: 1048577
ftabSz: 4194308
offsLen: 189632
offsSz: 758528
lineSz: 64
sideSz: 64
sideGbwtSz: 48
sideGbwtLen: 192
numSides: 15803
numLines: 15803
gbwtTotLen: 1011392
gbwtTotSz: 1011392
reverse: 0
linearFM: Yes
Total time for call to driver() for forward index: 00:01:10
[Wed 2021-03-31 13:02:13] [INFO ] De novo alignment for circular RNAs ..
4383736 reads; of these:
4383736 (100.00%) were paired; of these:
4361810 (99.50%) aligned concordantly 0 times
4958 (0.11%) aligned concordantly exactly 1 time
16968 (0.39%) aligned concordantly >1 times
----
4361810 pairs aligned concordantly 0 times; of these:
63 (0.00%) aligned discordantly 1 time
----
4361747 pairs aligned 0 times concordantly or discordantly; of these:
8723494 mates make up the pairs; of these:
8692404 (99.64%) aligned 0 times
6241 (0.07%) aligned exactly 1 time
24849 (0.28%) aligned >1 times
0.86% overall alignment rate
[bam_sort_core] merging from 0 files and 70 in-memory blocks...
[Wed 2021-03-31 13:08:03] [INFO ] Detecting reads containing Back-splicing signals
[Wed 2021-03-31 13:08:04] [INFO ] Detecting FSJ reads from genome alignment file
[Wed 2021-03-31 13:08:57] [INFO ] Merge bsj and fsj results
[Wed 2021-03-31 13:08:57] [INFO ] Loading annotation gtf ..
[Wed 2021-03-31 13:09:03] [INFO ] Output circRNA expression values
[Wed 2021-03-31 13:09:04] [INFO ] circRNA Expression profile: test.gtf
[Wed 2021-03-31 13:09:04] [INFO ] Finished!

Differentially expressed circRNAs with biological replicate

Hello, I used CIRIquant to calculate fold changes between two conditions.

I used the following sample list (a text file) pointing to the CIRIquant output files:
24_PAU_Case_1 ./AK10_S10.gtf T 1
24_PAU_Case_2 ./AK11_S11.gtf T 2
24_PAU_Case_3 ./AK12_S12.gtf T 3
0_PAU_Control_1 ./AK4_S4.gtf C 1
0_PAU_Control_2 ./AK5_S5.gtf C 2
0_PAU_Control_3 ./AK6_S6.gtf C 3

The command below was used to calculate DE circRNAs:
CIRI_DE_replicate --lib 0_24h_PAU_lib.csv --bsj 0_24h_PAU_circRNA_bsj.csv --gene 0_24_PAU_gene_count_matrix.csv --out 0_24_PAU_DE_Cir.tsv

It generated the following output file of DE circRNAs:

Circ_ID logFC logCPM LR PValue DE FDR
4:31022314|31029529 -7.64922634263453 -0.161626549205444 15.0799339995342 0.000103052931416515 0 0.0867705682527054
12:11277575|11320618 6.69328375381015 -1.50848617923073 13.7678102531273 0.000206850296297323 0 0.0870839747411729
12:11262912|11265838 7.65742002735099 -1.98591121764622 12.4043376775065 0.00042833770549839 0 0.120220116009881
3:19812063|19865665 -5.97369371848626 -1.97194836747005 9.66092614644816 0.00188228702661538 0 0.396221419102538
6:3094946|3106151 -6.41385190127333 -3.14131472782527 9.1566548978384 0.00247815160571721 0 0.417320730402778
10:18675348|18689873 7.72459306313907 -1.49011568946911 8.45368794039285 0.00364303965091808 0 0.42321854074561
5:7746966|7747829 5.77609936483112 -3.49785553816328 7.09539540202189 0.00772822404188243 0 0.42321854074561
5:19412253|19472056 -6.72256499136888 -2.65499597431979 7.0899583310074 0.00775170495334329 0 0.42321854074561
9:13492161|13598089 -6.30654725571083 -2.98243003713085 6.46588024949127 0.0109965158338529 0 0.42321854074561

Question:
Does a positive logFC value indicate that a circRNA is up-regulated in the case condition or in the control condition?

For example, does a logFC of 6.69328375381015 mean the circRNA is up-regulated in the case or in the control condition?

Can CIRIquant be used with single-end data?

The arguments of CIRIquant seem to anticipate only paired-end data.

Can CIRI_DE_replicate be used for differential expression analysis with DS_score?

For a study without biological replicates, CIRI_DE generates a DS_score.

In the CIRI_DE_replicate output, only DE is reported. How can we get a DS_score using CIRI_DE_replicate?

Thanks

.log shows successful completion, but there is no output file

Hi!

I ran this command:


        CIRIquant -t 4 \
            -1 {input.read1} \
            -2 {input.read2} \
            --config {input.yaml} \
            --library-type {params.library_type} \
            -o {params.outdir} \
            -p {params.name} \
            -t 12 \
            --RNaseR {input.rnase_treated_gtf}

The log showed:

[Sun 2022-06-05 20:56:22] [INFO ] De novo alignment for circular RNAs ..
67704673 reads; of these:
  67704673 (100.00%) were paired; of these:
    47403061 (70.01%) aligned concordantly 0 times
    1982294 (2.93%) aligned concordantly exactly 1 time
    18319318 (27.06%) aligned concordantly >1 times
    ----
    47403061 pairs aligned concordantly 0 times; of these:
      4257 (0.01%) aligned discordantly 1 time
    ----
    47398804 pairs aligned 0 times concordantly or discordantly; of these:
      94797608 mates make up the pairs; of these:
        87639670 (92.45%) aligned 0 times
        1025015 (1.08%) aligned exactly 1 time
        6132923 (6.47%) aligned >1 times
35.28% overall alignment rate
[bam_sort_core] merging from 96 files and 12 in-memory blocks...
[Sun 2022-06-05 22:08:21] [INFO ] Detecting reads containing Back-splicing signals
[Sun 2022-06-05 22:17:12] [INFO ] Detecting FSJ reads from genome alignment file
[Sun 2022-06-05 22:31:51] [INFO ] Merge bsj and fsj results
[Sun 2022-06-05 22:31:58] [INFO ] RNase R treatment coefficient correction
[Sun 2022-06-05 22:31:58] [INFO ] Fitting Model
[Sun 2022-06-05 22:32:27] [INFO ] Generate prior distribution ..
[Sun 2022-06-05 22:32:33] [INFO ] Loading annotation gtf ..
[Sun 2022-06-05 22:32:46] [INFO ] Output circRNA expression values
[Sun 2022-06-05 22:34:35] [WARNING] chrom of contig "GL000009.2" not in annotation gtf, please check
[Sun 2022-06-05 22:34:35] [WARNING] chrom of contig "GL000219.1" not in annotation gtf, please check
[Sun 2022-06-05 22:34:35] [WARNING] chrom of contig "KI270742.1" not in annotation gtf, please check
[Sun 2022-06-05 22:34:35] [WARNING] chrom of contig "KI270744.1" not in annotation gtf, please check
[Sun 2022-06-05 22:34:44] [INFO ] circRNA Expression profile: APO-50-A.APO-50-R.gtf
[Sun 2022-06-05 22:34:44] [INFO ] Finished!

However, the output file APO-50-A.APO-50-R.gtf does not exist in the filesystem, although the .bam file does.

Limitation on Number of Samples Used/DE Filtering Issue?

Hello,

I have a question about whether there is a limit to the number of samples that can be used with the CIRI_DE_replicate command in the differential expression portion of the CIRIquant pipeline. I'm currently trying to analyze a data set of about 10500 samples (including biological replicates) and have noticed some strange results in my circRNA_de.tsv output file. There were only six DE circRNAs, which I initially assumed was because of the larger sample size. However, when scrolling through the .tsv, I saw that over half (646,400 or 61%) of the circRNAs listed had logFC = 0, PValue = 1, and FDR = 1. I have run multiple other data sets with CIRIquant and have never had this issue before.

Is it possible that either there is a limitation on the number of samples that can be parsed, or that there is an issue with the filtering step in the DE pathway that let these non-significant circRNA through?

Thank you for your time & any suggestions!
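To put a number on the rows described above, a small script can count how many entries of the DE table have logFC = 0, PValue = 1 and FDR = 1. This is only a sketch: it assumes the file is tab-separated and carries the column names seen in other CIRI_DE_replicate outputs on this page (Circ_ID, logFC, logCPM, LR, PValue, DE, FDR). The function name `count_flat_rows` and the example rows are made up for illustration.

```python
import csv
import io


def count_flat_rows(handle):
    """Count rows with logFC == 0, PValue == 1 and FDR == 1 in a tab-separated DE table.

    Assumes a header row containing the columns logFC, PValue and FDR.
    Returns (flat_rows, total_rows).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    flat = total = 0
    for row in reader:
        total += 1
        if (float(row["logFC"]) == 0.0
                and float(row["PValue"]) == 1.0
                and float(row["FDR"]) == 1.0):
            flat += 1
    return flat, total


# Made-up rows for illustration only; they are not taken from a real run.
example = io.StringIO(
    "Circ_ID\tlogFC\tlogCPM\tLR\tPValue\tDE\tFDR\n"
    "1:100|200\t0\t-3.1\t0\t1\t0\t1\n"
    "1:300|400\t2.5\t-1.2\t9.8\t0.0017\t0\t0.2\n"
)
flat, total = count_flat_rows(example)
print(f"{flat} of {total} rows have logFC = 0, PValue = 1, FDR = 1")
```

For a real file, the same function can be called on `open("circRNA_de.tsv", newline="")`. Comparing the flagged Circ_ID values against the BSJ count matrix may then show whether those circRNAs share a common pattern.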

Error while running CIRIquant

Hi, I am trying to run CIRIquant using the following command:

        CIRIquant -t 4 \
            -1 read1.fastq \
            -2 read2.fastq \
            --config config_file.yml \
            -o ~/result \
            -p test

Here is what my configuration file looks like:

tools:
  bwa: /home/nyellapu/.conda/envs/CIRI/bin/bwa
  hisat2: /home/nyellapu/.conda/envs/CIRI/bin/hisat2
  stringtie: /home/nyellapu/.conda/envs/CIRI/bin/stringtie
  samtools: /home/nyellapu/.conda/envs/CIRI/bin/samtools

reference:
  fasta: /panfs/pfs.local/work/biostat/nyellapu/reference/mouse_mm39/GCF_000001635.27_GRCm39_genomic.fa
  gtf: /panfs/pfs.local/work/biostat/nyellapu/reference/mouse_mm39/mouse_mm39.gtf
  bwa_index: /panfs/pfs.local/work/biostat/nyellapu/reference/mouse_mm39/bwa_index/bwa
  hisat_index: /panfs/pfs.local/work/biostat/nyellapu/reference/mouse_mm39/hisat_index/grcm38/genome

I am getting the following error when running it:

Traceback (most recent call last):
  File "/home/nyellapu/.conda/envs/CIRI/bin/CIRIquant", line 8, in <module>
    sys.exit(main())
  File "/home/nyellapu/.conda/envs/CIRI/lib/python2.7/site-packages/CIRIquant/main.py", line 89, in main
    config = check_config(check_file(args.config_file))
  File "/home/nyellapu/.conda/envs/CIRI/lib/python2.7/site-packages/CIRIquant/utils.py", line 109, in check_config
    return config['name']
KeyError: 'name'

Please help me resolve this. I have been struggling with it and can't understand why this error occurs.
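The traceback ends at `return config['name']`, which suggests that the loaded configuration has no top-level `name` entry. The file shown in this issue starts directly with `tools:`. A quick pre-flight check along these lines can report missing entries before CIRIquant is started. It is a sketch under stated assumptions: the list of expected keys is inferred from the traceback and from the config in this issue, not taken from CIRIquant's own validation code. The function name `missing_config_keys`, the placeholder paths, and the `mm39` label are hypothetical.

```python
# Keys the config is assumed to need (inferred from this thread, not from CIRIquant's source).
EXPECTED_KEYS = {
    "name": [],
    "tools": ["bwa", "hisat2", "stringtie", "samtools"],
    "reference": ["fasta", "gtf", "bwa_index", "hisat_index"],
}


def missing_config_keys(config):
    """Return dotted names of expected keys that are absent from a parsed config dict."""
    missing = []
    for key, subkeys in EXPECTED_KEYS.items():
        if key not in config:
            missing.append(key)
            continue
        section = config[key] or {}
        for sub in subkeys:
            if sub not in section:
                missing.append(f"{key}.{sub}")
    return missing


# Same layout as the config in this issue, with placeholder paths.
config = {
    "tools": {
        "bwa": "/path/to/bwa",
        "hisat2": "/path/to/hisat2",
        "stringtie": "/path/to/stringtie",
        "samtools": "/path/to/samtools",
    },
    "reference": {
        "fasta": "/path/to/genome.fa",
        "gtf": "/path/to/annotation.gtf",
        "bwa_index": "/path/to/bwa_index/prefix",
        "hisat_index": "/path/to/hisat_index/prefix",
    },
}
print(missing_config_keys(config))  # ['name']
```

If the check reports `name`, adding a first line such as `name: mm39` to the YAML file (any label for the reference should do under this assumption) would give `check_config` the key it looks up.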

edgeR error in CIRI_DE_replicate

I am performing differential expression analysis using the CIRIquant pipeline. Following the CIRIquant cookbook, I successfully reached this step:

"Usage 3: Differential expression analysis - Study with biological replicates - Step3: Differential expression analysis"

However, an edgeR error occurs (see below). Can someone explain what I may have done wrong?

Command used:

CIRI_DE_replicate --lib library_info.csv --bsj circRNA_bsj.csv --gene gene_count_matrix.csv --out circRNA_de.tsv
Log:

[Wed 2021-03-17 22:11:31] [INFO ] Library information: /home/andregabriel/Desktop/CIRI_test/library_info.csv
[Wed 2021-03-17 22:11:31] [INFO ] circRNA expression matrix: /home/andregabriel/Desktop/CIRI_test/circRNA_bsj.csv
[Wed 2021-03-17 22:11:31] [INFO ] gene expression matrix: /home/andregabriel/Desktop/CIRI_test/gene_count_matrix.csv
[Wed 2021-03-17 22:11:31] [INFO ] Output DE results: /home/andregabriel/Desktop/CIRI_test/circRNA_de.tsv

Warning message:
In estimateDisp.default(y = y$counts, design = design, group = group,  :
  No residual df: setting dispersion to NA
Error in glmFit.default(y = y$counts, design = design, dispersion = dispersion,  : 
  Design matrix not of full rank.  The following coefficients not estimable:
 treatT
Calls: glmFit -> glmFit.DGEList -> glmFit -> glmFit.default
Execution halted
[Wed 2021-03-17 22:11:35] [INFO ] Finished!
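The two R messages ("No residual df" and "Design matrix not of full rank ... treatT") are what a generalized linear model reports when the design has at least as many coefficients as samples. One way this can happen is when every sample is given its own subject label in the library information file. The sketch below illustrates that with plain Python. It assumes a design with an intercept, one indicator per subject level, and a treatment indicator. The actual formula used by CIRI_DE_replicate is not shown in this thread, so treat this as an illustration rather than a diagnosis. The helper names are hypothetical.

```python
from fractions import Fraction


def design_matrix(subjects, treats):
    """Build rows of [intercept, subject indicators (first level dropped), treat == 'T']."""
    levels = sorted(set(subjects))[1:]
    rows = []
    for subject, treat in zip(subjects, treats):
        row = [1]
        row += [1 if subject == level else 0 for level in levels]
        row.append(1 if treat == "T" else 0)
        rows.append(row)
    return rows


def matrix_rank(rows):
    """Rank via exact Gaussian elimination on fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


treats = list("TTTCCC")

# Paired layout: subjects 1-3 appear once in each group.
paired = design_matrix([1, 2, 3, 1, 2, 3], treats)
print(len(paired[0]), matrix_rank(paired))  # 4 columns, rank 4: full rank

# Every sample has its own subject label.
unique = design_matrix([1, 2, 3, 4, 5, 6], treats)
print(len(unique[0]), matrix_rank(unique))  # 7 columns, rank 6: not full rank
```

In the second layout the treatment column is a linear combination of the intercept and subject columns, so its coefficient cannot be estimated, and six samples leave no residual degrees of freedom. Checking the subject column of library_info.csv against this pattern is a cheap first step.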

Issue running prepDE.py

Hi,
I was trying to execute the script prepDE.py and I get the following error:

Traceback (most recent call last):
  File "utils/prepDE.py", line 266, in <module>
    transcript_len+=int(v[4])-int(v[3])+1 #because end coordinates are inclusive in GTF
ValueError: invalid literal for int() with base 10: '14374r\x16("&/\xcf\xbe\xa9\xb16A\xf9\x04\x14\x1f\x89~\x13\xa5\x83\xff\xf9\xa6l\xa0\xeb\x97\xdb\x7f\x16\xe2|\x80]\xedr?\x0b\xd5\x7f\xb0\xac\x93\x0b5O\x13Q\xceR\x11\x0e\x89\xba}\xbf\x83aC\x8edW\x15\xe4\xfa}\xf3]'

Do you have any idea why this error is occurring and what I may do to address it?
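The value that `int()` rejects starts with digits and continues with raw bytes, which points at a damaged line in one of the GTF files that prepDE.py reads, not at the script itself. A scan like the one below can locate such lines. It is a sketch, assuming standard nine-column, tab-separated GTF with start and end in columns 4 and 5 (the `v[3]` and `v[4]` of the traceback). The function name `find_bad_gtf_lines` and the example records are hypothetical.

```python
def find_bad_gtf_lines(lines):
    """Return (line_number, problem) pairs for GTF records whose start/end are unusable."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            problems.append((number, "fewer than 9 columns"))
            continue
        for label, value in (("start", fields[3]), ("end", fields[4])):
            try:
                int(value)
            except ValueError:
                problems.append((number, f"{label} is not an integer"))
                break
    return problems


# Made-up records: the second one imitates the corruption seen in the traceback.
example = [
    'chr1\tStringTie\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n',
    'chr1\tStringTie\texon\t14374r\x16\t200\t.\t+\t.\tgene_id "g1";\n',
]
print(find_bad_gtf_lines(example))
```

On a real file, opening it with `open(path, encoding="utf-8", errors="replace")` lets the scan continue past undecodable bytes. Any sample whose file is flagged probably needs to be regenerated, for example if the job that wrote it was interrupted or the disk filled up.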

Cannot serialize a string larger than 2 GB

I encountered this error and hope you can help solve it. Many thanks.

The last line of the above figure shows the reads in the input file.

full-length sequence of circRNA

Dear Professor,

Hello, thank you for your team's development of CIRIquant, an excellent tool. I have a question that needs your help. I want to know how to obtain the mature full-length sequence of circRNA. Looking forward to your detailed reply!

How to measure whether circRNAs are significantly different

Thank you very much for the very useful tool you have developed.
I used the CIRI_DE module to perform differential circRNA analysis without biological replicates. The output file has more than 4800 lines in total, while the control and treatment inputs together contain only a little over 5000 circRNAs. Are all of these significantly different circRNAs, or should I filter further? What are the biological meaning and recommended thresholds of DE_score and DS_score?

About BSJs and FSJs

This is more of a conceptual question, but I would like to know whether BSJs and FSJs are counted per read or per million reads.
How does the algorithm count the number of BSJs?
Thanks a lot.

ValueError: file header is empty (mode='rb') - is it SAM/BAM format?

Hello there, I ran into an issue when running CIRIquant.

CIRIquant -t 16 -1 control_1_1.fastq -2 control_1_2.fastq --config CIRI_quant.yaml -o ./test -p test

[Sun 2023-02-12 22:06:55] [INFO ] Input reads: control_1_1.fastq,control_1_2.fastq
[Sun 2023-02-12 22:06:55] [INFO ] Library type: unstranded
[Sun 2023-02-12 22:06:55] [INFO ] Output directory: /home/choi/test, Output prefix: test
[Sun 2023-02-12 22:06:55] [INFO ] Config: mm10 Loaded
[Sun 2023-02-12 22:06:55] [INFO ] 16 CPU cores availble, using 16
[Sun 2023-02-12 22:06:55] [INFO ] Align RNA-seq reads to reference genome ..
[Sun 2023-02-12 22:33:55] [INFO ] Estimate gene abundance ..
[Sun 2023-02-12 22:36:05] [INFO ] No circRNA information provided, run CIRI2 for junction site prediction ..
[Sun 2023-02-12 22:36:05] [INFO ] Running BWA-mem mapping candidate reads ..
[Sun 2023-02-12 22:51:15] [INFO ] Running CIRI2 for circRNA detection ..
[Sun 2023-02-12 23:00:34] [INFO ] Extract circular sequence
[Sun 2023-02-12 23:00:34] [100% ] [##################################################]
[Sun 2023-02-12 23:00:34] [INFO ] Building circular index ..
[Sun 2023-02-12 23:00:36] [INFO ] De novo alignment for circular RNAs ..
[Sun 2023-02-12 23:00:41] [INFO ] Detecting reads containing Back-splicing signals
Traceback (most recent call last):
  File "/root/miniconda3/envs/CIRI/bin/CIRIquant", line 10, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/CIRI/lib/python2.7/site-packages/CIRIquant/main.py", line 183, in main
    out_file = circ.proc(log_file, thread, bed_file, hisat_bam, rnaser_file, reads, outdir, prefix, anchor, lib_type)
  File "/root/miniconda3/envs/CIRI/lib/python2.7/site-packages/CIRIquant/circ.py", line 655, in proc
    cand_bsj = proc_denovo_bam(denovo_bam, thread, circ_info, anchor, lib_type)
  File "/root/miniconda3/envs/CIRI/lib/python2.7/site-packages/CIRIquant/circ.py", line 306, in proc_denovo_bam
    sam = pysam.AlignmentFile(bam_file, 'rb')
  File "pysam/calignmentfile.pyx", line 318, in pysam.calignmentfile.AlignmentFile.__cinit__ (pysam/calignmentfile.c:4730)
  File "pysam/calignmentfile.pyx", line 574, in pysam.calignmentfile.AlignmentFile._open (pysam/calignmentfile.c:7746)
ValueError: file header is empty (mode='rb') - is it SAM/BAM format?

How can I fix the above issue?

Thank you.
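In the log above, the de novo alignment step finishes within about five seconds and prints no alignment summary, so the BAM file that pysam then tries to open may be empty or truncated. Since BAM files are BGZF-compressed, a valid one starts with the gzip magic bytes 0x1f 0x8b, which makes a quick sanity check possible without any bioinformatics library. This is a minimal sketch: the function name `check_bam_file` is hypothetical, the temporary file only stands in for a real BAM, and the actual file to inspect depends on the CIRIquant output directory.

```python
import os
import tempfile


def check_bam_file(path):
    """Return (ok, reason) from a cheap look at the file; this is not a full BAM validation."""
    if not os.path.exists(path):
        return False, "file does not exist"
    if os.path.getsize(path) == 0:
        return False, "file is empty"
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic != b"\x1f\x8b":
        return False, "file does not start with the gzip/BGZF magic bytes"
    return True, "file starts with the gzip/BGZF magic bytes"


# Demonstration on a temporary empty file standing in for a failed alignment output.
with tempfile.NamedTemporaryFile(suffix=".bam", delete=False) as tmp:
    empty_path = tmp.name
print(check_bam_file(empty_path))
os.remove(empty_path)
```

If the de novo BAM fails this check, the alignment or sorting command probably failed silently, so the CIRIquant log file in the output directory and the available disk space are reasonable next things to look at.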
