kingsford-group / squid

SQUID detects both fusion-gene and non-fusion-gene structural variations from RNA-seq data

License: BSD 3-Clause "New" or "Revised" License

C++ 95.90% Makefile 0.14% Python 3.65% Shell 0.31%

rna-seq structural-variation fusion-genes non-fusion-genes

squid's Introduction

OVERVIEW

SQUID is designed to detect both fusion-gene and non-fusion-gene transcriptomic structural variations from RNA-seq alignment.

The SQUID paper is published in Genome Biology. To reproduce the results of applying SQUID to simulated data and previously studied cell lines, follow the instructions in squidtest.

INSTALLING PRE-COMPILED BINARIES

You do NOT need to install SQUID before using it. Find the binary release here!

BUILDING FROM SOURCE

You only need to build from source if either the pre-built binaries (see above) don't work on your system or you want to make a change to the SQUID code.

Compiling SQUID requires Boost, GLPK, and BamTools. Step-by-step installation instructions can be found here for Linux, and here for Mac.

On Mac, you additionally need to run the following commands so that the dependent libraries can be linked dynamically. Each command prepends to the existing value, so the second export does not overwrite the first:

export DYLD_LIBRARY_PATH=<bamtools_folder>/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=<glpk_folder>/lib:$DYLD_LIBRARY_PATH

USAGE

SQUID takes a sorted BAM file of RNA-seq alignments and outputs the detected transcriptomic structural variations (TSVs). When the concordant and chimeric alignments are separated into two BAM files, as in the case of STAR alignment, the concordant BAM file must be sorted. The command to run SQUID and its parameters are as follows.

squid [options] -b <Input_sorted_BAM> -o <Output_Prefix>

Parameters	Default value	Data type	Description
-c		string
-f		string
-pt	0	bool	Phred type: 0 for Phred33, 1 for Phred64
-pl	10	int	Maximum Length of continuous low Phred score to filter alignment
-pm	4	int	Threshold to count as low Phred score
-mq	1	int	Minimum mapping quality
-dp	50000	int	Maximum paired-end aligning distance to be counted as a concordant alignment
-di	20	int	Maximum distance of segment indexes to be counted as read-through
-w	5	int	Minimum edge weight
-r	8	double	Discordant edge ratio multiplier (normal/tumor cell ratio)
-a	5	int	Max allowed degree
-G	0	bool	Whether or not to output the graph file (0 for not outputting, 1 for outputting)
-CO	0	bool	Whether or not to output the ordering of connected components (0 for not outputting, 1 for outputting)
-TO	0	bool	Whether or not to output the ordering of all segments (0 for not outputting, 1 for outputting)
-RG	0	bool	Whether or not to output the rearranged genome sequence (0 for not outputting, 1 for outputting)
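
The three Phred options (-pt, -pl, -pm) are described only briefly in the table. The sketch below shows one plausible reading of them in Python; it is not SQUID's actual C++ implementation. The function names are hypothetical, and two points are assumptions rather than documented behavior: a score counts as "low" when it is strictly below -pm, and an alignment is filtered when its longest run of consecutive low scores exceeds -pl. Only the ASCII offsets (33 for Phred33, 64 for Phred64) are standard.

```python
from typing import List


def phred_scores(quality: str, phred_type: int = 0) -> List[int]:
    """Decode a quality string; phred_type follows -pt (0: Phred33, 1: Phred64)."""
    offset = 64 if phred_type == 1 else 33
    return [ord(ch) - offset for ch in quality]


def longest_low_quality_run(scores: List[int], threshold: int = 4) -> int:
    """Length of the longest run of consecutive scores below `threshold` (-pm).

    Assumption: "low" means strictly below the threshold.
    """
    longest = current = 0
    for score in scores:
        if score < threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def passes_phred_filter(quality: str, phred_type: int = 0,
                        max_low_run: int = 10, threshold: int = 4) -> bool:
    """Assumption: an alignment is kept when its longest low-quality run
    does not exceed `max_low_run` (-pl)."""
    scores = phred_scores(quality, phred_type)
    return longest_low_quality_run(scores, threshold) <= max_low_run
```

With the default values, a read whose quality string contains eleven consecutive "#" characters (Phred33 score 2) would be filtered under these assumptions, while ten in a row would still pass.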

OUTPUT SPECIFICATION

<Output_Prefix>_sv.txt: a list of predicted TSVs in BEDPE format. This is the main output of SQUID. All positions in the file are 0-based. The columns are:
- chr1: chromosome name of the first breakpoint.
- start1: starting position of the segment of the first breakpoint, or the predicted breakpoint position if strand1 is "-".
- end1: ending position of the segment of the first breakpoint, or the predicted breakpoint position if strand1 is "+".
- chr2: chromosome name of the second breakpoint.
- start2: starting position of the segment of the second breakpoint, or the predicted breakpoint position if strand2 is "-".
- end2: ending position of the segment of the second breakpoint, or the predicted breakpoint position if strand2 is "+".
- name: TSVs are not named yet, so this column contains a dot.
- score: number of reads supporting this TSV (not weighted by the discordant edge ratio multiplier).
- strand1: strand of the first segment in TSV.
- strand2: strand of the second segment in TSV.
- num_concordantfrag_bp1: number of concordant paired-end reads covering the first breakpoint. A concordant paired-end read consists of two ends and an inserted region in between; if any of the 3 regions covers the breakpoint, the read is counted in this number.
- num_concordantfrag_bp2: number of concordant paired-end reads covering the second breakpoint. The count is defined in the same way as num_concordantfrag_bp1.

Example records:

17 38135881 38136308 17 38137195 38137773 . 685 + + 106 221

This means the right end (position 38136308) of segment 38135881-38136308 on chr17 is connected to the right end (position 38137773) of segment 38137195-38137773 also on chr17. The number of supporting reads for this TSV is 685. There are 106 concordant paired-end reads covering the first breakpoint (chr17 38136308), and 221 concordant paired-end reads covering the second breakpoint (chr17 38137773).

5 176370330 176370489 8 128043988 128044089 . 328 - + 588 1029

This means the left end (position 176370330) of segment 176370330-176370489 on chr5 is connected with the right end (position 128044089) of segment 128043988-128044089 on chr8. There are 588 concordant reads covering the first breakpoint (chr5 176370330), and 1029 concordant reads covering the second breakpoint (chr8 128044089).
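
The column rules above can be sketched in Python as follows. This is a minimal illustration, not part of SQUID: the `TSV` class and both function names are hypothetical, and fields are assumed to be separated by whitespace. The breakpoint rule is the one stated in the column descriptions: the end of the segment when the strand is "+", the start when it is "-".

```python
from typing import NamedTuple, Tuple


class TSV(NamedTuple):
    """One record of <Output_Prefix>_sv.txt (hypothetical container)."""
    chr1: str
    start1: int
    end1: int
    chr2: str
    start2: int
    end2: int
    name: str
    score: int
    strand1: str
    strand2: str
    num_concordantfrag_bp1: int
    num_concordantfrag_bp2: int


def parse_sv_line(line: str) -> TSV:
    """Parse one data line; assumes whitespace-separated fields."""
    f = line.split()
    if len(f) < 12:
        raise ValueError("expected at least 12 columns, got %d" % len(f))
    return TSV(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), int(f[5]),
               f[6], int(f[7]), f[8], f[9], int(f[10]), int(f[11]))


def breakpoints(tsv: TSV) -> Tuple[Tuple[str, int], Tuple[str, int]]:
    """Breakpoint is the segment end for strand "+" and the start for "-"."""
    bp1 = tsv.end1 if tsv.strand1 == "+" else tsv.start1
    bp2 = tsv.end2 if tsv.strand2 == "+" else tsv.start2
    return (tsv.chr1, bp1), (tsv.chr2, bp2)
```

Applied to the two example records, `breakpoints` gives chr17:38136308 with chr17:38137773, and chr5:176370330 with chr8:128044089, matching the explanations above.
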
<Output_Prefix>_graph.txt: the genome segment graph, output only if -G is set to 1. It has two types of records: nodes (segments) and edges.
- node: each node record includes the ID, start position, end position, and label of its connected component.
- edge: each edge record includes the ID, node ID of the first segment, strand of the first segment, node ID of the second segment, strand of the second segment, and edge weight.
<Output_Prefix>_component_pri.txt: the ordering of each connected component found by ILP, output only if -CO is set to 1. In this file, the ordering of each connected component is written on one line. Each segment is represented by its ID, prefixed with "-" if the ordering suggests the segment should use its reverse strand.
<Output_Prefix>_component.txt: the ordering of all genome segments, output only if -TO is set to 1. In this file, each newly generated chromosome is written on one line. Other specifications are the same as for <Output_Prefix>_component_pri.txt.
<Output_Prefix>_genome.fa: the rearranged genome sequence, output only if -RG is set to 1. Each chromosome corresponds to one line in <Output_Prefix>_component.txt.
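
A line of the ordering files could be read with a sketch like the one below. The function name is hypothetical, and the separator between segment IDs is not specified above, so whitespace is assumed. The sign is read from the text rather than from the integer value, so a reversed segment with ID 0 (written "-0") is still recognized.

```python
from typing import List, Tuple


def parse_ordering_line(line: str) -> List[Tuple[int, str]]:
    """Turn one ordering line into (segment_id, strand) pairs.

    Assumption: IDs are separated by whitespace. A leading "-" marks a
    segment that should be used on its reverse strand.
    """
    segments = []
    for token in line.split():
        if token.startswith("-"):
            segments.append((int(token[1:]), "-"))
        else:
            segments.append((int(token), "+"))
    return segments
```

For a made-up line such as "3 -7 12", this returns segment 3 forward, segment 7 reversed, and segment 12 forward.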

EXAMPLE WORKFLOW

Suppose you have the alignment BAM file and the chimeric BAM file generated by STAR (https://github.com/alexdobin/STAR). Run SQUID with:

squid -b alignment.bam -c chimeric.bam -o squidout

Or, if you have a combined BAM file of both concordant and discordant alignments generated by BWA (http://bio-bwa.sourceforge.net/) or SpeedSeq (https://github.com/hall-lab/speedseq), run SQUID with:

squid --bwa -b combined_alignment.bam -o squidout
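
When SQUID is run from a pipeline, the two invocations above can be assembled programmatically. The helper below is a hypothetical sketch, not part of SQUID; it uses only the flags that appear in this README (-b, -c, -o, --bwa, and the options from the parameter table) and assumes the `squid` executable is on your path.

```python
from typing import Dict, List, Optional


def build_squid_command(bam: str, prefix: str,
                        chimeric_bam: Optional[str] = None,
                        bwa: bool = False,
                        options: Optional[Dict[str, object]] = None,
                        executable: str = "squid") -> List[str]:
    """Build an argument list for SQUID (hypothetical helper).

    `options` maps flags from the parameter table to values,
    for example {"-w": 5, "-G": 1}.
    """
    cmd = [executable]
    if bwa:
        cmd.append("--bwa")
    for flag, value in (options or {}).items():
        cmd += [flag, str(value)]
    cmd += ["-b", bam]
    if chimeric_bam is not None:
        cmd += ["-c", chimeric_bam]
    cmd += ["-o", prefix]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Using a list rather than a single shell string avoids quoting problems with file names that contain spaces.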

To run an example, download the sample data (sampledata.tgz) from (https://cmu.box.com/s/e9u6alp73rfdhfve2a51p6v391vweodq) into the example folder, and decompress it with:

tar -xzvf sampledata.tgz

Run the SQUID command in example/SQUIDcommand.sh. Or, if you want to test the workflow of STAR and SQUID, make sure STAR is in your path, and run the bash script example/STARnSQUIDcommand.sh.

cd example
./SQUIDcommand.sh
./STARnSQUIDcommand.sh

Annotate SQUID output

To label the predicted TSVs as fusion-gene or non-fusion-gene type, and retrieve the corresponding gene names of fusion-gene TSVs, you can use the following python script.

Python dependencies:

numpy

Usage:

python <squid_folder>/utils/AnnotateSQUIDOutput.py [options] <GTFfile> <SquidPrediction> <OutputFile>

Note that the GTF file must use the same chromosome names as the SQUID output, and must contain 3 attributes in the transcript record: transcript ID, gene ID, and gene symbol (or gene name).

Options	Default value	Data type	Description
--geneid	gene_id	string	GTF gene ID attribute string, the attribute name in GTF record that corresponds to the gene ID
--genesymbol	gene_name	string	GTF gene symbol attribute string, the attribute name in GTF record that corresponds to the gene symbol
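
Before running the annotation script, it can help to check that a GTF record carries the three required attributes. The sketch below is a hypothetical pre-check, independent of AnnotateSQUIDOutput.py. It assumes the standard tab-separated GTF layout with attributes in the ninth column, and it assumes the transcript ID attribute is named `transcript_id`; the other two keys default to the values in the options table.

```python
import re
from typing import Dict, List

# Matches key "value" pairs in the GTF attribute column.
_ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(attribute_field: str) -> Dict[str, str]:
    """Parse the ninth GTF column into a dictionary."""
    return dict(_ATTRIBUTE.findall(attribute_field))


def missing_attributes(gtf_line: str,
                       gene_id_key: str = "gene_id",
                       gene_symbol_key: str = "gene_name") -> List[str]:
    """Return the required attribute names absent from one GTF record.

    Assumption: the transcript ID attribute is called "transcript_id".
    """
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError("a GTF record needs 9 tab-separated columns")
    attributes = parse_gtf_attributes(fields[8])
    required = ["transcript_id", gene_id_key, gene_symbol_key]
    return [key for key in required if key not in attributes]
```

An empty list means the record has all three attributes. If your GTF uses different attribute names, pass them here in the same way you would pass --geneid and --genesymbol to the script.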

squid's Issues

Annotation of output

I have a question regarding annotation. I've been trying to convert the BEDPE format to bed file and then use AnnotSV tool to annotate the breakpoints. My issue is that every time I run the annotation I am not able to properly predict the positioning of the genes in the fusion.
Do you have any tips on how to annotate the output? My goal is pretty simple: build a list of fusion genes. Thank you 😄

Question: Does SQUID accept STAR `WithinBAM` output?

Dear all,

I've been trying to get my input files for SQUID, but I am not sure about the input. As far as I can see, SQUID accepts two types of input:

STAR: separate alignment and chimeric BAM files
BWA: combined BAM file of both concordant and discordant alignments

However, STAR also allows for a combined output (--chimOutType WithinBAM, either HardClip or SoftClip). Is this format compatible with SQUID BWA input (--bwa option)? And should it be HardClip or SoftClip?

thanks,

squid running errors

Hello, when I analyze my BWA mapped BAM files, I got the following error:

squid: src/SegmentGraph.cpp:2489: void SegmentGraph_t::CompressNode(): Assertion `LinkedNode.size()!=0' failed.
/var/spool/torque/mom_priv/jobs/10249993.mesabim3.msi.umn.edu.SC: line 9: 6359 Aborted (core dumped) squid -b lncap.sv_build.bam -o lncap_squid --bwa

The standard output is below:
Reference name 1 --> 0
Reference name 10 --> 9
Reference name 11 --> 10
Reference name 12 --> 11
Reference name 13 --> 12
Reference name 14 --> 13
Reference name 15 --> 14
Reference name 16 --> 15
Reference name 17 --> 16
Reference name 18 --> 17
Reference name 19 --> 18
Reference name 2 --> 1
Reference name 20 --> 19
Reference name 21 --> 20
Reference name 22 --> 21
Reference name 3 --> 2
Reference name 4 --> 3
Reference name 5 --> 4
Reference name 6 --> 5
Reference name 7 --> 6
Reference name 8 --> 7
Reference name 9 --> 8
Reference name MT --> 24
Reference name X --> 22
Reference name Y --> 23
[Fri Jan 4 10:03:53 2019] Starting reading bam file.
[Fri Jan 4 11:18:49 2019] Building nodes, finish seeding.
[Fri Jan 4 11:18:51 2019] Building nodes, finish expanding to whole genome.
[Fri Jan 4 11:18:57 2019] Finish calculating reads per node.
[Fri Jan 4 11:18:57 2019] Starting building edges.
[Fri Jan 4 11:33:39 2019] Finish raw edges.
[Fri Jan 4 11:33:39 2019] Finish filtering edges from multi-aligned reads.
[Fri Jan 4 11:33:39 2019] Finish adding partial aligned reads.
[Fri Jan 4 11:33:39 2019] Finish building edges.
Error: 0 nodes are connected by edges.

What is the reason for this error? Is it a memory issue?
I ran squid with 12 cores and 30 GB total memory. My BAM file is 7.5 GB. My command is:
squid -b lncap.sv_build.bam -o lncap_squid --bwa

Aborted core dumped error

Hello,

I ran Squid v1.5 using 12 cpus and 96 GB RAM with the following command:

squid -b AUR1.Aligned.sortedByCoord.out.bam -c AUR1_chimeric_sorted.bam -o AUR1.squid.fusions

Here is the output and it was aborted because of a core dump

Reference name 1	-->	0
Reference name 10	-->	1
Reference name 11	-->	2
Reference name 12	-->	3
Reference name 13	-->	4
Reference name 14	-->	5
Reference name 15	-->	6
Reference name 16	-->	7
Reference name 17	-->	8
Reference name 18	-->	9
Reference name 19	-->	10
Reference name 2	-->	11
Reference name 20	-->	12
Reference name 21	-->	13
Reference name 22	-->	14
Reference name 3	-->	15
Reference name 4	-->	16
Reference name 5	-->	17
Reference name 6	-->	18
Reference name 7	-->	19
Reference name 8	-->	20
Reference name 9	-->	21
Reference name MT	-->	22
Reference name X	-->	23
Reference name Y	-->	24
[Tue Dec 27 10:30:53 2022] Start reading bam file.
[Tue Dec 27 10:31:03 2022] Finish sorting Chimeric bam reads.
[Tue Dec 27 10:31:05 2022] Finish removing PCR duplicates.
[Tue Dec 27 10:31:07 2022] Building nodes. |bamdiscordant|=3076123
[Tue Dec 27 10:42:51 2022] Building nodes, finish seeding.
[Tue Dec 27 10:42:51 2022] Building nodes, finish expanding to whole genome.
[Tue Dec 27 10:42:51 2022] Building nodes, calculating read coverage for node 0.
[Tue Dec 27 10:42:52 2022] Finish calculating reads per node.
0	time=7e-06
1000000	time=17.1684
[Tue Dec 27 10:43:20 2022] Starting building edges.
[Tue Dec 27 10:51:54 2022] Finish raw edges.
[Tue Dec 27 10:51:54 2022] Finish filtering edges from multi-aligned reads.
[Tue Dec 27 10:51:55 2022] Finish building edges.
Maximum connected component size=2469
6525	10137
glp_add_rows: nrs = 1; too many rows
Error detected in file api/prob1.c at line 259
run.sh: line 2: 79383 Aborted                 (core dumped) squid -b AUR1.Aligned.sortedByCoord.out.bam -c AUR1_chimeric_sorted.bam -o AUR1.squid.fusions

Following Issue #4 (comment), I get no output from the following command so that means there are no reads with only the first read or second read aligned.
/usr/bin/diff <(samtools view -f64 -F4 AUR1_chimeric_sorted.bam | cut -f1 | sort | uniq) <(samtools view -f128 -F4 AUR1_chimeric_sorted.bam | cut -f1 | sort | uniq)

Here is the header from the Chimeric bam file so you can see the commands used to generate it:

@PG     ID:STAR PN:STAR VN:2.7.10a      CL:STAR   --runThreadN 12   --genomeDir star   --readFilesIn AUR1_S7_L004_R1_001.fastq.gz   AUR1_S7_L004_R2_001.fastq.gz      --readFilesCommand zcat      --outFileNamePrefix AUR1.   --outReadsUnmapped Fastx   --outSAMtype BAM   SortedByCoordinate      --outSAMstrandField intronMotif   --outSAMattrRGline ID:AUR1   SM:AUR1      --alignSJDBoverhangMin 10   --chimSegmentMin 20   --chimJunctionOverhangMin 12   --chimOutType SeparateSAMold      --sjdbGTFfile Homo_sapiens.GRCh38.102.gtf   --twopassMode Basic
@PG     ID:samtools     PN:samtools     PP:STAR VN:1.15.1       CL:samtools view --threads 5 -Sb -o AUR1_chimeric.bam AUR1.Chimeric.out.sam
@PG     ID:samtools.1   PN:samtools     PP:samtools     VN:1.15.1       CL:samtools sort -@ 6 -o AUR1_chimeric_sorted.bam -T AUR1_chimeric_sorted AUR1_chimeric.bam
@PG     ID:samtools.2   PN:samtools     PP:samtools.1   VN:1.14 CL:samtools view -H AUR1_chimeric_sorted.bam
@RG     ID:AUR1 SM:AUR1
@CO     user command line: STAR --genomeDir star --readFilesIn AUR1_S7_L004_R1_001.fastq.gz AUR1_S7_L004_R2_001.fastq.gz --runThreadN 12 --outFileNamePrefix AUR1. --sjdbGTFfile Homo_sapiens.GRCh38.102.gtf --outSAMattrRGline ID:AUR1 SM:AUR1 --twopassMode Basic --chimOutType SeparateSAMold --chimSegmentMin 20 --chimJunctionOverhangMin 12 --alignSJDBoverhangMin 10 --outReadsUnmapped Fastx --outSAMstrandField intronMotif --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat

I tried the same command on a different system and got a similar error except it was reported as a Segmentation fault.

What could be causing the problem? Thank you

Can't find the binary version of SQUID.

Hi,

I didn't find the binary version of SQUID under the release tag. Could you please upload one?

Thanks!

void segmentGraph aborted

I attempt to identify the variants in our samples using squid. I aligned the reads using STAR 2-pass including chimeric bam.

Squid run was aborted after build nodes, finish seeding with the following errors:

[Tue Oct 23 14:52:31 2018] Start reading bam file.
[Tue Oct 23 15:31:14 2018] Finish sorting Chimeric bam reads.
[Tue Oct 23 15:45:26 2018] Finish removing PCR duplicates.
[Tue Oct 23 15:49:56 2018] Building nodes. |bamdiscordant|=331264
[Tue Oct 23 15:50:05 2018] Building nodes, finish seeding.

squid: src/SegmentGraph.cpp:662: void SegmentGraph_t::BuildNode_STAR(const std::vector<int>&, SBamrecord_t&, std::string): Assertion `vNodes[i].Length>0 && vNodes[i].Position+vNodes[i].Length<=RefLength[vNodes[i].Chr]' failed.
/var/spool/pbs/mom_priv/jobs/7784253.wlm01.SC: line 36: 19875 Aborted squid -b PT_10_S3_L001Aligned.toTranscriptome.out.bam -c PT_10_S3_L002Aligned.sortedByCoord.out.bam -o PT_10_S3_L001_squid -G 1 -CO 1

Could you please suggest a fix?

Question: removing PCR duplicates

Dear all,

Should PCR duplicates be removed from the BAM file before running SQUID?
I see a message saying that SQUID is removing PCR duplicates, but it is quite fast, which makes me think that it is only removing reads that are already marked as duplicates. Is that so?

If not, should I remove duplicates from the BAM and chimeric BAM files?

thanks,

Error in Annotate

python AnnotateSQUIDOutput.py --genesymbol \
    /public/workspace/lily/INDEX-hg19/anno/ucsc_hg19_gene.gtf \
    /public/workspace/lily/squidout_sv.txt \
    /public/workspace/lily/zhao_res/squid/CGGA1003.squid.finalout

This is the command I used to annotate my result. However, there was an error:

[3, 5]
Missing GTFfile or SquidPrediction or OutputFile

So I don't know what's wrong; I had prepared the three files.

Any suggestions will be appreciated!

SQUID annotation error

I have repeatedly run into a problem while trying to annotate the output BEDPE file.

Here is my GTF file:

chr12 refGene exon 98746968 98747158 . + . gene_id "Zc3h14"; transcript_id "NM_001160107"; exon_number "1"; exon_id "NM_001160107.1"; gene_name "Zc3h14";
chr12 refGene exon 98747480 98747522 . + . gene_id "Zc3h14"; transcript_id "NM_001160107"; exon_number "2"; exon_id "NM_001160107.2"; gene_name "Zc3h14";

my command:
$ python AnnotateSQUIDOutput.py mm10.refGene.exon.gtf B.vcf_sv.txt B.anno.vcf

error:

Traceback (most recent call last):
  File "AnnotateSQUIDOutput.py", line 335, in <module>
    glocater = GeneLocater(Transcripts, GeneTransMap)
  File "AnnotateSQUIDOutput.py", line 179, in __init__
    assert(len(set(chrnames)) == 1)
AssertionError

Is there something I can do?
Thanks.

Long-read data

Hi,

Stupid question from a rookie: does this work for long-read data, like Oxford Nanopore?

Thank you!

Peny

Squid on bacterial genomes

Hi,

Is it possible to use SQUID to identify TSVs on RNA-seq data from bacteria?
I have been trying without success so far.
I tried to run SQUID on the BAM output from bwa mem but it raises the following message: "Assertion failed: (vNodes[i].Length>0 && vNodes[i].Position+vNodes[i].Length<=RefLength[vNodes[i].Chr]), function BuildNode_BWA, file src/SegmentGraph.cpp, line 1075. Abort trap: 6"

I also tried to run SQUID using both -b alignment.bam and -c chimeric.bam (output from Samblaster) but this time it raises a Segmentation fault:11 error (previously reported *).

Thanks for any help you could provide.

Best,

Thibault

docker image

Looks very interesting. Would it be possible for you guys to set up a Docker image? It would make installation so much easier, thanks.

cheers, Fabian

squid errors

[Mon Oct 7 13:51:39 2019] Start reading bam file.
[Mon Oct 7 13:51:58 2019] Finish sorting Chimeric bam reads.
[Mon Oct 7 13:52:00 2019] Finish removing PCR duplicates.
[Mon Oct 7 13:52:01 2019] Building nodes. |bamdiscordant|=502321
[Mon Oct 7 14:17:55 2019] Building nodes, finish seeding.
squid: src/SegmentGraph.cpp:664: void SegmentGraph_t::BuildNode_STAR(const std::vector<int>&, SBamrecord_t&, std::string): Assertion `(vNodes[i].Chr!=vNodes[i+1].Chr) || (vNodes[i].Chr==vNodes[i+1].Chr && vNodes[i].Position+vNodes[i].Length<=vNodes[i+1].Position)' failed.
Aborted (core dumped)

Bioconda package?

Hi,
Have you looked into making a Bioconda package for this? It would make installation easy, and you would automatically get a Docker container for the tool, which would resolve, for example, this issue: #11

Tons of people would love to have that, since it makes direct installation fairly easy and gives you more visibility.

https://bioconda.github.io/

https://gitter.im/bioconda/Lobby

Can I use squid tool for scRNA-seq data?

Hello. I have a scRNA-seq seurat object gene expression matrix (rows: genes, columns: cells). May I ask if I can use squid tool for scRNA-seq data? If yes, what would be the input data and do you have an example/vignette? Thank you!

Question: Does Squid work on ISO-seq?

We have ISO-seq alignments using gmap2. Will squid run on these data?

Error: munmap_chunk(): invalid pointer: xxxxxx

I built squid from source and tried it on one of my samples, but encountered an error and it aborted:

$ ./squid-1.0/bin/squid -b Aligned.sortedByCoord.out.bam -c Chimeric.bam -G 1 -CO 1 -o squid
Reference name NR_046235        -->     25
Reference name chr1     -->     0
Reference name chr10    -->     1
Reference name chr11    -->     2
Reference name chr12    -->     3
Reference name chr13    -->     4
Reference name chr14    -->     5
Reference name chr15    -->     6
Reference name chr16    -->     7
Reference name chr17    -->     8
Reference name chr18    -->     9
Reference name chr19    -->     10
Reference name chr2     -->     11
Reference name chr20    -->     12
Reference name chr21    -->     13
Reference name chr22    -->     14
Reference name chr3     -->     15
Reference name chr4     -->     16
Reference name chr5     -->     17
Reference name chr6     -->     18
Reference name chr7     -->     19
Reference name chr8     -->     20
Reference name chr9     -->     21
Reference name chrM     -->     22
Reference name chrX     -->     23
Reference name chrY     -->     24
[Mon Aug  7 16:55:19 2017] Start reading bam file.
[Mon Aug  7 17:01:02 2017] Finish sorting Chimeric bam reads.
[Mon Aug  7 17:01:21 2017] Finish removing PCR duplicates.
[Mon Aug  7 17:01:45 2017] Building nodes. |bamdiscordant|=5085640
[Mon Aug  7 17:30:19 2017] Building nodes, finish seeding.
[Mon Aug  7 17:30:20 2017] Building nodes, finish expanding to whole genome.
[Mon Aug  7 17:30:20 2017] Building nodes, calculating read coverage for node 0.
[Mon Aug  7 17:30:23 2017] Finish calculating reads per node.
0       time=9e-06
1000000 time=366.606
2000000 time=622.674
[Mon Aug  7 17:46:56 2017] Starting building edges.
[Mon Aug  7 18:56:12 2017] Finish raw edges.
[Mon Aug  7 18:56:12 2017] Finish filtering edges from multi-aligned reads.
[Mon Aug  7 18:56:24 2017] Finish building edges.
Maximum connected component size=197
114270  18761
ILP isn't successful
0       used time 0.004122
locate all insertion place. 7.83081
finish insertion. 0.010903
0       used time 0.021042
1000    used time 3.84438
2000    used time 8.35914
3000    used time 13.1669
4000    used time 18.401
5000    used time 23.5152
locate all insertion place. 23.8758
finish insertion. 0.031942
*** Error in `./squid-1.0/bin/squid': munmap_chunk(): invalid pointer: 0x000000008bac3030 ***
Aborted

Segmentation fault (core dumped)

Hi,

My SQUID run returned this error. Any ideas?

squid -b SRR3138207Aligned.sortedByCoord.out.bam -c SRR3138207Chimeric.out.bam -o squidout

Return:

[Thu Apr 19 16:13:22 2018] Start reading bam file.
[Thu Apr 19 16:14:03 2018] Finish sorting Chimeric bam reads.
[Thu Apr 19 16:14:05 2018] Finish removing PCR duplicates.
[Thu Apr 19 16:14:08 2018] Building nodes. |bamdiscordant|=1519417
[Thu Apr 19 16:32:12 2018] Building nodes, finish seeding.
[Thu Apr 19 16:32:12 2018] Building nodes, finish expanding to whole genome.
[Thu Apr 19 16:32:12 2018] Building nodes, calculating read coverage for node 0.
[Thu Apr 19 16:32:23 2018] Finish calculating reads per node.
0       time=0.001708
[Thu Apr 19 16:36:21 2018] Starting building edges.
Segmentation fault (core dumped)

Many thanks.

Diverging results between STAR and BWA aligned inputs

Hi,

I am running squid with two types of inputs: 1) Samples aligned with STAR 2) Samples aligned with BWA

When using STAR output, I get less than 100 SV calls per sample. When I run with BWA output, I get several thousand SVs per sample. What is the source of this difference? Can you please give some information about this?

Thread value - squid

Hello!
I was wondering if there is an option for the number of threads that can be used by Squid?

Parameter recommendations for benign tumor types

The software is fantastic and has been very easy to use! I am opening an issue to ask what some parameter recommendations would be for clonal benign tumor types? Specifically, I'm looking at the -r option for normal/tumor cell ratio. I appreciate your time and any insight.

SQUID Error - Segmentation fault (core dumped)

Hello!
I get this error while running Squid. The chimeric sam file generated through STAR run is empty and it fails when run as a bam file with Squid. Please help me to sort this error out. Is it because the chimeric reads are not detected for the datasets I used?

/home/software/STAR-2.7.9a/bin/Linux_x86_64/STAR --runThreadN 55 --genomeDir /home/genome_files/CTAT_Library/ctat_genome_lib_build_dir/ref_genome.fa.star.idx --readFilesIn /home/results/read_1_trim.fastq /home/results/read_2_trim.fastq --outFileNamePrefix /home/results/star_run. --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx --chimSegmentMin 20 --outSAMstrandField intronMotif --chimOutType SeparateSAMold
        STAR version: 2.7.9a   compiled: 2021-05-04T09:43:56-0400 vega:/homedobin/data/STAR/STARcode/STAR.master/source
Sep 28 18:13:51 ..... started STAR run
Sep 28 18:13:51 ..... loading genome
Sep 28 18:13:53 ..... started mapping
Sep 28 18:14:21 ..... finished mapping
Sep 28 18:14:21 ..... started sorting BAM
Sep 28 18:14:22 ..... finished successfully
Reference name 1        -->     0
Reference name 2        -->     1
Reference name 3        -->     2
Reference name 4        -->     3
Reference name 5        -->     4
Reference name Mt       -->     5
Reference name Pt       -->     6
[Tue Sep 28 18:14:23 2021] Start reading bam file.
[Tue Sep 28 18:14:23 2021] Finish sorting Chimeric bam reads.
[Tue Sep 28 18:14:23 2021] Finish removing PCR duplicates.
[Tue Sep 28 18:14:23 2021] Building nodes. |bamdiscordant|=0
[Tue Sep 28 18:14:23 2021] Building nodes, finish seeding.
../scripts/no_bloom_fusion_tools.sh: line 103: 17949 Segmentation fault      (core dumped) $SQUID -b $run_dir/star_run.Aligned.sortedByCoord.out.bam -c $run_dir/star_run.Chimeric.out.bam -o $run_dir/squid_out

Linker error when compiling SQUID

Here is the error message when I run make on the Makefile of squid:

src/ReadRec.o: In function `BuildChimericSBamRecord(std::vector<ReadRec_t, std::allocator<ReadRec_t> >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)':
ReadRec.cpp:(.text+0x34f9): undefined reference to `BamTools::BamAlignment::BamAlignment(BamTools::BamAlignment const&)'
src/SegmentGraph.o: In function `SegmentGraph_t::BuildNode_STAR(std::vector<int, std::allocator<int> > const&, std::vector<ReadRec_t, std::allocator<ReadRec_t> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)':
SegmentGraph.cpp:(.text+0x2a1f): undefined reference to `BamTools::BamAlignment::BamAlignment(BamTools::BamAlignment const&)'
src/SegmentGraph.o: In function `SegmentGraph_t::BuildNode_BWA(std::vector<int, std::allocator<int> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)':
SegmentGraph.cpp:(.text+0x9355): undefined reference to `BamTools::BamAlignment::BamAlignment(BamTools::BamAlignment const&)'
src/SegmentGraph.o: In function `SegmentGraph_t::RawEdgesOther(std::vector<ReadRec_t, std::allocator<ReadRec_t> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)':
SegmentGraph.cpp:(.text+0x1225e): undefined reference to `BamTools::BamAlignment::BamAlignment(BamTools::BamAlignment const&)'
src/SegmentGraph.o: In function `SegmentGraph_t::RawEdges(std::vector<ReadRec_t, std::allocator<ReadRec_t> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)':
SegmentGraph.cpp:(.text+0x13b99): undefined reference to `BamTools::BamAlignment::BamAlignment(BamTools::BamAlignment const&)'
collect2: error: ld returned 1 exit status
make: *** [bin/squid] Error 1

Segmentation fault (core dumped)

Hi,

I am trying to use Squid for calling SVs in my datasets. I have 87 samples and all are processed in the same way. I ran Squid as follows:
$ squid-v1.5_linux_x86_64/squid --bwa -b Sample_X.readgroup.sorted.bam -o Sample_X
Squid ran perfectly on 85 of them. But for the two remaining samples, I face segmentation fault as follows:

Reference name chr1     -->     0
Reference name chr10    -->     9
Reference name chr11    -->     10
Reference name chr12    -->     11
Reference name chr13    -->     12
Reference name chr14    -->     13
Reference name chr15    -->     14
Reference name chr16    -->     15
Reference name chr17    -->     16
Reference name chr18    -->     17
Reference name chr19    -->     18
Reference name chr2     -->     1
Reference name chr20    -->     19
Reference name chr21    -->     20
Reference name chr22    -->     21
Reference name chr3     -->     2
Reference name chr4     -->     3
Reference name chr5     -->     4
Reference name chr6     -->     5
Reference name chr7     -->     6
Reference name chr8     -->     7
Reference name chr9     -->     8
Reference name chrM     -->     22
Reference name chrX     -->     23
Reference name chrY     -->     24
[Thu Feb 23 10:57:43 2023] Starting reading bam file.
[Thu Feb 23 11:06:06 2023] Building nodes, finish seeding.
[Thu Feb 23 11:06:06 2023] Building nodes, finish expanding to whole genome.
[Thu Feb 23 11:06:08 2023] Finish calculating reads per node.
[Thu Feb 23 11:06:08 2023] Starting building edges.
[Thu Feb 23 12:01:03 2023] Finish raw edges.
[Thu Feb 23 12:01:05 2023] Finish filtering edges from multi-aligned reads.
[Thu Feb 23 14:11:18 2023] Finish adding partial aligned reads.
[Thu Feb 23 14:11:32 2023] Finish building edges.
Maximum connected component size=2275
64636   30528
line 13: 142068 Segmentation fault      (core dumped) squid-v1.5_linux_x86_64/squid --bwa -b Sample_X.readgroup.sorted.bam -o Sample_X

I am using 200GB of memory. What may be the problem? Could you please help?

Support for STAR alignment feeding using WithinBAM output

--chimOutType SeparateSAMold will be deprecated from STAR.
Would it be possible to migrate to a single bam file feed as in bwa to make squid future-proof?

Segmentation fault: 11

Trying to run squid with star-fusion output but it terminates with segmentation fault: 11

Any suggestions on how to fix this issue?

Long reads RNA

Hi,

Can squid call SV for long reads RNA-seq (cDNA) data from Oxford Nanopore or Pacbio now?
Thank you very much!

Best,
ping

Example output file?

Hi,

I would like to know if the output data that SQUID produces can be used with https://github.com/stianlagstad/chimeraviz. Do you have any example output files that you can share?

Thank you!

Increase sensitivity of SQUID

Hi,
I'm wondering if there is a parameter that increases the sensitivity of SQUID by lowering the number of reads required to support a fusion (the SCORE field in the BEDPE file). Is there a minimum cutoff set somewhere that I can change? I ask because I clearly saw some chimeric alignments in my chimeric BAM file, but SQUID did not report the fusion.

Thanks,
Readman Chiu

Error while running SQUID

I have been having this issue while running SQUID to obtain transcriptomic structural variants from RNASEQ data. This is the error that I have been receiving.

[Thu Aug 12 00:29:10 2021] Start reading bam file.
[Thu Aug 12 00:29:34 2021] Finish sorting Chimeric bam reads.
[Thu Aug 12 00:29:35 2021] Finish removing PCR duplicates.
[Thu Aug 12 00:29:37 2021] Building nodes. |bamdiscordant|=805622
[Thu Aug 12 00:30:07 2021] Building nodes, finish seeding.
squid: src/SegmentGraph.cpp:662: void SegmentGraph_t::BuildNode_STAR(const std::vector<int>&, SBamrecord_t&, std::string): Assertion `vNodes[i].Length>0 && vNodes[i].Position+vNodes[i].Length<=RefLength[vNodes[i].Chr]' failed.
Aborted (core dumped)

I ran the code with the following command

./squid -b mybam_sorted.bam -c chimeric.bam -o out1

I tried to contact the professor in-charge of this tool with the exact same query, who then forwarded my email to one of the developers of this tool. This was their response email that I received.
Extremely disheartened by the response and unacceptable from the developers who made this tool to be used by the community.