Giter Club home page Giter Club logo

shatterseek's Introduction

ShatterSeek

ShatterSeek is an R package that provides utilities to detect chromothripsis events from next-generation sequencing (NGS) data. It takes as input copy number (CN) and structural variation (SV) calls calculated with the user preferred method. Hence, it is compatible with virtually any CN and SV caller.

The main function of the package, shatterseek, first uses intrachromosomal SVs to detect clusters of interleaved rearrangements. Next, it evaluates a set of statistical criteria in each of these regions. The output consists of a data frame reporting the value for the statistical criteria used and additional information for each chromosome. ShatterSeek also provides functionalities for the visualization of SV and CN profiles, thus enabling the visual inspection of the candidate chromothripsis regions detected. ShatterSeek is fast, with an average running time of ~20s per sample.

Further details about the implementation and validation of ShatterSeek using ~2,600 whole-genome sequencing data sets from The Pan-Cancer Analysis of Whole Genomes (PCAWG) project can be found in the following publication:

Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing

Cortes-Ciriano et al. Nature Genetics, 2020

All the chromothripsis calls reported in that publication can be visualized at: http://compbio.med.harvard.edu/chromothripsis/

A detailed tutorial illustrating how to use ShatterSeek is provided in ./inst/tutorial/tutorial.pdf

ShatterSeek is free for academic use only. For non-academic use, please email Dr. Sonalee Barthakur at Harvard University Office of Technology Development ([email protected])

Prerequisites

ShatterSeek is written entirely in R (>= 3.0.1) and depends on the following packages: methods, BiocGenerics, graph, S4Vectors, GenomicRanges, IRanges, MASS, ggplot2, grid, and gridExtra

Installation

$ git clone https://github.com/parklab/ShatterSeek.git

$ R CMD INSTALL ShatterSeek

Alternatively you can install ShatterSeek directly from R using devtools:

library(devtools)
install_github("parklab/ShatterSeek")

How to use

Please see the package tutorial (especially section "How to use ShatterSeek").

A few considerations that have caused some confusion in active users:

  1. Please note that the parameter min.Size of the main function of the package (i.e., shatterseek) is set by default to 1. This parameter defines the minimum number of SVs required to identify a cluster of SVs in each chromosome. The statistical criteria are used to identify which clusters of SVs conform with the features of chromothripsis. For example, we require a minimum number of 6 SVs for a cluster to be considered as chromothripsis. We recommend to use min.Size=1 (default) because in some cases a chromothripsis event in a given chromosome (say chrA) might be linked with a cluster of a few SVs in another chromosomome (say chrB). If the value of min.Size is set to a value higher than 1 (say 6), the connection between these two chromosomes (which might be biologically important) would be missed, as the cluster of SVs in chrB would not be detected.

  2. Please note that ShatterSeek expects adjacent copy number segments to have a different copy number value. If two adjacent regions have the same copy number value but are considered as two separate entries in your copy number data.frame, please merge them. You can use the following code:

## d is a data.frame with colums: chr, start, end, total_cn
dd <- d
dd$total_cn[dd$total_cn == 0] <- 150000
dd$total_cn[is.na(dd$total_cn)] <- 0
library(GenomicRanges)
dd <- as(dd,"GRanges")
cov <- coverage(dd,weight = dd$total_cn)
dd1 <- as(cov,"GRanges")
dd1 <- as.data.frame(dd1)
dd1 <- dd1[dd1$score !=0,]
dd1 = dd1[,c(1,2,3,6)]
names(dd1) <- names(d)[1:4]
dd1$total_cn[dd1$total_cn == 150000] <- 0
d= dd1; rm(dd)

Example

Once installed you can run ShatterSeek with example data.

library(ShatterSeek)
data(DO17373)

Prepare structural variation (SV) data:

SV_data <- SVs(chrom1=as.character(SV_DO17373$chrom1), 
			   pos1=as.numeric(SV_DO17373$start1),
			   chrom2=as.character(SV_DO17373$chrom2), 
			   pos2=as.numeric(SV_DO17373$end2),
			   SVtype=as.character(SV_DO17373$svclass), 
			   strand1=as.character(SV_DO17373$strand1),
			   strand2=as.character(SV_DO17373$strand2))

Prepare copy number (CN) data:

CN_data <- CNVsegs(chrom=as.character(SCNA_DO17373$chromosome),
				   start=SCNA_DO17373$start,
				   end=SCNA_DO17373$end,
				   total_cn=SCNA_DO17373$total_cn)

Run ShatterSeek:

chromothripsis <- shatterseek(
                SV.sample=SV_data,
                seg.sample=CN_data,
                genome="hg19")

Plot chromothripsis per chromosome:

library(gridExtra)
library(cowplot)

plots_chr21 <- plot_chromothripsis(ShatterSeek_output = chromothripsis, 
              chr = "chr21", sample_name="DO17373", genome="hg19")
              
plot_chr21 = arrangeGrob(plots_chr21[[1]],
                         plots_chr21[[2]],
                         plots_chr21[[3]],
                         plots_chr21[[4]],
                         nrow=4,ncol=1,heights=c(0.2,.4,.4,.4))

plots_chrX <- plot_chromothripsis(ShatterSeek_output = chromothripsis, 
              chr = "chrX", sample_name="DO17373", genome="hg19")
plot_chrX = arrangeGrob(plots_chrX[[1]],
                        plots_chrX[[2]],
                        plots_chrX[[3]],
                        plots_chrX[[4]],
                         nrow=4,ncol=1,heights=c(0.2,.4,.4,.4))
                         
plot_grid(plot_chr21, plot_chrX)

Contact

If you have any questions or suggestions please contact us:

Isidro Cortes Ciriano: isidrolauscher at gmail.com or [email protected]

Peter J Park: peter_park at hms.harvard.edu

shatterseek's People

Contributors

asntech avatar isidroc avatar kerstinth avatar ppark avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

shatterseek's Issues

Definition of copy number oscillation

Thank you for this great work.

I understand the basic idea written in the supplementary note.
If the copy number status in the SV cluster is 1-2-1-2-1-2, the number of CN segments is 6 in total and the oscillating copy number between 2 CN states is also 6. Is it right?

In the example case(DO17373), in chromosome 2, there are only two CN states (1 and 2).
However, in shatterseek result (attached this file), "max_number_oscillating_CN_segments_3_states" is the same number as "max_number_oscillating_CN_segments_2_states".
What does this mean? How to define "max_number_oscillating_CN_segments_3_states" ?

Related to this, in your original paper, canonical chromothripsis is defined as the event which has more than 60% of the segments oscillate between two CN states. So Is it OK to calculate "max_number_oscillating_CN_segments_2_states" / "number of CN segments"?(For example, in the upper case, this value is supposed to be 1.)


Lastly, I found there might be some mistakes in plot_function.R.
I could not change the color for SV types when drawing the plot using the plot_chromothripsis. This is because, in the function, the color is assigned directly again such as lines 101 to 104 apart from the arguments.
Could you please fix the code?

Thank you for your generous supports!

Best,
Yuta

github.txt

Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'nrow': wrong sign in 'by' argument

platform x86_64-apple-darwin17.0
arch x86_64
os darwin17.0
system x86_64, darwin17.0
status
major 4
minor 1.2
year 2021
month 11
day 01
svn rev 81115
language R
version.string R version 4.1.2 (2021-11-01)
nickname Bird Hippie

library(ShatterSeek)
PC_data <- read.table("PC_SV.tsv", header = TRUE)
PC_CN <- read.table("CN_PC.tsv", header = TRUE)

PC_SV_data <- SVs(chrom1 = as.character(PC_data$chrom1),
pos1 = as.numeric(PC_data$start1),
chrom2 = as.character(PC_data$chrom2),
pos2 = as.numeric(PC_data$end2),
SVtype = as.character(PC_data$svclass),
strand1 = as.character(PC_data$strand1),
strand2 = as.character(PC_data$strand2))

PC_CN_data <- CNVsegs(chrom = as.character(PC_CN$chromosome),
start = PC_CN$start,
end = PC_CN$end,
total_cn = PC_CN$total_cn)

start_time <- Sys.time()
chromothripsis <- shatterseek(SV.sample = PC_SV_data, seg.sample = PC_CN_data)
end_time <- Sys.time()
print(paste0("Running time (s): ",round(end_time - start_time,digits=2)))

Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'nrow': wrong sign in 'by' argument

ShatterSeek on other references

Dear Shatterseek team,
I would like to ask if Shatterseek can be used on other references. In principles, all statistical tests seem portable. However, I see that for some criteria (e.g. the chromosomal enrichment of breakpoints) you also correct for genome-specific features like mappability. And of course, the only chromosomes supported are 1-22,X.

Can one try to tweak it to make it work on other reference genomes other than humans, for example tweaking chromosome names?
Thanks!

plot_chromothripsis function is wrong

Hello! I found that the chr parameter in the help document example of the plot_chromothripsis function seems to be wrong. I hope you can check it. Have a good day.

issue about layout with plots_chr[[4]]

hi, ShatterSeek team
thank you for this wonderful tool to analyze chromothripsis. I have check your recent publication on nature genetics, it seems that the figure layout in your paper is quite different from the default layout in the R package. Especially the info in plots_chr[[4]], the figure from the ng paper have much more information to determine whether the event is high or low confidence. By contrast, the default layout only have very limited info. Could you please show me how to generate the same layout as shown in your figure?
Thanks a lot.

The panel in the paper is like
image

The default panel in R is like
image

A question about strands

Hi!
I have a questions about stands .In ShatterSeek,deletion(3 to 5)is +- , duplication(5 to 3)is -+ ,but a h2hINV(++)is 3'o3' or 5'to5' ?and how about TRA(++)?

run multiple samples

Hi -

It looks like this needs to be run on an individual sample basis. In other words I cant feed it with one file with information for multiple samples?

THanks,
Michael

Colored dots without connecting lines

Hello!

I really love this tool and a lot of discussions over here that helped me a lot when analyzing my data. I noticed that in the SV plot generated by plot_chromothripsis, there were dots in different colors (including black and orange/green/blue) that did not connect to any line. My question is what do these dots mean? Do they represent inter-chromosomal translocation or just local SVs that impact very small regions? If so, does color mean anything for them? Thank you very much.

Sincerely,
Chennan

support mm10 genome

Hi ShatterSeek team, thank you so much for such a great package, may I know is there any way to perform ShatterSeek on mm10 mouse genome? the current version seems only support hg19 and hg38. Thank you so much!

Code for assigning confidence to candidate regions

Hi,
I just wondering if you could provide the R code used to assign candidates as high, medium and low confidence candidates in the 2,600 whole genomes. e.g. which columns of the object returned by shatterseek, p-value cut-off etc. were use? Sorry, if this has already been provided elsewhere and I've missed it.

How to use Manta output for ShatterSeek? No Strand info, and different SVtype

Hi,

I'd like to use the output from Manta as an input for SVs function, however, I don't have the strand info in the Manta calls, is it ok to put + instead for both strands? and which of the manta SVtypes are supported here (BND, DEL, DUP, INS, INV)? and if not supported, what do I need to do with those calls, perhaps changing them to TRA?

Thanks for this great tool and support.

Multiple warnings when using tutorial data

Hello,

I am attempting to run ShatterSeek using the information presented in the tutorial before moving onto my datasets. The code I used is the following

library(devtools)
install_github("parklab/Shatterseek", force=TRUE)
library(gridExtra)
library(ShatterSeek)
data(DO17373)

head(SV_DO17373)

SV_data <- SVs (chrom1 = as.character(SV_DO17373$chrom1),
                pos1 = as.numeric(SV_DO17373$start1),
                chrom2 = as.character(SV_DO17373$chrom2),
                pos2 = as.numeric(SV_DO17373$end2),
                SVtype = as.character(SV_DO17373$svclass),
                strand1 = as.character(SV_DO17373$strand1),
                strand2 = as.character(SV_DO17373$strand2))


CN_data <- CNVsegs (chrom = as.character(SCNA_DO17373$chromosome),
                    start = SCNA_DO17373$start,
                    end = SCNA_DO17373$end,
                    total_cn = SCNA_DO17373$total_cn
                    )

chromothripsis <- shatterseek(SV.sample = SV_data, seg.sample = CN_data)
head(chromothripsis@chromSummary)
plots_chr3 <-  plot_chromothripsis(ShatterSeek_output = chromothripsis,chr = "3")

head(SV_DO17373) returns the same information as in the tutorial PDF but chromothripsis <- shatterseek(SV.sample = SV_data, seg.sample = CN_data) yields 44 errors. Each error is related to SVsnow$pos1 and SVsnow$pos2, for example
In min(c(SVsnow$pos1, SVsnow$pos2)) : no non-missing arguments to min; returning Inf
print(head(chromothripsis@chromSummary)) has many NA values after this, which is not the case in the tutorial data.
Is there a missing package that needs to be installed? I am running R version 4.1.1 (2021-08-10) if that is helpful.
Thank you

Got error when trying to execute ShatterSeek with other DOxxxx dataset

Dear author,

We encountered an error "Error in if (sum(vec == 0) == length(vec)) { : missing value where TRUE/FALSE needed" when we were trying to execute ShatterSeek with other DOxxxxx datasets obtained from https://dcc.icgc.org/. We have checked the SVs and CNVsegs contents to make sure the data format looks the same as the one in DO17373 from the tutorial. Do you have any idea for possible causes? I can provide more details for reproducing the result if you want.

On the other hand, we thought of trying to debug it ourselves, but we found that the tutorial code will only work with library(ShatterSeek). When we run the code in ShatterSeek.R manually, the tutorial code would not work. It would return an error when we try to create the SVs class. The error is "Error during wrapup: error in evaluating the argument 'table' in selecting a method for function 'match': object 'chromNames' not found
Error: no more error handlers available (recursive errors?); invoking 'abort' restart".

Would be grateful for any possible hints, thank you!

Merge copy number segments to have different copy number values in adjacent segments

Hi Isidro,

Our copy number segment file has adjacent entries with same copy number values. So I have tried your code to merge them.

However,I found it actually doubled the copy number entries by adding copy number segment of width 1 between original copy number entries. I think it will double the number of oscillation between 2 CN states and eventually affect the results of high-confidence chromothripsis region detection. There are also some entries with copy number of 150001. Are those supposed to see? Did I miss anything?

I have also tried to merge by myself, as following:

d is a data.frame with colums: chr, start, end, total_cn

d$numeric_chrom <- as.character(d$chrom)
d[d$numeric_chrom == "X",]$numeric_chrom <- "23"
d$numeric_chrom <- as.numeric(d$numeric_chrom)
d$total_cn <- as.numeric(d$total_cn)
tmp <- group_by(d,Diff = (cumsum(c(1,diff(total_cn)) | c(1,diff(numeric_chrom)) !=0 )))
dd <- summarise(tmp, numeric_chrom = c(numeric_chrom)[1], start = c(start)[1],end = c(end)[length(c(end))], total_cn = mean(total_cn))
dd$chrom <- as.character(dd$numeric_chrom)
dd[dd$chrom == "23",]$chrom <- "X"
dd <- data.frame(chrom=dd$chrom, start=dd$start, end=dd$end, total_cn=dd$total_cn)
d <- dd

What I found is that some samples have all chromothripsis regions having same max_number_oscillating_CN_segments_2_states and max_number_oscillating_CN_segments_2_states. Is this expected to see?

Thank you in advance for any comments and suggestions!

Tingting

can't change color in plot_chromothripsis

Hello,

Thanks for this tool!

I don't seem to be able to change the colors for the various SV types in the method "plot_chromothripsis". Are these hard coded? They always stay as the defaults:

DEL_color='darkorange1'
DUP_color='blue1'
t2tINV_color="forestgreen"
h2hINV_color="black"

Thanks,
Ahwan

questions for SV cluster detection

Hi,

I read through the tutorial, which is nice. But I am not so clear on description "SVs are interleaved if the genome regions bridged by their breakpoints overlap but not nested". Suppose we have two SVs at [1000, 2000] and [1800, 3500]. The breakpoints overlap exist, thus these two intervals can have edge on the graph. But what do you mean by nested? Is that mean a SV is inside another one? Let's say [1000, 2000] and [1200, 1800].

Thanks!

Only interchromosomal SVs used for chr_breakpoint_enrichment and pval_exp_chr ?

Hello,

I want to use ShatterSeek, and I tried testing it on a sample which seems (based on visual inspection) to have a chromothripsis event on chromosome 3. In the chromSummary that I get, most fields seem coherent to me, but for some reason I get NA for the fields chr_breakpoint_enrichment and pval_exp_chr for chr3, even though there are 11 intrachromosomal SVs on this chromosome.

I had a quick look at the code to try to understand what cause the issue, and it looks like only interchromosomal SVs are considered for these tests (https://github.com/parklab/ShatterSeek/blob/master/R/function_criteria.R#L264), even though, based on my understanding of the ’chromosomal breakpoint enrichment test’ and ’random distribution of breakpoints test’, intrachromosomal SVs should be taken into account for these tests.

Is this the intended behavior, or am I misunderstanding something ?

Error in statistical_criteria(): error in evaluating the argument 'x' in selecting a method for function 'nrow': wrong sign in 'by' argument

#-----------------------------------------------------#
DATA ATTACHMENT TO REPRODUCE THE ERROR
SV and CN data.zip
#-----------------------------------------------------#

Hi there,

I encountered an error like below when running ShatterSeek using genome version of hg38.
Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'nrow': wrong sign in 'by' argument

Please decompressed the attachment and use the following code and you can reproduce the error:

library(ShatterSeek)
SV_data <- readRDS("SV_data.rds")
CN_data <- readRDS("CN_data.rds")
chromothripsis <- 
  shatterseek(SV.sample = SV_data,
              seg.sample = CN_data,
              genome = "hg38")

I reviewed carefully from the source code and found it occured in function of statistical_criteria() when running till the following code:
nb_oscill = ShatterSeek:::fmax2(CNV_window_now[seq(max(idxa),min(idxb),1),],cutoff=1)

In my data from the compressed file, the max(idxa) is 170 while the min(idxb) is 169 which caused a wrong sign direction.

Could you please help resolving this issue at your convenience?

Many thanks in advance!
Xiaofan

Having circos plot

Thank you so this software

Can I please have access to a sample code for plotting the chromotripsis by a circos as your paper or what is in the tutorial file?

Thanks a million

Define SVtype for ShatterSeek

Hi,

Thanks for developing ShatterSeek. I have been trying to correctly convert our BRASS svclass into right ShatterSeek SVtype according to the ShatterSeek documentation:

  SVtype (character): type of SV, encoded as: DEL (deletion-like; +/-), DUP (duplication-like;-/+), h2hINV (head-to-head inversion; +/+), and t2tINV (tail-to-tail inversion; -/-).

Our BRASS results (from about 100 samples) have following svclass information:

svclass strand1 strand2
deletion + +
tandem-duplication - -
translocation - +
translocation - -
translocation - +
translocation + +
translocation + -
inversion + -
inversion - +

As you see, none of our BRASS svclass can be correctly converted into ShatterSeek SVtype if strand is considered.

An example BRASS:
example.brass.raw.bedpe.txt

I may have missed something. Your help will be greatly appreciated.

Jack

Check-up prior to indexing possibly missing

Hi ShatterSeek Team,

ShatterSeek runs successfully on a number of samples (various combinations of CNV and SV callers) but with one of the samples this error occurs:

Error in seq.default(max(idxa), min(idxb), 1): wrong sign in 'by' argument

What happens is that in function_criteria() value of min(idxb) is 354, and max(idxa) is 355, and step argument is 1. Here's the if statement before the error occurs:

if (length(idxb)!=0 & length(idxa)!=0 ){

Above in the code there is a similar if statement, checking lengths of idxa and idxb, but also checks whether max(idxa) <= min(idxb). Perhaps this check is missing in the reported line.

Please have a look.

Thanks!

Best, Vladimir

Problems about the CN & SV input.

Dear all, we used GATK for CN info and lumpy for SV. The input files are attached. We use LINEAR_COPY_RATIO * 2 as total CN.
genotyped-denoised-copy-ratios-229005560FD.vcf.gz
229005560FD_lumpySV_gt.zip
Here we got some confusion:

  1. We wonder if the CN info must be continuous because there are gaps between each segment in our data.
  2. Are the SVs strictly accurate and filtered, or can they be kind of loose?
  3. We can not get the interleaved SVs in 60 samples. Is there any potential reason or suggestion to solve this? Actually, we did not understand how interleaved SVs formed.
  4. Is looking by eyes the only way to tell interleaved SVs?
  5. Here is the criteria we used to judge the chromothripsis. Is it feasible?
    (1) Osillating CN >= 7.
    (2) Pval chr breakpoint enrichment < 0.05 | Pval exp breakpoints < 0.05.
    (3) Pval fragment joins >0.05.
    (4) Interleaved SVs >= 6.

Interpreting the results

Sorry this is not an issue with your software at all

I have run your software on one of my samples, but from the results I really don't know if this sample has chromotripsis or not.

For instance, pval_fragment_joins for 3 chromosomes is < 0.05 but in conclusion I don't know if I could call this sample having chromotripsis or not

Even by visualize inspection can not say

I saw this

pval_fragment_joins > 0.05
chr_breakpoint_enrichment < 0.05
pval_exp_cluster < 0.05
number_DEL + number_DUP + number_h2hINV + number_t2tINV >=3
number_TRA>=4
max_number_oscillating_CN_segments_2_states>=7

If this criteria be true for only one chromosome, can we call that sample with chromotripsis

I have attached my results here if you please give me an intuition

Thank you so much for any help

My_output.txt
Rplot

hg38 support

Cool R package!

Any plans on adding hg38 support?

If not, could I just modify chr_info.rds to get it working?

Error in h (simpleError (msg, call)): Error when evaluating the argument 'x' in the method selection for function 'nrow': 'to' must be a finite number

Hello ShatterSeek team,
I have recently encountered the following error when attempting to run ShatterSeek:

Error in h (simpleError (msg, call)): Error when evaluating the argument 'x' in the method selection for function 'nrow': 'to' must be a finite number

From what I can tell, the error happens in the statistical_criteria function, in lines 148-168, where the calculation of idxa and idxb happen.

For your convinience, and if it is of any help, I've created .rds files with the SV_data and CNV_data in question, which you can download here: https://www.dropbox.com/sh/s0stt5vuvzst5m4/AADn03oezu4F8mdwoqa1JkXOa?dl=0

To run ShatterSeek, I simply run: shatterseek( SV_data , CNV_data )

Any help figuring out this error will be much appreciated.

Best,

-Mathias

Is ShatterSeek compatible with hg19 and hg38 coordinates ?

Dear ShatterSeek team,

I have a couple of questions regarding the method and its output(s):

  1. If I understood the method correctly, the statistical criteria used for detecting Chromothripsis, do not depend on whether the SVs and CNs coordinates come from a hg19 or hg38 reference genome. Is this correct?

  2. In line with the previous question, and regarding to the output of plot_chromothripsis(), you indicate in the documentation that only hg19 idiograms are supported. Does this mean that the remaining plot are based solely on chromosome coordinates, irrespective of whether they come from hg19 or hg38 ?

  3. As it was mentioned in #14, How do you generate the Circos plots that you use to represent the Chromothripsis events? Could you share an example code for that ?

Thanks in advance for the help!

Best,

-Mathias

ShatterSeek PCAWG High Confidence and Low Confidence cutoffs

Hello,

Thanks again for this tool. I just really wanted to make sure I am doing the correct interpretation of the output of the tool and would really appreciate if you could have a look at my workflow. It is quite important that I do this right for a paper we are about to publish.

I've been trying to implement the cutoffs that are mentioned in the supplementary methods of the PCAWG paper:
https://www.nature.com/articles/s41588-019-0576-7

Namely, the section:

1.2 Confidence of chromothripsis calls
After visual inspection of a number of chromothripsis calls, we considered as high-confidence
chromothripsis those regions satisfying at least one of the following sets of cut-off values for the
statistical criteria described above: (i) at least 6 interleaved intrachromosomal SVs, 7 adjacent
segments oscillating between 2 CN states, the fragment joins test, and either the chromosomal
enrichment or the exponential distribution of breakpoints test; and (ii) at least 3 interleaved
intrachromosomal SVs and 4 or more interchromosomal SVs, 7 adjacent segments oscillating
between 2 CN states and the fragment joins test. Given that the oscillating CN pattern is often
only partly retained in chromothripsis events in Bone-Osteosarc, SoftTissue-Leiomyo, and
SoftTissue-Liposarc tumors due to subsequent amplification events1,10,11, we also considered as
high-confidence chromothripsis regions those detected in these cancer types involving at least
40 intra- or interchromosomal SVs, and satisfying the fragments joins test (this change in criteria
affects six samples, but, because many of them have multiple events, only one sample that
previously was not identified as having chromothripsis was identified as having one event).
We classified as low-confidence those chromothripsis regions meeting the following cut-off
values: at least 6 interleaved intrachromosomal SVs, 4, 5 or 6 adjacent segments oscillating
between 2 CN states (following the criteria previously used2), the fragment joins test, and either
the chromosomal enrichment or the exponential distribution of breakpoints test. Therefore, the
main difference between the low- and high-confidence calls resides in the number of oscillating
CN segments.

I am interpreting the above as passing any of the following 3 High Confidence filters:

# HC1 : High confidence recommendation by PCAWG and Shatterseek tutorial.pdf
- at least 6 interleaved intrachromosomal SVs, 
- 7 contiguous segments oscillating between 2 CN states, 
- failing the fragment joins test,  (i.e. pval is not significant)
- passing (either the chromosomal enrichment ) or (the exponential distribution of breakpoints test.) [i.e. pval is significant]

# HC2 : High confidence recommendation by PCAWG and Shatterseek tutorial.pdf
- at least 3 interleaved intrachromosomal SVs
- 4 or more interchromosomal SVs
- 7 contiguous segments oscillating between 2 CN states
- failing the fragment joins test.  (i.e. pval is not significant)

# HC3 : High confidence recommendation by PCAWG
- high-confidence chromothripsis regions those detected in these cancer types involving at least
- 40 intra or interchromosomal SVs
- failing the fragments joins test  (i.e. pval is not significant)

# Low confidence: LC1
- at least 6 interleaved intrachromosomal SVs, 
- 4, 5 or 6 adjacent segments oscillating between 2 CN states, 
- failing the fragment joins test,  (i.e. pval is not significant)
- passing (either the chromosomal enrichment) or (the exponential distribution of breakpoints test.) [i.e. pval is significant]

So a candidate Chromothripsis event would be passing (HC1 or HC2 or HC3)?

Now, I am interpreting this in R with the ShatterSeek results and the exact columns it generates as:

# create a column with the intrachromosomal SVs count for the cluster
all_data$number_intrachromosomal_SVs = all_data %>% dplyr::select(number_DEL,number_DUP,number_h2hINV,number_t2tINV) %>% apply(1,sum)

# 1st High Confidence filter
filt1 = all_data$number_intrachromosomal_SVs >= 6
filt2 = all_data$max_number_oscillating_CN_segments_2_states >= 7
filt3 = all_data$pval_fragment_joins >= 0.05
filt4 = (all_data$chr_breakpoint_enrichment <= 0.05) | (all_data$pval_exp_chr <= 0.05)
HC1 = (filt1) & (filt2) & (filt3) & (filt4)
HC1[is.na(HC1)] <- FALSE
all_data$HC1 <- HC1

# 2nd High Confidence filter
filt1 = all_data$number_intrachromosomal_SVs >= 3
filt2 = all_data$number_TRA >= 4
filt3 = all_data$max_number_oscillating_CN_segments_2_states >= 7
filt4 = all_data$pval_fragment_joins >= 0.05
HC2 = (filt1) & (filt2) & (filt3) & (filt4)
HC2[is.na(HC2)] <- FALSE
all_data$HC2 <- HC2

# 3rd High Confidence filter
filt1 = all_data$clusterSize_including_TRA >= 40
filt2 = all_data$pval_fragment_joins >= 0.05
HC3gte40 = (filt1) & (filt2)
HC3gte40[is.na(HC3gte40)] <- FALSE
all_data$HC3gte40 <- HC3gte40

# Low Confidence filter
filt1 = all_data$number_intrachromosomal_SVs >= 6
filt2 = (all_data$max_number_oscillating_CN_segments_2_states >= 4) & (all_data$max_number_oscillating_CN_segments_2_states <= 6)
filt3 = all_data$pval_fragment_joins >= 0.05
filt4 = (all_data$chr_breakpoint_enrichment <= 0.05) | (all_data$pval_exp_chr <= 0.05)
LC1 = (filt1) & (filt2) & (filt3) & (filt4)
LC1[is.na(LC1)] <- FALSE
all_data$LC1 <- LC1

# all_data$HCany if TRUE means a Chromothripsis event has occurred
all_data$HCany <- all_data$HC1 | all_data$HC2 | all_data$HC3gte40

I have 126 samples in my cohort so I will probably use FDR corrected pvalues for any "pval" column. Upon correction, maybe the cutoffs could be adjusted to :

all_data$pval_fragment_joins.fdr >= 0.2
(all_data$chr_breakpoint_enrichment.fdr <= 0.1) | (all_data$pval_exp_chr.fdr <= 0.1)

I would really appreciate if you could have a look and let me know if I am on the right track or completely off the mark!

Thanks so much for your time.

Ahwan

Cannot allocate memory

Hello,

image

When I ues 20G to post tasks to the server,it still reports errors. Have you ever encountered this problem before? Besides increasing the memory, do you have any solutions?

Error in statistical_criteria

Although I have used the code provided by Readme to merge adjacent regions with the same copy number value, it still reported an error in the function "statistical_criteria" that "error in evaluating the argument 'x' in selecting a method for function 'nrow': wrong sign in 'by' argument" .
When cand=7 and chr_inter=21, it reported this error.

This is the sample, Thanks
TRCC.zip

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.