dryclean's People

Contributors

evanbiederstedt, jrafailov, mskilab, sc13-bioinf, sebastian-brylka, shaiberalon, shihabdider, tanubrata, zining01

dryclean's Issues

Typo in tutorial

Hi. Section 2 of the tutorial uses a data.table named normal_table_example whose last column is named decompose_cov. This should be corrected, since the R code expects decomposed_cov (dryclean.R#L299).

na.omit wiping out data.table

Hi. I have an issue with the identify_germline function, where na.omit (dryclean.R#L310) wipes out my entire data.table (0 rows = 0 samples). My understanding is that na.omit removes an entire row (corresponding to a sample) if it finds any NA value in it. But WGS data does contain NA values because of low-complexity regions, telomeres, centromeres, etc. Typically, my first positions correspond to the chr1 telomere, where I don't have any mapped reads. Shouldn't it remove columns instead (each corresponding to a genomic window where at least one sample has an NA value)? Aren't lines 310 and 311 inverted (that is, shouldn't the data.table be transposed first and the NA regions removed afterwards)?
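
The difference between the two filtering directions can be seen on a toy table. This is a minimal sketch in base R, not dryclean's code: the table `cov`, its bin columns, and the sample names are made up for illustration, and a plain data.frame stands in for the data.table.

```r
# Toy coverage table (hypothetical): one row per sample, one column per bin.
cov <- data.frame(
  bin1 = c(NA, NA, NA),       # e.g. a telomeric bin with no mapped reads
  bin2 = c(1.0, 0.9, 1.1),
  bin3 = c(1.2, NA, 0.8),
  row.names = c("samp1", "samp2", "samp3")
)

# Row-wise filtering: every sample with at least one NA is dropped.
# Because bin1 is NA for all samples, no rows survive.
rows_kept <- na.omit(cov)
nrow(rows_kept)

# Column-wise filtering: drop bins where any sample is NA, keep all samples.
complete_bins <- colSums(is.na(cov)) == 0
cols_kept <- cov[, complete_bins, drop = FALSE]
dim(cols_kept)
```

With samples in rows, a single bin that is NA everywhere is enough to empty the table, whereas filtering by bin keeps all three samples and only the fully observed bin.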

could not find function "identify_germline"

Hi,
I ran the following code:

grm = identify_germline(normal.table.path = "~/git/dryclean/inst/extdata/normal_table.rds", path.to.save = "~/git/dryclean/inst/extdata/", signal.thresh=0.5, pct.thresh=0.98)

and got this error:

could not find function "identify_germline"

Has identify_germline been removed? Is the step of identifying germline events still needed?

Issue with pon.binsize

Hello,

I am attempting to use dryclean with the PON provided at the bottom of the Readme; however, I am receiving the error below:

`(Let's dryclean the genomes!)

Loading PON...
PON loaded
Loading coverage
Loading PON a.k.a detergent
Error in if (tumor.binsize != pon.binsize & testing == FALSE) { :
argument is of length zero
Calls:
2: (function ()
traceback(2))()
1: dryclean_object$clean(cov = opt$input, center = opt$center, cbs = opt$cbs,
cnsignif = opt$cnsignif, mc.cores = opt$cores, verbose = TRUE,
use.blacklist = opt$blacklist, blacklist_path = opt$blacklist_path,
germline.filter = opt$germline.filter, field = opt$field,
testing = opt$testing)`

When running dryclean_object$clean in R directly, it appears that my coverage file has a 1000bp bin size as expected, but the PON is returning NULL when pon.binsize is set. Do you have any insight into what may be causing this?
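
For reference, the R error itself can be reproduced without dryclean: comparing a number with NULL yields a zero-length logical, which `if` rejects. In the sketch below the variable names mirror those in the error message, while `check_binsize` is a hypothetical helper and not part of dryclean.

```r
tumor.binsize <- 1000
pon.binsize <- NULL   # assumption: the value observed for the PON
testing <- FALSE

# NULL in a comparison gives logical(0), and `&` keeps it zero-length.
cond <- tumor.binsize != pon.binsize & testing == FALSE
length(cond)

# `if` on a zero-length condition raises the error seen in the traceback.
msg <- tryCatch(
  if (cond) "mismatch" else "ok",
  error = function(e) conditionMessage(e)
)

# Hypothetical defensive check that reports a missing bin size explicitly.
check_binsize <- function(tumor.binsize, pon.binsize) {
  if (is.null(pon.binsize) || length(pon.binsize) != 1) {
    return("PON bin size unavailable")
  }
  if (tumor.binsize != pon.binsize) "bin size mismatch" else "bin sizes match"
}

check_binsize(1000, NULL)
check_binsize(1000, 1000)
```

This only explains the message; it does not say why the PON reports no bin size in the first place.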

"Error in m.vec - s : non-conformable arrays" when running the tutorial

Hi,

I've tried to follow the tutorial to run dryclean on a tumor sample within R, but I have run into this error:

Error in m.vec - s : non-conformable arrays

What I have run is:

# Install the latest version of dryclean
devtools::install_github("mskilab/dryclean", ref = "87a1a4f")

#
# Start of the tutorial
#
options(warn = 1)
library("dryclean")
library("magrittr")
library("GenomicRanges")

normal_dt <-
  data.frame(sample = c("samp1", "samp2", "samp3")) %>%
  dplyr::mutate(
    normal_cov =
      system.file(
        "extdata", paste0(.data[["sample"]], ".rds"), package = "dryclean"
      ),
  ) %>%
  data.table::setDT()

saveRDS(normal_dt, "normal_table.rds")

dir.create("detergent", showWarnings = FALSE)

# use.all: Use all samples
# save.pon: Saves the PoN (detergent) to the designated folder
detergent <-
  prepare_detergent(
    normal.table.path = "normal_table.rds",
    path.to.save = "detergent",
    num.cores = 1,
    use.all = TRUE,
    save.pon = TRUE
  )

# Running dryclean on tumor sample within R
coverage_file <-
  readRDS(system.file("extdata", "dummy_coverage.rds", package = "dryclean"))
cov_out <-
  start_wash_cycle(
    cov = coverage_file,
    detergent.pon.path = file.path("detergent", "detergent.rds"),
    whole_genome = TRUE,
    chr = NA,
  )

This was the output:

Starting the preparation of Panel of Normal samples a.k.a detergent
3 samples available
Using all samples
Balancing pre-decomposition
PAR file not provided, using hg19 default.
If this is not the correct build, please provide a GRange object delineating for corresponding build
PAR read
Checking for existence of files
3 files present
  |=================================================================================================================================================================================================================================
Warning in .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Starting decomposition
This is version 2
Finished making the PON or detergent and saving it to the path provided

Loading PON a.k.a detergent from path provided
Let's begin, this is whole exome/genome
Initializing wash cycle
Using the detergent provided to start washing
lambdas calculated
calculating A and B
calculating v and s
Error in m.vec - s : non-conformable arrays

Any idea what is wrong?
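
For context, R raises this error when two matrices with different dimensions are combined element-wise. A minimal sketch, assuming `m.vec` and `s` are column matrices of different lengths (the actual shapes inside dryclean are not shown in the output above):

```r
# Hypothetical shapes: 5 bins in the sample vector, 3 in the sparse component.
m.vec <- matrix(seq_len(5), ncol = 1)
s <- matrix(0, nrow = 3, ncol = 1)

# Element-wise arithmetic on matrices requires identical dimensions.
msg <- tryCatch(
  {
    m.vec - s
    "ok"
  },
  error = function(e) conditionMessage(e)
)
msg

# Comparing dimensions up front makes the mismatch visible.
same_shape <- identical(dim(m.vec), dim(s))
same_shape
```

If the shapes do differ in the real run, comparing the number of bins in the coverage object with the number of rows in the saved detergent would be a natural first check.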

prepare_detergent failing when using all samples

Hello,

After collecting a test set of fragCounter coverage profiles for 4 normal samples, I attempted to run the dryclean workflow.
I encountered the following error while trying the first step of creating the PoN in prepare_detergent:

pon_detergent <- prepare_detergent(normal.table.path = "/drycleanRun/test_ton.rds",
                                   use.all = TRUE,
                                   num.cores = 2,
                                   build = "hg38",
                                   path.to.save = "drycleanRun/",
                                   nochr = T,
                                   save.pon = T)

### OUTPUT ###
Starting the preparation of Panel of Normal samples a.k.a detergent
4 samples available
Using all samples
PAR file not provided, using hg38 default. If this is not the correct build, please provide a GRange object delineating for corresponding build
PAR read
Checking for existence of files
4 files present
  |=====================================================================================================================| 100%, Elapsed 07:21
Error in setattr(ans, "names", c(keep.names, paste0("V", seq_len(length(ans) -  : 
  'names' attribute [1] must be the same length as the vector [0]

While troubleshooting, I found that others seem to have encountered the same error, but at a different stage of the workflow (#2).
Based on the output message, it looks like the error occurs within the pbmclapply function call at line 259, although I am not sure exactly where.

I then decided to test prepare_detergent under the other possible approaches instead of using all samples.
Interestingly, both alternative options, choose.randomly = TRUE and choose.by.clustering = TRUE, executed without an error.

Here using choose.randomly = TRUE and selecting 2 of the 4 samples:

pon_detergent <- prepare_detergent(normal.table.path = "/drycleanRun/test_ton.rds",
                                   use.all = FALSE,
                                   choose.randomly = TRUE,
                                   number.of.samples = 2,
                                   choose.by.clustering = FALSE,
                                   num.cores = 2,
                                   build = "hg38",
                                   path.to.save = "drycleanRun/",
                                   nochr = T,
                                   save.pon = T)

### OUTPUT ###
Starting the preparation of Panel of Normal samples a.k.a detergent
4 samples available
Selecting 2 normal samples randomly
PAR file not provided, using hg38 default. If this is not the correct build, please provide a GRange object delineating for corresponding build
PAR read
Checking for existence of files
2 files present
  |============================================================================================================| 100%, Elapsed 03:28
Starting decomposition
This is version 2
Warning: Item 1 has 3031053 rows but longest item has 15155223; recycled with remainder.
Finished making the PON or detergent and saving it to the path provided

And here using choose.by.clustering = TRUE

pon_detergent <- prepare_detergent(normal.table.path = "/drycleanRun/test_ton.rds",
                                   use.all = FALSE,
                                   choose.randomly = FALSE,
                                   number.of.samples = 2,
                                   choose.by.clustering = TRUE,
                                   num.cores = 2,
                                   build = "hg38",
                                   path.to.save = "drycleanRun/",
                                   nochr = T,
                                   save.pon = T)

### OUTPUT ###
Starting the preparation of Panel of Normal samples a.k.a detergent
4 samples available
Starting the clustering
Starting decomposition on a small section of genome
This is version 2
Starting clustering
PAR file not provided, using hg38 default. If this is not the correct build, please provide a GRange object delineating for corresponding build
PAR read
Checking for existence of files
2 files present
  |============================================================================================================| 100%, Elapsed 01:52
Starting decomposition
This is version 2
Warning: Item 1 has 3031053 rows but longest item has 15155223; recycled with remainder.
Finished making the PON or detergent and saving it to the path provided

The output detergent.rds is in working order as I was able to run start_wash_cycle without any problems.
I will likely use the clustering method for further analysis but wanted to point out this issue for others who encounter it.

Best,
Patrick
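
The "recycled with remainder" warning suggests that the coverage profiles being combined do not all have the same number of rows. A quick way to check this before building a PoN is sketched below in base R. The `profiles` list and its sample names are placeholders; in practice each element would be the bin-level signal read from one file listed in the normal table.

```r
# Placeholder profiles: two normals with 10 bins, one with only 4.
profiles <- list(
  normal1 = numeric(10),
  normal2 = numeric(10),
  normal3 = numeric(4)
)

# Number of bins per sample.
n_bins <- lengths(profiles)

# Take the most common bin count as the expected one.
expected <- as.integer(names(which.max(table(n_bins))))

# Samples whose bin count deviates from the expected value.
mismatched <- names(n_bins)[n_bins != expected]
mismatched
```

Any sample reported here would be worth inspecting (for example, for a different bin size or a different set of chromosomes) before it goes into the panel.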

Apply dryclean to other ASCN algorithms

Hi, you did a great job on this. I want to feed the corrected rds data to other ASCN algorithms such as sequenza. I also noticed that the paper mentions that dryclean's output can be fed to more sophisticated segmentation algorithms. Could you give me an example? Thanks in advance!

Jay
