Giter Club home page Giter Club logo

Comments (8)

asd1289 avatar asd1289 commented on July 4, 2024

Hello @tlesluyes.
The order of 310 and 311 is correct. Each normal is read in and transposed in the mclapply loop. The purpose of na.omit is to remove NAs added by the tryCatch() in case a file is missing. This enables proceeding without completely breaking the loop if a file is missing or due to a bad file path.
I agree that in WGS there are bins with missing values. But this will be dealt with when running dryclean on individual normal samples. Since the decomposition doesn't allow for missing values, we replace them with mean value for each chromosome. Consequently, the output of dryclean should have no NA bins.
It is important that the identify_germline step is run on normal samples that have undergone decomposition (as is stated in the tutorial) and not on raw data.
Hopefully this is helpful and please let me know if you have further problems/need clarification.

from dryclean.

tlesluyes avatar tlesluyes commented on July 4, 2024

Hi @asd1289,
Thank you for clarifying this. I confirm that I used the identify_germine step on decomposed normal samples. From what you've said, I understand that the decomposition (start_wash_cycle) used on a normal sample (using PON) replaces NAs (from fragCounter) by the mean value of each chromosome. Can you confirm that (is it the mean or the median? There is no mean function in the code)? Can you have a look at dryclean.R lines 680 to 691 because what's NA remains NA here, there isn't any replacement of NA values by the median/mean.

Actually, the function works well until line 679 because I do have foreground.log numeric values but those are different from the mean/median. For examples: chr1:1-1000 and chr1:1001-2000 (my two first positions) are equal to -0.476 and -0.187, respectively, whereas median.chr is 1.004. But they still have NA values for input.read.counts so they are set to NA at line 680.

Here is the code I use to decompose my normal sample and where I end up with NAs (is the mc.cores useful as it's not explicitely used in the function?):

decomp=start_wash_cycle(cov=readRDS('OUTPUT/Sample_1.rds'),
mc.cores=12,
detergent.pon.path="/camp/home/lesluyt/working/lesluyt/Dryclean/PON/OUTPUT/deter_PON.rds",
whole_genome=T,
germline.filter=F)

EDIT
I'm not using a blacklist and lines 669 and 670 seems to play a major role as those actually replace NA by median value (regardless if you use a blacklist or not). Would it make sense to get those out the if statement?

from dryclean.

kcygan avatar kcygan commented on July 4, 2024

I have the same issue. WES analysis. Everything works till the germline events identification step which fails and gives the following error:

Starting the preparation of Panel of Normal samples a.k.a detergent Error in setattr(ans, "names", c(keep.names, paste0("V", seq_len(length(ans) - : 'names' attribute [1] must be the same length as the vector [0]

I do have drycleaned normals and NA value appears there as well:

GRanges object with 6 ranges and 10 metadata columns:
      seqnames      ranges strand |      background.log    foreground.log
         <Rle>   <IRanges>  <Rle> |           <numeric>         <numeric>
  [1]        1 11869-12227      * |  -0.136516325393426                 0

          reads                gc       map  input.read.counts
      <numeric>         <numeric> <numeric>          <numeric>
  [1]         0 0.512534818941504         0                  0

             median.chr       foreground        background          log.reads
              <numeric>        <numeric>         <numeric>          <numeric>
  [1] 0.670632457759282                0                 0               <NA>

from dryclean.

asd1289 avatar asd1289 commented on July 4, 2024

Hello @tlesluyes and @kcygan thanks for pointing out this bug. I am working on this and will push a patch asap.

from dryclean.

kcygan avatar kcygan commented on July 4, 2024

Thanks a lot @asd1289 When you push the changes could you please also update the start_wash_cycle function, line 694 from
germ.file = readRDS(germline.filter)
to
germ.file = readRDS(germline.file)

from dryclean.

asd1289 avatar asd1289 commented on July 4, 2024

@kcygan I have changed that typo. Please stay tuned for the NA error.

from dryclean.

willhooper avatar willhooper commented on July 4, 2024

I'm having a similar error when using identify_germline() on WGS data as well:

Error in setattr(ans, "names", c(keep.names, paste0("V", seq_len(length(ans) -  :
  'names' attribute [1] must be the same length as the vector [0]
Calls: identify_germline -> transpose -> setattr

from dryclean.

asd1289 avatar asd1289 commented on July 4, 2024

@tlesluyes @kcygan @willhooper I have added a temporary fix as suggested by @tlesluyes for now. I am keeping this issue open to follow up on this bug. Please let me know of additional issues. Thank you for reporting bugs and take care!

from dryclean.

Related Issues (8)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.