JEFworks commented on June 9, 2024

Hi Florian,

Thanks for the thorough documentation of your issue. Can you please try to see if the error persists with knn.error.models in addition to scde.error.models?

Also, can you please check if the multicore issue is being caused by mclapply or bplapply? We currently handle multicore processing using this function:

papply <- function(..., n.cores = parallel::detectCores()) {
  if (n.cores > 1) {
    # mclapply implementation (parallel ships with base R, so this branch is normally taken)
    if (is.element("parallel", utils::installed.packages()[, 1])) {
      parallel::mclapply(..., mc.cores = n.cores)
    } else {
      # last resort: BiocParallel
      BiocParallel::bplapply(..., BPPARAM = BiocParallel::MulticoreParam(workers = n.cores))
    }
  } else { # fall back on lapply
    lapply(...)
  }
}

You can explicitly modify the papply function to use one or the other:

papply <- function(..., n.cores = parallel::detectCores()) {
  if (n.cores > 1) {
    # no longer use mclapply
    BiocParallel::bplapply(..., BPPARAM = BiocParallel::MulticoreParam(workers = n.cores))
  } else { # fall back on lapply
    lapply(...)
  }
}

Or just add a print or cat statement to see which branch is taken.
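For the print/cat route, a minimal sketch could look like the following. It keeps the branching of the papply shown above but adds message() calls so the console reports which backend each call uses; the messages and their wording are illustrative additions, not part of scde.

```r
# Sketch of papply with diagnostic messages added. The message() calls are
# illustrative additions; the branching mirrors the papply quoted above.
papply <- function(..., n.cores = parallel::detectCores()) {
  if (n.cores > 1) {
    if (is.element("parallel", utils::installed.packages()[, 1])) {
      message("papply: parallel::mclapply, mc.cores = ", n.cores)
      parallel::mclapply(..., mc.cores = n.cores)
    } else {
      message("papply: BiocParallel::bplapply, workers = ", n.cores)
      BiocParallel::bplapply(..., BPPARAM = BiocParallel::MulticoreParam(workers = n.cores))
    }
  } else {
    message("papply: lapply (single core)")
    lapply(...)
  }
}
```

Assuming papply is an internal function of scde, keep in mind that a copy defined in your own workspace is a separate object: scde's own functions look up papply in the package namespace first, so pasting this in only changes calls made from your workspace.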

from scde.

FloWuenne commented on June 9, 2024

I tried running knn.error.models, and it seems to calculate the error models correctly when using

linear.fit=TRUE

When I set linear.fit to FALSE, I still get the same error:

Error in .Fortran("dqrls", qr = x[good, ] * w, n = ngoodobs, p = nvars,  :
  "dqrls" not resolved from current namespace (scde)

Redefining papply right before calculating the error models seems to fix the multicore issue at first, but after the code has run for one comparison group in my data, I still get this error. (I am iterating over the clusters of my single cells and doing DE between two groups in each cluster.)

models:
Error: 'bplapply' receive data failed:
  error reading from connection

FloWuenne commented on June 9, 2024

Sorry, the multicore processing still fails for me at this point. I think I am using it wrong. How exactly should I implement the papply solution?

I really need multicore processing, since a single core is very slow when comparing hundreds of cells. Any other tips for speeding things up?

FloWuenne commented on June 9, 2024

I have been experimenting with this for a while now and have not found a solution. Running on a single core takes far too long for my cells. The error seems to come from bplapply for me, but none of the modes in the papply function seem to work on our cluster. Any suggestions?

JEFworks commented on June 9, 2024

papply is just a wrapper function that calls either bplapply or mclapply, so it seems your cluster may be having trouble with one or both of those functions. Unfortunately, I have not been able to reproduce this error, which makes it harder to help you debug.

Can you please try a simple test:

## regular non-parallelized lapply
start_time <- Sys.time() 
lapply(1:10, function(x) { Sys.sleep(1) }) 
end_time <- Sys.time() 
t1 <- end_time - start_time 

This should take about 10 seconds.

## mclapply
start_time <- Sys.time() 
require(parallel) 
mclapply(1:10, function(x) { Sys.sleep(1) }, mc.cores=10) 
end_time <- Sys.time() 
t2 <- end_time - start_time 

This should take less than 10 seconds. Forking adds some overhead, though, so it should take more than 1 second. If you want to benchmark the time spent on forking, you can try:

start_time <- Sys.time() 
require(parallel) 
mclapply(1:10, function(x) { }, mc.cores=10) 
end_time <- Sys.time() 
t3 <- end_time - start_time 

And the difference between t3 and t2 should be about 1 second if your mclapply is working properly.

And lastly:

## bplapply
start_time <- Sys.time() 
require(BiocParallel) 
bplapply(1:10, function(x) { Sys.sleep(1) }, BPPARAM = MulticoreParam(workers = 10)) 
end_time <- Sys.time() 
t4 <- end_time - start_time 

If any of these fail on your cluster, then the problem is with the parallelization itself.
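The same checks can be bundled into one helper, sketched below under a few assumptions: the names elapsed_secs and run_benchmarks are hypothetical, packages are loaded before the timed calls so that package loading is not counted (unlike the require() calls above), and the bplapply check only runs if BiocParallel is installed. It uses system.time() for wall-clock timing and, like mclapply itself, assumes a Unix-like system where forking is available.

```r
library(parallel)

# Wall-clock seconds spent evaluating `expr`. R evaluates arguments lazily,
# so `expr` is only run inside system.time().
elapsed_secs <- function(expr) system.time(expr)[["elapsed"]]

# Hypothetical helper bundling the t1-t4 checks: n tasks that each sleep
# `pause` seconds, run serially (t1), with mclapply (t2), the mclapply
# forking overhead alone (t3), and with bplapply if available (t4).
run_benchmarks <- function(n = 10, pause = 1, cores = 10) {
  res <- c(
    t1 = elapsed_secs(lapply(seq_len(n), function(x) Sys.sleep(pause))),
    t2 = elapsed_secs(mclapply(seq_len(n), function(x) Sys.sleep(pause), mc.cores = cores)),
    t3 = elapsed_secs(mclapply(seq_len(n), function(x) NULL, mc.cores = cores))
  )
  if (requireNamespace("BiocParallel", quietly = TRUE)) {
    # Build the backend outside the timed call
    param <- BiocParallel::MulticoreParam(workers = cores)
    res["t4"] <- elapsed_secs(
      BiocParallel::bplapply(seq_len(n), function(x) Sys.sleep(pause), BPPARAM = param)
    )
  }
  res
}
```

For example, `run_benchmarks()` repeats t1 to t4 with the settings used above, while `run_benchmarks(n = 4, pause = 0.25, cores = 4)` is a quicker smoke test. When cores is at least n and mclapply works, t2 minus t3 should be close to `pause`.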

FloWuenne commented on June 9, 2024

I just ran the tests, and all of them seemed to execute without any problems:

t1
Time difference of 10.03305 secs
t2
Time difference of 1.132771 secs
t3
Time difference of 0.1887727 secs
t4
Time difference of 1.676646 secs

So it seems it's not a problem with these packages? Do I need to load BiocParallel or parallel explicitly when running scde? I assumed they are dependencies and are loaded automatically.
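Checking these numbers against the expectations in the previous comment, as a worked calculation that uses only the reported timings:

```r
# Timings reported above, in seconds
t1 <- 10.03305   # lapply: 10 tasks x 1 s sleep, serial
t2 <- 1.132771   # mclapply: same tasks, mc.cores = 10
t3 <- 0.1887727  # mclapply: empty tasks, i.e. forking overhead only
t4 <- 1.676646   # bplapply: MulticoreParam(workers = 10)

speedup    <- t1 / t2  # about 8.86x over serial lapply
mc_sleep   <- t2 - t3  # about 0.94 s, close to the expected ~1 s
bp_speedup <- t1 / t4  # about 6x over serial lapply
```

mclapply comes close to the ideal 10x speedup, and the roughly 0.94 s gap between t2 and t3 matches the ~1 second expected when mclapply works, so both parallel backends look healthy on this machine.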

JEFworks commented on June 9, 2024

They should be automatically loaded.

Based on the error you saw previously:

models:
Error: 'bplapply' receive data failed:
  error reading from connection

it sounds like this could be a memory issue (for really large datasets, "*** caught segfault *** address 0xf2d1, cause 'memory not mapped'" is a more common error message). Does the error reproduce with the example data provided with the package, or only with your own data?
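If memory is the culprit, one workaround to try (an assumption, not a confirmed fix) is to run with fewer cores, since each worker process can add its own memory use. The sketch below retries a fitting function with n.cores halved after each failure. It only helps when the failure surfaces as an R error, as the bplapply message above does; run_with_core_fallback is a hypothetical helper, and the scde.error.models call in the comment uses placeholder object names.

```r
# Hypothetical helper: call fit_fun(n.cores = k), halving k after each error,
# until it succeeds or fails on a single core.
run_with_core_fallback <- function(fit_fun, n.cores) {
  repeat {
    result <- tryCatch(fit_fun(n.cores = n.cores), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (n.cores <= 1) stop(result)  # re-raise the single-core error
    message("n.cores = ", n.cores, " failed: ", conditionMessage(result))
    n.cores <- max(1, n.cores %/% 2)
  }
}

# Illustrative use with scde (not run; counts_matrix and group_factor are
# placeholders for your own count matrix and group factor):
# models <- run_with_core_fallback(
#   function(n.cores) scde.error.models(counts = counts_matrix,
#                                       groups = group_factor,
#                                       n.cores = n.cores),
#   n.cores = 8
# )
```

This trades speed for stability: it gives up parallelism gradually instead of dropping straight to a single core.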

Cumol commented on June 9, 2024

I might be having a similar problem: I get the following error when running scde.error.models:

Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection
> 
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal

I will try out the different apply functions and let you know.
