Comments (12)

deevdevil88 commented on August 13, 2024

Hi @sjfleming
I have emailed you the file.

Best,
Devika

from cellbender.

deevdevil88 commented on August 13, 2024

Hi Stephen,
That's completely understandable. I now have access to a GPU to run my samples, so I haven't had a problem running them on the GPU. I did have to re-run some samples with different parameters because training didn't converge, but there were no errors otherwise.
Thanks,
Devika

sjfleming commented on August 13, 2024

Hi Devika,

It does look like you are having some issues running the tool on this dataset. Running with the default z-dim and z-layers normally does not result in the kind of learning curve you have attached here. The big spike near epoch 75 and the overall wobbly look are bad signs. I have never seen this before with z-dim 20 and z-layers 500.
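As a rough illustration of what "a big spike" or a NaN in a learning curve means in practice, here is a minimal Python sketch that scans a list of per-epoch loss values. The function name `find_loss_problems`, the `spike_factor` threshold, and the `window` size are all hypothetical choices for this example; this is not how CellBender itself checks training.

```python
import math


def find_loss_problems(losses, spike_factor=3.0, window=5):
    """Return (nan_epochs, spike_epochs) as 1-indexed epoch numbers.

    An epoch is flagged as a spike when its jump from the previous value is
    more than `spike_factor` times the average step size seen over the
    preceding `window` epochs. Both thresholds are arbitrary assumptions.
    """
    nan_epochs = []
    spike_epochs = []
    for i, value in enumerate(losses):
        epoch = i + 1
        if math.isnan(value):
            nan_epochs.append(epoch)
            continue
        # Recent finite values just before this epoch.
        recent = [v for v in losses[max(0, i - window):i] if not math.isnan(v)]
        if len(recent) < 2:
            continue  # not enough history to judge a typical step size
        steps = [abs(b - a) for a, b in zip(recent, recent[1:])]
        typical_step = sum(steps) / len(steps)
        jump = abs(value - recent[-1])
        if typical_step > 0 and jump > spike_factor * typical_step:
            spike_epochs.append(epoch)
    return nan_epochs, spike_epochs


# Made-up values for illustration only, not taken from any real run.
example = [100.0, 90.0, 82.0, 76.0, 71.0, 300.0, 66.0, float("nan")]
nan_epochs, spike_epochs = find_loss_problems(example)
print("NaN at epochs:", nan_epochs)
print("Spikes at epochs:", spike_epochs)
```

Note that the epoch right after a spike is usually flagged too, because the curve jumps back toward its earlier level.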

The other confusing thing is that cellbender should run deterministically, and there certainly should not be different behavior based on the number of epochs. Again, I haven't seen this before.

If you are willing to share this h5 file with me, I can take a look and try to debug.

What happens if you use --total-droplets-included 40000?
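For anyone trying the same experiments, a small Python sketch of how the command-line arguments could be assembled for several parameter combinations. Only `--total-droplets-included` and `--cuda` appear in this thread; `--input`, `--output`, `--z-dim`, and `--z-layers` are written here as I understand the `cellbender remove-background` options and should be checked against `cellbender remove-background --help` for your installed version. The helper name and the file names are hypothetical.

```python
def build_cellbender_args(input_h5, output_h5, total_droplets,
                          z_dim=None, z_layers=None, cuda=False):
    """Build an argument list for `cellbender remove-background`.

    Flag names other than --total-droplets-included and --cuda are
    assumptions; verify them with `--help` before running.
    """
    args = [
        "cellbender", "remove-background",
        "--input", input_h5,
        "--output", output_h5,
        "--total-droplets-included", str(total_droplets),
    ]
    if z_dim is not None:
        args += ["--z-dim", str(z_dim)]
    if z_layers is not None:
        args += ["--z-layers", str(z_layers)]
    if cuda:
        args.append("--cuda")
    return args


# Hypothetical file names; leaving z_dim/z_layers unset keeps the defaults.
cmd = build_cellbender_args("raw_feature_bc_matrix.h5", "cellbender_out.h5",
                            total_droplets=40000)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the flags have been confirmed.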

deevdevil88 commented on August 13, 2024

Hi @sjfleming

I reran the analysis changing only --total-droplets-included to 40000 (from reading previous posts, perhaps I needed to reduce this from 60,000). Even then, it always fails with the default z-dim and z-layers, with the same error as before: "Encountered NaN loss" (see attached error report, MBR1_v3_error.txt).

But when I ran it with z-dim and z-layers either increased or decreased, it worked fine. However, I don't know which result has converged best (I suspect the version with decreased z-layers), or what the problem was.

I would be happy for you to have the h5 file. But do you mean the Cell Ranger output h5 file, or the CellBender output from my first version with the problematic result? I wasn't sure.

See attached my log and PDF report for try 1 (decreased z-layers, with total cells as 40,000): MaleBrainRep1_v2.log, MaleBrainRep1_v2.pdf

And for try 2 (increased z-layers and z-dim, with total cells as 40,000): MaleBrainRep1_v4.log, MaleBrainRep1_v4.pdf

So I would really like some advice on this.

Devika

sjfleming commented on August 13, 2024

Yes, if you could email me the CellRanger raw output h5 file at
[email protected]
then I can take a quick look. I would like to understand what's causing these NaNs, because I don't usually see this behavior.

For now I would guess that decreasing z-layers (your try 1) would probably produce a better result, but I'll know more once I take a look.

deevdevil88 commented on August 13, 2024

Sure thing. I did attach my reports and logs for each try as well in my previous post.

sjfleming commented on August 13, 2024

I don't seem to be able to reproduce the issue you had with the NaN loss. It seems to run just fine for me. I'm not sure what has caused the issue here... The only thing I've done differently is to include the --cuda flag, but that should make no difference on the computations that get carried out.

devika_out.log
devika_out.pdf

I will send you my version of the output h5 file, so you can compare.

deevdevil88 commented on August 13, 2024

Hi Stephen,
Thank you for having a look on your end. I think it might have to do with the way I had to install CellBender in the conda environment on the server. As our server OS is CentOS 6, I kept getting the glibc library error for PyTorch, even within the conda environment.
I used the fix detailed here (link: https://gist.github.com/michaelchughes/85287f1c6f6440c060c3d86b4e7d764b) to compile my own glibc libraries and then get conda to recognize them.

I mean, initially I thought it was my install. But considering other samples ran fine, it couldn't be.
We should be getting an update in the coming months. But in the meantime, I didn't find any other solutions for the glibc errors for PyTorch on CentOS 6.

If you could send me your output to compare, that would be great.

Best,
Devika

deevdevil88 commented on August 13, 2024

Hi @sjfleming
Just to give you an update: I ran the sample that was giving me trouble on a CentOS 7 machine, where CellBender was installed using the manual install through a conda environment. I had no trouble installing, and everything worked fine. But the sample still gave the same error: "UserWarning: Encountered NaN: loss".

Maybe it needs to be run on a GPU, but on a clean install on a CPU system this error is recurring.
Thanks,
Devika

sjfleming commented on August 13, 2024

Hi Devika,

I will look into this when I get a chance. So far, the GPU performance has been the priority since it takes so long to run on a CPU. But I definitely do not want it to error out on CPU. When I'm testing v2, I will test CPU performance as well.

mbabadi commented on August 13, 2024

@deevdevil88 is this issue resolved?

deevdevil88 commented on August 13, 2024
