pcarbo commented on August 23, 2024

@willwerscheid Thanks for your feedback.

You shouldn't get log-likelihoods of Inf. Is there any way you could share with me the inputs to fit_topic_model so I can understand why this is happening?

from fasttopics.

pcarbo commented on August 23, 2024

Finally, the description of the output in the fit_poisson_nmf docs (version 0.5-52) does not exactly correspond to the output as it's rendered.

@willwerscheid Do you mean the fit_poisson_nmf return value, or what is displayed on the screen when verbose = "detailed"?

willwerscheid commented on August 23, 2024

what's displayed on screen

pcarbo commented on August 23, 2024

@willwerscheid Could you share with me a code snippet for reproducing this result? Ideally it would have all the inputs to fit_topic_model, and the seed. The code you shared with me is not quite enough.

pcarbo commented on August 23, 2024

Finally, the description of the output in the fit_poisson_nmf docs (version 0.5-52) does not exactly correspond to the output as it's rendered.

@willwerscheid Can you please take a look at the new descriptions of the verbose input argument and the progress return value and let me know if there are additional points you find unclear, or confusing?

willwerscheid commented on August 23, 2024

With dat as the PBMCs dataset I'm using K = 12. I don't think the seed should matter because this was happening repeatedly for me. I expect that the +Infs should appear for a variety of K as well. I'm unable to give an exactly reproducible example now but will do so later.

pcarbo commented on August 23, 2024

@willwerscheid The Inf likelihoods are due to having cells with no expression:

> sum(rowSums(X > 0) == 0)
[1] 52

This is technically allowed for the Poisson NMF model but doesn't make sense for the multinomial topic model.
I will have to think a bit about how to handle this, since it is a special case that I hadn't considered.
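
As a possible stopgap, the empty cells could be dropped before fitting. The following is only a sketch on a toy matrix: the names X_filtered, keep and n_removed are illustrative, and this is not necessarily how fastTopics will end up handling the case.

```r
# Toy counts matrix (illustrative only): rows are cells, columns are genes.
X <- matrix(c(0, 0, 0,
              2, 0, 1,
              0, 3, 0,
              0, 0, 0),
            nrow = 4, byrow = TRUE)

# TRUE for cells (rows) with at least one nonzero count.
keep <- rowSums(X > 0) > 0

# Drop the empty cells; drop = FALSE keeps the result a matrix.
X_filtered <- X[keep, , drop = FALSE]

# Number of cells that were removed.
n_removed <- sum(!keep)
```

The filtered matrix, rather than the original X, would then be passed to the fitting function.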

marlaherr commented on August 23, 2024

Hi there,

Thanks for the great tool!

I am experiencing an issue similar to @willwerscheid's after running fit_topic_model. All my log-likelihoods are Inf.

> unique(fit_6_Normal[["progress"]][["loglik.multinom"]])
[1] Inf

As expected, running plot_progress gives me an empty plot, but structure_plot still produces a normal looking structure plot. Can I trust this plotting result though?

I don’t seem to have cells without expression, although I do have non-expressed genes.

> sum(rowSums(counts_Normal > 0) == 0)
[1] 0
> sum(colSums(counts_Normal > 0) == 0)
[1] 2562

Removing these all-zero columns did not solve it, though.

pcarbo commented on August 23, 2024

@marlaherr Can you try running fit_poisson_nmf instead, then converting to a topic model with poisson2multinom?

marlaherr commented on August 23, 2024

Thanks for the quick response @pcarbo. I tried your suggestion but still get the same output. The Inf values appear after fitting.

> fit <- fit_poisson_nmf(counts, k = 6)
Using 128 RcppParallel threads.
Initializing factors using Topic SCORE algorithm.
Initializing loadings by running 10 SCD updates.
Using 128 RcppParallel threads.
Fitting rank-6 Poisson NMF to 140 x 21189 sparse matrix.
Running 100 SCD updates, without extrapolation (fastTopics 0.6-98).
                                                                            
> topic_model <- poisson2multinom(fit)

This is the structure of my counts object:

> str(counts)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:536353] 6 36 42 52 62 72 93 122 124 133 ...
  ..@ p       : int [1:21190] 0 11 53 55 63 70 85 116 122 123 ...
  ..@ Dim     : int [1:2] 140 21189
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:140] "AACAAAGAGTCAATCC-1_1" "ACCCTTGGTCGTACAT-1_1" "AGGCTGCAGGTTCTAC-1_1" "ATCCACCGTCAACCAT-1_1" ...
  .. ..$ : chr [1:21189] "AL627309.1" "AL669831.5" "LINC00115" "AL645608.7" ...
  ..@ x       : num [1:536353] 0.867 0.887 0.812 0.13 0.765 ...
  ..@ factors : list()

And this is what the fit object looks like:

> str(fit)
List of 13
 $ F        : num [1:21189, 1:6] 1.00e-10 6.86e-02 1.00e-10 1.00e-10 1.00e-10 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:21189] "AL627309.1" "AL669831.5" "LINC00115" "AL645608.7" ...
  .. ..$ : chr [1:6] "k1" "k2" "k3" "k4" ...
 $ L        : num [1:140, 1:6] 1.00e-10 1.00e-10 1.66e-03 1.00e-10 6.74e-04 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:140] "AACAAAGAGTCAATCC-1_1" "ACCCTTGGTCGTACAT-1_1" "AGGCTGCAGGTTCTAC-1_1" "ATCCACCGTCAACCAT-1_1" ...
  .. ..$ : chr [1:6] "k1" "k2" "k3" "k4" ...
 $ Fn       : num [1:21189, 1:6] 1.00e-10 6.86e-02 1.00e-10 1.00e-10 1.00e-10 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:21189] "AL627309.1" "AL669831.5" "LINC00115" "AL645608.7" ...
  .. ..$ : chr [1:6] "k1" "k2" "k3" "k4" ...
 $ Ln       : num [1:140, 1:6] 1.00e-10 1.00e-10 1.66e-03 1.00e-10 6.74e-04 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:140] "AACAAAGAGTCAATCC-1_1" "ACCCTTGGTCGTACAT-1_1" "AGGCTGCAGGTTCTAC-1_1" "ATCCACCGTCAACCAT-1_1" ...
  .. ..$ : chr [1:6] "k1" "k2" "k3" "k4" ...
 $ Fy       : num [1:21189, 1:6] 1.00e-10 6.86e-02 1.00e-10 1.00e-10 1.00e-10 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:21189] "AL627309.1" "AL669831.5" "LINC00115" "AL645608.7" ...
  .. ..$ : chr [1:6] "k1" "k2" "k3" "k4" ...
 $ Ly       : num [1:140, 1:6] 1.00e-10 1.00e-10 1.66e-03 1.00e-10 6.74e-04 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:140] "AACAAAGAGTCAATCC-1_1" "ACCCTTGGTCGTACAT-1_1" "AGGCTGCAGGTTCTAC-1_1" "ATCCACCGTCAACCAT-1_1" ...
  .. ..$ : chr [1:6] "k1" "k2" "k3" "k4" ...
 $ loss     : num -1454258
 $ loss.fnly: num -1454258
 $ iter     : num 100
 $ beta     : num 0.5
 $ beta0    : num 0.5
 $ betamax  : num 0.99
 $ progress :'data.frame':	100 obs. of  13 variables:
  ..$ iter           : num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ loglik         : num [1:100] -1536953 -1386621 -1342385 -1324116 -1314149 ...
  ..$ loglik.multinom: num [1:100] Inf Inf Inf Inf Inf ...
  ..$ dev            : num [1:100] NA NA NA NA NA NA NA NA NA NA ...
  ..$ res            : num [1:100] 8908 5893 2826 1598 2111 ...
  ..$ delta.f        : num [1:100] 772.1 105.4 48.4 29.4 27.3 ...
  ..$ delta.l        : num [1:100] 3.148 3.403 1.887 1.066 0.795 ...
  ..$ nonzeros.f     : num [1:100] 0.495 0.497 0.521 0.537 0.547 ...
  ..$ nonzeros.l     : num [1:100] 0.812 0.844 0.748 0.738 0.692 ...
  ..$ extrapolate    : num [1:100] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ beta           : num [1:100] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
  ..$ betamax        : num [1:100] 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 ...
  ..$ timing         : num [1:100] 0.076 0.072 0.068 0.076 0.068 ...
 - attr(*, "class")= chr [1:2] "poisson_nmf_fit" "list"

I am grateful for any suggestions, thanks in advance!

pcarbo commented on August 23, 2024

@marlaherr Would you mind sharing the result of running fit_poisson_nmf?

marlaherr commented on August 23, 2024

@pcarbo, do you mean the “fit” R object? I am not entirely sure how to best share it. Does a csv file work? Otherwise I could send it to you via email.

fit_marla.csv

pcarbo commented on August 23, 2024

Yes, but can you save the fit object using save (to create an RData file) or saveRDS (to create an rds file)? Then you should be able to attach it to your reply to this GitHub Issue, or you can send an email.
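
For reference, a minimal sketch of the saveRDS route. The small list below is only a stand-in for the real fit object, and the file name fit.rds is a placeholder.

```r
# Stand-in for the fitted model (illustrative only; the real object
# would be the return value of fit_poisson_nmf).
fit <- list(loss = -1454258, iter = 100)

# Write the object to a single .rds file (a temporary path is used here).
rds_file <- file.path(tempdir(), "fit.rds")
saveRDS(fit, file = rds_file)

# Reading the file back restores the object.
fit_restored <- readRDS(rds_file)
```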

marlaherr commented on August 23, 2024

I tried to attach it as either .RData or .rds file but neither is supported by GitHub. Would you let me know your email address so I can send you the file? Thanks for your help!

pcarbo commented on August 23, 2024

@marlaherr It looks like there is a bug or numerical issue in the multinomial likelihood calculations inside fit_poisson_nmf because loglik_multinom_topic_model works fine:

sum(loglik_multinom_topic_model(X,fit))
# -1293678

So for now please use loglik_multinom_topic_model.

marlaherr commented on August 23, 2024

Thank you @pcarbo, loglik_multinom_topic_model works!

pcarbo commented on August 23, 2024

@marlaherr I've fixed the likelihood calculations so it should work for your data set now. The issue was that the likelihood calculations sometimes did not work for non-integer counts. Please update your version of fastTopics to the latest version.
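
To check whether a counts matrix contains non-integer values (as the x slot in the str(counts) output earlier in this thread does), something like the following sketch may help. The helper name has_noninteger_counts is hypothetical and not part of fastTopics; it assumes the input is either a base R matrix or a sparse matrix from the Matrix package that stores its nonzero entries in an x slot (for example dgCMatrix).

```r
# Hypothetical helper (not part of fastTopics).
has_noninteger_counts <- function(X) {
  # S4 sparse matrices such as dgCMatrix keep their nonzero entries in
  # slot "x"; for a base matrix, inspect every entry instead.
  vals <- if (isS4(X)) X@x else as.vector(X)
  any(vals != round(vals))
}

# Toy inputs (illustrative only).
counts_int  <- matrix(c(0, 2, 1, 0), nrow = 2)
counts_frac <- matrix(c(0, 0.867, 0.887, 0), nrow = 2)

has_noninteger_counts(counts_int)
has_noninteger_counts(counts_frac)
```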

marlaherr commented on August 23, 2024

Yes, now it works, many thanks @pcarbo.
