Giter Club home page Giter Club logo

Comments (9)

nsheff avatar nsheff commented on July 24, 2024

If your userSets have regions that overlap multiple universe regions, this can happen. So, you haven't defined your userSets in terms of your universe. One giant region in the userSet could cover 15 universe regions, scoring 15 overlaps, but only registered as a single entity in the userSet, which can mess up the contingency table.

Try setting redefineUserSets=TRUE in runLOLA. Does it solve it?

from lola.

enricoferrero avatar enricoferrero commented on July 24, 2024

No, unfortunately it doesn't. I get the same warning and output with:
res <- runLOLA(userSets, userUniverse, lolaCustom, cores=detectCores(), redefineUserSets=TRUE)

The regions in my userSets are all FIMO scans of motifs and therefore quite short in length (between 8 and 31 bp in fact).

I don't really understand what could possibly be wrong with this data. Any idea?

Thinking out loud:

  • Would disjoin(), resize() or some other operation on userSets help in any way?
  • What about removing all entries that have a negative c value from the scoreTable?

Thanks,

from lola.

nsheff avatar nsheff commented on July 24, 2024

Your userSets should already be disjoint, or else you would get an error.
Removing negative c entries isn't the best because the fact that you have negative c entries is warning you that something is wrong with the way the test is constructed; eliminating these doesn't make the other tests accurate.

Ok, another idea. Try running reduce instead of disjoin on your database. I think maybe you are getting lots of little overlapping regions in your database because of the way you're combining potentially overlapping stuff and then disjoining.

from lola.

enricoferrero avatar enricoferrero commented on July 24, 2024

OK, I made some tests and randomly stumbled upon a solution (that I don't understand).

The custom LOLA database I built initially contains three collections: JASPAR, Taipale and UniProbe (see here: http://www.uwencode.org/proj/CATO/).

Running runLOLA() with the whole database gives me the error above.
However, rather unexpectedly, if I create three databases with one collection each, runLOLA() works flawlessly!

Any idea why that would happen? Maybe it has to do with the length of the object? Each of the collection has several hundreds files.

from lola.

enricoferrero avatar enricoferrero commented on July 24, 2024

Side question:

is:

sheffield_dnase <- unlist(lolaCore$regionGRL[which(lolaCore$regionAnno$collection == "sheffield_dnase")])
userUniverse <- disjoin(c(sheffield_dnase, unlist(userSets)))
res <- runLOLA(userSets, userUniverse, regionDB, cores=detectCores(), redefineUserSets=FALSE)

equivalent to:

sheffield_dnase <- unlist(lolaCore$regionGRL[which(lolaCore$regionAnno$collection == "sheffield_dnase")])
userUniverse <- disjoin(sheffield_dnase)
res <- runLOLA(userSets, userUniverse, regionDB, cores=detectCores(), redefineUserSets=TRUE)

Why/Why not?

The first approach (i.e.: include your own regions in the universe) makes more sense to me but maybe I misunderstood what redefineUserSets=TRUE does?

from lola.

nsheff avatar nsheff commented on July 24, 2024

do you mind putting the database somewhere where I can download it? Or email it to me (contact info)

from lola.

nsheff avatar nsheff commented on July 24, 2024

And for your side question: No, these are very different.

Option 1: just puts your user sets into the universe (and then disjoins them so there are no overlaps, separating regions out).
Option 2: your user sets will not be in the universe. Then you redefine the usersets; any that were not in the universe would no longer be in your user sets.

Try combining your options. Put your user sets in the universe, and then redefine the userSets.

you can check out ?redefineUserSets
This function will take the user sets, overlap with the universe, and redefine the user sets as the set of regions in the user universe that overlap at least one region in user sets. this makes for a more appropriate statistical enrichment comparison, as the user sets are actually exactly the same regions found in the universe. otherwise, you can get some weird artifacts from the many-to-many relationship between user set regions and universe regions.

from lola.

enricoferrero avatar enricoferrero commented on July 24, 2024

Sorry I can't as it also contain some unpublished data that I don't own.
So, would you recommend always using redefineUserSets=TRUE then?
Also, can you please confirm that including userSets in userUniverse is a sensible thing to do?

from lola.

nsheff avatar nsheff commented on July 24, 2024

Sorry I can't as it also contain some unpublished data that I don't own.

Just a reproducible minimal example would be fine, if you can concoct one.

So, would you recommend always using redefineUserSets=TRUE then?

Ideally you don't need to redefine user sets because you have already defined them in the same way that you defined your universe. If you defined them in almost the same way, you probably don't have to worry about it. If they're very different, then I would at least see how things change when you redefine them. So, it's like a helper function in case you are constructing a universe on the fly. So the short answer is: it probably doesn't hurt, but it may not be necessary and it adds compute time. Look also at ?buildRestrictedUniverse.

Also, can you please confirm that including userSets in userUniverse is a sensible thing to do?

Well, any userSets that are not in the universe must be ignored... because sort of by definition, a userSet must be in the universe (that's what the universe is: the regions that could have been included in userSets). Again, ideally, your universe is defined in a way that already includes all your userSets. If not, you can sort of hack it by just adding the userSets into the universe (as you do above). I think this is a reasonable thing to do becomes sometimes it's difficult to come up with the exact appropriate universe for your test. But if there are many userSets not in the universe you want to use, you're probably using the wrong universe, and just adding the userSets may not be what you want. So, in some cases it's sensible, but you should think a bit about the universe you are using and what it means for your contingency table.

from lola.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.