gopca's Issues

Do not disable global filter completely when no ontology file is provided

To support running GO-PCA with custom gene sets ("GS-PCA"), GO-PCA can be given only a GO annotation file (-a parameter) and no ontology file (-t parameter). In this case, GO-PCA's "global filter" (filtering against related gene sets found to be enriched among previously analyzed principal components) cannot rely on the GO hierarchy (the custom gene sets do not represent GO terms).

Currently, when no ontology file is provided, the "global filter" is automatically disabled. However, even if there is no known relationship between the custom gene sets provided, we can at least prevent the same gene set from generating multiple signatures. This should be the default behavior, and the current behavior should only occur when the global filter is explicitly disabled by the user.
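The fallback described above can be illustrated with a minimal sketch. It is not GO-PCA's actual implementation: the function name `filter_repeated_gene_sets` and the gene set IDs are hypothetical, and it assumes the only available relationship between custom gene sets is identity, so a set is dropped only if the same ID was already used for an earlier principal component.

```python
def filter_repeated_gene_sets(enriched_ids, seen_ids):
    """Keep only gene sets that did not yield a signature for an earlier PC.

    enriched_ids: IDs of gene sets found enriched for the current PC (hypothetical).
    seen_ids: set of IDs already used; updated in place.
    """
    kept = []
    for gs_id in enriched_ids:
        if gs_id in seen_ids:
            # Same gene set was already reported for a previous PC: skip it.
            continue
        seen_ids.add(gs_id)
        kept.append(gs_id)
    return kept


seen = set()
pc1_kept = filter_repeated_gene_sets(["GS_A", "GS_B"], seen)
pc2_kept = filter_repeated_gene_sets(["GS_B", "GS_C"], seen)
print(pc1_kept, pc2_kept)
```

Here `GS_B` is reported for the first component only, which is the behavior the issue proposes as the default when no ontology file is given.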

Error importing genometools

Hi,
I have installed genometools using pip. No error messages were generated during the installation. However, I'm not able to import the module. I always get the following error:
Python 2.7.10 (default, Oct 11 2015, 15:42:07)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import genometools
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/epicardi/bin/python27/lib/python2.7/site-packages/genometools/__init__.py", line 24, in <module>
__version__ = pkg_resources.require('genometools')[0].version
File "/home/epicardi/bin/python27/lib/python2.7/site-packages/distribute-0.6.10-py2.7.egg/pkg_resources.py", line 648, in require
needed = self.resolve(parse_requirements(requirements))
File "/home/epicardi/bin/python27/lib/python2.7/site-packages/distribute-0.6.10-py2.7.egg/pkg_resources.py", line 546, in resolve
raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: genometools

As you can see, I'm using Python 2.7.10 on CentOS Linux.
Any help is very welcome.
Best,
Ernesto

AssertionError on isinstance(gene_set_db, GeneSetDB)

I'm having trouble running the program: it fails with the following exception even on test datasets.

$ uname -a
Linux myserver 3.19.0-47-generic #53~14.04.1-Ubuntu SMP Mon Jan 18 16:09:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ awk -F'\t' '{print$5}' GO_gene_sets_fly_ensembl83_goa54_ontology2016-01-18.tsv | tr , '\n' | sort | uniq | awk -v OFS='\t' 'BEGIN{print"","A","B","C","D","E"}{printf$1;for(i=1;i<6;i++)printf"\t%d",rand()*100;printf"\n"}' > expression_test.tsv
$ head -n5 expression_test.tsv
    A   B   C   D   E
128up   23  29  84  15  58
14-3-3epsilon   19  81  17  48  15
14-3-3zeta  36  49  91  26  89
18w 22  63  57  33  10

$ go-pca.py -e expression_test.tsv -o test -s GO_gene_sets_fly_ensembl83_goa54_ontology2016-01-18.tsv
[2016-08-04 22:09:49] INFO: Expression matrix size: (p = 6225 genes) x (n = 5 samples).
Traceback (most recent call last):
  File "/usr/local/bin/go-pca.py", line 9, in <module>
    load_entry_point('gopca==1.2.3', 'console_scripts', 'go-pca.py')()
  File "/usr/local/lib/python2.7/dist-packages/gopca/main.py", line 325, in main
    M = GOPCA(config, gene_sets, gene_ontology)
  File "/usr/local/lib/python2.7/dist-packages/gopca/go_pca.py", line 85, in __init__
    assert isinstance(gene_set_db, GeneSetDB)
AssertionError

selected pca signature matrix

Dear Florian,
I am a developmental and cell biologist, and I am applying your method to microarray data from embryonic samples and from cells generated in vitro from stem cells.
So far, I have been analyzing microarray data using classic R packages (limma...) and functions. In particular, PCA groups my samples nicely by origin (PC1: in vivo vs. in vitro) and by positional identity (PC2 and PC3). Using your method (Python API), I can see the gene sets enriched in those components, but I am not able to perform the analysis on selected components only. For instance, I would like to focus on the positional identity of my samples, and thus obtain a signature matrix for the second and third components alone, omitting the first one.
How can I modify your method to accomplish this?
Moreover, I used the prcomp function to compute the PCA in R. The variance explained by the individual components differs from what I obtain with your method on the same dataset. What is this difference due to? Is it possible to draw a classic PCA plot to assess whether your method gives the same grouping I observe in R?

Thanks in advance for your help and thank you for your work.
Best wishes,
Marco
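As a point of comparison for the question above, here is a minimal NumPy sketch of a plain PCA that returns sample scores and explained-variance ratios for selected components only. It does not use or describe GO-PCA's API; the function name `pca_scores` and the random data are assumptions for illustration. It centers each gene without scaling, which matches the defaults of R's prcomp applied to a samples-by-genes matrix. Differences in preprocessing (centering, scaling, or restricting the analysis to a subset of genes) are one common reason why explained variance differs between tools, though that may not be the cause here.

```python
import numpy as np


def pca_scores(X, components):
    """Plain PCA on a genes x samples matrix X.

    components: 1-based indices of the PCs to return, e.g. [2, 3].
    Returns (scores, ratio): sample scores (samples x len(components))
    and the fraction of total variance explained by each selected PC.
    """
    X = np.asarray(X, dtype=float)
    # Center each gene across samples (no scaling).
    Xc = X - X.mean(axis=1, keepdims=True)
    # SVD of the samples x genes matrix.
    U, s, Vt = np.linalg.svd(Xc.T, full_matrices=False)
    variances = s ** 2 / (X.shape[1] - 1)
    ratio = variances / variances.sum()
    idx = np.asarray(components) - 1
    scores = U[:, idx] * s[idx]
    return scores, ratio[idx]


# Toy data: 50 genes x 6 samples (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
scores, ratio = pca_scores(X, components=[2, 3])
print(scores.shape, ratio)
```

Plotting the two columns of `scores` against each other gives the classic PC2 vs. PC3 scatter plot, which can be compared directly with the prcomp result (up to the sign of each axis).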

extracting signatures and Plots

Hi Florian,
I tried your tool on my dataset. It ran correctly and generated the binary output file. I then ran your accessory script gopca_extract_signature_matrix.py to export all signatures as text, but I get the following error:
macbook-pro-15-di-ernesto-2:SLAnew ernesto$ gopca_extract_signature_matrix.py -g gopca.pickle -o out
Traceback (most recent call last):
File "/usr/local/bin/gopca_extract_signature_matrix.py", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/site-packages/gopca/cli/extract_signature_matrix.py", line 92, in main
E = ExpMatrix(genes=sig_labels, samples=samples, X=result.S)
File "/usr/local/lib/python2.7/site-packages/pandas/core/generic.py", line 2744, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'GOPCASignatureMatrix' object has no attribute 'S'

In addition, the main paper mentions plotting scripts (gopca_plot_*.py), but I cannot find them in the package.

Many thanks in advance,
Ernesto

GO-PCA demo no longer works: dead Dropbox links

I tried running the "GO-PCA: dmap" demo found here:

https://nbviewer.jupyter.org/github/flo-compbio/gopca-demos/blob/master/dmap/01%20-%20Demo.ipynb

The demo contains two links to Dropbox:

go_gene_sets_url = 'https://www.dropbox.com/s/s9osj0lfnoonjtt/GO_gene_sets_human_ensembl83_goa153_ontology2016-01-18.tsv?dl=1'

gene_expression_url = 'https://www.dropbox.com/s/obn0imd623yul7i/dmap_expression_mapped.tsv?dl=1'

Both links return a 404 error; I think they have expired.

Review global filtering rules

Currently, the global filter might not work exactly as intended (however, this version of the filter was used in all analyses in the PLoS One paper, so this will only be fixed in version 1.2).

get_xlmhg_stat() takes at most 4 arguments (6 given)

Hi Florian,
When I was running go-pca.py, I got the following errors:

Traceback (most recent call last):
File "go-pca.py", line 11, in <module>
sys.exit(main())
File "/usr/local/python27/lib/python2.7/site-packages/gopca/cli/go_pca.py", line 340, in main
run = M.run()
File "/usr/local/python27/lib/python2.7/site-packages/gopca/gopca.py", line 730, in run
self.matrix, config.params, gse_analysis, W, d+1)
File "/usr/local/python27/lib/python2.7/site-packages/gopca/gopca.py", line 513, in _generate_pc_signatures
exact_pval='if_significant')
File "/usr/local/python27/lib/python2.7/site-packages/genometools/enrichment/analysis.py", line 429, in get_rank_based_enrichment
exact_pval=exact_pval, table=table)
File "/usr/local/python27/lib/python2.7/site-packages/xlmhg/test.py", line 223, in get_xlmhg_test_result
stat, cutoff = mhg_cython.get_xlmhg_stat(indices, N, K, X, L, tol)
TypeError: get_xlmhg_stat() takes at most 4 arguments (6 given)

I found that in the file "/usr/local/python27/lib/python2.7/site-packages/xlmhg/mhg.py", line 60, get_xlmhg_stat is defined with 4 arguments (v, X, L, tol=DEFAULT_TOL).

Am I missing something, or is this an internal error?
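A mismatch like the one in this traceback (a caller passing six positional arguments to a function that accepts at most four) can be checked without running the full analysis. The sketch below is only an illustration: `four_arg_stat` and `six_arg_stat` are hypothetical stand-ins that mirror the two argument lists quoted above, not the real xlmhg functions. It uses `inspect.signature`, which requires Python 3 and may not work on compiled (Cython) functions that do not expose their signature.

```python
import inspect


def accepts_n_positional(func, n):
    """Return True if func can be called with n positional arguments."""
    sig = inspect.signature(func)
    try:
        # bind() applies the normal argument-matching rules without calling func.
        sig.bind(*([None] * n))
    except TypeError:
        return False
    return True


# Hypothetical stand-ins for the two argument lists seen in the report.
def four_arg_stat(v, X, L, tol=1e-12):
    return None


def six_arg_stat(indices, N, K, X, L, tol=1e-12):
    return None


four_ok = accepts_n_positional(four_arg_stat, 6)
six_ok = accepts_n_positional(six_arg_stat, 6)
print(four_ok, six_ok)
```

If the installed function fails this check for the number of arguments the caller passes, the two packages were most likely written against different versions of the same interface.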

PCA on domains frequencies?

I have a dataset that is quite a bit different from the input expected by gopca.

It is a set of draft transcriptomes from several closely related species. For many genes I have InterPro domain annotations, which in turn have associated GO terms. The idea is to identify the domains that are more highly represented in certain clades. My ad hoc approach is to run a PCA on domain frequencies (absolute counts are biased because the exact gene numbers are unknown): the first component separates the expected groups very well. I need a test similar to gopca to show which GO terms are significantly overrepresented based on the loadings of the individual domains.

So, instead of a gene expression table, I have a table of domain frequencies. Some genes actually have more than one domain annotation, which might make some assumptions even harder to satisfy.

Do you think gopca is applicable to this type of data?
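The ad hoc approach described in this question can be sketched as follows. This is not gopca code and says nothing about whether gopca's statistical assumptions hold for such data; the function name `rank_domains_by_loading`, the domain IDs, and the counts are all made up for illustration. The sketch converts per-species domain counts to relative frequencies, runs a plain PCA, and ranks domains by their loading on one component, which is the kind of ranked list a rank-based enrichment test would take as input.

```python
import numpy as np


def rank_domains_by_loading(counts, domain_ids, component=1):
    """Rank domains by their loading on one principal component.

    counts: domains x species matrix of raw domain counts.
    component: 1-based index of the PC to use.
    Returns a list of (domain_id, loading), highest loading first.
    """
    counts = np.asarray(counts, dtype=float)
    # Relative frequency of each domain within each species (columns sum to 1).
    freqs = counts / counts.sum(axis=0, keepdims=True)
    # Center each domain across species before the decomposition.
    centered = freqs - freqs.mean(axis=1, keepdims=True)
    # Left singular vectors hold one loading per domain.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    loadings = U[:, component - 1]
    order = np.argsort(-loadings)
    return [(domain_ids[i], float(loadings[i])) for i in order]


# Toy example: 4 hypothetical domains counted in 3 species.
toy_counts = [[10, 12, 2],
              [5, 4, 20],
              [8, 8, 8],
              [1, 2, 9]]
toy_ids = ["DOM_1", "DOM_2", "DOM_3", "DOM_4"]
ranking = rank_domains_by_loading(toy_counts, toy_ids)
print(ranking)
```

Note that the sign of a principal component is arbitrary, so both ends of the ranking are worth inspecting.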
