flo-compbio / gopca

GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge

Home Page: https://gopca.readthedocs.org

License: GNU General Public License v3.0

Python 100.00%

gopca's Issues

Do not disable global filter completely when no ontology file is provided

To support running GO-PCA with custom gene sets ("GS-PCA"), GO-PCA can be given only a GO annotation file (-a parameter) and no ontology file (-t parameter). In this case, GO-PCA's "global filter" (filtering against related gene sets found to be enriched among previously analyzed principal components) cannot rely on the GO hierarchy (the custom gene sets do not represent GO terms).

Currently, when no ontology file is provided, the "global filter" is automatically disabled. However, even if there is no known relationship between the custom gene sets provided, we can at least prevent the same gene set from generating multiple signatures. This should be the default behavior, and the current behavior should only occur when the global filter is explicitly disabled by the user.
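The fallback behavior proposed above could look roughly like the sketch below. This is only an illustration, not GO-PCA's actual implementation: the function name `filter_seen_gene_sets` and the representation of enriched gene sets as plain ID strings (one list per principal component) are assumptions made for the example.

```python
def filter_seen_gene_sets(enriched_per_pc):
    """Keep each gene set ID only for the first PC in which it is enriched.

    `enriched_per_pc` is a list with one entry per principal component
    (in order of analysis); each entry is a list of gene set IDs.
    This data layout is hypothetical.
    """
    seen = set()      # gene set IDs that already produced a signature
    filtered = []
    for enriched in enriched_per_pc:
        kept = []
        for gs_id in enriched:
            if gs_id not in seen:
                seen.add(gs_id)
                kept.append(gs_id)
        filtered.append(kept)
    return filtered


if __name__ == "__main__":
    per_pc = [["GS1", "GS2"], ["GS2", "GS3"], ["GS1"]]
    print(filter_seen_gene_sets(per_pc))
```

No ontology is needed here, since the only relationship used is identity between gene set IDs.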

Error importing genometools

Hi,
I have installed genometools using pip. No error messages were generated during the installation. However, I'm not able to import the module. I always get the following error:
Python 2.7.10 (default, Oct 11 2015, 15:42:07)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import genometools
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/epicardi/bin/python27/lib/python2.7/site-packages/genometools/__init__.py", line 24, in <module>
    __version__ = pkg_resources.require('genometools')[0].version
  File "/home/epicardi/bin/python27/lib/python2.7/site-packages/distribute-0.6.10-py2.7.egg/pkg_resources.py", line 648, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/epicardi/bin/python27/lib/python2.7/site-packages/distribute-0.6.10-py2.7.egg/pkg_resources.py", line 546, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: genometools

As you can see, I'm using python 2.7.10 running on Linux CentOS.
Any help is very welcome.
Best,
Ernesto
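
A `DistributionNotFound` error of this kind means that the package metadata could not be located, even though the module files themselves were importable. One way to check whether the metadata of a distribution is visible to the running interpreter is sketched below. Note the assumptions: the sketch uses `importlib.metadata` from modern Python (3.8+) rather than the `pkg_resources` call shown in the traceback, and the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None
    if no metadata for it can be found in the current environment."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string if the metadata is found, otherwise None.
    print(installed_version("genometools"))
```

If this returns None for a package that can nevertheless be imported, the installation is probably incomplete or spread across several site-packages directories, and reinstalling into a clean environment may help.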

Font size of axis labels in signature matrix is not set according to `font` parameter

Add test cases for scripts (command-line interface)

Currently, the command-line interface (most importantly, go-pca.py) is not covered by tests. This needs to change.

AssertionError on isinstance(gene_set_db, GeneSetDB)

I'm having trouble running the program: it fails with the following exception even on test datasets.

$ uname -a
Linux myserver 3.19.0-47-generic #53~14.04.1-Ubuntu SMP Mon Jan 18 16:09:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ awk -F'\t' '{print$5}' GO_gene_sets_fly_ensembl83_goa54_ontology2016-01-18.tsv | tr , '\n' | sort | uniq | awk -v OFS='\t' 'BEGIN{print"","A","B","C","D","E"}{printf$1;for(i=1;i<6;i++)printf"\t%d",rand()*100;printf"\n"}' > expression_test.tsv
$ head -n5 expression_test.tsv 
    A   B   C   D   E
128up   23  29  84  15  58
14-3-3epsilon   19  81  17  48  15
14-3-3zeta  36  49  91  26  89
18w 22  63  57  33  10

$ go-pca.py -e expression_test.tsv -o test -s GO_gene_sets_fly_ensembl83_goa54_ontology2016-01-18.tsv
[2016-08-04 22:09:49] INFO: Expression matrix size: (p = 6225 genes) x (n = 5 samples).
Traceback (most recent call last):
  File "/usr/local/bin/go-pca.py", line 9, in <module>
    load_entry_point('gopca==1.2.3', 'console_scripts', 'go-pca.py')()
  File "/usr/local/lib/python2.7/dist-packages/gopca/main.py", line 325, in main
    M = GOPCA(config, gene_sets, gene_ontology)
  File "/usr/local/lib/python2.7/dist-packages/gopca/go_pca.py", line 85, in __init__
    assert isinstance(gene_set_db, GeneSetDB)
AssertionError

Silence warning from xlmhg during run

selected pca signature matrix

Dear Florian,
I am a developmental and cellular biologist, and I am applying your method to the analysis of microarray data from embryonic samples and stem cell-derived cells generated in vitro.
So far, I have been analyzing microarray data using classic R packages (limma...) and functions. In particular, PCA nicely groups my samples according to origin (PC1: in vivo vs. in vitro) and positional identity (PC2 and PC3). Applying your method (Python API), I can see the gene sets enriched in those components, but I am not able to perform this analysis on selected components. For instance, I would like to focus on the positional identity of my samples, and therefore obtain a signature matrix for the second and third components alone, omitting the first one.
How can I modify your method to achieve this result?
Moreover, I used the prcomp method to compute the PCA in R. The variance explained by the individual components differs from what I obtain with your method on the same dataset. What causes this difference? Is it possible to draw a classic PCA plot to assess whether your method gives the same grouping that I observe with R?

Thanks in advance for your help and thank you for your work.
Best wishes,
Marco
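
For comparing the two PCA results, a plain NumPy sketch can help. It shows how preprocessing alone (here: scaling each gene to unit variance, comparable to prcomp's `scale.` argument) changes the explained variance, and how the scores of PC2 and PC3 can be picked out for a classic PCA plot. This does not use the GO-PCA API, the data are randomly generated, and whether preprocessing actually explains the discrepancy reported here is an assumption.

```python
import numpy as np


def pca(X, scale=False):
    """PCA of a samples-by-genes matrix via SVD.

    Returns (scores, explained_variance_ratio). Genes are always
    centered; with scale=True each gene is also divided by its
    sample standard deviation.
    """
    Xc = X - X.mean(axis=0)
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                      # sample coordinates on each PC
    ratio = s ** 2 / np.sum(s ** 2)     # fraction of variance per PC
    return scores, ratio


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 6 samples x 50 genes of made-up data with very different gene variances
    X = rng.normal(size=(6, 50)) * rng.uniform(0.1, 10.0, size=50)
    scores_raw, ratio_raw = pca(X)
    scores_scaled, ratio_scaled = pca(X, scale=True)
    print(ratio_raw[:3])
    print(ratio_scaled[:3])
    # Keep only the second and third components (indices 1 and 2)
    pc23 = scores_raw[:, [1, 2]]
    print(pc23.shape)
```

Plotting the two columns of `pc23` against each other gives the usual PC2 vs. PC3 scatter plot. Other preprocessing steps, such as selecting a subset of genes before PCA, would change the explained variance in a similar way.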

extracting signatures and Plots

Hi Florian,
I tried your tool on my dataset. It worked correctly, generating the binary output file. I then applied your accessory script gopca_extract_signature_matrix.py to get all signatures in text format. However, I get the following error:
macbook-pro-15-di-ernesto-2:SLAnew ernesto$ gopca_extract_signature_matrix.py -g gopca.pickle -o out
Traceback (most recent call last):
  File "/usr/local/bin/gopca_extract_signature_matrix.py", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/gopca/cli/extract_signature_matrix.py", line 92, in main
    E = ExpMatrix(genes=sig_labels, samples=samples, X=result.S)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/generic.py", line 2744, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'GOPCASignatureMatrix' object has no attribute 'S'

In addition, the main paper mentions plotting scripts (gopca_plot_*.py), but I am unable to find them in the package.

Many thanks in advance,
Ernesto

GO-PCA demo no longer works. Dead dropbox links

Tried running the GO-PCA: dmap demo found here:

https://nbviewer.jupyter.org/github/flo-compbio/gopca-demos/blob/master/dmap/01%20-%20Demo.ipynb

The demo contains two links to dropbox:

go_gene_sets_url = 'https://www.dropbox.com/s/s9osj0lfnoonjtt/GO_gene_sets_human_ensembl83_goa153_ontology2016-01-18.tsv?dl=1'

gene_expression_url = 'https://www.dropbox.com/s/obn0imd623yul7i/dmap_expression_mapped.tsv?dl=1'

Both of these links result in a 404 error. I think they have expired.

Expand documentation on how to use `gopca_extract_go_gene_sets.py`

Review global filtering rules

Currently, the global filter might not work exactly as intended (however, this version of the filter was used in all analyses in the PLoS One paper, so this will only be fixed in version 1.2).

GOPCA.run() should take both an expression matrix and a set of genes

Currently, GOPCA is initialized with a GOPCAConfig object and a set of genes, but it would make more sense to consider the gene sets part of the input and therefore give them to GOPCA.run().

get_xlmhg_stat() takes at most 4 arguments (6 given)

Hi Florian,
When I was running go-pca.py, I got the following errors:

Traceback (most recent call last):
  File "go-pca.py", line 11, in <module>
    sys.exit(main())
  File "/usr/local/python27/lib/python2.7/site-packages/gopca/cli/go_pca.py", line 340, in main
    run = M.run()
  File "/usr/local/python27/lib/python2.7/site-packages/gopca/gopca.py", line 730, in run
    self.matrix, config.params, gse_analysis, W, d+1)
  File "/usr/local/python27/lib/python2.7/site-packages/gopca/gopca.py", line 513, in _generate_pc_signatures
    exact_pval='if_significant')
  File "/usr/local/python27/lib/python2.7/site-packages/genometools/enrichment/analysis.py", line 429, in get_rank_based_enrichment
    exact_pval=exact_pval, table=table)
  File "/usr/local/python27/lib/python2.7/site-packages/xlmhg/test.py", line 223, in get_xlmhg_test_result
    stat, cutoff = mhg_cython.get_xlmhg_stat(indices, N, K, X, L, tol)
TypeError: get_xlmhg_stat() takes at most 4 arguments (6 given)

I found in file "/usr/local/python27/lib/python2.7/site-packages/xlmhg/mhg.py", line 60, in get_xlmhg_stat, there were 4 arguments (v, X, L, tol=DEFAULT_TOL).

Am I missing something, or is this an internal error?
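
A mismatch like this (a caller passing six arguments to a function that accepts four) often points to the caller and the callee coming from different package versions, although that is only a hypothesis here. The sketch below shows a generic way to test how many positional arguments a Python function accepts. The `get_xlmhg_stat` defined in it is a local stand-in that merely copies the four-parameter signature quoted above, `DEFAULT_TOL` is given a placeholder value, and `accepts_n_positional` is a helper invented for this example.

```python
import inspect

DEFAULT_TOL = 1e-12  # placeholder value, assumed for this sketch only


def get_xlmhg_stat(v, X, L, tol=DEFAULT_TOL):
    """Stand-in with the 4-parameter signature quoted in the report."""
    return None


def accepts_n_positional(func, n):
    """Return True if `func` can be called with `n` positional arguments."""
    try:
        # bind() checks the arguments against the signature without
        # actually calling the function.
        inspect.signature(func).bind(*([None] * n))
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    print(accepts_n_positional(get_xlmhg_stat, 4))
    print(accepts_n_positional(get_xlmhg_stat, 6))
```

For functions from compiled extension modules, signature introspection may not be available, in which case comparing the installed package versions is the more practical check.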

PCA on domains frequencies?

I have a dataset that is quite a bit different from the input expected by gopca.

It is a set of draft transcriptomes from several closely related species. For many genes I have InterPro domain annotations which in turn have associated GO-terms. The idea is to analyze the domains which have higher representation in certain clades. My ad hoc approach is to make a PCA on domain frequencies (absolute numbers are biased due to the fact that exact gene numbers are unknown): the first component differentiates very well between the expected groupings. I need a test similar to gopca to show which GO-terms are significantly overrepresented based on the loadings for individual domains.

So, instead of a gene expression table I have a list of domain frequencies. Some genes actually have more than one domain annotation which might make fulfillment of some assumptions even harder.

Do you think gopca is applicable to this type of data?
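
The normalization step described above (using relative domain frequencies instead of absolute counts) can be sketched as follows. The input layout, a dictionary of per-species domain counts, and the species and domain names are made up for illustration. The resulting table would still have to be brought into whatever matrix format the downstream analysis expects.

```python
def domain_frequencies(counts_by_species):
    """Convert per-species domain counts into relative frequencies.

    `counts_by_species` maps a species name to a dict of
    {domain ID: number of annotated occurrences}. Each species is
    normalized by its own total, so unknown gene numbers cancel out.
    """
    freqs = {}
    for species, counts in counts_by_species.items():
        total = sum(counts.values())
        if total == 0:
            freqs[species] = {}
            continue
        freqs[species] = {dom: n / total for dom, n in counts.items()}
    return freqs


if __name__ == "__main__":
    counts = {
        "species_1": {"domA": 3, "domB": 1},
        "species_2": {"domA": 1, "domB": 1},
    }
    print(domain_frequencies(counts))
```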
