Orange3-bioinformatics

Orange Bioinformatics extends Orange, a data mining software package, with common functionality for bioinformatics. The provided functionality can be accessed as a Python library or through a visual programming interface (Orange Canvas). The latter is also suitable for non-programmers.

In Orange Canvas the analyst connects basic computational units, called widgets, into data flow analytics schemas. Two widgets can be connected if they share a data type. Compared to other popular tools such as Taverna, Orange widgets are high-level: each integrates a potentially complex task, yet is specific enough to be used independently. Even elaborate analyses rarely consist of more than ten widgets, while tasks such as clustering and enrichment analysis can be executed with at most five. While building the schema, each widget is controlled independently through its own settings, which do not conceptually burden the analyst.

Orange Bioinformatics provides access to publicly available data, like GEO data sets, GO and KEGG. All features can be combined with powerful visualization, network exploration and data mining techniques from the Orange data mining framework.

Installation

To install the add-on with pip use

pip install Orange3-bioinformatics

To register this add-on with Orange, but keep the code in the development directory (do not copy it to Python's site-packages directory), run

pip install -e .

Documentation / widget help can be built by running

make html htmlhelp

from the doc directory.

Usage

After the installation, the widgets from this add-on are registered with Orange. To run Orange from the terminal, use

python3 -m Orange.canvas

or

orange-canvas

The new widgets are in the toolbox bar under the Bioinformatics section.

orange3-bioinformatics's People

Contributors

ajdapretnar, ales-erjavec, blazzupan, jakakokosar, janezd, jurezmrzlikar, larazupan, lenatr99, markotoplak, mstrazar, nejcdebevec, pavlin-policar, primozgodec, rokgomiscek, vesnat

orange3-bioinformatics's Issues

taxonomy does not work with older pythons

On my system (Python 3.4, Ubuntu 14.04) the taxonomy module does not work:

Traceback (most recent call last):
  File "/home/marko/orange3-bioinformatics/orangecontrib/bioinformatics/ncbi/taxonomy/utils.py", line 359, in <module>
    main()
  File "/home/marko/orange3-bioinformatics/orangecontrib/bioinformatics/ncbi/taxonomy/utils.py", line 354, in main
    strains = test.get_all_strains("562")
  File "/home/marko/orange3-bioinformatics/orangecontrib/bioinformatics/ncbi/taxonomy/utils.py", line 112, in get_all_strains
    return self._tax.strains(tax_id)
  File "/home/marko/orange3-bioinformatics/orangecontrib/bioinformatics/ncbi/taxonomy/utils.py", line 218, in strains
    """, (tax_id, ))
sqlite3.OperationalError: near "WITH": syntax error

My sqlite3.sqlite_version is 3.8.2. The WITH clause has only been supported since SQLite 3.8.3.
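A guard along these lines could detect the problem before the query runs. This is only a sketch: the helper name supports_cte is made up for illustration and is not part of the add-on; sqlite3.sqlite_version_info is the standard library attribute that reports the version of the linked SQLite library.

```python
import sqlite3

# WITH (common table expressions) first appeared in SQLite 3.8.3.
MIN_CTE_VERSION = (3, 8, 3)


def supports_cte(version_info=sqlite3.sqlite_version_info):
    """Return True if the given SQLite version understands WITH clauses.

    Hypothetical helper; by default it checks the SQLite library that the
    running Python interpreter is linked against.
    """
    return tuple(version_info) >= MIN_CTE_VERSION


if not supports_cte():
    # A real fix would switch to a query that avoids WITH at this point.
    print("SQLite %s lacks WITH support; a fallback query is needed."
          % sqlite3.sqlite_version)
```

With such a check the taxonomy code could either fall back to a query that walks the tree in Python or report a clear error instead of a raw OperationalError.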

Add hypergeometric test to Differential Expression

Add a Hypergeometric test for binary expression data.

  • add a new score_hypergeom(a, b) method
  • add a thresholding parameter to GUI, with default value 1.0

Some experimentation will be needed to decide which score to display in the distribution plot.

See attached code snippet for a concrete example.
scratch-hypergeom.py.zip
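Independently of the attached snippet (which is not reproduced here), one possible reading of the request is sketched below. It assumes that a and b hold the expression values of one gene in the two groups, that a value counts as "expressed" when it reaches the threshold, and that the score is the one-sided hypergeometric p-value for seeing at least as many expressed samples in group a. All of these are assumptions, and hypergeom_sf is a hypothetical helper.

```python
from math import comb  # Python 3.8+


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / total


def score_hypergeom(a, b, threshold=1.0):
    """One-sided hypergeometric p-value for binarized expression data.

    a, b: expression values of a single gene in the two groups (assumed).
    A sample is treated as expressed when its value is >= threshold.
    """
    a_on = sum(1 for x in a if x >= threshold)
    b_on = sum(1 for x in b if x >= threshold)
    return hypergeom_sf(a_on, len(a) + len(b), a_on + b_on, len(a))


# Gene expressed in every sample of group a and in none of group b.
p = score_hypergeom([2.0, 2.0, 2.0], [0.0, 0.0, 0.0])
```

A small p-value means the expressed samples are concentrated in group a; how to turn this into the score shown in the distribution plot is the open question mentioned above.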

Workflows are outdated

Showcase workflows must be updated to use Preprocess/Batch Effect Removal widgets, etc.

Gene Sets: remember selection of Entity Sets

Expected behavior

The selection in Entity Sets should be remembered according to the selected organism. Either use a context-dependent setting or simply remember the selection for a specific organism. For a new organism, select the first item (do not leave the selection blank).

Actual behavior

Selection is blank for every new data set.

KEGG Pathways: use gene id information, simplify the interface

The widget should use gene id and organism information from the input data set, and no longer require the user to provide any of this information:

  • Remove "Organism" and "Gene Names" pane
  • Issue an error if the data on gene ids or organism is missing ("Missing Gene ID information in the input data.", "Missing organism information in the input data", "Missing annotation on gene IDs and organism in the input data.").
  • Include gene id/organism info in the output data sets.

Docs: widget documentation is outdated

A lot has changed since the legacy orange-bio add-on. For example, #60 introduced GUI changes for some of the widgets.

Some widgets are not supported anymore and others were completely re-designed, e.g., #47

Documentation needs to be updated:

  • Databases Update
  • dictyExpress
  • Differential Expression (new scoring method, #29)
  • Gene Info -> is now Gene name matcher, #47
  • GO Browser (organism and gene selection was removed)
  • KEGG Pathways (organism and gene selection was removed)

Documentation missing:

  • Gene Sets
  • Set Enrichment (#67)

Link to the documentation
Read the Docs -> #70

GEO Data Sets

GEO Data Sets should attach information on the name of the gene_id_attribute ("Entrez ID"). Currently, this annotation is missing.

Volcano Plot "negative values in input"

Volcano Plot
Orange version

3.4 to 3.9

Expected behavior

Plot Volcano Plot

Actual behavior

Using the caffeine data set from the Volcano Plot example, the error reads "negative values in input". This happens with any data set from GEO, and also when I provide my own data, even if I get rid of all negative values in the data set.

Steps to reproduce the behavior

Run the GEO data set example from the documentation.

Gene Sets: links to GO, Cytobands, and Reactome pages are broken

When clicking on a term in, say, a label from Molecular Function, I get a 404 Not Found error. An example is the link for the molecular_function term, which points to

http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0003674

a page that does not exist.

None of the Cytobands and Reactome terms work, but they seem not to have a link. E.g., 1q23.1 is printed in blue, but the link is empty (nothing happens).

On the other hand, KEGG links seem to work.

Gene Set Enrichment: custom gene sets handling

  • Rename "Custom Gene Sets" group label to "Custom Gene Set Term Column"
  • Currently, the widget appropriately deals with terms that are defined in string variables, but displays a number instead of the value name for a categorical variable. Please change the widget to show the term column appropriately when it is of the categorical type as well.
  • When the widget receives custom gene sets where the organism does not match, it correctly issues the warning. The "Custom sets" are (correctly) not displayed in Gene Set Categories. But all other available categories are hidden as well. Instead, they should be shown and only "Custom sets" should be hidden (not displayed).
  • Instead of "Custom sets" use the name of the data set, when available.
  • The widget should remember which variable was used for the term column. The setting should be context-dependent, based on the constitution of the domain of the custom data set. Currently, the variable is reset upon any change in any input.
  • Currently, any categorical and text variables qualify for the term column. This wrongly includes a column that lists genes. Columns with too many unique values should be discarded. Thinking back, it would be best if only categorical variables could be used, but prior to this change make sure that the output of Gene Markers complies with this.

GO Browser: sorting by p-value

Expected behavior

Sorting of the list of GO terms should be by p-value in order from lowest p-value to the highest.

Actual behavior

The list of GO terms is sorted, but in the opposite order, from high to low p-values.
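The expected ordering amounts to an ascending sort on the p-value. A minimal sketch, assuming each result is a (term id, p-value) pair; the function name and the data layout are illustrative and not taken from the widget:

```python
def sort_terms_by_p_value(terms):
    """Return GO terms ordered from the lowest to the highest p-value.

    terms: iterable of (term_id, p_value) pairs (assumed layout).
    """
    return sorted(terms, key=lambda term: term[1])


results = [("GO:0008150", 0.2), ("GO:0003674", 0.0005), ("GO:0005575", 0.03)]
ordered = sort_terms_by_p_value(results)
```

In a Qt view the same effect would come from sorting the p-value column in ascending rather than descending order.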

Gene Info defaults

Sort organisms in Gene Info alphabetically. At present they are not sorted and it is hard to find the right organism. Also, until we implement functionality that guesses the "right" organism, make Homo sapiens the default organism.

Replace "Use attribute names" in the GUI with "Use column names".

dictyExpress: data set name

Expected behavior

Data on the output of dictyExpress should be named. Use "%s (%s)" % (project, experiment)

Actual behavior

Data sets are at present not named.

Gene Sets: enrichment

In its current implementation, the Gene Sets widget allows filtering of input data sets according to a set of genes from some database, like GO or KEGG. Gene Sets reports how many genes from the input data set are present in some gene set (GO term, KEGG pathway). We could use this same widget on some gene selection and turn it into enrichment analysis. This was already implemented in the previous version of the bioinformatics add-on, and the idea here is to reintroduce enrichment-based analytics in the current version. For reference, here is what the "old" Gene Set Enrichment widget looked like:

gene-set-enrichment

  • from the current widget, keep Category and Term column
  • keep "Matched" but rename it to "Count", and include "Reference" column (number of genes in the term from the reference data set). Include p-value and FDR (reduce display precision to two decimals). Include also enrichment, but display it as number (single decimal only).
  • copy the filtering interface from the old widget, rename "Entities" to "Count",
  • default values for filtering are Count 5, p-value 0.0001 and FDR 0.01, only Count and FDR are switched on,
  • include the Reference box, but instead of "All entitites" write "Entire genome" and instead of "Reference set (input)" write "Reference gene set (input)"
  • minor changes: Entity Sets -> Gene Set Databases, Input info -> Info
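The default filtering described above could be expressed roughly as follows. This is a sketch under assumptions: the function name and parameters are hypothetical, and the comparison directions (count at least the minimum, p-value and FDR at most the threshold) are the usual reading but are not stated in the request.

```python
def passes_filters(count, p_value, fdr,
                   min_count=5, max_p_value=0.0001, max_fdr=0.01,
                   use_count=True, use_p_value=False, use_fdr=True):
    """Return True if a gene set row survives the enabled filters.

    Defaults mirror the request: Count 5, p-value 0.0001, FDR 0.01,
    with only the Count and FDR filters switched on.
    """
    if use_count and count < min_count:
        return False
    if use_p_value and p_value > max_p_value:
        return False
    if use_fdr and fdr > max_fdr:
        return False
    return True


# (term, count, p-value, FDR); the values are made up for illustration.
rows = [("term A", 12, 0.00001, 0.002), ("term B", 3, 0.00001, 0.002),
        ("term C", 8, 0.02, 0.2)]
kept = [r[0] for r in rows if passes_filters(r[1], r[2], r[3])]
```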

GO Browser: use gene id information, simplify the interface

The widget should use gene id and organism information from the input data set, and no longer require the user to provide any of this information:

  • Remove "Organism" and "Gene Names" pane
  • Issue an error if the data on gene ids or organism is missing ("Missing Gene ID information in the input data.", "Missing organism information in the input data", "Missing annotation on gene IDs and organism in the input data.").
  • Include gene id/organism info in the output data sets.

GO Browser crashes when computer is not online

Bioinformatics version

3.2.0

Expected behavior

GO Browser should work when the computer is offline. It should use the local, previously loaded information files or, if these are not available for the given organism, the widget should issue an error.

Actual behavior

The widget crashes when the computer is offline.

Additional info (worksheets, data, screenshots, ...)
Exception: | requests.exceptions.ConnectionError: HTTPConnectionPool(host='orange.biolab.si', port=80): Max retries exceeded with url: /serverfiles-bio2/__INFO__ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1881d3860>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
-- | --
Module: | requests.adapters:513
Widget Name: | GO Browser
Widget Module: | orangecontrib.bioinformatics.widgets.OWGOBrowser:326
Widget Scheme: | /var/folders/xs/sz7rs6w902g8gvtvhyt68w640000gn/T/ows-xlkov6mh.ows.xml
Version: | 3.16.0
Environment: | Python 3.6.1 on Darwin 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64
Installed Packages: | AnyQt==0.0.8, Bottleneck==1.2.0, CommonMark==0.7.3, Genesis-PyAPI==1.2.1, Orange3-Bioinformatics==3.2.0, Orange3-SingleCell==0.8.1, Orange3==3.16.0, PyQt5==5.9, astropy==3.0.4, certifi==2018.8.24, chardet==3.0.4, cycler==0.10.0, decorator==4.3.0, dill==0.2.6, docutils==0.13.1, fastdtw==0.3.2, future==0.16.0, h5py==2.8.0, idna==2.7, joblib==0.11, keyring==10.3.1, keyrings.alt==2.2, kiwisolver==1.0.1, loompy==2.0.12, matplotlib==3.0.0, networkx==2.1, numpy==1.12.1, pandas==0.23.4, pip==9.0.1, pyparsing==2.2.1, pyqtgraph==0.10.0, python-dateutil==2.7.3, python-louvain==0.11, pytz==2018.5, requests-cache==0.4.13, requests==2.19.1, scikit-learn==0.18.2, scipy==0.19.1, serverfiles==0.2.1, setuptools==40.4.1, sip==4.19.3, six==1.10.0, slumber==0.7.1, typing==3.6.6, urllib3==1.23, wheel==0.31.1, xlrd==1.0.0
Machine ID: | 154505275726980
Stack Trace: | Traceback (most recent call last):  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 171, in _new_conn    (self._dns_host, self.port), self.timeout, **extra_kw)  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/util/connection.py", line 56, in create_connection    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 743, in getaddrinfo    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):socket.gaierror: [Errno 8] nodename nor servname provided, or not knownDuring handling of the above exception, another exception occurred:Traceback (most recent call last):  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen    chunked=chunked)  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request    conn.request(method, url, **httplib_request_kw)  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request    self._send_request(method, url, body, headers, encode_chunked)  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1285, in _send_request    self.endheaders(body, encode_chunked=encode_chunked)  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1234, in endheaders    self._send_output(message_body, encode_chunked=encode_chunked)  File 
"/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1026, in _send_output    self.send(msg)  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 964, in send    self.connect()  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 196, in connect    conn = self._new_conn()  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 180, in _new_conn    self, "Failed to establish a new connection: %s" % e)urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x1881d3860>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not knownDuring handling of the above exception, another exception occurred:Traceback (most recent call last):  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/adapters.py", line 445, in send    timeout=timeout  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 667, in urlopen    **response_kw)  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 667, in urlopen    **response_kw)  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen    _stacktrace=sys.exc_info()[2])  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/util/retry.py", line 398, in increment    raise MaxRetryError(_pool, url, error or ResponseError(cause))urllib3.exceptions.MaxRetryError: 
HTTPConnectionPool(host='orange.biolab.si', port=80): Max retries exceeded with url: /serverfiles-bio2/__INFO__ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1881d3860>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))During handling of the above exception, another exception occurred:Traceback (most recent call last):  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Orange/canvas/scheme/widgetsscheme.py", line 573, in create_widget_instance    widget.__init__()  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/orangecontrib/bioinformatics/widgets/OWGOBrowser.py", line 326, in __init__    for _, annotation_file in set(serverfiles.ServerFiles().listfiles(DOMAIN) + serverfiles.listfiles(DOMAIN))  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/serverfiles/__init__.py", line 188, in listfiles    self._download_server_info()  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/serverfiles/__init__.py", line 179, in _download_server_info    t = self._open("__INFO__")  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/serverfiles/__init__.py", line 294, in _open    return self._server_request(self.server, *args)  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/serverfiles/__init__.py", line 291, in _server_request    return self.req.get(root+"/".join(path), auth=auth, verify=False, timeout=TIMEOUT, stream=True)  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 525, in get    return self.request('GET', url, **kwargs)  File 
"/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 512, in request    resp = self.send(prep, **send_kwargs)  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 622, in send    r = adapter.send(request, **kwargs)  File "/Applications/scOrange.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/adapters.py", line 513, in send    raise ConnectionError(e, request=request)requests.exceptions.ConnectionError: HTTPConnectionPool(host='orange.biolab.si', port=80): Max retries exceeded with url: /serverfiles-bio2/__INFO__ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1881d3860>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
Local Variables: | OrderedDict([('cert', None),             ('chunked', False),             ('conn',              <urllib3.connectionpool.HTTPConnectionPool object at 0x1881c8320>),             ('proxies', OrderedDict()),             ('request', <PreparedRequest [GET]>),             ('self', <requests.adapters.HTTPAdapter object at 0x1881c8390>),             ('stream', True),             ('timeout', <urllib3.util.timeout.Timeout object at 0x1881c81d0>),             ('url', '/serverfiles-bio2/__INFO__'),             ('verify', False)])

Databases Update: freezes on Download all

Bioinformatics version

3.2.0

Orange version

3.17.dev

Expected behavior

Databases Update downloads only the selected data sets.

Actual behavior

Databases Update downloads all cached data sets.

Steps to reproduce the behavior

Open Databases Update. Deselect all the pre-selected data sets. Then select only one (say, the smallest). Upon pressing Download all, the old data sets get selected again and the download begins. I think that intuitively (and judging by the documentation), only the selected data sets should be downloaded.

Additional info (worksheets, data, screenshots, ...)

The download is very slow, perhaps because all data sets get downloaded.
Also, I cannot use the Cancel button; it does not work.

Differential expression does not output gene scores

Expected behavior

The DE widget should output a table with a score for each gene. The score depends on the currently selected DE scoring method.

Alternatively (perhaps using a checkbox), all scores can be appended as columns in the table.

Actual behavior

The corresponding output signal always remains empty.

GeneSets: use gene id information, simplify the interface

The widget should use gene id and organism information from the input data set, and no longer require the user to provide any of this information:

  • Remove "Organism" and "Gene Names" pane
  • Issue an error if the data on gene ids or organism is missing ("Missing Gene ID information in the input data.", "Missing organism information in the input data", "Missing annotation on gene IDs and organism in the input data.").
  • Include gene id/organism info in the output data sets.

dictyExpress: include gene id and info in the output

dictyExpress should output the data that includes gene id and data attributes that provide info on where this information is found (gene_id_attribute). This probably requires gene name - entrez matching just after loading of the data.

GO Browser: reference data functionality fails

Expected behavior

GO Browser is supposed to accept reference data, with the reference set of genes that would replace the "Entire genome" when computing the enrichment.

Actual behavior

When provided the reference data set, the widget issues a warning: 'list' object has no attribute 'intersection'. It seems that this functionality was not ported from Python 2.
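The message suggests that a plain list reaches code that expects a set. If that is the cause (an assumption; the widget code is not shown here), converting both collections to sets before intersecting would avoid the error. The function name below is illustrative only.

```python
def genes_in_reference(selected_genes, reference_genes):
    """Return the selected genes that also occur in the reference set.

    Both arguments may be any iterable; converting to set makes the
    intersection work even when the inputs arrive as lists.
    """
    return set(selected_genes) & set(reference_genes)


common = genes_in_reference(["YAL001C", "YAL002W"], ["YAL002W", "YAL003W"])
```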

partial bioinformatics add-on installation

Bioinformatics version

3.2

Orange version

3.16

Expected behavior

Hi,
After installing the bioinformatics add-on through the GUI, I got only nine widgets (Databases Update, GEO Data Sets, dictyExpress, Gene Name Matcher, Differential Expression, GO Browser, KEGG Pathways, Gene Set Enrichment, and Cluster Analysis).
I uninstalled it and used the Orange command prompt. It asked for a Cython installation. I installed Cython and reinstalled the add-on through the command prompt without any errors.

However, the problem is still there...

Thanks for your Help

Guillaume.

KEGG Pathways: Nothing can be selected

Bioinformatics version

3.2.0

Orange version

3.17.dev

Expected behavior

When one clicks on a pathway, the widget should output selected data.

Actual behavior

Widget outputs nothing, despite constant attempts. Nodes in the pathway seem clickable, but there is nothing on the output (or at least some instances with no features).

Steps to reproduce the behavior

GEO Data Sets (GDS2914) - KEGG Pathways (AGE-RAGE pathway) - Data Table. Data Table is empty, regardless of what one clicks.

Additional info (worksheets, data, screenshots, ...)

screen shot 2018-10-02 at 15 03 02

Gene Sets: custom gene sets

Provide support for an additional input that supplies custom-defined genes and gene sets:

  • call the input "Custom Gene Sets"
  • add a block in the control area of the widget called "Custom Gene Sets" containing "Set Label Column:" label and a pull down menu to select meta or class attribute of the type string or categorical that defines gene set label.
  • Custom Gene Sets is enabled only if the data set is provided on Custom Gene Sets input
  • test this functionality with the input from Marker Genes widget

Gene Name Matcher: revision

Here is a request for a revision of the user interface of the Gene Name Matcher widget that would better expose gene names and allow selection of a subset of the input data.

screen shot 2018-07-18 at 10 48 36

  • "Stored in data column" is disabled if no suitable data column exists (categorical or text feature)
  • In the "Stored in data column" drop-down, include only features with enough diversity of names (say, take a sample of 1000 data instances; the most frequent item should not be repeated more than 10 times)
  • Exclude unmatched genes is disabled if all gene IDs are matched
  • Replace feature IDs constructs a table with feature names that correspond to gene (symbol) names. If a gene is not matched and we do not exclude unmatched genes, the feature name is copied from the input
  • Include gene description includes a column, or feature attribute, with the name "Description"
  • Entrez IDs are included in a column/feature attribute called "Entrez ID"
  • Include all other available information includes "Type" (type of gene, text), "Synonyms" (a text field with synonyms), "UniProt" (ID, text), "dictyBase" (ID, text, if applicable)
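The diversity rule for candidate columns (sample of 1000 instances, most frequent value repeated at most 10 times) can be sketched as below. The function name, the fixed random seed, and the handling of empty columns are assumptions made for the example.

```python
import random
from collections import Counter


def is_diverse_enough(values, sample_size=1000, max_repeats=10, seed=0):
    """Heuristic check that a column looks like a list of gene names.

    Takes a sample of up to sample_size values and accepts the column
    only if its most frequent value occurs at most max_repeats times.
    """
    values = list(values)
    if not values:
        return False  # assumption: an empty column is never a candidate
    if len(values) > sample_size:
        values = random.Random(seed).sample(values, sample_size)
    most_frequent_count = Counter(values).most_common(1)[0][1]
    return most_frequent_count <= max_repeats


gene_like = is_diverse_enough("gene%d" % i for i in range(2000))
label_like = is_diverse_enough(["control"] * 50 + ["case"] * 50)
```

A column of gene names passes, while a class-like column with a few repeated labels is rejected.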

The widget has two outputs (see Data Table for a similar design):

  • Data, data instances selected in the gene list table or data with selected columns (depending on where the gene IDs are). If there is no current selection, then ALL data instances/columns are on this output
  • Genes, a table with complete gene information (include all info). Genes are in rows. The table is similar in constitution to the one shown in the widget, but includes ALL genes from the input. There is another column (meta, discrete) with values Matched|Unmatched|Match Conflict. When no genes are matched, this output is None (to avoid construction of a large data table when the user has not set the right parameters yet).

dictyExpress: cache hint not accurate

Expected behavior

Cache hint should indicate which data files are local.

Actual behavior

Cache hint is displayed only for most recently loaded file, other hints are set to False.

Steps to reproduce the behavior

Open widget, load several files in sequence.

Sort the results in GO Browser by p-values

The default sorting of the terms in the GO Browser (before user changes anything) should be by the p-value, in both upper (hierarchical) and lower pane with results.

Gene Sets: entity set names

Expected behavior

Entity set names should be human-readable. They should start with upper case. Do not use underscores. For instance, instead of biological_process use Biological process.

This "bug" probably does not originate in this widget, but requires changes in the database system or on the server.
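Wherever the fix ends up, the renaming rule itself is simple. A sketch, with a hypothetical function name:

```python
def humanize_set_name(name):
    """Turn an identifier such as biological_process into a display name."""
    text = name.replace("_", " ").strip()
    # Upper-case only the first character and leave the rest untouched.
    return text[:1].upper() + text[1:]


label = humanize_set_name("biological_process")
```

Only the first character is changed, so names that already contain capitals (for example acronyms) keep them.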

Differential Expression

Differential expression is broken. Test it on GDS531 and compare it to the old implementation.

Databases Update is non-responsive

Databases widget looks dead and does not show anything. Perhaps the server that serves server files is not responding, but then widget should raise a warning ("Cannot reach the files on the server.").

screen shot 2018-06-19 at 14 22 18

Gene Sets: output class and meta data

Expected behavior

Gene Sets should output selected genes along with any metadata and class information.

Actual behavior

Class information and metadata is not on the output if genes are in the columns.

Gene Info synonyms

Separate synonyms in Gene Info with a comma (",") rather than with the current bar sign ("|"). Also, remove the leading and trailing bars.
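A sketch of the requested formatting, assuming the synonyms arrive as a single bar-delimited string (the function name is illustrative):

```python
def format_synonyms(raw):
    """Convert a bar-delimited synonym string into a comma-separated one.

    Empty pieces, including those produced by a leading or trailing bar,
    are dropped.
    """
    parts = (part.strip() for part in raw.split("|"))
    return ", ".join(part for part in parts if part)


text = format_synonyms("|A1B|ABG|GAB|")
```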

Databases Update

In case of preliminary annotation data, it would be great to load annotations from local files.

Expected behavior

Include "Load from local file ..." button in the same row as "Update all" and other buttons, but push it all the way to the right. Enable loading GO and KEGG annotations for a start. Annotation file should include all other necessary information, like data source name.

Gene Set Enrichment: output and input

The Gene Set Enrichment widget should output the data for the selected genes. The output should follow the format of the input data: if the genes are in rows on the input, they should be in rows on the output, and likewise for columns. The output should also pass on the information on genes and organism.

Currently, upon selection of gene sets in this widget, the widget does not produce any output.

I was testing the widget with brown-selected data set (File -> Gene Name Matcher -> Gene Set Enrichment). Note that the selection of, say, biosynthetic process, should output the data containing 139 genes (in rows) and all the data (all columns). Note that selection of several gene sets should produce the data with union of genes from the sets.

Test also this functionality on single cell data, where the number of columns (genes) on the output should be reduced, while the number of rows (cells) should be retained.

Note: keep meta and class data, if in input, on the output as well.

Minor:

  • rename "Gene" input channel to "Data" (to conform to what all other widgets use)
  • "Missing Gene ID information..." -> "Missing gene ID information..." (g is in lowercase)
  • When arbitrary Orange file is on input, the widget throws correct error (missing gene ID info...), but still displays "No data on input." in the info box. Change this message to "Input data with incorrect meta data.\n Use Gene Name Matcher widget."

Gene Name Matcher: add id even when Gene ID present in input data

Expected behavior

The widget should output a column (or an attribute of an attribute) with the ID of the gene even if an object (column, or attribute's attribute) with the name "Gene ID" already exists. In this case, construct the name "Gene ID (1)" or "Gene ID (i)", where i is the lowest integer for which the name "Gene ID (i)" does not exist yet. Do this only if the new IDs are different from the old ones.

Actual behavior

When "Gene ID" exists, the widget does not add gene IDs.
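The naming rule from the expected behavior can be sketched as follows; unique_name is a hypothetical helper, and the check that the new IDs differ from the old ones is left to the caller.

```python
def unique_name(base, existing):
    """Return base, or "base (i)" with the lowest i >= 1 not yet in use."""
    existing = set(existing)
    if base not in existing:
        return base
    i = 1
    while "%s (%d)" % (base, i) in existing:
        i += 1
    return "%s (%d)" % (base, i)


name = unique_name("Gene ID", ["Gene ID", "Gene ID (1)", "Symbol"])
```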

OWClusterAnalysis: output results in a Table

Add another output signal to the Cluster Analysis widget, that outputs results in a tabular form.
The table should have one row per result (gene-cluster pair) and all the relevant variables and metadata.

Add another output signal, which is similar, but lists GO terms per cluster in the same way.

Example results:
owca

Example output:
table
