
tribe's Introduction

Tribe

Welcome to Tribe!

Tribe is an open-source webserver that allows for easy, reproducible genomics analyses between different webservers. It can be accessed from a web browser via our web interface or programmatically via our API. You can visit Tribe at https://tribe.greenelab.com

Our database includes 9 of the most commonly used model organisms (human, mouse, fly, rat, worm, yeast, zebrafish, arabidopsis and pseudomonas) and 17 gene identifiers (Symbol, Ensembl, Entrez, HGNC, HPRD, MGI, MIM, SGD, UniProtKB, TAIR, WormBase, RGD, FLYBASE, ZFIN, Vega, IMGT/GENE-DB, and miRBase).

Video Tutorials:

You can check out our video tutorials on our YouTube page: https://www.youtube.com/channel/UCuR7hyPD76JyuqEHmJetUjA

Full Tribe documentation:

Tribe's full documentation is hosted at Read the Docs, and you can check it out here: http://tribe-greenelab.readthedocs.org

tribe's People

Contributors

cgreene, dependabot[bot], dongbohu, ramenhog, rzelayafavila

tribe's Issues

Settings not friendly

Right now two configuration files are read in when tribe/settings.py configures Django:

  • settings/secrets.ini (includes confidential info, ignored by the repo)
  • settings/<included_file> (an extra config file that is included by secrets.ini in [configfile] section, mainly for configurations that can be open to the public, such as database engine, 3rd party modules, etc)

But the sections that belong in each of these two files are hard-coded. If for some reason we want to move the database engine parameter from <included_file> to secrets.ini, we have to modify settings.py too.

We can make the configuration scheme more user-friendly. Here is the idea:

  • Rename [configfile] section into [include].
  • Rename settings/ dir into config/.
  • When settings.py reads secrets.ini, if secrets.ini has [include] section, the options in the included file will be treated as secondary configuration and merged with options in secrets.ini. For example, if the included file is production.ini and it specifies DATABASE_PORT as 5432, but secrets.ini specifies the same parameter value as 5433, then 5433 will be the final option used in Django settings.
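The merge rule proposed above can be sketched with the standard library's configparser. This is only an illustration under assumptions: the `file` option inside `[include]`, the `[database]` section name, and the helper `load_settings` are hypothetical and are not taken from Tribe's settings.py.

```python
import configparser
import os


def load_settings(secrets_path):
    """Read secrets.ini and merge in the optional included file.

    Options from the included file are secondary: any option that also
    appears in secrets.ini is overridden by the secrets.ini value.
    """
    secrets = configparser.ConfigParser(interpolation=None)
    secrets.optionxform = str  # keep option names case-sensitive
    with open(secrets_path) as f:
        secrets.read_file(f)

    merged = configparser.ConfigParser(interpolation=None)
    merged.optionxform = str

    # Hypothetical convention: [include] has a "file" option whose value is
    # a path relative to the directory that holds secrets.ini.
    if secrets.has_option("include", "file"):
        included = os.path.join(os.path.dirname(secrets_path),
                                secrets.get("include", "file"))
        with open(included) as f:
            merged.read_file(f)

    # Reading secrets.ini second makes its values win on conflicts.
    with open(secrets_path) as f:
        merged.read_file(f)
    merged.remove_section("include")
    return merged
```

With a production.ini that sets DATABASE_PORT to 5432 and a secrets.ini that sets it to 5433, the merged result holds 5433, while options that only appear in production.ini are carried over unchanged.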

"BaseCommand.option_list" is deprecated

As of Django 1.8, BaseCommand.option_list is deprecated.
https://docs.djangoproject.com/en/1.8/howto/custom-management-commands/#django.core.management.BaseCommand.option_list

This affects the following management commands:

genesets/management/commands/genesets_update_tip_item_count_all.py
genesets/management/commands/genesets_load_kegg.py
genesets/management/commands/genesets_create_update_user.py
genesets/management/commands/genesets_load_disease.py
genesets/management/commands/genesets_add_geneset_tags.py
genesets/management/commands/genesets_load_go.py

Surprisingly, these management commands were only being used in this test file:
genesets/tests.py
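For reference, the replacement for option_list is an add_arguments(self, parser) method, where the parser Django passes in is based on argparse.ArgumentParser. The sketch below uses only the standard library so that it runs without Django; the class name, the --username option, and its default are hypothetical and are not taken from the commands listed above. In a real command the class would subclass django.core.management.base.BaseCommand, and Django would build the parser and call add_arguments itself.

```python
import argparse


class CreateUpdateUserCommand:
    """Stand-in for a Django management command (hypothetical example)."""

    help = "Create or update a user."

    # Old style (deprecated as of Django 1.8):
    #     option_list = BaseCommand.option_list + (
    #         make_option('--username', dest='username', default=None),
    #     )

    def add_arguments(self, parser):
        # New style: options are registered on the argparse-based parser.
        parser.add_argument('--username', dest='username', default=None,
                            help='Hypothetical option, for illustration.')

    def handle(self, *args, **options):
        return options['username']


# Django normally builds the parser; here one is built by hand to show the flow.
parser = argparse.ArgumentParser()
command = CreateUpdateUserCommand()
command.add_arguments(parser)
options = vars(parser.parse_args(['--username', 'alice']))
result = command.handle(**options)
```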

Some packages need to be updated

Some warnings showed up in the last deployment:

.../tribe/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:318: 
SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication)
extension to TLS is not available on this platform. This may cause the server to present an incorrect
TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python 
to solve this. 
For more information, see https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
[tribe.greenelab.com] out:   SNIMissingWarning

and

.../tribe/local/lib/python2.7/site-packages/celery_haystack/utils.py:2: RemovedInDjango19Warning:
django.utils.importlib will be removed in Django 1.9.

We should update some related packages to fix them.

Generalized database schema

Here is a generalized DB schema that takes advantage of registries in identifiers.org. I added some comments to explain the purpose.

from django.db import models

# Registry in identifiers.org
class Registry(models.Model):
    name = models.CharField()
    prefix = models.CharField()
    description = models.CharField()
    # and other attributes of a registry ...


# Entity includes common attributes of any entity (such as gene,
# publication, disease, tissue, etc)
class Entity(models.Model):
    accession = models.CharField(null=True)  # accession in identifiers.org
    registry = models.ForeignKey(Registry, null=True)
    # and other attributes shared by all entities ...


# "Gene" is one kind of entity
class Gene(models.Model):
    entity = models.OneToOneField(
        Entity,
        on_delete=models.CASCADE,
        primary_key=True,
    )
    # specific attributes for a gene
    scientific_name = models.CharField(max_length=32)
    systematic_name = models.CharField(max_length=32)
    organism = models.ForeignKey(Organism, ...)


# "Publication" is another kind of entity
class Publication(models.Model):
    entity = models.OneToOneField(
        Entity,
        on_delete=models.CASCADE,
        primary_key=True,
    )
    # specific attributes for a publication
    pmid = models.IntegerField(null=True, unique=True, db_index=True)
    title = models.TextField()
    authors = models.TextField()
    date = models.DateField()
    journal = models.TextField()
    volume = models.TextField(blank=True, null=True)
    pages = models.TextField(blank=True, null=True)
    issue = models.TextField(blank=True, null=True)


# "Disease" is another kind of entity
class Disease(models.Model):
    entity = models.OneToOneField(
        Entity,
        on_delete=models.CASCADE,
        primary_key=True,
    )
    # and specific attributes for a disease ...


# "Entityset" includes common attributes for any kind of entity set.
# It may include different types of entities.
class Entityset(models.Model):
    creator = models.ForeignKey(User)
    title = models.TextField()
    abstract = models.TextField(null=True)
    slug = models.SlugField(help_text="Slugified title field", max_length=75)
    public = models.BooleanField(default=False)
    deleted = models.BooleanField(default=False)
    fork_of = models.ForeignKey('self', editable=False, null=True)
    tip_item_count = models.IntegerField(null=True)


# "Geneset" is one kind of "Entityset"
class Geneset(models.Model):
    entityset = models.OneToOneField(
        Entityset,
        on_delete=models.CASCADE,
        primary_key=True,
    )
    organism = models.ForeignKey(Organism)
    # and other attributes for a geneset


# Similar models can be defined for "Publicationset" or "Diseaseset" ...


# Version of an Entityset
class Version(models.Model):
    entityset = models.ForeignKey(Entityset)
    creator = models.ForeignKey(User)
    ver_hash = models.CharField(db_index=True, max_length=40)
    description = models.TextField(null=True)
    commit_date = models.DateTimeField(auto_now_add=True)
    parent = models.ForeignKey('self', null=True)


# Annotations of entities
class Annotation(models.Model):
    version = models.ForeignKey(Version)
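    # Both fields point to Entity, so each needs its own related_name to
    # avoid a reverse-accessor clash; the names below are placeholders.
    # Entity that is being annotated:
    primary_entity = models.ForeignKey(
        Entity, related_name='annotations_as_primary')
    # Entity that is the annotation:
    annotator_entity = models.ForeignKey(
        Entity, related_name='annotations_as_annotator')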

Replace Bower with npm or other tools

Bower is going to be deprecated. bower install gives the following message:

npm WARN deprecated [email protected]: We don't recommend using Bower for new projects. 
Please consider Yarn and Webpack or Parcel. You can read how to migrate legacy project here:
https://bower.io/blog/2017/how-to-migrate-away-from-bower/
/usr/local/bin/bower -> /usr/local/lib/node_modules/bower/bin/bower

Elasticsearch 5.X: "snowball" analyzer deprecated?

In Elasticsearch 1.X and 2.X, snowball has been the default analyzer for the string data type in django-haystack.

But this analyzer has not been mentioned in the official Elasticsearch documentation since 5.X.

According to:
https://stackoverflow.com/questions/41859821/why-snowball-analyser-was-removed-in-elasticsearch-5-1
it seems to be replaced by english analyzer:
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/analysis-lang-analyzer.html#english-analyzer

This affects Tribe, but probably doesn't affect the Adage web server, because the latter uses a customized adage_snowball as the default analyzer for strings.

EDIT:

Although snowball is not listed as an analyzer in the Elasticsearch 5.X and 6.X documentation, it still seems to be available, as confirmed by this command:

curl -XGET 'localhost:9200/dhutest/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "analyzer": "snowball",
  "text": "foo bart"
}
'

Output:

{
  "tokens" : [
    {
      "token" : "foo",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "bart",
      "start_offset" : 4,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

This is probably why the latest django-haystack dev version still uses the snowball analyzer and claims that it supports Elasticsearch 5.X.

Source code in Elasticsearch 6.5:

https://github.com/elastic/elasticsearch/blob/6.5/modules/analysis-common/src/main/java/org/elasticsearch/analysis/common/SnowballAnalyzer.java

Trivial: `/favicon.ico` needed

Because /favicon.ico is missing, there is a warning in the log file /var/log/supervisor/tribe-gunicorn-stderr-* (which is generated by the gunicorn process):

Not Found: /favicon.ico

and /var/log/nginx/access.log:

"GET /favicon.ico HTTP/1.1" 404 261

Use Postgres built-in search functionalities instead of Elasticsearch?

Postgres has improved its full-text search and trigram search considerably since version 9.6. I wonder whether it is possible to use them to replace Elasticsearch. If we can, the backend architecture (and deployment) can be greatly simplified. With GIN or GiST indexes on the search fields, we don't have to worry about index updates (which invoke celery jobs right now).

Right now, Elasticsearch is being used to search genes and genesets. We have 312,983 genes and 408,237 genesets in Postgres backend database.
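To make the trigram idea concrete, here is a small pure-Python sketch of the similarity measure described for Postgres's pg_trgm extension: each word is lowercased and padded with two leading spaces and one trailing space, and the similarity is the number of shared trigrams divided by the number of distinct trigrams across both strings. It only illustrates the scoring and handles ASCII letters and digits only; it is not how Tribe searches today, and the function names are hypothetical.

```python
import re


def trigrams(text):
    """Return the set of trigrams of `text`, using pg_trgm-style padding."""
    result = set()
    # Words are runs of ASCII letters/digits; everything else separates words.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

For example, similarity("word", "two words") is 4/11: the two strings share four trigrams out of eleven distinct ones. In Postgres itself the comparison would be done by the database with an index on the searched column, not in application code.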

Dockerize Backend Services

Right now both a back end and a front end are required for a developer to work on this repo (even for any front-end-only issue). The installation of a local backend is not friendly to a front end developer.

To ease front end development, we can dockerize the backend services (such as web server and DB server). Adage-server repo is using docker now:
https://github.com/greenelab/adage-server

We can tailor the Dockerfiles there for this project.

Link to a specific version of a Tribe geneset via UI

I got a request:

Is there a way to share a tribe link to the specific version of a geneset.

This should be possible, but is likely to require a bit of work updating our URLs to incorporate a version. That doesn't currently appear to be included.

Deprecated front end pkg: "grunt-ngmin" and "grunt-recess"

grunt-ngmin is deprecated.

It is supposed to be replaced by ng-annotate, which is also deprecated now.

ng-annotate is supposed to be replaced by babel-plugin-angularjs-annotate.

grunt-recess is deprecated too:
https://www.npmjs.com/package/grunt-recess
No replacement was stated.

Front end building warnings

npm install output:

npm WARN deprecated [email protected]: use grunt-ng-annotate instead
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: Deprecated as RECESS is unmaintained
npm WARN deprecated [email protected]: Use the built-in module in node 9.0.0 or newer, instead
npm WARN deprecated [email protected]: use ng-annotate instead
npm WARN deprecated [email protected]: graceful-fs v3.0.0 and before will fail on node releases >= v7.0. Please update to graceful-fs@^4.0.0 as soon as possible. Use 'npm ls graceful-fs' to find it in the tree.
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: All versions below 4.0.1 of Nodemailer are deprecated. See https://nodemailer.com/status/
npm WARN deprecated [email protected]: stop using this version
npm WARN deprecated [email protected]: This project is unmaintained
npm WARN deprecated [email protected]: If using 2.x branch, please upgrade to at least 2.1.6 to avoid a serious bug with socket data flow and an import issue introduced in 2.1.0
npm WARN deprecated [email protected]: Use uuid module instead
npm WARN deprecated [email protected]: This project is unmaintained
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: graceful-fs v3.0.0 and before will fail on node releases >= v7.0. Please update to graceful-fs@^4.0.0 as soon as possible. Use 'npm ls graceful-fs' to find it in the tree.

> [email protected] install /home/dhu/github/tribe/interface/node_modules/coffeelint
> [ -e lib/commandline.js ] || npm run compile


> [email protected] install /home/dhu/github/tribe/interface/node_modules/uws
> node-gyp rebuild > build_log.txt 2>&1 || exit 0


> [email protected] postinstall /home/dhu/github/tribe/interface/node_modules/circular-json
> echo ''; echo "\x1B[1mCircularJSON\x1B[0m is in \x1B[4mmaintenance only\x1B[0m, \x1B[1mflatted\x1B[0m is its successor."; echo ''

\x1B[1mCircularJSON\x1B[0m is in \x1B[4mmaintenance only\x1B[0m, \x1B[1mflatted\x1B[0m is its successor.

npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN [email protected] requires a peer of jasmine-core@* but none is installed. You must install peer dependencies yourself.
npm WARN [email protected] requires a peer of karma@~0.9.4 || ~0.10 but none is installed. You must install peer dependencies yourself.
npm WARN [email protected] No repository field.
npm WARN [email protected] No license field.
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: [email protected] (node_modules/fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for [email protected]: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})

added 570 packages from 766 contributors and audited 3005 packages in 22.861s
found 25 vulnerabilities (9 low, 6 moderate, 10 high)
  run `npm audit fix` to fix them, or `npm audit` for details

bower install warning of package incompatibility:

Unable to find a suitable version for spin.js, please choose one by typing one of the numbers below:
    1) spin.js#~2.0.0 which resolved to 2.0.2 and is required by angular-spinner#0.6.2
    2) spin.js#~2.1.0 which resolved to 2.1.2 and is required by tribe-interface

Vulnerabilities reported by github:

https://github.com/greenelab/tribe/network/dependencies#interface%252Fpackage-lock.json

Django Upgrade (1.8 to 1.11)

According to Django's official doc:
https://www.djangoproject.com/download/#supported-versions
Django 1.11.x seems to be a reasonable version to upgrade to for two reasons:

  • It is an LTS that will be supported until at least April 2020.
  • It is the last version that supports Python 2.7.

Once Django is upgraded to 1.11, many other Django-related packages should be upgraded too. I am going to use this issue to keep track of the upgrade info for all backend packages that depend on Django.

Tweaks for interface deployment needed

Today, when doing a new deployment, I needed to run the following commands manually:

a) Once the initial_setup_and_check command in the fabfile created the symlink to the static folder (via the private method _make_static()), I needed to run (from inside the /tribe/tribe folder):
ln -s ../static/index.html templates/index.html to create the symlink to the index.html file for django to use.

b) Also, the deploy fabfile command asked me to pick between spin.js versions - 2.0.0 and 2.1.0. I chose 2.1.0.

Neither of these steps should have to be run manually; both should be automated in the deployment process.
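Step (a) could be automated with an idempotent helper along these lines. This is a sketch using only the standard library; the function name ensure_symlink is hypothetical, and in the real fabfile the equivalent step would presumably run on the remote host rather than locally.

```python
import os


def ensure_symlink(target, link_path):
    """Create link_path -> target unless that exact link already exists.

    `target` is stored as given, so a relative target such as
    "../static/index.html" is resolved relative to the link's directory,
    matching `ln -s ../static/index.html templates/index.html`.
    Returns True if a link was created, False if it was already correct.
    """
    if os.path.islink(link_path):
        if os.readlink(link_path) == target:
            return False
        os.remove(link_path)  # replace a stale link
    elif os.path.exists(link_path):
        raise OSError("%s exists and is not a symlink" % link_path)
    os.symlink(target, link_path)
    return True
```

Because the helper returns False when the link is already in place, it can be called on every deployment without failing on the second run.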

Replace Codeship with CircleCI or Travis?

Codeship doesn't seem to be working on Tribe. CircleCI or Travis may be a better choice for automated testing.

When removing codeship, codeship configuration in tribe/settings.py should be removed too.

Confirmation Emails of signup and password reset: content should be customized

The confirmation emails sent from Tribe for new account signup and password reset are controlled by the following two templates in allauth:

allauth/templates/account/email/password_reset_key_message.txt
allauth/templates/account/email/email_confirmation_message.txt

Some of the fields (such as {{ site_name }} and {{ site_domain }}) should be customized to replace defaults such as example.com. Here is an example of the confirmation email for a new account signup:

From: [email protected]
Subject: Tribe:Please Confirm Your E-mail Address
========================================
Hello from example.com!

You're receiving this e-mail because user xxx has given yours as an e-mail address to connect their account.

To confirm this is correct, go to http://tribe.greenelab.com/accounts/confirm-email/xyz.../

Thank you from example.com!
tribe.dartmouth.edu

Here is an example of email confirmation of password reset:

From: [email protected]
Subject: Tribe:Password Reset E-mail
=======================================
Hello from example.com!

You're receiving this e-mail because you or someone else has requested a password for your user account.
It can be safely ignored if you did not request a password reset. Click the link below to reset your password.

http://tribe.greenelab.com/accounts/password/reset/key/blah.../

Thank you for using example.com!
tribe.dartmouth.edu

Obviously example.com ({{ site_name }}) and tribe.dartmouth.edu ({{ site_domain }}) in both emails should be customized.
