
horsegeneannotation's People

Contributors

beeso018, hugh0335, micke001, mollymccue-dvm-phd, schae234

horsegeneannotation's Issues

Create a mock-up for website interface

After the outlines for Issues #6, #7, and #8 are a little more fleshed out, we need a mock-up for a web interface so that we can serve these tasks to a user. We need to identify what information needs to be pulled from NCBI.

A page containing the reference genome files can be found here

Objectives

  • Break the use cases into small enough chunks to be designed
  • Identify what data needs to be pulled/queried from NCBI
  • Create a mock-up showing a rudimentary website layout

Set up Jekyll

Set up Jekyll to build a static website from our GitHub repo.

Set up Django

I don't have a lot of experience with Django, but I'm willing to give it a try. I have more experience with Flask, so some of the issues won't be totally fleshed out, as they are framed in terms of how Flask works. I'm assuming the API is path based, but I could be way off.

Some of these tasks also depend on how Django is going to interact with LocusPocus. See the LocusPocus issues for more about what is actually being served by Django.

Overview

  • Add Django deps to the Dockerfile
  • Set up a Django instance
  • Research how the Django API works and how content is served

Outline steps needed for BLAST

This issue references the BLAST content in the NCBI BLAST tutorial.

Objectives

  • Write instructions on "Using BLAST"
    • Provide motivation for why BLAST is useful
    • Provide instructions on how to access BLAST
    • Provide background on default BLAST parameters and algorithm
  • BLAST Tutorial
    • Provide example sequence
    • Document what to write down and what to take away from this

Backend Code (Python) Roadmap

Roadmap

This project section contains the vision and overview of how the backend software will be organized. Project ideas and objectives will be outlined and discussed here in conjunction with the use cases defined in the Tutorials and content section (#3).

As tasks are defined, new issues can be created specific to each objective and implemented individually.

When opening a Pull Request related to a task, please tag it in the commit message so we can track task progress.

Feel free to ask questions and open discussions for issues related to the backend code here:

Overview:

This portion of the project (backend) will be implemented using this stack:

Set up Docker (Issue #10)

The entire project will be packaged as a Docker container so it can be deployed in collaboration with CyVerse.

Set up Django (Issue #11)

I don't have a lot of experience with Django, but I'm willing to give it a try. I have more experience with Flask, so some of the issues won't be totally fleshed out, as they are framed in terms of how Flask works.

Tutorials, content and documentation roadmap

Roadmap

This section contains an overview of items/issues related to the tutorials, documentation, and any teachable content in this project. The discussion and issues described here are software agnostic. No coding experience or knowledge is necessary! Project ideas and objectives will be described and planned out here. Objectives defined here will be shared with the software project sections (backend and frontend), which will implement the use cases (i.e. project objectives).

As issues are fleshed out, cross-reference them here.

Use Cases:

Let's start with a few high-level use cases!

NCBI Based tutorial (Issue #6)

  • Create an NCBI tutorial/document that performs manual gene annotation without a website

This objective will help us flesh out what steps need to be taken to perform manual gene annotation without a fancy new website.

  • Outline the steps that need to be taken for Exon identification

BLAST tutorial (Issue #7)

  • Create an NCBI BLAST-based tutorial on how to compare sequences between human and horse

This objective will explain how to run a BLAST search and interpret its results to compare gene sequences between human and horse.

Multiple Sequence Alignment tutorial (Issue #8)

  • Create a multiple sequence alignment (MSA) tutorial to compare gene sequences between horse and related organisms.

This objective will explain how to run an MSA to compare gene sequences.

Mock up (Issue #9)

  • Create a mock-up of how these use cases could be covered by a website

Look into WikiGenomes as a "web based gene annotation program to help hack the horse genome"

I see that your "web based gene annotation program to help hack the horse genome" is currently listed as "coming soon!", so I would like to suggest that you take a look at WikiGenomes.

It is a tool for genome annotation that combines a "web based gene annotation program" (aka wikigenomes_base) with Wikidata as the backend (including editing via OAuth). Sample entry: Listeria monocytogenes EGD-e (NCBI TaxID: 169963).

As far as I can tell, it has so far only been used for microbial genes, so some development work would still be needed to handle vertebrate genomes, but many of the pieces you need are there already.

One fork of it, Chlambase, has also been customized further for genomes from the genus Chlamydia. Sample entry: Chlamydia trachomatis D/UW-3/CX (NCBI TaxID: 272561).

All of this is overseen by the Gene Wiki team, who can be reached both on GitHub and on Wikidata. They maintain a list of SPARQL queries for exploring the corpus of genetic and related information available through Wikidata, which includes the human, mouse and rat genomes.

Frontend Code (Javascript) Roadmap

Roadmap

This project section is an overview of the website front-end for this project. Please discuss high-level ideas and issues here, then open a separate issue and assign it to the "Code Frontend" project so we can track progress/changes.

Overview:

The front-end website for the project will be implemented using this stack:

  • Javascript
  • Jekyll

Set up Jekyll (Issue #13)

Set up Jekyll so that it will create a website from the markdown docs in our repo.

Set up the Javascript to pull from the backend

Javascript will pull gene annotations from the Django backend (Issue #11). More details are needed to determine how information will be sent back and forth.

Convert Markdown to be species independent

Right now the website is horse specific, but there are many species that would benefit from manual gene annotation. Make the website species agnostic and code the project name and the species name as Jekyll variables.

Jekyll will take variables from _config.yml and convert them into the appropriate text when it translates markdown to HTML. An example of this is currently in contributing.md where the tag, {{site.title}}, is converted to the appropriate name in the HTML text (rendered by Jekyll).

  • Convert all instances of HorseGeneAnnotation to the {{site.title}} tag.
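The conversion above could be scripted roughly as follows. This is only a sketch: the helper names `templatize_text` and `templatize_markdown` are made up for illustration, and it assumes every literal occurrence of HorseGeneAnnotation in a markdown file should become the tag. Occurrences inside URLs (such as links to the repo) would also be rewritten, so the resulting diff should be reviewed by hand before committing.

```python
from pathlib import Path

# Literal project name and the Jekyll tag that should replace it.
PROJECT_NAME = "HorseGeneAnnotation"
SITE_TITLE_TAG = "{{site.title}}"


def templatize_text(text: str) -> str:
    """Replace every literal project name with the Jekyll variable tag."""
    return text.replace(PROJECT_NAME, SITE_TITLE_TAG)


def templatize_markdown(root: str) -> list:
    """Rewrite all *.md files under root in place; return the paths that changed."""
    changed = []
    for path in sorted(Path(root).rglob("*.md")):
        original = path.read_text(encoding="utf-8")
        updated = templatize_text(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `templatize_markdown(".")` from the repo root would rewrite the markdown docs in place and return the list of touched files.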

Containerize the workflow

One of our biggest priorities should be to containerize this workflow/pipeline so that it can be deployed anywhere and we can track software deps.

Set up Docker

Using Docker will allow developers to easily spawn dev instances to test web server functions.

Beginning tasks

  • Create a Dockerfile with base requirements
  • Build internal databases with docker build
  • Allow deployment of the web server with docker run
    • This will start a web server listening on some port on the host
    • Traffic can be routed to the container using a virtual host or by visiting the port directly (e.g. localhost:50000)
  • Upload production images to Docker Hub

Outline steps needed for exon identification

This issue references the content in the NCBI exon tutorial.

Outline objectives

  • Write instructions on "Finding exons with NCBI"
    • Accessing NCBI
    • Pull up gene table view
  • Outline exon checklist:
    • Compare Amino Acid length
    • Compare number of exons
    • Compare exon length
    • Identify presence of 5' and 3' UTR
    • Is there a start codon?
    • Do splice junctions start with GT and end with AG?
    • Is there a stop codon?
    • Are there N's in the sequence?
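A few of the sequence-level checklist items can be sketched in code. This is a minimal illustration and not part of the tutorial itself: the function names are hypothetical, `check_coding_sequence` assumes it is given the spliced coding sequence (start codon through stop codon) as a DNA string, and `is_canonical_intron` assumes it is given a single intron, read on the coding strand, so that the GT/AG rule applies to its first and last two bases.

```python
# Standard DNA stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_coding_sequence(cds: str) -> dict:
    """Check start codon, stop codon, and presence of N's in a coding sequence."""
    seq = cds.upper()
    return {
        "has_start_codon": seq.startswith("ATG"),
        "has_stop_codon": len(seq) >= 3 and seq[-3:] in STOP_CODONS,
        "contains_n": "N" in seq,
    }


def is_canonical_intron(intron: str) -> bool:
    """Return True if the intron starts with GT and ends with AG."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")
```

The remaining items (amino acid length, exon count, exon length, UTRs) are comparisons against the human annotation and would need the gene table data from NCBI.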

Import GFF into LocusPocus

The definition for a GFF is here.

Essentially, this file format can be summarized as:

Fields **must be tab-separated**. Also, all but the final field in each feature line must contain a value; "empty" columns should be denoted with a '.'

    seqname - name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix. Important note: the seqname must be one used within Ensembl, i.e. a standard chromosome name or an Ensembl identifier such as a scaffold ID, without any additional content such as species or assembly. See the example GFF output below.
    source - name of the program that generated this feature, or the data source (database or project name)
    feature - feature type name, e.g. Gene, Variation, Similarity
    start - Start position of the feature, with sequence numbering starting at 1.
    end - End position of the feature, with sequence numbering starting at 1.
    score - A floating point value.
    strand - defined as + (forward) or - (reverse).
    frame - One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.
    attribute - A semicolon-separated list of tag-value pairs, providing additional information about each feature.
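To make the nine fields concrete, here is a minimal parsing sketch. It is not LocusPocus code: `GffFeature` and `parse_gff_line` are hypothetical names, the handling of both `tag=value` and `tag value` attribute styles is an assumption about what the input may contain, and a production import would more likely rely on an existing GFF library.

```python
from typing import NamedTuple, Optional


class GffFeature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int              # 1-based start position
    end: int                # 1-based end position
    score: Optional[float]  # None when the column is '.'
    strand: str
    frame: str
    attributes: dict


def parse_gff_line(line: str) -> Optional[GffFeature]:
    """Parse one feature line; return None for comment or blank lines."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) == 8:
        fields.append("")   # the final (attribute) field may be empty
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attribute = fields
    attributes = {}
    for pair in attribute.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Assumption: accept 'tag=value' as well as 'tag value' pairs.
        if "=" in pair:
            tag, value = pair.split("=", 1)
        else:
            tag, _, value = pair.partition(" ")
        attributes[tag] = value.strip().strip('"')
    return GffFeature(
        seqname, source, feature, int(start), int(end),
        None if score == "." else float(score),
        strand, frame, attributes,
    )
```

For a made-up line such as `chr1<TAB>RefSeq<TAB>gene<TAB>100<TAB>200<TAB>.<TAB>+<TAB>.<TAB>ID=gene0;Name=TEST1`, the function returns a `GffFeature` whose `score` is `None` and whose `attributes` hold `ID` and `Name`.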

An example of the most up to date equine FASTA/GFF file can be found here:

ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Equus_caballus/latest_assembly_versions/GCF_002863925.1_EquCab3.0/GCF_002863925.1_EquCab3.0_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Equus_caballus/latest_assembly_versions/GCF_002863925.1_EquCab3.0/GCF_002863925.1_EquCab3.0_genomic.gff.gz
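As a quick sanity check before importing, one could tally the feature types in the compressed GFF. The sketch below assumes the `.gff.gz` file has already been downloaded to a local path; `count_feature_types` is a hypothetical helper name, and it reads only the third (feature) column.

```python
import gzip
from collections import Counter


def count_feature_types(gff_gz_path: str) -> Counter:
    """Count occurrences of each feature type in a gzipped GFF file."""
    counts = Counter()
    with gzip.open(gff_gz_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Skip header/comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) >= 3:
                counts[fields[2]] += 1
    return counts
```

Reading the file line by line keeps memory use flat, which matters for a whole-genome annotation file.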

Outline steps needed for Multiple Sequence Alignment

This issue references the BLAST content in the NCBI BLAST tutorial.

MSA: multiple sequence alignment

Objectives

  • Write instructions on "Using MSA"
    • Provide example sequence
    • Provide background on algorithm
  • MSA tutorial
    • Provide example sequences
    • Clarify what needs to be taken away from this and what should be documented
