umn-eggl / horsegeneannotation
A crowd-sourced gene annotation website for the horse
Home Page: https://umn-eggl.github.io/HorseGeneAnnotation/
License: Other
After the outlines for Issues #6, #7, and #8 are a little more fleshed out, we need a mock-up for a web interface so that we can serve these tasks to a user. We also need to identify what information needs to be pulled from NCBI.
A page containing the reference genome files can be found here
Set up jekyll to build a static website from our github repo.
I don't have a lot of experience with Django, but I'm willing to give it a try. I have more experience with Flask, so some of the issues won't be totally fleshed out, as they will be framed in terms of how Flask works. I'm assuming that the API is path based, but I could be way off.
Some of these tasks also depend on how Django is going to interact with LocusPocus. See the LocusPocus issues for more about what is actually being served by Django.
This issue references the BLAST content in the NCBI BLAST tutorial.
This project section contains the vision and overview of how the backend software will be organized. Project ideas and objectives will be outlined and discussed here in conjunction with the use cases defined in the Tutorials and content section (#3).
As tasks are defined, new issues can be created specific to each objective and implemented individually.
When opening a Pull Request related to a task, please tag it in the commit message so we can track task progress.
Feel free to ask questions and open discussions for issues related to the backend code here:
This portion of the project (backend) will be implemented using this stack:
The entirety of the project will be packaged as a Docker container so it can be deployed in collaboration with CyVerse.
I don't have a lot of experience with Django, but I'm willing to give it a try. I have more experience with Flask, so some of the issues won't be totally fleshed out, as they will be framed in terms of how Flask works.
This section contains an overview of items/issues related to the tutorials, documentation and any teachable content in this project. The discussion and issues described here are software agnostic. No coding experience or knowledge necessary! Project ideas and objectives will be described and planned out here. Objectives defined here will be shared with the software project sections (backend and frontend) who will implement the use cases (i.e. project objectives).
As issues are fleshed out, cross-reference them here.
Let's start with a few high-level use cases!
This objective will help us flesh out what steps need to be taken to perform manual gene annotation without a fancy new website.
This objective will explain how to run a BLAST search and identify matches in order to compare gene sequences between human and horse.
This objective will explain how to run a multiple sequence alignment (MSA) to compare gene sequences.
I see that your "web based gene annotation program to help hack the horse genome" is currently listed as "coming soon!", so I would like to suggest that you take a look at WikiGenomes.
It is a tool for genome annotation that combines a "web based gene annotation program" (aka wikigenomes_base) with Wikidata as the backend (including editing via OAuth). Sample entry: Listeria monocytogenes EGD-e (NCBI TaxID: 169963).
As far as I can tell, it has so far only been used for microbial genes, so some development work would still be needed to handle vertebrate genomes, but many of the pieces you need are there already.
One fork of it, Chlambase, has also been customized further for genomes from the genus Chlamydia. Sample entry: Chlamydia trachomatis D/UW-3/CX (NCBI TaxID: 272561).
All of this is overseen by the Gene Wiki team, who can be reached both on GitHub and on Wikidata. They maintain a list of SPARQL queries for exploring the corpus of genetic and related information available through Wikidata, which includes the human, mouse and rat genomes.
Think more about this? Related to #3
Jekyll links (related to #13) are broken in the sidebar.
This project section is an overview of the website front-end for this project. Please discuss high level ideas and issues here, then form a separate issue and assign it to the "Code Frontend" project so we can track progress/changes.
The front-end website for the project will be implemented using this stack:
Set up Jekyll so that it will create a website from the markdown docs in our repo.
JavaScript will pull gene annotations from the Django backend (Issue #11). More details are needed to determine how information will be sent back and forth.
Make sure that all the things on this checklist are done and ready to go for the global sprint.
Right now the website is horse specific, but there are many species that would benefit from manual gene annotation. Make the website species agnostic and code the project name and the species name as Jekyll variables.
Jekyll will take variables from _config.yml and convert them into the appropriate text when it translates markdown to HTML. An example of this is currently in contributing.md, where the tag {{site.title}} is converted to the appropriate name in the HTML text (rendered by Jekyll).
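To make the substitution concrete, here is a minimal Python sketch of the idea. It is not how Jekyll works internally (Jekyll uses the Liquid template engine); it only mimics the replacement of `{{site.<key>}}` tags. The `render` helper and the `species` key are hypothetical, and the dictionary stands in for values that would live in _config.yml.

```python
import re

# Matches tags such as {{site.title}} or {{ site.title }}.
TAG = re.compile(r"\{\{\s*site\.(\w+)\s*\}\}")


def render(text, site):
    """Replace each {{site.<key>}} tag with the matching value from `site`."""
    def substitute(match):
        key = match.group(1)
        # Unknown keys become an empty string (a lenient choice for this sketch).
        return str(site.get(key, ""))
    return TAG.sub(substitute, text)


# Hypothetical values standing in for the contents of _config.yml.
site = {"title": "HorseGeneAnnotation", "species": "horse"}
line = "Welcome to {{site.title}}, an annotation project for the {{ site.species }}."
print(render(line, site))
# Welcome to HorseGeneAnnotation, an annotation project for the horse.
```

Swapping the two values in `site` is all it would take to retarget the same text to another species, which is the point of moving the names into variables.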
Convert instances of HorseGeneAnnotation to the {{site.title}} tag.
The Word docs in the docs/ folder should be converted to MD files so they can be remixed by others.
One of our biggest priorities should be to containerize this workflow/pipeline so that it can be deployed anywhere and we can track software deps.
Using Docker will allow developers to easily spawn dev instances to test web server functions.
docker build
docker run
localhost:50000
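The build/run/browse steps above could be scripted roughly as follows. This is only a sketch: the image name `horsegeneannotation-dev`, the container port, and the build context `.` are assumptions, not values taken from the repository. Only the host port 50000 comes from the list above.

```python
import shlex


def docker_commands(image="horsegeneannotation-dev", host_port=50000,
                    container_port=50000, context="."):
    """Return the `docker build` and `docker run` argument lists.

    The image name, container port, and build context are placeholders.
    """
    # Build an image from the Dockerfile in `context` and tag it.
    build = ["docker", "build", "-t", image, context]
    # Run the image, publishing the container port on localhost:<host_port>.
    run = ["docker", "run", "--rm", "-p", f"{host_port}:{container_port}", image]
    return build, run


build, run = docker_commands()
print(shlex.join(build))
print(shlex.join(run))
# To execute for real: subprocess.run(build, check=True), then the same for `run`.
```

Returning argument lists instead of running the commands keeps the sketch usable on machines without Docker installed.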
This issue references the content in the NCBI exon tutorial.
The definition for a GFF is here.
Essentially, this file format can be summarized as:
Fields **must be tab-separated**. Also, all but the final field in each feature line must contain a value; "empty" columns should be denoted with a '.'
seqname - name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix. Important note: the seqname must be one used within Ensembl, i.e. a standard chromosome name or an Ensembl identifier such as a scaffold ID, without any additional content such as species or assembly. See the example GFF output below.
source - name of the program that generated this feature, or the data source (database or project name)
feature - feature type name, e.g. Gene, Variation, Similarity
start - Start position of the feature, with sequence numbering starting at 1.
end - End position of the feature, with sequence numbering starting at 1.
score - A floating point value.
strand - defined as + (forward) or - (reverse).
frame - One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.
attribute - A semicolon-separated list of tag-value pairs, providing additional information about each feature.
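The nine fields above map naturally onto a small record type. The following is a minimal sketch of a line parser, assuming tab-separated input and '.' for empty columns as described. The `Feature` and `parse_line` names and the example line are made up for illustration, and no validation of strand or coordinates is attempted.

```python
from typing import Dict, NamedTuple, Optional


class Feature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int            # 1-based
    end: int              # 1-based
    score: Optional[float]
    strand: str
    frame: Optional[int]
    attributes: Dict[str, str]


def parse_attributes(field):
    """Split a semicolon-separated list of tag-value pairs into a dict."""
    attributes = {}
    for pair in field.strip().split(";"):
        pair = pair.strip()
        if not pair:
            continue
        if "=" in pair:                       # tag=value style
            tag, value = pair.split("=", 1)
        else:                                 # tag "value" style
            tag, _, value = pair.partition(" ")
        attributes[tag.strip()] = value.strip().strip('"')
    return attributes


def parse_line(line):
    """Parse one tab-separated feature line into a Feature."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) == 8:                        # the final field may be absent
        cols.append("")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(cols)}")
    seqname, source, feature, start, end, score, strand, frame, attribute = cols
    return Feature(
        seqname, source, feature, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if frame == "." else int(frame),
        {} if attribute in ("", ".") else parse_attributes(attribute),
    )


# A made-up feature line, not taken from the equine annotation.
example = "chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=DEMO"
print(parse_line(example))
```

Empty columns come back as `None` (or an empty dict for attributes), so downstream code can tell "no score" apart from a score of zero.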
The most up-to-date equine FASTA and GFF files can be found here:
ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Equus_caballus/latest_assembly_versions/GCF_002863925.1_EquCab3.0/GCF_002863925.1_EquCab3.0_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Equus_caballus/latest_assembly_versions/GCF_002863925.1_EquCab3.0/GCF_002863925.1_EquCab3.0_genomic.gff.gz
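Both files are gzip-compressed, so they can be read without unpacking them first. As a minimal sketch, the hypothetical `fasta_ids` helper below lists the sequence names in a gzipped FASTA file; the demo runs on a tiny made-up file so that it works without downloading anything.

```python
import gzip
import os
import tempfile


def fasta_ids(path):
    """Yield the sequence ID (first word of each '>' header) in a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:                      # skip headers with no name
                    yield parts[0]


# Self-contained demo on a tiny, made-up file.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.fna.gz")
    with gzip.open(demo, "wt") as handle:
        handle.write(">seq1 made-up sequence\nACGT\n>seq2 another one\nGGCC\n")
    print(list(fasta_ids(demo)))
# ['seq1', 'seq2']
```

Once the genome has been downloaded, the same call on a local copy of `GCF_002863925.1_EquCab3.0_genomic.fna.gz` would list the seqnames that the GFF file refers to.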
This issue references the BLAST content in the NCBI BLAST tutorial.
MSA: multiple sequence alignment
In addition to importing and exporting the GFF file and the FASTA file, rename the sequence names to a consistent, readable form, e.g. chr1, chr2, chr3, etc.
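One possible shape for the renaming step is sketched below. The `format_seqname` helper, the optional `aliases` mapping, and the rule of only touching numeric, X, Y, and MT names are all assumptions for illustration. The mapping from the source files' identifiers to chromosome numbers would have to be supplied separately and is not shown here.

```python
def format_seqname(name, aliases=None):
    """Return a 'chrN'-style name for chromosomes; leave other names unchanged.

    `aliases` optionally maps source identifiers to chromosome labels
    (for example {"some_accession": "1"}); its contents are up to the caller.
    """
    aliases = aliases or {}
    name = aliases.get(name, name)
    # Drop an existing prefix in any capitalisation (chr, Chr, CHR).
    bare = name[3:] if name.lower().startswith("chr") else name
    if bare.isdigit():
        return "chr" + bare
    if bare.upper() in {"X", "Y", "MT"}:
        return "chr" + bare.upper()
    # Scaffolds and anything unrecognised are returned as given.
    return name


for raw in ["1", "Chr2", "chrX", "y", "scaffold_17"]:
    print(raw, "->", format_seqname(raw))
```

Leaving unrecognised names untouched means scaffold identifiers survive a round trip through import and export.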