legumeinfo / arachispheno
This project forked from 1001genomes/arapheno
AraPheno source code for http://arapheno.1001genomes.org
License: MIT License
Split off from issue #5:
Inconsistency in the number of accessions:
This file (study) has Unique acc = 794. The page with this link
http://dev.lis.ncgr.org:50007/phenotype/5/ shows 'Geographic
distribution of 876 accessions' above the world map.
But the table view on the same page has the correct number of accessions: 793.
I already changed the map label count from replicates to accessions, but then determined that the map itself shows the number of replicates for each country, not accessions. Working on that now.
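To make the replicate-versus-accession distinction concrete, here is a minimal sketch of the two per-country counts. The record layout (accession id, replicate id, country) and the sample values are hypothetical, not the actual ArachisPheno schema or data.

```python
from collections import defaultdict


def count_by_country(records):
    """Count replicates and unique accessions per country.

    `records` is an iterable of (accession_id, replicate_id, country)
    tuples; this layout is assumed for illustration only.
    """
    replicates = defaultdict(int)
    accessions = defaultdict(set)
    for accession_id, _replicate_id, country in records:
        replicates[country] += 1               # every row is one replicate
        accessions[country].add(accession_id)  # a set keeps each accession once
    return dict(replicates), {c: len(ids) for c, ids in accessions.items()}


# Hypothetical sample: accession PI_001 has two replicates.
records = [
    ("PI_001", 1, "India"),
    ("PI_001", 2, "India"),
    ("PI_002", 1, "India"),
    ("PI_003", 1, "Brazil"),
]
replicate_counts, accession_counts = count_by_country(records)
```

In this sample the map would show 3 for India if it counts replicates, but 2 if it counts accessions.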
On April-08-2020, a week from today, there is a meeting with the researchers whose data we are using here. They have insisted on keeping the data private until it is published (in future). But we need to share this utility with them. Can we have some kind of authenticated access ready for them before the meeting?
Refer: #2
From the schema it looks like there is author, group and public access for studies. This could work for us too for this purpose. Or go the quick temporary way Andrew suggested in #2, "... use apache basic authentication (after we get it running via apache, that is)"
The number of digits should be just right.
Each phenotype in AraPheno has a DOI link that goes to DataCite, for example
https://search.datacite.org/works/10.21958/phenotype:672
as well as a "Cite" link that can generate properly formatted citations in various formats (APA, BibTeX, etc). Do our phenotypes already have a DOI link that goes to PubMed (or elsewhere), or is this something we have to set up?
Also, does our study have an associated publication?
Missing or incorrect in the study-phenotype table, the accession-phenotype table, and also the REST API.
This problem existed in AraPheno, so do a pull request once fixed.
Requires updating the database.zip, using
python manage.py generate_database_dump
to keep it current.
These replace each phenotype value v with a functional transformation, like log(v), sqr(v), sqrt(v).
Some of the resulting histograms look reasonable, like that for seed_weight. Note that the untransformed value with the highest frequency is around 150, and log(150) ~ 5 which has the highest frequency in the log-transformation histogram.
http://dev.lis.ncgr.org:50007/phenotype/20/transformation/
Others, like #1_kernel_weight, seem off. Here, note that the highest value is around 125, log(125) ~ 4.8 but the corresponding value in the log-transformation histogram is about 5.1, and sqr(125) = 15625 but the highest value in the sqr-transformation histogram is about 27000, etc.
http://dev.lis.ncgr.org:50007/phenotype/1/transformation/
On closer examination, AraPheno's default behavior for a transform f is not to use f(v) as expected, but to use
f(v - min(vv) + 0.1*var(vv))
where vv is the list of all values. This must be some kind of statistical correction (? I am still researching it). However, it is possible to tell it to not do this in ArachisPheno.
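As a rough sketch of the shifted transform described above (not AraPheno's actual code), the function below applies f to v - min(vv) + 0.1*var(vv). The function name is hypothetical, and the use of the population variance is an assumption; the source does not say which variance estimator is used.

```python
import math
from statistics import pvariance


def shifted_transform(values, f, var_fn=pvariance):
    """Apply f(v - min(vv) + 0.1 * var(vv)) to every value.

    Population variance is an assumption; pass statistics.variance
    as var_fn to use the sample variance instead.
    """
    offset = -min(values) + 0.1 * var_fn(values)
    return [f(v + offset) for v in values]


values = [1.0, 2.0, 3.0]
plain_log = [math.log(v) for v in values]            # f(v), the expected behavior
shifted_log = shifted_transform(values, math.log)    # the default behavior
```

This is consistent with the #1_kernel_weight histograms: a log value of about 5.1 and a sqr value of about 27000 both correspond to a shifted argument of roughly 164 (exp(5.1) and sqrt(27000) are both about 164), rather than the raw maximum of about 125.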
This behavior existed in AraPheno: it throws an error instead of displaying the Home page if no Studies exist. I initially had to add a spurious Study to the database by hand to get it to run.
Going to any phenotype or the study page shows:
OperationalError at /phenotype/19/
no such column: phenotypedb_submission.is_private
... ... ...
Error during template rendering
In template /home/svengato/ArachisPheno/html/base.html, error at line 0
no such column: phenotypedb_submission.is_private
... ... ...
ArachisPheno doesn't have to be an exact replica of the Arabidopsis portal. We will need only certain features, at least in the beginning. This issue is to help us gradually settle on which features, UI included, we need. The features list should evolve as we work through the site and the data we have, keeping in mind the flexibility required for future needs.
Make a matrix format csv file from my summary spreadsheet of Roshan's trait data. Then, Sven can check its suitability for loading and other related issues.
Easy to do, whenever we are ready for it.
Maybe just 'pheno'?
Document issues with test loading the data in csv format.
I will run through the migration process one more time when we finalize the [accession id, accession name, replicate id] format (with underscore or whatever).
I have been holding off on this in order to implement it in the next migration event (such as adding model fields for distinguishing private data).
Among other things, this would require a categorical bar chart rather than a numerically binned histogram.
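A minimal sketch of the difference: for categorical values there is nothing to bin, so the chart data is just one count per distinct category. The trait name and values below are hypothetical.

```python
from collections import Counter


def category_counts(values):
    """Return (category, count) pairs, most frequent first."""
    return Counter(values).most_common()


# Hypothetical categorical phenotype values.
growth_habit = ["erect", "spreading", "erect", "bunch", "erect", "spreading"]
bars = category_counts(growth_habit)
```

Each pair would become one bar, labeled by category rather than by a numeric bin range.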
We previously added user authentication (issue #6), but would like the ability to configure a public version that does not require it.
Sudhansu requests the ability to put line breaks and URL links in the study description, so that the study detail and study list pages format them correctly.
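In Django templates the built-in `urlize` and `linebreaks` filters cover this kind of formatting. The standard-library sketch below only illustrates the idea (escape first, then link URLs, then convert line breaks); the function name and URL pattern are assumptions, not the code used in ArachisPheno.

```python
import html
import re

# Simplistic URL pattern (an assumption): it stops at whitespace or "<",
# so trailing punctuation such as "." would be included in the link.
URL_RE = re.compile(r"https?://[^\s<]+")


def format_description(text):
    """Render a plain-text description as HTML paragraphs with links."""
    escaped = html.escape(text, quote=False)  # neutralize any raw HTML first
    linked = URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), escaped
    )
    # Blank lines separate paragraphs; single newlines become <br>.
    paragraphs = re.split(r"\n{2,}", linked.strip())
    return "\n".join("<p>" + p.replace("\n", "<br>") + "</p>" for p in paragraphs)


rendered = format_description("See http://a.org\nline2\n\nNext")
```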
Issue description copied from email.
0327, 10.45 a.m. sd-adf-sg
There is a related issue in my mind, but I will spell it out in an issue in ArachisPheno: is it possible to have some datasets public and some datasets private in AraPheno? This becomes relevant when we serve the public minicore data. Maybe Sven already knows about the public vs. private data.
I doubt that it has this ability; I haven't seen any sort of login mechanism. Could be wrong, though.
As far as I can tell, all of the AraPheno data are public. One of the database tables (auth_permission) lists various permission levels, but I do not think they are used for anything. I have no idea how hard it would be to add private data, as in the InterMines.
Do you mean the MyMine feature of the InterMines? That's a bit different, I think, in that only the lists and queries one makes are private;
I don't think there's any private data per se.
I am in favour of the password protection that you have suggested; we should be able to share the presentation of the data among us insiders and the collaborators.
Like an initial login page? I will look into it.
If nothing else, we could just use apache basic authentication
(after we get it running via apache, that is)
This is part of the study curation process. Once a user submits a study, someone with (a) knowledge of the phenotypes and (b) admin access has to add ontology terms and other metadata for each phenotype. Once this is complete, the curator may publish the study.
For example, I set the unit ontology for yield_avg_kg_ha as it is clearly in units of kg/hectare.
These bugs are all present in the original AraPheno. There may be others.
Another alternative is to hide the REST API for now.
(split off from issue #21)
from Sudhansu:
The peanut pheno public data is in our DS and below is the link to the two relevant files.
The relevant DS directory: https://v1.legumefederation.org/data/public/Arachis_hypogaea/minicore.trt.JWYM/
Descriptors file: https://v1.legumefederation.org/data/public/Arachis_hypogaea/minicore.trt.JWYM/arahy.mincore.trt.JWYM.descriptors.xlsx
Data file: https://v1.legumefederation.org/data/public/Arachis_hypogaea/minicore.trt.JWYM/arahy.mincore.trt.JWYM.observations.xlsx
The Data file (observations.xlsx) has several sheets, and the two sheets that have the pheno data are:
- Obs-mini_core sheet
These phenotype data should have two observations for each accession, one for year 2013 and the other for year 2015. I think we should be able to treat them as replications.
- protein_oil-mini_core sheet
The structure comes out after you sort on the basis of the 'accession' column. Each accession has mostly 2-3 replications.
Because we are able to store replicate data in our database, we should do that to preserve the original data as much as possible.
Please let me know if we should meet to talk about the data after you go through the files.
Thanks for moving it forward quickly.
Sudhansu
Lurking in static/img, invoked in html/home/[about | links].html and maybe elsewhere.
My curiosity finally got the better of me enough to figure out the rather byzantine rigamarole required to enable Flash to work in Chrome on a specific site. The result (for AraPheno) suggests perhaps that stomata in Spanish Arabidopsis tend to be larger than those in Scandinavian countries (due to the falling of the rains mainly on the plains? or maybe another climate-related factor).
It also works on ArachisPheno, though I have yet to find a cute example. Seems like a reasonably flexible tool for display along various dimensions, could probably be reimplemented in something less universally deprecated than Flash but would take some time. Just wanted to note that it's actually possible to use this, if you are willing to be sufficiently sycophantic to your browser. I followed steps given here:
https://www.freecodecamp.org/news/how-to-enable-adobe-flash-player-in-google-chrome/
which worked as long as I followed them to the very end, rather than assuming that it would work after the basic setting to Allow Flash was enabled.
The ones provided by AraPheno are for Arabidopsis, so I removed them. We can add peanut accessions and genotypes later if desired.
sqlite> delete from phenotypedb_genotype_accessions;
sqlite> delete from phenotypedb_genotype;
sqlite> delete from phenotypedb_accession;
This legal disclaimer pops up when you click on Impressum at the bottom of the original AraPheno application, but is commented out in ArachisPheno. In issue #3, I speculated that it may be a European legal requirement, and wondered whether we need it.
Among other things, it mentions that the application uses Google Analytics and suggests how the user can prevent it by refusing to set cookies. Another option would be to remove that functionality, if possible.