arachispheno's Issues

Phenotype detail map shows number of replicates, not accessions

Split off from issue #5:

Inconsistency in the number of accessions:
This file (study) has Unique acc = 794. The page at
http://dev.lis.ncgr.org:50007/phenotype/5/ shows 'Geographic
distribution of 876 accessions' above the world map,
but the table view on the same page has the correct number of accessions: 793.

I already changed the map label count from replicates to accessions, but then determined that the map itself shows the number of replicates for each country, not accessions. Working on that now.
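
The difference between the two counts can be sketched as follows. This is only an illustration, not the ArachisPheno code: the record shape (accession id, replicate id, country) and the sample ids are made up for the example.

```python
from collections import defaultdict

def replicates_per_country(records):
    """Count every record, so an accession with two replicates counts twice."""
    counts = defaultdict(int)
    for _accession_id, _replicate_id, country in records:
        counts[country] += 1
    return dict(counts)

def accessions_per_country(records):
    """Count each accession once per country, however many replicates it has."""
    seen = defaultdict(set)
    for accession_id, _replicate_id, country in records:
        seen[country].add(accession_id)
    return {country: len(ids) for country, ids in seen.items()}

# Hypothetical sample data: (accession id, replicate id, country)
records = [
    ("ACC_1", "ACC_1_1", "India"),
    ("ACC_1", "ACC_1_2", "India"),
    ("ACC_2", "ACC_2_1", "Brazil"),
]

print(replicates_per_country(records))
print(accessions_per_country(records))
```

The map should use the second kind of count, so that the per-country numbers add up to the number of accessions rather than the number of replicates.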

Authentication to access ArachisPheno site

On April 8, 2020, a week from today, there is a meeting with the researchers whose data we are using here. They have insisted on keeping the data private until it is published, but we need to share this utility with them. Can we have some kind of authenticated access ready for them before the meeting?
See #2.
From the schema, it looks like there is author, group, and public access for studies. This could work for our purpose too. Alternatively, we could go the quick, temporary way Andrew suggested in #2: "... use apache basic authentication (after we get it running via apache, that is)"

Fix # Values column

The # Values column is missing or incorrect in the study-phenotype table, the accession-phenotype table, and also the REST API.

This problem existed in AraPheno, so we should submit a pull request upstream once it is fixed.

Phenotype transformations

These replace each phenotype value v with a functional transformation, like log(v), sqr(v), sqrt(v).

Some of the resulting histograms look reasonable, like that for seed_weight. Note that the untransformed value with the highest frequency is around 150, and log(150) ~ 5 which has the highest frequency in the log-transformation histogram.
http://dev.lis.ncgr.org:50007/phenotype/20/transformation/

Others, like #1_kernel_weight, seem off. Here, note that the highest value is around 125, log(125) ~ 4.8 but the corresponding value in the log-transformation histogram is about 5.1, and sqr(125) = 15625 but the highest value in the sqr-transformation histogram is about 27000, etc.
http://dev.lis.ncgr.org:50007/phenotype/1/transformation/

On closer examination, AraPheno's default behavior for a transform f is not to use f(v) as expected, but to use
f(v - min(vv) + 0.1*var(vv))
where vv is the list of all values. This must be some kind of statistical correction (I am still researching it). However, ArachisPheno can be told not to apply it.
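
A minimal sketch of the shifted transform described above, for illustration only. It assumes var(vv) is the population variance; which variance AraPheno actually uses is not confirmed here, and the function names are made up. One observable effect of the shift is that the smallest value maps to 0.1*var(vv), so the argument of log or sqrt stays positive whenever the values are not all equal.

```python
import math
from statistics import pvariance

def shift(values):
    """Return the function v -> v - min(vv) + 0.1*var(vv) for the list vv."""
    # Assumption: population variance; the original may use another estimator.
    offset = -min(values) + 0.1 * pvariance(values)
    return lambda v: v + offset

def transform(values, f, corrected=True):
    """Apply f to each value, with or without the shift."""
    g = shift(values) if corrected else (lambda v: v)
    return [f(g(v)) for v in values]

# Consistency check on the kernel weight readings quoted above: a top value of
# about 5.1 in the log histogram and about 27000 in the sqr histogram should
# imply the same shifted argument if both use the same shift.
implied_from_log = math.exp(5.1)       # roughly 164
implied_from_sqr = math.sqrt(27000)    # roughly 164
print(implied_from_log, implied_from_sqr)
```

Both readings point to a shifted argument of about 164 rather than the raw 125, which fits the explanation that the histograms show f(v - min(vv) + 0.1*var(vv)) and not f(v).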

ArachisPheno crashes if no Studies exist

This behavior existed in AraPheno: it throws an error instead of displaying the Home page if no Studies exist. I initially had to add a spurious Study to the database by hand to get it to run.

OperationalError at phenotype and study

Going to any phenotype or the study page shows:
OperationalError at /phenotype/19/
no such column: phenotypedb_submission.is_private
... ... ...
Error during template rendering
In template /home/svengato/ArachisPheno/html/base.html, error at line 0
no such column: phenotypedb_submission.is_private

... ... ...
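
The error is consistent with the database schema lagging behind the model code, for instance if an is_private field was added to the model but the corresponding migration has not been applied to this database (in a Django project the usual remedy would be python manage.py migrate). A quick way to check whether the column exists is sketched below. It assumes an SQLite database and uses an in-memory stand-in table rather than the real ArachisPheno file; the helper name has_column is made up.

```python
import sqlite3

def has_column(conn, table, column):
    """Return True if `table` has a column named `column` (SQLite only).

    PRAGMA arguments cannot be bound as parameters, so `table` must be a
    trusted name, never user input.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name

# Stand-in for the real database: a table that lacks the new column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phenotypedb_submission (id INTEGER PRIMARY KEY)")
before = has_column(conn, "phenotypedb_submission", "is_private")

# Adding the column by hand, roughly what a migration would do.
conn.execute(
    "ALTER TABLE phenotypedb_submission "
    "ADD COLUMN is_private INTEGER NOT NULL DEFAULT 0"
)
after = has_column(conn, "phenotypedb_submission", "is_private")
print(before, after)
```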

ArachisPheno: Customization

ArachisPheno doesn't have to be an exact replica of the Arabidopsis portal. We will need only certain features, at least in the beginning. This issue is to help us gradually settle on which features, UI included, we need. The feature list should evolve as we work through the site and the data we have, keeping in mind flexibility for future needs.

Finalize accession format

I will run through the migration process one more time when we finalize the [accession id, accession name, replicate id] format (with underscore or whatever).

I have been holding off on this in order to implement it in the next migration event (such as adding model fields for distinguishing private data).

  1. Should I update these this morning, before people start looking at ArachisPheno? I can change them directly in the database for now.
  2. As I understand it, the final format will involve underscores: accession id = PI_152111, accession name = PI_152111, replicate id = PI_152111_1 ?
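
If the underscore format in item 2 is adopted, the conversion could look something like the sketch below. The raw input form ("PI 152111" with a space or hyphen) is an assumption, and the function names are made up for illustration.

```python
import re

def normalize_accession(raw):
    """Replace runs of whitespace or hyphens with a single underscore."""
    return re.sub(r"[\s\-]+", "_", raw.strip())

def make_replicate_id(accession_id, replicate_number):
    """Append the replicate number to an already normalized accession id."""
    return f"{accession_id}_{replicate_number}"

accession_id = normalize_accession("PI 152111")
print(accession_id)                        # PI_152111
print(make_replicate_id(accession_id, 1))  # PI_152111_1
```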

Allow HTML code in study description

Sudhansu requests the ability to put line breaks and URL links in the study description, so that the study detail and study list pages format them correctly.
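
One possible approach, sketched here with the standard library only, is to keep the description as plain text and convert just the line breaks and URLs at display time, escaping everything else. This is an assumption about how the request could be met, not the ArachisPheno implementation; Django's built-in linebreaks and urlize template filters cover similar ground. The URL pattern is deliberately simple and will include trailing punctuation in the link.

```python
import html
import re

# Simple pattern: "http://" or "https://" followed by non-space characters.
URL_RE = re.compile(r"https?://[^\s<]+")

def render_description(text):
    """Escape the text, then turn URLs into links and newlines into <br>."""
    escaped = html.escape(text)  # also escapes quotes, so hrefs stay intact
    linked = URL_RE.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', escaped
    )
    return linked.replace("\n", "<br>\n")

print(render_description("See http://example.org\n<b>x</b>"))
```

Escaping first means any other markup typed into the description is shown literally rather than interpreted, which avoids allowing arbitrary HTML.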

Private data: Can a dataset/study be kept private?

Issue description copied from email.
0327, 10.45 a.m. sd-adf-sg

There is a related issue on my mind, which I will spell out in an issue in ArachisPheno: is it possible to have some datasets public and some datasets private in AraPheno? This becomes relevant when we serve the public minicore data. Maybe Sven already knows about public vs. private data.

I doubt that it has this ability. I haven't seen any sort of login mechanism. I could be wrong, though.

As far as I can tell, all of the AraPheno data are public. One of the database tables (auth_permission) lists various permission levels, but I do not think they are used for anything. I have no idea how hard it would be to add private data, as in the InterMines.

Do you mean the MyMine feature of the InterMines? That's a bit different, I think, in that only the lists and queries one makes are private;
I don't think there's any private data per se.

Site access restriction:

I am in favour of the password protection that you have suggested. We should be able to share the presentation of the data among us insiders and the collaborators.

Like an initial login page? I will look into it.

If nothing else, we could just use apache basic authentication
(after we get it running via apache, that is)

Add phenotype metadata

This is part of the study curation process. Once a user submits a study, someone with (a) knowledge of the phenotypes and (b) admin access has to add ontology terms and other metadata for each phenotype. Once this is complete, the curator may publish the study.

For example, I set the unit ontology for yield_avg_kg_ha as it is clearly in units of kg/hectare.

Fix REST API

These bugs are all present in the original AraPheno. There may be others.

  1. Phenotype lists: fields like num_values are often missing.
  2. Do num_values and number_replicates mean the same thing? If so, we could eliminate the latter.
  3. Phenotype lists use AraPheno's DOI (10.21958), as defined in arapheno/arapheno/settings/defaults.py. Do we have our own DOI?
  4. Missing commas between ontology types in the header. (This is a simple oversight in arapheno/phenotypedb/renderer.py, easy to fix).

Another alternative is to hide the REST API for now.

(split off from issue #21)

Scrub and prepare minicore data (for public version of ArachisPheno)

from Sudhansu:

The peanut pheno public data is in our DS and below is the link to the two relevant files.

The relevant DS directory: https://v1.legumefederation.org/data/public/Arachis_hypogaea/minicore.trt.JWYM/

Descriptors file: https://v1.legumefederation.org/data/public/Arachis_hypogaea/minicore.trt.JWYM/arahy.mincore.trt.JWYM.descriptors.xlsx
Data file: https://v1.legumefederation.org/data/public/Arachis_hypogaea/minicore.trt.JWYM/arahy.mincore.trt.JWYM.observations.xlsx

The Data file (observations.xlsx) has several sheets, and the two sheets that have the pheno data are:

  1. Obs-mini_core sheet

These phenotype data should have two observations for each accession, one for year 2013 and the other for year 2015. I think we should be able to treat them as replications.

  2. protein_oil-mini_core sheet

The structure becomes apparent after you sort on the 'accession' column. Most accessions have 2-3 replications.
Because we are able to store replicate data in our database, we should do that to preserve the original data as much as possible.

Please let me know if we should meet to talk about the data after you go through the files.

Thanks for moving it forward quickly.

Sudhansu

The Flash-y Explorer: disable or give guidance?

My curiosity finally got the better of me enough to figure out the rather byzantine rigamarole required to enable Flash to work in Chrome on a specific site. The result (for AraPheno) is shown below, showing perhaps that stomata in Spanish Arabidopsis tend to be larger than those in Scandinavian countries (due to the falling of the rains mainly on the plains? or maybe another climate-related factor).
[image: screenshot of the Explorer on AraPheno]

It also works on ArachisPheno, though I have yet to find a cute example. It seems like a reasonably flexible tool for display along various dimensions. It could probably be reimplemented in something less universally deprecated than Flash, but that would take some time. I just wanted to note that it's actually possible to use this, if you are willing to be sufficiently sycophantic to your browser. I followed the steps given here:
https://www.freecodecamp.org/news/how-to-enable-adobe-flash-player-in-google-chrome/
which worked as long as I followed them to the very end, rather than assuming that it would work once the basic setting to Allow Flash was enabled.

Accessions and Genotypes

The ones provided by AraPheno are for Arabidopsis, so I removed them. We can add peanut accessions and genotypes later if desired.

sqlite> delete from phenotypedb_genotype_accessions;
sqlite> delete from phenotypedb_genotype;
sqlite> delete from phenotypedb_accession;

Update or remove "Impressum" footer?

This legal disclaimer pops up when you click on Impressum at the bottom of the original AraPheno application, but is commented out in ArachisPheno. In issue #3, I speculated that it may be a European legal requirement, and wondered whether we need it.

Among other things, it mentions that the application uses Google Analytics and suggests how the user can prevent it by refusing to set cookies. Another option would be to remove that functionality, if possible.
