uofs-pulse-binfo / rawphenotypes
A Tripal module for storing raw phenotypic data. Specifically meant to help researchers contribute raw data, visualize summaries and download for further analysis.
When downloading a file I get:
2018-04-23 11:22:07: Calling: rawpheno_trpdownload_generate_file(Array)
Generating CSV File: /var/www/dev/fresh/sites/default/files/tripal/tripal_downloads/rawpheno_csv2018Apr23_1524504124.csv
0% complete...
0.10288065843621% complete...
Job execution failed: SQLSTATE[22P02]: Invalid text representation: 7 ERROR: invalid input syntax for [error]
integer: "0.10288065843621"
LINE 1: UPDATE tripal_jobs SET progress='0.10288065843621' WHERE job...
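The error above suggests that the progress column of tripal_jobs only accepts integers, while the job reported a fractional percentage. A minimal sketch of truncating the value before the update, written in Python for illustration only (the module itself is PHP). The function name percent_complete is hypothetical and not part of Tripal or rawphenotypes.

```python
def percent_complete(rows_done: int, total_rows: int) -> int:
    """Return progress as a whole-number percentage between 0 and 100."""
    if total_rows <= 0:
        # Nothing to process, so report no progress rather than dividing by zero.
        return 0
    percent = (rows_done / total_rows) * 100
    # Truncate to an integer so the value fits an integer column, then clamp.
    return max(0, min(100, int(percent)))
```

With this kind of truncation, a very small fraction such as the one in the log would be reported as 0 instead of failing the query.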
Create an API function that retrieves the support email. That API function in rawphenotypes would retrieve a default email and then call the alter function. KP nodes would then implement the alter function and supply its own email.
Trim leading and trailing spaces in data before saving to the database, especially for plant prop headers such as Plot, Entry and Rep.
When installing the module I see the following errors:
$ drush en rawpheno
The following extensions will be enabled: rawpheno
Do you really want to continue? (y/n): y
CRITICAL (RAW PHENOTYPES): Chado/Tripal failed to insert cvterm (traits)
[site http://default] [TRIPAL CRITICAL] [RAW PHENOTYPES] Chado/Tripal failed to insert cvterm (traits)
CRITICAL (RAW PHENOTYPES): Chado/Tripal failed to insert cvterm (traits)
[site http://default] [TRIPAL CRITICAL] [RAW PHENOTYPES] Chado/Tripal failed to insert cvterm (traits)
CRITICAL (RAW PHENOTYPES): Chado/Tripal failed to insert cvterm (traits)
[site http://default] [TRIPAL CRITICAL] [RAW PHENOTYPES] Chado/Tripal failed to insert cvterm (traits)
CRITICAL (RAW PHENOTYPES): Chado/Tripal failed to insert cvterm (traits)
[site http://default] [TRIPAL CRITICAL] [RAW PHENOTYPES] Chado/Tripal failed to insert cvterm (traits)
CRITICAL (RAW PHENOTYPES): Chado/Tripal failed to insert cvterm (traits)
[site http://default] [TRIPAL CRITICAL] [RAW PHENOTYPES] Chado/Tripal failed to insert cvterm (traits)
CRITICAL (RAW PHENOTYPES): Chado/Tripal failed to insert cvterm (traits)
[site http://default] [TRIPAL CRITICAL] [RAW PHENOTYPES] Chado/Tripal failed to insert cvterm (traits)
rawpheno was enabled successfully. [ok]
Custom table, 'rawpheno_rawdata_mview' , created successfully. [status]
Materialized view 'rawpheno_rawdata_summary' created
An error occurred on KnowPulse at "Home » Administration » Tripal » Extensions » Manage Projects" when I tried to add a column header to a project (Lentil Diversity Panel Biomass or LR-11 Flowering Time).
The page showed "Error: The website encountered an unexpected error. Please try again later." after I filled in the fields and submitted.
The other two functions on this page, "ADD EXISTING COLUMN HEADER" and "ADD USER", work fine.
Add an option to include environment data. The idea is that, per project + location, environment data (when available) will be archived (zipped) together with the raw phenotypic data generated by this form.
A separate tab for environment data lists all files and allows the admin to add more files for a project and location. The year is also requested; it becomes part of the filename and distinguishes the environment data file of one year from another.
To implement: update the generate_file() function to fetch the environment data file for the selected project + location combination and add it to the archive. To relate environment data files to a project + location, a custom table with the following fields would be necessary:
environment_data_id (serial) primary key, project_id fk, location (varchar), year (varchar), rank/sequence/version (varchar) and file_id fk. The rank/sequence/version field is a series number for each file, for cases where there are 2 or more environment data files for a given project + location and year.
We have requested "Days to" for many of our AGILE Phenotypes. However, some data collectors are still recording dates. This is going to cause issues with the validator. Kirstin has requested we think about automatically converting a date to "days to", with the argument that this would be less error prone than the data collectors converting them all.
However, we are concerned because Excel does a lot of auto correcting of dates that is not only hard to predict but can also cause data collection errors. Do we really want to support this?
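If the conversion were supported, the arithmetic itself would be simple. The sketch below, in Python for illustration, assumes both dates have already been parsed reliably, which is exactly the part the Excel concern is about and which it does not attempt. The name days_to and the use of planting date as the reference point are assumptions.

```python
from datetime import date


def days_to(planting: date, observed: date) -> int:
    """Number of days from the planting date to the observed date."""
    delta = (observed - planting).days
    if delta < 0:
        # An observation before planting most likely means a mis-parsed date.
        raise ValueError("observed date is earlier than planting date")
    return delta
```

Rejecting negative results gives the validator at least one cheap check against dates that a spreadsheet has silently reinterpreted.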
Accept all variations (case-insensitive) of not applicable (NA, N/A and N.A.) in Plot, Replicate, Location and other phenotypes.
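One way to recognize these variations is a single case-insensitive pattern. This is a minimal Python sketch for illustration. The helper name is_not_applicable is hypothetical, and ignoring surrounding spaces is an assumption.

```python
import re

# Matches exactly NA, N/A or N.A. in any letter case, ignoring surrounding spaces.
NOT_APPLICABLE = re.compile(r"^\s*(?:na|n/a|n\.a\.)\s*$", re.IGNORECASE)


def is_not_applicable(value: str) -> bool:
    """Return True when the cell holds one of the accepted 'not applicable' forms."""
    return bool(NOT_APPLICABLE.match(value))
```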
Expected Behaviour: Each unique combination of plot, germplasm, year, rep and location should have one record in the pheno_plant table.
Current Behaviour: There is one entry in the pheno_plant table per row/file combination.
There are two researchers working on the same field trial (the same set of plots). One is taking data for traits 1-5 and the other is taking data for traits 6-10. This data is collected in two files (one per researcher) and uploaded independently.
Expected: On download the supervisor expects to have a single row for a given plot with data for traits 1-10. This means the underlying data should be attached to a single pheno_plant record.
Current: The download file has two rows for a given plot. The first has data for traits 1-5 with empty data for 6-10. The second has data for 6-10 with empty data for 1-5.
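The expected behaviour can be sketched as a merge keyed on the unique combination described above. This Python example is illustrative only. The field names in KEY_FIELDS and the function merge_by_plot are assumptions, not the module's schema.

```python
KEY_FIELDS = ("plot", "germplasm", "year", "rep", "location")


def merge_by_plot(rows):
    """Combine rows from separate uploads into one record per unique plot key."""
    merged = {}
    for row in rows:
        key = tuple(row[field] for field in KEY_FIELDS)
        # Reuse the existing record for this key, or start one with the key fields.
        record = merged.setdefault(key, {field: row[field] for field in KEY_FIELDS})
        for name, value in row.items():
            # Copy only trait columns that actually hold a value.
            # If two uploads carry the same trait, the later one wins here.
            if name not in KEY_FIELDS and value not in (None, ""):
                record[name] = value
    return list(merged.values())
```

With two uploads covering different traits for the same plot, this yields a single row holding all traits rather than two half-empty rows.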
When reviewing Pull request #32, I realized that allowing column headers to omit the unit in the format (unit) makes it difficult to validate that the unit actually makes sense. For example, cm = integer, date actually reflects a date, and so on. Sometimes this has resulted in an error during step 3 as in #32, but this is not always the case. Regardless, we have discussed the issue of whether or not we want to allow this kind of flexibility in the first place. Concerns that arose were:
We propose the following solution to address all 3 concerns. This will occur during step 3:
Check if the trait is a newly-defined trait. If yes:
    Validate the unit. If validation passes:
        Save values
    Else if validation fails:
        Ignore values, but send an email to the administrator detailing the problem trait
Else if not a new trait:
    Save values
Thus, the admin can resolve the issue by confirming the unit manually (or asking a local expert), or by contacting the original phenotyper for clarification (as the notification will be immediate), while the remainder of the data still gets saved.
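The proposed step 3 flow can be sketched as follows, in Python for illustration only. The callables is_new_trait, unit_is_valid, save and notify_admin are hypothetical placeholders for whatever the module would actually use.

```python
def process_trait(trait, values, is_new_trait, unit_is_valid, save, notify_admin):
    """Apply the proposed step 3 decision to one trait column."""
    if is_new_trait(trait):
        if unit_is_valid(trait, values):
            save(trait, values)
            return "saved"
        # Unit check failed: skip these values but tell the administrator.
        notify_admin(trait)
        return "ignored"
    # Pre-existing traits are saved without the extra unit check.
    save(trait, values)
    return "saved"
```

Because the decision is made per trait, a failed unit check on one column does not stop the other columns from being saved.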
To be made Tripal v3 compatible this module simply needs the dependency in the .info file changed from tripal_core to tripal.
This is due to the main change between Tripal v2 and Tripal v3 being Nodes => Entities and this module does not interact with Tripal Nodes.
It has been pointed out that raw phenotypes data collectors struggle to locate links when working with the module. To address this issue, add or relocate links relevant to the rawphenotypes module to a section of KP where they can be easily seen and accessed.
The Location column header stores the location of a field trial. The module does not have a uniform way of encoding the value in this column, so in one project it shows only the country and in another it shows the region/city plus the country. This issue will implement a standard format for location by wrapping the text onto two rows: ideally the first line shows the country and the second line shows the region/city. For example, Saskatoon, Canada becomes
CANADA
Saskatoon
and Cordoba, Spain becomes
SPAIN
Cordoba
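A minimal sketch of the two-line format, in Python for illustration. It assumes the stored value is written as "region/city, country" with the country last, and the function name format_location is hypothetical.

```python
def format_location(location: str) -> str:
    """Put the upper-cased country on the first line and the region/city on the second."""
    parts = [part.strip() for part in location.split(",") if part.strip()]
    if not parts:
        return ""
    country = parts[-1].upper()
    # Everything before the last comma is treated as the region/city.
    region = ", ".join(parts[:-1])
    return f"{country}\n{region}" if region else country
```

A value that holds only a country is returned as a single upper-cased line.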
Allow heatmap elements and select fields in the data summary page to be data download filter options.
Create filter by year option in data download page.
Create filter by RIL option in data download page.
Currently the Backup fails to save the file when Measurements tab is renamed. This is not okay as the file should always be saved during backup, regardless of validation failing.
Reps in lodging trait triggers validation error - trait not properly formatted.
When a user adds space between words in the column header for any trait (essential, optional or new), the trait should still be recognized. For example, Planting Date written with extra space between the words should still match Planting Date. This should match for all column types throughout all stages: stage 1 validation, new trait detection and when loading the data in stage 3.
There is a partial fix for this in 40c2090#diff-b6f1b3636044514f9512c37ba5031205R938 but it is still showing errors.
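One way to make the comparison tolerant of spacing is to normalize both headers before comparing them. This is a Python sketch for illustration only. It assumes "added space" means extra whitespace between or around words, and the function names are hypothetical.

```python
import re


def normalize_header(header: str) -> str:
    """Collapse every run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", header).strip()


def headers_match(uploaded: str, expected: str) -> bool:
    """Compare two column headers while ignoring differences in spacing."""
    return normalize_header(uploaded) == normalize_header(expected)
```

Using the same normalization in validation, new trait detection and loading would keep the three stages consistent with one another.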
I'm experiencing a WSOD on the download page (phenotypes/raw/download) of my Tripal2 KnowPulse clone. It appears to be related to an empty location being passed to rawpheno_download_load_traits and fed directly into the SQL query. This results in the following PDO Exception:
The trait barchart currently says the y-axis is the "number of germplasm" when what is actually represented is the "number of plots". Furthermore, it says the x-axis is the average, which is misleading. It only averages if there is more than one measurement for a unique trait/plot combination, which is highly unlikely. It should be "Average Observed Measurements per Plot".
The current summary chart shows location as the name of the country (e.g. India, Spain), which is vague and might cause confusion when there are multiple trials in the same country.
Suggest a naming format that includes the specific town/city and the country.
For example: Sutherland Saskatoon, Canada
Central Ferry Washington, USA
Fill Location in advance when downloading data collection spreadsheet file.
Support the use of GPS coordinates.
A non-Microsoft Excel spreadsheet (e.g. from LibreOffice) fails validation. Originally reported in #31 by @Jiu9Shen.
From @carolyncaron
@Jiu9Shen confirmed that your test spreadsheets prompt an error because they have no content.
However, other test cases that contain data did not prompt errors during validation when expected. :-(
I think this is a difficult problem for you to debug without easy access to Linux or LibreOffice. And, since it is not urgent, I suggest we create a separate issue for this bug and ask @Jiu9Shen to make an attempt at it once we are upgraded to Tripal 3.
The function uses a comma to separate the selected location(s). When a location itself contains a comma (for example to include city and country information), the function interprets it as multiple unrelated values when it should be treated as one.
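The problem and one possible fix can be sketched in Python for illustration. Here the selected locations are serialized as a JSON list instead of a comma-joined string. That choice is an assumption, not the module's current behaviour, and the function names are hypothetical.

```python
import json


def encode_locations(locations):
    """Serialize the selected locations so commas inside a value stay intact."""
    return json.dumps(list(locations))


def decode_locations(encoded):
    """Recover the original list of locations."""
    return json.loads(encoded)


selected = ["Saskatoon, Canada", "Cordoba, Spain"]
# Joining and splitting on commas breaks each location into two pieces.
naive = ",".join(selected).split(",")
# The JSON round trip returns the two original values unchanged.
restored = decode_locations(encode_locations(selected))
```

Any separator that cannot occur inside a location would work equally well; JSON simply avoids having to pick one.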
This bug was found while testing PR #58. Pre-existing traits will not be detected by the system if a non-breaking space is present at the beginning or end of the column header for those traits.
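A non-breaking space (U+00A0) is not among the characters PHP's trim() removes by default, which may be why these headers slip through. The idea of stripping it explicitly is sketched below in Python for illustration. The function name strip_header is hypothetical.

```python
NBSP = "\u00a0"


def strip_header(header: str) -> str:
    """Remove ordinary whitespace and non-breaking spaces from both ends only."""
    return header.strip(" \t\r\n" + NBSP)
```

Characters inside the header are left untouched, so only the leading and trailing positions described in the bug are affected.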
Currently, the module allows multiple values for a single phenotypic observation in the cases where multiple phenotypers may be uploading for the same project, and it does this by appending new values separated by a comma.
Derek suggests that commas, especially where numerical values occur, can be confusing for users who download the data down the road, since other parts of the world use commas in place of decimal points. For example, 1st value = 2,3 and 2nd value = 2,6 would result in 2,3,2,6. Additionally, comments may also become hard to understand or separate.
He suggests we use semicolons instead, which are R-friendly as well as human-readable. Thus, the previous example would look like 2,3;2,6.
Any thoughts?
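The suggested separator can be sketched in a few lines of Python, for illustration only. The function name append_value is hypothetical.

```python
def append_value(existing: str, new_value: str, sep: str = ";") -> str:
    """Append a new observation to the stored value using the given separator."""
    if not existing:
        return new_value
    return f"{existing}{sep}{new_value}"


combined = append_value("2,3", "2,6")
# Splitting on the semicolon recovers both values, decimal commas included.
values = combined.split(";")
```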
Speed up histogram in raw data summary page.
We've received feedback that the way projects are currently sorted in the dropdown appears "random" and is not intuitive. Reynold already addressed this in PR #65 but it might be buggy - instead of trying to fix it we think we should opt for the classic alphabetical sort as it looks to be the most intuitive option.
Currently (before PR #65) the default is set to "All Projects". PR #65 now sets the default to the project that has data uploaded most recently. We want to change this to "Select a Project" to force the user to choose when there are 2 or more. Otherwise, if there is only a single project it will be selected by default. :-)
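The proposed dropdown behaviour can be sketched as below, in Python for illustration. The function name project_options and the case-insensitive sort are assumptions.

```python
PLACEHOLDER = "Select a Project"


def project_options(projects):
    """Return (options, default) for the project dropdown."""
    names = sorted(projects, key=str.casefold)
    if len(names) == 1:
        # A single project is selected by default.
        return names, names[0]
    # With two or more projects (or none), the user has to make a choice.
    return [PLACEHOLDER] + names, PLACEHOLDER
```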
When I download data (I have tried every combination of select all, select one, etc.) on the KnowPulse production site, once the file is generated and I click the link Chrome says "Failed - No file".
This worked on our development site... I double checked permissions of the tripal_downloads file directory so it's not that. I also checked and the file is not there. The only error in the Drupal log is that the file isn't there and the apache error log is silent.
Show environment data option by default and enable or disable option based on filter combination selected in download page.
Remove All Projects option in select project select box and
Sort project options based on the recent data uploaded and/or by planting year.
Work on this has been started by @reynoldtan on branch 9.x-2.x. We have a PR open (#83) for testing of the current upgrade and would appreciate feedback on this issue or the PR if you are interested in using this module for Tripal 4.
Update API calls that use tripal_get_cvterm() and tripal_get_cv() (the implementation returns an empty result in Tripal 3).
We need to check that when the stock_id for a given row is looked up, we restrict the query to the organism. Currently, if you try to load data for any of the breeding program crosses, the system returns a validation error telling you they are not unique when in fact they are unique within an organism.
Each phenotyping project should only collect data for a single organism. Therefore, we can solve this bug by saving the organism for a project in the projectprop table and then looking it up when validating or loading a raw phenotype dataset. We should add form elements to the admin manage projects interface to allow setting of the organism.
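The effect of restricting the lookup to the project's organism can be sketched with in-memory records, in Python for illustration. The record layout loosely mirrors the chado.stock columns, and the function name find_stock_ids and the sample values are hypothetical.

```python
def find_stock_ids(stocks, name, organism_id=None):
    """Return stock_ids matching a name, optionally restricted to one organism."""
    return [
        stock["stock_id"]
        for stock in stocks
        if stock["name"] == name
        and (organism_id is None or stock["organism_id"] == organism_id)
    ]


# Two crosses share a name but belong to different organisms.
stocks = [
    {"stock_id": 1, "name": "Cross-A", "organism_id": 4},
    {"stock_id": 2, "name": "Cross-A", "organism_id": 7},
]
unrestricted = find_stock_ids(stocks, "Cross-A")
restricted = find_stock_ids(stocks, "Cross-A", organism_id=4)
```

Without the organism filter the name looks ambiguous; with it, exactly one stock is returned.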
Rawdata summary page takes a while ( > 2 mins) to load with under 500K phenotypes.
An error occurred on KnowPulse at "Home » Administration » Tripal » Extensions » Manage Projects" when I tried to delete column headers (R7 Traits: Canopy Height (1st; cm) and R7 Traits: Canopy Height (2nd; cm)) in one project (Lentil Diversity Panel Biomass).
After I clicked the delete button, a KnowPulse notice appeared and asked: "Are you sure to delete this column header?". Choosing yes leads to an error page. The column headers I tried to delete still existed after several tries. However, they disappeared after several minutes without any further action.
The headers I tried to delete are EXISTING COLUMN HEADERS.
When uploading a test (file: Test-2-NoErrors.xlsx):
2018-04-23 13:19:14: Calling: rawpheno_load_spreadsheet(63, a:0:{}, 1548, a:4:{i:0;s:5:"Entry";i:1;s:8:"Location";i:2;s:4:"Plot";i:3;s:3:"Rep";})
0% complete...
WD rawpheno: Uploading Phenoypic Data: Germplasm doesn't exist (name=Nugget; row=2) [error]
CRITICAL (RAWPHENO): Uploading Phenoypic Data: Germplasm doesn't exist (name=Nugget; row=2)
[site http://default] [TRIPAL CRITICAL] [RAWPHENO] Uploading Phenoypic Data: Germplasm doesn't exist (name=Nugget; row=2)
WD rawpheno: [CODE 103] Failed to load phenoypic data (job 7895) [error]
CRITICAL (RAWPHENO): [CODE 103] Failed to load phenoypic data (job 7895)
[site http://default] [TRIPAL CRITICAL] [RAWPHENO] [CODE 103] Failed to load phenoypic data (job 7895)
Drush command terminated abnormally due to an unrecoverable error. [error]
NOTE: The germplasm does exist:
kp3_fresh=# SELECT * FROM chado.stock WHERE name~'ugget';
 stock_id | dbxref_id | organism_id |  name  | uniquename  | description | type_id | is_obsolete
----------+-----------+-------------+--------+-------------+-------------+---------+-------------
     8110 |   1910939 |           4 | Nugget | KP:GERM8110 |             |    3683 | f
(1 row)
Unsure what is causing this, but reports of this error started when the host was moved to HTTPS.
Related Drupal community discussion:
https://www.drupal.org/node/1232416
I found this bug during the Tripal 3 upgrade but it is not related to the Tripal version.
Symptom: The summary barchart displayed at phenotypes/raw when a trait is chosen, says there is no data when there is. This was experienced consistently for all traits on a KP clone site but not on the production site.
On the instructions page the icons for collect/backup/submit data are overlaying the vertical tab pane instead of underneath.
See http://knowpulse.usask.ca/portal/phenotypes/raw/instructions (image attached).
This will handle all column headers. There is a partial fix for a similar issue, but it only covers column headers that are added as new column headers.
The barchart provided on phenotypes/raw for a specific trait utilizes raw data and as such should never be used in publication. This should be made more clear to researchers by including a disclaimer on the chart.
The following chart uses raw data and, as such, should never be used in publication. It is meant to give you a quick visual and to identify problems such as outliers to aid you in your analysis.
Some of our plots are segregating for specific phenotypes. For example, some plots might be segregating for flower colour (e.g. white/purple) or days to flower (e.g. 42/59 days). In these cases, some data collectors will record both phenotypes they observed (as shown in brackets above). Kirstin feels the loader should handle this.
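If the loader were to handle such entries, one simple approach would be to split the cell on the slash, as in the Python sketch below (for illustration only). The function name split_observation is hypothetical. Note that a slash also appears in N/A and in some date formats, so only the N/A case is guarded here and dates would need separate handling.

```python
def split_observation(value: str):
    """Split a cell such as 'white/purple' into its individual observations."""
    cleaned = value.strip()
    if cleaned.lower() == "n/a":
        # Keep the 'not applicable' marker as a single value.
        return [cleaned]
    return [part.strip() for part in cleaned.split("/") if part.strip()]
```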
See c46f04b.
There is a concern with the increase in max execution time needed for validation of long files. This causes a long ajax upload spinner with no progress reported to the user. Furthermore, since it is dependent upon the size of the file, at some point the files will likely reach a size that breaks this.
One option is to move it into a Tripal Job. This pulls validation out of the page load and allows us to provide progress reporting to the user.
One concern (@Reynold):
This step can be repeated many times, and having to register a job, wait for the job queue and execute it as a Tripal job each time might cause unnecessary wait time for the user and might not give as quick a response as what we currently have.
There has been some confusion between the upload and backup pages since they look so similar. As such, it might be prudent to add a warning to the upload page indicating that this should only be done once per dataset once data collection has been completed. It would be helpful to point them to the backups page.
We need to update the README to demonstrate the following functionality:
Update the wiki to show:
Is there anything I missed?
Changes: cb18bae...master