
rawphenotypes's People

Contributors

carolyncaron, dependabot[bot], laceysanderson, reynoldtan, tinygoprogs


rawphenotypes's Issues

Download: invalid input syntax for tripal_job progress

When downloading a file I get:

2018-04-23 11:22:07: Calling: rawpheno_trpdownload_generate_file(Array)
Generating CSV File: /var/www/dev/fresh/sites/default/files/tripal/tripal_downloads/rawpheno_csv2018Apr23_1524504124.csv
0% complete...
0.10288065843621% complete...
Job execution failed: SQLSTATE[22P02]: Invalid text representation: 7 ERROR: invalid input syntax for [error]
integer: "0.10288065843621"
LINE 1: UPDATE tripal_jobs SET progress='0.10288065843621' WHERE job...
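The log above suggests that the progress column of tripal_jobs only accepts integers, so a fractional percentage is rejected. A minimal sketch of the idea, in Python rather than the module's own code: compute the percentage as a whole number before writing it. The function name and the row counts are hypothetical.

```python
def progress_percent(rows_done: int, rows_total: int) -> int:
    """Return job progress as a whole-number percentage between 0 and 100."""
    if rows_total <= 0:
        # Nothing to process: report 0 rather than dividing by zero.
        return 0
    percent = rows_done * 100 // rows_total  # integer division, never fractional
    return max(0, min(100, percent))
```

With 972 rows, the first row gives 0 instead of 0.10288065843621, which an integer column can store.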

Spaces in data

Trim leading and trailing spaces in data before saving to the database, especially for plant property headers such as Plot, Entry and Rep.
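A rough sketch of the trimming step, assuming each spreadsheet row has already been read into a dictionary keyed by column header. The function names are hypothetical and this is Python, not the module's code.

```python
def clean_cell(value):
    """Strip leading and trailing whitespace from text; leave other types alone."""
    return value.strip() if isinstance(value, str) else value


def clean_row(row: dict) -> dict:
    """Trim both the column headers and the cell values of one row."""
    return {clean_cell(header): clean_cell(value) for header, value in row.items()}
```

For example, `clean_row({" Plot ": " 12 ", "Rep": 1})` gives `{"Plot": "12", "Rep": 1}`.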

Errors when the module is installed.

When installing the module I see the following errors:

$ drush en rawpheno
The following extensions will be enabled: rawpheno
Do you really want to continue? (y/n): y
CRITICAL (RAW PHENOTYPES): Chado/Tripal failed to insert cvterm (traits)
[site http://default] [TRIPAL CRITICAL] [RAW PHENOTYPES] Chado/Tripal failed to insert cvterm (traits)
CRITICAL (RAW PHENOTYPES): Chado/Tripal failed to insert cvterm (traits)
[site http://default] [TRIPAL CRITICAL] [RAW PHENOTYPES] Chado/Tripal failed to insert cvterm (traits)
CRITICAL (RAW PHENOTYPES): Chado/Tripal failed to insert cvterm (traits)
[site http://default] [TRIPAL CRITICAL] [RAW PHENOTYPES] Chado/Tripal failed to insert cvterm (traits)
CRITICAL (RAW PHENOTYPES): Chado/Tripal failed to insert cvterm (traits)
[site http://default] [TRIPAL CRITICAL] [RAW PHENOTYPES] Chado/Tripal failed to insert cvterm (traits)
CRITICAL (RAW PHENOTYPES): Chado/Tripal failed to insert cvterm (traits)
[site http://default] [TRIPAL CRITICAL] [RAW PHENOTYPES] Chado/Tripal failed to insert cvterm (traits)
CRITICAL (RAW PHENOTYPES): Chado/Tripal failed to insert cvterm (traits)
[site http://default] [TRIPAL CRITICAL] [RAW PHENOTYPES] Chado/Tripal failed to insert cvterm (traits)
rawpheno was enabled successfully.                                                                                                                                                                                                   [ok]
Custom table, 'rawpheno_rawdata_mview' ,  created successfully.                                                                                                                                                                      [status]
Materialized view 'rawpheno_rawdata_summary' created

Fail to add column header in manage project assets

An error occurred on KnowPulse at "Home » Administration » Tripal » Extensions » Manage Projects" when I tried to add a column header to one project (Lentil Diversity Panel Biomass or LR-11 Flowering Time).

The webpage showed "Error: The website encountered an unexpected error. Please try again later." after I filled in blanks and submitted.

The other two functions on this page, "ADD EXISTING COLUMN HEADER" and "ADD USER", work fine.

Issue #24 - Allow user to download environment data in download.

Add an option to include environment data. The idea is that, per project + location, the environment data (when available) will be archived (zipped) together with the raw phenotypic data generated by this form.

A separate tab for environment data lists all files and allows an admin to add more files for a project and location. An additional field, year, is requested; it becomes part of the filename and distinguishes one year's environment data file from another's.

To implement: update the generate_file() function to fetch the environment data file for the selected project + location combination and add it to the archive. To relate an environment data file to a project + location, a custom table containing the following fields would be necessary.

environment_data_id (serial) primary key, project_id fk, location (varchar), year (varchar), rank/sequence/version (varchar) and file_id fk. The rank/sequence/version field is a series number for each file, in case there are 2 or more environment data files for a given project + location and year.
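The proposed table and lookup could be sketched as follows, using Python's sqlite3 as a stand-in for the real database. The table name environment_data, the column name sequence, the uniqueness constraint and the function name are all assumptions for illustration; only the field list comes from the proposal above.

```python
import sqlite3

# Stand-in schema for the proposed custom table (SQLite syntax, not Chado/Drupal).
SCHEMA = """
CREATE TABLE environment_data (
    environment_data_id INTEGER PRIMARY KEY AUTOINCREMENT,
    project_id INTEGER NOT NULL,
    location TEXT NOT NULL,
    year TEXT NOT NULL,
    sequence TEXT NOT NULL,
    file_id INTEGER NOT NULL,
    -- Assumption: one file per project + location + year + sequence.
    UNIQUE (project_id, location, year, sequence)
)
"""


def environment_file_ids(conn, project_id, location):
    """Return the file ids to archive for one project + location combination."""
    rows = conn.execute(
        "SELECT file_id FROM environment_data "
        "WHERE project_id = ? AND location = ? "
        "ORDER BY year, sequence",
        (project_id, location),
    ).fetchall()
    return [row[0] for row in rows]
```

The download step would then add every file returned by `environment_file_ids()` to the archive next to the generated CSV.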

Dates instead of Days to

We have requested "Days to" for many of our AGILE Phenotypes. However, some data collectors are still recording dates. This is going to cause issues with the validator. Kirstin has requested we think about automatically converting a date to "days to", with the argument that this would be less error prone than the data collector converting them all.

However, we are concerned because Excel does a lot of auto correcting of dates that is not only hard to predict but can also cause data collection errors. Do we really want to support this?
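If the conversion were supported, one cautious approach is to accept only unambiguous ISO dates (YYYY-MM-DD) and reject everything else. The sketch below assumes "days to" is counted from the planting date; that reference point and the function name are assumptions, not something the issue specifies.

```python
from datetime import date


def days_to(event: str, planting: str) -> int:
    """Convert an ISO event date into days elapsed since the planting date.

    Both arguments must be YYYY-MM-DD strings; anything else raises ValueError,
    so spreadsheet-mangled dates are rejected instead of silently guessed.
    """
    delta = date.fromisoformat(event) - date.fromisoformat(planting)
    if delta.days < 0:
        raise ValueError("event date is earlier than the planting date")
    return delta.days
```

Rejecting non-ISO input keeps the validator strict, which addresses part of the concern about dates that Excel has auto-corrected.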

plant/plot records only inserted (not re-used)

Expected Behaviour: Each unique combination of plot, germplasm, year, rep and location should have one record in the pheno_plant table.

Current Behaviour: There is one entry in the pheno_plant table per row/file combination.

Example

There are two researchers working on the same field trial (the same set of plots). One is taking data for traits 1-5 and the other is taking data for traits 6-10. This data is collected in two files (one per researcher) and uploaded independently.

Expected: On download the supervisor expects to have a single row for a given plot with data for traits 1-10. This means the underlying data should be attached to a single pheno_plant record.

Current: The download file has two rows for a given plot. The first has data for traits 1-5 with empty data for 6-10. The second has data for 6-10 with empty data for 1-5.
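The expected behaviour amounts to a get-or-create lookup keyed on the five fields named above. A small in-memory sketch of that idea follows; the class and method names are hypothetical and a dictionary stands in for the pheno_plant table.

```python
class PlantRegistry:
    """In-memory stand-in for pheno_plant: one record per unique plot key."""

    def __init__(self):
        self._ids = {}           # (plot, germplasm, year, rep, location) -> plant id
        self.measurements = {}   # plant id -> {trait: value}

    def get_or_create(self, plot, germplasm, year, rep, location) -> int:
        key = (plot, germplasm, year, rep, location)
        if key not in self._ids:
            # Only insert a new record when this combination has not been seen.
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]

    def record(self, key_fields, trait, value) -> int:
        """Attach a measurement to the (possibly re-used) plant record."""
        plant_id = self.get_or_create(*key_fields)
        self.measurements.setdefault(plant_id, {})[trait] = value
        return plant_id
```

Two uploads for the same plot then share one plant id, so the download can emit a single row holding traits 1-10.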

Validation does not occur in Step 1 for newly added traits which do not specify unit

When reviewing Pull request #32, I realized that allowing column headers to omit the unit in the format (unit) makes it difficult to validate that the unit actually makes sense. For example, cm = integer, date actually reflects a date, and so on. Sometimes this has resulted in an error during step 3 as in #32, but this is not always the case. Regardless, we have discussed the issue of whether or not we want to allow this kind of flexibility in the first place. Concerns that arose were:

  • Implementing validation at the second stage, or redesigning the entire flow of the module will take up too much time
  • "Yelling" at users who are uploading bonus traits does not feel right, and may discourage further use of the module or encourage abandonment at the second stage
  • What happens if the same "bonus" trait is uploaded twice? It will then be picked up by validation in step 1. Now what happens if it was fine the first time but not the second time? Not only can this discourage the user, but we shudder at the thought of the data being fixed only after a first attempt was successfully made, resulting in heterogeneous values for this trait.

We propose the following solution to address all 3 concerns. This will occur during step 3:

Check if the trait is a newly-defined trait. If yes:
   Validate the unit. If validation passes:
      Save values
   Else if validation fails:
      Ignore values, but send an email to the administrator detailing the problem trait
Else if not a new trait:
   Save values

Thus, the admin can resolve the problem by confirming the unit manually (or asking a local expert), or even by contacting the original phenotyper for clarification (as the notification will be immediate), while the remainder of the data still gets saved.
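The proposed step 3 flow can be sketched in Python as below. The unit checks, the treatment of an unknown unit as a failure, and the save and notify_admin callbacks are all assumptions made for illustration.

```python
from datetime import date


def _is_number(value) -> bool:
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def _is_date(value) -> bool:
    try:
        date.fromisoformat(str(value))
        return True
    except ValueError:
        return False


# Hypothetical mapping from unit to a check that values make sense for it.
UNIT_CHECKS = {"cm": _is_number, "date": _is_date}


def save_trait_values(trait, unit, values, is_new_trait, save, notify_admin) -> bool:
    """Save values unless a newly defined trait fails unit validation."""
    if is_new_trait:
        check = UNIT_CHECKS.get(unit)
        # Assumption: a unit with no known check counts as a failed validation.
        if check is None or not all(check(v) for v in values):
            notify_admin(f"Unit validation failed for new trait '{trait}' ({unit})")
            return False  # ignore these values; the rest of the file still loads
    save(trait, values)
    return True
```

Because the function returns instead of raising, the loader can continue with the remaining traits in the same spreadsheet.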

Make Tripal v3 compatible

To be made Tripal v3 compatible this module simply needs the dependency in the .info file changed from tripal_core to tripal.

This is because the main change between Tripal v2 and Tripal v3 is Nodes => Entities, and this module does not interact with Tripal Nodes.

Make rawphenotypes pages easily accessible to data collectors.

It has been pointed out that raw phenotypes data collectors struggle to locate links when working with the module. To address this issue, add or relocate links relevant to the rawphenotypes module to a section of KP where they can be easily seen and accessed.

Summary page update - wrap text in location

The Location column header stores the location of a field trial. The module does not have a uniform way of encoding the value in this column, so in one project it shows only the country and in another it shows the region/city plus the country. This issue will implement a standard format for location by wrapping the text onto two rows: ideally the first line shows the country and the second line shows the region/city. For example, Saskatoon, Canada:

CANADA
Saskatoon

or Cordoba, Spain

SPAIN
Cordoba
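A minimal sketch of the two-line format, assuming locations are stored as "region/city, country" with the country after the last comma. The function name is hypothetical.

```python
def format_location(location: str) -> str:
    """Render 'Saskatoon, Canada' as the country in capitals above the region/city."""
    region, comma, country = location.rpartition(",")
    if not comma:
        # Only a country was recorded; there is no second line to show.
        return location.strip().upper()
    return f"{country.strip().upper()}\n{region.strip()}"
```

A value that holds only a country, such as India, stays on one line as INDIA.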

Download Data Enhancements.

  1. Allow heatmap elements and select fields in the data summary page to be data download filter options.

  2. Create filter by year option in data download page.

  3. Create filter by RIL option in data download page.

Whitespace between words in column header --not recognized

When a user adds extra space between words in the column header for any trait (essential, optional or new), the trait should still be recognized. For example, a header with more than one space between "Planting" and "Date" should still match Planting Date. This should hold for all column types throughout all stages: stage 1 validation, new trait detection and loading the data in stage 3.

There is a partial fix for this in 40c2090#diff-b6f1b3636044514f9512c37ba5031205R938 but it is still showing errors.
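One way to make the comparison whitespace-tolerant is to normalize both headers before comparing them. This is a sketch in Python with hypothetical function names, not the partial fix referenced above.

```python
import re


def normalize_header(header: str) -> str:
    """Collapse runs of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", header).strip()


def headers_match(uploaded: str, expected: str) -> bool:
    """Compare two column headers while ignoring differences in spacing."""
    return normalize_header(uploaded) == normalize_header(expected)
```

Applying the same normalization in stage 1, in new trait detection and in stage 3 keeps the three stages consistent with each other.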

WSOD of Download page

I'm experiencing a WSOD on the download page (phenotypes/raw/download) of my Tripal2 KnowPulse clone. It appears to be related to an empty location being passed to rawpheno_download_load_traits and fed directly into the SQL query. This results in the following PDO Exception:
[Screenshot, 2018-02-07 3:06 pm]

Trait summary barchart labels incorrect

The trait barchart currently says the y-axis is the "number of germplasm" when what is actually represented is the "number of plots". Furthermore, it says the x-axis is the average, which is misleading. It only averages if there is more than one measurement for a unique trait/plot combination, which is highly unlikely. It should be "Average Observed Measurements per Plot".

Country name as location

The current summary chart shows location as the name of the country (e.g. India, Spain), which is vague and might cause confusion when there are multiple trials in the same country.

  • Suggest a naming format that includes the specific town/city and the country.
    For example: Sutherland Saskatoon, Canada
    Central Ferry Washington, USA

  • Fill Location in advance when downloading data collection spreadsheet file.

  • Support the use of GPS coordinates.

Non-Microsoft Excel spreadsheet (eg. from LibreOffice) fails validation

Non-Microsoft Excel spreadsheet (eg. from LibreOffice) fails validation originally reported in #31 by @Jiu9Shen.

From @carolyncaron

@Jiu9Shen confirmed that your test spreadsheets prompts an error because of no content.
However, other test cases that contain data did not prompt errors during validation when expected. :-(
I think this is a difficult problem for you to debug without easy access to Linux or LibreOffice. And, since it is not urgent, I suggest we create a separate issue for this bug and ask @Jiu9Shen to make an attempt at it once we are upgraded to Tripal 3.

Should we reconsider separating multiple values for a single phenotype with a comma?

Currently, the module allows multiple values for a single phenotypic observation in the cases where multiple phenotypers may be uploading for the same project, and it does this by appending new values separated by a comma.

Derek suggests that commas, especially where numerical values occur, can be confusing for users who download the data down the road, since other parts of the world use commas in place of decimal points. For example, 1st value = 2,3 and 2nd value = 2,6 would result in 2,3,2,6. Additionally, comments may also become hard to understand or separate.

He suggests we can use semicolons instead, which are R-friendly as well as human-readable. Thus, the previous example would look like 2,3;2,6.
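The suggestion can be illustrated with a short Python sketch; the function name and the choice to trim the new value are assumptions.

```python
def append_value(existing: str, new: str, separator: str = ";") -> str:
    """Append a new observation to the stored value using the given separator."""
    new = new.strip()
    if not existing:
        return new
    return f"{existing}{separator}{new}"
```

With a semicolon, `append_value("2,3", "2,6")` gives `2,3;2,6`, so the two decimal-comma values remain distinguishable.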

Any thoughts?

Sorting of projects in the download page should be alphabetical

We've received feedback that the way projects are currently sorted in the dropdown appears "random" and is not intuitive. Reynold already addressed this in PR #65 but it might be buggy - instead of trying to fix it we think we should opt for the classic alphabetical sort as it looks to be the most intuitive option.

Currently (before PR #65) the default is set to "All Projects". PR #65 now sets the default to the project that has data uploaded most recently. We want to change this to "Select a Project" to force the user to choose when there are 2 or more. Otherwise, if there is only a single project it will be selected by default. :-)
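The behaviour described above could look roughly like this in Python. The function name is hypothetical, and sorting case-insensitively is an assumption.

```python
def project_options(projects):
    """Return (options, default) for the project dropdown.

    Projects are listed alphabetically. With two or more projects the user
    must choose, so a placeholder is the default; a single project is selected
    automatically.
    """
    names = sorted(projects, key=str.casefold)
    if len(names) >= 2:
        placeholder = "Select a Project"
        return [placeholder] + names, placeholder
    default = names[0] if names else None
    return names, default
```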

Download is resulting in "Failed -No file"

When I download data (I have tried every combination of select all, select one, etc.) on the KnowPulse production site, once the file is generated and I click the link Chrome says "Failed - No file".
[Screenshot, 2016-11-29 1:25 pm]

This worked on our development site... I double checked permissions of the tripal_downloads file directory so it's not that. I also checked and the file is not there. The only error in the Drupal log is that the file isn't there and the apache error log is silent.

Upgrade to Tripal 4 + Drupal 9

Work on this has been started by @reynoldtan on branch 9.x-2.x. We have an open PR (#83) for testing the current upgrade and would appreciate feedback on this issue or the PR if you are interested in using this module for Tripal 4.

THIS ISSUE SHOULD NOT BE CLOSED UNTIL

  • Tripal 4 stable is released
  • 9.x-2.x branch is made the default branch
  • all documentation has been updated for Drupal 9 / Tripal 4 / Raw Phenotypes 2.x

Stock Names Look-up should be restricted to an organism.

We need to check that when the stock_id for a given row is looked up, the query is restricted to the organism. Currently, if you try to load data for any of the breeding program crosses, validation tells you they are not unique when in fact they are unique within an organism.

Each phenotyping project should only collect data for a single organism. Therefore, we can solve this bug by saving the organism for a project in the projectprop table and then looking it up when validating or loading a raw phenotype dataset. We should add form elements to the admin manage projects interface to allow setting of the organism.
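A sketch of the organism-restricted lookup, using Python's sqlite3 with a simplified stock table as a stand-in for chado.stock. The function name is hypothetical, and the organism id is assumed to have already been read from the project's properties.

```python
import sqlite3


def find_stock_ids(conn, name, organism_id):
    """Look up stocks by name, restricted to the project's organism."""
    rows = conn.execute(
        "SELECT stock_id FROM stock WHERE name = ? AND organism_id = ?",
        (name, organism_id),
    ).fetchall()
    return [row[0] for row in rows]


def is_unique_germplasm(conn, name, organism_id) -> bool:
    """A name is valid when exactly one stock matches within the organism."""
    return len(find_stock_ids(conn, name, organism_id)) == 1
```

A name shared by stocks of two organisms then validates correctly, because only the project's organism is searched.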

Error in deleting column header in manage project assets

An error occurred on KnowPulse at "Home » Administration » Tripal » Extensions » Manage Projects" when I tried to delete column headers (R7 Traits: Canopy Height (1st; cm) and R7 Traits: Canopy Height (2nd; cm)) in one project (Lentil Diversity Panel Biomass).

After I clicked the delete button on the page, a KnowPulse notice appeared asking: "Are you sure to delete this column header?". Choosing yes leads to an error page. The column headers I tried to delete still existed after several tries. However, they disappeared after several minutes without any further action.

The headers I tried to delete are EXISTING COLUMN HEADERS.

Upload failing in Step3: "Germplasm doesn't exist" when it does

When uploading a test (file: Test-2-NoErrors.xlsx):

  • step1: successfully finds all germplasm giving me a beautiful green checkmark beside all germplasm exist.
  • step2: no new traits
  • step3: when job is run on the command-line it fails with the following output:

2018-04-23 13:19:14: Calling: rawpheno_load_spreadsheet(63, a:0:{}, 1548, a:4:{i:0;s:5:"Entry";i:1;s:8:"Location";i:2;s:4:"Plot";i:3;s:3:"Rep";})
0% complete...
WD rawpheno: Uploading Phenoypic Data: Germplasm doesn't exist (name=Nugget; row=2) [error]
CRITICAL (RAWPHENO): Uploading Phenoypic Data: Germplasm doesn't exist (name=Nugget; row=2)
[site http://default] [TRIPAL CRITICAL] [RAWPHENO] Uploading Phenoypic Data: Germplasm doesn't exist (name=Nugget; row=2)
WD rawpheno: [CODE 103] Failed to load phenoypic data (job 7895) [error]
CRITICAL (RAWPHENO): [CODE 103] Failed to load phenoypic data (job 7895)
[site http://default] [TRIPAL CRITICAL] [RAWPHENO] [CODE 103] Failed to load phenoypic data (job 7895)
Drush command terminated abnormally due to an unrecoverable error. [error]

NOTE: The germplasm does exist:

kp3_fresh=# SELECT * FROM chado.stock WHERE name~'ugget';
 stock_id | dbxref_id | organism_id |  name  | uniquename  | description | type_id | is_obsolete
----------+-----------+-------------+--------+-------------+-------------+---------+-------------
     8110 |   1910939 |           4 | Nugget | KP:GERM8110 |             |    3683 | f
(1 row)

Summary Barchart incorrectly states "No Data"

I found this bug during the Tripal 3 upgrade but it is not related to the Tripal version.

Symptom: The summary barchart displayed at phenotypes/raw says there is no data for a chosen trait when there is. This was experienced consistently for all traits on a KP clone site but not on the production site.

Warn users not to use barchart for publication

The barchart provided on phenotypes/raw for a specific trait uses raw data and, as such, should never be used in publication. This should be made clearer to researchers by including a disclaimer on the chart:

The following chart uses raw data and, as such, should never be used in publication. It is meant to give you a quick visual and to identify problems such as outliers to aid you in your analysis.

Collecting data for plots that are segregating.

Some of our plots are segregating for specific phenotypes. For example, some plots might be segregating for flower colour (e.g. white/purple) or days to flower (e.g. 42/59 days). In these cases, some data collectors will record both phenotypes they observed (as shown in brackets above). Kirstin feels the loader should handle this.

Upload/Backup Validation: long process for in page load.

See c46f04b.

There is a concern with the increase in max execution time needed for validation of long files. This results in a long AJAX upload spinner with no progress reported to the user. Furthermore, since it depends on the size of the file, at some point files will likely become large enough to break this.

One option is to move it into a Tripal Job. This pulls validation out of the page load and allows us to provide progress reporting to the user.

One concern (@Reynold):
This step can be repeated many times, and having to register a job, wait in the job queue and execute it as a Tripal job each time might cause unnecessary waiting for the user and might not give as quick a response as what we currently have.

Warning on Upload page

There has been some confusion between the upload and backup pages since they look so similar. As such, it might be prudent to add a warning to the upload page indicating that this should only be done once per dataset once data collection has been completed. It would be helpful to point them to the backups page.

Documentation needs updating

We need to update the README to demonstrate the following functionality:

  • The dashboard on the front page
  • Reflect the environmental data option in the download screenshot as well as mention this functionality

Update the wiki to show:

  • How to upload environmental data files
  • How to set the email address for email support
  • Tips for using various Drupal hooks to customize the module (examples: ignore column(s), add prefixes/suffixes to stock names)

Is there anything I missed?

Issue #31 - Page redirect, Non-MS Excel spreadsheet, Stage 2 and 3 errors.

  • Admin pages redirect to page not found in:
    1. Create a project
    2. Delete header, user and environment data file
  • Similar trait not suggested in Stage 2 - Describe Trait
  • Non-Microsoft Excel spreadsheet (eg. from LibreOffice) fails validation
  • Extra spaces in column headers fail Stage 3 - Save Spreadsheet

Gap in x axis

[Screenshot, 2018-11-05 10:46 am]

There appears to be a gap along the x-axis of the Trait histogram. I believe this is caused by the extra space I've added to the heatmap to accommodate multi-line location names.
