epnd-fairification's People

Contributors

aiaragomes, brianfred, cvdl-um, jaspersnel, jvsoest, matthijssloep, putssander, shashankmc, varshagouthamchand, weeleon

Forkers

jinzhouyang666

epnd-fairification's Issues

Proper logging

Use Python's logging library for proper logging instead of print statements.
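
A minimal sketch of what this could look like, using only the standard `logging` module. The logger name `epnd_fairification` and the function `upload_file` are hypothetical and only serve to illustrate replacing `print()` calls with logger calls.

```python
import logging

# Hypothetical logger name; in a real module, logging.getLogger(__name__) is common.
logger = logging.getLogger("epnd_fairification")


def configure_logging(level: int = logging.INFO) -> None:
    """Configure the root logger once, at application start-up."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def upload_file(filename: str) -> bool:
    """Hypothetical function showing print() replaced by logger calls."""
    logger.info("Uploading %s", filename)
    if not filename.endswith(".csv"):
        # Previously something like: print("Unexpected file type: " + filename)
        logger.warning("Unexpected file type: %s", filename)
        return False
    logger.debug("Upload of %s finished", filename)
    return True
```

Calling `configure_logging(logging.DEBUG)` once at start-up would then control the verbosity of all modules in one place.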

Upload FIP file directly

Current setup -

  1. Provide URL of SHACL file or
  2. Provide URL of Nanopub #41

Enhancement - upload local SHACL file

Possibility of updating metadata

It should be possible to update metadata. Currently, the only way to change anything is to delete the cohort completely and re-add it.

Metadata templates use title of template

Configure in FIP which predicate (within the Cedar template) is used to describe the title and/or description.

Use this FIP setting to retrieve the title/description, and show it instead of the UUID in the overview of Cedar instances.

Maintaining navigation consistency across pages

  1. A user should be able to navigate between pages easily, but currently multiple steps are required to get to a certain page.

Solution: Keep navigation consistent across all pages.

Docker-compose doesn't pull the latest image for webui

Description:

  1. Although the webui service specifies the "latest" tag, the latest image is not pulled.
  2. The issue persists after running docker system prune and then docker-compose up -d from the root directory.

Fix: The image has two release tags, "latest" and "main". If "main" is used as the tag, the latest release is pulled.

Reason: Unknown. What do you think, @jvsoest?

Mappings under a single graph

The current setup stores the column mapping in a different graph than the cell mapping. It isn't a major issue, but it would be better if everything were under a single graph. Work for the future.

Maintain frontend consistency

From a design perspective, the front end deviates from a "standard" front end, i.e. in the consistency of colors and similar aspects. This needs to change for better usability.

Upload of SHACL fails

Line:

curl -d @../shapeTest/shacl.ttl --header "Content-Type: application/x-turtle" http://localhost:7200/repositories/epnd_dummy/statements?context=%3Chttp://shacl.local/%3E

For some reason, the cURL upload fails. Possible causes:

  • Comments in the SHACL file (marked with #), which are parsed but then cause errors
  • Something else

We need to test which triples are uploaded and where the upload starts to fail. Manual upload into GraphDB does work (as a temporary workaround).
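
One possible cause worth testing (not confirmed for this issue): curl's `-d @file` option is documented to strip carriage returns and newlines from the file, whereas `--data-binary @file` sends it unchanged. In Turtle, a `#` comment runs to the end of the line, so once newlines are gone everything after the first comment is hidden from the parser. The sketch below only simulates that effect locally; the Turtle snippet and the helper names are hypothetical.

```python
from urllib.parse import quote


def strip_like_curl_d(text: str) -> str:
    """Mimic curl -d @file, which drops CR and LF characters."""
    return text.replace("\r", "").replace("\n", "")


def statements_url(base: str, repository: str, context_iri: str) -> str:
    """Build the statements endpoint URL with a percent-encoded context."""
    context = quote(f"<{context_iri}>", safe="")
    return f"{base}/repositories/{repository}/statements?context={context}"


# Hypothetical two-triple Turtle snippet with a comment line in between.
turtle = (
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n"
    "# a comment\n"
    "<http://example.org/c> <http://example.org/p> <http://example.org/d> .\n"
)

flattened = strip_like_curl_d(turtle)

# Naive split: valid here only because no IRI in the snippet contains '#'.
# After flattening, everything from the first '#' onwards is one long comment.
visible_to_parser = flattened.split("#", 1)[0]
```

If this turns out to be the cause, replacing `-d` with `--data-binary` in the command above would be the first thing to try.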

Support for excel and/or SPSS files

Convert Excel/SPSS on-the-fly to CSV, so that triplifier is able to use it.

Other solution: a different tool which inspects Excel/SPSS and generates the ontology and data which is equivalent to the triplifier output.
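
The first option could be sketched roughly as follows, assuming pandas is installed (with openpyxl for .xlsx files and pyreadstat for SPSS .sav files). The `READERS` table and both function names are hypothetical; how the resulting CSV is handed to the triplifier is left open.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the pandas reader function name.
READERS = {".xlsx": "read_excel", ".xls": "read_excel", ".sav": "read_spss"}


def reader_name(path: str) -> str:
    """Return the name of the pandas reader that matches the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix}")
    return READERS[suffix]


def convert_to_csv(path: str) -> str:
    """Convert an Excel or SPSS file to CSV next to the original file."""
    import pandas as pd  # imported lazily so the dispatch logic works without it

    frame = getattr(pd, reader_name(path))(path)
    target = str(Path(path).with_suffix(".csv"))
    frame.to_csv(target, index=False)
    return target
```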

Ability to change fip

Current setup -

  1. Once a FIP is added, it cannot be modified. Everything needs to be deleted in GraphDB.

Problem:
Not convenient, and it might require a lot of workarounds.

Solution:

  1. Ability to modify the FIP. This implies deleting all data associated with that FIP.

Delete column mapping should delete cell rules

When deleting a column mapping, only the column mapping itself gets deleted. The reasoning rules for cell values therefore remain available, so adding the mapping again shows the old cell mappings.

Solution: remove the reasoning rules when removing column mappings.

Mapping view does not take into account the CEDAR metadata

The mapping view currently covers all projects. Make sure a mapping applies only to one given cohort/CEDAR metadata object; otherwise, mappings from different projects will overlap.

Solution: take the cohort or dataset instance identifier into account when creating the graph of mapped data.
When using the latter, this will also solve #25.
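
As an illustration only, one way to take the identifier into account is to derive a separate named graph per cohort or dataset instance. The base IRI `http://mapping.local/` and both helper names are assumptions, not the project's actual naming.

```python
from urllib.parse import quote

MAPPING_GRAPH_BASE = "http://mapping.local/"  # hypothetical base IRI


def mapping_graph_iri(cohort_id: str) -> str:
    """Return a named-graph IRI that is unique to one cohort identifier."""
    return MAPPING_GRAPH_BASE + quote(cohort_id, safe="")


def clear_mappings_update(cohort_id: str) -> str:
    """Build a SPARQL Update that removes only this cohort's mapped data."""
    return f"CLEAR GRAPH <{mapping_graph_iri(cohort_id)}>"
```

With one graph per cohort, mappings from different projects cannot overlap, and removing one cohort's mappings does not touch the others.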

Temporal annotations - relationships between columns

Identify relationships between columns (e.g. a tumor stage column refers to the date of diagnosis column), and give users the option to define these relationships.

This could link to the SHACL file where these relationships are defined

Mapping of cell values does not happen on URI

If I understand the code in mapDatasets.py correctly, the cell value mapping (categorical mapping) is done on the rdfs:label of the option available in the shape (SHACL definition), instead of on the URI.
The SPARQL query involved is in the function getCategoryCode() in getCategories.py. However, it relies on proper matching of rdfs:label values, with some magic happening on the category string. This breaks in the EPND CS1 SHACL file for the APOE genotype.

Please change this so that the actual URI (which should be unique) is used as the value for mapping, instead of the labels (which might change).
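
The difference can be illustrated with a small sketch. The category IRIs, labels, and function names below are hypothetical and are not taken from the CS1 SHACL file or from getCategories.py.

```python
from typing import Dict, Optional

# Hypothetical categories as they might be read from a SHACL shape: IRI -> label.
CATEGORIES: Dict[str, str] = {
    "http://example.org/code/A": "Category A",
    "http://example.org/code/B": "Category B",
}


def category_by_label(categories: Dict[str, str], label: str) -> Optional[str]:
    """Label-based lookup: fails as soon as the label text differs."""
    for iri, known_label in categories.items():
        if known_label == label:
            return iri
    return None


def category_by_iri(categories: Dict[str, str], iri: str) -> Optional[str]:
    """IRI-based lookup: independent of how the label is written."""
    return iri if iri in categories else None
```

A label that differs only in case or spacing is not found by `category_by_label`, while the IRI stays stable even if the label is edited later.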

ADWB: implement file ingestion API

Two options:

  • select file from API at current "upload file" stage
  • Crawl all files regularly in triplifier, and only assign file at "upload file" stage

Related to #59 if we decide on the 2nd option.

GitLab registry: "HTTP Basic: Access denied. The provided password or token ..."

Error when running docker-compose up for the first time:

$ docker-compose up -d
[+] Running 0/3
 ⠿ rdf-store Error                                                                                 0.7s
 ⠿ webui Error                                                                                     0.7s
 ⠿ triplifier Error                                                                                0.7s
Error response from daemon: Head "https://registry.gitlab.com/v2/um-cds/fair/tools/docker-graphdb/manifests/latest": unauthorized: HTTP Basic: Access denied. The provided password or token is incorrect or your account has 2FA enabled and you must use a personal access token instead of a password. See https://gitlab.com/help/user/profile/account/two_factor_authentication#troubleshooting

Current solution:

  1. Create access token with read_registry scope
  2. Run docker login to authenticate
    $ docker login https://registry.gitlab.com -u <username> -p <token>

Preferred solution:
Publish images to GitHub registry?

In mapping, SHACL should be leading

Currently, the column names are leading when specifying a mapping.

Maybe we should turn this around and make the SHACL leading, so that it identifies the columns containing this information. Or, as an alternative, provide a UI where you can at least see which SHACL shapes have been mapped.

Custom shapes for unknown variables

When a variable is unknown, I want to add some definitions to create a custom shape.
Think about someone retrieving the class name using a BioPortal API, and otherwise defining the entity (title, description) themselves. They should identify numeric/categorical variables, and their properties (min/max value, unit, categories etc.)

Multiple entries for column mapping

Issue:
Performing column mapping creates multiple mapping entries for a single column, due to redundant options in the drop-down menu.

To reproduce -

  1. FIP file refers to the CS1 SHACL file.
  2. Running a copy of the main branch (no other modifications)
  3. Using the dataset_reduced file for mapping
  4. Running in developer mode

Issue might be originating from -

  1. SHACL file
  2. Bad SPARQL query
