maastrichtu-cds / epnd-fairification

License: Apache License 2.0

Python 15.65% Jupyter Notebook 24.84% HTML 2.98% Dockerfile 0.18% CSS 0.13% Shell 0.40% TypeScript 0.27% JavaScript 55.53% Less 0.01% SCSS 0.01%

epnd fair

epnd-fairification's Issues

Proper logging

Use Python's logging library instead of print statements.
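
A minimal sketch of what this could look like, using only the standard logging module. The logger name "epnd.upload" and the upload_file function are hypothetical stand-ins, not names taken from the repository.

```python
import logging

# Hypothetical logger name; one logger per module is the usual convention.
logger = logging.getLogger("epnd.upload")


def configure_logging(level: int = logging.INFO) -> None:
    """Configure the root logger once, at application start-up."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def upload_file(filename: str) -> bool:
    """Hypothetical stand-in for an upload step that used to call print()."""
    logger.info("Uploading %s", filename)
    if not filename.endswith(".csv"):
        logger.warning("Skipping %s: not a CSV file", filename)
        return False
    return True
```

Calling configure_logging() once in the entry point keeps the level and format in a single place, so individual modules only need getLogger.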

Mapping view does not scope to one uploaded file

Give users the option to view all or only one uploaded file

Possibility to delete dataset

It should be possible to delete a dataset.

Parse published FIP

Use officially published FIP as input.

Example:
https://np.petapico.org/RAeYFt1mIp9R0ckxOdIXEIT1iMzfMiZ6nxcUeUM--xs6k

Upload FIP file directly

Current setup:

Provide the URL of a SHACL file, or
Provide the URL of a nanopub (#41)

Enhancement: upload a local SHACL file

Nanopub information needs to be updated

The current nanopublication refers to a FIP link that has expired or has been merged into the main branch.

Current link in nanopub: https://github.com/MaastrichtU-CDS/EPND-FAIRification/blob/case-study-1/EPNDCS1shacl.ttl

Link that should exist: https://github.com/MaastrichtU-CDS/EPND-FAIRification/blob/main/fip/fip.ttl

CEE has a hard-link to localhost

In the configuration file of the CEE, replace the hard-coded localhost with the host and port determined in JavaScript.

Possibility of updating metadata

It should be possible to update metadata. Currently, the only way to change something is to delete the cohort completely and re-add it.

Metadata templates use title of template

Configure in the FIP which predicate (within the CEDAR template) is used to describe the title and/or description.

Use this FIP setting to retrieve the title/description, and show it instead of the UUID in the overview of CEDAR instances.

Maintaining navigation consistency across pages

A user should be able to navigate between pages easily, but currently multiple steps are required to reach certain pages.

Solution: Keep the navigation consistent across all pages.

Docker-compose doesn't pull the latest image for webui

Description:

Although the webui service specifies the "latest" tag, docker-compose does not pull the most recent image.
The issue persists after running docker system prune and then docker-compose up -d from the root directory.

Fix: The image has two release tags, "latest" and "main". If "main" is used as the tag, the latest release is pulled.

Reason: Unknown. What do you think, @jvsoest?

Lock versions of python dependencies
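
One possible way to produce a pinned list, sketched with the standard library only: read the versions installed in the current environment and emit name==version lines, similar in spirit to pip freeze. The function name pinned_requirements is illustrative, and the environment it runs in is assumed to be the one the project is developed in.

```python
from importlib.metadata import distributions


def pinned_requirements() -> list[str]:
    """Return 'name==version' lines for every installed distribution."""
    pins = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with missing metadata
            pins[name.lower()] = f"{name}=={dist.version}"
    # Sort by lower-cased package name for a stable, diff-friendly file.
    return [pins[key] for key in sorted(pins)]
```

Writing "\n".join(pinned_requirements()) to a requirements file would lock the versions; trimming the list to the packages the project actually imports is left as a manual step.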

Mappings under a single graph

The current setup stores the column mapping in a different graph than the cell mapping. It isn't a major issue, but it would be better if everything were under a single graph. Work for the future.

Single container - One front-end (flask app)

Maintain frontend consistency

From a design perspective, there is a deviation from the "standard" front end, i.e. in terms of consistency in colors and the like. This needs to change for better usability.

Upload of SHACL fails

Line:

EPND-FAIRification/management_webpage/development_endpoint/run.sh

Line 17 in 791a5f5

 curl -d @../shapeTest/shacl.ttl --header "Content-Type: application/x-turtle" http://localhost:7200/repositories/epnd_dummy/statements?context=%3Chttp://shacl.local/%3E 

For some reason the cURL upload fails. Possible causes:

Comments in the SHACL file (lines starting with #), which are parsed but then cause errors
Something else

We need to test which triples are uploaded and where the upload starts to fail. Manual upload into GraphDB does work (as a temporary workaround).
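
One candidate cause worth ruling out: according to the curl manual, -d @file strips carriage returns and newlines from the file, whereas --data-binary @file sends it unchanged. In Turtle, a # comment runs to the end of the line, so once the line breaks are gone the first comment swallows everything after it. The sketch below mimics that stripping in Python on a made-up Turtle snippet (the ex: prefix and the triple are illustrative, not taken from shacl.ttl). Whether this is the actual cause of the failure here is an assumption that still needs testing.

```python
def strip_newlines_like_curl_d(payload: str) -> str:
    """Mimic `curl -d @file`, which drops CR and LF characters from the file."""
    return payload.replace("\r", "").replace("\n", "")


# Illustrative Turtle content: a prefix, a comment line, and one triple.
turtle = (
    "@prefix ex: <http://example.org/> .\n"
    "# a comment line\n"
    "ex:a ex:b ex:c .\n"
)

flattened = strip_newlines_like_curl_d(turtle)

# Everything after the '#' is now on the same line as the comment,
# so a Turtle parser would treat the triple as part of the comment.
comment_start = flattened.index("# a comment line")
swallowed = flattened[comment_start:]
print(swallowed)
```

If this turns out to be the cause, switching the script from -d to --data-binary would be the smallest change to try.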

Create one container/image for embedding in ADWB

Support for excel and/or SPSS files

Convert Excel/SPSS on-the-fly to CSV, so that triplifier is able to use it.

Other solution: a different tool which inspects Excel/SPSS and generates the ontology and data which is equivalent to the triplifier output.
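
A rough sketch of the first option, assuming pandas is available (with openpyxl for Excel and pyreadstat for SPSS, which pandas needs for these readers). The function name convert_to_csv and the set of accepted extensions are assumptions. Only the first Excel sheet is read, which may not fit every file.

```python
from pathlib import Path

# Extensions this sketch handles; anything else is rejected up front.
EXCEL_SUFFIXES = {".xlsx", ".xls"}
SPSS_SUFFIXES = {".sav"}


def convert_to_csv(source: Path, target: Path) -> Path:
    """Convert an Excel or SPSS file to CSV so the triplifier can read it."""
    suffix = source.suffix.lower()
    if suffix not in EXCEL_SUFFIXES | SPSS_SUFFIXES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")

    # Imported lazily so plain CSV workflows do not require pandas.
    import pandas as pd

    if suffix in EXCEL_SUFFIXES:
        frame = pd.read_excel(source)  # reads the first sheet only
    else:
        frame = pd.read_spss(str(source))

    frame.to_csv(target, index=False)
    return target
```

SPSS value labels and Excel cell formatting are not preserved in the CSV, so some information useful for later annotation may be lost with this approach.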

ADWB: remove download button (and reduce preview)

Do not give the option to download the file (and do not present the whole file)

Related to #61

Ability to change fip

Current setup -

Once a FIP is added, it cannot be modified. Everything needs to be deleted in GraphDB.

Problem:
This is inconvenient and might require a lot of workarounds.

Solution:

Ability to modify the FIP. This implies deleting all data associated with that FIP.

Delete column mapping should delete cell rules

When a column mapping is deleted, only the column mapping itself is removed. The reasoning rules for the cell values remain available, so when the mapping is added again, the old cell mappings reappear.

Solution: remove the reasoning rules when removing column mappings.

Mapping view does not take into account the CEDAR metadata

The mapping view currently covers all projects. Make sure the mapping applies to only one given cohort/CEDAR metadata object; otherwise mappings from different projects will overlap.

Solution: take the cohort or dataset instance identifier into account when creating the graph of mapped data.
Using the dataset instance identifier will also solve #25.

Separate NPM build and docker container build

We don't need the full NPM/Node.js toolchain in the Docker container; it is unnecessary overhead.

Investigate usage of GRLC to make FDP

See if we can use GRLC to make an FDP out of the data available in GraphDB

Temporal annotations - relationships between columns

Identify relationships between columns (e.g. a tumor stage column refers to the date of diagnosis column), and give users the option to define these relationships.

This could link to the SHACL file in which these relationships are defined.

Instance details preview

Add colour contrast or a line between columns in the preview for a better user experience.

Merge new term mapping

Pull the main branch into the sprint branch and compare.

Mapping of cell values does not happen on URI

If I understand the code in mapDatasets.py correctly, the cell value mapping (categorical mapping) is performed on the rdfs:label of the option available in the shape (SHACL definition), instead of on its URI.
The SPARQL query involved is in the getCategoryCode() function in getCategories.py. It relies on exact matching of rdfs:labels, with some string manipulation applied to the category string. This breaks in the EPND CS1 SHACL file for the APOE genotype.

Please change this so that the actual URI (which should be unique) is used as the value for mapping, instead of the labels (which might change).
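
The idea can be sketched in a few lines: store the mapping from local cell values to option URIs, and use labels for display only. Everything below is hypothetical, including the example.org URIs, the label strings, and the function names; it does not reflect the actual code in mapDatasets.py or getCategories.py.

```python
# Hypothetical category options as they might be read from a SHACL shape:
# each option has a unique URI and a human-readable rdfs:label.
OPTIONS = [
    {"uri": "http://example.org/apoe/e3e4", "label": "e3/e4"},
    {"uri": "http://example.org/apoe/e4e4", "label": "e4/e4"},
]


def build_cell_mapping(choices: dict[str, str]) -> dict[str, str]:
    """Map local cell values to option URIs, rejecting unknown URIs."""
    known = {option["uri"] for option in OPTIONS}
    unknown = set(choices.values()) - known
    if unknown:
        raise ValueError(f"Unknown option URIs: {sorted(unknown)}")
    return dict(choices)


def label_for(uri: str) -> str:
    """Look up the display label for a URI; used for presentation only."""
    return next(option["label"] for option in OPTIONS if option["uri"] == uri)
```

Because the stored value is the URI, renaming a label or adding punctuation to it (as in the genotype labels) would not invalidate existing mappings.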

ADWB: implement file ingestion API

Two options:

Select the file from the API at the current "upload file" stage
Crawl all files regularly in the triplifier, and only assign the file at the "upload file" stage

Related to #59 if we choose the second option

ADWB - pre-load FIP & dependencies into container

In the user interface, when creating the GraphDB repository (if it does not exist yet), also load the FIP/SHACL/CEDAR files if they are present in a given folder.

Guide people how to create SHACL and FIP files

Guide people on how to build their SHACL file. Vincent pointed to a tool, and @matthijssloep wrote a script. We need to see which option is best.

Generate VoID metadata at "publication"

When annotation is done, and perhaps once the dataset is "finalized" (e.g. to version it), include automatically generated VoID metadata.
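
As an illustration of what such generated metadata might contain, the sketch below builds a small VoID description in Turtle using plain string formatting. The function name, the dataset IRI, and the choice of properties (dcterms:title and void:triples) are assumptions; a real implementation would probably compute the triple count with a SPARQL query against GraphDB.

```python
def void_description(dataset_iri: str, title: str, triple_count: int) -> str:
    """Return a minimal VoID description of a dataset as a Turtle string."""
    # Escape backslashes and double quotes so the title is a valid literal.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "@prefix void: <http://rdfs.org/ns/void#> .\n"
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
        "\n"
        f"<{dataset_iri}> a void:Dataset ;\n"
        f'    dcterms:title "{escaped}" ;\n'
        f"    void:triples {triple_count} .\n"
    )


# Illustrative values only.
example = void_description("http://example.org/dataset/1", "Cohort A", 1200)
print(example)
```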

GitLab registry: "HTTP Basic: Access denied. The provided password or token ..."

Error when running docker-compose up for the first time:

$ docker-compose up -d
[+] Running 0/3
 ⠿ rdf-store Error                                                                                 0.7s
 ⠿ webui Error                                                                                     0.7s
 ⠿ triplifier Error                                                                                0.7s
Error response from daemon: Head "https://registry.gitlab.com/v2/um-cds/fair/tools/docker-graphdb/manifests/latest": unauthorized: HTTP Basic: Access denied. The provided password or token is incorrect or your account has 2FA enabled and you must use a personal access token instead of a password. See https://gitlab.com/help/user/profile/account/two_factor_authentication#troubleshooting

Current solution:

Create access token with read_registry scope
Run docker login to authenticate
$ docker login https://registry.gitlab.com -u <username> -p <token>

Preferred solution:
Publish images to GitHub registry?

Dashboard with all functionalities

Add a dashboard with all functionalities to facilitate user navigation.

In mapping, SHACL should be leading

Currently, the column names are leading when specifying a mapping.

We should perhaps turn this around and make the SHACL leading, so that users identify the columns containing the information each shape describes. Alternatively, provide a UI where you can at least see which SHACL shapes have been mapped.

Custom shapes for unknown variables

When a variable is unknown, I want to add some definitions to create a custom shape.
For example, someone could retrieve the class name using a BioPortal API, or otherwise define the entity (title, description) themselves. They should identify numeric/categorical variables and their properties (min/max value, unit, categories, etc.).