maastrichtu-cds / epnd-fairification
License: Apache License 2.0
Use the Python logging library for proper logging instead of print statements.
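A minimal sketch of what replacing print statements with the standard logging module could look like. The logger name and the upload_file function are hypothetical stand-ins, not names taken from the repository.

```python
import logging

# Hypothetical logger name; one logger per module is the usual convention.
logger = logging.getLogger("epnd_fairification")


def configure_logging(level: int = logging.INFO) -> None:
    """Configure logging once, at application start-up."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def upload_file(filename: str) -> bool:
    """Hypothetical stand-in for a function that currently uses print()."""
    logger.info("Uploading %s", filename)
    if not filename.endswith(".csv"):
        # A warning instead of a bare print makes the severity explicit.
        logger.warning("Unsupported file type: %s", filename)
        return False
    logger.debug("Upload of %s finished", filename)
    return True


if __name__ == "__main__":
    configure_logging(logging.DEBUG)
    upload_file("dataset_reduced.csv")
```

Passing the arguments separately (rather than formatting the string up front) lets the logging module skip the formatting work when the level is disabled.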
Give users the option to view all or only one uploaded file
It should be possible to delete a dataset.
Use officially published FIP as input.
Example:
https://np.petapico.org/RAeYFt1mIp9R0ckxOdIXEIT1iMzfMiZ6nxcUeUM--xs6k
Current setup -
Enhancement - upload local SHACL file
The current nanopublication refers to a FIP link that has expired or been merged into the main branch.
Current link in nanopub: https://github.com/MaastrichtU-CDS/EPND-FAIRification/blob/case-study-1/EPNDCS1shacl.ttl
Link that should exist: https://github.com/MaastrichtU-CDS/EPND-FAIRification/blob/main/fip/fip.ttl
In the configuration file of the CEE, change to the JavaScript host + port.
It should be possible to update metadata. Now the only way of changing something is by completely deleting the cohort and re-adding it.
Configure in FIP which predicate (within the Cedar template) is used to describe the title and/or description.
Use this description from the FIP to retrieve the title/description, to show this instead of the UUID in the overview of Cedar instances.
Solution: Maintain consistent navigation across pages.
Description:
After running docker system prune and then docker-compose up -d from the root directory, the issue persists.
Fix: On checking the release tags for the image, there are two: "latest" and "main". If "main" is used as the tag, the latest release is pulled.
Reason: No idea, what do you think @jvsoest ?
The current setup does column mapping in a different graph from the cell mapping. It isn't a major issue, but it would be better if everything were under a single graph. Work for the future.
From a design perspective, there is a deviation from a "standard" front end, i.e. in terms of consistency in colors and the like. This needs to change for better usability.
Line:
For some reason the cURL upload fails. Possible causes:
We need to test which triples are being uploaded, and where the upload starts to fail. Manual upload into GraphDB does work (as a temporary workaround).
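One way to find where the upload starts to fail is to send the data in batches and report the first batch that is rejected. The sketch below assumes the triples are available one per line (as in N-Triples) and takes the actual upload step as a caller-supplied function, since the real upload call is not shown here; all names are hypothetical.

```python
from typing import Callable, Iterator, List, Optional


def batches(lines: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` lines."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def first_failing_batch(
    lines: List[str],
    upload: Callable[[List[str]], bool],
    size: int = 1000,
) -> Optional[int]:
    """Return the line offset of the first batch that `upload` rejects.

    `upload` is a hypothetical callable that sends one batch to the
    triple store and returns True on success. Returns None when every
    batch is accepted.
    """
    for index, batch in enumerate(batches(lines, size)):
        if not upload(batch):
            return index * size
    return None
```

Re-running the function on the failing slice with a smaller batch size narrows the problem down to individual triples.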
Convert Excel/SPSS on-the-fly to CSV, so that triplifier is able to use it.
Other solution: a different tool which inspects Excel/SPSS and generates the ontology and data which is equivalent to the triplifier output.
Do not give the option to download the file (and do not present the whole file)
Related to #61
Current setup -
Problem:
Not convenient and might require a lot of workarounds.
Solution:
When deleting a column mapping, only the column mapping gets deleted. This means the reasoning rules for cell values are still available. When adding the mapping again, the old mappings will show up.
Solution: remove reasoning rules when removing column mappings.
The mapping view is now for all projects. Make sure the mapping only applies to one given cohort/CEDAR metadata object. Otherwise mappings from different projects will overlap.
Solution: take the cohort or dataset instance identifier into account when creating the graph of mapped data.
When using the latter, this will also solve #25
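A sketch of scoping mapped data to one cohort by deriving the named graph from the cohort or dataset identifier. The base URI, the predicate, and both function names are assumptions made for illustration; they do not come from the repository.

```python
from urllib.parse import quote

# Hypothetical base URI for mapping graphs.
BASE_GRAPH = "http://example.org/mapping/"
# Hypothetical predicate linking a column to a SHACL shape.
MAPS_TO = "http://example.org/mapsTo"


def mapping_graph_uri(cohort_id: str) -> str:
    """Build a named-graph URI that is unique per cohort identifier."""
    if not cohort_id:
        raise ValueError("cohort_id must not be empty")
    # Percent-encode the identifier so it is safe inside a URI.
    return BASE_GRAPH + quote(cohort_id, safe="")


def insert_mapping_query(cohort_id: str, column_uri: str, shape_uri: str) -> str:
    """Return a SPARQL update that stores one mapping in the cohort's graph."""
    graph = mapping_graph_uri(cohort_id)
    return (
        f"INSERT DATA {{ GRAPH <{graph}> {{ "
        f"<{column_uri}> <{MAPS_TO}> <{shape_uri}> . }} }}"
    )
```

Because every cohort writes to its own graph, deleting or re-reading the mappings of one cohort cannot touch those of another.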
We don't need all the NPM/NodeJS in the docker container; that's overhead/waste.
See if we can use GRLC to make an FDP out of the data available in GraphDB
Identify relationships between columns (e.g. a tumor stage column refers to the date of diagnosis column), and give users the option to define these relationships.
This could link to the SHACL file where these relationships are defined
Add colour contrast or a line between columns in the preview for a better user experience.
Pull the main branch into the sprint branch, and compare.
If I understand the code in mapDatasets.py correctly, the cell value (categorical mapping) happens on the rdfs:label of the option available in the Shape (SHACL definition), instead of the URI.
The SPARQL query involved here is in the getCategoryCode() function of getCategories.py. However, it relies on proper matching of rdfs:labels, with some magic happening on the category string. This breaks in the EPND CS1 SHACL file for the APOE genotype.
Please change this so that it uses the actual URI (which should be unique) as the value for mapping, instead of relative labels (which might change).
Two options:
Related to #59 if we decide on the second option
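The URI-based cell mapping requested above can be sketched as follows. This is not the code from mapDatasets.py or getCategories.py; the option list, the example URIs, and the function name are hypothetical, and the options are assumed to have already been read from the SHACL shape as (URI, rdfs:label) pairs.

```python
from typing import Dict, List, Tuple

# (URI, rdfs:label) pairs, as they might be read from a SHACL shape.
Option = Tuple[str, str]


def build_cell_mapping(
    selections: Dict[str, str],
    options: List[Option],
) -> Dict[str, str]:
    """Map each local cell value to the URI of the selected option.

    `selections` maps a cell value to the URI the user picked. Labels
    are only used for display, so renaming a label does not break an
    existing mapping.
    """
    known_uris = {uri for uri, _label in options}
    unknown = sorted(set(selections.values()) - known_uris)
    if unknown:
        raise ValueError(f"URIs not defined in the shape: {unknown}")
    return dict(selections)
```

Validating against the set of URIs, rather than normalising label strings, avoids the string matching that fails on values such as the APOE genotype categories.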
In user interface (when creating GraphDB repo if not exists), also load FIP/SHACL/CEDAR files if present in given folder
Guide people on how to build their SHACL file. Vincent pointed to a tool, and @matthijssloep wrote a script. We need to see what the best option is.
When annotation is done, and maybe the dataset is "finalized" (e.g. to version it), include VoID automated metadata.
Error when running docker-compose up for the first time:
$ docker-compose up -d
[+] Running 0/3
⠿ rdf-store Error 0.7s
⠿ webui Error 0.7s
⠿ triplifier Error 0.7s
Error response from daemon: Head "https://registry.gitlab.com/v2/um-cds/fair/tools/docker-graphdb/manifests/latest": unauthorized: HTTP Basic: Access denied. The provided password or token is incorrect or your account has 2FA enabled and you must use a personal access token instead of a password. See https://gitlab.com/help/user/profile/account/two_factor_authentication#troubleshooting
Current solution: create a personal access token with the read_registry scope and use docker login to authenticate:
$ docker login https://registry.gitlab.com -u <username> -p <token>
Preferred solution:
Publish images to GitHub registry?
Add a dashboard with all functionalities to facilitate user navigation.
Currently, the column names take the lead when specifying a mapping.
Maybe we should turn this around, and let the SHACL take the lead in identifying the columns that contain this information. Or, as an alternative, provide a UI where you can at least see which SHACL shapes have been mapped.
When a variable is unknown, I want to add some definitions to create a custom shape.
Think about someone retrieving the class name using a BioPortal API, and otherwise defining the entity (title, description) themselves. They should identify numeric/categorical variables, and their properties (min/max value, unit, categories etc.)
For a file like https://raw.githubusercontent.com/MaastrichtU-CDS/EPND-FAIRification/main/dataset_reduced.csv
File upload files for shacl like - https://github.com/MaastrichtU-CDS/EPND-FAIRification/blob/main/EPNDCS1shacl.ttl
Issue:
Performing column mapping causes multiple mapping entries for a single column, due to redundant options in the drop-down menu.
For replication -
Issue might be originating from -
Issue: Cedar editor not loading for a specific FIP SHACL file.
Resolution: An incorrect triple was codified in the SHACL file, which led to the issue. Commit sorted it out.
P.S. Creating this issue retroactively as a tracker for future reference.