oeg-upm / ya2ro

Python package designed to create Research Objects out of simple YAML files. Given the dataset DOIs, source code links and author ORCIDs, ya2ro generates an HTML representation of the aggregated contents, as well as an RO-Crate with the machine-readable representation of the Research Object.

License: Apache License 2.0

Python 13.47% HTML 86.50% Dockerfile 0.04%

machine-readable research-object ro-crate json-ld yaml

ya2ro's Introduction

ya2ro

Example

Ya2ro generates Research Objects (ROs) like the following: https://w3id.org/dgarijo/ro/sepln2022. Given a few ROs, ya2ro can also create a landing page: https://oeg-upm.github.io/ya2ro/output/landing_page.html

Requirements

The latest version of ya2ro works in Python 3.10.

Installation

To install ya2ro, choose one of the following options:

Install from PyPI

pip install ya2ro

Install from GitHub

git clone https://github.com/oeg-upm/ya2ro
cd ya2ro
pip install -e .

Installing through Docker

We provide a Dockerfile with ya2ro already installed. To run through Docker, you may build the Dockerfile provided in the repository by running:

docker build -t ya2ro .

Then, to run your image just type:

docker run -it ya2ro /bin/bash

And you will be ready to use ya2ro (see section below). If you want to have access to the results we recommend mounting a volume. For example, the following command will mount the current directory as the out folder in the Docker image:

docker run -it --rm -v $PWD/:/out ya2ro /bin/bash

If you move any files produced by ya2ro into /out, then you will be able to see them in your current directory.

Usage

Configure

Before running ya2ro, you should configure it. Add your GitHub personal access token to the ya2ro properties file. This is needed if you want ya2ro to extract your software metadata automatically. The file can be found at:

--> ~/ya2ro/src/ya2ro/resources/properties.yaml <--

Add a line like the following:

# Add here your GitHub personal access token
GITHUB_PERSONAL_ACCESS_TOKEN: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

ya2ro will work without this setting, but configuring it is highly recommended, because the GitHub API applies much stricter rate limits to unauthenticated requests.
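
As an illustration of what the token changes, the sketch below builds a GitHub API request that is authenticated only when a token is supplied. The helper name github_request is hypothetical and is not part of ya2ro; it only shows the general pattern.

```python
import urllib.request


def github_request(url, token=None):
    """Build a GitHub API request; attach credentials only if a token is given.

    Hypothetical helper for illustration, not ya2ro's own code.
    """
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Authenticated requests get a much higher rate limit from GitHub.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


# Building the request does not contact the network.
request = github_request("https://api.github.com/repos/oeg-upm/ya2ro", token="xxxx")
```

Without a token the same request is still valid, which matches the behaviour described above: the tool keeps working, but is more likely to hit the API limits.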

Test ya2ro installation

ya2ro --help

If everything goes fine, you should see:

                        ad888888b,
                       d8"     "88
                               a8P
8b       d8 ,adPPYYba,      ,d8P"  8b,dPPYba,  ,adPPYba,
`8b     d8' ""     `Y8    a8P"     88P'   "Y8 a8"     "8a
 `8b   d8'  ,adPPPPP88  a8P'       88         8b       d8
  `8b,d8'   88,    ,88 d8"         88         "8a,   ,a8"
    Y88'    `"8bbdP"Y8 88888888888 88          `"YbbdP"'
    d8'
   d8'
_________________________________________________________

usage: ya2ro [-h] (-i YAML_PATH | -l YA2RO_PREV_OUTPUT) [-o OUTPUT_DIR] [-p PROPERTIES_FILE] [-ns]

Human and machine readable input as a yaml file and create RO-Object in jsonld and/or HTML view. Run 'ya2ro -configure GITHUB_PERSONAL_ACCESS_TOKEN' this the first time to configure ya2ro
properly

options:
  -h, --help            show this help message and exit
  -i YAML_PATH, --input YAML_PATH
                        Path of the required yaml input. Follow the documentation or the example given to see the structure of the file.
  -l YA2RO_PREV_OUTPUT, --landing_page YA2RO_PREV_OUTPUT
                        Path of a previous output folder using the ya2ro tool. This flag will make a landing page to make all the resources accessible.
  -o OUTPUT_DIR, --output_directory OUTPUT_DIR
                        Output directory.
  -p PROPERTIES_FILE, --properties_file PROPERTIES_FILE
                        Properties file name.
  -ns, --no_somef       Disable SOMEF for a faster execution (software cards will not work).

How to use

The first thing to do is create some input for ya2ro. To create a valid YAML file, follow the documentation below.

Create a yaml from scratch or use one of the supplied templates. Currently ya2ro supports two formats:

paper
project

Please find a template for each type under the directory templates. Once you have a valid YAML file (project or paper), it is time to run ya2ro.

Create machine and human readable content

It is possible to process a batch of YAML files in one run. To do so, specify as input a folder that contains all the YAML files.

Simple execution

ya2ro -i templates

ya2ro -i templates/project_template.yaml
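
The two invocations above pass either a folder or a single file. A minimal sketch of how such an argument could be resolved into a list of files follows; collect_yaml_inputs is a hypothetical helper, and the accepted extensions (.yaml, .yml) are an assumption rather than ya2ro's documented behaviour.

```python
from pathlib import Path


def collect_yaml_inputs(input_path):
    """Return the YAML files named by a file-or-folder argument.

    Hypothetical helper: a folder yields its YAML files sorted by name,
    anything else is treated as a single input file.
    """
    path = Path(input_path)
    if path.is_dir():
        return sorted(
            p for p in path.iterdir() if p.suffix.lower() in {".yaml", ".yml"}
        )
    return [path]
```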

With optional arguments

ya2ro --input templates --output_directory out --properties_file custom_properties.yaml

ya2ro -i templates -o out -p custom_properties.yaml

Faster execution?

Use the flag --no_somef (or -ns) to disable SOMEF, which is the most time-consuming step.

ya2ro -i templates -ns

WARNING: Software cards will no longer work for GitHub links. Therefore, you will need to insert the software data manually in the YAML file.

Create landing page

ya2ro offers the option to create a landing page where all the resources produced are easily accessible. Just indicate the folder where these resources are, for example:

ya2ro -l output

Documentation

Please have a look at our documentation to know which metadata fields are supported by ya2ro.

Funding

This work has been funded by the EELISA European University network (https://eelisa.eu/), by the European Commission within the H2020 Programme in the context of the project RELIANCE under grant agreement no. 101017501 and by the Madrid Government (Comunidad de Madrid-Spain) under the Multiannual Agreement with Universidad Politécnica de Madrid in the line Support for R&D projects for Beatriz Galindo researchers, in the context of the V PRICIT (Regional Programme of Research and Technological Innovation) and the call Research Grants for Young Investigators from Universidad Politécnica de Madrid.

ya2ro's Issues

Extracted metadata should be present in JSON-LD

I have to double check that the metadata fields that have been extracted are present in the JSON-LD.

Add description, keywords and license in GitHub

In particular license is key for reusability

DOI from paper should also extract authors and citation

Right now we only extract the title and description of papers, but the authors, venue, year, etc. should also be used so that papers are accurately represented in the HTML. If a .bib is available from the DOI, we can use one of the many libraries to show the bibliography in HTML.

Documentation on which fields may be used is missing

There are templates, but the field documentation is missing for each of them, which makes it difficult to know how to complete the templates.

Sketch may be too big, recommend adding limitation based on width

See https://oeg-upm.github.io/ya2ro/output/ya2ro_project/index-en.html for an example

Add minimal example

I would like an example where I can just fill in links from dois and github ids.
For example:

# Mandatory field
type: "paper"

title: "Work done in SOSEN"

datasets:
  - doi_dataset: https://doi.org/10.6084/m9.figshare.14916684.v1
  - doi_dataset: https://doi.org/10.5281/zenodo.5139550

software:
  - link: https://github.com/KnowledgeCaptureAndDiscovery/somef

bibliography:
  - DOI WHEN SUPPORTED
 
authors:
  # Alternative way if participant has an ORCID

  -
    orcid: http://orcid.org/0000-0003-0454-7145
    role: "Supervisor"

ya2ro RO is not well described

There are many sections in Latin; we should lead by example.

yalm --> YAML

It's the acronym for yet another markup language

null fields should not be present in json-ld

For example:

{
            "@id": "https://orcid.org/0000-0001-5375-8024",
            "@type": "Person",
            "description": null,
            "name": "Ana Iglesias-Molina",
            "position": [
                "Universidad Polit\u00e9cnica de Madrid"
            ]
        },

This is what I obtained after running ya2ro with the ORCID in the snippet. It should not show nulls.
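
One possible fix is to prune null values just before serialising. The sketch below is only an illustration of that idea; drop_nulls is a hypothetical helper, not existing ya2ro code, and the sample record mirrors the snippet above.

```python
import json


def drop_nulls(node):
    """Recursively remove None values from nested dicts and lists."""
    if isinstance(node, dict):
        return {k: drop_nulls(v) for k, v in node.items() if v is not None}
    if isinstance(node, list):
        return [drop_nulls(v) for v in node if v is not None]
    return node


person = {
    "@id": "https://orcid.org/0000-0001-5375-8024",
    "@type": "Person",
    "description": None,  # would be serialised as null
    "name": "Ana Iglesias-Molina",
    "position": ["Universidad Politécnica de Madrid"],
}

cleaned = drop_nulls(person)
print(json.dumps(cleaned, indent=4, ensure_ascii=False))
```

Applying the function to the whole graph once, rather than guarding each field, keeps the extraction code unchanged.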

Document how to change the styles in website

How can I change the styles used in the website?
I think we should document how to use alternative styles without having to go and modify the html/css every time

Bug when executing ya2ro

Input command: ya2ro -i templates/sample_links.yaml

Result:

WARNING: 'summary' is not defined. Add it to be eligible for beeing an EELISA project.
Traceback (most recent call last):
  File "/home/dgarijo/Documents/GitHub/ya2ro/env_3.9/bin/ya2ro", line 33, in <module>
    sys.exit(load_entry_point('ya2ro', 'console_scripts', 'ya2ro')())
  File "/home/dgarijo/Documents/GitHub/ya2ro/src/ya2ro/ya2ro.py", line 74, in main
    process_yaml(args.input)
  File "/home/dgarijo/Documents/GitHub/ya2ro/src/ya2ro/ya2ro.py", line 117, in process_yaml
    rhtml.load_data(data)
  File "/home/dgarijo/Documents/GitHub/ya2ro/src/ya2ro/ro_html.py", line 61, in load_data
    self.func_attr_init[attr_name](attr_val)
  File "/home/dgarijo/Documents/GitHub/ya2ro/src/ya2ro/ro_html.py", line 389, in init_software
    software_cards_scc += card_html_view(s.metadata,embedded=True)
  File "/home/dgarijo/Documents/GitHub/ya2ro/env_3.9/lib/python3.9/site-packages/scc/commands/software_catalog_portal/card.py", line 71, in html_view
    {md.recently_updated()}
  File "/home/dgarijo/Documents/GitHub/ya2ro/env_3.9/lib/python3.9/site-packages/scc/commands/software_catalog_portal/metadata.py", line 186, in recently_updated
    delta = self.last_update_days()
  File "/home/dgarijo/Documents/GitHub/ya2ro/env_3.9/lib/python3.9/site-packages/scc/commands/software_catalog_portal/metadata.py", line 293, in last_update_days
    date_of_extraction_str = safe_dic(safe_dic(safe_dic(self.md,'stargazersCount'),'excerpt'),'date')[:-4]
TypeError: 'NoneType' object is not subscriptable
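
The last frame slices the result of a chained lookup that returned None. A defensive variant could look like the sketch below. The key names are copied from the traceback; nested_get and extraction_date are hypothetical helpers and do not reproduce the actual scc code.

```python
def nested_get(data, *keys):
    """Walk nested dicts and return None as soon as a key is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def extraction_date(metadata):
    """Return the extraction date without its last four characters, or None."""
    date = nested_get(metadata, "stargazersCount", "excerpt", "date")
    # Only slice when a string was actually found.
    return date[:-4] if isinstance(date, str) else None
```

With this guard, a repository whose metadata lacks the date would yield None instead of aborting the whole run.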

What does the properties file do?

It is not clear what it does or why I should change it. This should be documented.

Publication links are not clickable

The DOIs of the publications are not clickable; they are shown as plain-text URLs.

Given a series of YAMLs, YA2RO should generate a portal

New functionality: given a series of YAMLs, YA2RO should generate:

HTMLs for each of them
A landing page that links to each of the entries. With title and description

Extract biography from ORCID

The biography field does exist, but it may not be present in all ORCID records.

When retrieving software, add a warning stating that it may take a while

We download the full repo, so if the repo is big, it may take a while

Somef configuration

I tried doing somef configure -a, but it did not work, I had to import nltk and download the missing package.
Why does this happen?

Tool name

We should think about the name, in order to refer to the tool

Replace YALM with YAML

I am using ya2ro, and the help command is wrong. It mentions YALM all over the place.

YAML stands for yet another markup language.

Which fields can I add to the yaml config?

It is unclear which fields go where.
This needs to go in the documentation.

Improve software description

Right now a link to github is added, together with description and license.
I would like to enhance this by adding small icons:

Docker support
Notebook support
Programming language
Help channels
Documentation
These should be small, non-intrusive icons

Bibliography does not retrieve DOI data

I tried adding:

bibliography:
  - doi: https://dx.doi.org/10.1109/BigData47090.2019.9006447

And I got the same result in the HTML, which is wrong. Instead, it should show me the APA style citation of the paper.
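
One way to obtain a formatted citation is DOI content negotiation: doi.org can return a ready-made reference when asked for text/x-bibliography. The sketch below assumes that mechanism is acceptable here; citation_request and fetch_citation are hypothetical helpers, and the second one needs network access.

```python
import urllib.request


def citation_request(doi_url, style="apa"):
    """Build a request asking the DOI resolver for a formatted citation."""
    accept = f"text/x-bibliography; style={style}"
    return urllib.request.Request(doi_url, headers={"Accept": accept})


def fetch_citation(doi_url, style="apa", timeout=10):
    """Resolve the DOI and return the citation text (requires network access)."""
    request = citation_request(doi_url, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

A call such as fetch_citation("https://doi.org/10.1109/BigData47090.2019.9006447") would then return the citation string to embed in the HTML, provided the registration agency supports this content type.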

Improve logging

If the program is successful, it should show a message. At the moment it just finishes and you have to look whether the output exists or not.

If there are errors, the program should show what happened.

If a DOI is not correct, ya2ro fails

I used https://zenodo.org/record/6518802#.YnO73lxBx9B instead of the right DOI, and I got this error:

Traceback (most recent call last):
  File "/home/dgarijo/Documents/GitHub/ya2ro/env_3.9/bin/ya2ro", line 33, in <module>
    sys.exit(load_entry_point('ya2ro', 'console_scripts', 'ya2ro')())
  File "/home/dgarijo/Documents/GitHub/ya2ro/src/ya2ro/ya2ro.py", line 66, in main
    process_yaml(args.input)
  File "/home/dgarijo/Documents/GitHub/ya2ro/src/ya2ro/ya2ro.py", line 101, in process_yaml
    data = data_wrapper.load_yaml(yaml)
  File "/home/dgarijo/Documents/GitHub/ya2ro/src/ya2ro/data_wrapper.py", line 200, in load_yaml
    init_data = init_paper(p.input_to_vocab, data)
  File "/home/dgarijo/Documents/GitHub/ya2ro/src/ya2ro/data_wrapper.py", line 305, in init_paper
    populate_datasets(paper, input_to_vocab, data)
  File "/home/dgarijo/Documents/GitHub/ya2ro/src/ya2ro/data_wrapper.py", line 411, in populate_datasets
    if doi:
UnboundLocalError: local variable 'doi' referenced before assignment
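
This kind of UnboundLocalError usually means a variable is assigned only on some branches. The sketch below shows the general pattern for avoiding it: bind the variable up front and warn instead of crashing. extract_doi is a hypothetical function and not the actual populate_datasets code.

```python
import warnings


def extract_doi(link):
    """Return the DOI part of a doi.org link, or None (with a warning)."""
    doi = None  # bound on every path, so the check below can never fail
    marker = "doi.org/"
    if marker in link:
        doi = link.split(marker, 1)[1]
    else:
        warnings.warn(f"'{link}' is not a DOI link; skipping metadata lookup")
    return doi if doi else None
```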

DOI from datasets should also extract authors and license

The dataset extraction does not show authors or license, which are critical fields for giving credit.

Installation instructions are incomplete

I created a sample project and tried the project.
python3 ya2ro.py -i test_files/test_dg.yaml -o out/
Response:

Traceback (most recent call last):
  File "/home/dgarijo/Documents/GitHub/EELISA-research-object/ya2ro.py", line 119, in <module>
    main()
  File "/home/dgarijo/Documents/GitHub/EELISA-research-object/ya2ro.py", line 47, in main
    import properties
ModuleNotFoundError: No module named 'properties'

I think the installation instructions are incomplete. Somef is not installed in my environment either, which will likely make the application fail.

Complete readme

Specify how to run the script and how to install any missing dependencies of the project.

Tests

ya2ro does not have tests. I would like integration tests (e.g., with other tools/packages) and tool tests to assess functionality (for example, if there are missing fields, empty readmes, etc.)

what is doi_datasets?

In theory each dataset should have its own DOI

What if link to data is not a DOI

There are tools for extracting dataset metadata which should be applied in order to detect basic info in case a DOI is not provided. For example, if someone adds a landing page, we may be able to extract some of its metadata.

URLs should not need to be between quotes

We don't have to require that, it's unclear what the benefit is

Validation of YAML and fields

The tool should tell you if you are missing critical fields (e.g., datasets, people).
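
A minimal sketch of such a check is shown below. The list of required fields is an assumption made for illustration (only type is marked as mandatory in the templates), and missing_fields is a hypothetical helper.

```python
# Assumed set of critical fields; adjust to the real schema.
REQUIRED_FIELDS = ("type", "title", "datasets", "authors")


def missing_fields(data, required=REQUIRED_FIELDS):
    """Return the required keys that are absent or empty in a parsed YAML mapping."""
    return [key for key in required if not data.get(key)]


parsed = {"type": "paper", "title": "Work done in SOSEN", "datasets": []}
for field in missing_fields(parsed):
    print(f"WARNING: '{field}' is missing or empty.")
```

Reporting every missing field in one pass lets the user fix the YAML in a single edit instead of rerunning the tool after each error.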

If a wrong DOI is added, ya2ro fails

The app should not fail. Instead, it should issue a warning:

Parsing and fetching info from sepln_2022.yaml

    - Title: Done.
    - Summary: Done.
Traceback (most recent call last):
  File "/usr/local/bin/ya2ro", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/ya2ro/ya2ro.py", line 66, in main
    process_yaml(args.input)
  File "/usr/local/lib/python3.9/site-packages/ya2ro/ya2ro.py", line 101, in process_yaml
    data = data_wrapper.load_yaml(yaml)
  File "/usr/local/lib/python3.9/site-packages/ya2ro/data_wrapper.py", line 200, in load_yaml
    init_data = init_paper(p.input_to_vocab, data)
  File "/usr/local/lib/python3.9/site-packages/ya2ro/data_wrapper.py", line 305, in init_paper
    populate_datasets(paper, input_to_vocab, data)
  File "/usr/local/lib/python3.9/site-packages/ya2ro/data_wrapper.py", line 411, in populate_datasets
    if doi:
UnboundLocalError: local variable 'doi' referenced before assignment

This is what I get when I used https://zenodo.org/record/6554604 instead of https://doi.org/10.5281/zenodo.6554604 (common mistake)
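
Since this mistake is common, the tool could rewrite Zenodo record links into DOI links before fetching metadata. The sketch below assumes the record has a Zenodo-minted DOI of the form 10.5281/zenodo.ID, as in the example above; normalize_dataset_link is a hypothetical helper.

```python
import re

DOI_LINK = re.compile(r"^https?://(dx\.)?doi\.org/10\.")
ZENODO_RECORD = re.compile(r"^https?://zenodo\.org/records?/(\d+)")


def normalize_dataset_link(link):
    """Return a DOI URL for DOI links and Zenodo record links, else None."""
    if DOI_LINK.match(link):
        return link
    match = ZENODO_RECORD.match(link)
    if match:
        # Assumption: the record DOI follows Zenodo's 10.5281/zenodo.<id> form.
        return f"https://doi.org/10.5281/zenodo.{match.group(1)}"
    return None
```

A None result can then trigger a warning for that dataset rather than a traceback.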

Make link to orcid in author's name

Right now the author name is not clickable. I think it should take you to the corresponding ORCID.
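
A small sketch of how the author name could be rendered as a link follows; author_link is a hypothetical helper and the generated markup is an assumption, not the current ya2ro template.

```python
from html import escape


def author_link(name, orcid=None):
    """Render an author name, linked to the ORCID page when one is known."""
    if not orcid:
        return escape(name)
    return f'<a href="{escape(orcid, quote=True)}">{escape(name)}</a>'
```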

Error when fields from a person (e.g. institution) are not available from ORCID

Example, with ORCID: https://orcid.org/0000-0001-7588-6094 I got an error

Error:

ERROR: Unable to retrieve the affiliations, check if https://orcid.org/0000-0001-7588-6094 is up.
Traceback (most recent call last):
  File "/home/dgarijo/Documents/GitHub/ya2ro/ya2ro.py", line 119, in <module>
    main()
  File "/home/dgarijo/Documents/GitHub/ya2ro/ya2ro.py", line 58, in main
    process_yaml(args.input)
  File "/home/dgarijo/Documents/GitHub/ya2ro/ya2ro.py", line 93, in process_yaml
    data = data_wrapper.load_yaml(yaml)
  File "/home/dgarijo/Documents/GitHub/ya2ro/data_wrapper.py", line 174, in load_yaml
    data = init_paper(p.input_to_vocab, data)
  File "/home/dgarijo/Documents/GitHub/ya2ro/data_wrapper.py", line 273, in init_paper
    populate_authors(paper, input_to_vocab, data)
  File "/home/dgarijo/Documents/GitHub/ya2ro/data_wrapper.py", line 509, in populate_authors
    object.authors[i].position = ", ".join(orcid.get_affiliation())
TypeError: can only join an iterable
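
The traceback suggests that the affiliation lookup returned None, which str.join cannot handle. A guard like the one sketched below would avoid the crash; format_affiliations is a hypothetical helper, not the code in data_wrapper.py.

```python
def format_affiliations(affiliations):
    """Join affiliation names, treating a missing result as an empty list."""
    return ", ".join(affiliations or [])
```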

Make somef extraction on demand

Somef may be a little time consuming. Maybe we should have a command if you want to use somef (some repos are huge).

Or maybe a "lightweight" somef invocation. The difference will be bigger once we put in the software cards.

pypi workflow

Set up pypi releases and Python version support.

Align ids to full URIs

I noticed that when we assign ids in the JSON-LD, those which were URIs are no longer there. They should be preserved.

Alternative serialization

As discussed, RO-Crates can be embedded in one website. This is an example on how to do it:
https://www.researchobject.org/ro-crate/1.1/ro-crate-preview.html

Click on view source to see how it's done.

Ideally, this would be supported with a different flag in the program.

Make ya2ro a package

So it can be installed without having to do the python3 pathtoya2ro.py flags

ORCID ids produce an error

I see ERROR: ORCID is not valid or not up 'https://orcid.org/0000-0002-9260-0753
But the ORCID is correct.
I wonder why this happens

If there is a data folder, list contents as datasets

Sometimes datasets will not only be DOIs. They will be local files. In that case, we can just list the Datasets and point to the files with local paths.
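
A minimal sketch of the idea is shown below: walk the data folder and emit one entry per file with a relative path. The entry layout (name and path keys) is an assumption for illustration, and local_datasets is a hypothetical helper.

```python
from pathlib import Path


def local_datasets(folder):
    """List every file under a data folder as a dataset entry with a relative path."""
    root = Path(folder)
    return [
        {"name": path.name, "path": path.relative_to(root).as_posix()}
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]
```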

ya2ro version

I need to know the version of the package with ya2ro --version or similar
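
With argparse this can be done through the built-in version action. The sketch below assumes the package is installed under the distribution name ya2ro; build_parser and package_version are hypothetical names, not the current command-line code.

```python
import argparse
from importlib import metadata


def package_version(name="ya2ro"):
    """Return the installed version of a distribution, or 'unknown'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "unknown"


def build_parser():
    """Create a parser that only demonstrates the --version flag."""
    parser = argparse.ArgumentParser(prog="ya2ro")
    parser.add_argument(
        "--version",
        action="version",
        version=f"%(prog)s {package_version()}",
    )
    return parser
```

Reading the version from the installed package metadata avoids keeping a second copy of the version string in the source.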

Position for Persons is not correct

Right now if a person has had two positions, the JSON-LD reads:

position:  "Universidad Politécnica de Madrid \| University of Southern California"

However, there should be two "position" fields, each with its respective value.
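
If the positions are currently joined with a pipe character, they could be split back into separate values before serialising. The sketch below assumes a plain "|" separator; split_positions is a hypothetical helper.

```python
def split_positions(value, separator="|"):
    """Split a joined position string into a list of trimmed, non-empty values."""
    return [part.strip() for part in value.split(separator) if part.strip()]


positions = split_positions(
    "Universidad Politécnica de Madrid | University of Southern California"
)
```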

Docker image does not build

I have tried to build the Docker image, but the build ends in an error.
I think this is caused by scc not installing properly.

Sending build context to Docker daemon  1.033GB
Step 1/4 : FROM python:3.9
 ---> e2d7fd224b9c
Step 2/4 : RUN git clone https://github.com/oeg-upm/ya2ro
 ---> Running in f448cf6991da
Cloning into 'ya2ro'...
Removing intermediate container f448cf6991da
 ---> 68a29577a5b1
Step 3/4 : RUN cd ya2ro && pip install .
 ---> Running in 4b306cf15461
Processing /ya2ro
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
  Installing build dependencies: started
  Installing build dependencies: still running...
  Installing build dependencies: still running...
  Installing build dependencies: finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python /tmp/pip-standalone-pip-15nvo590/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-xeiitwfy/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'PyYAML>=6.0' 'bs4>=0.0.1' 'requests>=2.22.0' 'bibtexparser>=1.2.0' 'Pygments>=2.11.2' 'somef>=0.7.2' 'scc @ git+https://github.com/dakixr/scc' metadata-parser
       cwd: None
  Complete output (214 lines):
  Collecting scc@ git+https://github.com/dakixr/scc
    Cloning https://github.com/dakixr/scc to /tmp/pip-install-ymrlek44/scc_a4cfc7828bd443a5b97393e08010206c
    Running command git clone -q https://github.com/dakixr/scc /tmp/pip-install-ymrlek44/scc_a4cfc7828bd443a5b97393e08010206c
    Resolved https://github.com/dakixr/scc to commit 7afb3fc5ba925d2230edc7ea2b70b0bf2ee525c7
    Installing build dependencies: started
    Installing build dependencies: finished with status 'done'
    Getting requirements to build wheel: started
    Getting requirements to build wheel: finished with status 'done'
      Preparing wheel metadata: started
      Preparing wheel metadata: finished with status 'done'
  Collecting PyYAML>=6.0
    Downloading PyYAML-6.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (661 kB)
  Collecting bs4>=0.0.1
    Downloading bs4-0.0.1.tar.gz (1.1 kB)
  Collecting requests>=2.22.0
    Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
  Collecting bibtexparser>=1.2.0
    Downloading bibtexparser-1.2.0.tar.gz (46 kB)
  Collecting Pygments>=2.11.2
    Downloading Pygments-2.12.0-py3-none-any.whl (1.1 MB)
  Collecting somef>=0.7.2
    Downloading somef-0.8.0-py3-none-any.whl (552 kB)
  Collecting metadata-parser
    Downloading metadata_parser-0.10.5.tar.gz (50 kB)
    Installing build dependencies: started
    Installing build dependencies: finished with status 'done'
    Getting requirements to build wheel: started
    Getting requirements to build wheel: finished with status 'done'
      Preparing wheel metadata: started
      Preparing wheel metadata: finished with status 'done'
  Collecting beautifulsoup4
    Downloading beautifulsoup4-4.11.1-py3-none-any.whl (128 kB)
  Collecting charset-normalizer~=2.0.0
    Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
  Collecting certifi>=2017.4.17
    Downloading certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
  Collecting urllib3<1.27,>=1.21.1
    Downloading urllib3-1.26.9-py2.py3-none-any.whl (138 kB)
  Collecting idna<4,>=2.5
    Downloading idna-3.3-py3-none-any.whl (61 kB)
  Collecting pyparsing>=2.0.3
    Downloading pyparsing-3.0.8-py3-none-any.whl (98 kB)
  Collecting future>=0.16.0
    Downloading future-0.18.2.tar.gz (829 kB)
  Collecting matplotlib==3.5.0
    Downloading matplotlib-3.5.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
  Collecting click-option-group==0.5.3
    Downloading click_option_group-0.5.3-py3-none-any.whl (11 kB)
  Collecting markdown==3.3.6
    Downloading Markdown-3.3.6-py3-none-any.whl (97 kB)
  Collecting rdflib==6.0.2
    Downloading rdflib-6.0.2-py3-none-any.whl (407 kB)
  Collecting pandas==1.3.4
    Downloading pandas-1.3.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.5 MB)
  Collecting validators==0.18.2
    Downloading validators-0.18.2-py3-none-any.whl (19 kB)
  Collecting nltk==3.6.6
    Downloading nltk-3.6.6-py3-none-any.whl (1.5 MB)
  Collecting rdflib-jsonld==0.6.2
    Downloading rdflib_jsonld-0.6.2-py2.py3-none-any.whl (4.0 kB)
  Collecting scikit-learn==1.0
    Downloading scikit_learn-1.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.7 MB)
  Collecting numpy==1.21.4
    Downloading numpy-1.21.4-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
  Collecting textblob==0.17.1
    Downloading textblob-0.17.1-py2.py3-none-any.whl (636 kB)
  Collecting Click==7.0
    Downloading Click-7.0-py2.py3-none-any.whl (81 kB)
  Collecting xgboost==1.5.0
    Downloading xgboost-1.5.0-py3-none-manylinux2014_x86_64.whl (173.5 MB)
  Collecting importlib-metadata>=4.4
    Downloading importlib_metadata-4.11.3-py3-none-any.whl (18 kB)
  Collecting python-dateutil>=2.7
    Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
  Collecting packaging>=20.0
    Downloading packaging-21.3-py3-none-any.whl (40 kB)
  Collecting kiwisolver>=1.0.1
    Downloading kiwisolver-1.4.2-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.6 MB)
  Collecting fonttools>=4.22.0
    Downloading fonttools-4.33.3-py3-none-any.whl (930 kB)
  Collecting pillow>=6.2.0
    Downloading Pillow-9.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
  Collecting cycler>=0.10
    Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
  Collecting setuptools-scm>=4
    Downloading setuptools_scm-6.4.2-py3-none-any.whl (37 kB)
  Collecting joblib
    Downloading joblib-1.1.0-py2.py3-none-any.whl (306 kB)
  Collecting regex>=2021.8.3
    Downloading regex-2022.4.24-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (763 kB)
  Collecting tqdm
    Downloading tqdm-4.64.0-py2.py3-none-any.whl (78 kB)
  Collecting pytz>=2017.3
    Downloading pytz-2022.1-py2.py3-none-any.whl (503 kB)
  Collecting setuptools
    Using cached setuptools-62.1.0-py3-none-any.whl (1.1 MB)
  Collecting isodate
    Downloading isodate-0.6.1-py2.py3-none-any.whl (41 kB)
  Collecting scipy>=1.1.0
    Downloading scipy-1.8.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.1 MB)
  Collecting threadpoolctl>=2.0.0
    Downloading threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
  Collecting decorator>=3.4.0
    Downloading decorator-5.1.1-py3-none-any.whl (9.1 kB)
  Collecting six>=1.4.0
    Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
  Collecting htmlmin
    Downloading htmlmin-0.1.12.tar.gz (19 kB)
  Collecting inspect4py
    Downloading inspect4py-0.0.1-py3-none-any.whl (54 kB)
  Collecting html2image
    Downloading html2image-2.0.1-py3-none-any.whl (17 kB)
  Collecting mistune
    Downloading mistune-2.0.2-py2.py3-none-any.whl (24 kB)
  Collecting progressbar2
    Downloading progressbar2-4.0.0-py2.py3-none-any.whl (26 kB)
  INFO: pip is looking at multiple versions of xgboost to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of validators to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of textblob to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of scikit-learn to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of rdflib-jsonld to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of rdflib to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of pandas to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of numpy to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of nltk to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of matplotlib to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of markdown to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of click-option-group to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of click to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of somef to determine which version is compatible with other requirements. This could take a while.
  Collecting somef>=0.7.2
    Downloading somef-0.7.2.tar.gz (33.5 MB)
    Installing build dependencies: started
    Installing build dependencies: finished with status 'done'
    Getting requirements to build wheel: started
    Getting requirements to build wheel: finished with status 'done'
    Installing backend dependencies: started
    Installing backend dependencies: finished with status 'done'
      Preparing wheel metadata: started
      Preparing wheel metadata: finished with status 'done'
  INFO: pip is looking at multiple versions of pygments to determine which version is compatible with other requirements. This could take a while.
  Collecting Pygments>=2.11.2
    Downloading Pygments-2.11.2-py3-none-any.whl (1.1 MB)
  INFO: pip is looking at multiple versions of bibtexparser to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of requests to determine which version is compatible with other requirements. This could take a while.
  Collecting requests>=2.22.0
    Downloading requests-2.27.0-py2.py3-none-any.whl (63 kB)
  INFO: pip is looking at multiple versions of xgboost to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of validators to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of textblob to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of scikit-learn to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of rdflib-jsonld to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of rdflib to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of pandas to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of numpy to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of nltk to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of matplotlib to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of markdown to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of click-option-group to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of click to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of somef to determine which version is compatible with other requirements. This could take a while.
    Downloading requests-2.26.0-py2.py3-none-any.whl (62 kB)
    Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
  Collecting idna<3,>=2.5
    Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
  Collecting chardet<5,>=3.0.2
    Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  INFO: pip is looking at multiple versions of pygments to determine which version is compatible with other requirements. This could take a while.
  Collecting requests>=2.22.0
    Downloading requests-2.25.0-py2.py3-none-any.whl (61 kB)
  Collecting chardet<4,>=3.0.2
    Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
  Collecting requests>=2.22.0
    Downloading requests-2.24.0-py2.py3-none-any.whl (61 kB)
  Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
    Downloading urllib3-1.25.11-py2.py3-none-any.whl (127 kB)
  Collecting requests>=2.22.0
    Downloading requests-2.23.0-py2.py3-none-any.whl (58 kB)
  INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
    Downloading requests-2.22.0-py2.py3-none-any.whl (57 kB)
  Collecting idna<2.9,>=2.5
    Downloading idna-2.8-py2.py3-none-any.whl (58 kB)
  INFO: pip is looking at multiple versions of bibtexparser to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of requests to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of bs4 to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of <Python from Requires-Python> to determine which version is compatible with other requirements. This could take a while.
  INFO: pip is looking at multiple versions of pyyaml to determine which version is compatible with other requirements. This could take a while.
  ERROR: Cannot install scc==0.0.1 and somef>=0.7.2 because these package versions have conflicting dependencies.
  
  The conflict is caused by:
      The user requested somef>=0.7.2
      scc 0.0.1 depends on somef>=0.8.0
  
  To fix this you could try to:
  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict
  
  ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
  WARNING: You are using pip version 21.2.4; however, version 22.0.4 is available.
  You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
  ----------------------------------------
WARNING: Discarding file:///ya2ro. Command errored out with exit status 1: /usr/local/bin/python /tmp/pip-standalone-pip-15nvo590/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-xeiitwfy/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'PyYAML>=6.0' 'bs4>=0.0.1' 'requests>=2.22.0' 'bibtexparser>=1.2.0' 'Pygments>=2.11.2' 'somef>=0.7.2' 'scc @ git+https://github.com/dakixr/scc' metadata-parser Check the logs for full command output.
ERROR: Command errored out with exit status 1: /usr/local/bin/python /tmp/pip-standalone-pip-15nvo590/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-xeiitwfy/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'PyYAML>=6.0' 'bs4>=0.0.1' 'requests>=2.22.0' 'bibtexparser>=1.2.0' 'Pygments>=2.11.2' 'somef>=0.7.2' 'scc @ git+https://github.com/dakixr/scc' metadata-parser Check the logs for full command output.
WARNING: You are using pip version 21.2.4; however, version 22.0.4 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
The command '/bin/sh -c cd ya2ro && pip install .' returned a non-zero code: 1

Have a small module for ingesting metadata from notebooks

Notebooks are often developed as part of a paper. If we want to aggregate them into an RO, ya2ro should be able to process them.

A notebook can be given as a local file or as a URL.
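
A minimal sketch of what such a module could look like, assuming the notebooks are standard Jupyter .ipynb files (JSON documents with top-level "metadata" and "cells" keys). The function names load_notebook and notebook_metadata are hypothetical and are not part of ya2ro:

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen


def load_notebook(source: str) -> dict:
    """Parse a notebook from a local path or an http(s) URL (hypothetical helper)."""
    if urlparse(source).scheme in ("http", "https"):
        # Remote notebook: download the raw .ipynb JSON.
        with urlopen(source) as response:
            return json.loads(response.read().decode("utf-8"))
    # Anything else is treated as a path on the local file system.
    with open(source, encoding="utf-8") as handle:
        return json.load(handle)


def notebook_metadata(notebook: dict) -> dict:
    """Collect a few descriptive fields; missing keys yield None or 0."""
    metadata = notebook.get("metadata", {})
    cells = notebook.get("cells", [])
    return {
        "kernel": metadata.get("kernelspec", {}).get("display_name"),
        "language": metadata.get("language_info", {}).get("name"),
        "code_cells": sum(1 for c in cells if c.get("cell_type") == "code"),
        "markdown_cells": sum(1 for c in cells if c.get("cell_type") == "markdown"),
    }
```

Only the standard library is used here, so the sketch adds no new dependency. For a URL, the link would need to point at the raw .ipynb content rather than at an HTML page that renders it.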

When ORCID has incomplete information, ya2ro fails

Example: with orcid: "https://orcid.org/0000-0001-7588-6094", which has no affiliation, I get the following error:

Traceback (most recent call last):
  File "/home/dgarijo/Documents/GitHub/EELISA-research-object/ya2ro.py", line 119, in <module>
    main()
  File "/home/dgarijo/Documents/GitHub/EELISA-research-object/ya2ro.py", line 55, in main
    process_yaml(args.input)
  File "/home/dgarijo/Documents/GitHub/EELISA-research-object/ya2ro.py", line 93, in process_yaml
    data = data_wrapper.load_yaml(yaml)
  File "/home/dgarijo/Documents/GitHub/EELISA-research-object/data_wrapper.py", line 173, in load_yaml
    data = init_paper(p.input_to_vocab, data)
  File "/home/dgarijo/Documents/GitHub/EELISA-research-object/data_wrapper.py", line 274, in init_paper
    populate_authors(paper, input_to_vocab, data)
  File "/home/dgarijo/Documents/GitHub/EELISA-research-object/data_wrapper.py", line 455, in populate_authors
    object.authors[i].position = ", ".join(orcid.get_affiliation())
  File "/home/dgarijo/Documents/GitHub/EELISA-research-object/req_orcid.py", line 25, in get_affiliation
    if self.json["affiliation"] is None:
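
The traceback above is cut off before the exception itself, so the exact cause is a guess. Assuming the failure comes from the "affiliation" key being absent from the ORCID response, a defensive version of the lookup could be sketched as follows. The OrcidRecord class is a hypothetical stand-in for the wrapper in req_orcid.py, not its actual code:

```python
class OrcidRecord:
    """Hypothetical stand-in for the ORCID wrapper in req_orcid.py."""

    def __init__(self, record: dict):
        self.json = record

    def get_affiliation(self) -> list:
        # .get() tolerates a missing key; `or []` also covers an explicit None.
        affiliation = self.json.get("affiliation") or []
        if isinstance(affiliation, str):
            # A single affiliation given as a plain string becomes a one-item list.
            return [affiliation]
        return list(affiliation)
```

With an empty list as the fallback, the caller's ", ".join(orcid.get_affiliation()) produces an empty string instead of raising, so an author without an affiliation is left with a blank position.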

Fields are inconsistent

Right now we have doi for datasets, link for software and nothing for papers.

In the minimal version (the one I intend to use the most), I think a plain list of links should be accepted.

If you detect key-value pairs, then you know you have something a little more qualified.
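
One way to support both forms is to normalize every entry to a dictionary while loading the YAML. This is only a sketch under assumptions: the helper names and the default key "link" are hypothetical, and the entries are assumed to arrive as already-parsed Python objects (strings or dicts):

```python
def normalize_entry(entry, default_key="link"):
    """Turn a bare link into {default_key: link}; copy key-value entries unchanged."""
    if isinstance(entry, str):
        # Minimal form: the entry is just a link.
        return {default_key: entry}
    if isinstance(entry, dict):
        # Qualified form: keep whatever keys the author provided.
        return dict(entry)
    raise TypeError(f"Unsupported entry type: {type(entry).__name__}")


def normalize_section(entries, default_key="link"):
    """Normalize a whole section (datasets, software, papers); None means empty."""
    return [normalize_entry(entry, default_key) for entry in (entries or [])]
```

For example, normalize_section(["https://github.com/oeg-upm/ya2ro"]) yields a one-item list whose only key is "link", while an entry that already carries keys such as doi or name passes through untouched. The same function could then be used for datasets, software and papers alike.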

ya2ro configuration is confusing

There is now a properties file (the help does not make clear what it does) and a configuration.

The difference between them, and why two options are needed, is not clear. I would expect the properties file to hold the GitHub access token if one is needed. If there is none, ya2ro should issue a warning and proceed without an access token.
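
The requested behaviour (warn and continue when no token is set) could be sketched like this. The key name GITHUB_PERSONAL_ACCESS_TOKEN and the function github_token are assumptions made for illustration, and the properties are assumed to be already loaded into a dict:

```python
import warnings

# Hypothetical key name; the real properties file may use a different one.
TOKEN_KEY = "GITHUB_PERSONAL_ACCESS_TOKEN"


def github_token(properties):
    """Return the configured token, or None (with a warning) when it is missing."""
    token = (properties or {}).get(TOKEN_KEY)
    if not token:
        warnings.warn(
            "No GitHub personal access token configured; "
            "continuing without one.",
            stacklevel=2,
        )
        return None
    return token
```

Callers can then treat None as "run unauthenticated" instead of stopping, which keeps the token optional.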

Incorporate software cards for software

I like how software is displayed here: https://lynx-project.eu/doc/api/
I would like to have a "card" for each software component showing additional details, not just the description and license. Ideally it would show the citation, the paper (if available), the GitHub repo and Docker image, and notebooks and documentation if available.
