ons-wikidata's Introduction

Open Notebook for Bioclipse scripts for Wikidata

To use these scripts you need to install either Bioclipse 2.6.2 or Bacting.

After that, you can run the scripts with Groovy. If you are not using Bacting, make sure to remove the following lines at the top of the script:

@Grab(group='io.github.egonw.bacting', module='managers-cdk', version='0.0.33')

workspaceRoot = System.properties['user.dir']
cdk = new net.bioclipse.managers.CDKManager(workspaceRoot);

How to cite this

If you use scripts in this repository, please cite the JOSS article as specified in the CITATION.cff file.

Contributors

adafede, egonw

ons-wikidata's Issues

[suggestion] Automatically exclude statements generated with "null"

This would make the QuickStatements output cleaner.

Here is an example:

groovy quickstatements.groovy -d 10.1101/2021.12.24.474089

returns

qid,P2860,S248,s854,s813
# Fetching 10.1101/2021.12.24.474089 from https://opencitations.net/index/coci/api/v1/citations/10.1101/2021.12.24.474089 ...
# Found citing DOIs for 10.1101/2021.12.24.474089: 0
# citing articles for 10.1101/2021.12.24.474089
# Fetching 10.1101/2021.12.24.474089 from https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089 ...
# Found cited DOIs for 10.1101/2021.12.24.474089: 41
# cited articles for 10.1101/2021.12.24.474089
null,Q27818844,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q108844055,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q44704388,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q98513836,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q36179301,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q98665248,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q113307511,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q102075594,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q36226791,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q29616057,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q29547435,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q40020934,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q24616873,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q41016968,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q105742243,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q100951293,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q91135365,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q27807488,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q41474027,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q24629036,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q27136473,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q91218352,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q63352058,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q107272666,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q35129381,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q39993461,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q106856815,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q38050514,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q37473763,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q30004214,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q27921801,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q91903041,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q38655770,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q84573952,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q39115283,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q38864380,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q108126799,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11
null,Q51147300,Q107507940,"""https://opencitations.net/index/coci/api/v1/references/10.1101/2021.12.24.474089""",+2022-10-25T00:00:00Z/11

(The `qid` column is `null` because the preprint is not on Wikidata.)

The same happens for citing articles, where the `P2860` column is `null` instead:

groovy quickstatements.groovy -d 10.26434/CHEMRXIV.13721770

returns:

qid,P2860,S248,s854,s813
# Fetching 10.26434/CHEMRXIV.13721770 from https://opencitations.net/index/coci/api/v1/citations/10.26434/CHEMRXIV.13721770 ...
# Found citing DOIs for 10.26434/CHEMRXIV.13721770: 2
# citing articles for 10.26434/CHEMRXIV.13721770
Q109715863,null,Q107507940,"""https://opencitations.net/index/coci/api/v1/citations/10.26434/CHEMRXIV.13721770""",+2022-10-25T00:00:00Z/11
Q109736854,null,Q107507940,"""https://opencitations.net/index/coci/api/v1/citations/10.26434/CHEMRXIV.13721770""",+2022-10-25T00:00:00Z/11
# Fetching 10.26434/CHEMRXIV.13721770 from https://opencitations.net/index/coci/api/v1/references/10.26434/CHEMRXIV.13721770 ...
# Found cited DOIs for 10.26434/CHEMRXIV.13721770: 0
# cited articles for 10.26434/CHEMRXIV.13721770
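
One way to get the cleaner output suggested above is to post-process the generated CSV. The following is a minimal Python sketch, not the script's own implementation: the function name `drop_null_statements` is hypothetical, and it assumes the column layout shown in the examples (subject QID first, `P2860` target second), with header and `#` comment lines passed through unchanged.

```python
import csv


def drop_null_statements(quickstatements_csv: str) -> str:
    """Drop data rows whose subject or P2860 target is the literal 'null'."""
    kept = []
    for line in quickstatements_csv.splitlines():
        # Pass the header, comment lines, and blank lines through untouched.
        if not line or line.startswith("#") or line.startswith("qid,"):
            kept.append(line)
            continue
        # Parse with the csv module so quoted URLs containing commas are safe.
        fields = next(csv.reader([line]))
        # Columns 0 and 1 hold the subject QID and the cited-work QID.
        if "null" in fields[:2]:
            continue
        # Keep the original line so its quoting is preserved exactly.
        kept.append(line)
    return "\n".join(kept)
```

Filtering on the raw line rather than re-serialising the parsed fields keeps the triple-quoted URL column exactly as the script wrote it.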

Did the last Bacting update break something?

(old version)

OpenCitations git:(master) ✗ groovy quickstatements.groovy -d 10.1039/D1NP00040C     
Fetching 10.1039/D1NP00040C from https://opencitations.net/index/coci/api/v1/citations/10.1039/D1NP00040C...
Found citing DOIs: 0
Fetching 10.1039/D1NP00040C from https://opencitations.net/index/coci/api/v1/references/10.1039/D1NP00040C...
Found cited DOIs: 0
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
qid,P2860,S248,s854,s813
# citing articles
# cited articles

(new version)

groovy quickstatements.groovy -d 10.1073/PNAS.2013344118
Fetching 10.1073/PNAS.2013344118 from https://opencitations.net/index/coci/api/v1/citations/10.1073/PNAS.2013344118...
Found citing DOIs: 4
Fetching 10.1073/PNAS.2013344118 from https://opencitations.net/index/coci/api/v1/references/10.1073/PNAS.2013344118...
Found cited DOIs: 0
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
qid,P2860,S248,s854,s813
# citing articles
# cited articles

Upload of non-regexp compliant formulae to Wikidata

Hi @egonw,

It looks like the formula upload writes some values that do not comply with the property's regular-expression constraint.
(See https://www.wikidata.org/w/index.php?title=Q123340453&diff=prev&oldid=2005427699 for an example.)

The constraint is defined here: https://www.wikidata.org/wiki/Property:P274#P2302

In case it helps, here is my Python approach for converting the RDKit formula:

"""Convert chemical formula."""
from __future__ import annotations

import re

__all__ = [
    "convert_chemical_formula",
]


def convert_chemical_formula(text: str) -> str:
    """
    Convert chemical formula.

    :param text: Text.
    :type text: str

    :returns: Modified text.
    :rtype: str
    """

    def replace(match: re.Match) -> str:
        """
        Matches subscripts.

        :param match: Match object.
        :type match: re.Match

        :returns: Modified text.
        :rtype: str
        """
        number, symbol = match.groups()
        subscript_map = {
            "0": "₀",
            "1": "₁",
            "2": "₂",
            "3": "₃",
            "4": "₄",
            "5": "₅",
            "6": "₆",
            "7": "₇",
            "8": "₈",
            "9": "₉",
        }
        superscript_map = {
            "0": "⁰",
            "1": "¹",
            "2": "²",
            "3": "³",
            "4": "⁴",
            "5": "⁵",
            "6": "⁶",
            "7": "⁷",
            "8": "⁸",
            "9": "⁹",
            "+": "⁺",
            "-": "⁻",
        }

        if number:
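            # Check if the number is preceded by `+` or `-`.
            # Guard against a number at position 0: the slice is then empty,
            # and `"" in "+-"` would evaluate to True.
            prev_char = text[match.start(1) - 1 : match.start(1)]
            if prev_char and prev_char in "+-":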
                return "".join(superscript_map[char] for char in number)
            else:
                return "".join(subscript_map[char] for char in number)  # Subscript by default
        elif symbol:
            return "".join(superscript_map[char] for char in symbol)
        return ""

    pattern = re.compile(r"(\d+)|([+-])")
    result = re.sub(pattern, replace, text)
    # Move `⁺` and `⁻` to after the number (Wikidata formatting)
    result = re.sub(r"(\⁺|\⁻)+([⁰¹²³⁴⁵⁶⁷⁸⁹]+)", r"\2\1", result)

    return result


if __name__ == "__main__":
    # Print the converted formula so that running the module shows the result.
    print(convert_chemical_formula("C19H36Cl2CrN3O-3"))

P.S.: Thank you for all the correctly formatted content you keep uploading. This issue concerns only a tiny part of it, but we rarely file issues for things that are correct 😊

`OpenCitations/quickstatements.groovy` fails on some DOIs

# Fetching 10.1002/1099-0690(200204)2002:8<1397::AID-EJOC1397>3.0.CO;2-6 from https://opencitations.net/index/coci/api/v1/citations/10.1002/1099-0690(200204)2002:8<1397::AID-EJOC1397>3.0.CO;2-6 ...
Caught: java.io.IOException: Server returned HTTP response code: 400 for URL: https://opencitations.net/index/coci/api/v1/citations/10.1002/1099-0690(200204)2002:8<1397::AID-EJOC1397>3.0.CO;2-6
java.io.IOException: Server returned HTTP response code: 400 for URL: https://opencitations.net/index/coci/api/v1/citations/10.1002/1099-0690(200204)2002:8<1397::AID-EJOC1397>3.0.CO;2-6
        at quickstatements$_run_closure1.doCall(quickstatements.groovy:73)
        at jdk.internal.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at quickstatements.run(quickstatements.groovy:68)

List of DOIs that cannot be handled because of response length

When importing citation data for highly cited (Nobel-related) scientific articles, some have such a long list of citing DOIs that the script fails.

I do not really know where to store this, so I am opening a thread here (I'll update the list as I encounter more).

  • 10.1126/SCIENCE.1102896 (47,621 citing DOIs)
  • 10.2307/1912352 (14,780 citing DOIs)

@csisc
